Understanding Time Series Data

B1705, Week Seven

What is Time Series Data?

  • Data collected or recorded in time order.
  • Allows us to examine trends, seasonality, and random fluctuations.
  • Used for historical analysis, pattern recognition, and forecasting.

  • It must have a timestamp OR be collected at regular intervals.
  • Otherwise it is considered cross-sectional, not time series.

Time series data is distinctive because observations are typically not independent: each value tends to depend on the values that came before it.

Why time matters

  • Thinking about time in data is really important, though (in my opinion) very often overlooked in sport data analytics.

  • Ignoring the fact that data is a time-series can lead to several significant issues and risks:

Misinterpretation of data

  • Without acknowledging the time-dependent nature of data, you might draw incorrect conclusions.

  • For example, in sports science, ignoring the time-series structure of the data can lead to misjudging an athlete’s performance improvement or decline.

Inaccurate predictions

  • Time series analysis (TSA) often involves forecasting future values based on past trends.

  • Ignoring the sequential nature of data can result in unreliable and inaccurate predictions, leading to poor decision-making.

Overlooking seasonality and trends

  • Many datasets in sport exhibit seasonal patterns or trends over time.

  • Ignoring these elements can cause a failure to recognise important cyclical behaviours, such as seasonal peaks in athlete performances.

Failure to identify causal relationships

  • Because a cause must come before its effect, the ordering of time series data can help in identifying possible causal relationships.

  • Ignoring the time aspect might lead to overlooking these relationships, potentially leading to ineffective strategies or interventions.

Statistical analysis errors

  • Many statistical tests assume independence of observations.

  • Applying these tests without considering the time component can lead to erroneous statistical inferences.
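
One way to check the independence assumption before running a standard test is a Ljung-Box test, available in base R as Box.test(). The sketch below uses simulated data (an assumption for illustration, not the module's dataset) to compare an independent series with one where each value depends on the previous one.

```r
# Simulated data (an assumption for illustration, not the module's dataset)
set.seed(42)
white_noise <- rnorm(200)                                  # independent observations
ar_series   <- arima.sim(model = list(ar = 0.7), n = 200)  # each value depends on the previous one

# Ljung-Box test: the null hypothesis is that the observations are independent
lb_noise <- Box.test(white_noise, lag = 10, type = "Ljung-Box")
lb_ar    <- Box.test(ar_series,   lag = 10, type = "Ljung-Box")

lb_noise$p.value  # usually large for independent data: no evidence of autocorrelation
lb_ar$p.value     # very small: independence is clearly rejected
```

A small p-value suggests that tests assuming independent observations are likely to give misleading results on that series.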

Components of Time Series Data

Trend

Definition

A trend in time series analysis refers to the long-term movement of data over time, indicating a persistent increase, decrease, or stability in the mean value of a series.
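
As a rough illustration, a linear trend can be estimated by regressing the series on time. The series below is made up for the example, and a straight line is only one of several possible trend shapes.

```r
# Made-up monthly series: a steady rise of 0.5 per month plus a small wiggle
month_index <- 1:24
y <- 2 + 0.5 * month_index + rep(c(-0.1, 0.1), times = 12)

# Fit a straight line through the series; the slope estimates the trend
trend_fit <- lm(y ~ month_index)
slope <- unname(coef(trend_fit)["month_index"])
slope  # close to 0.5, i.e. a persistent increase
```

A slope near zero would indicate a series with no trend, and a negative slope a persistent decrease.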

Time Series with Trend

Time Series with No Trend

Types of trend

Seasonality

Definition

  • Recurring patterns in time series.
  • Example: US airline ticket sales peaking in December.
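
A simple way to look for seasonality is to average the series by position within the year. The sketch below uses made-up monthly sales with a December spike (not real airline data) and the base R functions ts(), cycle() and tapply().

```r
# Made-up monthly sales for 4 years: gentle upward trend plus a December spike
seasonal_bump <- c(0, 0, 0, 0, 0, 5, 10, 5, 0, 0, 10, 30)
sales <- ts(100 + 0.5 * (1:48) + rep(seasonal_bump, times = 4),
            start = c(2018, 1), frequency = 12)

# cycle() gives the position within the year (1 = January, ..., 12 = December)
monthly_mean <- tapply(sales, cycle(sales), mean)
peak_month <- unname(which.max(monthly_mean))
peak_month  # 12, i.e. December
```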

Noise in Time Series

Definition

  • Random fluctuations that obscure patterns.
  • Can be reduced with smoothing techniques.

Visualising Time Series

Moving averages

  • Moving averages smooth out short-term fluctuations in time-series data to identify longer-term trends, like averaging your daily steps over a week to understand your general activity level.
  • In the following figure, the raw data is in blue and the moving average is in red.
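
To see what a moving average actually computes, here is a minimal worked example on made-up step counts, using a centred 3-point window with stats::filter().

```r
# Made-up daily step counts (in thousands) for one week
steps <- c(2, 4, 6, 8, 10, 12, 14)

# Centred 3-point moving average: each value is the mean of itself and its two neighbours
ma3 <- stats::filter(steps, rep(1/3, 3), sides = 2)
as.numeric(ma3)  # NA 4 6 8 10 12 NA (the ends have no complete window)
```

A wider window smooths more strongly but loses more values at each end of the series.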

Exponential Smoothing

  • This technique gives more weight to recent observations while smoothing time series data, like a weighted average where recent team scores matter more than older ones.
  • In the following figure, the data is in blue and the exponential smoothing is in red.
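
The idea can be written out by hand. In this sketch exp_smooth() and its alpha argument are hypothetical names introduced for illustration: each smoothed value is alpha times the newest observation plus (1 - alpha) times the previous smoothed value.

```r
# Simple exponential smoothing written out by hand.
# alpha (between 0 and 1) is the smoothing weight: higher = more weight on recent values.
exp_smooth <- function(x, alpha) {
  smoothed <- numeric(length(x))
  smoothed[1] <- x[1]                      # start from the first observation
  for (i in seq_along(x)[-1]) {
    smoothed[i] <- alpha * x[i] + (1 - alpha) * smoothed[i - 1]
  }
  smoothed
}

scores <- c(10, 20, 30)                    # made-up team scores
exp_smooth(scores, alpha = 0.5)            # 10.0 15.0 22.5
```

In practice, base R's HoltWinters(x, beta = FALSE, gamma = FALSE) fits this kind of model and estimates alpha from the data.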

Autocorrelation (ACF)

  • Shows correlation with past values (lags).

Seasonal Decomposition
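
Seasonal decomposition splits a series into trend, seasonal and random components. A minimal sketch using base R's decompose() is shown here on the built-in AirPassengers dataset, which is used purely as a stand-in example.

```r
# AirPassengers ships with R; taking logs makes the seasonal swings roughly additive
dec <- decompose(log(AirPassengers))   # additive decomposition by default

names(dec)            # includes "seasonal", "trend" and "random"
round(dec$figure, 2)  # the 12 estimated monthly effects (they sum to zero)

# Draw the observed, trend, seasonal and random panels when run interactively
if (interactive()) plot(dec)
```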

Autocorrelation in Time Series

Autocorrelation Function

Measures the correlation between a time series and its past values at different lags, helping identify repeating patterns, seasonality, and persistence in the data.
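
A short sketch of how the ACF picks up a repeating pattern, using a made-up series with a 12-step cycle and base R's acf():

```r
# Made-up series: a pure 12-step cycle repeated 10 times
x <- sin(2 * pi * (1:120) / 12)

# plot = FALSE returns the correlations instead of drawing them
acf_values <- acf(x, lag.max = 24, plot = FALSE)

# acf_values$acf[1] is lag 0 (always 1), so lag k sits at position k + 1
lag6  <- acf_values$acf[7]    # half a cycle apart: strongly negative
lag12 <- acf_values$acf[13]   # a full cycle apart: strongly positive
c(lag6 = lag6, lag12 = lag12)
```

Peaks in the ACF at multiples of the same lag are a typical sign of seasonality with that period.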

Time Lag Effects

  • The delayed influence of past observations on current values in a time series.

  • They help identify dependencies over different time steps, such as how a value today may be correlated with values from previous days, months, or years.

Partial Autocorrelation Function

  • PACF measures the direct relationship between a time series and its lagged values, removing the influence of intermediate lags.

  • Unlike ACF, which captures both direct and indirect correlations, PACF isolates the pure effect of each lag.

  • PACF is useful for identifying the order of autoregressive (AR) processes in a time series model.

ACF and PACF

  • ACF = Looks at everything, even indirect influence (like Gran → Mum → You → Son).

  • PACF = Only looks at direct influence (like You → Son, ignoring the middle).

  • PACF helps us figure out which time steps actually matter, without getting distracted by passed-down effects.
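
The difference can be seen on a simulated AR(1) series, where only the previous value has a direct effect. The data here is an assumption for illustration, generated with arima.sim().

```r
set.seed(123)
# Simulated AR(1) process: only the previous value matters directly
ar1 <- arima.sim(model = list(ar = 0.7), n = 1000)

acf_vals  <- acf(ar1,  lag.max = 5, plot = FALSE)$acf   # position 1 is lag 0
pacf_vals <- pacf(ar1, lag.max = 5, plot = FALSE)$acf   # position 1 is lag 1

acf_lag2  <- acf_vals[3]   # noticeably positive: lag 2 influences today via lag 1
pacf_lag1 <- pacf_vals[1]  # close to 0.7: the direct effect
pacf_lag2 <- pacf_vals[2]  # close to 0: no direct effect once lag 1 is accounted for
```

The PACF cutting off after lag 1 is what suggests an AR model of order 1 for this series.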

PACF (Partial Autocorrelation)

Forecasting

  • Forecasting in time-series analysis means predicting future data points based on past trends, like using past weather patterns to forecast tomorrow’s weather.
  • For example, one model we often use is ARIMA (Autoregressive Integrated Moving Average):
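    • ARIMA is a forecasting method for time series data that combines autoregression (past values), differencing (to remove trend), and a moving average of past forecast errors to predict future points. Seasonal patterns are handled by its seasonal extension (SARIMA).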
    • Here’s an example where the existing data is shown in blue, and the forecast can be seen at the far right:

Conclusion

  • Time series analysis helps understand historical trends and make predictions.
  • Key concepts: Trend, seasonality, noise, autocorrelation.
  • Visualisation techniques: Line plots, ACF, PACF, decomposition.

Preparing Data for Time Series Analysis

TSA in R

Now, we’ll briefly review how to prepare for TSA in R.

Step One: Load the data into a data frame called data

# Load and inspect data
data <- read.csv('https://www.dropbox.com/scl/fi/755z2zrppejfazkun5h7t/tsa_01.csv?rlkey=e5welqld5idyeb0ccwa44whfj&dl=1')
head(data)
  X       Date     Value
1 1 2018-01-01 0.3197622
2 2 2018-02-01 0.9509367
3 3 2018-03-01 2.0793542
4 4 2018-04-01 1.3012796
5 5 2018-05-01 1.0646439
6 6 2018-06-01 1.4575325
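
Before converting, it can help to check that the dates are in order and evenly spaced. The sketch below rebuilds the six rows printed above in a hypothetical data frame called data_head so that it runs without the download; the same checks could be applied to data itself.

```r
# First six rows copied from the head(data) output above
data_head <- data.frame(
  Date  = c("2018-01-01", "2018-02-01", "2018-03-01",
            "2018-04-01", "2018-05-01", "2018-06-01"),
  Value = c(0.3197622, 0.9509367, 2.0793542, 1.3012796, 1.0646439, 1.4575325)
)

# read.csv() does not import dates as Date objects, so convert them and sort by time
data_head$Date <- as.Date(data_head$Date)
data_head <- data_head[order(data_head$Date), ]

# Check the observations are evenly spaced: one row per month, none missing
expected_dates <- seq(min(data_head$Date), by = "month", length.out = nrow(data_head))
is_regular <- all(data_head$Date == expected_dates)
is_regular  # TRUE
```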

Step Two: Convert the data to a time series object ts_data

R needs to be told that our data is in the form of a time-series.

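We have to convert it in order to use the time series functions. The ts() function is often used for this purpose; here start = c(2018, 1) and frequency = 12 mark the series as monthly, beginning in January 2018.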

# Convert the data to a time series object called ts_data
ts_data <- ts(data$Value, start = c(2018, 1), frequency = 12)
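
It is worth confirming that the conversion produced the dates you expect. This sketch uses a hypothetical object ts_check built from the first six values shown earlier, so it runs offline; start(), end(), frequency() and window() work the same way on ts_data.

```r
# First six values from the head(data) output, used here so the example runs offline
values <- c(0.3197622, 0.9509367, 2.0793542, 1.3012796, 1.0646439, 1.4575325)
ts_check <- ts(values, start = c(2018, 1), frequency = 12)

start(ts_check)      # 2018 1  (January 2018)
end(ts_check)        # 2018 6  (June 2018)
frequency(ts_check)  # 12 observations per year

# window() extracts a sub-period, here March to April 2018
spring <- window(ts_check, start = c(2018, 3), end = c(2018, 4))
as.numeric(spring)   # 2.0793542 1.3012796
```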

Step Three: Plot the data and apply a simple moving average

We can now visually inspect our time series data. The moving average shows the general pattern within our data.

# Plot the time series data
plot(ts_data, main="Time Series Data", xlab="Time", ylab="Value")

# Apply a centred 5-point simple moving average
# (stats::filter is called explicitly because dplyr::filter masks it if dplyr is loaded)
moving_avg <- stats::filter(ts_data, rep(1/5, 5), sides = 2)
lines(moving_avg, col="red")

Step Four: Use ARIMA to forecast the future

Time series analysis is often used to forecast what might happen in the future. We can use R to do this, for example by using the ARIMA model:

# Forecasting using ARIMA
library(forecast)
arima_model <- auto.arima(ts_data)
forecasted_data <- forecast(arima_model, h=12) # forecast for the next year
# Plot the forecast
plot(forecasted_data)
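
If the forecast package is not installed, a similar forecast can be sketched with base R's arima() and predict(). This is only an illustration under stated assumptions: the built-in AirPassengers dataset stands in for ts_data, and the model orders are fixed by hand rather than selected automatically.

```r
# AirPassengers ships with R; it stands in for ts_data so the example runs offline
y <- log(AirPassengers)

# Orders fixed by hand here (an assumption); auto.arima() would search for them instead
fit <- arima(y, order = c(0, 1, 1),
             seasonal = list(order = c(0, 1, 1), period = 12))

# Forecast 12 months ahead and convert back from the log scale
fc <- predict(fit, n.ahead = 12)
forecast_values <- exp(fc$pred)
forecast_values
```

The order argument gives the number of autoregressive terms, the degree of differencing, and the number of moving-average terms, in that order.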