Load and inspect data
X Date Value
1 1 2018-01-01 0.3197622
2 2 2018-02-01 0.9509367
3 3 2018-03-01 2.0793542
4 4 2018-04-01 1.3012796
5 5 2018-05-01 1.0646439
6 6 2018-06-01 1.4575325
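The output above is the first six rows of a data frame with a Date and a Value column. The original data source is not shown, so the sketch below simply constructs a comparable data frame (the simulated Value column and the 24-month length are assumptions, not the author's actual data):

```r
# A minimal sketch of how a data frame like the one above could be built;
# in practice the data would likely come from read.csv() or similar
set.seed(123)
data <- data.frame(
  Date  = seq(as.Date("2018-01-01"), by = "month", length.out = 24),
  Value = cumsum(rnorm(24, mean = 0.1, sd = 0.5))  # random walk with drift
)
head(data)  # inspect the first six rows
```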
Time series data is unique because observations are not independent but depend on previous values.
Thinking about time in data is really important, though (in my opinion) very often overlooked in sport data analytics.
Ignoring the fact that data is a time-series can lead to several significant issues and risks:
Misinterpretation of data
Without acknowledging the time-dependent nature of data, you might draw incorrect conclusions.
For example, in sports science, ignoring the time-series structure of the data can lead to misjudging an athlete’s performance improvement or decline.
Inaccurate predictions
Time series analysis (TSA) often involves forecasting future values based on past trends.
Ignoring the sequential nature of data can result in unreliable and inaccurate predictions, leading to poor decision-making.
Overlooking seasonality and trends
Many datasets in sport exhibit seasonal patterns or trends over time.
Ignoring these elements can cause a failure to recognise important cyclical behaviours, such as seasonal peaks in athlete performances.
Failure to identify causal relationships
Time series data can help in identifying causal relationships.
Ignoring the time aspect means these relationships may be overlooked, resulting in ineffective strategies or interventions.
Statistical analysis errors
Many statistical tests assume independence of observations.
Applying these tests without considering the time component can lead to erroneous statistical inferences.
A trend in time series analysis refers to the long-term movement of data over time, indicating a persistent increase, decrease, or stability in the mean value of a series.
Autocorrelation (ACF) measures the correlation between a time series and its past values at different lags, helping identify repeating patterns, seasonality, and persistence in the data.
A lag is the delayed influence of past observations on current values in a time series.
Lags help identify dependencies over different time steps, such as how a value today may be correlated with values from previous days, months, or years.
PACF measures the direct relationship between a time series and its lagged values, removing the influence of intermediate lags.
Unlike ACF, which captures both direct and indirect correlations, PACF isolates the pure effect of each lag.
PACF is useful for identifying the order of autoregressive (AR) processes in a time series model.
ACF = Looks at everything, even indirect influence (like Gran → Mum → You → Son).
PACF = Only looks at direct influence (like You → Son, ignoring the middle).
PACF helps us figure out which time steps actually matter, without getting distracted by passed-down effects.
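The ACF and PACF described above can both be computed in base R. The sketch below uses a simulated AR(1) series (an assumption made here purely so the lag structure is known in advance; it is not the author's data):

```r
# Simulate an AR(1) series with coefficient 0.7 so we know what to expect
set.seed(42)
x <- arima.sim(model = list(ar = 0.7), n = 120)

acf(x)   # ACF: captures both direct AND indirect correlation at each lag
pacf(x)  # PACF: only the direct effect; for an AR(1) it should cut off after lag 1
```

The cut-off in the PACF plot is what makes it useful for choosing the order of an AR model.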
Now, we’ll briefly review how to prepare for TSA in R.
R needs to be told that our data is in the form of a time series.
We have to convert it before we can use the time-series functions; the ts() function is commonly used for this purpose.
We can now visually inspect our time series data. The moving average shows the general pattern within our data.
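One way to sketch this inspection in base R, overlaying a centred moving average on the raw series (the 12-month window and the simulated series are assumptions for illustration):

```r
# Simulated monthly series standing in for ts_data
set.seed(1)
ts_data <- ts(cumsum(rnorm(36)), start = c(2018, 1), frequency = 12)

plot(ts_data, col = "grey50", ylab = "Value")
ma <- stats::filter(ts_data, rep(1 / 12, 12), sides = 2)  # centred 12-month MA
lines(ma, col = "red", lwd = 2)  # the smoothed line shows the general pattern
```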
Time series analysis is often used to forecast what might happen in the future. We can use R to do this, for example by fitting an ARIMA model.
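A minimal sketch using base R's arima() and predict() (the order c(1, 1, 0) and the simulated series are illustrative assumptions; in practice the forecast package's auto.arima() is often used to select the order automatically):

```r
# Simulated monthly series standing in for ts_data
set.seed(7)
ts_data <- ts(cumsum(rnorm(60)), start = c(2018, 1), frequency = 12)

fit <- arima(ts_data, order = c(1, 1, 0))  # AR(1) on the differenced series
predict(fit, n.ahead = 12)                 # point forecasts and standard errors
```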