Load and inspect data
X Date Value
1 1 2018-01-01 0.3197622
2 2 2018-02-01 0.9509367
3 3 2018-03-01 2.0793542
4 4 2018-04-01 1.3012796
5 5 2018-05-01 1.0646439
6 6 2018-06-01 1.4575325
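The output above is the first six rows of a data frame with a Date and a Value column. The original data source is not shown, so the sketch below simply constructs a comparable data frame (the simulated Value column and the 24-month length are assumptions, not the author's actual data):

```r
# A minimal sketch of how a data frame like the one above could be built;
# in practice the data would likely come from read.csv() or similar
set.seed(123)
data <- data.frame(
  Date  = seq(as.Date("2018-01-01"), by = "month", length.out = 24),
  Value = cumsum(rnorm(24, mean = 0.1, sd = 0.5))  # random walk with drift
)
head(data)  # inspect the first six rows
```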
Time series data is unique because observations are not independent but depend on previous values.
Thinking about time in data is really important, though (in my opinion) very often overlooked in sport data analytics.
Ignoring the fact that data is a time-series can lead to several significant issues and risks:
Misinterpretation of data
Without acknowledging the time-dependent nature of data, you might draw incorrect conclusions.
For example, in sports science, ignoring the time-series structure of the data can lead to misjudging an athlete’s performance improvement or decline.
Inaccurate predictions
Time series analysis (TSA) often involves forecasting future values based on past trends.
Ignoring the sequential nature of data can result in unreliable and inaccurate predictions, leading to poor decision-making.
Overlooking seasonality and trends
Many datasets in sport exhibit seasonal patterns or trends over time.
Ignoring these elements can cause a failure to recognise important cyclical behaviours, such as seasonal peaks in athlete performances.
Failure to identify causal relationships
Time series data can help in identifying causal relationships.
Ignoring the time aspect means these relationships may be overlooked, resulting in ineffective strategies or interventions.
Statistical analysis errors
Many statistical tests assume independence of observations.
Applying these tests without considering the time component can lead to erroneous statistical inferences.
A trend in time series analysis refers to the long-term movement of data over time, indicating a persistent increase, decrease, or stability in the mean value of a series.
Autocorrelation (ACF) measures the correlation between a time series and its past values at different lags, helping identify repeating patterns, seasonality, and persistence in the data.
A lag is the delayed influence of past observations on current values in a time series.
Lags help identify dependencies over different time steps, such as how a value today may be correlated with values from previous days, months, or years.
PACF measures the direct relationship between a time series and its lagged values, removing the influence of intermediate lags.
Unlike ACF, which captures both direct and indirect correlations, PACF isolates the pure effect of each lag.
PACF is useful for identifying the order of autoregressive (AR) processes in a time series model.
ACF = Looks at everything, even indirect influence (like Gran → Mum → You → Son).
PACF = Only looks at direct influence (like You → Son, ignoring the middle).
PACF helps us figure out which time steps actually matter, without getting distracted by passed-down effects.
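The ACF and PACF described above can both be computed in base R. The sketch below uses a simulated AR(1) series (an assumption made here purely so the lag structure is known in advance; it is not the author's data):

```r
# Simulate an AR(1) series with coefficient 0.7 so we know what to expect
set.seed(42)
x <- arima.sim(model = list(ar = 0.7), n = 120)

acf(x)   # ACF: captures both direct AND indirect correlation at each lag
pacf(x)  # PACF: only the direct effect; for an AR(1) it should cut off after lag 1
```

The cut-off in the PACF plot is what makes it useful for choosing the order of an AR model.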
Now, we’ll briefly review how to prepare for TSA in R.
R needs to be told that our data is in the form of a time series.
We have to convert it before we can use the time-series functions; the ts() function is commonly used for this purpose.
We can now visually inspect our time series data. The moving average shows the general pattern within our data.
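One way to sketch this inspection in base R, overlaying a centred moving average on the raw series (the 12-month window and the simulated series are assumptions for illustration):

```r
# Simulated monthly series standing in for ts_data
set.seed(1)
ts_data <- ts(cumsum(rnorm(36)), start = c(2018, 1), frequency = 12)

plot(ts_data, col = "grey50", ylab = "Value")
ma <- stats::filter(ts_data, rep(1 / 12, 12), sides = 2)  # centred 12-month MA
lines(ma, col = "red", lwd = 2)  # the smoothed line shows the general pattern
```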
Time series analysis is often used to forecast what might happen in the future. We can use R to do this, for example by fitting an ARIMA model.
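A minimal sketch using base R's arima() and predict() (the order c(1, 1, 0) and the simulated series are illustrative assumptions; in practice the forecast package's auto.arima() is often used to select the order automatically):

```r
# Simulated monthly series standing in for ts_data
set.seed(7)
ts_data <- ts(cumsum(rnorm(60)), start = c(2018, 1), frequency = 12)

fit <- arima(ts_data, order = c(1, 1, 0))  # AR(1) on the differenced series
predict(fit, n.ahead = 12)                 # point forecasts and standard errors
```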