ARIMA - Practical Examples

Dataset

Create a synthetic time series representing a team’s performance index over 100 matches.

This includes:

  • A linear upward trend (improving performance over time),
  • Seasonality (to mimic cyclical effects such as home/away or opponent quality fluctuations), and
  • Random noise.

At match 60 we simulate an intervention (e.g., a new training program) that is assumed to boost performance by 5%.

Code
# Total number of matches (time points)
n <- 100
matches <- 1:n

# Baseline performance: a constant plus a linear trend and seasonal (cyclical) component.
baseline <- 50 + 0.15 * matches + 5 * sin(2 * pi * matches / 12)

# Add random noise
noise <- rnorm(n, mean = 0, sd = 3)
performance <- baseline + noise

# Create a data frame for the performance series
data <- data.frame(Match = matches, Performance = performance)

# Create an intervention variable: before match 60 = 0, from match 60 onward = 1
data$Intervention <- ifelse(data$Match >= 60, 1, 0)

# Apply a 5% performance boost from match 60 onward (this is our simulated intervention effect)
data$Performance[data$Match >= 60] <- data$Performance[data$Match >= 60] * 1.05

# Plot the synthetic performance series
ggplot(data, aes(x = Match, y = Performance)) +
  geom_line(color = "blue") +
  geom_point() +
  geom_vline(xintercept = 60, linetype = "dashed", color = "red") +
  ggtitle("Synthetic Team Performance Index (with Intervention at Match 60)") +
  theme_minimal()

Forecasting Future Performance with ARIMA

  • ARIMA (AutoRegressive Integrated Moving Average) is widely used for time series forecasting.

  • I convert performance data into a time series (with a seasonal frequency of 12) and use auto.arima() to select a suitable model.

  • Then, forecast team’s performance for next 10 matches.

Code
# Convert performance data into a time series object (frequency = 12 to capture seasonality)
ts_perf <- ts(data$Performance, frequency = 12)

# Fit an ARIMA model automatically
fit <- auto.arima(ts_perf)
summary(fit)
Series: ts_perf 
ARIMA(0,0,0)(2,1,0)[12] with drift 

Coefficients:
         sar1     sar2   drift
      -0.7743  -0.3114  0.2025
s.e.   0.1093   0.1179  0.0158

sigma^2 = 11.63:  log likelihood = -235.08
AIC=478.16   AICc=478.65   BIC=488.07

Training set error measures:
                     ME     RMSE      MAE        MPE     MAPE      MASE
Training set -0.1182734 3.143716 2.312712 -0.4820955 3.931975 0.5655298
                    ACF1
Training set 0.002584806
Code
# Forecast performance for the next 10 matches
horizon <- 10
fc <- forecast(fit, h = horizon)

# Plot the forecast
autoplot(fc) +
  ggtitle("ARIMA Forecast of Future Performance") +
  theme_minimal()

Explanation

Model Fitting:

  • ARIMA models combine autoregressive (AR) terms, differencing (I), and moving average (MA) terms to capture various time series behaviors.

Forecasting:

  • The forecast gives both point estimates and prediction intervals, which can be interpreted as future performance predictions.

Anomaly Detection with ARIMA Residuals

Once an ARIMA model is fitted, the residuals (rem: differences between actual and fitted values) can indicate anomalies; match results that deviate significantly from model’s expectation.

Code
# Compute fitted values and residuals
data$Fitted <- fitted(fit)
data$Residuals <- data$Performance - data$Fitted

# Plot actual vs. fitted performance
ggplot(data, aes(x = Match)) +
  geom_line(aes(y = Performance, color = "Actual")) +
  geom_line(aes(y = Fitted, color = "Fitted"), linetype = "dashed") +
  ggtitle("Actual vs Fitted Performance") +
  scale_color_manual(values = c("Actual" = "blue", "Fitted" = "red")) +
  theme_minimal()

Code
# Calculate residual standard deviation
res_sd <- sd(data$Residuals, na.rm = TRUE)

# Flag anomalies (residuals with absolute value greater than 2 standard deviations)
data$Anomaly <- abs(data$Residuals) > 2 * res_sd

# Plot residuals and highlight anomalies
ggplot(data, aes(x = Match, y = Residuals)) +
  geom_line() +
  geom_point(aes(color = Anomaly), size = 2) +
  ggtitle("ARIMA Residuals & Anomaly Detection") +
  scale_color_manual(values = c("FALSE" = "black", "TRUE" = "orange")) +
  theme_minimal()
Don't know how to automatically pick scale for object of type <ts>. Defaulting
to continuous.

Explanation

Residual Analysis:

  • The residuals should be randomly distributed. Large residuals (beyond 2 standard deviations) indicate matches where performance was unusually high/low. These are considered ‘anomalies’.

Modeling Future Effect of an Intervention with ARIMAX

To forecast the future effect of a persistent intervention, we can use ARIMAX model, an ARIMA model that includes exogenous variables.

Here I include an intervention variable (which is 1 after match 60) as an exogenous regressor and forecast future performance (assuming intervention continues).

Code
# Fit ARIMAX model using intervention variable as exogenous regressor
fit_arimax <- auto.arima(ts_perf, xreg = data$Intervention)
summary(fit_arimax)
Series: ts_perf 
Regression with ARIMA(1,0,0) errors 

Coefficients:
         ar1  intercept     xreg
      0.6055    54.9035  10.7108
s.e.  0.0799     1.2529   1.8796

sigma^2 = 16.01:  log likelihood = -279.25
AIC=566.5   AICc=566.92   BIC=576.92

Training set error measures:
                     ME     RMSE      MAE        MPE     MAPE      MASE
Training set 0.03191643 3.940303 3.143826 -0.4278754 5.432209 0.7687629
                    ACF1
Training set -0.02935568
Code
# Forecast future performance for next 10 matches
# For forecast assume that intervention persists (exogenous variable = 1)
future_intervention <- rep(1, horizon)
fc_arimax <- forecast(fit_arimax, h = horizon, xreg = future_intervention)

# Plot ARIMAX forecast
autoplot(fc_arimax) +
  ggtitle("ARIMAX Forecast with Sustained Intervention") +
  theme_minimal()

Explanation

Exogenous Regressor

  • The intervention variable is added to the ARIMA model so the model can explicitly account for the change in performance due to the intervention.

Forecasting with Intervention

  • When forecasting, we specify that the intervention continues (set to 1), and the model projects future performance accordingly.

Retrospective Analysis: Assessing Intervention Effect

Compare the ARIMA model’s predictions (what would have been expected without the intervention) to the actual performance following the intervention.

The difference between predicted and actual performance quantifies the impact of the intervention.

Code
# Use match indices (instead of window() with time) for clarity:
# Pre-intervention data (matches 1 - 59)
pre_intervention_data <- data$Performance[1:59]
fit_pre <- auto.arima(pre_intervention_data)

# Forecast performance for matches 60 to 100 (i.e., 41 matches)
h_retrospective <- 100 - 59  # 41 matches
fc_pre <- forecast(fit_pre, h = h_retrospective)

# Extract predicted performance for matches 60 to 100
predicted_pre <- fc_pre$mean

# Actual performance for matches 60 to 100 from our dataset
actual_post <- data$Performance[60:100]

# Calculate intervention effect (difference: Actual - Predicted)
intervention_effect <- actual_post - predicted_pre

# Create data frame summarising effect
effect_df <- data.frame(
  Match = 60:100,
  Predicted = as.numeric(predicted_pre),
  Actual = as.numeric(actual_post),
  Effect = as.numeric(intervention_effect)
)

# Plot the intervention effect
ggplot(effect_df, aes(x = Match, y = Effect)) +
  geom_line(color = "purple") +
  geom_hline(yintercept = 0, linetype = "dashed", color = "grey") +
  ggtitle("Difference Between Predicted (No Intervention) and Actual Performance") +
  ylab("Intervention Effect (Actual - Predicted)") +
  theme_minimal()

Code
# Display summary statistics of the intervention effect
summary(effect_df$Effect)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
-0.1554  5.2648  9.0076  9.2010 12.4020 20.5878 

Explanation

Retrospective Forecasting:

  • We use only the pre-intervention data (matches 1–59) to forecast what performance would have been had the intervention not occurred.

Intervention Effect:

  • By comparing these “no-intervention” forecasts to the actual performance post-intervention, we quantify the effect (improvement or otherwise) attributable to the intervention.

Interpretation:

  • A positive effect indicates that performance was higher than expected (consistent with a beneficial intervention).