Create a synthetic time series representing a team’s performance index over 100 matches.
This includes:
A linear upward trend (improving performance over time),
Seasonality (to mimic cyclical effects such as home/away or opponent quality fluctuations), and
Random noise.
At match 60 we simulate an intervention (e.g., a new training program) that is assumed to boost performance by 5%.
Code
# Total number of matches (time points)n <-100matches <-1:n# Baseline performance: a constant plus a linear trend and seasonal (cyclical) component.baseline <-50+0.15* matches +5*sin(2* pi * matches /12)# Add random noisenoise <-rnorm(n, mean =0, sd =3)performance <- baseline + noise# Create a data frame for the performance seriesdata <-data.frame(Match = matches, Performance = performance)# Create an intervention variable: before match 60 = 0, from match 60 onward = 1data$Intervention <-ifelse(data$Match >=60, 1, 0)# Apply a 5% performance boost from match 60 onward (this is our simulated intervention effect)data$Performance[data$Match >=60] <- data$Performance[data$Match >=60] *1.05# Plot the synthetic performance seriesggplot(data, aes(x = Match, y = Performance)) +geom_line(color ="blue") +geom_point() +geom_vline(xintercept =60, linetype ="dashed", color ="red") +ggtitle("Synthetic Team Performance Index (with Intervention at Match 60)") +theme_minimal()
Forecasting Future Performance with ARIMA
ARIMA (AutoRegressive Integrated Moving Average) is widely used for time series forecasting.
I convert performance data into a time series (with a seasonal frequency of 12) and use auto.arima() to select a suitable model.
Then, forecast team’s performance for next 10 matches.
Code
# Convert performance data into a time series object (frequency = 12 to capture seasonality)ts_perf <-ts(data$Performance, frequency =12)# Fit an ARIMA model automaticallyfit <-auto.arima(ts_perf)summary(fit)
Series: ts_perf
ARIMA(0,0,0)(2,1,0)[12] with drift
Coefficients:
sar1 sar2 drift
-0.7743 -0.3114 0.2025
s.e. 0.1093 0.1179 0.0158
sigma^2 = 11.63: log likelihood = -235.08
AIC=478.16 AICc=478.65 BIC=488.07
Training set error measures:
ME RMSE MAE MPE MAPE MASE
Training set -0.1182734 3.143716 2.312712 -0.4820955 3.931975 0.5655298
ACF1
Training set 0.002584806
Code
# Forecast performance for the next 10 matcheshorizon <-10fc <-forecast(fit, h = horizon)# Plot the forecastautoplot(fc) +ggtitle("ARIMA Forecast of Future Performance") +theme_minimal()
Explanation
Model Fitting:
ARIMA models combine autoregressive (AR) terms, differencing (I), and moving average (MA) terms to capture various time series behaviors.
Forecasting:
The forecast gives both point estimates and prediction intervals, which can be interpreted as future performance predictions.
Anomaly Detection with ARIMA Residuals
Once an ARIMA model is fitted, the residuals (rem: differences between actual and fitted values) can indicate anomalies; match results that deviate significantly from model’s expectation.
Code
# Compute fitted values and residualsdata$Fitted <-fitted(fit)data$Residuals <- data$Performance - data$Fitted# Plot actual vs. fitted performanceggplot(data, aes(x = Match)) +geom_line(aes(y = Performance, color ="Actual")) +geom_line(aes(y = Fitted, color ="Fitted"), linetype ="dashed") +ggtitle("Actual vs Fitted Performance") +scale_color_manual(values =c("Actual"="blue", "Fitted"="red")) +theme_minimal()
Code
# Calculate residual standard deviationres_sd <-sd(data$Residuals, na.rm =TRUE)# Flag anomalies (residuals with absolute value greater than 2 standard deviations)data$Anomaly <-abs(data$Residuals) >2* res_sd# Plot residuals and highlight anomaliesggplot(data, aes(x = Match, y = Residuals)) +geom_line() +geom_point(aes(color = Anomaly), size =2) +ggtitle("ARIMA Residuals & Anomaly Detection") +scale_color_manual(values =c("FALSE"="black", "TRUE"="orange")) +theme_minimal()
Don't know how to automatically pick scale for object of type <ts>. Defaulting
to continuous.
Explanation
Residual Analysis:
The residuals should be randomly distributed. Large residuals (beyond 2 standard deviations) indicate matches where performance was unusually high/low. These are considered ‘anomalies’.
Modeling Future Effect of an Intervention with ARIMAX
To forecast the future effect of a persistent intervention, we can use ARIMAX model, an ARIMA model that includes exogenous variables.
Here I include an intervention variable (which is 1 after match 60) as an exogenous regressor and forecast future performance (assuming intervention continues).
Code
# Fit ARIMAX model using intervention variable as exogenous regressorfit_arimax <-auto.arima(ts_perf, xreg = data$Intervention)summary(fit_arimax)
Series: ts_perf
Regression with ARIMA(1,0,0) errors
Coefficients:
ar1 intercept xreg
0.6055 54.9035 10.7108
s.e. 0.0799 1.2529 1.8796
sigma^2 = 16.01: log likelihood = -279.25
AIC=566.5 AICc=566.92 BIC=576.92
Training set error measures:
ME RMSE MAE MPE MAPE MASE
Training set 0.03191643 3.940303 3.143826 -0.4278754 5.432209 0.7687629
ACF1
Training set -0.02935568
Code
# Forecast future performance for next 10 matches# For forecast assume that intervention persists (exogenous variable = 1)future_intervention <-rep(1, horizon)fc_arimax <-forecast(fit_arimax, h = horizon, xreg = future_intervention)# Plot ARIMAX forecastautoplot(fc_arimax) +ggtitle("ARIMAX Forecast with Sustained Intervention") +theme_minimal()
Explanation
Exogenous Regressor
The intervention variable is added to the ARIMA model so the model can explicitly account for the change in performance due to the intervention.
Forecasting with Intervention
When forecasting, we specify that the intervention continues (set to 1), and the model projects future performance accordingly.
Compare the ARIMA model’s predictions (what would have been expected without the intervention) to the actual performance following the intervention.
The difference between predicted and actual performance quantifies the impact of the intervention.
Code
# Use match indices (instead of window() with time) for clarity:# Pre-intervention data (matches 1 - 59)pre_intervention_data <- data$Performance[1:59]fit_pre <-auto.arima(pre_intervention_data)# Forecast performance for matches 60 to 100 (i.e., 41 matches)h_retrospective <-100-59# 41 matchesfc_pre <-forecast(fit_pre, h = h_retrospective)# Extract predicted performance for matches 60 to 100predicted_pre <- fc_pre$mean# Actual performance for matches 60 to 100 from our datasetactual_post <- data$Performance[60:100]# Calculate intervention effect (difference: Actual - Predicted)intervention_effect <- actual_post - predicted_pre# Create data frame summarising effecteffect_df <-data.frame(Match =60:100,Predicted =as.numeric(predicted_pre),Actual =as.numeric(actual_post),Effect =as.numeric(intervention_effect))# Plot the intervention effectggplot(effect_df, aes(x = Match, y = Effect)) +geom_line(color ="purple") +geom_hline(yintercept =0, linetype ="dashed", color ="grey") +ggtitle("Difference Between Predicted (No Intervention) and Actual Performance") +ylab("Intervention Effect (Actual - Predicted)") +theme_minimal()
Code
# Display summary statistics of the intervention effectsummary(effect_df$Effect)
Min. 1st Qu. Median Mean 3rd Qu. Max.
-0.1554 5.2648 9.0076 9.2010 12.4020 20.5878
Explanation
Retrospective Forecasting:
We use only the pre-intervention data (matches 1–59) to forecast what performance would have been had the intervention not occurred.
Intervention Effect:
By comparing these “no-intervention” forecasts to the actual performance post-intervention, we quantify the effect (improvement or otherwise) attributable to the intervention.
Interpretation:
A positive effect indicates that performance was higher than expected (consistent with a beneficial intervention).