Seasonal and Panel Data Models

B1705, Week Eight

Introduction

Overview

  • Understand seasonal and panel data models.
  • Explore core concepts with visual demonstrations.
  • Implement SARIMA and panel data models in R.

Conceptual Overview

What is Seasonality?

Definition: A repeating pattern at fixed intervals in a time series.

Common Examples:

  • Retail sales peak every December.
  • Ice cream sales increase in summer.
  • Electricity consumption varies seasonally.

Why does it matter?

  • Ignoring seasonality can lead to misleading trends and poor forecasting.
  • Must be explicitly modeled for accurate predictive insights.

Visualising Seasonal Patterns in Data

In a time plot of the series, we look for cycles that repeat at a fixed interval; these indicate seasonality. A quick way to inspect this in R is sketched below.
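A minimal sketch of this check, assuming ts_data is a monthly ts object (consistent with the 12-month seasonal period used later in the SARIMA output):

library(forecast)        # ggseasonplot() and ggsubseriesplot()

autoplot(ts_data)        # raw series: look for cycles repeating every 12 observations
ggseasonplot(ts_data)    # one line per year, months on the x-axis
ggsubseriesplot(ts_data) # one mini-series per month, highlighting the seasonal pattern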

Decomposing Time Series

Example

library(forecast)   # provides autoplot() methods for decomposed series

decomposed <- decompose(ts_data)   # classical (additive) decomposition
autoplot(decomposed)

Why decompose?

  • To separate trend, seasonality, and random noise.

  • Helps diagnose whether differencing or transformation is required.

  • Provides clearer insight into the underlying structure of the data.

Elements of decomposition

  • Trend: Long-term progression or decline in data.

  • Seasonality: Regularly repeating fluctuations at fixed intervals.

  • Noise: Random fluctuations around the trend and seasonal components.

Uses of decomposition

Enhance model selection accuracy by clarifying:

  • Whether a time series model should include seasonal differencing.
  • Whether transformations (e.g., logarithmic) are necessary to stabilise variance.

Decomposition also improves forecasting accuracy by identifying and modelling each component separately.
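As a rough sketch of how these checks can be automated (assuming ts_data as above), the forecast package offers helpers for choosing differencing orders and variance-stabilising transformations:

library(forecast)

nsdiffs(ts_data)         # suggested number of seasonal differences (D)
ndiffs(ts_data)          # suggested number of ordinary differences (d)
BoxCox.lambda(ts_data)   # lambda near 0 suggests a log transform to stabilise variance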

Types of decomposition

  • Additive model: When seasonal effects remain constant over time.
  • Multiplicative model: When seasonal effects grow with the trend.
  • Implications for forecasting: Models must be adjusted accordingly.
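In base R the choice is made through the type argument of decompose(); a minimal sketch of both forms:

# Additive:       observed = trend + seasonal + random
decomp_add  <- decompose(ts_data, type = "additive")

# Multiplicative: observed = trend * seasonal * random
decomp_mult <- decompose(ts_data, type = "multiplicative")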

What do we do next - Trend?

  • Represents the long-term direction (growth, decline, or stability).
  • Analyse for structural changes or shifts in long-term patterns.
  • Model using trend-adjusted methods or regression-based forecasting.
  • Decide if differencing is needed to remove the trend (for ARIMA models).
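Two common ways of handling the trend, sketched here with the forecast package (tslm() for a regression-based trend, diff() for differencing):

library(forecast)

trend_fit <- tslm(ts_data ~ trend)   # regression on a linear trend term
summary(trend_fit)

autoplot(diff(ts_data))              # first difference removes a linear trend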

What do we do next - Seasonal component

  • Captures regular, repeating fluctuations.
  • Incorporate explicitly into seasonal models (e.g., SARIMA or seasonal regression).
  • Determine if seasonal differencing is required (e.g., removing seasonal cycles).
  • Inform adjustments like seasonal indexing or seasonal smoothing.
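A brief sketch of seasonal differencing and an explicit seasonal regression, assuming monthly data (seasonal lag 12):

seas_diff <- diff(ts_data, lag = 12)           # seasonal difference removes the repeating cycle
autoplot(seas_diff)

seasonal_fit <- tslm(ts_data ~ trend + season) # trend plus monthly seasonal dummies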

What do we do next - Noise (Residual) component

  • Represents random variation unexplained by trend or seasonality.

Next steps:
  • Check if residuals resemble white noise using diagnostic tests (e.g., Ljung-Box test).
  • Identify outliers or unusual events.
  • Improve model accuracy by minimising variance of these residuals through model refinement.
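A minimal sketch of the Ljung-Box check applied to the remainder of a classical decomposition (the ends of the remainder are NA and must be dropped):

decomposed <- decompose(ts_data)

# Large p-value: the remainder is consistent with white noise
Box.test(na.omit(decomposed$random), lag = 24, type = "Ljung-Box")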

Seasonal ARIMA (SARIMA)

What is ARIMA?

Autoregressive Integrated Moving Average (ARIMA) models non-seasonal time series.

Components:

  • AR (\(p\)): Dependence on past values of the series.
  • I (\(d\)): Differencing applied to remove trends.
  • MA (\(q\)): Dependence on past forecast errors.
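The orders can also be specified by hand with forecast::Arima(); a minimal sketch using illustrative values \(p = d = q = 1\):

library(forecast)

# Non-seasonal ARIMA(1,1,1): one AR term, one difference, one MA term
fit_arima <- Arima(ts_data, order = c(1, 1, 1))
summary(fit_arima)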

What is SARIMA?

Seasonal ARIMA (SARIMA) extends ARIMA by incorporating:

  • Seasonal AR (\(P\)): Accounts for seasonal dependencies.
  • Seasonal Differencing (\(D\)): Removes repeating seasonal patterns.
  • Seasonal MA (\(Q\)): Captures seasonal moving averages.
  • Period (\(s\)): Defines the seasonal cycle (e.g., 12 for monthly data).
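A manual SARIMA specification mirrors this structure; a sketch with illustrative orders (the period \(s\) is taken from frequency(ts_data)):

library(forecast)

# Illustrative SARIMA(1,1,1)(1,1,1)[12]
fit_sarima <- Arima(ts_data,
                    order    = c(1, 1, 1),   # non-seasonal (p, d, q)
                    seasonal = c(1, 1, 1))   # seasonal (P, D, Q)
summary(fit_sarima)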

Fitting a SARIMA Model in R

sarima_model <- auto.arima(ts_data, seasonal = TRUE)   # forecast::auto.arima() chooses (p,d,q)(P,D,Q)[s]
summary(sarima_model)
Series: ts_data 
ARIMA(0,0,0)(0,1,2)[12] 

Coefficients:
         sma1    sma2
      -1.0450  0.1803
s.e.   0.1534  0.1146

sigma^2 = 0.8684:  log likelihood = -154.03
AIC=314.07   AICc=314.3   BIC=322.11

Training set error measures:
                     ME      RMSE       MAE       MPE     MAPE      MASE
Training set -0.0389954 0.8758172 0.6589655 -0.300084 3.419388 0.5895063
                   ACF1
Training set 0.04151066

Fitting a SARIMA Model in R

auto.arima() searches over candidate (p, d, q)(P, D, Q) orders and selects the model with the lowest AICc.

Challenges:

  • Overfitting vs. underfitting.
  • Computational complexity.
  • Interpreting parameter significance.

Example: Predicting attendance at football matches using past attendance data.
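A hedged sketch of that example using simulated monthly attendance figures (the data below are hypothetical, generated only to illustrate the workflow):

library(forecast)

set.seed(1)
attendance <- ts(20000 + 3000 * sin(2 * pi * (1:60) / 12) + rnorm(60, sd = 1500),
                 start = c(2019, 1), frequency = 12)   # five seasons of monthly attendance

att_model <- auto.arima(attendance, seasonal = TRUE)
forecast(att_model, h = 12)                            # next season's expected attendance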

Forecasting with SARIMA

library(ggplot2)   # ggtitle()

forecast_sarima <- forecast(sarima_model, h = 12)      # forecast 12 periods ahead
autoplot(forecast_sarima) + ggtitle("SARIMA Model Forecast")
  • Interpretation: Seasonality clearly reflected in predictions.

  • Sports Example: Predicting monthly goal-scoring rates for a football team.

Advanced SARIMA Diagnostics

Autocorrelation (ACF) and Partial Autocorrelation (PACF)

  • ACF: Identifies potential MA components.
  • PACF: Identifies potential AR components.
par(mfrow = c(1, 2))                           # show ACF and PACF side by side
acf(ts_data, main = "ACF of Seasonal Data")    # spikes at seasonal lags suggest (seasonal) MA terms
pacf(ts_data, main = "PACF of Seasonal Data")  # spikes at seasonal lags suggest (seasonal) AR terms
par(mfrow = c(1, 1))                           # reset plotting layout
  • Note: Use ACF/PACF to guide manual ARIMA specification.

Residual Analysis

  • Ensures model residuals have no remaining autocorrelation.
checkresiduals(sarima_model)

    Ljung-Box test

data:  Residuals from ARIMA(0,0,0)(0,1,2)[12]
Q* = 19.484, df = 22, p-value = 0.6153

Model df: 2.   Total lags used: 24

Residual Analysis

  • Residuals should resemble white noise.
  • The Ljung-Box test (run automatically by checkresiduals()) checks for remaining autocorrelation; a large p-value indicates the residuals are consistent with white noise.

Introduction to Panel Data

What is Panel Data?

  • Data that follows multiple entities over time.

Examples:

  • GDP of multiple countries over 10 years.
  • Monthly sales of 100 companies over 5 years.
  • Performance of multiple teams across multiple seasons.

Why use panel data?

  • Captures both cross-sectional and time-dependent effects.
  • Improves estimation accuracy by reducing omitted variable bias.
  • Helps analyse long-term trends and relationships.

Example data

  • Panel data accounts for both entity-specific and temporal effects; an illustrative structure is sketched below.
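A small hypothetical example of the panel structure (the entity, year, and response names match the model code that follows; the values are simulated):

library(plm)

set.seed(1)
panel_data <- data.frame(
  entity   = rep(c("TeamA", "TeamB", "TeamC"), each = 4),   # three entities
  year     = rep(2020:2023, times = 3),                     # four time periods each
  response = rnorm(12, mean = 50, sd = 10)
)

panel_data <- pdata.frame(panel_data, index = c("entity", "year"))
head(panel_data)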

Fixed vs. Random Effects

  • Fixed Effects (FE): Controls for unobserved, time-invariant differences between entities.
  • Random Effects (RE): Assumes entity-specific effects are random and uncorrelated with the explanatory variables.

Model Selection in R

  • FE Model: Captures unique characteristics of each entity.
  • RE Model: Generalises across entities.
library(plm)   # panel data estimators

fe_model <- plm(response ~ year, data = panel_data, index = c("entity", "year"), model = "within")  # fixed effects
re_model <- plm(response ~ year, data = panel_data, index = c("entity", "year"), model = "random")  # random effects

Hausman Test

  • Decides between FE and RE.
library(plm)                 # phtest() is provided by plm, not lmtest
phtest(fe_model, re_model)

    Hausman Test

data:  response ~ year
chisq = 5.6566e-15, df = 4, p-value = 1
alternative hypothesis: one model is inconsistent
  • If p-value < 0.05: prefer FE (the RE estimator is inconsistent).
  • If p-value > 0.05: prefer RE (consistent and more efficient).

Conclusion

  • SARIMA and panel data models are powerful tools for forecasting and inference.
  • Proper model selection and diagnostics are critical.

Next Steps

  • Explore hybrid and Bayesian methods for more advanced analyses.