Seasonal and Panel Data Models

B1705, Week Eight

Introduction

Overview

  • Understand seasonal and panel data models.
  • Explore core concepts with visual demonstrations.
  • Implement SARIMA and panel data models in R.

Conceptual Overview

What is Seasonality?

Definition: A repeating pattern at fixed intervals in a time series.

Common Examples:

  • Retail sales peak every December.
  • Ice cream sales increase in summer.
  • Electricity consumption varies seasonally.

Why does it matter?

  • Ignoring seasonality can lead to misleading trends and poor forecasting.
  • Must be explicitly modeled for accurate predictive insights.

Visualising Seasonal Patterns in Data

In a time plot of the series, we look for cycles that repeat at a fixed interval; these indicate seasonality. A quick way to inspect this in R is sketched below.
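A minimal sketch of this check, assuming ts_data is a monthly ts object (consistent with the 12-month seasonal period used later in the SARIMA output):

library(forecast)        # ggseasonplot() and ggsubseriesplot()

autoplot(ts_data)        # raw series: look for cycles repeating every 12 observations
ggseasonplot(ts_data)    # one line per year, months on the x-axis
ggsubseriesplot(ts_data) # one mini-series per month, highlighting the seasonal pattern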

Decomposing Time Series

Example

library(forecast)   # provides autoplot() methods for decomposed series

decomposed <- decompose(ts_data)   # classical (additive) decomposition
autoplot(decomposed)

Why decompose?

  • To separate trend, seasonality, and random noise.

  • Helps diagnose whether differencing or transformation is required.

  • Provides clearer insight into the underlying structure of the data.

Elements of decomposition

  • Trend: Long-term progression or decline in data.

  • Seasonality: Regularly repeating fluctuations at fixed intervals.

  • Noise: Random fluctuations around the trend and seasonal components.

Uses of decomposition

Enhance model selection accuracy by clarifying:

  • Whether a time series model should include seasonal differencing.
  • Whether transformations (e.g., logarithmic) are necessary to stabilise variance.

Decomposition also improves forecasting accuracy by identifying and modelling each component separately.
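As a rough sketch of how these checks can be automated (assuming ts_data as above), the forecast package offers helpers for choosing differencing orders and variance-stabilising transformations:

library(forecast)

nsdiffs(ts_data)         # suggested number of seasonal differences (D)
ndiffs(ts_data)          # suggested number of ordinary differences (d)
BoxCox.lambda(ts_data)   # lambda near 0 suggests a log transform to stabilise variance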

Types of decomposition

  • Additive model: When seasonal effects remain constant over time.
  • Multiplicative model: When seasonal effects grow with the trend.
  • Implications for forecasting: Models must be adjusted accordingly.
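In base R the choice is made through the type argument of decompose(); a minimal sketch of both forms:

# Additive:       observed = trend + seasonal + random
decomp_add  <- decompose(ts_data, type = "additive")

# Multiplicative: observed = trend * seasonal * random
decomp_mult <- decompose(ts_data, type = "multiplicative")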

What do we do next - Trend?

  • Represents the long-term direction (growth, decline, or stability).
  • Analyse for structural changes or shifts in long-term patterns.
  • Model using trend-adjusted methods or regression-based forecasting.
  • Decide if differencing is needed to remove the trend (for ARIMA models).
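Two common ways of handling the trend, sketched here with the forecast package (tslm() for a regression-based trend, diff() for differencing):

library(forecast)

trend_fit <- tslm(ts_data ~ trend)   # regression on a linear trend term
summary(trend_fit)

autoplot(diff(ts_data))              # first difference removes a linear trend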

What do we do next - Seasonal component

  • Captures regular, repeating fluctuations.
  • Incorporate explicitly into seasonal models (e.g., SARIMA or seasonal regression).
  • Determine if seasonal differencing is required (e.g., removing seasonal cycles).
  • Inform adjustments like seasonal indexing or seasonal smoothing.
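A brief sketch of seasonal differencing and an explicit seasonal regression, assuming monthly data (seasonal lag 12):

seas_diff <- diff(ts_data, lag = 12)           # seasonal difference removes the repeating cycle
autoplot(seas_diff)

seasonal_fit <- tslm(ts_data ~ trend + season) # trend plus monthly seasonal dummies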

What do we do next - Noise (Residual) component

  • Represents random variation unexplained by trend or seasonality.

Next steps:
  • Check if residuals resemble white noise using diagnostic tests (e.g., Ljung-Box test).
  • Identify outliers or unusual events.
  • Improve model accuracy by minimising variance of these residuals through model refinement.
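A minimal sketch of the Ljung-Box check applied to the remainder of a classical decomposition (the ends of the remainder are NA and must be dropped):

decomposed <- decompose(ts_data)

# Large p-value: the remainder is consistent with white noise
Box.test(na.omit(decomposed$random), lag = 24, type = "Ljung-Box")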

Seasonal ARIMA (SARIMA)

What is ARIMA?

Autoregressive Integrated Moving Average (ARIMA) models non-seasonal time series.

Components:

  • AR (\(p\)): Dependence on past values of the series.
  • I (\(d\)): Differencing applied to remove trends.
  • MA (\(q\)): Dependence on past forecast errors.
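The orders can also be specified by hand with forecast::Arima(); a minimal sketch using illustrative values \(p = d = q = 1\):

library(forecast)

# Non-seasonal ARIMA(1,1,1): one AR term, one difference, one MA term
fit_arima <- Arima(ts_data, order = c(1, 1, 1))
summary(fit_arima)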

What is SARIMA?

Seasonal ARIMA (SARIMA) extends ARIMA by incorporating:

  • Seasonal AR (\(P\)): Accounts for seasonal dependencies.
  • Seasonal Differencing (\(D\)): Removes repeating seasonal patterns.
  • Seasonal MA (\(Q\)): Captures seasonal moving averages.
  • Period (\(s\)): Defines the seasonal cycle (e.g., 12 for monthly data).
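A manual SARIMA specification mirrors this structure; a sketch with illustrative orders (the period \(s\) is taken from frequency(ts_data)):

library(forecast)

# Illustrative SARIMA(1,1,1)(1,1,1)[12]
fit_sarima <- Arima(ts_data,
                    order    = c(1, 1, 1),   # non-seasonal (p, d, q)
                    seasonal = c(1, 1, 1))   # seasonal (P, D, Q)
summary(fit_sarima)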

Fitting a SARIMA Model in R

sarima_model <- auto.arima(ts_data, seasonal = TRUE)   # forecast::auto.arima() chooses (p,d,q)(P,D,Q)[s]
summary(sarima_model)
Series: ts_data 
ARIMA(0,0,0)(0,1,2)[12] 

Coefficients:
         sma1    sma2
      -1.0450  0.1803
s.e.   0.1534  0.1146

sigma^2 = 0.8684:  log likelihood = -154.03
AIC=314.07   AICc=314.3   BIC=322.11

Training set error measures:
                     ME      RMSE       MAE       MPE     MAPE      MASE
Training set -0.0389954 0.8758172 0.6589655 -0.300084 3.419388 0.5895063
                   ACF1
Training set 0.04151066

Fitting a SARIMA Model in R

auto.arima() searches over candidate (p, d, q)(P, D, Q) orders and selects the model with the lowest AICc.

Challenges:

  • Overfitting vs. underfitting.
  • Computational complexity.
  • Interpreting parameter significance.

Example: Predicting attendance at football matches using past attendance data.
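A hedged sketch of that example using simulated monthly attendance figures (the data below are hypothetical, generated only to illustrate the workflow):

library(forecast)

set.seed(1)
attendance <- ts(20000 + 3000 * sin(2 * pi * (1:60) / 12) + rnorm(60, sd = 1500),
                 start = c(2019, 1), frequency = 12)   # five seasons of monthly attendance

att_model <- auto.arima(attendance, seasonal = TRUE)
forecast(att_model, h = 12)                            # next season's expected attendance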

Forecasting with SARIMA

library(ggplot2)   # ggtitle()

forecast_sarima <- forecast(sarima_model, h = 12)      # forecast 12 periods ahead
autoplot(forecast_sarima) + ggtitle("SARIMA Model Forecast")
  • Interpretation: Seasonality clearly reflected in predictions.

  • Sports Example: Predicting monthly goal-scoring rates for a football team.

Advanced SARIMA Diagnostics

Autocorrelation (ACF) and Partial Autocorrelation (PACF)

  • ACF: Identifies potential MA components.
  • PACF: Identifies potential AR components.
par(mfrow = c(1, 2))                           # show ACF and PACF side by side
acf(ts_data, main = "ACF of Seasonal Data")    # spikes at seasonal lags suggest (seasonal) MA terms
pacf(ts_data, main = "PACF of Seasonal Data")  # spikes at seasonal lags suggest (seasonal) AR terms
par(mfrow = c(1, 1))                           # reset plotting layout
  • Note: Use ACF/PACF to guide manual ARIMA specification.

Residual Analysis

  • Ensures model residuals have no remaining autocorrelation.
checkresiduals(sarima_model)

    Ljung-Box test

data:  Residuals from ARIMA(0,0,0)(0,1,2)[12]
Q* = 19.484, df = 22, p-value = 0.6153

Model df: 2.   Total lags used: 24

Residual Analysis

  • Residuals should resemble white noise.
  • The Ljung-Box test (run automatically by checkresiduals()) checks for remaining autocorrelation; a large p-value indicates the residuals are consistent with white noise.

Introduction to Panel Data

What is Panel Data?

  • Data that follows multiple entities over time.

Examples:

  • GDP of multiple countries over 10 years.
  • Monthly sales of 100 companies over 5 years.
  • Performance of multiple teams across multiple seasons.

Why use panel data?

  • Captures both cross-sectional and time-dependent effects.
  • Improves estimation accuracy by reducing omitted variable bias.
  • Helps analyse long-term trends and relationships.

Example data

  • Panel data accounts for both entity-specific and temporal effects; an illustrative structure is sketched below.
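A small hypothetical example of the panel structure (the entity, year, and response names match the model code that follows; the values are simulated):

library(plm)

set.seed(1)
panel_data <- data.frame(
  entity   = rep(c("TeamA", "TeamB", "TeamC"), each = 4),   # three entities
  year     = rep(2020:2023, times = 3),                     # four time periods each
  response = rnorm(12, mean = 50, sd = 10)
)

panel_data <- pdata.frame(panel_data, index = c("entity", "year"))
head(panel_data)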

Fixed vs. Random Effects

  • Fixed Effects (FE): Controls for unobserved, time-invariant differences between entities.
  • Random Effects (RE): Assumes entity-specific effects are random and uncorrelated with the explanatory variables.

Model Selection in R

  • FE Model: Captures unique characteristics of each entity.
  • RE Model: Generalises across entities.
library(plm)   # panel data estimators

fe_model <- plm(response ~ year, data = panel_data, index = c("entity", "year"), model = "within")  # fixed effects
re_model <- plm(response ~ year, data = panel_data, index = c("entity", "year"), model = "random")  # random effects

Hausman Test

  • Decides between FE and RE.
library(plm)                 # phtest() is provided by plm, not lmtest
phtest(fe_model, re_model)

    Hausman Test

data:  response ~ year
chisq = 5.6566e-15, df = 4, p-value = 1
alternative hypothesis: one model is inconsistent
  • If p-value < 0.05: prefer FE (the RE estimator is inconsistent).
  • If p-value > 0.05: prefer RE (consistent and more efficient).

Conclusion

  • SARIMA and panel data models are powerful tools for forecasting and inference.
  • Proper model selection and diagnostics are critical.

Next Steps

  • Explore hybrid and Bayesian methods for more advanced analyses.