Mixed-Effects Models in Sport Data Analytics

B1705 · Week Six 2025-26

Starting point

Start with regression

We’ve used regression to answer questions like:

“Does training load predict performance?”

Remember: regression returns a coefficient telling us how much the outcome changes per unit of the predictor.


This can be useful, but regression makes a big assumption that sport data almost never satisfies: independence.

Independence

Standard regression assumes every observation is independent.


Independence: knowing one observation tells you nothing about any other.


Rolling a die
Each roll is completely unrelated to the last =pure independence.

Athlete wellness over 10 days
If day 3 = 71, you’d expect day 4 to be nearby, not 95, not 30. Not independent!

Example: “wellness” isn’t independent


Sport data often violates independence


Over Time

  • Performance today reflects yesterday’s training, fatigue, and recovery.

  • Adjacent observations share history.

Within Teams

  • Players share a coach, tactics, facilities, and schedule.

  • Teammates resemble each other more than players from other teams.

Within Individuals

  • Measuring the same athlete 50 times gives 50 observations, not 50 independent pieces of information.

Why’s this a problem?


Problem 1: Inflated confidence

  • We think we have more information than we do.

  • Standard errors shrink, p-values become too optimistic, and we end up with false certainty.

Problem 2: Missing important questions

  • We can only ask about the average effect.

  • We cannot ask whether athletes respond differently, or whether team context matters.

Solution: Fit the model to the data

Rather than removing structure to fit our model, we create a model that fits the structure.


Mixed-effects models are built for data where observations are connected through time, groups, or repeated measurement.


They’re often where the most interesting sport questions lie.



Quick discussion: Think of a dataset from your own work. Where does the dependence come from? How many “levels” are there?

Walkthrough: Build the dataset


We’ll use a simulated dataset throughout this walkthrough. Run the code below.

Code
library(tidyverse)
library(lme4)

set.seed(42)

# 20 athletes · 60 daily observations each · 1,200 rows
athlete_data <- map_dfr(1:20, function(i) {
  u_i <- rnorm(1, 0, 8.2)   # athlete's baseline deviation
  v_i <- rnorm(1, 0, 0.06)  # athlete's individual load sensitivity
  load <- runif(60, 10, 90)
  tibble(
    athlete  = paste0("A", str_pad(i, 2, "left", "0")),
    day      = 1:60,
    load     = round(load, 1),
    wellness = round(72.3 + (-0.12 + v_i) * load + u_i + rnorm(60, 0, 6.5), 1)
  )
})


What to notice: each athlete has 60 rows. Their wellness scores are not independent. They share a personal baseline (u_i) and personal slope (v_i).

Walkthrough: Visualise the dependence

Before any model-building we plot the raw data and see the structure:

Code
# Plot 6 athletes to see individual trajectories
athlete_data |>
  filter(athlete %in% paste0("A", str_pad(1:6, 2, "left", "0"))) |>
  ggplot(aes(x = load, y = wellness, colour = athlete)) +
  geom_point(alpha = 0.4, size = 1.5) +
  geom_smooth(method = "lm", se = FALSE, linewidth = 1.2) +
  facet_wrap(~athlete, nrow = 2) +
  labs(x = "Training Load", y = "Wellness",
       title = "Six athletes, each with their own baseline and trend",
       subtitle = "These are NOT six independent datasets...they come from the same population") +
  theme(legend.position = "none")

Note: baselines differ (random intercepts) and slopes differ (random slopes). This is exactly what mixed-effects models capture.

The Building Blocks

What does “mixed” mean?

Fixed Effects

  • Relationships that apply across the whole population.

  • Like a standard regression coefficient: a general tendency.

“Higher training load is associated with lower wellness, on average.”

Random Effects

  • How individual groups deviate from those general tendencies.

  • The population average does not apply identically to everyone.

“Some athletes tolerate high loads better than others.”

One model gives us two types of information: what is generally true + how much individuals vary.

Think about weather forecasting

Fixed effects = the seasonal forecast

“February in Glasgow: average 5 degrees.”

A population-level claim. Generally true, not perfectly accurate for any one day.

Random effects = the local adjustment

“Cold front from the north today, expect 3 degrees.”

An individual-level correction using specific context.


Mixed-effects models work the same way. We start with the *general, then adjust for the individual.

Fixed effects: The “population signal”


\(\beta_1 = -0.12\) for training load on wellness

“Across all athletes, each unit increase in training load is associated with a 0.12 point decrease in wellness.”


This does not claim that every athlete shows exactly -0.12.

It’s the average pattern: the signal running through the whole system.

The fixed effect captures the centre; the random effects capture the spread around it.

Random intercepts: different baselines

Some athletes consistently score higher than average for their load level. Others score lower. The random intercept gives each athlete their own baseline.

Random slopes: different responses

Some athletes’ wellness drops sharply with load; others are resilient. The random slope gives each athlete their own load-sensitivity.

Population trend vs individual lines

When to use random intercepts vs slopes

Random intercepts only

  • Individuals differ in level but respond similarly to the predictor.
  • Fewer than ~10 observations per individual.
  • Model with slopes fails to converge.

Add random slopes when…

  • Theory predicts individuals respond differently.
  • At least 10 observations per individual.
  • Model converges stably.

A random slope is a substantive claim that responsiveness genuinely varies between individuals. It should be justified, not added by default.

The Intraclass Correlation (ICC)

Before fitting any model we should ask: how much does group membership actually matter?

\[\text{ICC} = \frac{\sigma^2_{\text{between}}}{\sigma^2_{\text{between}} + \sigma^2_{\text{within}}}\]

The ICC is the proportion of total variation attributable to stable differences between groups, rather than to within-group change over time.


A high ICC means individuals resemble themselves over time more than they resemble others. Ignoring this artificially inflates the effective sample size.

Reading the ICC

Walkthrough: Null Model and ICC

We start with the null model…no predictors added.

Code
library(performance)

# Null model: no predictors, just the grouping structure
m_null <- lmer(wellness ~ 1 + (1 | athlete),
               data = athlete_data,
               REML = TRUE)

# Extract the ICC
icc(m_null)
# Intraclass Correlation Coefficient

    Adjusted ICC: 0.581
  Unadjusted ICC: 0.581
Code
# Also useful: see where the variance lives
as.data.frame(VarCorr(m_null)) |>
  select(grp, vcov, sdcor) |>
  mutate(pct = round(vcov / sum(vcov) * 100, 1))
       grp     vcov    sdcor  pct
1  athlete 69.16701 8.316670 58.1
2 Residual 49.95413 7.067823 41.9

Interpret: ICC close to 0.5 means roughly half the variation is stable between-athlete differences. This strongly justifies using mixed-effects models over standard regression.

What happens without mixed effects?

Code
# Standard regression: ignores the athlete grouping entirely
m_naive <- lm(wellness ~ load, data = athlete_data)

# Mixed-effects model: accounts for athlete grouping
m_mixed <- lmer(wellness ~ load + (1 | athlete),
                data = athlete_data, REML = FALSE)

# Compare the standard error for load
summary(m_naive)$coefficients["load", ]
     Estimate    Std. Error       t value      Pr(>|t|) 
-1.262089e-01  1.267996e-02 -9.953416e+00  1.767297e-22 
Code
# SE artificially small because 1,200 rows treated as independent

tidy(m_mixed, effects = "fixed")
# A tibble: 2 × 5
  effect term        estimate std.error statistic
  <chr>  <chr>          <dbl>     <dbl>     <dbl>
1 fixed  (Intercept)   74.1     1.85         40.2
2 fixed  load          -0.114   0.00808     -14.1
Code
# SE correctly reflects that we have 20 athletes, not 1,200 independent obs

Key point: the naive model treats every row as an independent observation. The mixed model correctly recognises 20 independent athletes with 60 correlated observations each. The difference in standard errors shows the cost of ignoring dependence.

Discussion: What does ICC = 0.62 mean?

Scenario: An S&C coach has 20 athletes, measured daily for 8 weeks. The null model gives ICC = 0.62.

  1. What does ICC = 0.62 tell us about this dataset?
  2. The coach runs a paired t-test on average pre/post scores instead. What goes wrong?
  3. Does a high ICC make mixed-effects models more or less important to use?

Key point: A high ICC is the justification for mixed-effects models, rather than a problem to fix. It tells us the grouping structure is real and consequential.

Nested and Grouped Structures

Sport data is often hierarchical

Why nesting matters

Reflect: Players on the same team share a coach, tactics, facilities, and schedule.

Ignoring nesting misattributes team-level variation to individual differences. We appear to have more independent information than we do, and risk answering the wrong question with false confidence.

Two sources of variation:

  • Between entities: stable level differences (Athlete A is fitter than B).
  • Within entities: change over time (A improves through pre-season).

Standard regression lumps both into one coefficient.

Mixed-effects models separate them, letting us ask questions about each.

Crossed vs nested effects

Nested: Players belong to one team only.

# Players within teams
lmer(performance ~ x +
       (1 | team/player),
     data = d)

Crossed: Any player can face any opponent.

# Player AND opponent both matter
lmer(performance ~ x +
       (1 | player) +
       (1 | opponent),
     data = d)

Crossed example:

A player’s match performance is shaped by their own form and by the specific opponent’s quality. These two factors cross because any player can face any team.

Rule of thumb: if you can draw a clean tree diagram, the structure is nested. If lines cross between groups, it is crossed.

Walkthrough: Check the structure

Before writing the model formula, check whether your grouping is nested or crossed.

Code
# How many observations per athlete?
athlete_data |>
  count(athlete) |>
  summary()

# Are athletes always the same individuals? (repeated measures)
# Yes → (1 | athlete) or (1 + load | athlete)

# If athletes were nested within teams:
# lmer(wellness ~ load + (1 | team/athlete), data = team_data)
# Shorthand for (1 | team) + (1 | team:athlete)

# If match opponent also matters (crossed):
# lmer(performance ~ x + (1 | player) + (1 | opponent), data = match_data)

For our walkthrough data: one grouping level only (athlete), repeated measures. The formula will be (1 | athlete) or (1 + load | athlete).

Activity: Spot the structure

For each dataset: (a) what creates dependence? (b) nested, crossed, or none? (c) ICC high or low?

  1. Weekly sprint times, 12 athletes, 6-month season
  2. Heart rate at maximal test, 200 people, 10 clubs, one measure each
  3. Pass completion per match, 20 players, same team, full season
  4. Injury (yes/no), 500 amateur runners, one observation each, nationwide
Dependence Structure ICC
1 Repeated within athlete Nested High
2 Within clubs Nested Moderate
3 Repeated + shared team Nested + crossed High
4 None None ~Zero

Choosing the Right Model

But…not all outcomes are continuous

So far we’ve dealt with continuous outcomes such as wellness, jump height, and sprint time.

But sport data comes in many forms:

Outcome type Example
Binary (yes/no) Selected for XI? Injured? Won?
Count (0, 1, 2…) Goals, fouls, tackles per match
Proportion Pass completion (23 from 30)
Ordered RPE rating 1 to 10

Choosing the wrong model family produces biased estimates and misleading uncertainty.

GLMMs: Two key additions

Generalised Linear Mixed Models extend the framework to non-continuous outcomes by adding two components:

Distribution

Chosen to match the outcome:

  • Binary → Binomial
  • Counts → Poisson
  • Overdispersed counts → Negative Binomial

Connects the linear predictor to the outcome scale:

  • Binomial → Logit (log-odds)
  • Poisson / NB → Log
  • Gaussian → Identity (standard LMM)

The random effects structure (intercepts, slopes, nesting) works identically regardless of which distribution you choose.

Model Selection Guide

Outcome Model Link Sport example
Continuous Linear Mixed (LMM) Identity Jump height over season
Binary (0/1) Logistic GLMM Logit Starting XI selection
Count Poisson / Neg. Bin. Log Goals or fouls per match
Proportion (k/n) Binomial GLMM Logit Pass completion rate
Ordered categories Ordinal GLMM Cumulative logit RPE ratings
Time to event Survival model Various Days to first injury

Walkthrough: Fit and compare models

Build up from a null model. Use ML for comparison, REML for final estimates.

Code
# Build up complexity one step at a time
# Use REML = FALSE when comparing models with different fixed effects
m0 <- lmer(wellness ~ 1          + (1           | athlete),
           data = athlete_data, REML = FALSE)   # null: no predictors

m1 <- lmer(wellness ~ load       + (1           | athlete),
           data = athlete_data, REML = FALSE)   # fixed slope, random intercepts

m2 <- lmer(wellness ~ load       + (1 + load    | athlete),
           data = athlete_data, REML = FALSE)   # random intercepts AND slopes

# Likelihood ratio tests: does each addition improve the model?
anova(m0, m1, m2)
Data: athlete_data
Models:
m0: wellness ~ 1 + (1 | athlete)
m1: wellness ~ load + (1 | athlete)
m2: wellness ~ load + (1 + load | athlete)
   npar    AIC    BIC  logLik deviance   Chisq Df Pr(>Chisq)    
m0    3 8192.4 8207.7 -4093.2   8186.4                          
m1    4 8010.0 8030.4 -4001.0   8002.0 184.392  1  < 2.2e-16 ***
m2    6 7990.0 8020.6 -3989.0   7978.0  23.973  2  6.229e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Code
# AIC comparison: lower = better balance of fit and complexity
AIC(m0, m1, m2)
   df      AIC
m0  3 8192.387
m1  4 8009.995
m2  6 7990.023

Expect: m1 substantially better than m0 (load matters). m2 may improve on m1 if athletes genuinely differ in slope. If isSingular(m2) returns TRUE, stick with m1.

Walkthrough: Final model with CIs and p-values

Once you’ve chosen the model, refit with REML and extract everything needed for reporting.

Code
library(lmerTest)   # adds Satterthwaite p-values to lmer
library(broom.mixed)

# Refit chosen model with REML = TRUE for unbiased variance estimates
m_final <- lmer(wellness ~ load + (1 + load | athlete),
                data = athlete_data, REML = TRUE)

# Fixed effects with p-values and 95% CIs
tidy(m_final, effects = "fixed", conf.int = TRUE)
# A tibble: 2 × 9
  effect term     estimate std.error statistic    df  p.value conf.low conf.high
  <chr>  <chr>       <dbl>     <dbl>     <dbl> <dbl>    <dbl>    <dbl>     <dbl>
1 fixed  (Interc…   74.2      1.95       38.0   10.1 3.20e-12   69.8     78.5   
2 fixed  load       -0.114    0.0150     -7.57  18.2 4.99e- 7   -0.145   -0.0821
Code
# Random effects: SDs and correlation
tidy(m_final, effects = "ran_pars")
# A tibble: 4 × 4
  effect   group    term                  estimate
  <chr>    <chr>    <chr>                    <dbl>
1 ran_pars athlete  sd__(Intercept)         8.51  
2 ran_pars athlete  cor__(Intercept).load   0.276 
3 ran_pars athlete  sd__load                0.0570
4 ran_pars Residual sd__Observation         6.40  
Code
# Profile likelihood CIs (more accurate than Wald for smaller samples)
confint(m_final, method = "profile")
                  2.5 %      97.5 %
.sig01       4.92019446  9.75255657
.sig02      -0.25298993  0.74584423
.sig03       0.03371625  0.08513364
.sigma       6.16278253  6.68540883
(Intercept) 70.92944226 77.40390589
load        -0.14361879 -0.08360662
Code
# Marginal and conditional R-squared
r2(m_final)
# R2 for Mixed Models

  Conditional R2: 0.715
     Marginal R2: 0.050

Marginal R-squared = variance explained by fixed effects alone. Conditional R-squared = fixed + random. The gap between the two tells you how much individual variation the random effects capture.

Applied Examples

Three case studies


Case Study 1

Athlete Monitoring

Does training load affect wellness? Do athletes differ in their response?

Linear Mixed Model

Case Study 2

Team Selection

What drives starting XI selection, accounting for player and team differences?

Logistic GLMM

Case Study 3

Match Events

How do foul rates vary by position, adjusted for playing time?

Negative Binomial GLMM

Case Study 1: The Question

“How does training load affect wellness, and do athletes differ in their response?”

Two questions requiring two types of answer:

Question Answer from
What is the general load/wellness relationship? Fixed effect
How much do athletes vary in their response? Random effects

Data: 20 athletes, 120 daily observations each, 2,400 rows total.

Case Study 1: Fitting the Model

Code
library(lme4)

set.seed(42)
athlete_data <- map_dfr(1:20, function(i) {
  u_i <- rnorm(1, 0, 8.2); v_i <- rnorm(1, 0, 0.06)
  load <- runif(120, 10, 90)
  tibble(athlete  = i, load = load,
         wellness = 72.3 + (-0.12 + v_i) * load + u_i + rnorm(120, 0, 6.5))
})

# Random intercepts AND slopes: each athlete gets their own baseline + sensitivity
m1 <- lmer(wellness ~ load + (1 + load | athlete),
           data = athlete_data, REML = TRUE)

Case Study 1: Reading the Output

Random effects:
 Groups   Name        Variance Std.Dev. Corr
 athlete  (Intercept)  67.24    8.20        <- baseline variation between athletes
          load          0.004   0.06    0.12 <- load-sensitivity variation
 Residual              42.25    6.50        <- within-athlete noise

Fixed effects:
              Estimate Std. Error t value
(Intercept)    72.30      1.87     38.7  <- population average at load = 0
load            -0.12      0.014   -8.6  <- each load unit: -0.12 wellness points

Random effects: SD(Intercept) = 8.2, meaning athletes vary by roughly 8 points in baseline wellness.

Fixed effects: load = -0.12 is the population average; t = -8.6 provides strong evidence of a real effect.

Case Study 1: Getting p-values

Code
# lme4 omits p-values by design
# lmerTest adds them using the Satterthwaite approximation
library(lmerTest)

m1_test <- lmer(wellness ~ load + (1 + load | athlete),
                data = athlete_data, REML = TRUE)

summary(m1_test)                          # p-values now included
confint(m1_test, method = "profile")      # profile likelihood CIs

library(broom.mixed)
tidy(m1_test, effects = "fixed",  conf.int = TRUE)
tidy(m1_test, effects = "ran_pars")

library(performance)
icc(m1_test)

Case Study 1: What This Tells Us

Population story (fixed effects)

Load is associated with lower wellness: -0.12 per unit (95% CI: -0.15 to -0.09).

A consistent signal coaches can act on.

Individual story (random effects)

Baseline SD = 8 points. Load-sensitivity SD = 0.06.

Some athletes tolerate high loads far better than others.

Challenge: The random slope SD is 0.06, while the population slope is -0.12. What does this mean practically? At what load level do two athletes at opposite extremes of the slope distribution diverge most?

Case Study 2: Team Selection

“What drives starting XI selection, and do patterns differ between players and teams?”

Binary outcome (selected: yes/no) → Logistic GLMM

\[\log\!\left(\frac{p_{ij}}{1-p_{ij}}\right) = \beta_0 + \beta_1 \cdot \text{Form} + \beta_2 \cdot \text{Rest} + u_i^{\text{player}} + u_j^{\text{team}}\]

Crossed random effects: random intercepts for player (squad role, manager preference) AND team (selection philosophy). These are crossed rather than nested because the dataset spans multiple seasons.

Case Study 2: Fit and Extract

Code
# Simulate: 30 players · 20 matches each · outcome = scored (yes/no)
set.seed(7)
selection_data <- map_dfr(1:30, function(i) {
  u_i <- rnorm(1, 0, 1.0)
  tibble(
    player   = paste0("P", str_pad(i, 2, "left", "0")),
    form     = runif(20, 0, 10),
    rest     = sample(1:5, 20, replace = TRUE),
    log_odds = -2.4 + 0.65 * form - 0.08 * rest + u_i,
    selected = rbinom(20, 1, plogis(log_odds))
  )
})

# Fit logistic GLMM
m_logit <- glmer(selected ~ form + rest + (1 | player),
                 data   = selection_data,
                 family = binomial)

# Odds ratios (exponentiate coefficients)
exp(fixef(m_logit)) |> round(3)
(Intercept)        form        rest 
      0.109       1.884       0.921 

Always follow up with predicted probabilities on the next slide.

Case Study 2: Predicted Probabilities

Code
library(broom.mixed)

# Fixed-effect estimates and CIs as odds ratios
tidy(m_logit, effects = "fixed", conf.int = TRUE, exponentiate = TRUE)

# Predicted probability for specific scenarios
# re.form = NA gives population-average predictions (ignores player random effects)
new_scenarios <- tibble(
  form   = c(3, 5, 7, 9),
  rest   = 3
)

new_scenarios |>
  mutate(
    prob = predict(m_logit,
                   newdata = new_scenarios,
                   type    = "response",
                   re.form = NA)
  )
# Form 3 → low probability · Form 9 → high probability
# The absolute change shows what OR alone cannot

Case Study 2: Key Results

Parameter Log-Odds OR Meaning
Intercept -2.4 0.09 Baseline (form = 0, rest = 0)
Form (per point) +0.65 1.92 Each point nearly doubles the odds
Rest days -0.08 0.92 Each day reduces odds by ~8%
Player SD 1.2 Large between-player variation

Form = 5, rest = 3: \(p = \text{logistic}(-2.4 + 0.65 \times 5 - 0.08 \times 3) = \mathbf{0.71}\)

A 71% chance of selection. This communicates far more than OR = 1.92 alone.

Case Study 2: Selection Probability by Form

Case Study 3: Foul Rates by Position

“How do foul rates vary by position, adjusted for playing time and individual player differences?”

Count outcome (fouls per match), with playing time varying widely → Negative Binomial GLMM with offset.

Code
# Simulate: 30 players · 20 matches · outcome = fouls (count)
set.seed(99)
foul_data <- map_dfr(1:30, function(i) {
  u_i      <- rnorm(1, 0, 0.4)
  position <- sample(c("Defender","Midfielder","Forward"), 1)
  tibble(
    player   = paste0("P", str_pad(i, 2, "left", "0")),
    position = position,
    minutes  = sample(seq(20, 90, by = 5), 20, replace = TRUE),
    fouls    = rnbinom(20,
                 mu   = exp(0.8 +
                            ifelse(position=="Midfielder", -0.27, 0) +
                            ifelse(position=="Forward",   -0.45, 0) +
                            u_i) * (minutes / 90),
                 size = 2)
  )
})

# Step 1: try Poisson
m_pois <- glmer(fouls ~ position + offset(log(minutes / 90)) + (1 | player),
                data = foul_data, family = poisson)

check_overdispersion(m_pois)
# Overdispersion test

       dispersion ratio =   1.445
  Pearson's Chi-Squared = 861.042
                p-value = < 0.001

Case Study 3: Upgrading to Negative Binomial

Code
library(glmmTMB)
library(performance)

# Overdispersion confirmed, so upgrade to Negative Binomial
m_nb <- glmmTMB(fouls ~ position + offset(log(minutes / 90)) + (1 | player),
                data   = foul_data,
                family = nbinom2)

# Rate ratios (exponentiate log coefficients)
exp(fixef(m_nb)$cond) |> round(3)
# Defenders = reference · Midfielders and Forwards show rate ratios below 1

# Predicted fouls per 90 by position (population average)
predict(m_nb,
        newdata  = tibble(position = c("Defender","Midfielder","Forward"),
                          minutes  = 90,
                          player   = NA),
        type    = "response",
        re.form = NA) |>
  round(2)

Rate ratio interpretation: if Midfielder RR = 0.76, they commit 24% fewer fouls per 90 minutes than Defenders, adjusting for playing time and individual player differences.

Discussion: The Offset

  1. A midfielder plays 90 min and commits 2 fouls. A substitute plays 20 min and commits 2 fouls. Without an offset, what foul “rate” does each appear to have?

  2. With offset(log(minutes/90)), how does the model interpret these two observations?

  3. Why is the offset coefficient fixed at 1 rather than estimated by the model?

Hint for (3): Fixing at 1 is equivalent to directly modelling fouls per 90. Estimating it would let the model re-weight the exposure, defeating the purpose of the adjustment.

Reporting and Diagnostics

What to report: fixed effects

Do report:

“Load decreased wellness by 0.12 points per unit (95% CI: -0.15 to -0.09).”

“Higher form nearly doubled selection odds (OR = 1.92, 95% CI: 1.65 to 2.23).”

Avoid:

“Load was significant (p < .001).”

A p-value says nothing about effect size or practical importance.

Checklist: effect size with confidence interval; predicted probability or rate (not just OR); acknowledge that “on average” hides individual variation; interpret on the original outcome scale.

What to report: random effects

Key points to report:

Report the SD, not just that random effects exist.

“Athletes varied substantially in baseline wellness (SD = 8.2 pts, ICC = 0.49).”

Relate to the ICC: it contextualises where variation lives.

Visualise with a caterpillar plot to show who differs most.

Shrinkage

Shrinkage is the process by which the model pulls extreme individual estimates back toward the population mean. Athletes with fewer observations are pulled further. This is not a flaw; it protects against overfitting to small samples.

Diagnostic Checks

Four things to check:

  1. Residuals vs fitted: no systematic pattern.
  2. Q-Q plot: residuals close to the diagonal.
  3. Convergence: isSingular() returns FALSE.
  4. Overdispersion: for count models, use check_overdispersion().

Diagnostics in R

library(performance)
library(see)

# All diagnostics in one call
check_model(m1)

# Individual checks
isSingular(m1)                # TRUE = model over-specified, simplify
check_overdispersion(m_pois)  # for count GLMMs
check_heteroscedasticity(m1)
check_normality(m1)

# Variance explained
r2(m1)
# Marginal  R-squared = variance from fixed effects only
# Conditional R-squared = variance from fixed + random effects

Singular fit: isSingular() returning TRUE means the model is trying to estimate more than the data supports. The usual fix is to remove the random slope.

REML vs ML

Use REML for…

Final estimates of variance components: random effect SDs and ICC.

REML accounts for fixed-effect degrees of freedom, giving less biased variance estimates.

lmer(..., REML = TRUE) is the default.

Use ML for…

Comparing models that differ in their fixed effects (AIC, likelihood ratio tests).

REML likelihoods are not comparable across different fixed-effect structures.

lmer(..., REML = FALSE).

Workflow: compare with ML → select best model → refit with REML → report.

Quick Reference: The Full Workflow

library(lme4); library(lmerTest); library(performance); library(broom.mixed)

# 1. Check if grouping structure matters
m_null <- lmer(outcome ~ 1 + (1 | group), data = d, REML = TRUE)
icc(m_null)

# 2. Fit your model (choose family by outcome type)
m1 <- lmer(outcome ~ predictor + (1 + predictor | group), data = d, REML = TRUE)

# 3. Check
isSingular(m1)
check_model(m1)

# 4. Compare models (use ML, then refit with REML)
anova(m_simple, m1)

# 5. Report
tidy(m1, effects = "fixed",   conf.int = TRUE)
tidy(m1, effects = "ran_pars")
r2(m1)