MANOVA and MANCOVA - Demonstration

B1705, Week Six

Introduction

MANOVA (Multivariate Analysis of Variance) and MANCOVA (Multivariate Analysis of Covariance) are statistical techniques used to compare group means on multiple dependent variables.

Introduction

MANOVA is an extension of ANOVA that handles multiple outcome variables simultaneously.
MANCOVA extends MANOVA by including one or more continuous covariates, allowing us to control for their influence.

Use case

For example, we might want to explore whether different types of training programs (Control vs. Aerobic vs. Strength) influence two performance measures (SkillTest1 for accuracy and SkillTest2 for agility).
With > 1 DVs, this is a case for MANOVA.

Use case

But, we might also want to explore how the inclusion of a covariate (e.g., Age) might alter the group comparisons.
That’s a case for MANCOVA.

Exploring the Data

Basic Inspection

Show code

# Quick look at data structure
head(df)

  ID TrainingGroup Age SkillTest1 SkillTest2
1  1       Control  32         77         36
2  2       Control  31         77         33
3  3       Control  20         68         34
4  4       Control  27         68         38
5  5       Control  35         80         35
6  6       Control  28         79         47

Show code

summary(df)

       ID         TrainingGroup      Age          SkillTest1      SkillTest2  
 Min.   : 1.00   Aerobic :20    Min.   :18.00   Min.   :49.00   Min.   :27.0  
 1st Qu.:15.75   Control :20    1st Qu.:23.75   1st Qu.:60.00   1st Qu.:38.0  
 Median :30.50   Strength:20    Median :27.00   Median :64.00   Median :41.0  
 Mean   :30.50                  Mean   :27.05   Mean   :65.27   Mean   :41.6  
 3rd Qu.:45.25                  3rd Qu.:31.00   3rd Qu.:71.00   3rd Qu.:45.0  
 Max.   :60.00                  Max.   :35.00   Max.   :83.00   Max.   :62.0

Notes

We have 3 groups (“Control”, “Aerobic”, “Strength”) of participants. There are 60 total participants (20 in each group).
Each participant has two performance measures (DVs): SkillTest1 and SkillTest2.
Age serves as a continuous covariate that may affect performance.

Visualising Distributions

We can create boxplots to compare the distribution of SkillTest1 and SkillTest2 across the three training groups…

Visualising Distributions

Show code

par(mfrow = c(1, 2))
boxplot(SkillTest1 ~ TrainingGroup, data = df,
        main = "SkillTest1 by TrainingGroup",
        ylab = "SkillTest1 Score", xlab = "Training Group")
boxplot(SkillTest2 ~ TrainingGroup, data = df,
        main = "SkillTest2 by TrainingGroup",
        ylab = "SkillTest2 Score", xlab = "Training Group")

We might also want to see how Age relates to each skill test.

Show code

plot(df$Age, df$SkillTest1, 
     col = df$TrainingGroup, pch = 19, 
     xlab = "Age", ylab = "SkillTest1 Score", 
     main = "Age vs. SkillTest1")
legend("topleft", legend = levels(df$TrainingGroup), 
       col = 1:3, pch = 19, bty = "n")

MANOVA

Conceptual Overview

MANOVA evaluates whether there is a statistically significant difference in the mean vector of multiple dependent variables across groups.
In our example, the dependent variables are SkillTest1 and SkillTest2, and the factor is TrainingGroup.

Fitting the MANOVA Model

We use the base R function manova() to fit a MANOVA model.
Note that we place the two outcome variables in cbind() and specify TrainingGroup as the predictor.

Show code

# Fit the MANOVA
manova_model <- manova(cbind(SkillTest1, SkillTest2) ~ TrainingGroup, data = df)

Summarise model using different test statistics

Wilks

summary(manova_model, test = "Wilks")

              Df   Wilks approx F num Df den Df    Pr(>F)    
TrainingGroup  2 0.31673   21.752      4    112 2.664e-13 ***
Residuals     57                                             
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Pillai

summary(manova_model, test = "Pillai")

              Df  Pillai approx F num Df den Df    Pr(>F)    
TrainingGroup  2 0.74357   16.867      4    114 6.892e-11 ***
Residuals     57                                             
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Interpretation

Look at the p-values to see if the effect of TrainingGroup is significant on both dependent variables jointly.
A small p-value (e.g., < 0.05) suggests at least one group differs from the others in the joint distribution of SkillTest1 and SkillTest2.

Post-hoc Investigation

But where do the differences lie?

Using post-hoc tests

If the MANOVA is significant, we should do post-hoc checks to understand where the differences lie.

Univariate ANOVAs for each skill test

summary.aov(manova_model)

 Response SkillTest1 :
              Df Sum Sq Mean Sq F value    Pr(>F)    
TrainingGroup  2 2698.5 1349.27  48.212 5.552e-13 ***
Residuals     57 1595.2   27.99                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

 Response SkillTest2 :
              Df Sum Sq Mean Sq F value    Pr(>F)    
TrainingGroup  2  775.6  387.80  15.624 3.889e-06 ***
Residuals     57 1414.8   24.82                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

This gives us separate ANOVAs for SkillTest1 and SkillTest2 by TrainingGroup.
Pairwise Comparisons (e.g., using TukeyHSD()) on each ANOVA model let us see which groups differ significantly for each skill test…

Show code

# Tukey's HSD for each dependent variable
# 1. Fit separate aov() models for each outcome
aov_skill1 <- aov(SkillTest1 ~ TrainingGroup, data = df)
aov_skill2 <- aov(SkillTest2 ~ TrainingGroup, data = df)

# Apply TukeyHSD on each model, specifying factor of interest
tukey_skill1 <- TukeyHSD(aov_skill1, "TrainingGroup")
tukey_skill2 <- TukeyHSD(aov_skill2, "TrainingGroup")

# Inspect pairwise comparisons and p-values
tukey_skill1

  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = SkillTest1 ~ TrainingGroup, data = df)

$TrainingGroup
                  diff        lwr        upr     p adj
Control-Aerobic   13.3   9.274302  17.325698 0.0000000
Strength-Aerobic  -1.7  -5.725698   2.325698 0.5697806
Strength-Control -15.0 -19.025698 -10.974302 0.0000000

Show code

tukey_skill2

  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = SkillTest2 ~ TrainingGroup, data = df)

$TrainingGroup
                 diff        lwr        upr     p adj
Control-Aerobic  -4.1 -7.8912393 -0.3087607 0.0311497
Strength-Aerobic  4.7  0.9087607  8.4912393 0.0115073
Strength-Control  8.8  5.0087607 12.5912393 0.0000020

Notes

We re-fit (or explicitly fit) the univariate models:

aov_skill1 for SkillTest1 ~ TrainingGroup
aov_skill2 for SkillTest2 ~ TrainingGroup

We run TukeyHSD() on each model, specifying “TrainingGroup” as the factor.
We review the results (tukey_skill1 and tukey_skill2) to identify which group means differ significantly for each outcome measure.

MANCOVA

Including a covariate…

Overview

MANCOVA is similar to MANOVA but includes one or more covariates.

By including a covariate (e.g., Age), we can:

Control for the effect of age on the dependent variables.
See whether group differences remain after adjusting for age.

What’s a covariate?

A variable that is not the primary focus of a study but may influence the outcome
Include it to help to control for variability in statistical models
e.g., age, BMI, previous injury history, home vs. away, temperature

Fitting the MANCOVA Model

We add Age as an additional predictor in our formula:

Show code

mancova_model <- manova(cbind(SkillTest1, SkillTest2) ~ TrainingGroup + Age, data = df)
summary(mancova_model, test = "Wilks")

              Df   Wilks approx F num Df den Df    Pr(>F)    
TrainingGroup  2 0.27371  25.0635      4    110 9.133e-15 ***
Age            1 0.80176   6.7994      2     55  0.002298 ** 
Residuals     56                                             
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Show code

summary(mancova_model, test = "Pillai")

              Df  Pillai approx F num Df den Df   Pr(>F)    
TrainingGroup  2 0.79128  18.3301      4    112 1.31e-11 ***
Age            1 0.19824   6.7994      2     55 0.002298 ** 
Residuals     56                                            
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

What’s this doing?

Now the model tests whether TrainingGroup still has a significant effect on the combined outcomes after controlling for Age.
We can also see if Age itself is significant in the multivariate model.

Post-hoc for MANCOVA

As before, we can do post-hoc tests to check where the differences lie:

Show code

summary.aov(mancova_model)

 Response SkillTest1 :
              Df  Sum Sq Mean Sq F value    Pr(>F)    
TrainingGroup  2 2698.53 1349.27  58.743 1.778e-14 ***
Age            1  308.95  308.95  13.451 0.0005466 ***
Residuals     56 1286.25   22.97                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

 Response SkillTest2 :
              Df  Sum Sq Mean Sq F value    Pr(>F)    
TrainingGroup  2  775.60  387.80 15.7384 3.769e-06 ***
Age            1   34.94   34.94  1.4181    0.2387    
Residuals     56 1379.86   24.64                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

This provides separate univariate ANCOVA models for SkillTest1 and SkillTest2.

It shows the effect of TrainingGroup while controlling for Age.
It shows the effect of Age.
(If an interaction TrainingGroup:Age is theoretically relevant, you could include it in the model.)

Reporting the MANCOVA

A multivariate analysis of variance (MANOVA) was conducted to examine the effects of Training Group and Age on performance in two skill tests.

Results indicated a significant main effect of Training Group on both SkillTest1 (F(2, 56) = 58.74, p < .001) and SkillTest2 (F(2, 56) = 15.74, p < .001), suggesting that training group membership significantly influenced performance on both measures.

Age was also a significant predictor for SkillTest1 (F(1, 56) = 13.45, p = .001), but not for SkillTest2 (F(1, 56) = 1.42, p = .24).
These findings suggest that while training group consistently affects skill performance, age may play a role in some but not all skill outcomes.

Checking Assumptions

Assumptions

Both MANOVA and MANCOVA share a set of assumptions:

Multivariate normality of dependent variables (within each group).
Homogeneity of covariance matrices (the covariance structure of the dependent variables is the same across groups).

Linearity and Independence

For MANCOVA specifically, linear relationship between the covariate(s) and each dependent variable.
In practice, diagnostic plots or tests (e.g., Box’s M test from the heplots package) help verify the homogeneity of covariance matrices.

Visualising Results

Mean Profiles by Group

We can visualise how the groups differ on each skill test (adjusted or unadjusted for age) in various ways.

One simple approach is to look at group means with error bars…

Show code

library(dplyr)
library(ggplot2)

df_summary <- df %>% 
  group_by(TrainingGroup) %>% 
  summarise(
    mean_SkillTest1 = mean(SkillTest1),
    sd_SkillTest1   = sd(SkillTest1),
    mean_SkillTest2 = mean(SkillTest2),
    sd_SkillTest2   = sd(SkillTest2)
  )

# Plot SkillTest1 means
p1 <- ggplot(df, aes(x = TrainingGroup, y = SkillTest1)) +
  stat_summary(fun.data = mean_sdl, fun.args = list(mult = 1), geom = "errorbar", width = 0.2) +
  stat_summary(fun = mean, geom = "point", size = 3, color = "blue") +
  labs(title = "Group Means for SkillTest1", y = "Mean SkillTest1") +
  theme_minimal()

# Plot SkillTest2 means
p2 <- ggplot(df, aes(x = TrainingGroup, y = SkillTest2)) +
  stat_summary(fun.data = mean_sdl, fun.args = list(mult = 1), geom = "errorbar", width = 0.2) +
  stat_summary(fun = mean, geom = "point", size = 3, color = "red") +
  labs(title = "Group Means for SkillTest2", y = "Mean SkillTest2") +
  theme_minimal()

p1

Notes

Blue points: Mean SkillTest1 per group with ±1 SD error bars.
Red points: Mean SkillTest2 per group with ±1 SD error bars.

For MANCOVA results, we might prefer plotting adjusted means (or least-squares means) using packages like emmeans for more advanced visuals.

Emmeans Example

Code

library(emmeans)

# Fit linear model (ANCOVA) for SkillTest1 with TrainingGroup as a factor and Age as covariate
mod_skill1 <- lm(SkillTest1 ~ TrainingGroup + Age, data = df)

# Use emmeans to get estimated marginal means (adjusted means)
emm_skill1 <- emmeans(mod_skill1, specs = "TrainingGroup")

# Plot the emmeans, including comparisons (i.e., confidence intervals)
plot(emm_skill1, comparison = TRUE)

Summary

MANOVA answers: “Do the training groups differ on the combined outcome of SkillTest1 and SkillTest2?”
If significant, inspect univariate ANOVAs to determine which outcome(s) drive the difference.

MANCOVA adds a covariate (e.g., Age):
Determines whether group differences persist after adjusting for a continuous variable.
Reveals whether Age itself has a significant effect on the combined outcomes.

Post-hoc Tests…

If a group effect is found, consider pairwise comparisons (e.g., Control vs. Aerobic, etc.) for each skill test to see which groups differ significantly.