Exploratory Factor Analysis - Demonstration

B1705, Week Five

Introduction

Introduction to Exploratory Factor Analysis

  • Exploratory Factor Analysis (EFA) helps uncover latent variables (hidden patterns) in data.

Main goals

  • Reduce dimensionality: Summarise many variables into fewer factors.
  • Discover patterns: Identify relationships between variables.

Example

EFA and Principal Components Analysis

What’s the difference?

  • PCA is like sorting LEGO bricks into piles by shape and size to make storage easier, without worrying about why they look that way.

  • Factor Analysis is like figuring out which bricks belong to a castle, spaceship, or race car set—finding hidden themes behind the groupings.

What’s the difference? (2)

  • PCA is about dimensionality reduction - it aims to find a smaller set of uncorrelated components in the data.

  • EFA is a latent variable model that identifies underlying factors explaining relationships between observed variables.

  • PCA does not assume an underlying causal structure - EFA assumes latent factors cause the observed variables.

  • PCA looks at all the variance in the data, including variance unique to a single variable and noise, while EFA models only the common variance shared across variables and treats variable-specific variance and random error as uniqueness.
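The contrast can be sketched on toy data. The block below is an illustration only: the data are simulated, the names (toy, a1 to b3) are hypothetical, and base R's prcomp() and factanal() are used here instead of the psych functions that appear later.

```r
set.seed(1)   # arbitrary seed so the sketch is reproducible
n <- 200

# Two hypothetical latent factors, each driving three observed variables
f1 <- rnorm(n)
f2 <- rnorm(n)
noise <- function() rnorm(n, sd = 0.6)   # variable-specific (unique) variance

toy <- data.frame(
  a1 = 0.8 * f1 + noise(), a2 = 0.8 * f1 + noise(), a3 = 0.8 * f1 + noise(),
  b1 = 0.8 * f2 + noise(), b2 = 0.8 * f2 + noise(), b3 = 0.8 * f2 + noise()
)

pca <- prcomp(toy, scale. = TRUE)                       # principal components
efa <- factanal(toy, factors = 2, rotation = "none")   # maximum-likelihood factor analysis

# PCA: components are uncorrelated and together account for all the variance
round(cor(pca$x)[1:2, 1:2], 3)
sum(pca$sdev^2)   # total variance = number of standardised variables

# EFA: part of each variable's variance is left outside the common factors
round(efa$uniquenesses, 2)
```

The component variances add up to the number of variables, whereas the factor model sets aside a uniqueness for every variable.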

Principal Components Analysis - Demonstration

Importance of components:
                         PC1    PC2    PC3     PC4
Standard deviation     1.572 0.8929 0.7361 0.43443
Proportion of Variance 0.618 0.1993 0.1355 0.04718
Cumulative Proportion  0.618 0.8173 0.9528 1.00000
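
The "Proportion of Variance" row follows from the standard deviations: each component's variance is its standard deviation squared, divided by the total. A minimal sketch that re-derives the rows from the printed values (the object names are illustrative):

```r
# Standard deviations copied from the summary above
sdev <- c(PC1 = 1.572, PC2 = 0.8929, PC3 = 0.7361, PC4 = 0.43443)

variances <- sdev^2                      # variance of each component
prop_var  <- variances / sum(variances)  # proportion of variance per component
cum_prop  <- cumsum(prop_var)            # cumulative proportion

round(rbind(prop_var, cum_prop), 3)
sum(variances)   # close to 4, as expected for four standardised variables
```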

Key Concepts in EFA

Correlation Matrix

Shows relationships between variables; basis for identifying factors.

Factors

Represent latent variables explaining observed data patterns.

Factor Loadings

Measure how strongly each variable contributes to a factor.

Rotation

Rotates the factor axes to simplify interpretation (Varimax is orthogonal and keeps factors uncorrelated; Oblimin is oblique and allows factors to correlate).

Residuals

Differences between observed and model-predicted correlations.

Variance Explained

Indicates how much of the data variability is captured by the factors.

Dataset

# A tibble: 6 × 10
  Sprint_Speed Endurance Strength Dribbling Passing Shooting Focus Confidence
         <dbl>     <dbl>    <dbl>     <dbl>   <dbl>    <dbl> <dbl>      <dbl>
1         9.05      9.86     9.47      7.32    7.56     7.79  5          5   
2         6.53      5.63     6.52      7.46    7.41     7.62  5.26       5.54
3         7.56      7.58     7.54      5.29    5        5     6.20       6.56
4         8.32      7.98     8.00      8.64    8.09     8.12  6.93       7.21
5         7.53      7.26     7.32      5       5        5     5          5   
6         6.81      6.24     6.76      6.91    5.87     5.97  5          5   
# ℹ 2 more variables: Stress_Tolerance <dbl>, Teamwork <dbl>
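
The slides do not show how the dataset was built. For readers who want something to experiment with, the sketch below simulates a data frame with the same column names and 100 rows. It is not the original data: the three latent traits, the baseline of 7, the loading of 1.2 and the noise level are all assumptions chosen for illustration.

```r
set.seed(42)  # arbitrary seed so the sketch is reproducible
n <- 100      # same number of observations as reported by fa() in these slides

# Three hypothetical latent traits, assumed independent for simplicity
physical  <- rnorm(n)
technical <- rnorm(n)
mental    <- rnorm(n)

# Each observed score = baseline + loading * latent trait + unique noise
make_item <- function(trait) 7 + 1.2 * trait + rnorm(n, sd = 0.4)

sim_data <- data.frame(
  Sprint_Speed     = make_item(physical),
  Endurance        = make_item(physical),
  Strength         = make_item(physical),
  Dribbling        = make_item(technical),
  Passing          = make_item(technical),
  Shooting         = make_item(technical),
  Focus            = make_item(mental),
  Confidence       = make_item(mental),
  Stress_Tolerance = make_item(mental),
  Teamwork         = make_item(mental)
)

# Within-trait correlations should be high, between-trait ones near zero
vars <- c("Sprint_Speed", "Endurance", "Dribbling")
round(cor(sim_data)[vars, vars], 2)
```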

Exploring Relationships

Correlation Matrix

Code
# Correlation Matrix
library(corrplot)

corr_matrix <- cor(data)
corrplot(corr_matrix, method = "ellipse", type = "upper",
         title = "Correlation Matrix", tl.cex = 0.8, addCoef.col = "black")

Exploratory Factor Analysis

Performing EFA

Code
# Performing EFA
library(psych)

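# Extract factors; fa() uses minimum residual extraction by default (fm = "minres")
# Pass fm = "pa" if Principal Axis Factoring (PAF) is specifically wanted
fa_result <- fa(data, nfactors = 2, rotate = "none")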

Initial Results

Code
print(fa_result)
Factor Analysis using method =  minres
Call: fa(r = data, nfactors = 2, rotate = "none")
Standardized loadings (pattern matrix) based upon correlation matrix
                  MR1   MR2    h2    u2 com
Sprint_Speed     0.14  0.94 0.901 0.099 1.0
Endurance        0.19  0.93 0.899 0.101 1.1
Strength         0.20  0.94 0.929 0.071 1.1
Dribbling        0.13 -0.12 0.031 0.969 2.0
Passing          0.18 -0.10 0.041 0.959 1.6
Shooting         0.13 -0.10 0.029 0.971 1.9
Focus            0.93 -0.11 0.869 0.131 1.0
Confidence       0.92 -0.12 0.859 0.141 1.0
Stress_Tolerance 0.83 -0.15 0.716 0.284 1.1
Teamwork         0.91 -0.13 0.850 0.150 1.0

                       MR1  MR2
SS loadings           3.39 2.73
Proportion Var        0.34 0.27
Cumulative Var        0.34 0.61
Proportion Explained  0.55 0.45
Cumulative Proportion 0.55 1.00

Mean item complexity =  1.3
Test of the hypothesis that 2 factors are sufficient.

df null model =  45  with the objective function =  11.47 with Chi Square =  1087.42
df of  the model are 26  and the objective function was  3.26 

The root mean square of the residuals (RMSR) is  0.22 
The df corrected root mean square of the residuals is  0.29 

The harmonic n.obs is  100 with the empirical chi square  437.9  with prob <  2.2e-76 
The total n.obs was  100  with Likelihood Chi Square =  304.49  with prob <  2.7e-49 

Tucker Lewis Index of factoring reliability =  0.531
RMSEA index =  0.327  and the 90 % confidence intervals are  0.296 0.363
BIC =  184.76
Fit based upon off diagonal values = 0.76
Measures of factor score adequacy             
                                                   MR1  MR2
Correlation of (regression) scores with factors   0.98 0.98
Multiple R square of scores with factors          0.95 0.97
Minimum correlation of possible factor scores     0.91 0.93

Scree Plot

Visualises eigenvalues to determine the number of factors to retain.

Look for the “elbow point”, after which the eigenvalues level off.

Code
# Scree Plot

eigenvalues <- eigen(cor(data))$values  # Calculate eigenvalues from the correlation matrix
plot(
  seq_along(eigenvalues),
  eigenvalues,
  type = "b",
  main = "Scree Plot",
  xlab = "Factor Number",
  ylab = "Eigenvalue",
  pch = 19,
  col = "blue"
)
abline(h = 1, col = "red", lty = 2)  # Kaiser criterion line at eigenvalue = 1

Deciding on the Number of Factors

Based on the scree plot, retain the factors that come before the “elbow point”, after which the eigenvalues level off.

Kaiser criterion: retain factors with eigenvalues > 1.

In this dataset, three factors are suggested.

Code
# Performing EFA with three factors
library(psych)

# fa() uses minimum residual extraction by default (fm = "minres");
# pass fm = "pa" for Principal Axis Factoring (PAF)
fa_result <- fa(data, nfactors = 3, rotate = "none")
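
The Kaiser rule can be applied by counting the eigenvalues above 1. The sketch below uses a hypothetical correlation matrix, not the slide dataset: three blocks of 3, 3 and 4 variables that correlate 0.8 within a block and 0 between blocks.

```r
# Build one block: k variables that all correlate r with each other
block <- function(k, r = 0.8) {
  m <- matrix(r, k, k)
  diag(m) <- 1
  m
}

# Assemble a block-diagonal correlation matrix (assumed structure)
sizes <- c(3, 3, 4)
R <- matrix(0, sum(sizes), sum(sizes))
start <- 1
for (k in sizes) {
  idx <- start:(start + k - 1)
  R[idx, idx] <- block(k)
  start <- start + k
}

eigenvalues <- eigen(R)$values
n_kaiser <- sum(eigenvalues > 1)   # Kaiser criterion: eigenvalues above 1

round(eigenvalues, 2)
n_kaiser
```

With real data the count should be read alongside the scree plot, since eigenvalues just above or below 1 make the rule unstable.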

Factor Rotation

Factor Rotation (Varimax)

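Varimax is an orthogonal rotation: it keeps the factors uncorrelated and maximises the variance of the squared loadings within each factor, so loadings tend to be either large or close to zero.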

Code
# Varimax Rotation
fa_varimax <- fa(data, nfactors = 3, rotate = "varimax")
print(fa_varimax)
Factor Analysis using method =  minres
Call: fa(r = data, nfactors = 3, rotate = "varimax")
Standardized loadings (pattern matrix) based upon correlation matrix
                   MR1   MR2   MR3   h2    u2 com
Sprint_Speed     -0.02  0.96  0.01 0.93 0.073   1
Endurance         0.03  0.95 -0.01 0.90 0.097   1
Strength          0.05  0.96 -0.04 0.92 0.077   1
Dribbling         0.02 -0.03  0.93 0.86 0.138   1
Passing           0.07  0.01  0.95 0.91 0.092   1
Shooting          0.02 -0.01  0.91 0.83 0.165   1
Focus             0.93  0.05  0.07 0.87 0.132   1
Confidence        0.94  0.03  0.03 0.88 0.123   1
Stress_Tolerance  0.85 -0.02  0.03 0.73 0.271   1
Teamwork          0.94  0.02  0.02 0.87 0.125   1

                       MR1  MR2  MR3
SS loadings           3.35 2.75 2.61
Proportion Var        0.33 0.28 0.26
Cumulative Var        0.33 0.61 0.87
Proportion Explained  0.38 0.32 0.30
Cumulative Proportion 0.38 0.70 1.00

Mean item complexity =  1
Test of the hypothesis that 3 factors are sufficient.

df null model =  45  with the objective function =  11.47 with Chi Square =  1087.42
df of  the model are 18  and the objective function was  0.12 

The root mean square of the residuals (RMSR) is  0.01 
The df corrected root mean square of the residuals is  0.01 

The harmonic n.obs is  100 with the empirical chi square  0.41  with prob <  1 
The total n.obs was  100  with Likelihood Chi Square =  11.16  with prob <  0.89 

Tucker Lewis Index of factoring reliability =  1.017
RMSEA index =  0  and the 90 % confidence intervals are  0 0.043
BIC =  -71.73
Fit based upon off diagonal values = 1
Measures of factor score adequacy             
                                                   MR1  MR2  MR3
Correlation of (regression) scores with factors   0.98 0.99 0.98
Multiple R square of scores with factors          0.96 0.97 0.95
Minimum correlation of possible factor scores     0.92 0.94 0.91
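
An orthogonal rotation such as varimax only re-expresses the same solution: loadings change, but each variable's communality does not. The sketch below illustrates this with base R's stats::varimax() on a hypothetical loading matrix (six made-up items, two factors); it does not use the slide dataset or psych.

```r
# Hypothetical simple structure: items 1-3 load on factor A, items 4-6 on factor B
simple <- matrix(c(0.9, 0.8, 0.7, 0.0, 0.0, 0.0,
                   0.0, 0.0, 0.0, 0.9, 0.8, 0.7),
                 ncol = 2,
                 dimnames = list(paste0("item", 1:6), c("A", "B")))

# Mix the two factors with a 30-degree rotation to mimic an unrotated solution
theta <- pi / 6
mix <- matrix(c(cos(theta), -sin(theta),
                sin(theta),  cos(theta)), ncol = 2, byrow = TRUE)
unrotated <- simple %*% mix

rotated <- varimax(unrotated)               # orthogonal varimax rotation
rot_loadings <- unclass(rotated$loadings)   # drop the "loadings" class for arithmetic

h2_before <- rowSums(unrotated^2)      # communalities before rotation
h2_after  <- rowSums(rot_loadings^2)   # communalities after rotation

round(unrotated, 2)
round(rot_loadings, 2)
round(cbind(h2_before, h2_after), 3)
```

The two communality columns should match, which is why the h2 and u2 values do not depend on the choice of orthogonal rotation.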

Visualising Loadings

Code
loadings <- as.data.frame(fa_varimax$loadings[])
colnames(loadings) <- c("Factor1", "Factor2", "Factor3")
loadings$Variable <- rownames(loadings)

library(reshape2)
loadings_long <- melt(loadings, id.vars = "Variable")

Visualising Loadings

Code
library(ggplot2)

ggplot(loadings_long, aes(x = value, y = Variable, fill = variable)) +
  geom_bar(stat = "identity") +
  facet_wrap(~variable, scales = "free_y") +
  labs(title = "Factor Loadings (Varimax)", x = "Loading", y = "Variable", fill = "Factor") +
  theme_minimal() +
  theme(axis.text.y = element_text(size = 10))

Visualising Loadings

Code
ggplot(loadings_long, aes(x = variable, y = Variable, fill = value)) +
  geom_tile() +
  scale_fill_gradient2(low = "blue", mid = "white", high = "red", midpoint = 0) +
  labs(title = "Factor Loadings Heatmap", x = "Factor", y = "Variable", fill = "Loading") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Factor Rotation (Oblimin)

Oblimin Rotation allows factors to be correlated, useful when latent variables overlap.

Code
# Oblimin Rotation (psych relies on the GPArotation package for this rotation)
fa_oblimin <- fa(data, nfactors = 3, rotate = "oblimin")
print(fa_oblimin)
Factor Analysis using method =  minres
Call: fa(r = data, nfactors = 3, rotate = "oblimin")
Standardized loadings (pattern matrix) based upon correlation matrix
                   MR1   MR2   MR3   h2    u2 com
Sprint_Speed     -0.04  0.96  0.03 0.93 0.073   1
Endurance         0.01  0.95  0.00 0.90 0.097   1
Strength          0.03  0.96 -0.03 0.92 0.077   1
Dribbling        -0.02 -0.02  0.93 0.86 0.138   1
Passing           0.03  0.02  0.95 0.91 0.092   1
Shooting         -0.01  0.00  0.91 0.83 0.165   1
Focus             0.93  0.03  0.03 0.87 0.132   1
Confidence        0.94  0.01 -0.01 0.88 0.123   1
Stress_Tolerance  0.86 -0.04 -0.01 0.73 0.271   1
Teamwork          0.94 -0.01 -0.02 0.87 0.125   1

                       MR1  MR2  MR3
SS loadings           3.35 2.75 2.60
Proportion Var        0.34 0.28 0.26
Cumulative Var        0.34 0.61 0.87
Proportion Explained  0.38 0.32 0.30
Cumulative Proportion 0.38 0.70 1.00

 With factor correlations of 
     MR1   MR2   MR3
MR1 1.00  0.04  0.08
MR2 0.04  1.00 -0.03
MR3 0.08 -0.03  1.00

Mean item complexity =  1
Test of the hypothesis that 3 factors are sufficient.

df null model =  45  with the objective function =  11.47 with Chi Square =  1087.42
df of  the model are 18  and the objective function was  0.12 

The root mean square of the residuals (RMSR) is  0.01 
The df corrected root mean square of the residuals is  0.01 

The harmonic n.obs is  100 with the empirical chi square  0.41  with prob <  1 
The total n.obs was  100  with Likelihood Chi Square =  11.16  with prob <  0.89 

Tucker Lewis Index of factoring reliability =  1.017
RMSEA index =  0  and the 90 % confidence intervals are  0 0.043
BIC =  -71.73
Fit based upon off diagonal values = 1
Measures of factor score adequacy             
                                                   MR1  MR2  MR3
Correlation of (regression) scores with factors   0.98 0.99 0.98
Multiple R square of scores with factors          0.96 0.97 0.95
Minimum correlation of possible factor scores     0.92 0.94 0.91

Visualising Loadings

Code
loadings_oblimin <- as.data.frame(fa_oblimin$loadings[])
colnames(loadings_oblimin) <- c("Factor1", "Factor2", "Factor3")
loadings_oblimin$Variable <- rownames(loadings_oblimin)
loadings_long_oblimin <- melt(loadings_oblimin, id.vars = "Variable")

ggplot(loadings_long_oblimin, aes(y = Variable, x = value, fill = variable)) +
  geom_bar(stat = "identity", position = "dodge") +
  labs(title = "Factor Loadings (Oblimin)", x = "Loading", y = "Variable", fill = "Factor") +
  theme_minimal() +
  theme(axis.text.y = element_text(size = 10))  # Adjust font size for readability

Visualising Loadings

Code
ggplot(loadings_long_oblimin, aes(x = value, y = Variable, fill = variable)) +
  geom_bar(stat = "identity") +
  facet_wrap(~variable, scales = "free_y") +
  labs(title = "Factor Loadings (Oblimin)", x = "Loading", y = "Variable", fill = "Factor") +
  theme_minimal() +
  theme(axis.text.y = element_text(size = 10))  # Adjust for readability

Exploring the Model

Residual Analysis

Residuals represent unexplained correlations after extracting factors.

Goal: Minimise residuals for a better model fit.

Code
# Residual Correlation Matrix
# Stored as resid_matrix to avoid confusion with base R's residuals() function
resid_matrix <- fa_varimax$residual
corrplot(resid_matrix, method = "color", type = "lower",
         title = "Residual Correlation Matrix", tl.cex = 0.8)
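
To make the idea concrete, residuals can be computed by hand as observed correlations minus the correlations implied by the loadings. The numbers below are hypothetical (three variables, one factor), and the off-diagonal summary is only in the spirit of the RMSR line in the fa() output, not a reproduction of psych's exact calculation.

```r
# Hypothetical observed correlations among three variables
vars <- c("x1", "x2", "x3")
observed <- matrix(c(1.00, 0.75, 0.60,
                     0.75, 1.00, 0.58,
                     0.60, 0.58, 1.00),
                   nrow = 3, dimnames = list(vars, vars))

# Hypothetical loadings on a single factor
lambda <- matrix(c(0.9, 0.8, 0.7), ncol = 1, dimnames = list(vars, "F1"))

implied <- lambda %*% t(lambda)      # correlations reproduced by the factor model
resid_matrix <- observed - implied   # what the model leaves unexplained

# Summarise the off-diagonal residuals only; the diagonal holds unexplained variance
off_diag <- resid_matrix[lower.tri(resid_matrix)]
rmsr <- sqrt(mean(off_diag^2))

round(resid_matrix, 3)
round(rmsr, 3)
```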

Variance Explained

Measures how much of the data variability is captured by the factors.

As a rule of thumb, aim for cumulative variance explained of at least 60%; the three-factor model here reaches 87%.

Code
# Variance Explained by Factors
# Row 2 of Vaccounted is the "Proportion Var" row of the printed output
variance_data <- data.frame(
  Factor = seq_len(ncol(fa_varimax$Vaccounted)),
  Variance_Explained = fa_varimax$Vaccounted[2, ] * 100
)

ggplot(variance_data, aes(x = Factor, y = Variance_Explained)) +
  geom_bar(stat = "identity", fill = "skyblue") +
  labs(title = "Variance Explained by Factors", x = "Factor", y = "Percentage of Variance Explained") +
  theme_minimal()
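
The variance rows in the fa() output can be re-derived from the "SS loadings" row alone. A short sketch using the values printed for the varimax solution (the object names are illustrative):

```r
# "SS loadings" row copied from the varimax output
ss_loadings <- c(MR1 = 3.35, MR2 = 2.75, MR3 = 2.61)
n_variables <- 10   # number of observed variables in the dataset

prop_var       <- ss_loadings / n_variables        # share of total variance
cum_var        <- cumsum(prop_var)                 # running total
prop_explained <- ss_loadings / sum(ss_loadings)   # share of the explained variance

round(rbind(prop_var, cum_var, prop_explained), 3)
```

"Proportion Var" divides by the number of variables because each standardised variable contributes one unit of variance, while "Proportion Explained" divides by the variance the factors capture between them.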

Interpreting Factors

Factor Loadings

Loadings with an absolute value of ≥0.40 are conventionally treated as strong relationships between variables and factors; the sign shows the direction of the relationship.

Each variable ideally loads strongly on one factor.

Cross-Loadings

Variables loading on multiple factors complicate interpretation. Address by refining the model or removing problematic variables.

Code
# Interpreting the factors
print(fa_varimax)
Factor Analysis using method =  minres
Call: fa(r = data, nfactors = 3, rotate = "varimax")
Standardized loadings (pattern matrix) based upon correlation matrix
                   MR1   MR2   MR3   h2    u2 com
Sprint_Speed     -0.02  0.96  0.01 0.93 0.073   1
Endurance         0.03  0.95 -0.01 0.90 0.097   1
Strength          0.05  0.96 -0.04 0.92 0.077   1
Dribbling         0.02 -0.03  0.93 0.86 0.138   1
Passing           0.07  0.01  0.95 0.91 0.092   1
Shooting          0.02 -0.01  0.91 0.83 0.165   1
Focus             0.93  0.05  0.07 0.87 0.132   1
Confidence        0.94  0.03  0.03 0.88 0.123   1
Stress_Tolerance  0.85 -0.02  0.03 0.73 0.271   1
Teamwork          0.94  0.02  0.02 0.87 0.125   1

                       MR1  MR2  MR3
SS loadings           3.35 2.75 2.61
Proportion Var        0.33 0.28 0.26
Cumulative Var        0.33 0.61 0.87
Proportion Explained  0.38 0.32 0.30
Cumulative Proportion 0.38 0.70 1.00

Mean item complexity =  1
Test of the hypothesis that 3 factors are sufficient.

df null model =  45  with the objective function =  11.47 with Chi Square =  1087.42
df of  the model are 18  and the objective function was  0.12 

The root mean square of the residuals (RMSR) is  0.01 
The df corrected root mean square of the residuals is  0.01 

The harmonic n.obs is  100 with the empirical chi square  0.41  with prob <  1 
The total n.obs was  100  with Likelihood Chi Square =  11.16  with prob <  0.89 

Tucker Lewis Index of factoring reliability =  1.017
RMSEA index =  0  and the 90 % confidence intervals are  0 0.043
BIC =  -71.73
Fit based upon off diagonal values = 1
Measures of factor score adequacy             
                                                   MR1  MR2  MR3
Correlation of (regression) scores with factors   0.98 0.99 0.98
Multiple R square of scores with factors          0.96 0.97 0.95
Minimum correlation of possible factor scores     0.92 0.94 0.91
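
The 0.40 rule of thumb can be applied mechanically by counting strong loadings per variable. The sketch below retypes the varimax loadings from the output by hand; the cutoff and the object names are illustrative choices, not part of psych.

```r
# Varimax loadings copied from the printed output (filled column by column)
loadings_mat <- matrix(
  c(-0.02, 0.03, 0.05, 0.02, 0.07, 0.02, 0.93, 0.94, 0.85, 0.94,    # MR1
     0.96, 0.95, 0.96, -0.03, 0.01, -0.01, 0.05, 0.03, -0.02, 0.02,  # MR2
     0.01, -0.01, -0.04, 0.93, 0.95, 0.91, 0.07, 0.03, 0.03, 0.02),  # MR3
  ncol = 3,
  dimnames = list(
    c("Sprint_Speed", "Endurance", "Strength", "Dribbling", "Passing",
      "Shooting", "Focus", "Confidence", "Stress_Tolerance", "Teamwork"),
    c("MR1", "MR2", "MR3")
  )
)

cutoff <- 0.40                            # rule-of-thumb threshold
salient <- abs(loadings_mat) >= cutoff    # TRUE where a loading counts as strong
n_salient <- rowSums(salient)             # number of strong loadings per variable

cross_loaded <- names(n_salient)[n_salient > 1]    # load strongly on several factors
unassigned   <- names(n_salient)[n_salient == 0]   # load strongly on none

n_salient
cross_loaded
unassigned
```

Here every variable has exactly one strong loading, so there are no cross-loadings to resolve.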

h2 (Communalities)

The proportion of variance in each observed variable that is explained by the retained factors.

A high h2 value (close to 1) means the variable is well-explained by the factors.

A low h2 value indicates the variable is not well-explained by the factors, and it might not fit well in the factor model.

If a variable has a communality of 0.85, it means 85% of its variance is accounted for by the extracted factors.

u2 (Uniquenesses)

The proportion of variance in each observed variable that is not explained by the retained factors.

A high u2 value (close to 1) means much of the variable’s variance is unique and not shared with other variables through the factors.

A low u2 value indicates that most of the variable’s variance is explained by the factors.

h2 and u2

Relationship: h2 + u2 = 1 for each variable.

Communalities (h2) are useful for checking whether variables contribute meaningfully to the factor structure.

Variables with very low h2 values might need to be excluded or re-examined.

Uniquenesses (u2) help assess how much variance in a variable remains unexplained by the factor model.
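
For an orthogonal solution (unrotated or varimax), a variable's communality is the sum of its squared loadings, and its uniqueness is the remainder. The sketch below checks this for Dribbling, using loadings copied from the two-factor and three-factor outputs earlier in the slides (object names are illustrative):

```r
# Dribbling's loadings copied from the two printed solutions
dribbling_2f <- c(MR1 = 0.13, MR2 = -0.12)               # unrotated two-factor model
dribbling_3f <- c(MR1 = 0.02, MR2 = -0.03, MR3 = 0.93)   # varimax three-factor model

communality <- function(loadings) sum(loadings^2)   # h2 = sum of squared loadings

h2 <- c(two_factor   = communality(dribbling_2f),
        three_factor = communality(dribbling_3f))
u2 <- 1 - h2                                        # uniqueness is what is left over

round(rbind(h2, u2), 3)
```

The results agree with the printed h2 values (0.031 and 0.86) up to the rounding of the loadings, and show how poorly Dribbling was explained before the third factor was added.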

Summary

Exploratory Factor Analysis (EFA) helps reduce data complexity.

Key steps:

  1. Get a dataset and calculate the correlation matrix.

  2. Extract factors with fa() (minimum residual by default; pass fm = "pa" for Principal Axis Factoring).

  3. Perform rotations (Varimax/Oblimin) to improve interpretability.

  4. Analyse residuals and explained variance.