Reduce dimensionality: Summarise many variables into fewer factors.
Discover patterns: Identify relationships between variables.
Example
EFA and Principal Components Analysis
What’s the difference?
PCA is like sorting LEGO bricks into piles by shape and size to make storage easier, without worrying about why they look that way.
Factor Analysis is like figuring out which bricks belong to a castle, spaceship, or race car set—finding hidden themes behind the groupings.
What’s the difference? (2)
PCA is about dimensionality reduction - it aims to find a smaller set of uncorrelated components in the data.
EFA is a latent variable model that identifies underlying factors explaining relationships between observed variables.
PCA does not assume an underlying causal structure - EFA assumes latent factors cause the observed variables.
PCA looks at all the information in the data, including unique patterns and noise, while EFA focuses only on the common patterns shared across multiple variables, ignoring random noise and individual differences.
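The distinction can be seen numerically: PCA redistributes all of the (standardised) variance, noise included, into uncorrelated components. A minimal base-R sketch with simulated data (the matrix here is invented purely for illustration):

```r
# Sketch: PCA decomposes *total* variance into uncorrelated components;
# no latent-variable model is assumed.
set.seed(1)
x <- matrix(rnorm(200), ncol = 4)   # hypothetical 50 x 4 data matrix
pca <- prcomp(x, scale. = TRUE)     # PCA on standardised variables

# Component scores are uncorrelated by construction
round(cor(pca$x), 10)

# Component variances sum to the number of standardised variables:
# PCA accounts for ALL the variance, unique and noise variance included.
sum(pca$sdev^2)  # equals 4 here
```

EFA, by contrast, models only the shared (common) variance, which is why communalities rather than total variance appear in its output.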
Principal Components Analysis - Demonstration
Importance of components:
PC1 PC2 PC3 PC4
Standard deviation 1.572 0.8929 0.7361 0.43443
Proportion of Variance 0.618 0.1993 0.1355 0.04718
Cumulative Proportion 0.618 0.8173 0.9528 1.00000
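A table like the one above is what `summary()` prints for a `prcomp` fit. A hedged sketch, using a made-up stand-in for the course dataset:

```r
# Sketch: producing an "Importance of components" table like the one above.
# A hypothetical stand-in for the dataset used in the demonstration:
set.seed(42)
data <- as.data.frame(matrix(rnorm(400), ncol = 4,
                             dimnames = list(NULL, paste0("V", 1:4))))

pca_result <- prcomp(data, scale. = TRUE)  # standardise, then extract components
summary(pca_result)  # Standard deviation / Proportion of Variance / Cumulative Proportion
```

The cumulative proportion always reaches 1.00 at the last component, since PCA accounts for all the variance.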
Key Concepts in EFA
Correlation Matrix
Shows relationships between variables; basis for identifying factors.
Factors
Represent latent variables explaining observed data patterns.
Factor Loadings
Measure how strongly each variable contributes to a factor.
Rotation
Simplifies interpretation of factors (Varimax = uncorrelated; Oblimin = correlated).
Residuals
Differences between observed and model-predicted correlations.
Variance Explained
Indicates how much of the data variability is captured by the factors.
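The rotation idea can be illustrated with base R's `stats::varimax()` on a small loadings matrix (the values below are invented for illustration):

```r
# Sketch: what Varimax rotation does to a loadings matrix.
# Hypothetical unrotated loadings for 4 variables on 2 factors:
unrot <- matrix(c(0.7,  0.5,
                  0.6,  0.6,
                  0.6, -0.5,
                  0.7, -0.6), ncol = 2, byrow = TRUE)

rot <- varimax(unrot)        # orthogonal rotation (factors stay uncorrelated)
round(rot$loadings[], 2)     # each variable now loads mainly on one factor
```

Rotation changes how variance is distributed across factors but not how much each variable is explained: the communalities (row sums of squared loadings) are preserved.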
# Performing EFA
library(psych)

# Extract factors (fa() defaults to minimum residual factoring, fm = "minres")
fa_result <- fa(data, nfactors = 2, rotate = "none")
Initial Results
Code
print(fa_result)
Factor Analysis using method = minres
Call: fa(r = data, nfactors = 2, rotate = "none")
Standardized loadings (pattern matrix) based upon correlation matrix
MR1 MR2 h2 u2 com
Sprint_Speed 0.14 0.94 0.901 0.099 1.0
Endurance 0.19 0.93 0.899 0.101 1.1
Strength 0.20 0.94 0.929 0.071 1.1
Dribbling 0.13 -0.12 0.031 0.969 2.0
Passing 0.18 -0.10 0.041 0.959 1.6
Shooting 0.13 -0.10 0.029 0.971 1.9
Focus 0.93 -0.11 0.869 0.131 1.0
Confidence 0.92 -0.12 0.859 0.141 1.0
Stress_Tolerance 0.83 -0.15 0.716 0.284 1.1
Teamwork 0.91 -0.13 0.850 0.150 1.0
MR1 MR2
SS loadings 3.39 2.73
Proportion Var 0.34 0.27
Cumulative Var 0.34 0.61
Proportion Explained 0.55 0.45
Cumulative Proportion 0.55 1.00
Mean item complexity = 1.3
Test of the hypothesis that 2 factors are sufficient.
df null model = 45 with the objective function = 11.47 with Chi Square = 1087.42
df of the model are 26 and the objective function was 3.26
The root mean square of the residuals (RMSR) is 0.22
The df corrected root mean square of the residuals is 0.29
The harmonic n.obs is 100 with the empirical chi square 437.9 with prob < 2.2e-76
The total n.obs was 100 with Likelihood Chi Square = 304.49 with prob < 2.7e-49
Tucker Lewis Index of factoring reliability = 0.531
RMSEA index = 0.327 and the 90 % confidence intervals are 0.296 0.363
BIC = 184.76
Fit based upon off diagonal values = 0.76
Measures of factor score adequacy
MR1 MR2
Correlation of (regression) scores with factors 0.98 0.98
Multiple R square of scores with factors 0.95 0.97
Minimum correlation of possible factor scores 0.91 0.93
Scree Plot
Visualises eigenvalues to determine the number of factors to retain.
Look for the “elbow point” where the eigenvalues drop off significantly.
Code
# Scree Plot
eigenvalues <- eigen(cor(data))$values  # Calculate eigenvalues from the correlation matrix

plot(1:length(eigenvalues), eigenvalues,
     type = "b",
     main = "Scree Plot",
     xlab = "Factor Number",
     ylab = "Eigenvalue",
     pch = 19,
     col = "blue")
abline(h = 1, col = "red", lty = 2)  # Kaiser criterion line at eigenvalue = 1
Deciding on the Number of Factors
Based on the scree plot, retain factors above the “elbow point” where the eigenvalues drop significantly. Kaiser Criterion: Retain factors with eigenvalues > 1. In this dataset, three factors are suggested.
Code
# Performing EFA
library(psych)

# Extract factors (fa() defaults to minimum residual factoring, fm = "minres")
fa_result <- fa(data, nfactors = 3, rotate = "none")
Factor Rotation
Factor Rotation (Varimax)
Varimax Rotation assumes factors are uncorrelated, maximising clarity of factor loadings.
Code
fa_varimax <- fa(data, nfactors = 3, rotate = "varimax")
print(fa_varimax)
Factor Analysis using method = minres
Call: fa(r = data, nfactors = 3, rotate = "varimax")
Standardized loadings (pattern matrix) based upon correlation matrix
MR1 MR2 MR3 h2 u2 com
Sprint_Speed -0.02 0.96 0.01 0.93 0.073 1
Endurance 0.03 0.95 -0.01 0.90 0.097 1
Strength 0.05 0.96 -0.04 0.92 0.077 1
Dribbling 0.02 -0.03 0.93 0.86 0.138 1
Passing 0.07 0.01 0.95 0.91 0.092 1
Shooting 0.02 -0.01 0.91 0.83 0.165 1
Focus 0.93 0.05 0.07 0.87 0.132 1
Confidence 0.94 0.03 0.03 0.88 0.123 1
Stress_Tolerance 0.85 -0.02 0.03 0.73 0.271 1
Teamwork 0.94 0.02 0.02 0.87 0.125 1
MR1 MR2 MR3
SS loadings 3.35 2.75 2.61
Proportion Var 0.33 0.28 0.26
Cumulative Var 0.33 0.61 0.87
Proportion Explained 0.38 0.32 0.30
Cumulative Proportion 0.38 0.70 1.00
Mean item complexity = 1
Test of the hypothesis that 3 factors are sufficient.
df null model = 45 with the objective function = 11.47 with Chi Square = 1087.42
df of the model are 18 and the objective function was 0.12
The root mean square of the residuals (RMSR) is 0.01
The df corrected root mean square of the residuals is 0.01
The harmonic n.obs is 100 with the empirical chi square 0.41 with prob < 1
The total n.obs was 100 with Likelihood Chi Square = 11.16 with prob < 0.89
Tucker Lewis Index of factoring reliability = 1.017
RMSEA index = 0 and the 90 % confidence intervals are 0 0.043
BIC = -71.73
Fit based upon off diagonal values = 1
Measures of factor score adequacy
MR1 MR2 MR3
Correlation of (regression) scores with factors 0.98 0.99 0.98
Multiple R square of scores with factors 0.96 0.97 0.95
Minimum correlation of possible factor scores 0.92 0.94 0.91
Factor Rotation (Oblimin)
Oblimin Rotation allows factors to correlate, which is often more realistic when the underlying constructs are related.
Code
fa_oblimin <- fa(data, nfactors = 3, rotate = "oblimin")
print(fa_oblimin)
Factor Analysis using method = minres
Call: fa(r = data, nfactors = 3, rotate = "oblimin")
Standardized loadings (pattern matrix) based upon correlation matrix
MR1 MR2 MR3 h2 u2 com
Sprint_Speed -0.04 0.96 0.03 0.93 0.073 1
Endurance 0.01 0.95 0.00 0.90 0.097 1
Strength 0.03 0.96 -0.03 0.92 0.077 1
Dribbling -0.02 -0.02 0.93 0.86 0.138 1
Passing 0.03 0.02 0.95 0.91 0.092 1
Shooting -0.01 0.00 0.91 0.83 0.165 1
Focus 0.93 0.03 0.03 0.87 0.132 1
Confidence 0.94 0.01 -0.01 0.88 0.123 1
Stress_Tolerance 0.86 -0.04 -0.01 0.73 0.271 1
Teamwork 0.94 -0.01 -0.02 0.87 0.125 1
MR1 MR2 MR3
SS loadings 3.35 2.75 2.60
Proportion Var 0.34 0.28 0.26
Cumulative Var 0.34 0.61 0.87
Proportion Explained 0.38 0.32 0.30
Cumulative Proportion 0.38 0.70 1.00
With factor correlations of
MR1 MR2 MR3
MR1 1.00 0.04 0.08
MR2 0.04 1.00 -0.03
MR3 0.08 -0.03 1.00
Mean item complexity = 1
Test of the hypothesis that 3 factors are sufficient.
df null model = 45 with the objective function = 11.47 with Chi Square = 1087.42
df of the model are 18 and the objective function was 0.12
The root mean square of the residuals (RMSR) is 0.01
The df corrected root mean square of the residuals is 0.01
The harmonic n.obs is 100 with the empirical chi square 0.41 with prob < 1
The total n.obs was 100 with Likelihood Chi Square = 11.16 with prob < 0.89
Tucker Lewis Index of factoring reliability = 1.017
RMSEA index = 0 and the 90 % confidence intervals are 0 0.043
BIC = -71.73
Fit based upon off diagonal values = 1
Measures of factor score adequacy
MR1 MR2 MR3
Correlation of (regression) scores with factors 0.98 0.99 0.98
Multiple R square of scores with factors 0.96 0.97 0.95
Minimum correlation of possible factor scores 0.92 0.94 0.91
Visualising Loadings
Code
library(ggplot2)   # plotting
library(reshape2)  # melt() for long-format reshaping

loadings_oblimin <- as.data.frame(fa_oblimin$loadings[])
colnames(loadings_oblimin) <- c("Factor1", "Factor2", "Factor3")
loadings_oblimin$Variable <- rownames(loadings_oblimin)
loadings_long_oblimin <- melt(loadings_oblimin, id.vars = "Variable")

ggplot(loadings_long_oblimin, aes(y = Variable, x = value, fill = variable)) +
  geom_bar(stat = "identity", position = "dodge") +
  labs(title = "Factor Loadings (Oblimin)", x = "Loading", y = "Variable", fill = "Factor") +
  theme_minimal() +
  theme(axis.text.y = element_text(size = 10))  # Adjust font size for readability
Visualising Loadings
Code
ggplot(loadings_long_oblimin, aes(x = value, y = Variable, fill = variable)) +
  geom_bar(stat = "identity") +
  facet_wrap(~variable, scales = "free_y") +
  labs(title = "Factor Loadings (Oblimin)", x = "Loading", y = "Variable", fill = "Factor") +
  theme_minimal() +
  theme(axis.text.y = element_text(size = 10))  # Adjust for readability
Exploring the Model
Residual Analysis
Residuals represent the correlations left unexplained after the factors are extracted; small residuals indicate good model fit.
Variance Explained
Measures how much of the data variability is captured by the factors.
Aim for cumulative variance explained ≥60%.
Code
# Variance Explained by Factors
variance_data <- data.frame(
  Factor = 1:length(fa_varimax$Vaccounted[1, ]),
  Variance_Explained = fa_varimax$Vaccounted[2, ] * 100
)

ggplot(variance_data, aes(x = Factor, y = Variance_Explained)) +
  geom_bar(stat = "identity", fill = "skyblue") +
  labs(title = "Variance Explained by Factors", x = "Factor", y = "Percentage of Variance Explained") +
  theme_minimal()
Interpreting Factors
Factor Loadings
High loadings (≥0.40) indicate strong relationships between variables and factors.
Each variable ideally loads strongly on one factor.
Cross-Loadings
Variables loading on multiple factors complicate interpretation. Address by refining the model or removing problematic variables.
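Cross-loadings can be flagged programmatically. A hedged sketch, using a small hypothetical loadings matrix (with a fitted model you would substitute `fa_varimax$loadings[]`):

```r
# Sketch: flag variables with |loading| >= 0.40 on more than one factor.
# Hypothetical loadings; the 0.40 threshold follows the guideline above.
loads <- matrix(c(0.85, 0.10,
                  0.45, 0.55,   # cross-loads on both factors
                  0.05, 0.90),
                ncol = 2, byrow = TRUE,
                dimnames = list(c("Focus", "Dribbling", "Sprint_Speed"),
                                c("F1", "F2")))

cross <- rowSums(abs(loads) >= 0.40) > 1
names(cross)[cross]  # -> "Dribbling": a candidate for refinement or removal
```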
Code
# Interpreting the factors
print(fa_varimax)
Factor Analysis using method = minres
Call: fa(r = data, nfactors = 3, rotate = "varimax")
Standardized loadings (pattern matrix) based upon correlation matrix
MR1 MR2 MR3 h2 u2 com
Sprint_Speed -0.02 0.96 0.01 0.93 0.073 1
Endurance 0.03 0.95 -0.01 0.90 0.097 1
Strength 0.05 0.96 -0.04 0.92 0.077 1
Dribbling 0.02 -0.03 0.93 0.86 0.138 1
Passing 0.07 0.01 0.95 0.91 0.092 1
Shooting 0.02 -0.01 0.91 0.83 0.165 1
Focus 0.93 0.05 0.07 0.87 0.132 1
Confidence 0.94 0.03 0.03 0.88 0.123 1
Stress_Tolerance 0.85 -0.02 0.03 0.73 0.271 1
Teamwork 0.94 0.02 0.02 0.87 0.125 1
MR1 MR2 MR3
SS loadings 3.35 2.75 2.61
Proportion Var 0.33 0.28 0.26
Cumulative Var 0.33 0.61 0.87
Proportion Explained 0.38 0.32 0.30
Cumulative Proportion 0.38 0.70 1.00
Mean item complexity = 1
Test of the hypothesis that 3 factors are sufficient.
df null model = 45 with the objective function = 11.47 with Chi Square = 1087.42
df of the model are 18 and the objective function was 0.12
The root mean square of the residuals (RMSR) is 0.01
The df corrected root mean square of the residuals is 0.01
The harmonic n.obs is 100 with the empirical chi square 0.41 with prob < 1
The total n.obs was 100 with Likelihood Chi Square = 11.16 with prob < 0.89
Tucker Lewis Index of factoring reliability = 1.017
RMSEA index = 0 and the 90 % confidence intervals are 0 0.043
BIC = -71.73
Fit based upon off diagonal values = 1
Measures of factor score adequacy
MR1 MR2 MR3
Correlation of (regression) scores with factors 0.98 0.99 0.98
Multiple R square of scores with factors 0.96 0.97 0.95
Minimum correlation of possible factor scores 0.92 0.94 0.91
h2 (Communalities)
The proportion of variance in each observed variable that is explained by the retained factors.
A high h2 value (close to 1) means the variable is well-explained by the factors.
A low h2 value indicates the variable is not well-explained by the factors, and it might not fit well in the factor model.
If a variable has a communality of 0.85, it means 85% of its variance is accounted for by the extracted factors.
u2 (Uniquenesses)
The proportion of variance in each observed variable that is not explained by the retained factors.
A high u2 value (close to 1) means much of the variable’s variance is unique and not shared with other variables through the factors.
A low u2 value indicates that most of the variable’s variance is explained by the factors.
h2 and u2
Relationship: h2 + u2 = 1 for each variable.
Communalities (h2) are useful for checking whether variables contribute meaningfully to the factor structure.
Variables with very low h2 values might need to be excluded or re-examined.
Uniquenesses (u2) help assess how much variance in a variable remains unexplained by the factor model.
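For orthogonal factors, the communality is the sum of a variable's squared loadings, and the uniqueness is whatever remains. A tiny sketch with invented loadings:

```r
# Sketch: communality from loadings, and the h2 + u2 = 1 identity.
# Hypothetical loadings for one variable on two orthogonal factors:
loads <- c(0.7, 0.6)

h2 <- sum(loads^2)   # communality: variance explained by the factors
u2 <- 1 - h2         # uniqueness: variance left unexplained
c(h2 = h2, u2 = u2)  # h2 = 0.85, u2 = 0.15 -> 85% explained, 15% unique
```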
Summary
Exploratory Factor Analysis (EFA) helps reduce data complexity.
Key steps:
Get a dataset and calculate the correlation matrix.
Extract factors (e.g., via minimum residual or principal axis factoring).
Perform rotations (Varimax/Oblimin) to improve interpretability.
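The steps above can be sketched end-to-end. The dataset here is simulated as a stand-in, and the psych call is shown commented out since it requires that package to be installed:

```r
# Condensed EFA workflow sketch (base R for steps 1-2).
set.seed(1)
data <- as.data.frame(matrix(rnorm(500), ncol = 5))  # hypothetical dataset

R <- cor(data)                         # 1. correlation matrix
n_factors <- sum(eigen(R)$values > 1)  # 2. Kaiser criterion: eigenvalues > 1

# 3-4. extraction + rotation (requires the psych package):
# fa_result <- psych::fa(data, nfactors = n_factors, rotate = "varimax")
n_factors
```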