Familiarise yourself with the dataset by loading it into R and confirming it is appropriate for factor analysis.
The dataset is available here:
https://www.dropbox.com/scl/fi/gr913s3yifeoxkw6rr4wj/soccer_performance_data.csv?rlkey=nsyojhmg3ork1hhbzttjug180&dl=1
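Use library(tidyverse) and read_csv() to load the file.
Preview the first rows with head().
Inspect the structure with str().

# Load the tidyverse, which provides read_csv()
library(tidyverse)
# Read the CSV directly from the download link
soccer_data <- read_csv("https://www.dropbox.com/scl/fi/gr913s3yifeoxkw6rr4wj/soccer_performance_data.csv?rlkey=nsyojhmg3ork1hhbzttjug180&dl=1")
# Preview the first rows
head(soccer_data)
# Check the column types and dimensions
str(soccer_data)

Get an overview of your dataset to understand the distributions of the variables and the relationships among them.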
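Use summary() or describe() (from the psych package) for descriptive statistics.
Compute the correlation matrix with cor().
Use the corrplot package (library(corrplot)) to visualise the correlation matrix.

library(psych)
# Descriptive statistics for every column
summary(soccer_data)
# Pearson correlations; this assumes every column is numeric
corr_matrix <- cor(soccer_data)
library(corrplot)
# mar leaves room at the top so the title is not clipped
corrplot(corr_matrix, method = "ellipse", type = "upper",
         title = "Correlation Matrix", tl.cex = 0.8, addCoef.col = "black",
         mar = c(0, 0, 2, 0))

Decide on an initial guess for the number of factors to extract by examining eigenvalues.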
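As a toy illustration of this eigenvalue check, the sketch below uses a small made-up correlation matrix (toy_corr is hypothetical and is not the soccer data). Counting the eigenvalues above 1 is the Kaiser criterion, which is only one rule of thumb and is best read alongside the scree plot.

```r
# Hypothetical 4-variable correlation matrix (illustration only):
# variables 1-2 correlate strongly with each other, as do variables 3-4.
toy_corr <- matrix(c(1.0, 0.8, 0.1, 0.1,
                     0.8, 1.0, 0.1, 0.1,
                     0.1, 0.1, 1.0, 0.8,
                     0.1, 0.1, 0.8, 1.0), nrow = 4, byrow = TRUE)

toy_eigenvalues <- eigen(toy_corr)$values

# The eigenvalues of a correlation matrix sum to the number of variables
sum(toy_eigenvalues)

# Kaiser criterion: count the eigenvalues greater than 1
n_factors_guess <- sum(toy_eigenvalues > 1)
n_factors_guess
```

Here the two blocks of correlated variables show up as two eigenvalues above 1, which would suggest starting with two factors for this toy matrix.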
Use eigen() on the correlation matrix to obtain eigenvalues.

# Eigenvalues of the correlation matrix, largest first
eigenvalues <- eigen(corr_matrix)$values
# Scree plot: eigenvalue against factor number
plot(seq_along(eigenvalues), eigenvalues,
     type = "b", main = "Scree Plot",
     xlab = "Factor Number", ylab = "Eigenvalue",
     pch = 19, col = "blue")
# Reference line at an eigenvalue of 1 (Kaiser criterion)
abline(h = 1, col = "red", lty = 2)

Extract factors using Principal Axis Factoring (PAF) without rotation to see the initial loadings.
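For intuition about what principal axis factoring does, here is a minimal base R sketch of the textbook iteration: put communality estimates on the diagonal of the correlation matrix, eigen-decompose, recompute the communalities from the loadings, and repeat. The function paf_sketch and its arguments are hypothetical names introduced for illustration; this is not a description of how psych::fa() is implemented, and the toy matrix is made up.

```r
# Conceptual sketch of iterated principal axis factoring (not psych's code)
paf_sketch <- function(R, nfactors, max_iter = 200, tol = 1e-6) {
  # Initial communalities: squared multiple correlations
  h2 <- 1 - 1 / diag(solve(R))
  keep <- seq_len(nfactors)
  for (i in seq_len(max_iter)) {
    # Reduced correlation matrix: communalities replace the 1s on the diagonal
    R_reduced <- R
    diag(R_reduced) <- h2
    e <- eigen(R_reduced, symmetric = TRUE)
    # Loadings: eigenvectors scaled by the square roots of the eigenvalues
    vals <- pmax(e$values[keep], 0)
    L <- e$vectors[, keep, drop = FALSE] %*% diag(sqrt(vals), nfactors)
    # Updated communalities: row sums of squared loadings
    h2_new <- rowSums(L^2)
    converged <- max(abs(h2_new - h2)) < tol
    h2 <- h2_new
    if (converged) break
  }
  list(loadings = L, communalities = h2, iterations = i)
}

# Toy check: a correlation matrix built from one known factor
true_l <- c(0.8, 0.7, 0.6, 0.5, 0.7, 0.6)
toy_R <- tcrossprod(true_l)
diag(toy_R) <- 1

fit <- paf_sketch(toy_R, nfactors = 1)
# Compare recovered and true loadings (the sign of a factor is arbitrary)
round(cbind(true = true_l, recovered = abs(fit$loadings[, 1])), 3)
```

In practice fa() handles convergence checks and edge cases such as communalities above 1, so the sketch is meant only to show the idea behind the method.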
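Use fa() from psych with nfactors (e.g., 3), fm = "pa" and rotate = "none".

# fm = "pa" requests principal axis factoring; fa() otherwise defaults to minimum residual ("minres")
fa_unrotated <- fa(soccer_data, nfactors = 3, fm = "pa", rotate = "none")
print(fa_unrotated)

Improve factor interpretability under the assumption of uncorrelated factors.
Use fa() again, this time with rotate = "varimax".

# Varimax is an orthogonal rotation, so the factors stay uncorrelated
fa_varimax <- fa(soccer_data, nfactors = 3, fm = "pa", rotate = "varimax")
print(fa_varimax)

Use graphics to more easily see how variables load on each factor.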
Extract the loadings as a plain matrix (fa_varimax$loadings[]).
Use ggplot2 to create bar charts or heatmaps of factor loadings.

library(ggplot2)
# The [] drops the "loadings" class so the matrix converts cleanly to a data frame
loadings_varimax <- as.data.frame(fa_varimax$loadings[])
colnames(loadings_varimax) <- c("Factor1", "Factor2", "Factor3")
# Keep the variable names as a column for plotting
loadings_varimax$Variable <- rownames(loadings_varimax)
# Horizontal bar chart of the loadings on the first factor
ggplot(loadings_varimax, aes(x = Factor1, y = Variable)) +
  geom_col(fill = "steelblue") +
  labs(title = "Factor1 Loadings (Varimax)", x = "Loading", y = "Variable") +
  theme_minimal()

Evaluate how well the factor model fits by looking at the residual correlations.
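To make the idea of a residual correlation concrete, the sketch below works through a made-up example in base R: the residual is the observed correlation minus the correlation implied by the loadings. The loadings, the "observed" matrix and the 0.05 threshold are all hypothetical choices for illustration.

```r
# Hypothetical loadings for 4 variables on 2 orthogonal factors (illustration only)
toy_loadings <- matrix(c(0.8, 0.0,
                         0.7, 0.1,
                         0.1, 0.8,
                         0.0, 0.7), ncol = 2, byrow = TRUE)

# Correlations implied by the model: L L' with 1s on the diagonal
reproduced <- toy_loadings %*% t(toy_loadings)
diag(reproduced) <- 1

# A hypothetical "observed" matrix: the implied one plus one unexplained correlation
observed <- reproduced
observed[1, 2] <- observed[2, 1] <- observed[1, 2] + 0.12

# Residuals: the part of each correlation the factors fail to reproduce
toy_residuals <- observed - reproduced

# Summarise the off-diagonal residuals
off_diag <- toy_residuals[lower.tri(toy_residuals)]
rmsr <- sqrt(mean(off_diag^2))          # root mean square residual
n_large <- sum(abs(off_diag) > 0.05)    # residuals above an arbitrary threshold
c(rmsr = rmsr, n_large = n_large)
```

A single large residual like the one planted here points at a pair of variables that share something the extracted factors do not capture.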
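Extract the residual matrix (fa_varimax$residual).
Plot it with corrplot(). Look for large residuals, which indicate potential room for model improvement.

residuals_matrix <- fa_varimax$residual
# is.corr = FALSE scales the colours to the residuals rather than to the full -1 to 1 range;
# diag = FALSE hides the diagonal so only the off-diagonal residuals are shown
corrplot(residuals_matrix, method = "color", type = "lower",
         is.corr = FALSE, diag = FALSE,
         title = "Residual Correlation Matrix", tl.cex = 0.8,
         mar = c(0, 0, 2, 0))

Determine how much of the total variance in the dataset is captured by the extracted factors.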
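The variance figures can also be worked out by hand from a loading matrix, which helps when reading them: each factor's sum of squared loadings, divided by the number of variables, gives its proportion of total variance. The sketch below uses a made-up loading matrix (toy_loadings is hypothetical) rather than the fitted model.

```r
# Hypothetical loadings for 4 standardised variables on 2 factors (illustration only)
toy_loadings <- matrix(c(0.8, 0.0,
                         0.7, 0.1,
                         0.1, 0.8,
                         0.0, 0.7), ncol = 2, byrow = TRUE,
                       dimnames = list(paste0("V", 1:4), c("Factor1", "Factor2")))

# Sum of squared loadings per factor
ss_loadings <- colSums(toy_loadings^2)

# Each standardised variable contributes 1 unit of variance,
# so total variance equals the number of variables
proportion_var <- ss_loadings / nrow(toy_loadings)
cumulative_var <- cumsum(proportion_var)

rbind(ss_loadings, proportion_var, cumulative_var)
```

In this toy case each factor accounts for 28.5% of the variance and the two together for 57%.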
Inspect the Vaccounted element of the EFA object (e.g., fa_varimax$Vaccounted).

variance_explained <- fa_varimax$Vaccounted
# This includes the proportion of variance for each factor and the cumulative variance
print(variance_explained)

Relate the final factor structure back to your theoretical constructs in football performance.
print(fa_varimax$loadings)
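One way to start relating loadings to constructs is to assign each variable to the factor on which it loads most strongly, and to flag variables that load on more than one factor. The sketch below does this for a made-up loading matrix; the variable names, the loadings and the 0.4 cutoff are all hypothetical and should be replaced by the fitted loadings and a cutoff you can justify.

```r
# Hypothetical loading matrix (names and values invented for illustration)
toy_loadings <- matrix(c(0.82, 0.10, 0.05,
                         0.75, 0.20, 0.12,
                         0.15, 0.78, 0.08,
                         0.45, 0.50, 0.10,
                         0.05, 0.12, 0.80),
                       ncol = 3, byrow = TRUE,
                       dimnames = list(c("sprint_speed", "acceleration",
                                         "pass_accuracy", "dribbling", "stamina"),
                                       c("Factor1", "Factor2", "Factor3")))

cutoff <- 0.4  # assumed threshold for a salient loading

# Factor with the largest absolute loading for each variable
dominant <- colnames(toy_loadings)[apply(abs(toy_loadings), 1, which.max)]

# Number of factors on which each variable has a salient loading
n_salient <- unname(rowSums(abs(toy_loadings) >= cutoff))

summary_table <- data.frame(variable = rownames(toy_loadings),
                            dominant_factor = dominant,
                            cross_loading = n_salient > 1)
summary_table
```

Variables grouped under the same factor can then be compared against the constructs you expected, while flagged cross-loadings deserve a closer look before a factor is named.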