Exploratory Factor Analysis - Practical

Task 1: Load the Dataset

Objective

Familiarise yourself with the dataset by loading it into R and confirming it is appropriate for factor analysis.

The dataset is available here:

https://www.dropbox.com/scl/fi/gr913s3yifeoxkw6rr4wj/soccer_performance_data.csv?rlkey=nsyojhmg3ork1hhbzttjug180&dl=1

Tasks

  • Use library(tidyverse) and read_csv() to load the file.
  • Inspect the first few rows of the dataset with head().
  • Check the structure and data types of each variable using str().

Solution code
library(tidyverse)

soccer_data <- read_csv("https://www.dropbox.com/scl/fi/gr913s3yifeoxkw6rr4wj/soccer_performance_data.csv?rlkey=nsyojhmg3ork1hhbzttjug180&dl=1")

head(soccer_data)
str(soccer_data)

Reflective Questions / Observations

  • Are all variables numeric, and do they have plausible ranges (5 to 10)?
  • Are there any obvious missing values or errors?
  • How many rows and columns are there?
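These checks can also be done programmatically. Below is a minimal base R sketch, run on a small hypothetical data frame. `toy_data` and its column names are made up for illustration and are not the real variables. With the real data you would substitute `soccer_data`. The 5 to 10 bounds are the plausible range stated in the first question.

```r
# Hypothetical stand-in for soccer_data: made-up columns with one missing
# value (passing) and one out-of-range rating (focus = 4.5)
toy_data <- data.frame(
  sprint  = c(7.2, 8.1, 6.5, 9.0),
  passing = c(6.8, NA, 7.4, 8.3),
  focus   = c(5.5, 6.1, 4.5, 7.9)
)

# Are all variables numeric?
all_numeric <- all(vapply(toy_data, is.numeric, logical(1)))

# Number of missing values in each column
na_counts <- colSums(is.na(toy_data))

# Columns containing any value outside the plausible 5 to 10 range
outside_range <- vapply(toy_data,
                        function(x) any(x < 5 | x > 10, na.rm = TRUE),
                        logical(1))
out_of_range <- names(toy_data)[outside_range]

# Number of rows and columns
dims <- dim(toy_data)

print(all_numeric)   # TRUE
print(na_counts)     # sprint 0, passing 1, focus 0
print(out_of_range)  # "focus"
print(dims)          # 4 3
```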

Task 2: Examine Basic Descriptives and Correlations

Objective

Get an overview of your dataset to understand the distribution of each variable and the relationships among them.

Tasks

  • Compute summary statistics for each variable using summary() or describe() (from the psych package).
  • Create a correlation matrix of all variables using cor().
  • Use corrplot (library(corrplot)) to visualise the correlation matrix.
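
Solution code
library(psych)
library(corrplot)

# Summary statistics: base R summary() and the more detailed psych::describe()
summary(soccer_data)
describe(soccer_data)

# Pearson correlation matrix of all variables
corr_matrix <- cor(soccer_data)

corrplot(corr_matrix, method = "ellipse", type = "upper",
         title = "Correlation Matrix", tl.cex = 0.8, addCoef.col = "black",
         mar = c(0, 0, 1, 0))  # top margin so the title is not clipped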

Reflective Questions / Observations

  • Which variables show strong correlations?
  • Are there visible clusters of variables that might represent underlying factors?
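You can answer the first question without scanning the plot by eye. List the variable pairs whose absolute correlation exceeds a chosen cut-off. The sketch below assumes a cut-off of 0.4, which is an arbitrary choice. It runs on simulated data with two made-up latent traits, so the variable names and the helper `strong_pairs()` are hypothetical. On the real data you would pass `corr_matrix` instead.

```r
# Hypothetical helper: list variable pairs whose absolute correlation exceeds
# 'threshold', strongest first. Only the upper triangle is scanned, so each
# pair appears once.
strong_pairs <- function(r, threshold = 0.4) {
  idx <- which(upper.tri(r) & abs(r) > threshold, arr.ind = TRUE)
  out <- data.frame(
    var1 = rownames(r)[idx[, "row"]],
    var2 = colnames(r)[idx[, "col"]],
    r    = round(r[idx], 2),
    stringsAsFactors = FALSE
  )
  out[order(-abs(out$r)), , drop = FALSE]
}

# Simulated data: two hypothetical latent traits, each driving two variables
set.seed(1)
n <- 200
physical  <- rnorm(n)
technical <- rnorm(n)
toy <- data.frame(
  speed     = 0.8 * physical  + rnorm(n, sd = 0.6),
  stamina   = 0.8 * physical  + rnorm(n, sd = 0.6),
  passing   = 0.8 * technical + rnorm(n, sd = 0.6),
  dribbling = 0.8 * technical + rnorm(n, sd = 0.6)
)

pairs_found <- strong_pairs(cor(toy), threshold = 0.4)
print(pairs_found)
```

Pairs driven by the same simulated trait correlate at roughly 0.64 in expectation. Pairs driven by different traits correlate near zero. In this example only the two within-trait pairs should be listed, and they hint at two clusters.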

Task 3: Determine an Initial Number of Factors (Scree Plot)

Objective

Decide on an initial guess for the number of factors to extract by examining eigenvalues.

Tasks

  • Use eigen() on the correlation matrix to obtain eigenvalues.
  • Plot the eigenvalues in a scree plot and add a horizontal line at 1.
  • Apply the Kaiser criterion (retain factors with eigenvalues > 1) and look for the “elbow” in the plot.

Solution code
eigenvalues <- eigen(corr_matrix)$values

plot(1:length(eigenvalues), eigenvalues,
     type = "b", main = "Scree Plot",
     xlab = "Factor Number", ylab = "Eigenvalue",
     pch = 19, col = "blue")
abline(h = 1, col = "red", lty = 2)

Reflective Questions / Observations

  • How many factors have eigenvalues above 1?
  • Where is the “elbow” of the scree plot?
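A small worked example shows what the scree plot summarises. For a correlation matrix, the eigenvalues sum to the number of variables, because that is the matrix's trace. An eigenvalue of 1 therefore marks a component that accounts for as much variance as a single standardised variable. This is the rationale behind the Kaiser criterion. The 3-by-3 matrix below is hypothetical.

```r
# Hypothetical correlation matrix: variables a and b correlate strongly,
# c is only weakly related to both
r <- matrix(c(1.0, 0.6, 0.1,
              0.6, 1.0, 0.1,
              0.1, 0.1, 1.0),
            nrow = 3, dimnames = list(c("a", "b", "c"), c("a", "b", "c")))

ev <- eigen(r, symmetric = TRUE)$values
print(round(ev, 3))          # 1.632 0.968 0.400

# Eigenvalues of a correlation matrix sum to the number of variables
print(sum(ev))               # 3

# Kaiser criterion: count eigenvalues greater than 1
n_kaiser <- sum(ev > 1)
print(n_kaiser)              # 1

# Share of total variance accounted for by each component
print(round(ev / ncol(r), 3))
```

The second eigenvalue (0.968) falls just below the cut-off. This is one reason to read the Kaiser count alongside the elbow rather than on its own.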

Task 4: Perform an Unrotated Factor Analysis

Objective

Extract factors using Principal Axis Factoring (PAF) without rotation to see the initial loadings.

Tasks

  • Use fa() from psych with nfactors (e.g., 3), fm = "pa" (principal axis factoring) and rotate = "none".
  • Observe loadings, communality (h2), and uniqueness (u2) in the output.

Solution code
# fm = "pa" requests principal axis factoring (psych's default is fm = "minres")
fa_unrotated <- fa(soccer_data, nfactors = 3, fm = "pa", rotate = "none")
print(fa_unrotated)

Reflective Questions / Observations

  • Are the factor loadings easy to interpret at this stage?
  • Do certain variables load strongly on only one factor or multiple factors?
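The h2 and u2 columns follow directly from the loadings. For standardised variables, a variable's communality (h2) is the sum of its squared loadings across the factors, and its uniqueness (u2) is 1 minus that. The sketch uses a hypothetical two-factor loading matrix. With real output you could apply the same lines to `fa_unrotated$loadings[]` and compare the results with the printed h2 and u2 columns. The 0.3 threshold for flagging cross-loadings is an assumed rule of thumb.

```r
# Hypothetical unrotated loadings: 4 variables, 2 factors
L <- matrix(c(0.80, 0.10,
              0.75, 0.20,
              0.15, 0.70,
              0.40, 0.45),
            ncol = 2, byrow = TRUE,
            dimnames = list(paste0("v", 1:4), c("F1", "F2")))

h2 <- rowSums(L^2)   # communality: variance shared with the common factors
u2 <- 1 - h2         # uniqueness: variance not explained by the factors
print(round(cbind(L, h2 = h2, u2 = u2), 3))

# Variables with |loading| > 0.3 on more than one factor (cross-loadings)
cross_loading <- rownames(L)[rowSums(abs(L) > 0.3) > 1]
print(cross_loading)   # "v4"
```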

Task 5: Apply a Rotation (Varimax)

Objective

Improve factor interpretability under the assumption of uncorrelated factors.

Tasks

  • Run fa() again, this time with rotate = "varimax" (keeping fm = "pa").
  • Compare the factor loadings to the unrotated solution.

Solution code
fa_varimax <- fa(soccer_data, nfactors = 3, fm = "pa", rotate = "varimax")
print(fa_varimax)

Reflective Questions / Observations

  • Which variables show strong loadings (> 0.4) on each factor?
  • Are the factors now more clearly differentiated (e.g., physical, technical, mental)?
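Rotation changes how variance is distributed across the factors, but not how much each variable shares with them. The sketch below applies base R's `stats::varimax()` to a hypothetical two-factor loading matrix. It is not the psych call used above, but it optimises the Varimax criterion. It checks three properties: every communality is unchanged, the reproduced correlations are unchanged, and the rotation matrix is orthonormal.

```r
# Hypothetical unrotated loadings: 4 variables, 2 factors
L <- matrix(c(0.80, 0.10,
              0.75, 0.20,
              0.15, 0.70,
              0.40, 0.45),
            ncol = 2, byrow = TRUE,
            dimnames = list(paste0("v", 1:4), c("F1", "F2")))

vm <- varimax(L)                 # Kaiser normalisation is on by default
L_rot <- unclass(vm$loadings)    # drop the "loadings" class to get a plain matrix
print(round(L_rot, 3))

# Communalities before and after rotation are identical
print(round(cbind(before = rowSums(L^2), after = rowSums(L_rot^2)), 4))

# The rotation matrix T is orthonormal: t(T) %*% T is the identity
print(round(crossprod(vm$rotmat), 10))
```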

Task 6: Visualise Factor Loadings

Objective

Use graphics to more easily see how variables load on each factor.

Tasks

  • Extract the loadings into a data frame (use fa_varimax$loadings[]).
  • Use ggplot2 to create bar charts or heatmaps of factor loadings.
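
Solution code
library(ggplot2)

# Convert the rotated loadings to a plain data frame (one row per variable)
loadings_varimax <- as.data.frame(fa_varimax$loadings[])
colnames(loadings_varimax) <- c("Factor1", "Factor2", "Factor3")
loadings_varimax$Variable <- rownames(loadings_varimax)

# Horizontal bar chart of Factor1 loadings; dashed lines mark the ±0.4 threshold.
# Swap Factor1 for Factor2 or Factor3 to plot the other factors.
ggplot(loadings_varimax, aes(x = Factor1, y = Variable)) +
  geom_col(fill = "steelblue") +
  geom_vline(xintercept = c(-0.4, 0.4), linetype = "dashed", colour = "grey40") +
  labs(title = "Factor1 Loadings (Varimax)", x = "Loading", y = "Variable") +
  theme_minimal()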

Reflective Questions / Observations

  • Which variables have particularly high loadings on each factor?
  • Do you see any variables that load on more than one factor?
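A bar chart shows one factor at a time. A heatmap puts all factors side by side, which makes cross-loadings easier to spot. The sketch below uses ggplot2's `geom_tile()` on a hypothetical three-factor loading matrix, and the variable names and loading values are made up. With the real model you could start from `fa_varimax$loadings[]` instead.

```r
library(ggplot2)

# Hypothetical rotated loadings: 6 variables, 3 factors
L <- matrix(c(0.82, 0.10, 0.05,
              0.76, 0.15, 0.12,
              0.08, 0.79, 0.10,
              0.12, 0.71, 0.20,
              0.05, 0.18, 0.74,
              0.45, 0.08, 0.52),
            ncol = 3, byrow = TRUE,
            dimnames = list(paste0("var", 1:6), paste0("Factor", 1:3)))

# Long format: one row per variable-factor combination
# (as.vector() reads the matrix column by column)
loadings_long <- data.frame(
  Variable = rep(rownames(L), times = ncol(L)),
  Factor   = rep(colnames(L), each = nrow(L)),
  Loading  = as.vector(L)
)

p <- ggplot(loadings_long, aes(x = Factor, y = Variable, fill = Loading)) +
  geom_tile(colour = "white") +
  geom_text(aes(label = sprintf("%.2f", Loading)), size = 3) +
  scale_fill_gradient2(low = "firebrick", mid = "white", high = "steelblue",
                       midpoint = 0, limits = c(-1, 1)) +
  labs(title = "Factor Loadings (hypothetical values)") +
  theme_minimal()

print(p)
```

A diverging palette centred on zero keeps positive and negative loadings visually distinct. Fixing the limits at -1 and 1 keeps colours comparable across models.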

Task 7: Try a Correlated Rotation (Oblimin)

Objective

Use Oblimin rotation in cases where you believe factors might be correlated.

Tasks

  • Run fa() with rotate = "oblimin" (keeping fm = "pa").
  • Review the loadings and the factor correlation matrix (fa_oblimin$Phi).
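
Solution code
# psych's oblimin rotation uses the GPArotation package; install it first if needed
# install.packages("GPArotation")
fa_oblimin <- fa(soccer_data, nfactors = 3, fm = "pa", rotate = "oblimin")
print(fa_oblimin)

# Check factor correlation matrix
fa_oblimin$Phi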

Reflective Questions / Observations

  • Are the factors correlated (i.e., do you see any correlations > 0.3 or so)?
  • Does the interpretation differ significantly from Varimax rotation?
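With an oblique rotation, the loadings fa() prints form a pattern matrix, and the factor correlations in Phi also feed into each variable's communality. The sketch below uses a hypothetical pattern matrix `P` and factor correlation matrix `Phi`. The structure matrix, which holds the correlations between variables and factors, is `P %*% Phi`. Communalities are the diagonal of `P %*% Phi %*% t(P)`, so they no longer equal the row sums of squared pattern loadings. The 0.3 cut-off mirrors the first question above.

```r
# Hypothetical oblique solution: pattern loadings P and factor correlations Phi
P <- matrix(c(0.80, 0.05,
              0.70, 0.10,
              0.05, 0.75,
              0.10, 0.65),
            ncol = 2, byrow = TRUE,
            dimnames = list(paste0("v", 1:4), c("F1", "F2")))
Phi <- matrix(c(1.0, 0.4,
                0.4, 1.0),
              nrow = 2, dimnames = list(c("F1", "F2"), c("F1", "F2")))

# Structure matrix: correlations between each variable and each factor
S <- P %*% Phi
print(round(S, 3))

# Communalities must account for the factor correlation
h2_oblique <- diag(P %*% Phi %*% t(P))
h2_naive   <- rowSums(P^2)       # only correct when Phi is the identity
print(round(cbind(h2_oblique, h2_naive), 4))

# Are any factor pairs correlated beyond |0.3|?
factor_corrs <- Phi[upper.tri(Phi)]
print(abs(factor_corrs) > 0.3)   # TRUE
```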

Task 8: Assess Model Fit via Residuals

Objective

Evaluate how well the factor model fits by looking at the residual correlations.

Tasks

  • Extract the residual correlation matrix (e.g., fa_varimax$residual).
  • Visualise it with corrplot(). Look for large residuals, indicating potential room for model improvement.
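
Solution code
residuals_matrix <- fa_varimax$residual

# The diagonal reflects the uniquenesses rather than residual correlations, so hide it;
# is.corr = FALSE rescales the colours to the (small) range of the residuals
corrplot(residuals_matrix, method = "color", type = "lower", diag = FALSE,
         is.corr = FALSE, title = "Residual Correlation Matrix",
         tl.cex = 0.8, mar = c(0, 0, 1, 0))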

Reflective Questions / Observations

  • Do most residuals appear small and near zero?
  • Are there any variable pairs that consistently show high residuals?
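A numeric summary complements the residual plot. The sketch below builds a hypothetical one-factor example and computes residual correlations as observed minus reproduced (`L %*% t(L)`). It then reports a simple root mean square of the off-diagonal residuals, plus the pairs whose absolute residual exceeds 0.05, an assumed rule-of-thumb cut-off. With the real model, the off-diagonal part of `fa_varimax$residual` plays the same role.

```r
# Hypothetical one-factor loadings for four variables
L <- matrix(c(0.8, 0.7, 0.6, 0.5), ncol = 1,
            dimnames = list(paste0("v", 1:4), "F1"))

# Hypothetical "observed" correlations: exactly what the model reproduces,
# except that v1 and v2 correlate 0.10 more than the factor explains
R <- tcrossprod(L)
diag(R) <- 1
R["v1", "v2"] <- R["v2", "v1"] <- R["v1", "v2"] + 0.10

# Residual = observed - reproduced; only off-diagonal entries are of interest
res <- R - tcrossprod(L)
off <- res[lower.tri(res)]

rmsr <- sqrt(mean(off^2))
print(round(rmsr, 4))   # 0.0408

# Pairs with |residual| > 0.05
idx <- which(lower.tri(res) & abs(res) > 0.05, arr.ind = TRUE)
large <- paste(rownames(res)[idx[, "row"]], colnames(res)[idx[, "col"]], sep = "-")
print(large)            # "v2-v1"
```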

Task 9: Evaluate Variance Explained

Objective

Determine how much of the total variance in the dataset is captured by the extracted factors.

Tasks

  • Print the Vaccounted element of the EFA object (e.g., fa_varimax$Vaccounted).
  • Observe the total (cumulative) variance explained by the factors.

Solution code
variance_explained <- fa_varimax$Vaccounted
print(variance_explained)
# This includes proportion of variance for each factor and cumulative variance.

Reflective Questions / Observations

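  • Does the cumulative variance explained reach 60% or more (a common rule of thumb), or does it fall short?
  • If it is lower than desired, would extracting an additional factor raise it? Note that dropping a factor will usually lower the total rather than improve it.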
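For an orthogonal solution, the first three rows of Vaccounted can be reproduced from the loading matrix. Each factor's SS loadings value is the column sum of its squared loadings. Dividing by the number of variables gives the proportion of variance, and a running sum gives the cumulative variance. The sketch below works through this on a hypothetical loading matrix.

```r
# Hypothetical rotated loadings: 4 variables, 2 orthogonal factors
L <- matrix(c(0.80, 0.10,
              0.75, 0.20,
              0.15, 0.70,
              0.40, 0.45),
            ncol = 2, byrow = TRUE,
            dimnames = list(paste0("v", 1:4), c("F1", "F2")))

ss_loadings <- colSums(L^2)              # variance explained by each factor
prop_var    <- ss_loadings / nrow(L)     # share of the total (standardised) variance
cum_var     <- cumsum(prop_var)

print(round(rbind(`SS loadings`    = ss_loadings,
                  `Proportion Var` = prop_var,
                  `Cumulative Var` = cum_var), 3))

# Compare the total against the 60% benchmark from the first question
print(cum_var[["F2"]] >= 0.60)   # FALSE: about 53% here
```

The total SS loadings also equals the sum of the communalities. A low cumulative figure therefore means that, on average, the variables share little of their variance with the factors.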

Task 10: Interpret and Summarise the Factors

Objective

Relate the final factor structure back to your theoretical constructs in football performance.

Tasks

  • Identify the strongest loadings for each factor in your final (rotated) model.
  • Assign meaningful names or labels to each factor (e.g., “Physical,” “Technical,” “Mental”).
  • Summarise your findings in a short paragraph.
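
Solution code
# Hide loadings below 0.4 and sort variables by the factor they load on most strongly
print(fa_varimax$loadings, cutoff = 0.4, sort = TRUE)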

Reflective Questions / Observations

  • Do the factors align with your expectations (e.g., “Physical,” “Technical,” “Mental”)?
  • Are there any variables that do not fit neatly within the three-factor model?
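A small helper can support the labelling step. It assigns each variable to the factor it loads on most strongly, then flags variables that load saliently on more than one factor or on none. The helper name `primary_factor()`, the 0.4 cut-off (matching the threshold used in Task 5) and the loading values are hypothetical. With the real model you would pass `fa_varimax$loadings[]`.

```r
# Hypothetical helper: for each variable, report the factor with the largest
# absolute loading and flag cross-loadings or missing salient loadings
primary_factor <- function(L, cutoff = 0.4) {
  abs_L     <- abs(L)
  best      <- apply(abs_L, 1, which.max)      # column index of the largest |loading|
  n_salient <- rowSums(abs_L >= cutoff)        # loadings at or above the cutoff
  data.frame(
    Variable = rownames(L),
    Factor   = colnames(L)[best],
    Loading  = round(L[cbind(seq_len(nrow(L)), best)], 2),
    Note     = ifelse(n_salient == 0, "no salient loading",
                      ifelse(n_salient > 1, "cross-loading", "")),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

# Hypothetical rotated loadings: 5 variables, 2 factors
L <- matrix(c(0.80, 0.10,
              0.75, 0.20,
              0.15, 0.70,
              0.40, 0.45,
              0.25, 0.30),
            ncol = 2, byrow = TRUE,
            dimnames = list(paste0("v", 1:5), c("F1", "F2")))

summary_table <- primary_factor(L)
print(summary_table[order(summary_table$Factor, -abs(summary_table$Loading)), ])
```

Variables flagged as "cross-loading" or "no salient loading" are the ones worth discussing when judging how well the three-factor model fits.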