Confirmatory Factor Analysis Practical

Note

You will need to install the package semTools for this practical.

Task 1: Load and Inspect Dataset

Objective

Familiarise yourself with the dataset and ensure it is suitable for CFA.

The dataset is available here: https://www.dropbox.com/scl/fi/bmay3bxmaz8e5qednybqy/golf_performance_data.csv?rlkey=ohrstpeuagkfjqzy18vk36n09&dl=1

Tasks

  • Load the golf_performance_data.csv file using read_csv() (from readr, loaded with the tidyverse).

  • Examine the first few rows (head()) and the structure (str()).

  • Check for missing or problematic data (e.g., using summary()).

Show solution code
library(tidyverse)

golf_data <- read_csv("https://www.dropbox.com/scl/fi/bmay3bxmaz8e5qednybqy/golf_performance_data.csv?rlkey=ohrstpeuagkfjqzy18vk36n09&dl=1")

head(golf_data)
str(golf_data)
summary(golf_data)

Reflective Questions / Observations

  • Do the variables appear numeric and within the [1, 7] range?
  • Are there any missing values or anomalies?
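To answer these questions explicitly rather than by eyeballing summary() output, you can count missing values and check item ranges directly. A minimal sketch, assuming golf_data has been loaded as above and that all items are rated on a 1–7 scale:

```r
# Missing values per column (all zeros means complete data)
colSums(is.na(golf_data))

# Minimum and maximum of each numeric item; values outside [1, 7]
# would flag data-entry problems
sapply(golf_data[sapply(golf_data, is.numeric)], range, na.rm = TRUE)
```

The second call returns a two-row matrix (minimum on the first row, maximum on the second), which makes out-of-range values easy to spot at a glance.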

Task 2: Specify the Theoretical CFA Model

Objective

Define a theoretical model linking observed variables to latent factors (Technical, Mental, Physical), as discussed previously.

Tasks

Decide which indicators belong to each factor. From our dataset, we hypothesise:

  • Technical: Swing_Technique, Putting_Skill, Drive_Accuracy

  • Mental: Concentration, Confidence, Anxiety_Management

  • Physical: Flexibility, Stamina, Strength

Write down the measurement model in lavaan syntax. For example:

Show solution code
model <- '
  Technical =~ Swing_Technique + Putting_Skill + Drive_Accuracy
  Mental    =~ Concentration + Confidence + Anxiety_Management
  Physical  =~ Flexibility + Stamina + Strength
'

Reflective Questions / Observations

  • Does this model capture your theoretical expectation of how these golf skills are organised?

  • Are there any cross-loadings or alternative structures you might consider?
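If you suspect an indicator taps more than one construct, lavaan lets you write the cross-loading into the model syntax directly. As a purely illustrative alternative (not part of the hypothesised model), Confidence could be allowed to load on both Technical and Mental:

```r
# Hypothetical alternative: Confidence cross-loads on Technical as well as Mental
model_cross <- '
  Technical =~ Swing_Technique + Putting_Skill + Drive_Accuracy + Confidence
  Mental    =~ Concentration + Confidence + Anxiety_Management
  Physical  =~ Flexibility + Stamina + Strength
'
```

A model like this should only be fitted, and compared against the simpler model (e.g., with anova()), when theory justifies the extra loading.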

Task 3: Fit the CFA Model Using lavaan

Objective

Estimate the model parameters (factor loadings, variances, covariances) and evaluate if the data align with your hypothesised model.

Tasks

  • Install and load the lavaan package (if not already installed).

  • Use cfa(model, data = golf_data, estimator = "ML"); maximum likelihood ("ML") is the default estimator, so naming it is optional but makes the choice explicit.

  • Store the fitted model object in a variable, e.g., fit_cfa.

Show solution code
library(lavaan)

fit_cfa <- cfa(model, data = golf_data, estimator = "ML")

summary(fit_cfa, fit.measures = TRUE, standardized = TRUE)

Reflective Questions / Observations

  • Look at the standardised loadings. Are they all sufficiently high (e.g., > 0.5)?

  • Check the fit indices (Chi-Square, RMSEA, CFI, TLI). Does the model fit well?

Task 4: Examine the Model Fit Indices

Objective

Determine how well the hypothesised model aligns with the observed data, based on common CFA fit indices.

Tasks

  • Focus on indices such as Chi-Square, CFI, TLI, RMSEA, and SRMR.
  • Compare them to conventional cut-offs (e.g., RMSEA < 0.06, CFI/TLI > 0.90).
  • Evaluate if your model meets acceptable standards or needs refinement.
Show solution code
fitMeasures(fit_cfa, c("chisq", "df", "pvalue", "cfi", "tli", "rmsea", "srmr"))
# Compare these values to recommended thresholds (e.g., RMSEA < 0.06, CFI/TLI > 0.90).

Reflective Questions / Observations

  • Does the chi-square test suggest a significant mismatch between model and data (p < .05)?
  • Are incremental fit indices (CFI, TLI) within acceptable ranges?
  • Is the RMSEA low enough to indicate a good fit?
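One way to make the comparison against cut-offs systematic is to extract the indices and test each against its threshold in code. A minimal sketch, assuming fit_cfa from Task 3 and the conventional cut-offs quoted above (plus SRMR < 0.08):

```r
fits <- fitMeasures(fit_cfa, c("cfi", "tli", "rmsea", "srmr"))

# TRUE for each index that meets its conventional threshold
ok <- c(
  cfi   = unname(fits["cfi"])   > 0.90,
  tli   = unname(fits["tli"])   > 0.90,
  rmsea = unname(fits["rmsea"]) < 0.06,
  srmr  = unname(fits["srmr"])  < 0.08
)
ok
```

Remember that these cut-offs are rules of thumb, not strict pass/fail criteria; borderline values should prompt closer inspection rather than automatic rejection.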

Task 5: Inspect Factor Loadings

Objective

Verify that each observed variable loads primarily on its intended factor and that loadings are sufficiently strong.

Tasks

  • From the summary output or by using inspect(fit_cfa, what = "std"), look at the standardised loadings.

  • Identify any loadings below 0.4–0.5.

  • Consider whether any indicators might be poorly measuring their intended construct.

Show solution code
standardised_loadings <- inspect(fit_cfa, what = "std")$lambda
print(standardised_loadings)
# Evaluate whether each indicator has a strong loading on its assigned latent factor.

Reflective Questions / Observations

  • Which variables have the highest loadings?

  • Are there any unexpected cross-loadings (i.e., an indicator loading on more than one factor)?

  • Do the loadings support your theoretical structure?
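The questions above can also be checked programmatically by flagging any standardised loading whose magnitude falls below a chosen cut-off. A sketch, assuming fit_cfa from Task 3 and a 0.5 threshold:

```r
lambda <- inspect(fit_cfa, what = "std")$lambda

# Flag any estimated loading whose absolute value is below 0.5
# (arr.ind = TRUE returns the row/column positions: indicator and factor)
weak <- which(abs(lambda) > 0 & abs(lambda) < 0.5, arr.ind = TRUE)
weak
```

An empty result means every estimated loading clears the 0.5 threshold; otherwise the row names identify the indicators worth a closer look.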

Task 6: Check Modification Indices (Optional Adjustments)

Objective

Identify potential adjustments to improve model fit, if needed.

Tasks

  • Use modindices(fit_cfa) to view modification indices that suggest additional paths or error covariances.

  • Decide if any modifications make theoretical sense (e.g., correlating errors for similar indicators).

  • Remember to avoid purely data-driven modifications that lack theoretical justification!

Show solution code
mi <- modindices(fit_cfa)
# Sort by largest modification indices
mi_sorted <- mi[order(mi$mi, decreasing = TRUE), ]
head(mi_sorted, 10)

Reflective Questions / Observations

  • Do the modification indices suggest any strong correlations among indicator errors?

  • Would adjusting these improve the model and make theoretical sense?
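A common rule of thumb is to consider only suggestions with a modification index above roughly 10, and to look first at residual covariances (op == "~~") between indicators of the same factor. Assuming the mi object from the solution code above:

```r
# Keep only sizeable suggestions (MI > 10) involving residual covariances
subset(mi, op == "~~" & mi > 10)
```

Even for entries that survive this filter, apply the same test as before: add the parameter only if there is a substantive reason (e.g., two items with overlapping wording), not just because the index is large.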

Task 7: Evaluate Reliability and Validity

Objective

Check if each factor is measured reliably and whether factors are distinct (convergent and discriminant validity).

Tasks

  • Compute Composite Reliability (CR) and Average Variance Extracted (AVE) for each factor.

  • Check discriminant validity by comparing the AVEs to the squared correlations between factors.

  • Consider if any factor lacks sufficient convergent validity (low AVE) or discriminant validity (overlaps too much with another factor).

  • A convenient way is to use helper functions from the semTools package (e.g., compRelSEM() and AVE(); the older reliability() also works but is deprecated in recent versions), or manually calculate from the factor loadings and error variances.

Show solution code
# Using semTools
library(semTools)

# Evaluate composite reliability, AVE, etc.
# (reliability() is deprecated in recent semTools releases;
#  compRelSEM() and AVE() are the newer equivalents)
cfa_reliability <- reliability(fit_cfa)
print(cfa_reliability)

# Check for each factor: is CR > 0.7? Is AVE > 0.5?

Reflective Questions / Observations

  • Do all factors meet the typical thresholds (CR > 0.7, AVE > 0.5)?

  • If a factor’s AVE is below 0.5, does that suggest a measurement problem?
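The manual calculation mentioned in the tasks follows directly from the standardised loadings: for a factor with standardised loadings l_i, CR = (sum of l)^2 / ((sum of l)^2 + sum of (1 - l^2)) and AVE is the mean of the squared loadings. A sketch, assuming fit_cfa from Task 3:

```r
std   <- standardizedSolution(fit_cfa)
loads <- subset(std, op == "=~")   # keep only the factor loadings

# Composite reliability and AVE per factor, from standardised loadings
by(loads, loads$lhs, function(f) {
  l   <- f$est.std
  cr  <- sum(l)^2 / (sum(l)^2 + sum(1 - l^2))
  ave <- mean(l^2)
  c(CR = cr, AVE = ave)
})
```

Comparing these hand-computed values against the semTools output is a useful check that you understand what the helper functions report.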

Task 8: Summarise Findings & Potential Model Revisions

Objective

Provide a clear, concise summary of how well the data support your hypothesised model and what, if any, changes might be needed.

Tasks

  • State how the fit indices inform the overall adequacy of the model.

  • Note any problematic loadings or factors.

  • Mention any potential modifications (error correlations or dropping indicators) that might improve the model while remaining theoretically justifiable.

  • Discuss the practical implications: if this model is measuring golf performance, do the factors align well with how coaches or athletes conceptualise “Technical,” “Mental,” and “Physical” skills?