You will need to install the semTools package for this practical.
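If it is not installed yet, a single call does it:
# Run once; installs semTools (and its dependencies) from CRAN
install.packages("semTools")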
Familiarise yourself with the dataset and ensure it is suitable for CFA.
The dataset is available here: https://www.dropbox.com/scl/fi/bmay3bxmaz8e5qednybqy/golf_performance_data.csv?rlkey=ohrstpeuagkfjqzy18vk36n09&dl=1
Load the golf_performance_data.csv file using read_csv().
Examine the first few rows (head()) and the structure (str()).
Check for missing or problematic data (e.g., using summary()).
library(tidyverse)
golf_data <- read_csv("https://www.dropbox.com/scl/fi/bmay3bxmaz8e5qednybqy/golf_performance_data.csv?rlkey=ohrstpeuagkfjqzy18vk36n09&dl=1")
head(golf_data)
str(golf_data)
summary(golf_data)
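summary() will flag NAs in numeric columns, but an explicit per-column count can also help; a small addition to the solution above:
# Count missing values in each column
colSums(is.na(golf_data))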
Define a theoretical model linking observed variables to latent factors (Technical, Mental, Physical), as discussed previously.
Decide which indicators belong to each factor. From our dataset, we hypothesise:
Technical: Swing_Technique, Putting_Skill, Drive_Accuracy
Mental: Concentration, Confidence, Anxiety_Management
Physical: Flexibility, Stamina, Strength
Write down the measurement model in lavaan syntax. For example:
model <- '
Technical =~ Swing_Technique + Putting_Skill + Drive_Accuracy
Mental =~ Concentration + Confidence + Anxiety_Management
Physical =~ Flexibility + Stamina + Strength
'
Does this model capture your theoretical expectation of how these golf skills are organised?
Are there any cross-loadings or alternative structures you might consider?
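For illustration only, a hypothetical alternative specification with one cross-loading might look like this (letting Drive_Accuracy also load on Physical is an invented example, not a recommendation):
# Hypothetical alternative: Drive_Accuracy loads on both Technical and Physical
model_alt <- '
Technical =~ Swing_Technique + Putting_Skill + Drive_Accuracy
Mental =~ Concentration + Confidence + Anxiety_Management
Physical =~ Flexibility + Stamina + Strength + Drive_Accuracy
'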
Estimate the model parameters (factor loadings, variances, covariances) and evaluate if the data align with your hypothesised model.
Install and load the lavaan package (if not already installed).
Use cfa(model, data = golf_data, estimator = "ML") (the default estimator is maximum likelihood).
Store the fitted model object in a variable, e.g., fit_cfa.
library(lavaan)
fit_cfa <- cfa(model, data = golf_data, estimator = "ML")
summary(fit_cfa, fit.measures = TRUE, standardized = TRUE)
Look at the standardised loadings. Are they all sufficiently high (e.g., > 0.5)?
Check the fit indices (Chi-Square, RMSEA, CFI, TLI). Does the model fit well?
Determine how well the hypothesised model aligns with the observed data, based on common CFA fit indices.
fitMeasures(fit_cfa, c("chisq", "df", "pvalue", "cfi", "tli", "rmsea", "srmr"))
# Compare these values to recommended thresholds (e.g., RMSEA < 0.06, CFI/TLI > 0.90).
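If you want R to make the comparison explicit, a quick sketch (the cut-offs below are common conventions, not hard rules):
fits <- fitMeasures(fit_cfa, c("cfi", "tli", "rmsea", "srmr"))
# TRUE = the index meets its conventional threshold
c(cfi = unname(fits["cfi"]) > 0.90, tli = unname(fits["tli"]) > 0.90,
  rmsea = unname(fits["rmsea"]) < 0.06, srmr = unname(fits["srmr"]) < 0.08)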
Verify that each observed variable loads primarily on its intended factor and that loadings are sufficiently strong.
From the summary output or by using inspect(fit_cfa, what = "std"), look at the standardised loadings.
Identify any loadings below 0.4 or 0.5 (if any).
Consider whether any indicators might be poorly measuring their intended construct.
standardised_loadings <- inspect(fit_cfa, what = "std")$lambda
print(standardised_loadings)
# Evaluate whether each indicator has a strong loading on its assigned latent factor.
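To flag weak loadings programmatically, you could filter the lambda matrix (0.5 is a common, not universal, cut-off; exact zeros are indicators not assigned to that factor):
# Flag assigned loadings with absolute value below 0.5
weak <- which(standardised_loadings != 0 & abs(standardised_loadings) < 0.5, arr.ind = TRUE)
standardised_loadings[weak]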
Which variables have the highest loadings?
Are there any unexpected cross-loadings (i.e., an indicator loading on more than one factor)?
Do the loadings support your theoretical structure?
Identify potential adjustments to improve model fit, if needed.
Use modindices(fit_cfa) to view modification indices that suggest additional paths or error covariances.
Decide if any modifications make theoretical sense (e.g., correlating errors for similar indicators).
Remember to avoid purely data-driven modifications that lack theoretical justification!
mi <- modindices(fit_cfa)
# Sort by largest modification indices
mi_sorted <- mi[order(mi$mi, decreasing = TRUE), ]
head(mi_sorted, 10)
Do the modification indices suggest any strong correlations among indicator errors?
Would adjusting these improve the model and make theoretical sense?
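If a modification does seem defensible, add it to the syntax and compare the nested models. A hypothetical sketch (the correlated error between Swing_Technique and Drive_Accuracy is invented purely for illustration):
# Hypothetical: allow correlated errors between two Technical indicators
model_mod <- paste(model, "Swing_Technique ~~ Drive_Accuracy", sep = "\n")
fit_mod <- cfa(model_mod, data = golf_data, estimator = "ML")
anova(fit_cfa, fit_mod) # likelihood-ratio test of original vs modified model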
Check if each factor is measured reliably and whether factors are distinct (convergent and discriminant validity).
Compute Composite Reliability (CR) and Average Variance Extracted (AVE) for each factor.
Check discriminant validity by comparing the AVEs to the squared correlations between factors.
Consider if any factor lacks sufficient convergent validity (low AVE) or discriminant validity (overlaps too much with another factor).
A convenient way is to use helper functions from the semTools package (e.g., reliability() or semTools::compRelSEM()), or to calculate them manually from the factor loadings and error variances.
# Using semTools
library(semTools)
# Evaluate composite reliability, AVE, etc.
cfa_reliability <- reliability(fit_cfa)
print(cfa_reliability)
# Check for each factor: is CR > 0.7? Is AVE > 0.5?
Do all factors meet the typical thresholds (CR > 0.7, AVE > 0.5)?
If a factor’s AVE is below 0.5, does that suggest a measurement problem?
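For the discriminant-validity comparison specifically, a Fornell-Larcker style check can be sketched as follows (this assumes reliability() returns an "avevar" row, as semTools does):
# AVE per factor should exceed its squared correlation with every other factor
phi_sq <- lavInspect(fit_cfa, "cor.lv")^2 # squared latent factor correlations
diag(phi_sq) <- NA
ave <- cfa_reliability["avevar", colnames(phi_sq)] # AVE per factor
ave > apply(phi_sq, 1, max, na.rm = TRUE) # TRUE = discriminant validity holds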
Provide a clear, concise summary of how well the data support your hypothesised model and what, if any, changes might be needed.
State how the fit indices inform the overall adequacy of the model.
Note any problematic loadings or factors.
Mention any potential modifications (error correlations or dropping indicators) that might improve the model while remaining theoretically justifiable.
Discuss the practical implications: if this model is measuring golf performance, do the factors align well with how coaches or athletes conceptualise “Technical,” “Mental,” and “Physical” skills?