You will need to install the semTools package for this practical.
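If it is not installed yet, a single call does it:
# Run once; installs semTools (and its dependencies) from CRAN
install.packages("semTools")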
Familiarise yourself with the dataset and ensure it is suitable for CFA.
The dataset is available here: https://www.dropbox.com/scl/fi/bmay3bxmaz8e5qednybqy/golf_performance_data.csv?rlkey=ohrstpeuagkfjqzy18vk36n09&dl=1
Load the golf_performance_data.csv file using read_csv().
Examine the first few rows (head()) and the structure (str()).
Check for missing or problematic data (e.g., using summary()).
library(tidyverse)
golf_data <- read_csv("https://www.dropbox.com/scl/fi/bmay3bxmaz8e5qednybqy/golf_performance_data.csv?rlkey=ohrstpeuagkfjqzy18vk36n09&dl=1")
head(golf_data)
str(golf_data)
summary(golf_data)
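summary() will flag NAs in numeric columns, but an explicit per-column count can also help; a small addition to the solution above:
# Count missing values in each column
colSums(is.na(golf_data))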
Define a theoretical model linking observed variables to latent factors (Technical, Mental, Physical), as discussed previously.
Decide which indicators belong to each factor. From our dataset, we hypothesise:
Technical: Swing_Technique, Putting_Skill, Drive_Accuracy
Mental: Concentration, Confidence, Anxiety_Management
Physical: Flexibility, Stamina, Strength
Write down the measurement model in lavaan syntax. For example:
model <- '
Technical =~ Swing_Technique + Putting_Skill + Drive_Accuracy
Mental =~ Concentration + Confidence + Anxiety_Management
Physical =~ Flexibility + Stamina + Strength
'
Does this model capture your theoretical expectation of how these golf skills are organised?
Are there any cross-loadings or alternative structures you might consider?
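For illustration only, a hypothetical alternative specification with one cross-loading might look like this (letting Drive_Accuracy also load on Physical is an invented example, not a recommendation):
# Hypothetical alternative: Drive_Accuracy loads on both Technical and Physical
model_alt <- '
Technical =~ Swing_Technique + Putting_Skill + Drive_Accuracy
Mental =~ Concentration + Confidence + Anxiety_Management
Physical =~ Flexibility + Stamina + Strength + Drive_Accuracy
'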
Estimate the model parameters (factor loadings, variances, covariances) and evaluate if the data align with your hypothesised model.
Install and load the lavaan package (if not already installed).
Use cfa(model, data = golf_data, estimator = "ML") (the default estimator is maximum likelihood).
Store the fitted model object in a variable, e.g., fit_cfa.
library(lavaan)
fit_cfa <- cfa(model, data = golf_data, estimator = "ML")
summary(fit_cfa, fit.measures = TRUE, standardized = TRUE)
Look at the standardised loadings. Are they all sufficiently high (e.g., > 0.5)?
Check the fit indices (Chi-Square, RMSEA, CFI, TLI). Does the model fit well?
Determine how well the hypothesised model aligns with the observed data, based on common CFA fit indices.
fitMeasures(fit_cfa, c("chisq", "df", "pvalue", "cfi", "tli", "rmsea", "srmr"))
# Compare these values to recommended thresholds (e.g., RMSEA < 0.06, CFI/TLI > 0.90).
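If you want R to make the comparison explicit, a quick sketch (the cut-offs below are common conventions, not hard rules):
fits <- fitMeasures(fit_cfa, c("cfi", "tli", "rmsea", "srmr"))
# TRUE = the index meets its conventional threshold
c(cfi = unname(fits["cfi"]) > 0.90, tli = unname(fits["tli"]) > 0.90,
  rmsea = unname(fits["rmsea"]) < 0.06, srmr = unname(fits["srmr"]) < 0.08)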
Verify that each observed variable loads primarily on its intended factor and that loadings are sufficiently strong.
From the summary output or by using inspect(fit_cfa, what = "std"), look at the standardised loadings.
Identify any loadings below 0.4 or 0.5 (if any).
Consider whether any indicators might be poorly measuring their intended construct.
standardised_loadings <- inspect(fit_cfa, what = "std")$lambda
print(standardised_loadings)
# Evaluate whether each indicator has a strong loading on its assigned latent factor.
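To flag weak loadings programmatically, you could filter the lambda matrix (0.5 is a common, not universal, cut-off; exact zeros are indicators not assigned to that factor):
# Flag assigned loadings with absolute value below 0.5
weak <- which(standardised_loadings != 0 & abs(standardised_loadings) < 0.5, arr.ind = TRUE)
standardised_loadings[weak]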
Which variables have the highest loadings?
Are there any unexpected cross-loadings (i.e., an indicator loading on more than one factor)?
Do the loadings support your theoretical structure?
Identify potential adjustments to improve model fit, if needed.
Use modindices(fit_cfa) to view modification indices that suggest additional paths or error covariances.
Decide if any modifications make theoretical sense (e.g., correlating errors for similar indicators).
Remember to avoid purely data-driven modifications that lack theoretical justification!
mi <- modindices(fit_cfa)
# Sort by largest modification indices
mi_sorted <- mi[order(mi$mi, decreasing = TRUE), ]
head(mi_sorted, 10)
Do the modification indices suggest any strong correlations among indicator errors?
Would adjusting these improve the model and make theoretical sense?
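If a modification does seem defensible, add it to the syntax and compare the nested models. A hypothetical sketch (the correlated error between Swing_Technique and Drive_Accuracy is invented purely for illustration):
# Hypothetical: allow correlated errors between two Technical indicators
model_mod <- paste(model, "Swing_Technique ~~ Drive_Accuracy", sep = "\n")
fit_mod <- cfa(model_mod, data = golf_data, estimator = "ML")
anova(fit_cfa, fit_mod) # likelihood-ratio test of original vs modified model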
Check if each factor is measured reliably and whether factors are distinct (convergent and discriminant validity).
Compute Composite Reliability (CR) and Average Variance Extracted (AVE) for each factor.
Check discriminant validity by comparing the AVEs to the squared correlations between factors.
Consider if any factor lacks sufficient convergent validity (low AVE) or discriminant validity (overlaps too much with another factor).
A convenient way is to use helper functions from the semTools package (e.g., reliability() or semTools::compRelSEM()), or to calculate them manually from the factor loadings and error variances.
# Using semTools
library(semTools)
# Evaluate composite reliability, AVE, etc.
cfa_reliability <- reliability(fit_cfa)
print(cfa_reliability)
# Check for each factor: is CR > 0.7? Is AVE > 0.5?
Do all factors meet the typical thresholds (CR > 0.7, AVE > 0.5)?
If a factor’s AVE is below 0.5, does that suggest a measurement problem?
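For the discriminant-validity comparison specifically, a Fornell-Larcker style check can be sketched as follows (this assumes reliability() returns an "avevar" row, as semTools does):
# AVE per factor should exceed its squared correlation with every other factor
phi_sq <- lavInspect(fit_cfa, "cor.lv")^2 # squared latent factor correlations
diag(phi_sq) <- NA
ave <- cfa_reliability["avevar", colnames(phi_sq)] # AVE per factor
ave > apply(phi_sq, 1, max, na.rm = TRUE) # TRUE = discriminant validity holds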
Provide a clear, concise summary of how well the data support your hypothesised model and what, if any, changes might be needed.
State how the fit indices inform the overall adequacy of the model.
Note any problematic loadings or factors.
Mention any potential modifications (error correlations or dropping indicators) that might improve the model while remaining theoretically justifiable.
Discuss the practical implications: if this model is measuring golf performance, do the factors align well with how coaches or athletes conceptualise “Technical,” “Mental,” and “Physical” skills?