Confirmatory Factor Analysis
Xiaolu Fan, James Uanhoro
cfa.Rmd
Basic CFA Model
We begin with a simple example of confirmatory factor analysis (CFA),
using the minorbsem() function to fit the model. The minorbsem package
contains a built-in dataset called HS, which is part of the classic
Holzinger-Swineford dataset. This dataset is used in many papers and
books on CFA. It consists of mental ability test scores of seventh- and
eighth-grade children from two different schools (Pasteur and
Grant-White). In our version of the dataset (obtained from lavaan,
Rosseel (2012)), only 9 of the original 26 tests are included.
We begin by loading the package:
library(minorbsem)
##
## ###############################################################################
## This is minorbsem 0.2.14
##
## All users of R (or SEM) are invited to report bugs, submit functions or ideas
## for functions. An efficient way to do this is to open an issue on GitHub
## https://github.com/jamesuanhoro/minorbsem/issues/.
## ###############################################################################
The first six lines of the dataset:
head(HS)
## id sex ageyr agemo school grade x1 x2 x3 x4 x5 x6 x7 x8
## 1 1 1 13 1 Pasteur 7 3.333333 7.75 0.375 2.333333 5.75 1.2857143 3.391304 5.75
## 2 2 2 13 7 Pasteur 7 5.333333 5.25 2.125 1.666667 3.00 1.2857143 3.782609 6.25
## 3 3 2 13 1 Pasteur 7 4.500000 5.25 1.875 1.000000 1.75 0.4285714 3.260870 3.90
## 4 4 1 13 2 Pasteur 7 5.333333 7.75 3.000 2.666667 4.50 2.4285714 3.000000 5.30
## 5 5 2 12 2 Pasteur 7 4.833333 4.75 0.875 2.666667 4.00 2.5714286 3.695652 6.30
## 6 6 2 14 1 Pasteur 7 5.333333 5.00 2.250 1.000000 3.00 0.8571429 4.347826 6.65
## x9
## 1 6.361111
## 2 7.916667
## 3 4.416667
## 4 4.861111
## 5 5.916667
## 6 7.500000
Data preparation
The scale of the data is important for setting priors on model
parameters. The default priors for models fit with minorbsem are
reasonable when variables have standard deviations close to 1. For this
reason, we first check the standard deviations of the relevant
variables for this analysis:
item_data <- HS[, paste0("x", 1:9)] # keep only the nine test-score variables
head(item_data)
## x1 x2 x3 x4 x5 x6 x7 x8 x9
## 1 3.333333 7.75 0.375 2.333333 5.75 1.2857143 3.391304 5.75 6.361111
## 2 5.333333 5.25 2.125 1.666667 3.00 1.2857143 3.782609 6.25 7.916667
## 3 4.500000 5.25 1.875 1.000000 1.75 0.4285714 3.260870 3.90 4.416667
## 4 5.333333 7.75 3.000 2.666667 4.50 2.4285714 3.000000 5.30 4.861111
## 5 4.833333 4.75 0.875 2.666667 4.00 2.5714286 3.695652 6.30 5.916667
## 6 5.333333 5.00 2.250 1.000000 3.00 0.8571429 4.347826 6.65 7.500000
apply(item_data, 2, sd) # compute SD of each variable
## x1 x2 x3 x4 x5 x6 x7 x8 x9
## 1.167432 1.177451 1.130979 1.164116 1.290472 1.095603 1.089534 1.012615 1.009152
All variables have standard deviations close to 1, so we can move forward with the data as they are. Otherwise, we would recommend re-scaling the variables.
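If the standard deviations were far from 1, a simple re-scaling with base R's scale() would be enough. The sketch below is purely illustrative: raw_data is a made-up data frame standing in for your own data, not part of minorbsem.

```r
# Hypothetical data whose columns have SDs far from 1 (for illustration only)
set.seed(1)
raw_data <- data.frame(
  y1 = rnorm(100, mean = 50, sd = 10),
  y2 = rnorm(100, mean = 0, sd = 0.01)
)
# scale() centers each column and divides it by its SD
rescaled <- as.data.frame(scale(raw_data))
apply(rescaled, 2, sd) # every column now has SD = 1
```

One would then pass the re-scaled data to minorbsem() in place of the original variables.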
Model syntax
The model syntax is lavaan-style:
syntax_basic <- "
Visual =~ x1 + x2 + x3
Verbal =~ x4 + x5 + x6
Speed =~ x7 + x8 + x9"
This model specifies three latent variables (or factors): visual, verbal, and speed. The visual factor has indicators x1, x2, and x3; the verbal factor has indicators x4, x5, and x6; the speed factor has indicators x7, x8, and x9.
Fit the model
We run the analysis using the minorbsem() function. By default, the
function assumes that minor factors influence the covariance between
the variables. minorbsem then prints the iterations from Stan – we show
these iterations once so the reader knows what to expect.
fit_cfa <- minorbsem(model = syntax_basic, data = HS)
## Processing user input ...
## User input fully processed :)
## Now to modeling.
## Fitting Stan model ...
## Init values were only set for a subset of parameters.
## Missing init values for the following parameters:
## - chain 1: gdp_alpha, loadings, res_sds_u, phi_mat_chol, res_cor_01, coefs, sigma_loadings_complex, gdp_loadings_complex, Sigma
## - chain 2: gdp_alpha, loadings, res_sds_u, phi_mat_chol, res_cor_01, coefs, sigma_loadings_complex, gdp_loadings_complex, Sigma
## - chain 3: gdp_alpha, loadings, res_sds_u, phi_mat_chol, res_cor_01, coefs, sigma_loadings_complex, gdp_loadings_complex, Sigma
## Running MCMC with 3 chains, at most 2 in parallel...
##
## Chain 1 Iteration: 1 / 2000 [ 0%] (Warmup)
## Chain 1 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
## Chain 1 Exception: lkj_corr_cholesky_lpdf: Random variable[2] is 0, but must be positive! (in '/tmp/RtmpO4Ubya/model-22a964aae8b9.stan', line 324, column 2 to column 48)
## Chain 1 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
## Chain 1 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
## Chain 1
## Chain 1 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
## Chain 1 Exception: lkj_corr_cholesky_lpdf: Random variable[2] is 0, but must be positive! (in '/tmp/RtmpO4Ubya/model-22a964aae8b9.stan', line 324, column 2 to column 48)
## Chain 1 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
## Chain 1 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
## Chain 1
## Chain 1 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
## Chain 1 Exception: lkj_corr_cholesky_lpdf: Random variable[2] is 0, but must be positive! (in '/tmp/RtmpO4Ubya/model-22a964aae8b9.stan', line 324, column 2 to column 48)
## Chain 1 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
## Chain 1 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
## Chain 1
## Chain 1 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
## Chain 1 Exception: cholesky_decompose: A is not symmetric. A[4,7] = inf, but A[7,4] = inf (in '/tmp/RtmpO4Ubya/model-22a964aae8b9.stan', line 432, column 6 to column 81)
## Chain 1 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
## Chain 1 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
## Chain 1
## Chain 1 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
## Chain 1 Exception: cholesky_decompose: Matrix m is not positive definite (in '/tmp/RtmpO4Ubya/model-22a964aae8b9.stan', line 432, column 6 to column 81)
## Chain 1 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
## Chain 1 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
## Chain 1
## Chain 1 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
## Chain 1 Exception: lkj_corr_cholesky_lpdf: Random variable[2] is 0, but must be positive! (in '/tmp/RtmpO4Ubya/model-22a964aae8b9.stan', line 324, column 2 to column 48)
## Chain 1 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
## Chain 1 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
## Chain 1
## Chain 1 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
## Chain 1 Exception: lkj_corr_cholesky_lpdf: Random variable[2] is 0, but must be positive! (in '/tmp/RtmpO4Ubya/model-22a964aae8b9.stan', line 324, column 2 to column 48)
## Chain 1 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
## Chain 1 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
## Chain 1
## Chain 2 Iteration: 1 / 2000 [ 0%] (Warmup)
## Chain 2 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
## Chain 2 Exception: cholesky_decompose: A is not symmetric. A[1,2] = inf, but A[2,1] = inf (in '/tmp/RtmpO4Ubya/model-22a964aae8b9.stan', line 432, column 6 to column 81)
## Chain 2 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
## Chain 2 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
## Chain 2
## Chain 2 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
## Chain 2 Exception: cholesky_decompose: A is not symmetric. A[1,2] = -inf, but A[2,1] = -inf (in '/tmp/RtmpO4Ubya/model-22a964aae8b9.stan', line 432, column 6 to column 81)
## Chain 2 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
## Chain 2 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
## Chain 2
## Chain 2 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
## Chain 2 Exception: cholesky_decompose: A is not symmetric. A[1,2] = -inf, but A[2,1] = -inf (in '/tmp/RtmpO4Ubya/model-22a964aae8b9.stan', line 432, column 6 to column 81)
## Chain 2 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
## Chain 2 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
## Chain 2
## Chain 2 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
## Chain 2 Exception: cholesky_decompose: Matrix m is not positive definite (in '/tmp/RtmpO4Ubya/model-22a964aae8b9.stan', line 432, column 6 to column 81)
## Chain 2 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
## Chain 2 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
## Chain 2
## Chain 2 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
## Chain 2 Exception: lkj_corr_cholesky_lpdf: Random variable[2] is 0, but must be positive! (in '/tmp/RtmpO4Ubya/model-22a964aae8b9.stan', line 324, column 2 to column 48)
## Chain 2 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
## Chain 2 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
## Chain 2
## Chain 2 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
## Chain 2 Exception: cholesky_decompose: A is not symmetric. A[1,2] = inf, but A[2,1] = inf (in '/tmp/RtmpO4Ubya/model-22a964aae8b9.stan', line 432, column 6 to column 81)
## Chain 2 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
## Chain 2 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
## Chain 2
## Chain 1 Iteration: 200 / 2000 [ 10%] (Warmup)
## Chain 2 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
## Chain 2 Exception: cholesky_decompose: Matrix m is not positive definite (in '/tmp/RtmpO4Ubya/model-22a964aae8b9.stan', line 432, column 6 to column 81)
## Chain 2 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
## Chain 2 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
## Chain 2
## Chain 1 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
## Chain 1 Exception: cholesky_decompose: Matrix m is not positive definite (in '/tmp/RtmpO4Ubya/model-22a964aae8b9.stan', line 432, column 6 to column 81)
## Chain 1 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
## Chain 1 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
## Chain 1
## Chain 2 Iteration: 200 / 2000 [ 10%] (Warmup)
## Chain 1 Iteration: 400 / 2000 [ 20%] (Warmup)
## Chain 2 Iteration: 400 / 2000 [ 20%] (Warmup)
## Chain 2 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
## Chain 2 Exception: cholesky_decompose: Matrix m is not positive definite (in '/tmp/RtmpO4Ubya/model-22a964aae8b9.stan', line 432, column 6 to column 81)
## Chain 2 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
## Chain 2 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
## Chain 2
## Chain 1 Iteration: 600 / 2000 [ 30%] (Warmup)
## Chain 1 Iteration: 800 / 2000 [ 40%] (Warmup)
## Chain 2 Iteration: 600 / 2000 [ 30%] (Warmup)
## Chain 2 Iteration: 800 / 2000 [ 40%] (Warmup)
## Chain 1 Iteration: 1000 / 2000 [ 50%] (Warmup)
## Chain 1 Iteration: 1001 / 2000 [ 50%] (Sampling)
## Chain 1 Iteration: 1200 / 2000 [ 60%] (Sampling)
## Chain 2 Iteration: 1000 / 2000 [ 50%] (Warmup)
## Chain 2 Iteration: 1001 / 2000 [ 50%] (Sampling)
## Chain 2 Iteration: 1200 / 2000 [ 60%] (Sampling)
## Chain 1 Iteration: 1400 / 2000 [ 70%] (Sampling)
## Chain 2 Iteration: 1400 / 2000 [ 70%] (Sampling)
## Chain 1 Iteration: 1600 / 2000 [ 80%] (Sampling)
## Chain 2 Iteration: 1600 / 2000 [ 80%] (Sampling)
## Chain 1 Iteration: 1800 / 2000 [ 90%] (Sampling)
## Chain 2 Iteration: 1800 / 2000 [ 90%] (Sampling)
## Chain 1 Iteration: 2000 / 2000 [100%] (Sampling)
## Chain 1 finished in 2.8 seconds.
## Chain 2 Iteration: 2000 / 2000 [100%] (Sampling)
## Chain 3 Iteration: 1 / 2000 [ 0%] (Warmup)
## Chain 3 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
## Chain 3 Exception: lkj_corr_cholesky_lpdf: Random variable[2] is 0, but must be positive! (in '/tmp/RtmpO4Ubya/model-22a964aae8b9.stan', line 324, column 2 to column 48)
## Chain 3 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
## Chain 3 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
## Chain 3
## Chain 3 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
## Chain 3 Exception: lkj_corr_cholesky_lpdf: Random variable[2] is 0, but must be positive! (in '/tmp/RtmpO4Ubya/model-22a964aae8b9.stan', line 324, column 2 to column 48)
## Chain 3 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
## Chain 3 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
## Chain 3
## Chain 3 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
## Chain 3 Exception: lkj_corr_cholesky_lpdf: Random variable[2] is 0, but must be positive! (in '/tmp/RtmpO4Ubya/model-22a964aae8b9.stan', line 324, column 2 to column 48)
## Chain 3 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
## Chain 3 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
## Chain 3
## Chain 3 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
## Chain 3 Exception: lkj_corr_cholesky_lpdf: Random variable[2] is 0, but must be positive! (in '/tmp/RtmpO4Ubya/model-22a964aae8b9.stan', line 324, column 2 to column 48)
## Chain 3 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
## Chain 3 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
## Chain 3
## Chain 3 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
## Chain 3 Exception: lkj_corr_cholesky_lpdf: Random variable[2] is 0, but must be positive! (in '/tmp/RtmpO4Ubya/model-22a964aae8b9.stan', line 324, column 2 to column 48)
## Chain 3 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
## Chain 3 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
## Chain 3
## Chain 2 finished in 3.0 seconds.
## Chain 3 Iteration: 200 / 2000 [ 10%] (Warmup)
## Chain 3 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
## Chain 3 Exception: cholesky_decompose: Matrix m is not positive definite (in '/tmp/RtmpO4Ubya/model-22a964aae8b9.stan', line 432, column 6 to column 81)
## Chain 3 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
## Chain 3 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
## Chain 3
## Chain 3 Iteration: 400 / 2000 [ 20%] (Warmup)
## Chain 3 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
## Chain 3 Exception: cholesky_decompose: Matrix m is not positive definite (in '/tmp/RtmpO4Ubya/model-22a964aae8b9.stan', line 432, column 6 to column 81)
## Chain 3 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
## Chain 3 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
## Chain 3
## Chain 3 Iteration: 600 / 2000 [ 30%] (Warmup)
## Chain 3 Iteration: 800 / 2000 [ 40%] (Warmup)
## Chain 3 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
## Chain 3 Exception: cholesky_decompose: Matrix m is not positive definite (in '/tmp/RtmpO4Ubya/model-22a964aae8b9.stan', line 432, column 6 to column 81)
## Chain 3 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
## Chain 3 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
## Chain 3
## Chain 3 Iteration: 1000 / 2000 [ 50%] (Warmup)
## Chain 3 Iteration: 1001 / 2000 [ 50%] (Sampling)
## Chain 3 Iteration: 1200 / 2000 [ 60%] (Sampling)
## Chain 3 Iteration: 1400 / 2000 [ 70%] (Sampling)
## Chain 3 Iteration: 1600 / 2000 [ 80%] (Sampling)
## Chain 3 Iteration: 1800 / 2000 [ 90%] (Sampling)
## Chain 3 Iteration: 2000 / 2000 [100%] (Sampling)
## Chain 3 finished in 3.5 seconds.
##
## All 3 chains finished successfully.
## Mean chain execution time: 3.1 seconds.
## Total execution time: 6.5 seconds.
##
## Parameter estimates (method = normal, sample size(s) = 301)
## from op to mean sd 5.000% 95.000% rhat ess_bulk
## ──────────────────────────────────────────────────────────────────────────────
## Goodness of fit
## ──────────────────────────────────────────────────────────────────────────────
## PPP 0.417 1.001 1668
## RMSE 0.064 0.014 0.043 0.090 1.002 711
## ──────────────────────────────────────────────────────────────────────────────
## Factor loadings
## ──────────────────────────────────────────────────────────────────────────────
## Visual =~ x1 0.931 0.126 0.732 1.146 1.002 1121
## Visual =~ x2 0.472 0.106 0.300 0.647 1.004 1797
## Visual =~ x3 0.623 0.104 0.456 0.795 1.003 1500
## Verbal =~ x4 0.991 0.088 0.846 1.139 1.001 1395
## Verbal =~ x5 1.066 0.102 0.901 1.238 1.000 1948
## Verbal =~ x6 0.935 0.090 0.788 1.088 1.001 1588
## Speed =~ x7 0.560 0.102 0.396 0.728 1.001 1815
## Speed =~ x8 0.682 0.109 0.510 0.866 1.001 1486
## Speed =~ x9 0.786 0.116 0.596 0.988 1.001 1246
## ──────────────────────────────────────────────────────────────────────────────
## Inter-factor correlations
## ──────────────────────────────────────────────────────────────────────────────
## Verbal ~~ Visual 0.428 0.082 0.295 0.568 1.000 1990
## Speed ~~ Visual 0.469 0.096 0.307 0.623 1.001 1812
## Speed ~~ Verbal 0.274 0.078 0.144 0.399 1.000 2406
## ──────────────────────────────────────────────────────────────────────────────
## Residual variances
## ──────────────────────────────────────────────────────────────────────────────
## x1 ~~ x1 0.476 0.219 0.050 0.809 1.005 1026
## x2 ~~ x2 1.158 0.124 0.958 1.365 1.001 2456
## x3 ~~ x3 0.881 0.127 0.668 1.086 1.004 1460
## x4 ~~ x4 0.363 0.145 0.105 0.591 1.001 1096
## x5 ~~ x5 0.512 0.180 0.187 0.790 1.000 1462
## x6 ~~ x6 0.322 0.138 0.062 0.534 1.003 1278
## x7 ~~ x7 0.872 0.114 0.687 1.054 1.001 1954
## x8 ~~ x8 0.568 0.141 0.327 0.784 1.001 1352
## x9 ~~ x9 0.407 0.168 0.082 0.660 1.001 1103
## ──────────────────────────────────────────────────────────────────────────────
##
##
## Column names: from, op, to, mean, sd, 5%, 95%, rhat, ess_bulk
Output structure
At the top of the results table, method = normal indicates the approach
to estimating the residual covariances between all items: the
assumption is that the standardized residual covariances (SRCs), which
reflect minor factor influences, are normally distributed with mean
zero. The table also prints the sample size of 301 – only complete rows
are retained for analysis.
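The complete-case logic is easy to verify directly. The snippet below is a toy illustration using base R's complete.cases(), not minorbsem code; the toy data frame is made up for the example.

```r
# Toy data frame with missing values (hypothetical, for illustration only)
toy <- data.frame(a = c(1, NA, 3), b = c(4, 5, NA))
complete.cases(toy)      # TRUE only for rows with no missing values
sum(complete.cases(toy)) # 1 of the 3 rows is complete
```

Running sum(complete.cases(HS[, paste0("x", 1:9)])) on the model variables should reproduce the sample size of 301 reported in the output.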
We now describe the column headers. The from, op, and to combination
describes the type of parameter being reported, following lavaan-style
syntax. For example, the Visual =~ x1 row describes the loading from
the visual factor to item x1. The mean, sd, and percentage columns are
descriptive statistics of the posterior distributions. The mean and sd
function like the estimate and standard error in standard frequentist
statistics. The percentage columns are credible intervals. By default,
they are 90% credible intervals, i.e. given the prior and data, there
is a 90% chance the parameter falls in this interval. The rhat
(pronounced R-hat) and ess_bulk columns are the potential scale
reduction factor (\(\widehat{R}\)) and bulk effective sample size (ESS)
respectively (Vehtari et al. 2021) – they are useful for checking
parameter convergence. For \(\widehat{R}\), values very close to 1 are
preferable; for ESS, larger values are preferable. A final analysis in
a manuscript would ideally have all parameters with \(\widehat{R}\)
under 1.01 and ESS above 400 for one to be sure parameter estimates
have converged (Vehtari et al. 2021). An easy way to meet these
expectations is to increase the number of requested samples when
calling minorbsem() via the warmup = and sampling = arguments; see
?minorbsem.
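As a sketch, assuming the warmup = and sampling = arguments described above, one might request more draws like so; the values here are illustrative, not recommendations:

```r
# Sketch: request more warmup and post-warmup draws per chain if the
# rhat/ESS criteria are not yet met. Values are illustrative; the logs
# above suggest 1000 warmup + 1000 sampling iterations by default.
fit_cfa_long <- minorbsem(
  model = syntax_basic, data = HS,
  warmup = 2000, sampling = 2000
)
```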
The parameter estimates are presented by the type of parameter.
Goodness of fit
PPP. The first section of results contains parameters that help assess
global model fit. “PPP” is the posterior predictive p-value in the form
described by Muthén and Asparouhov (2012), and is akin to a
\(\chi^2\) test in standard SEMs. It is conventional to prefer values
above .05 or .10. Here, PPP = .417, indicating a good-fitting model.
Desirable PPP-values are to be expected by default in minorbsem since
the package accounts for model misspecification – alternatively stated:
PPP-values above .05 do not imply an absence of misfit and are not all
that informative by default. We report PPP because minorbsem can also
fit Bayesian SEMs that do not account for misspecification,
e.g. minorbsem(..., method = "none").
RMSE. This is the root mean square error of the standardized residual covariances (SRCs), and communicates the typical size of the SRCs. One may also interpret this metric as the standard deviation of the SRCs, with 95% of SRCs lying within 2 RMSE values of 0. In this example, RMSE = 0.064, so we can expect some SRCs to be greater than 0.10, suggesting some large SRCs (Maydeu-Olivares 2017). Large SRCs challenge the notion that model misspecification is due to the influence of minor factors – if these influences are large, are these factors truly “minor”? It is possible that the hypothesized structure is incorrect, or that the minor factors have substantial effects.
Substantive parameters
The parameter estimates are reported by type of parameter: factor loadings, inter-factor correlations, and residual variances. For this model, all items load on their respective factors with intervals that clearly exclude 0. All factors are assumed standardized in minorbsem, so only their correlations are reported; all factors are non-trivially correlated.
Residual plots
Given that the RMSE suggests large standardized residual covariances (SRCs), we can request a plot of the SRCs in one of two forms: a range plot or a heat map.
plot_residuals(fit_cfa, type = "range")
plot_residuals(fit_cfa, type = "matrix")
The heat map is particularly useful for highlighting the largest SRCs. If these SRCs cluster in a non-random way, one may identify potential model modifications.
Bifactor model with orthogonal factors
To improve on the basic model, we consider the bifactor structure, which builds on the basic CFA model by specifying a general factor ‘G’ that is reflected in all nine indicators. All factors are assumed orthogonal.
Model syntax
syntax_bifactor <- paste0(
"G =~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9", "\n",
syntax_basic
)
writeLines(syntax_bifactor)
## G =~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9
##
## Visual =~ x1 + x2 + x3
## Verbal =~ x4 + x5 + x6
## Speed =~ x7 + x8 + x9
Fit the model
The call to minorbsem() needs to be of the form
minorbsem(..., orthogonal = TRUE) to ensure the factors are orthogonal:
fit_bifactor <- minorbsem(syntax_bifactor, data = HS, orthogonal = TRUE)
## Parameter estimates (method = normal, sample size(s) = 301)
## from op to mean sd 5.000% 95.000% rhat ess_bulk
## ───────────────────────────────────────────────────────────────────────────────
## Goodness of fit
## ───────────────────────────────────────────────────────────────────────────────
## PPP 0.403 1.000 1890
## RMSE 0.027 0.012 0.009 0.049 1.003 604
## ───────────────────────────────────────────────────────────────────────────────
## Factor loadings
## ───────────────────────────────────────────────────────────────────────────────
## G =~ x1 0.952 0.110 0.765 1.133 1.002 1588
## G =~ x2 0.494 0.105 0.327 0.668 1.001 2061
## G =~ x3 0.630 0.095 0.475 0.781 1.000 1842
## G =~ x4 0.471 0.084 0.340 0.612 1.001 1756
## G =~ x5 0.425 0.095 0.269 0.583 0.999 1863
## G =~ x6 0.460 0.080 0.331 0.596 1.002 1716
## G =~ x7 0.107 0.083 -0.028 0.247 1.001 2525
## G =~ x8 0.288 0.075 0.168 0.414 1.000 2332
## G =~ x9 0.509 0.077 0.389 0.642 1.000 2072
## Visual =~ x1 0.225 0.179 0.018 0.591 1.000 997
## Visual =~ x2 -0.157 0.516 -0.991 0.773 1.002 1581
## Visual =~ x3 -0.017 0.454 -0.736 0.876 1.003 1546
## Verbal =~ x4 0.860 0.075 0.735 0.980 1.001 1923
## Verbal =~ x5 1.051 0.086 0.907 1.190 1.002 1341
## Verbal =~ x6 0.788 0.070 0.673 0.903 1.000 2032
## Speed =~ x7 0.731 0.118 0.545 0.939 1.002 1518
## Speed =~ x8 0.708 0.105 0.534 0.890 1.002 1434
## Speed =~ x9 0.435 0.079 0.306 0.562 1.002 2381
## ───────────────────────────────────────────────────────────────────────────────
## Inter-factor correlations
## ───────────────────────────────────────────────────────────────────────────────
## Visual ~~ G 0.000 0.000 0.000 0.000
## Verbal ~~ G 0.000 0.000 0.000 0.000
## Verbal ~~ Visual 0.000 0.000 0.000 0.000
## Speed ~~ G 0.000 0.000 0.000 0.000
## Speed ~~ Visual 0.000 0.000 0.000 0.000
## Speed ~~ Verbal 0.000 0.000 0.000 0.000
## ───────────────────────────────────────────────────────────────────────────────
## Residual variances
## ───────────────────────────────────────────────────────────────────────────────
## x1 ~~ x1 0.395 0.219 0.017 0.726 1.001 790
## x2 ~~ x2 0.870 0.357 0.054 1.265 1.002 881
## x3 ~~ x3 0.696 0.264 0.083 0.994 1.002 851
## x4 ~~ x4 0.392 0.085 0.250 0.530 1.001 1892
## x5 ~~ x5 0.385 0.130 0.147 0.584 1.002 921
## x6 ~~ x6 0.376 0.074 0.254 0.494 1.000 2027
## x7 ~~ x7 0.639 0.159 0.347 0.865 1.004 1367
## x8 ~~ x8 0.449 0.131 0.207 0.645 1.003 1368
## x9 ~~ x9 0.577 0.075 0.450 0.700 1.003 2451
## ───────────────────────────────────────────────────────────────────────────────
##
##
## Column names: from, op, to, mean, sd, 5%, 95%, rhat, ess_bulk
Compared to the basic CFA model, the RMSE drops from 0.064 to 0.027, suggesting a much better-fitting model. All items load on the general factor with 90% intervals excluding 0, except for x7. Additionally, x1 – x3 load confusingly on their specific factor, while x4 – x9 load strongly on their specific factors, especially when compared to their general-factor loadings. This pattern of factor loadings provides little support for a bifactor structure.
Bifactor model with parameter constraints
We instead explore a more constrained bifactor structure in which the specific factor loadings are forced to be equal within each specific factor. Note that minorbsem uses the same parameter-constraint syntax as lavaan:
Model syntax
syntax_bifactor_cons <- paste0(
"G =~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9", "\n",
"Visual =~ a * x1 + a * x2 + a * x3
Verbal =~ b * x4 + b * x5 + b * x6
Speed =~ c * x7 + c * x8 + c * x9"
)
Fit the model
fit_bifactor_cons <- minorbsem(
  syntax_bifactor_cons,
  data = HS, orthogonal = TRUE
)
## Parameter estimates (method = normal, sample size(s) = 301)
## from op to mean sd 5.000% 95.000% rhat ess_bulk
## ──────────────────────────────────────────────────────────────────────────────
## Goodness of fit
## ──────────────────────────────────────────────────────────────────────────────
## PPP 0.397 0.999 2216
## RMSE 0.047 0.012 0.029 0.068 1.003 1038
## ──────────────────────────────────────────────────────────────────────────────
## Factor loadings
## ──────────────────────────────────────────────────────────────────────────────
## G =~ x1 0.864 0.113 0.674 1.049 1.001 1604
## G =~ x2 0.433 0.117 0.229 0.617 1.001 1880
## G =~ x3 0.575 0.114 0.385 0.761 1.000 1615
## G =~ x4 0.499 0.101 0.338 0.669 1.001 1812
## G =~ x5 0.496 0.114 0.319 0.687 1.002 1763
## G =~ x6 0.474 0.095 0.322 0.633 1.001 1657
## G =~ x7 0.132 0.097 -0.024 0.295 1.000 1943
## G =~ x8 0.288 0.086 0.151 0.432 1.001 2261
## G =~ x9 0.517 0.086 0.378 0.660 1.000 2507
## Visual =~ x1 0.280 0.148 0.032 0.512 1.000 1606
## Visual =~ x2 0.280 0.148 0.032 0.512 1.000 1606
## Visual =~ x3 0.280 0.148 0.032 0.512 1.000 1606
## Verbal =~ x4 0.864 0.061 0.764 0.961 1.002 1534
## Verbal =~ x5 0.864 0.061 0.764 0.961 1.002 1534
## Verbal =~ x6 0.864 0.061 0.764 0.961 1.002 1534
## Speed =~ x7 0.605 0.054 0.515 0.689 1.000 2020
## Speed =~ x8 0.605 0.054 0.515 0.689 1.000 2020
## Speed =~ x9 0.605 0.054 0.515 0.689 1.000 2020
## ──────────────────────────────────────────────────────────────────────────────
## Inter-factor correlations
## ──────────────────────────────────────────────────────────────────────────────
## Visual ~~ G 0.000 0.000 0.000 0.000
## Verbal ~~ G 0.000 0.000 0.000 0.000
## Verbal ~~ Visual 0.000 0.000 0.000 0.000
## Speed ~~ G 0.000 0.000 0.000 0.000
## Speed ~~ Visual 0.000 0.000 0.000 0.000
## Speed ~~ Verbal 0.000 0.000 0.000 0.000
## ──────────────────────────────────────────────────────────────────────────────
## Residual variances
## ──────────────────────────────────────────────────────────────────────────────
## x1 ~~ x1 0.531 0.157 0.262 0.774 1.002 1539
## x2 ~~ x2 1.103 0.117 0.917 1.297 1.001 3244
## x3 ~~ x3 0.849 0.109 0.665 1.026 1.000 2363
## x4 ~~ x4 0.356 0.082 0.226 0.490 1.001 2262
## x5 ~~ x5 0.618 0.091 0.474 0.771 1.001 2220
## x6 ~~ x6 0.255 0.077 0.129 0.383 1.003 1883
## x7 ~~ x7 0.777 0.086 0.645 0.925 1.001 3312
## x8 ~~ x8 0.566 0.073 0.449 0.689 1.001 3029
## x9 ~~ x9 0.427 0.085 0.286 0.563 1.000 2152
## ──────────────────────────────────────────────────────────────────────────────
##
##
## Column names: from, op, to, mean, sd, 5%, 95%, rhat, ess_bulk
The RMSE increased since the model is more constrained. The pattern of results under the parameter constraints suggests that the general factor mostly reflects items x1 – x3, with the other items more strongly reflecting their specific factors. These results suggest limited applicability of the bifactor model for these data.
Non-Simple Structure Model
For our final model, we return to the original basic CFA and relax the simple-structure assumption. Unlike Muthén and Asparouhov (2012), who do this using small-variance priors, minorbsem uses a global-local prior (Uanhoro 2024). Specifically, this approach assumes that most cross-loadings are indeed zero, while allowing for some outlying non-zero cross-loadings.
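To build intuition for the global-local prior, the base-R sketch below draws cross-loadings from a horseshoe-style prior: a small global scale shrinks everything toward zero, while heavy-tailed local scales let a few draws escape the shrinkage. This is purely illustrative; the scale values are made up and this is not minorbsem's internal code.

```r
set.seed(123)
n_draws <- 1e4
tau <- 0.05                      # small global scale: shrinks all cross-loadings
lambda <- abs(rcauchy(n_draws))  # heavy-tailed local scales: permit outliers
cross_loadings <- rnorm(n_draws, mean = 0, sd = tau * lambda)

# Most draws are essentially zero ...
mean(abs(cross_loadings) < 0.05)
# ... but the heavy tails allow a few clearly non-zero cross-loadings:
quantile(abs(cross_loadings), c(0.50, 0.99))
```

The combination of strong global shrinkage and heavy-tailed local scales is what distinguishes this approach from a small-variance prior, which shrinks all cross-loadings by the same amount.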
Fit the model
The call to minorbsem()
needs to be of the form
minorbsem(..., simple_struc = FALSE):
fit_non_simple <- minorbsem(syntax_basic, data = HS, simple_struc = FALSE)
## Parameter estimates (method = normal, sample size(s) = 301)
## from op to mean sd 5.000% 95.000% rhat ess_bulk
## ───────────────────────────────────────────────────────────────────────────────
## Goodness of fit
## ───────────────────────────────────────────────────────────────────────────────
## PPP 0.452 1.000 2189
## RMSE 0.024 0.012 0.006 0.047 1.006 433
## ───────────────────────────────────────────────────────────────────────────────
## Factor loadings
## ───────────────────────────────────────────────────────────────────────────────
## Visual =~ x1 0.761 0.119 0.573 0.963 1.001 826
## Visual =~ x2 0.559 0.101 0.392 0.726 1.000 1938
## Visual =~ x3 0.754 0.110 0.580 0.937 1.000 1394
## Visual =~ x4 0.025 0.070 -0.078 0.154 1.004 1456
## Visual =~ x5 -0.059 0.087 -0.216 0.062 1.004 1509
## Visual =~ x6 0.057 0.074 -0.042 0.188 1.003 1184
## Visual =~ x7 -0.139 0.145 -0.420 0.028 1.002 956
## Visual =~ x8 0.044 0.108 -0.118 0.247 1.003 1074
## Visual =~ x9 0.317 0.115 0.125 0.504 1.001 1137
## Verbal =~ x1 0.139 0.107 -0.013 0.324 1.001 970
## Verbal =~ x2 0.015 0.066 -0.093 0.131 1.002 1606
## Verbal =~ x3 -0.070 0.091 -0.238 0.040 1.002 1070
## Verbal =~ x4 0.986 0.075 0.864 1.111 1.001 1584
## Verbal =~ x5 1.137 0.090 0.993 1.295 1.002 1219
## Verbal =~ x6 0.892 0.072 0.775 1.011 1.002 1479
## Verbal =~ x7 0.023 0.071 -0.080 0.145 1.002 1758
## Verbal =~ x8 -0.030 0.071 -0.161 0.069 1.002 1626
## Verbal =~ x9 0.014 0.060 -0.080 0.115 1.001 1965
## Speed =~ x1 0.034 0.080 -0.080 0.181 1.003 1734
## Speed =~ x2 -0.052 0.078 -0.199 0.047 1.002 1895
## Speed =~ x3 0.028 0.078 -0.085 0.163 1.002 1725
## Speed =~ x4 0.003 0.054 -0.084 0.094 1.000 2407
## Speed =~ x5 0.003 0.062 -0.096 0.106 1.000 2006
## Speed =~ x6 0.002 0.053 -0.083 0.086 1.001 2015
## Speed =~ x7 0.763 0.120 0.577 0.970 1.001 978
## Speed =~ x8 0.759 0.099 0.601 0.924 1.006 948
## Speed =~ x9 0.497 0.091 0.353 0.645 1.001 1321
## ───────────────────────────────────────────────────────────────────────────────
## Inter-factor correlations
## ───────────────────────────────────────────────────────────────────────────────
## Verbal ~~ Visual 0.356 0.121 0.150 0.541 1.000 1040
## Speed ~~ Visual 0.284 0.162 0.003 0.533 1.001 1064
## Speed ~~ Verbal 0.227 0.116 0.031 0.412 1.002 1482
## ───────────────────────────────────────────────────────────────────────────────
## Residual variances
## ───────────────────────────────────────────────────────────────────────────────
## x1 ~~ x1 0.668 0.131 0.439 0.866 1.002 991
## x2 ~~ x2 1.092 0.115 0.908 1.291 1.002 2262
## x3 ~~ x3 0.737 0.127 0.524 0.936 1.000 1246
## x4 ~~ x4 0.372 0.085 0.228 0.502 1.003 1254
## x5 ~~ x5 0.425 0.113 0.243 0.610 1.001 847
## x6 ~~ x6 0.374 0.069 0.261 0.489 1.004 1660
## x7 ~~ x7 0.656 0.136 0.426 0.866 1.003 912
## x8 ~~ x8 0.458 0.119 0.252 0.641 1.003 882
## x9 ~~ x9 0.574 0.074 0.458 0.692 1.001 2281
## ───────────────────────────────────────────────────────────────────────────────
##
##
## Column names: from, op, to, mean, sd, 5%, 95%, rhat, ess_bulk
The effect of minor factors is small, RMSE = 0.024. The originally hypothesized loadings maintain their relation to their hypothesized factors. Of the cross-loadings, only the relation from the visual factor to x9 is non-trivial; most are very close to 0. Additionally, the inter-factor correlations have all decreased relative to the original basic CFA, suggesting that forcing cross-loadings to zero artificially inflated the inter-factor correlations (Ferrando and Lorenzo-Seva 2000).
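This inflation mechanism can be demonstrated outside of minorbsem with a small population-level sketch using lavaan (which also supplied the HS data above). We build a population covariance matrix where one indicator truly cross-loads, then fit a simple-structure CFA that forces that cross-loading to zero; the estimated inter-factor correlation exceeds the true value. All loading values here are made up for illustration.

```r
library(lavaan)

# Population loadings: two correlated factors; x3 truly cross-loads on F2.
L <- matrix(c(0.7, 0.0,
              0.7, 0.0,
              0.6, 0.4,   # the true cross-loading
              0.0, 0.7,
              0.0, 0.7,
              0.0, 0.7), nrow = 6, byrow = TRUE)
Phi <- matrix(c(1, 0.3, 0.3, 1), nrow = 2)  # true inter-factor correlation = .3
Sigma <- L %*% Phi %*% t(L)
diag(Sigma) <- 1  # standardized indicators
rownames(Sigma) <- colnames(Sigma) <- paste0("x", 1:6)

# Simple-structure CFA that (wrongly) forces the cross-loading to zero:
fit <- cfa("F1 =~ x1 + x2 + x3
            F2 =~ x4 + x5 + x6",
           sample.cov = Sigma, sample.nobs = 1000, std.lv = TRUE)
pe <- parameterEstimates(fit)
phi_hat <- pe[pe$lhs == "F1" & pe$op == "~~" & pe$rhs == "F2", "est"]
phi_hat  # exceeds the true value of 0.3
```

The misfit from the omitted cross-loading is absorbed partly by the inter-factor correlation, which is the distortion the global-local prior avoids by letting non-trivial cross-loadings be estimated.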