Cross-loadings with strong priors
Mauricio Garnier-Villarreal
Source:vignettes/cross_loadings_strong_priors.Rmd
cross_loadings_strong_priors.Rmd
Introduction
An advantage of BSEM is that we can use priors to set up soft constraints in the model, by estimating a parameter with a strong prior. This way the parameter is estimated, but the prior will restrict the possible values.
This was suggested by Muthén and Asparouhov (2012), as a way to estimate all possible cross-loadings in a CFA. This way, if the posterior distribution of the restricted parameters includes values outside of the strong prior, it can be interpreted as a model modification. This means that the parameters should be less restricted, or that the prior distribution should be relaxed.
In this tutorial we present how to estimate a CFA where all possible cross-loadings are restricted by strong priors.
Cross-loadings
We will show an example with the Holzinger and Swineford (1939) data. First we will estimate the regular model with no cross-loadings and default priors.
HS.model <- ' visual =~ x1 + x2 + x3
textual =~ x4 + x5 + x6
speed =~ x7 + x8 + x9 '
fit_df <- bcfa(HS.model, data=HolzingerSwineford1939,
std.lv=TRUE, meanstructure=T)
We can see the overall model results with the summary()
function, looking at the posterior distribution for the factor loadings,
correlations, intercepts and variances.
summary(fit_df)
## blavaan 0.5.8.1336 ended normally after 1000 iterations
##
## Estimator BAYES
## Optimization method MCMC
## Number of model parameters 30
##
## Number of observations 301
##
## Statistic MargLogLik PPP
## Value -3871.000 0.000
##
## Parameter Estimates:
##
##
## Latent Variables:
## Estimate Post.SD pi.lower pi.upper Rhat Prior
## visual =~
## x1 0.910 0.087 0.746 1.083 1.000 normal(0,10)
## x2 0.500 0.082 0.343 0.661 0.999 normal(0,10)
## x3 0.662 0.079 0.508 0.819 1.000 normal(0,10)
## textual =~
## x4 1.001 0.058 0.891 1.118 1.000 normal(0,10)
## x5 1.114 0.063 0.993 1.241 1.000 normal(0,10)
## x6 0.927 0.054 0.822 1.035 1.001 normal(0,10)
## speed =~
## x7 0.616 0.077 0.460 0.761 1.001 normal(0,10)
## x8 0.733 0.079 0.579 0.889 1.001 normal(0,10)
## x9 0.680 0.081 0.524 0.843 1.001 normal(0,10)
##
## Covariances:
## Estimate Post.SD pi.lower pi.upper Rhat Prior
## visual ~~
## textual 0.450 0.065 0.315 0.572 1.000 lkj_corr(1)
## speed 0.463 0.087 0.290 0.627 1.000 lkj_corr(1)
## textual ~~
## speed 0.279 0.072 0.132 0.416 1.000 lkj_corr(1)
##
## Intercepts:
## Estimate Post.SD pi.lower pi.upper Rhat Prior
## .x1 4.936 0.067 4.802 5.064 1.000 normal(0,32)
## .x2 6.088 0.067 5.958 6.221 0.999 normal(0,32)
## .x3 2.250 0.065 2.125 2.377 0.999 normal(0,32)
## .x4 3.059 0.067 2.931 3.183 1.001 normal(0,32)
## .x5 4.341 0.074 4.195 4.488 1.000 normal(0,32)
## .x6 2.185 0.063 2.062 2.307 0.999 normal(0,32)
## .x7 4.185 0.062 4.063 4.303 1.000 normal(0,32)
## .x8 5.526 0.059 5.412 5.641 1.000 normal(0,32)
## .x9 5.374 0.058 5.260 5.490 1.000 normal(0,32)
## visual 0.000
## textual 0.000
## speed 0.000
##
## Variances:
## Estimate Post.SD pi.lower pi.upper Rhat Prior
## .x1 0.555 0.127 0.292 0.792 1.000 gamma(1,.5)[sd]
## .x2 1.149 0.105 0.956 1.361 1.000 gamma(1,.5)[sd]
## .x3 0.859 0.100 0.673 1.058 1.000 gamma(1,.5)[sd]
## .x4 0.379 0.049 0.287 0.479 1.000 gamma(1,.5)[sd]
## .x5 0.455 0.060 0.347 0.580 1.000 gamma(1,.5)[sd]
## .x6 0.363 0.046 0.281 0.458 1.000 gamma(1,.5)[sd]
## .x7 0.821 0.092 0.654 1.010 1.000 gamma(1,.5)[sd]
## .x8 0.502 0.096 0.313 0.694 1.001 gamma(1,.5)[sd]
## .x9 0.567 0.096 0.369 0.743 1.001 gamma(1,.5)[sd]
## visual 1.000
## textual 1.000
## speed 1.000
Next, we will add all possible cross-loadings with a strong prior of . The prior centers the loadings around 0 and allows them little space to move.
HS.model.cl<-' visual =~ x1 + x2 + x3
textual =~ x4 + x5 + x6
speed =~ x7 + x8 + x9
## Cross-loadings
visual =~ prior("normal(0,.08)")*x4 + prior("normal(0,.08)")*x5 + prior("normal(0,.08)")*x6 + prior("normal(0,.08)")*x7 + prior("normal(0,.08)")*x8 + prior("normal(0,.08)")*x9
textual =~ prior("normal(0,.08)")*x1 + prior("normal(0,.08)")*x2 + prior("normal(0,.08)")*x3 + prior("normal(0,.08)")*x7 + prior("normal(0,.08)")*x8 + prior("normal(0,.08)")*x9
speed =~ prior("normal(0,.08)")*x1 + prior("normal(0,.08)")*x2 + prior("normal(0,.08)")*x3 + prior("normal(0,.08)")*x4 + prior("normal(0,.08)")*x5 + prior("normal(0,.08)")*x6'
fit_cl <- bcfa(HS.model.cl, data=HolzingerSwineford1939,
std.lv=TRUE, meanstructure=T)
It is important that, for each factor, the first variable after
=~
is one whose loading we expect to be far from 0. So, in
the above model, we specified the regular cfa first (whose loadings we
expect to be larger), then the loadings with small-variance priors on a
separate line. This is important because, in blavaan, the first loading
is either constrained to be positive or fixed to 1 (depending on
std.lv
). If the posterior distribution of that constrained
loading is centered near 0, we may experience identification problems.
Reverse-coded variables can also be problematic here, because a positive
constraint on a reverse-coded loading can lead other loadings to assume
negative values. If you use informative priors in this situation, then
you should verify that the prior density is on the correct side of
0.
After estimation, you can look at the summary()
of this
model and evaluate the cross-loadings. You can specifically see whether
any of the cross-loadings seem large enough to suggest that they should
be kept in the model, by looking at the posterior mean
(Estimate
) and credible interval.
summary(fit_cl)
## blavaan 0.5.8.1336 ended normally after 1000 iterations
##
## Estimator BAYES
## Optimization method MCMC
## Number of model parameters 48
##
## Number of observations 301
##
## Statistic MargLogLik PPP
## Value -3858.968 0.130
##
## Parameter Estimates:
##
##
## Latent Variables:
## Estimate Post.SD pi.lower pi.upper Rhat Prior
## visual =~
## x1 0.763 0.098 0.578 0.966 1.000 normal(0,10)
## x2 0.568 0.092 0.388 0.749 1.000 normal(0,10)
## x3 0.766 0.097 0.576 0.954 1.000 normal(0,10)
## textual =~
## x4 0.983 0.064 0.862 1.113 1.000 normal(0,10)
## x5 1.155 0.070 1.022 1.295 0.999 normal(0,10)
## x6 0.894 0.061 0.776 1.014 0.999 normal(0,10)
## speed =~
## x7 0.730 0.085 0.564 0.899 0.999 normal(0,10)
## x8 0.790 0.082 0.629 0.951 1.000 normal(0,10)
## x9 0.545 0.073 0.406 0.687 1.000 normal(0,10)
## visual =~
## x4 0.033 0.058 -0.081 0.145 1.000 normal(0,.08)
## x5 -0.073 0.064 -0.199 0.057 1.000 normal(0,.08)
## x6 0.063 0.054 -0.044 0.165 1.000 normal(0,.08)
## x7 -0.132 0.065 -0.258 -0.003 1.001 normal(0,.08)
## x8 -0.007 0.066 -0.136 0.120 1.000 normal(0,.08)
## x9 0.192 0.061 0.070 0.311 1.001 normal(0,.08)
## textual =~
## x1 0.109 0.064 -0.022 0.229 1.000 normal(0,.08)
## x2 0.007 0.059 -0.110 0.121 0.999 normal(0,.08)
## x3 -0.085 0.062 -0.207 0.038 1.000 normal(0,.08)
## x7 0.016 0.061 -0.108 0.128 1.001 normal(0,.08)
## x8 -0.038 0.061 -0.153 0.081 1.000 normal(0,.08)
## x9 0.032 0.055 -0.078 0.139 1.001 normal(0,.08)
## speed =~
## x1 0.041 0.063 -0.088 0.164 1.000 normal(0,.08)
## x2 -0.050 0.062 -0.171 0.068 1.000 normal(0,.08)
## x3 0.029 0.064 -0.098 0.147 0.999 normal(0,.08)
## x4 -0.004 0.056 -0.113 0.107 1.000 normal(0,.08)
## x5 0.008 0.062 -0.114 0.135 0.999 normal(0,.08)
## x6 0.000 0.055 -0.109 0.103 0.999 normal(0,.08)
##
## Covariances:
## Estimate Post.SD pi.lower pi.upper Rhat Prior
## visual ~~
## textual 0.376 0.094 0.181 0.548 0.999 lkj_corr(1)
## speed 0.358 0.110 0.125 0.555 1.000 lkj_corr(1)
## textual ~~
## speed 0.258 0.102 0.051 0.451 1.000 lkj_corr(1)
##
## Intercepts:
## Estimate Post.SD pi.lower pi.upper Rhat Prior
## .x1 4.936 0.067 4.807 5.068 1.000 normal(0,32)
## .x2 6.089 0.068 5.958 6.220 0.999 normal(0,32)
## .x3 2.250 0.064 2.122 2.374 0.999 normal(0,32)
## .x4 3.061 0.069 2.928 3.197 1.001 normal(0,32)
## .x5 4.340 0.076 4.190 4.491 1.001 normal(0,32)
## .x6 2.185 0.064 2.059 2.312 1.000 normal(0,32)
## .x7 4.186 0.064 4.058 4.313 1.000 normal(0,32)
## .x8 5.527 0.059 5.411 5.644 1.000 normal(0,32)
## .x9 5.375 0.057 5.261 5.487 1.001 normal(0,32)
## visual 0.000
## textual 0.000
## speed 0.000
##
## Variances:
## Estimate Post.SD pi.lower pi.upper Rhat Prior
## .x1 0.678 0.109 0.457 0.879 1.000 gamma(1,.5)[sd]
## .x2 1.088 0.109 0.891 1.314 1.000 gamma(1,.5)[sd]
## .x3 0.716 0.112 0.493 0.938 1.000 gamma(1,.5)[sd]
## .x4 0.388 0.049 0.299 0.491 1.000 gamma(1,.5)[sd]
## .x5 0.412 0.063 0.297 0.541 0.999 gamma(1,.5)[sd]
## .x6 0.373 0.044 0.291 0.465 0.999 gamma(1,.5)[sd]
## .x7 0.711 0.094 0.526 0.888 1.000 gamma(1,.5)[sd]
## .x8 0.439 0.090 0.253 0.611 1.000 gamma(1,.5)[sd]
## .x9 0.587 0.066 0.465 0.728 1.000 gamma(1,.5)[sd]
## visual 1.000
## textual 1.000
## speed 1.000
We suggest to not simply look at whether the CI excludes 0 (similar to the null hypothesis), but to evaluate whether the minimum value of the CI (the value closer to 0) is far enough away from 0 to be relavant instead of just different from 0.
Caveats
The model with all possible cross-loadings should not be kept as the final analysis model, but should be used as a step to make decisions about model changes. This for two main reasons, (1) this model is overfitted and would present good overall fit just due to the inclusion of a lot of nuisance parameters. In this example the posterior predictive p-value goes from ppp = 0 to ppp = 0.13, and is not that the model is better theoretically but that we are inflating the model fit. And (2), the addition of small-variance priors can prevent detection of important misspecifications in Bayesian confirmatory factor analysis, as it can obscure underlying problems in the model by diluting it through a large number of nuisance parameters (Jorgensen et al. 2019).