Say that we specify a model of the Bollen political democracy data and draw posterior samples using the following blavaan code (where save.lvs saves the latent variable samples for further use):
model <- '
# latent variable definitions
ind60 =~ x1 + x2 + x3
dem60 =~ y1 + a*y2 + b*y3 + c*y4
dem65 =~ y5 + a*y6 + b*y7 + c*y8
# regressions
dem60 ~ ind60
dem65 ~ ind60 + dem60
# residual correlations
y1 ~~ y5
y2 ~~ y4 + y6
y3 ~~ y7
y4 ~~ y8
y6 ~~ y8
'
fit <- bsem(model, data=PoliticalDemocracy, save.lvs = TRUE)
We describe here how to summarize the fitted model. The most obvious functions are summary(), coef(), and vcov(), which all work in a manner similar to the analogous lavaan functions. But instead of maximum likelihood estimates and standard errors, blavaan reports posterior means and posterior standard deviations. Other summaries that are unique to Bayesian models include model convergence metrics, model fit/comparison metrics, and samples of latent variables. These are discussed below.
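For example, the following calls return posterior means of the free parameters and a matrix summarizing their posterior covariance (a minimal illustration using the fit object from above):
## posterior means of free parameters and their posterior covariance matrix
coef(fit)
vcov(fit)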
Convergence
Following model estimation, we immediately wish to look at the “goodness” of the posterior samples, including convergence to a stationary distribution and autocorrelation. Popular convergence metrics are available via the blavInspect() function:
blavInspect(fit, 'rhat')
blavInspect(fit, 'neff')
where R-hat values near 1.00 indicate convergence, and large effective sample sizes (hundreds or above) are preferred. For details on these metrics, see, e.g., the Posterior Analysis section of the Stan Reference Manual.
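As a quick check, one could flag parameters whose R-hat exceeds a chosen cutoff (a sketch; the 1.01 cutoff here is an illustrative choice, not a blavaan default):
## extract per-parameter R-hat values and flag large ones
rhats <- blavInspect(fit, 'rhat')
rhats[rhats > 1.01]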
If the model has clearly not converged (as judged by Rhat), blavaan will issue multiple warnings. Lack of convergence is sometimes caused by bad initial values or by a chain that strays to an extreme region of the posterior space. In these cases, it can help to simply re-estimate the model. It can also help to specify mildly-informative priors on loading parameters, so that the chains do not wander to extreme loading values. For example, if you expect all your variables to be positively correlated and some loadings are fixed to 1 for identification, then Normal(1, .5) would often be a mildly-informative prior. Otherwise, lack of convergence may imply prior distributions that severely conflict with the data, or an ill-defined model. It is sometimes helpful to try to fit the same model in lavaan, to see whether errors occur there.
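As a sketch of the mildly-informative prior suggestion, loading priors can be changed through the dp argument and the dpriors() function; the Normal(1, .5) value is illustrative rather than a recommendation:
## re-estimate with a mildly-informative prior on all loadings
fit2 <- bsem(model, data = PoliticalDemocracy, save.lvs = TRUE,
             dp = dpriors(lambda = "normal(1, .5)"))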
Model Fit & Comparison
Next, we may wish to examine some model fit metrics. While many metrics are available from the summary() output, more are available from the fitMeasures() function:
summary(fit)
fitMeasures(fit)
For judging absolute fit, blavaan supplies a posterior predictive p-value that is based on the likelihood ratio statistic. Good-fitting models have values near 0.5 on this metric. For examining models’ relative fits, blavaan supplies the DIC, WAIC, and LOOIC. The latter two metrics are computed with the help of the loo package (Vehtari et al. 2020). Comparison of multiple models on these criteria is facilitated via blavCompare(), which provides standard errors of the difference between two criteria.
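For example, to compare the model above to a second fitted model (here, a hypothetical object called fit_alt), one could use:
## fit_alt is a second blavaan model of the same data (hypothetical)
blavCompare(fit, fit_alt)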
Other notable functions include blavFitIndices() for alternative measures of absolute fit and ppmc() for general posterior predictive checks.
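Minimal calls might look like the following, relying on each function's default settings (both accept further arguments controlling the indices or discrepancy measures computed):
blavFitIndices(fit)
ppmc(fit)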
Latent Variables & Standardization
An often-discussed advantage of Bayesian models is their ability to describe uncertainty in “random” parameters, including random effects and latent variables. To access this functionality in blavaan, users must set save.lvs = TRUE during model estimation, as is done at the top of this page. After model estimation, users can access this information via blavInspect() or blavPredict(). Relevant arguments to blavInspect() include lvmeans and lvs. The former returns posterior means of latent variables, which are similar to the predictions supplied by frequentist models. The latter returns posterior samples of latent variables, so that users can summarize their uncertainties or other functions of latent variables. These posterior samples are returned as a list of length n.chains, where each list entry has a row per posterior sample (and the number of columns is the total number of latent variables in the model):
postmns <- blavInspect(fit, what = "lvmeans")
postsamps <- blavInspect(fit, what = "lvs")
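Because the samples are returned as one matrix per chain, a common next step is to combine the chains and summarize uncertainty, e.g., a posterior standard deviation for each latent variable column (a sketch using the objects above):
## stack chains, then compute a posterior SD for each column
lvsamps <- do.call("rbind", postsamps)
apply(lvsamps, 2, sd)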
Some related, but different, information can be obtained by blavPredict(). This function will also return posterior samples of latent variables, but in a matrix instead of a list:
postsamps <- blavPredict(fit, type = "lv")
The blavPredict() function will also return predictions of observed variables conditioned on the sampled latent variables. The type = "yhat" argument returns expected values of observed variables conditioned on latent variable samples; the type = "ypred" argument returns posterior predictions of observed variables including residual noise (essentially yhat + error); and the type = "ymis" argument returns posterior predictions of missing variables conditioned on the observed variables. These expected values and predictions are returned in list format; for a matrix, see the last line of code below.
evpreds <- blavPredict(fit, type = "yhat")
postpreds <- blavPredict(fit, type = "ypred")
mispreds <- blavPredict(fit, type = "ymis")
## convert to matrix from list:
evpreds <- do.call("rbind", evpreds)
Finally, not fully related to latent variables: the standardizedPosterior() function will return standardized posterior draws. It calls the lavaan function standardizedSolution() in the background and has some of that function’s flexibility.
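A minimal call, returning a matrix of standardized draws (one row per posterior draw), might look like:
standpost <- standardizedPosterior(fit)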