# Ordinal Models in blavaan

#### Ed Merkle

`ordinal.Rmd`

### Introduction

Structural equation models with ordinal observed variables are
supported starting in *blavaan* 0.4-1 (`target="stan"`

only). This document describes the overall approach, which includes
model estimation, threshold parameters, log-likelihood calculation,
posterior predictive p-values, and Jacobians. We assume that you are
somewhat familiar with the layout of SEM; if not, some technical detail
and examples are found in Merkle and Rosseel
(2018) and, more recently, Merkle et al.
(2021) (links to these papers are in the references section). We
aim here to provide enough detail to elucidate the new blavaan features,
while being informal enough for you to not get (too) bored.

### Estimation

Ordinal observed variables are handled via data augmentation, in the
style of Chib and Greenberg (1998). You
might already know this, but the phrase *data augmentation* is
imprecise in the context of SEM. This is because there are many possible
things that could be augmented, each of which can make model estimation
easier. We could be augmenting observed data with predictions of missing
data, which is related to multiple imputation methods. We could be
augmenting the observed data with the latent variables, which can
simplify likelihood calculation (leading to what is sometimes called a
*conditional* likelihood, though *conditional* also has
many meanings). Or we could be augmenting categorical observed variables
with underlying, latent continuous variables. This last type of
augmentation is what we are doing here. In our testing, we found it to
be faster and more efficient than other approaches that would sample
latent variables alongside other model parameters (the latent variables
are integrated out of our likelihoods here; similar to the description
from Merkle et al. (2021)).

In our data augmentation implementation, each ordinal observation (e.g., \(y\)) is used to generate a continuous, underlying counterpart (e.g., \(y^\ast\)). This \(y^\ast\) must obey the model’s threshold parameters (commonly denoted \(\mathbf{\tau}\)), based on the value of the observed data. For example, ignoring subscripts on \(y^\ast\) and assuming an ordinal variable with 4 categories, we would have \[\begin{align*} y^* < \tau_1 &\text{ if }y = 1 \\ \tau_1 <\ y^* < \tau_2 &\text{ if }y = 2 \\ \tau_2 <\ y^* < \tau_3 &\text{ if }y = 3 \\ y^* >\ \tau_3 &\text{ if }y = 4 \end{align*}\] where we require \(\tau_1 < \tau_2 < \tau_3\). We generate such a \(y^*\) separately for each ordinal observation in the dataset. These all become additional, bounded parameters in the Stan file.

The Stan User’s Guide has a helpful example of multivariate probit regression using a related approach; see https://mc-stan.org/docs/2_27/stan-users-guide/multivariate-outcomes.html. The trickiest parts involve enforcing the boundaries of the \(y^*\) variables, and ensuring that the threshold parameters for each ordinal variable are ordered correctly (while allowing for the possibility that different ordinal variables have different numbers of thresholds). These require some Jacobian adjustments that took a good deal of time to code correctly (some further detail appears in a later section).

Once the above parameters are defined and generated, the remainder of the model estimation is similar to the simpler situation where all observed variables are continuous. In terms of the Stan file, most of the ordinal overhead comes in the transformed parameters block. Once we get to the model block, most things operate as they would with continuous data.

### Thresholds & Priors

The prior distributions on the threshold (\(\tau\)) parameters are more involved than they may appear. This is because, as described in the previous section, the threshold parameters for a single variable must be ordered. So if we say, for example, that all thresholds have a normal(0,1) prior distribution, we are ignoring the fact that one threshold’s value influences the size of other thresholds’ values. As Michael Betancourt describes on Stan Discourse, such a prior “interacts with the (ordering) constraint to enforce a sort of uniform repulsion between the interior points, resulting in very rigid differences.”

quote from https://discourse.mc-stan.org/t/prior-choice-for-ordered-inverse-transformed-parameters/16378/3

To address this issue, we first define an unconstrained, unordered
parameter vector whose length equals the number of thresholds in the
model. Call this vector \(\mathbf{\tau}^*\). We then obtain ordered
thresholds by exponentiating the unordered parameter vector in a
specific manner. The manner in which this works is exactly the same as
how Stan defines a parameter of type `ordered`

. See https://mc-stan.org/docs/2_28/reference-manual/ordered-vector.html.
Additionally, a similar idea has been independently developed for signal
detection models by Paulewicz and Blaut
(2020) (and see their *bhsdtr* package).

The idea is most easily shown via example. Say that we have an
ordinal variable with 4 categories. Then the three thresholds for this
variable are obtained via: \[\begin{align*}
\tau_1 &= \tau^*_1 \\
\tau_2 &= \tau^*_1 + \exp(\tau^*_2) \\
\tau_3 &= \tau^*_1 + \exp(\tau^*_2) + \exp(\tau^*_3).
\end{align*}\] We then place normal prior distributions on the
unordered \(\tau^*\) parameters, as
opposed to placing priors on the ordered \(\tau\) parameters. These normal priors
imply that the lowest threshold (\(\tau_1\) above) has a normal prior, while
differences between successive \(\tau\)’s have log-normal priors. In
blavaan, these priors can be specified in the usual two ways. First, we
could add the `dp`

argument to a model estimation command as
follows.

`dp = dpriors(tau = "normal(0, .5)")`

which would assign this prior to all the unordered \(\tau^*\) parameters in the model. Second,
we could specify priors for specific threshold parameters in the model
specification syntax. For example, say that we have a 4-category
observed variable called `x1`

. Then unique priors for each
the three thresholds could be specified in the model syntax via

`x1 | prior("normal(-1, 1)") * t1 + prior("normal(0, .5)") * t2 + prior("normal(0, 1)") * t3`

It is not clear at this time that priors on the \(\tau^*\) parameters are the best option. In a 2019 paper, Michael Betancourt describes a Dirichlet prior that regularizes the thresholds of an ordinal regression model. Such a strategy would seem to work for SEM, and it could be especially useful for datasets where some categories of the ordinal variable are sparse. These issues warrant further study.

https://betanalpha.github.io/assets/case_studies/ordinal_regression.html

### Likelihood Computations

Once we get to continuous data in the model block, it seems reasonable to expect simple likelihood computations. But it depends on what likelihood you want to compute. The likelihood used for sampling in Stan is a simple multivariate normal of the \(y^*\) observations, combined with any continuous observed variables in the model. This is indeed simple to compute. But it is not the likelihood you want to use for model comparison. For one thing, all the \(y^*\) parameters associated with ordinal data are involved in this likelihood, so quantities like the effective number of parameters become very inflated. The number of parameters involved in this likelihood also increases with sample size, which is generally bad in the land of model comparison metrics. See Merkle, Furr, and Rabe-Hesketh (2019) for more detail here.

All this means that, for quantities like WAIC and PSIS-LOO, we must
compute a second model likelihood that involves the observed, ordinal
\(y\) variables and that integrates
over the latent \(y^*\) variables. This
is a difficult problem that amounts to evaluating the CDF of a
sometimes-high-dimensional, multivariate normal distribution (see Chib and Greenberg 1998, Equation 11).
There are multiple possibilities for approximating this CDF. We
currently rely on the `sadmvn()`

function from the
*mnormt* package (Azzalini and Genz
2020), which uses a subregion adaptive integration method by
Genz (1992) that is fast and accurate
(when there are about 15 or fewer ordinal variables in the model). A
second possibility involves Monte Carlo simulation, which is implemented
in the *tmvnsim* package (Bhattacjarjee
2016). For each case, we generate many random samples from the
appropriate truncated multivariate normal and average over the resulting
importance sampling weights. The procedure is computationally intensive
and also time intensive, so we have to balance the number of random
samples drawn with the amount of time that it takes. If users wish to
use `tmvnsim()`

, they must declare the number of importance
samples to draw. This is accomplished by setting `llnsamp`

within the `mcmcextra$data`

argument. For example, to draw
100 samples for the approximation, a call to `bsem()`

or
similar functions would include the argument

Beyond these two methods, it would also be possible to use quadrature
over the latent variables. Many people would consider quadrature to be
the gold standard here, and quadrature would reduce the dimension of
integration for many models (because there are usually fewer latent
variables than observed variables). But the quadrature would have to be
specific to SEM, and fast, efficient, open implementations of such a
method do not appear to currently exist (some implementations are hidden
in *blavaan*, but they are pure R implementations that are fairly
slow). On the other hand, approximation of the multivariate normal CDF
is a general problem that has multiple fast, efficient, open
implementations, so long as there are not too many ordinal variables in
your model.

There also exists a relatively new method by Z. I. Botev (2017) for evaluating the CDF of the
multivariate normal, with an implementation of the method appearing in
the package *TruncatedNormal* (Z. Botev
and Belzile 2021). This method is especially useful for
evaluating high-dimensional normal distributions (in our case, with more
than about 15 ordinal variables), and it may be incorporated in future
versions of *blavaan*.

### Comparison to *lavaan*

Ordinal SEM is associated with two types of model parameterizations:
*delta* and *theta*. These refer to different scale
parameterizations of the \(y^*\)
variables: *delta* refers to the total standard deviation of
\(y^*\) (including variability due to
latent variables), and *theta* refers to the residual standard
deviation of \(y^*\).

In *blavaan*, only the theta parameterization is implemented.
So, if you want to compare *lavaan* results to *blavaan*
results, you need to use the argument
`parameterization = "theta"`

when you estimate the
*lavaan* model.

Also, the default *lavaan* estimator for ordinal models is a
multiple-step procedure that involves a weighted least squares
discrepancy function. The resulting parameter estimates are sometimes
far from the posterior means reported by *blavaan*. The
*blavaan* estimates are usually closer to
`estimator="PML"`

in *lavaan*.

### Posterior Predictive p-values

Posterior predictive p-value (ppp) computations receive a speed boost
in the 0.4 series. These computations now occur in Stan, whereas they
previously occurred in R after model estimation. As discussed by Asparouhov and Muthén (2021), the ppp
computations needed for models with missing data can be excessively
slow, requiring us to run an EM algorithm for each posterior sample in
order to find the “H1” (“saturated”) model covariance matrix. The
solution by Asparouhov and Muthén (2021)
involves the realization that we do not need to use a fully-optimized H1
covariance matrix in order to compute the ppp. In *blavaan*, we
consequently run an EM algorithm for a fixed number of iterations in
order to compute an H1 covariance matrix that is “good enough” for the
ppp. The default number of iterations it set to 20, and users can change
the default by supplying an `emiter`

value via the
`mcmcextra`

argument. For example,

### Jacobians

*(This section is likely only relevant to you if you are
editing/writing Stan models.)* The Stan model underlying
*blavaan* currently requires Jacobian adjustments in two places.
This section briefly reviews the ideas underneath the adjustments, which
my future self may wish to remember.

We need a Jacobian adjustment when we place a prior on something that
does not appear in the Stan `parameters`

block. The Jacobian
tells us about the implied priors of the things in the
`parameters`

block, based on the priors that appear in the
`model`

block.

The Jacobian comes from the statistics literature on “change of
variables”: we are applying a function to some random variables, and
finding the distribution of the function based on the original
distribution of the random variables. When it comes to Stan models, this
means we are starting with the priors from the `model`

block
and finding the implied priors for the `parameters`

block.
This confused me for a long time because, in the Stan file, the
functions naturally go in the opposite direction: starting with the
`parameters`

block, and moving to the `model`

block.

The fact that our functions go from `model`

to
`parameters`

is convenient, though, because Jacobian
adjustments require the inverse functions. And the inverse functions
move us from `parameters`

to `model`

, so they
already exist in the Stan model. We just need to find the appropriate
derivatives of these functions, which lead to the Jacobian.

As an example, consider the fact that *blavaan* allows users
to choose whether priors go on standard deviation, variance, or
precision parameters. The standard deviations appear in the
`parameters`

block regardless of what the user chooses (the
Stan model is precompiled at the time of package installation). Say that
the user wants priors on precisions. We transform the standard
deviations to precisions in the `model`

block, then put the
prior on the precision. In addition to this prior, we need the Jacobian
of the function that starts with a standard deviation (call it \(\sigma\)) and transforms to precision
(\(\sigma^{-2}\)). The derivative of
\(\sigma^{-2}\) with respect to \(\sigma\) is \(-2
\sigma^{-3}\). And because this is a simple function mapping a
single parameter to a different value, the Jacobian is the absolute
value of this derivative, which is \(2
\sigma^{-3}\). In the Stan file, we would then add the log of
this Jacobian to `target`

:

`+= log(2) - 3*log(sigma) target `

Further examples and discussion can be found in:

https://mc-stan.org/users/documentation/case-studies/mle-params.html

### Summary

The *blavaan* 0.4 series offers enhanced functionality in a
variety of areas. The computational decisions that we have made reflect
a balance between estimation precision and estimation speed. It will be
the case that the software defaults behave poorly in some situations.
For example, the default prior distributions can be problematic in
certain situations, the likelihood approximations for ordinal models may
not be as precise as desired, and the new ppp computations may behave
differently than previous computations. We encourage users to carry out
sensitivity analyses, and also to report bugs!

### References

*Structural Equation Modeling: A Multidisciplinary Journal*28 (1): 1–14. https://doi.org/10.1080/10705511.2020.1764360.

*The R Package*. http://azzalini.stat.unipd.it/SW/Pkg-mnormt/.

`mnormt`

: The Multivariate Normal and \(t\) Distributions (Version 2.0.2)*tmvnsim: Truncated Multivariate Normal Simulation*. https://CRAN.R-project.org/package=tmvnsim.

*Journal of the Royal Statistical Society: Series B*79 (1): 125–48. https://doi.org/https://doi.org/10.1111/rssb.12162.

*TruncatedNormal: Truncated Multivariate Normal and Student Distributions*. https://CRAN.R-project.org/package=TruncatedNormal.

*Biometrika*85: 347–61.

*Journal of Computational and Graphical Statistics*1: 141–50.

*Journal of Statistical Software*100 (6): 1–22. https://www.jstatsoft.org/article/view/v100i06.

*Psychometrika*84: 802–29. https://arxiv.org/abs/1802.04452.

*Journal of Statistical Software*85 (4): 1–30. https://www.jstatsoft.org/article/view/v085i04.

*Behavior Research Methods*52: 2122–41.