# Convergence and Efficiency Evaluation

#### Mauricio Garnier-Villarreal

Source:`vignettes/convergence_efficiency.Rmd`

`convergence_efficiency.Rmd`

### Introduction

When Bayesian models are estimated with a Markov-Chain Monte Carlo
(MCMC) sampler, the model estimation doesn’t stop when it has achieved
some convergence criteria. It will run as long as desired (determined by
the `burnin`

and `sample`

arguments), and then you
need to evaluate the convergence and efficiency of the estimated
posterior distributions. You should only analyze the results if
convergence has been achieved, as judged by the metrics described
below.

For this example we will use the Industrialization and Political Democracy example (Bollen 1989).

```
model <- '
# latent variable definitions
ind60 =~ x1 + x2 + x3
dem60 =~ a*y1 + b*y2 + c*y3 + d*y4
dem65 =~ a*y5 + b*y6 + c*y7 + d*y8
# regressions
dem60 ~ ind60
dem65 ~ ind60 + dem60
# residual correlations
y1 ~~ y5
y2 ~~ y4 + y6
y3 ~~ y7
y4 ~~ y8
y6 ~~ y8
'
fit <- bsem(model, data=PoliticalDemocracy,
std.lv=T, meanstructure=T, n.chains=3,
burnin=500, sample=1000)
```

### Convergence

The primary convergence diagnostic is \(\hat{R}\), which compares the between- and within-chain samples of model parameters and other univariate quantities of interest (Vehtari et al. 2021). If chains have not mixed well (ie, the between- and within-chain estimates don’t agree), \(\hat{R}\) is larger than 1. We recommend running at least three chains by default and only using the posterior samples if \(\hat{R} < 1.05\) for all the parameters.

`blavaan`

presents the \(\hat{R}\) reported by the underlying MCMC
program, either Stan or JAGS (Stan by default). We can obtain the \(\hat{R}\) from the `summary()`

function, and we can also extract it with the `blavInspect()`

function

`blavInspect(fit, "rhat")`

```
## ind60=~x1 ind60=~x2 ind60=~x3 a b c
## 0.9998329 1.0009269 1.0012448 1.0002433 0.9996886 0.9997942
## d a b c d dem60~ind60
## 0.9996935 1.0002433 0.9996886 0.9997942 0.9996935 1.0004682
## dem65~ind60 dem65~dem60 y1~~y5 y2~~y4 y2~~y6 y3~~y7
## 0.9994272 0.9995837 1.0001069 0.9996087 0.9993587 0.9993821
## y4~~y8 y6~~y8 x1~~x1 x2~~x2 x3~~x3 y1~~y1
## 1.0004460 0.9993719 1.0002074 0.9992664 0.9996132 0.9999948
## y2~~y2 y3~~y3 y4~~y4 y5~~y5 y6~~y6 y7~~y7
## 0.9996140 0.9992549 0.9998172 1.0000120 0.9993669 0.9993283
## y8~~y8 x1~1 x2~1 x3~1 y1~1 y2~1
## 0.9995331 1.0062831 1.0058931 1.0048471 1.0032289 1.0025551
## y3~1 y4~1 y5~1 y6~1 y7~1 y8~1
## 1.0006281 1.0035112 1.0024221 1.0042214 1.0050368 1.0028954
```

With large models it can be cumbersome to look over all of these entries. We can instead find the largest \(\hat{R}\) to see if they are all less than \(1.05\)

`max(blavInspect(fit, "psrf"))`

`## [1] 1.006283`

If all \(\hat{R} < 1.05\) then we
can establish that the MCMC chains have converged to a stable solution.
If the model has not converged, you might increase the number of
`burnin`

iterations

```
fit <- bsem(model, data=PoliticalDemocracy,
std.lv=T, meanstructure=T, n.chains=3,
burnin=1000, sample=1000)
```

and/or change the model priors with the `dpriors()`

function. These address issues where the model failed to converge due to
needing more iterations or due to a model misspecification (such as bad
priors). As a rule of thumb, we seldom see a model require more than
1,000 burnin samples in Stan. If your model is not converging after
1,000 burnin samples, it is likely that the default prior distributions
clash with your data. This can happen, e.g., if your variables contain
values in the 100s or 1000s.

### Efficiency

We should also evaluate the efficiency of the posterior samples. Effective sample size (ESS) is a useful measure for sampling efficiency, and is well defined even if the chains do not have finite mean or variance (Vehtari et al. 2021).

In short, the posterior samples produced by MCMC are autocorrelated.
This means that, if you draw 500 posterior samples, you do not have 500
independent pieces of information about the posterior distribution,
because the samples are autocorlated. The ESS metric is
like
a currency conversion, telling you how much your autocorrelated
samples are worth if we were to convert them to independent samples. In
`blavaan`

we can print it from the `summary`

function with the `neff`

argument

`summary(fit, neff=T)`

We can also extract only those with the `blavInspect()`

function

`blavInspect(fit, "neff")`

```
## ind60=~x1 ind60=~x2 ind60=~x3 a b c
## 1788.407 1643.328 1871.459 2173.497 2498.604 2204.155
## d a b c d dem60~ind60
## 2119.537 2173.497 2498.604 2204.155 2119.537 2678.910
## dem65~ind60 dem65~dem60 y1~~y5 y2~~y4 y2~~y6 y3~~y7
## 3760.840 3330.003 2711.883 2571.530 3307.739 2654.652
## y4~~y8 y6~~y8 x1~~x1 x2~~x2 x3~~x3 y1~~y1
## 2717.897 1776.845 2168.178 1809.031 3810.470 2942.691
## y2~~y2 y3~~y3 y4~~y4 y5~~y5 y6~~y6 y7~~y7
## 3547.369 3427.482 2621.538 2290.928 2491.826 2377.952
## y8~~y8 x1~1 x2~1 x3~1 y1~1 y2~1
## 1805.490 1278.027 1223.059 1368.768 1422.557 1651.289
## y3~1 y4~1 y5~1 y6~1 y7~1 y8~1
## 1565.144 1332.832 1329.148 1415.159 1332.593 1329.845
```

ESS is a sample size, so it should be at least 100 (optimally, much
more than 100) times the number of chains in order to be reliable and to
indicate that estimates of the posterior quantiles are reliable. In this
example, because we have 3 chains, we would want to see at least
`neff=300`

for every parameter.

And we can easily find the lowest ESS with the `min()`

function:

`min(blavInspect(fit, "neff"))`

`## [1] 1223.059`

### References

*Structural Equations with Latent Variables*. Wiley Series in Probability and Mathematical Statistics. John Wiley & Sons, Inc.

*Bayesian Analysis*16 (2): 667–718. https://doi.org/10.1214/20-BA1221.