### Introduction

In SEM, one of the first steps is to evaluate the model's global fit. After global fit, we need to evaluate the local fit of the model, meaning how well the model reproduces specific correlations between observed variables.

There are a couple of common methods for this: (a) testing for high
residual correlations, or (b) modification indices. This tutorial
focuses on the second. Modification indices estimate the
**likely** change in model fit if a single parameter that was
not originally included is added to the model. This test can be
carried out for every possible parameter that was not included (Bentler 1990).

### Modification Indices

Modification-index output provides several **indices** to
quantify the effect of each parameter, and we will focus on two here.
These are (a) the modification index (MI) or Lagrange multiplier, which
estimates the extent to which the model's chi-square (\(\chi^2\)) test statistic would decrease if
a parameter were added to the model and freely estimated, and (b) the
standardized expected parameter change (SEPC), which is the approximate
standardized value the parameter would take if it were estimated in the
model (Whittaker 2012).

The MI quantifies the possible effect on overall model fit, while the SEPC gives the effect size for the missed parameter.
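In frequentist *lavaan*, these two quantities can be inspected directly; the sketch below (an illustration assuming *lavaan* is installed, using its built-in Holzinger and Swineford data) shows where the `mi` and `sepc.all` columns come from, before we turn to the Bayesian version.

```
library(lavaan)

# Maximum-likelihood CFA for the Holzinger & Swineford (1939) data
fit.ml <- cfa(' visual  =~ x1 + x2 + x3
                textual =~ x4 + x5 + x6
                speed   =~ x7 + x8 + x9 ',
              data = HolzingerSwineford1939, std.lv = TRUE)

# 'mi' = modification index; 'sepc.all' = standardized expected parameter change
head(modificationindices(fit.ml, sort. = TRUE)[, c("lhs", "op", "rhs", "mi", "sepc.all")])
```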

We will show an example with the Holzinger and Swineford (1939) model. You first estimate your SEM/CFA model as usual:

```
library(blavaan)  # also loads lavaan, which provides the HolzingerSwineford1939 data

HS.model <- ' visual  =~ x1 + x2 + x3
              textual =~ x4 + x5 + x6
              speed   =~ x7 + x8 + x9 '
fit <- bcfa(HS.model, data = HolzingerSwineford1939, std.lv = TRUE)
```

Then we need to write a **discrepancy** function
to collect the modification indices. The list below contains two
functions that compute and save the MI and SEPC.

```
discFUN <- list(
  # modification index (MI) for every non-included parameter
  mod.ind_mi = function(object) {
    temp <- modificationindices(object, free.remove = FALSE)
    mods <- temp$mi
    names(mods) <- paste0(temp$lhs, temp$op, temp$rhs)
    return(mods)
  },
  # standardized expected parameter change (SEPC) for the same parameters
  mod.ind_sepc.all = function(object) {
    temp <- modificationindices(object, free.remove = FALSE)
    sepc.all <- temp$sepc.all
    names(sepc.all) <- paste0(temp$lhs, temp$op, temp$rhs)
    return(sepc.all)
  }
)
```

Then we pass this list of functions to the `ppmc()` function
of *blavaan*. With this function, the MI and SEPC are computed
for each posterior sample, leading to a posterior distribution for each
of them.

`out <- ppmc(fit, discFUN = discFUN)`

Then we view the top 5 parameters arranged by the posterior mean (EAP)
MI, which in this case shows that the parameter with the highest
impact on overall model fit (according to the EAP) is
**visual=~x9**, the cross-loading from the Visual factor to
item **x9**.

`summary(out, prob=.9, discFUN = "mod.ind_mi", sort.by="EAP", decreasing=T)[1:5,]`

```
##
## Posterior summary statistics and highest posterior density (HPD) 90% credible intervals
## for the posterior distribution of realized discrepancy-function values based on observed data,
## along with posterior predictive p values to test hypotheses in either direction:
##
##
## EAP Median MAP SD lower upper PPP_sim_GreaterThan_obs
## visual=~x9 35.164 35.119 35.213 11.223 16.637 52.245 0.014
## x7~~x8 32.981 35.507 39.796 14.528 8.480 56.405 0.076
## x8~~x9 26.437 11.969 2.849 43.335 0.000 65.993 0.327
## x4~~x6 19.620 6.387 2.223 35.491 0.000 52.799 0.457
## visual=~x7 18.284 16.753 13.244 9.849 3.525 31.902 0.015
## PPP_sim_LessThan_obs
## visual=~x9 0.986
## x7~~x8 0.924
## x8~~x9 0.673
## x4~~x6 0.543
## visual=~x7 0.985
```

But according to the posterior median, the parameter that would have
the highest impact is the residual correlation between indicators
**x7** and **x8**:

`summary(out, prob=.9, discFUN = "mod.ind_mi", sort.by="Median", decreasing=T)[1:5,]`

```
##
## Posterior summary statistics and highest posterior density (HPD) 90% credible intervals
## for the posterior distribution of realized discrepancy-function values based on observed data,
## along with posterior predictive p values to test hypotheses in either direction:
##
##
## EAP Median MAP SD lower upper PPP_sim_GreaterThan_obs
## x7~~x8 32.981 35.507 39.796 14.528 8.480 56.405 0.076
## visual=~x9 35.164 35.119 35.213 11.223 16.637 52.245 0.014
## visual=~x7 18.284 16.753 13.244 9.849 3.525 31.902 0.015
## x8~~x9 26.437 11.969 2.849 43.335 0.000 65.993 0.327
## textual=~x1 11.052 9.906 2.014 8.385 0.000 22.615 0.227
## PPP_sim_LessThan_obs
## x7~~x8 0.924
## visual=~x9 0.986
## visual=~x7 0.985
## x8~~x9 0.673
## textual=~x1 0.773
```

The MI is still recommended as the primary metric for deciding which
parameter to include next, while the SEPC indicates the
**likely** effect size for the respective
parameters.

`summary(out, prob=.9, discFUN = "mod.ind_sepc.all", sort.by="EAP", decreasing=T)[1:5,]`

```
##
## Posterior summary statistics and highest posterior density (HPD) 90% credible intervals
## for the posterior distribution of realized discrepancy-function values based on observed data,
## along with posterior predictive p values to test hypotheses in either direction:
##
##
## EAP Median MAP SD lower upper PPP_sim_GreaterThan_obs
## x7~~x8 0.794 0.786 0.755 0.410 0.502 1.323 0.043
## visual=~x9 0.516 0.496 0.465 0.134 0.341 0.704 0.010
## textual=~x1 0.268 0.298 0.312 0.188 0.024 0.531 0.137
## x1~~x9 0.249 0.247 0.246 0.043 0.200 0.304 0.016
## x2~~x3 0.223 0.223 0.223 0.037 0.167 0.274 0.021
## PPP_sim_LessThan_obs
## x7~~x8 0.957
## visual=~x9 0.990
## textual=~x1 0.863
## x1~~x9 0.984
## x2~~x3 0.979
```

Here we see that for the two highest-ranked parameters, the likely SEPC is
x7~~x8 = 0.794 and visual=~x9 = 0.516. With this
information we can decide to include one of these new parameters in the
model (one at a time). For this example, because factor loadings have
a larger impact on the model-implied covariance matrix, I would choose
**visual=~x9**:

```
HS.model <- ' visual  =~ x1 + x2 + x3 + x9
              textual =~ x4 + x5 + x6
              speed   =~ x7 + x8 + x9 '
fit2 <- bcfa(HS.model, data = HolzingerSwineford1939, std.lv = TRUE)
```

And you can check whether the added parameter has the expected impact on
overall fit with the `blavFitIndices()` and `summary()` functions.
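As a sketch of that follow-up check (assuming the `fit` and `fit2` objects estimated above):

```
# Posterior distributions of global fit indices, before and after
# adding the visual=~x9 cross-loading
summary(blavFitIndices(fit))
summary(blavFitIndices(fit2))

# Parameter estimates for the extended model, including visual=~x9
summary(fit2)
```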

It is also important to consider the theoretical relevance of the
suggested parameters, and to ensure that they make sense, rather than
just adding parameters until achieving **good** fit.

### Summary

In this tutorial we showed how to calculate the MI and SEPC across posterior samples, and how to evaluate which parameters could be added.

With the `ppmc()` function we are able to calculate
relevant discrepancy measures after model estimation, and build posterior
distributions of them.

The general recommendations are to use the MI to identify the most likely parameter to add, and the SEPC as the effect size of that new parameter.

### References

Bentler, P. M. 1990. "Fit Indexes, Lagrange Multipliers, Constraint Changes and Incomplete Data in Structural Models." *Multivariate Behavioral Research* 25 (2): 163–72. https://doi.org/10.1207/s15327906mbr2502_3.

Holzinger, Karl J., and Frances Swineford. 1939. *A Study in Factor Analysis: The Stability of a Bi-Factor Solution*. Supplementary Educational Monographs 48. Chicago: University of Chicago Press.

Whittaker, Tiffany A. 2012. "Using the Modification Index and Standardized Expected Parameter Change for Model Modification." *The Journal of Experimental Education* 80 (1): 26–44. https://doi.org/10.1080/00220973.2010.531299.