Skip to contents

Introduction

In SEM, one of the first steps is to evaluate the model’s global fit. After global fit, we need to evaluate the local fit of a model, meaning how the model reproduces specific correlations between observed variables.

There are a couple of common methods for this, (a) testing for high residual correlations, or (b) modification indices. This tutorial focuses on the second. Modification indices test the likely change in the model fit if a single parameter is added to the model that was not originally included. This test can be carried out for every possible parameter that was not included (Bentler 1990).

Modification Indices

Modification indices present different indices to quantify the effect of each parameter, and we will focus on two here. These are (a) the modification index (MI) or Lagrange multiplier, which estimates the extent to which the model’s chi-square (\(\chi^2\)) test statistic would decrease if a parameter were added to the model and freely estimated, and (b) standardized expected parameter change (SEPC), which is the approximated standardized value of the parameter if it were to be estimated in the model (Whittaker 2012).

MI presents the possible effect on the overall model, and SEPC presents the effect size for the missed parameter.

We will show an example with the Holzinger and Swineford (1939) model. You first estimate your SEM/CFA model as usual

HS.model <- ' visual  =~ x1 + x2 + x3
              textual =~ x4 + x5 + x6
              speed   =~ x7 + x8 + x9 '

fit <- bcfa(HS.model, data=HolzingerSwineford1939, std.lv=TRUE)

Then we would need to write a discrepancy function to collect the modification indices. The list below contains two functions that estimate and save the MI and SEPC.

discFUN <- list(
  mod.ind_mi = function(object){
    temp <- modificationindices(object, free.remove = F)
    mods <- temp$mi
    names(mods) <- paste0(temp$lhs, temp$op, temp$rhs)
    return(mods)
  },
  mod.ind_sepc.all = function(object){
    temp <- modificationindices(object, free.remove = F)
    sepc.all <- temp$sepc.all
    names(sepc.all) <- paste0(temp$lhs, temp$op, temp$rhs)
    return(sepc.all)
  }
)

Then we will pass this function to the ppmc() function of blavaan. With this function, the MI and SEPC are computed for each posterior sample, leading to posterior distributions for each of them.

out <- ppmc(fit, discFUN = discFUN)

Then we view the top 5 parameters arrange by the posterior mean (EAP) MI, which in this case shows that the parameter having the highest impact in overall model fit (according to EAP) is visual=~x9, the cross-loading from the Visual factor to item x9.

summary(out, prob=.9, discFUN = "mod.ind_mi", sort.by="EAP", decreasing=T)[1:5,]
## 
## Posterior summary statistics and highest posterior density (HPD) 90% credible intervals
##  for the posterior distribution of realized discrepancy-function values based on observed data, 
##  along with posterior predictive p values to test hypotheses in either direction:
## 
## 
##               EAP Median    MAP     SD  lower  upper PPP_sim_GreaterThan_obs
## visual=~x9 34.967 35.022 34.859 11.168 16.710 53.630                   0.016
## x7~~x8     33.864 36.587 40.254 14.036  7.381 54.461                   0.070
## x8~~x9     26.327 12.393  2.476 42.480  0.000 67.314                   0.314
## x4~~x6     19.931  7.235  1.526 34.877  0.000 53.137                   0.447
## visual=~x7 18.482 16.693 15.219  9.983  4.402 33.339                   0.013
##            PPP_sim_LessThan_obs
## visual=~x9                0.984
## x7~~x8                    0.930
## x8~~x9                    0.686
## x4~~x6                    0.553
## visual=~x7                0.987

But according to the posterior median, the parameter that would have the highest impact would be the residual correlation between indicators x7 and x8

summary(out, prob=.9, discFUN = "mod.ind_mi", sort.by="Median", decreasing=T)[1:5,]
## 
## Posterior summary statistics and highest posterior density (HPD) 90% credible intervals
##  for the posterior distribution of realized discrepancy-function values based on observed data, 
##  along with posterior predictive p values to test hypotheses in either direction:
## 
## 
##                EAP Median    MAP     SD  lower  upper PPP_sim_GreaterThan_obs
## x7~~x8      33.864 36.587 40.254 14.036  7.381 54.461                   0.070
## visual=~x9  34.967 35.022 34.859 11.168 16.710 53.630                   0.016
## visual=~x7  18.482 16.693 15.219  9.983  4.402 33.339                   0.013
## x8~~x9      26.327 12.393  2.476 42.480  0.000 67.314                   0.314
## textual=~x1 10.748  9.599  1.814  7.986  0.000 21.836                   0.226
##             PPP_sim_LessThan_obs
## x7~~x8                     0.930
## visual=~x9                 0.984
## visual=~x7                 0.987
## x8~~x9                     0.686
## textual=~x1                0.774

The MI is still recommended as the best metric to indicate which parameter is best to include next, and we can use the SEPC to evaluate the likely effect size for the respective parameters.

summary(out, prob=.9, discFUN = "mod.ind_sepc.all", sort.by="EAP", decreasing=T)[1:5,]
## 
## Posterior summary statistics and highest posterior density (HPD) 90% credible intervals
##  for the posterior distribution of realized discrepancy-function values based on observed data, 
##  along with posterior predictive p values to test hypotheses in either direction:
## 
## 
##               EAP Median   MAP    SD lower upper PPP_sim_GreaterThan_obs
## x7~~x8      0.790  0.780 0.734 0.402 0.471 1.229                   0.040
## visual=~x9  0.521  0.500 0.457 0.140 0.346 0.718                   0.009
## textual=~x1 0.269  0.293 0.327 0.168 0.026 0.510                   0.132
## x1~~x9      0.246  0.247 0.241 0.039 0.196 0.301                   0.018
## x2~~x3      0.222  0.222 0.217 0.036 0.176 0.281                   0.021
##             PPP_sim_LessThan_obs
## x7~~x8                     0.960
## visual=~x9                 0.991
## textual=~x1                0.868
## x1~~x9                     0.982
## x2~~x3                     0.979

Here we see that for the 2 highest parameters, the likely SEPC is x7~~x8 = 0.790347008084588 and visual=~x9 = 0.521331916762134. With this information we can decide to include one of these new parameters in the model (one at the time). For this example, because factor loadings have a larger impact on the model-implied covariance matrix, I would choose visual=~x9

HS.model <- ' visual  =~ x1 + x2 + x3 + x9
              textual =~ x4 + x5 + x6
              speed   =~ x7 + x8 + x9 '

fit2 <- bcfa(HS.model, data=HolzingerSwineford1939, std.lv=TRUE)

And you can check if the added parameter has the expected impact on overall fit with the blavFitIndices() and the summary() functions.

It is important to consider also the theoretical relevance of the suggested parameters, and to ensure that they make sense, instead of just adding parameters until having good fit.

Summary

In this tutorial we show how to calculate the MI and SEPC across posterior distributions, and evaluate which parameters can be added.

With the ppmc() function we are able to calculate relevant information after model estimation, and build posterior distributions of them.

The general recommendations are to use MI to identify the most likely parameter to add, and SEPC as the effect size of the new parameter.

References

Bentler, P. M. 1990. “Fit Indexes, Lagrange Multipliers, Constraint Changes and Incomplete Data in Structural Models.” Multivariate Behavioral Research 25 (2): 163–72. https://doi.org/10.1207/s15327906mbr2502_3.
Holzinger, K. J., and F. A. Swineford. 1939. A Study of Factor Analysis: The Stability of a Bi-Factor Solution. Supplementary Educational Monograph 48. Chicago: University of Chicago Press.
Whittaker, Tiffany A. 2012. “Using the Modification Index and Standardized Expected Parameter Change for Model Modification.” The Journal of Experimental Education 80 (1): 26–44. https://doi.org/10.1080/00220973.2010.531299.