MLM v SEM for Growth Models & using F...
 Matthew Constantinou posted on Saturday, June 03, 2017 - 10:47 am
I plan to run some moderated/mediated latent growth curve models. The following variables were collected for 2000 subjects:

- A latent outcome (Y) measured at 4 timepoints (TP) and derived from a bifactor model of 4 factors

- 3 latent predictors (P), 2 derived from a scale measured at 4 TPs and 1 derived from a bifactor model of 6 factors estimated at TP1 (time-invariant).

Example growth models include:

- The slopes of the 2 time-varying Ps predicting the random slope of Y whilst moderated by the time-invariant P

- A direct path from the time-invariant P to the random slope of Y, and indirect paths via the 2 time-varying Ps
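Here is a rough sketch of what these two example models imply. The notation is assumed here and is not taken from the post. $\eta_{0i}$ and $\eta_{1i}$ are a subject's latent intercept and slope, and $\lambda_t$ are fixed time scores (e.g., 0, 1, 2, 3). Superscripts mark the construct (Y, $P_1$, $P_2$), and $W_i$ is the time-invariant predictor. Because W and the slopes are all latent, the product terms in the first model are latent interactions, which need special estimation (in Mplus, the XWITH option).

```latex
% Growth part for each repeated construct (t = 1..4, time scores \lambda_t)
y_{ti} = \eta_{0i} + \lambda_t\,\eta_{1i} + \varepsilon_{ti}

% Model 1: slopes of P_1 and P_2 predict the slope of Y, moderated by W
\eta^{Y}_{1i} = \alpha + \beta_1 \eta^{P_1}_{1i} + \beta_2 \eta^{P_2}_{1i} + \beta_3 W_i
              + \beta_4 \eta^{P_1}_{1i} W_i + \beta_5 \eta^{P_2}_{1i} W_i + \zeta_i

% Model 2: W affects the slope of Y directly and via the slopes of P_1 and P_2
\eta^{P_k}_{1i} = \alpha_k + a_k W_i + \zeta_{ki}, \qquad k = 1, 2
\eta^{Y}_{1i}   = \alpha_Y + c' W_i + b_1 \eta^{P_1}_{1i} + b_2 \eta^{P_2}_{1i} + \zeta_{Yi}
% Indirect effect via P_k: a_k b_k;  total effect of W: c' + a_1 b_1 + a_2 b_2
```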

Q1: These growth models can be implemented either in an SEM framework with a wide data structure or in an MLM framework with a long data structure. Besides the difference in missing-data handling, which approach would you recommend, and why?

Q2: My models appear computationally demanding (e.g., deriving predictors and outcomes from bifactor models, estimating second-order growth factors, including random slopes). Rather than estimating the latent outcome, the latent predictors, and the growth factors simultaneously, I could run the growth models on factor scores for the predictors and outcomes extracted in separate analyses. Some have argued against using factor scores in SEM/MLM; what would you recommend, and why?
 Bengt O. Muthen posted on Saturday, June 03, 2017 - 2:49 pm
Q1: The wide format enables measurement invariance testing.

Q2: Don't use factor scores if you don't have to. See our FAQ: Factor scores.
 Matthew Constantinou posted on Sunday, June 04, 2017 - 10:34 am
Thank you for answering those questions, Bengt.

I can't seem to find how to obtain information functions for factor scores, but since it's a plot, I'm guessing one could use:

PLOT: TYPE = PLOT3;

And just to clarify for those deciding between MLM and SEM for LGM:

- Wide-data analyses can handle missing data with FIML, which is more efficient (but not necessarily more effective) than multiple imputation (cf. https://goo.gl/x0r02s). This would be another reason for using a wide data format, since FIML with long data does not account for the clustering within subjects. Note that Mplus is one of the few programs that offer FIML.

- One can run an MLM ('two-level' in Mplus) with wide data; you need not be constrained to long data. However, a two-level growth model with time at the within level and subject at the between level can be more computationally efficient.

- I hope I'm not seen as a traitor here, but Stata has a convenient command for converting data from wide to long and vice versa, so you can try both approaches in Mplus!
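The FIML point in the first bullet above can be made concrete. With multivariate normal data, each subject contributes the likelihood of only the values they actually observed. That likelihood uses the matching sub-vector of the means and sub-block of the covariance matrix, and nothing is imputed. The pure-Python sketch below is illustrative only. It assumes the model-implied mean vector and covariance matrix are already given; in practice the software maximizes this sum over the model parameters. The function names are hypothetical and are not Mplus internals.

```python
import math


def cholesky(a):
    """Lower-triangular Cholesky factor of a small symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def casewise_loglik(row, mu, sigma):
    """Log-likelihood of one subject's observed values under N(mu, sigma).

    Missing values (None) are skipped: only the observed entries of mu and the
    matching rows/columns of sigma are used, as in FIML. Nothing is imputed.
    """
    obs = [j for j, v in enumerate(row) if v is not None]
    if not obs:
        return 0.0  # a subject with no observed data contributes nothing
    x = [row[j] - mu[j] for j in obs]
    sub = [[sigma[r][c] for c in obs] for r in obs]
    low = cholesky(sub)
    # Forward substitution solves low * z = x, so z.z = x' sub^{-1} x
    z = []
    for i in range(len(x)):
        z.append((x[i] - sum(low[i][k] * z[k] for k in range(i))) / low[i][i])
    quad = sum(v * v for v in z)
    log_det = 2.0 * sum(math.log(low[i][i]) for i in range(len(x)))
    k = len(obs)
    return -0.5 * (k * math.log(2 * math.pi) + log_det + quad)


def fiml_loglik(data, mu, sigma):
    """Sample log-likelihood: the sum of casewise contributions."""
    return sum(casewise_loglik(row, mu, sigma) for row in data)
```

For example, with four occasions in wide format, a subject observed only at TP1 and TP4 contributes a bivariate normal density for those two values, while a fully observed subject contributes a four-variate one.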
 Matthew Constantinou posted on Sunday, June 04, 2017 - 12:08 pm
I just discovered that Mplus has a neat long-to-wide and wide-to-long conversion function (see the Mplus 8.0 User's Guide, Chapter 15, pp. 580-586). Please excuse the treachery in my last comment!
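To show what such a conversion does, here is a minimal Python sketch using plain dictionaries. It is not how Mplus or Stata implement it. The function names, the column names (id, y1-y4, y), and the 0-based time coding are assumptions for illustration. With drop_missing=True, a missing occasion simply produces no row, so a subject with partial data still contributes the occasions they have.

```python
def wide_to_long(rows, id_key, wide_keys, long_key, time_key="time", drop_missing=True):
    """Convert wide records (one per subject) to long records (one per subject-occasion).

    rows: list of dicts, e.g. {"id": 1, "y1": 2.0, "y2": None, ...}
    wide_keys: repeated-measure columns in time order, e.g. ["y1", "y2", "y3", "y4"]
    Time is coded 0, 1, 2, ... in the order of wide_keys.
    """
    out = []
    for row in rows:
        for t, key in enumerate(wide_keys):
            value = row[key]
            if drop_missing and value is None:
                continue  # missing occasion: no row in long format
            out.append({id_key: row[id_key], time_key: t, long_key: value})
    return out


def long_to_wide(records, id_key, long_key, wide_keys, time_key="time"):
    """Inverse conversion; occasions absent from the long data become None."""
    wide = {}
    for rec in records:
        subject = wide.setdefault(
            rec[id_key], {id_key: rec[id_key], **{k: None for k in wide_keys}}
        )
        subject[wide_keys[rec[time_key]]] = rec[long_key]
    return list(wide.values())
```

One design consequence is worth noting: a subject with no observed occasions produces no long rows at all, so that subject cannot be recovered by the reverse conversion.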
 Matthew Constantinou posted on Tuesday, June 20, 2017 - 4:09 am
Just to continue the discussion about using a wide or long data format:

- A long data format can be less computationally taxing. For instance, imagine estimating a bifactor CFA with 3 specific factors and 1 general factor at four timepoints, and then latent growth factors for each CFA factor. With a wide data format, you would have to estimate 24 latent factors (i.e., 4 CFA factors at each of 4 timepoints, plus 4 intercepts and 4 slopes for the latent growth factors). With the long data format, however, you would only need to estimate the bifactor CFA once and then regress each factor on time (perhaps with a random slope for each).

- With the long data format, you can use all available data points without estimating the missing data. This is useful in an intent-to-treat analysis, where you use all available data once subjects have been randomized, regardless of whether they finished treatment early, withdrew, or were non-compliant. By default, the wide data format with an ML estimator 'fills in' the missing data points based on the subject's other data, but you may not want to estimate the missing data.
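The factor count in the first point above can be reproduced with a few lines of arithmetic. The sketch follows the post's description. In wide format, every CFA factor is a separate latent variable at every timepoint, and each construct gets its own growth factors (by default an intercept and a slope). In long format, the CFA factors are defined once. The function name is hypothetical.

```python
def latent_factor_counts(n_cfa_factors, n_timepoints, growth_factors_per_construct=2):
    """Count latent variables in the wide-format and long-format set-ups.

    Wide: each CFA factor appears once per timepoint, plus growth factors
    (intercept, slope, ...) for each construct.
    Long: the CFA factors are defined once, because time is a row rather than
    a column; growth is then modeled by regressing each factor on time.
    """
    wide_cfa = n_cfa_factors * n_timepoints
    growth = n_cfa_factors * growth_factors_per_construct
    return {"wide": wide_cfa + growth, "long_cfa": n_cfa_factors}
```

For instance, with 8 timepoints the wide count rises to 40 (32 + 8), while the number of CFA factors defined in the long format stays at 4.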

For more advantages of the long data format, I would strongly recommend reading a paper by Kwok et al. (2008) (see https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2613314/#R27)
 Bengt O. Muthen posted on Tuesday, June 20, 2017 - 5:27 pm
Regarding the CFA example, one drawback of the long two-level format is that you have to assume measurement invariance across time whereas this can be tested in the wide format.
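Written out, with notation assumed here rather than taken from the thread, $j$ indexes items, $t$ occasions, and $i$ subjects. In the wide format every measurement parameter can be time-specific, so invariance can be imposed and tested step by step. In the long two-level format, one set of parameters applies to all occasions by construction.

```latex
% Wide format: time-specific intercepts \nu_{jt} and loadings \lambda_{jt}
y_{jti} = \nu_{jt} + \lambda_{jt}\,\eta_{ti} + \varepsilon_{jti}

% Configural: same pattern of zero and non-zero \lambda_{jt} at every t
% Metric:     \lambda_{jt} = \lambda_j                     for all t
% Scalar:     \lambda_{jt} = \lambda_j and \nu_{jt} = \nu_j  for all t

% Long two-level format: occasions are rows, so a single set of parameters
y_{jti} = \nu_j + \lambda_j\,\eta_{ti} + \varepsilon_{jti}
% i.e., loading and intercept invariance is assumed rather than tested
```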
 J.D. Haltigan posted on Friday, December 14, 2018 - 4:25 am

Matthew: regressing each factor (or information function) on time establishes your curves, and random slopes around those curves would allow you to predict, say, group differences (e.g., depressed vs. conduct disorder) in those curves. That is my understanding; is it yours as well?
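The idea of individual curves with slopes that vary around an average, and that may differ by group, can be illustrated with a simple two-stage, "slopes-as-outcomes" sketch. This is a swapped-in descriptive approximation, not a multilevel or latent growth model. It fits an ordinary least-squares line per subject and then summarizes the slopes by group. Unlike a random-slope model, it ignores differences in the precision of each subject's slope and does not pool information across subjects. The field names (id, group, time, y) and function names are hypothetical.

```python
from statistics import mean, variance


def ols_line(times, values):
    """Intercept and slope of a simple least-squares line."""
    t_bar, y_bar = mean(times), mean(values)
    sxx = sum((t - t_bar) ** 2 for t in times)
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in zip(times, values))
    slope = sxy / sxx
    return y_bar - slope * t_bar, slope


def subject_slopes(long_records):
    """Fit one line per subject from long records with id, group, time, y.

    Subjects need at least two distinct timepoints to have a slope.
    """
    by_subject = {}
    for rec in long_records:
        by_subject.setdefault(rec["id"], []).append(rec)
    slopes = {}
    for sid, recs in by_subject.items():
        times = [r["time"] for r in recs]
        if len(set(times)) < 2:
            continue  # a single occasion gives no slope
        _, slope = ols_line(times, [r["y"] for r in recs])
        slopes[sid] = (recs[0]["group"], slope)
    return slopes


def summarize_by_group(slopes):
    """Mean and variance of the per-subject slopes within each group."""
    groups = {}
    for group, slope in slopes.values():
        groups.setdefault(group, []).append(slope)
    return {g: (mean(s), variance(s) if len(s) > 1 else 0.0) for g, s in groups.items()}
```

The group means play the role of group differences in average growth, and the within-group variance is a crude stand-in for the random-slope variance that a multilevel or latent growth model would estimate properly.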

Bengt: If the MLM and SEM approaches are mathematically equivalent (in most cases), could measurement invariance (MI) not be inferred when the relevant SEM estimates supporting configural, metric, and scalar invariance over time are equivalent to the level-1 fixed effects for the factor structure in the MLM (i.e., the average factor-structure parameter estimates across individuals over time)?
 J.D. Haltigan posted on Friday, December 14, 2018 - 5:03 am
Re the above: I just realized that my question about MI doesn't help if one wants to run the model using only one approach, namely MLM.
 Bengt O. Muthen posted on Friday, December 14, 2018 - 11:46 am
Right, if computationally you could do SEM-wide invariance testing and get MI confirmed, then you can follow with long analysis.

See also the new possibilities in long format that allow some measurement changes over time using cross-classified analysis. This is described both in my SMEP talk and in our Topic 10 Short Course video and handout.