I plan to run some moderated/mediated latent growth curve models. The following variables were collected for 2000 subjects:
- A latent outcome (Y) measured at 4 timepoints (TP) and derived from a bifactor model of 4 factors
- 3 latent predictors (P), 2 derived from a scale measured at 4 TPs and 1 derived from a bifactor model of 6 factors estimated at TP1 (time-invariant).
Example growth models include:
- The slopes of the 2 time-varying Ps predicting the random slope of Y whilst moderated by the time-invariant P
- A direct path of the time-invariant P on the random slope of Y, and indirect paths via the 2 time-varying P
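For concreteness, the first example could be written roughly as below. This is only a sketch: the symbols are illustrative assumptions (y_{ti} for the outcome factor at time t, \lambda_t for the time scores, s_{1i} and s_{2i} for the slopes of the two time-varying predictors, z_i for the time-invariant predictor), not notation taken from the study.

```latex
\begin{align}
y_{ti}     &= \eta_{0i} + \lambda_t \, \eta_{1i} + \varepsilon_{ti} \\
\eta_{1i}  &= \alpha_1 + \beta_1 s_{1i} + \beta_2 s_{2i} + \beta_3 z_i
              + \beta_4 \, s_{1i} z_i + \beta_5 \, s_{2i} z_i + \zeta_{1i}
\end{align}
```

Here \beta_4 and \beta_5 would carry the moderation by the time-invariant predictor.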
Q1: These growth models can be implemented using an SEM framework with a wide data structure, or an MLM framework with a long data structure. Which approach would you recommend and why (besides the difference in missing data handling)?
Q2: My models appear computationally demanding (e.g., deriving predictors and outcomes from bifactor models, estimating second order growth factors, including random slopes). Rather than estimating the latent outcome and predictor variables and growth factors, I could run the growth models on factor scores of the predictors and outcomes extracted in separate analyses. Some have argued against using factor scores for SEM/MLM and I wondered what you would recommend and why?
I can't seem to find how to acquire information functions for factor scores, but since it's a plot, I'm guessing one could use:
PLOT: TYPE = PLOT3;
And just to clarify for those deciding between MLM and SEM for LGM:
- Wide-data analyses can handle missing data with FIML, which is more efficient (but not necessarily more effective) than multiple imputation (cf. https://goo.gl/x0r02s). This would be another reason for using a wide-data format, since FIML with long data does not account for the clustering within subjects. Note that Mplus is one of a number of SEM programs that offer FIML.
- One can run an MLM (or 'two-level' in Mplus) with wide data; you need not be constrained to long data. However, a two-level growth model with time at the within level and subject at the between level can be more computationally efficient.
- I hope I'm not seen as a traitor here, but Stata has a nice command (reshape) for converting data from wide to long and vice versa, so you can try both approaches in Mplus!
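For anyone without Stata, the wide-to-long conversion is simple enough to sketch in a few lines of Python. This is a minimal illustration, not a replacement for a proper data-management tool: the function name `wide_to_long`, the variable stub `y`, and the toy records are all made up for the example, and dropping rows with a missing value is just one possible convention.

```python
def wide_to_long(rows, stub, timepoints, id_key="id", drop_missing=True):
    """Reshape wide records (one dict per subject) into long records
    (one dict per subject-by-timepoint observation)."""
    long_rows = []
    for row in rows:
        for t in timepoints:
            # Wide columns are assumed to be named stub + timepoint, e.g. y1..y4.
            value = row.get(f"{stub}{t}")
            if drop_missing and value is None:
                continue  # keep only the observations that were actually collected
            long_rows.append({id_key: row[id_key], "time": t, stub: value})
    return long_rows


# Hypothetical toy data: subject 2 missed timepoints 2 and 4.
wide = [
    {"id": 1, "y1": 2.0, "y2": 2.5, "y3": 3.1, "y4": 3.4},
    {"id": 2, "y1": 1.8, "y2": None, "y3": 2.2, "y4": None},
]
long_data = wide_to_long(wide, stub="y", timepoints=[1, 2, 3, 4])
for record in long_data:
    print(record)
```

With the toy data, subject 1 contributes four long rows and subject 2 contributes two. Setting `drop_missing=False` keeps a row for every subject and timepoint instead.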
Just to continue the discussion about using a wide or long data format:
- A long data format can be less computationally taxing. For instance, imagine estimating a bifactor CFA with 3 specific factors and 1 general factor at four time-points, and then some latent growth factors for each CFA factor. With a wide data format, you would have to estimate 24 latent factors (e.g., 4 CFA factors by 4 timepoints + 4 intercepts and 4 slopes for the latent growth factors). However, with the long data format, you would only need to estimate the bifactor CFA once and then regress each factor on time (and perhaps estimate random slopes for each).
- With the long data format, you can use all available data points without estimating the missing data. This is useful in an intent-to-treat analysis, where you use all available data once subjects have been randomized, regardless of whether they completed the treatment early, withdrew, were non-compliant, etc. The wide data format with an ML estimator 'fills in' the missing data points based on other data for that subject by default, but you may not want to estimate the missing data.
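The factor count in the bifactor example above can be checked with a little arithmetic. The sketch below is only a rough tally under the assumptions stated in that bullet (one intercept and one slope factor per CFA factor in the wide format; the CFA estimated once, plus optionally one random slope per factor, in the long format). The function names are made up for illustration, and how a given program counts latent variables may differ.

```python
def wide_latent_count(n_cfa_factors, n_timepoints, growth_factors_per_cfa=2):
    """Latent variables in the wide format: every CFA factor at every
    timepoint, plus growth factors (intercept and slope) per CFA factor."""
    measurement = n_cfa_factors * n_timepoints
    growth = n_cfa_factors * growth_factors_per_cfa
    return measurement + growth


def long_latent_count(n_cfa_factors, random_slopes=True):
    """Rough count in the long format: the CFA factors estimated once,
    plus (optionally) one random slope on time per factor."""
    return n_cfa_factors * (2 if random_slopes else 1)


# 1 general + 3 specific factors, measured at 4 timepoints.
n_factors, n_times = 4, 4
print("wide:", wide_latent_count(n_factors, n_times))
print("long, with random slopes:", long_latent_count(n_factors))
print("long, fixed slopes only:", long_latent_count(n_factors, random_slopes=False))
```

For the example in the post this gives 24 for the wide format against 8 (or 4 without random slopes) for the long format.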
Matthew: regressing each factor (or information function) on time establishes your curves, and random slopes around those curves would allow for predicting, say, group (e.g., depressed, conduct DO) differences on those curves. That is my understanding. Is it yours as well?
Bengt: If the MLM and SEM approaches are mathematically the same (in most cases), could measurement invariance (MI) be inferred for the long-format analysis? That is, if the SEM framework gives evidence for configural, metric, and scalar invariance over time, are the relevant estimates equivalent to the L1 fixed effects for the factor structure in MLM (i.e., the average factor structure parameter estimates across individuals over time)?
Right. If it is computationally feasible to do the invariance testing in the wide-format SEM and MI is confirmed, then you can follow up with the long-format analysis.
See also the new possibilities in long format that allow some measurement changes over time using cross-classified analysis. This is described both in my SMEP talk and in our Topic 10 Short Course video and handout.