I plan to run some moderated/mediated latent growth curve models. The following variables were collected for 2000 subjects:
- A latent outcome (Y) measured at 4 timepoints (TP) and derived from a bifactor model of 4 factors
- 3 latent predictors (P), 2 derived from a scale measured at 4 TPs and 1 derived from a bifactor model of 6 factors estimated at TP1 (time-invariant).
Example growth models include:
- The slopes of the 2 time-varying Ps predicting the random slope of Y whilst moderated by the time-invariant P
- A direct path of the time-invariant P on the random slope of Y, and indirect paths via the 2 time-varying P
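For readers who want the two example models in equation form, here is one way they might be written at the level of the growth factors. This is only a sketch under assumed notation that does not appear in the original post: s^Y_i is subject i's random slope of Y, s^{P_1}_i and s^{P_2}_i are the slopes of the two time-varying predictors, Z_i is the time-invariant predictor, and the Greek and Latin coefficients are placeholders.

```latex
% Moderation example: slopes of P1 and P2 predict the slope of Y,
% with each effect moderated by the time-invariant predictor Z.
\begin{align}
s^{Y}_i &= \gamma_0 + \gamma_1 s^{P_1}_i + \gamma_2 s^{P_2}_i + \gamma_3 Z_i
         + \gamma_4\, s^{P_1}_i Z_i + \gamma_5\, s^{P_2}_i Z_i + \zeta_i
\end{align}

% Mediation example: Z affects the slope of Y directly (c') and
% indirectly through the slopes of P1 and P2 (a_k b_k).
\begin{align}
s^{P_k}_i &= \alpha_k + a_k Z_i + \delta_{ki}, \qquad k = 1, 2 \\
s^{Y}_i   &= \gamma_0 + c' Z_i + b_1 s^{P_1}_i + b_2 s^{P_2}_i + \zeta_i \\
\text{indirect effect via } P_k &= a_k b_k
\end{align}
```

Under this notation, the moderation terms are products of two latent variables, which is part of why such models tend to be computationally heavy.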
Q1: These growth models can be implemented in an SEM framework with a wide data structure or in an MLM framework with a long data structure. Which approach would you recommend, and why (beyond the difference in missing-data handling)?
Q2: My models appear computationally demanding (e.g., deriving predictors and outcomes from bifactor models, estimating second-order growth factors, including random slopes). Rather than estimating the latent outcome, the latent predictors, and the growth factors in a single model, I could run the growth models on factor scores for the predictors and outcome extracted in separate analyses. Some have argued against using factor scores in SEM/MLM; what would you recommend, and why?
I can't seem to find how to obtain information functions for factor scores, but since it's a plot, I'm guessing one could use:
PLOT: TYPE = PLOT3;
And just to clarify for those deciding between MLM and SEM for LGM:
- Wide-data analyses can handle missing data with FIML, which is more efficient (but not necessarily more effective) than multiple imputation (cf https://goo.gl/x0r02s). This would be another reason for using a wide-data format, since FIML with long data does not account for the clustering within subjects. Note that Mplus is one of several programs that offer FIML; others include lavaan and OpenMx in R.
- One can run an MLM (or 'two-level' in Mplus) with wide data; you need not be constrained to long data. However, a two-level growth model with time at the within level and subject at the between level can be more computationally efficient.
- I hope I'm not seen as a traitor here, but Stata has a nice command (reshape) for converting data from wide to long and vice versa, so you can try both approaches in Mplus!
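If Stata is not at hand, the wide/long conversion mentioned above can be sketched in a few lines of plain Python. This is only an illustration: the column-naming pattern (a stub such as "y" followed by the timepoint, e.g. "y1") and the function names are assumptions, not anything taken from Mplus or Stata.

```python
def wide_to_long(rows, id_key, stubs, timepoints):
    """Turn one record per subject into one record per subject-by-timepoint.

    Assumes (hypothetically) that wide columns are named '<stub><timepoint>',
    e.g. 'y1', 'y2'. Missing occasions are kept as None.
    """
    long_rows = []
    for row in rows:
        for t in timepoints:
            rec = {id_key: row[id_key], "time": t}
            for stub in stubs:
                rec[stub] = row.get(f"{stub}{t}")
            long_rows.append(rec)
    return long_rows


def long_to_wide(long_rows, id_key, stubs):
    """Inverse of wide_to_long: collapse back to one record per subject."""
    wide = {}
    for rec in long_rows:
        row = wide.setdefault(rec[id_key], {id_key: rec[id_key]})
        for stub in stubs:
            row[f"{stub}{rec['time']}"] = rec[stub]
    return list(wide.values())


# Made-up example data: two subjects, outcome y at four timepoints.
wide = [
    {"id": 1, "y1": 2.0, "y2": 2.5, "y3": 3.1, "y4": 3.8},
    {"id": 2, "y1": 1.5, "y2": None, "y3": 2.2, "y4": None},
]
long = wide_to_long(wide, "id", ["y"], [1, 2, 3, 4])
```

Here `long` has one row per subject per timepoint (eight rows), and `long_to_wide(long, "id", ["y"])` recovers the original wide records.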
Just to continue the discussion about using a wide or long data format:
- A long data format can be less computationally taxing. For instance, imagine estimating a bifactor CFA with 3 specific factors and 1 general factor at four timepoints, and then some latent growth factors for each CFA factor. With a wide data format, you would have to estimate 24 latent factors (i.e., 4 CFA factors by 4 timepoints = 16, plus 4 intercepts and 4 slopes for the latent growth factors). However, with the long data format, you would only need to estimate the bifactor CFA once and then regress each factor on time (and perhaps estimate random slopes for each).
- With the long data format, you can use all available data points without estimating the missing data. This is useful in an intent-to-treat analysis, where you use all available data once subjects have been randomized, regardless of whether they completed the treatment early, withdrew, were non-compliant, etc. The wide data format with an ML estimator 'fills in' the missing data points based on other data for that subject by default, but you may not want to estimate the missing data.
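The two points above can be checked with a small Python sketch. The function names and the toy records are hypothetical; the first part simply reproduces the factor count from the bifactor example, and the second shows what "using all available data points" looks like in long format, where unobserved occasions are dropped rather than estimated.

```python
def wide_latent_count(n_cfa_factors, n_timepoints, growth_factors_per_cfa=2):
    """Latent variables in the wide specification: one copy of each CFA
    factor per timepoint, plus an intercept and a slope per CFA factor."""
    measurement = n_cfa_factors * n_timepoints
    growth = n_cfa_factors * growth_factors_per_cfa
    return measurement + growth


n_cfa = 3 + 1                      # 3 specific factors + 1 general factor
wide_total = wide_latent_count(n_cfa, n_timepoints=4)
long_total = n_cfa                 # CFA estimated once (random slopes not counted)


def available_rows(long_rows, outcome):
    """Keep only the occasions that were actually observed."""
    return [r for r in long_rows if r[outcome] is not None]


# Made-up long data: subject 2 missed timepoints 2 and 4.
long_rows = [
    {"id": 1, "time": 1, "y": 2.0}, {"id": 1, "time": 2, "y": 2.5},
    {"id": 1, "time": 3, "y": 3.1}, {"id": 1, "time": 4, "y": 3.8},
    {"id": 2, "time": 1, "y": 1.5}, {"id": 2, "time": 2, "y": None},
    {"id": 2, "time": 3, "y": 2.2}, {"id": 2, "time": 4, "y": None},
]
observed = available_rows(long_rows, "y")
```

With these inputs, `wide_total` is 24 against `long_total` of 4, and `observed` keeps six of the eight rows, so subject 2 still contributes two occasions to the analysis.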