

MLM v SEM for Growth Models & using F... 



I plan to run some moderated/mediated latent growth curve models. The following variables were collected for 2000 subjects:

- A latent outcome (Y) measured at 4 timepoints (TP) and derived from a bifactor model of 4 factors
- 3 latent predictors (P): 2 derived from a scale measured at 4 TPs and 1 derived from a bifactor model of 6 factors estimated at TP1 (time-invariant)

Example growth models include:

- The slopes of the 2 time-varying Ps predicting the random slope of Y, moderated by the time-invariant P
- A direct path from the time-invariant P to the random slope of Y, and indirect paths via the 2 time-varying Ps

Q1: These growth models can be implemented in an SEM framework with a wide data structure, or in an MLM framework with a long data structure. Which approach would you recommend and why (besides the difference in missing data handling)?

Q2: My models appear computationally demanding (e.g., deriving predictors and outcomes from bifactor models, estimating second-order growth factors, including random slopes). Rather than estimating the latent outcome and predictor variables and the growth factors together, I could run the growth models on factor scores of the predictors and outcomes extracted in separate analyses. Some have argued against using factor scores for SEM/MLM, and I wondered what you would recommend and why?
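
For readers following along, the first example model can be written out as a rough sketch. The notation is an assumption for illustration and not the poster's own: $\eta_{0i}$ and $\eta_{1i}$ are a subject's random intercept and slope, $\lambda_t$ is the time score at occasion $t$, and $P_{3i}$ is the time-invariant predictor. The product terms carry the moderation.

```latex
% Growth process for the outcome (the two time-varying predictors follow the same form)
Y_{ti} = \eta^{Y}_{0i} + \eta^{Y}_{1i}\,\lambda_t + \varepsilon_{ti}

% Slope of Y regressed on the predictor slopes, moderated by the time-invariant P
\eta^{Y}_{1i} = \alpha
  + \beta_1\,\eta^{P_1}_{1i} + \beta_2\,\eta^{P_2}_{1i} + \beta_3\,P_{3i}
  + \beta_4\,\eta^{P_1}_{1i} P_{3i} + \beta_5\,\eta^{P_2}_{1i} P_{3i}
  + \zeta_i
```

In the second (mediation) example, the product terms would be dropped, each predictor slope would be regressed on $P_{3i}$ with coefficient $a_k$, and the indirect effect through predictor $k$ would be the product $a_k \beta_k$.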


Q1: Wide format enables measurement invariance testing.

Q2: Don't use factor scores if you don't have to. See our FAQ: Factor scores


Thank you for answering those questions, Bengt. I can't seem to find how to acquire information functions for factor scores, but since it's a plot, I'm guessing one could use: PLOT: TYPE = PLOT3;

And just to clarify for those deciding between MLM and SEM for LGM:

- Wide-data analyses can handle missing data with FIML, which is more efficient (but not necessarily more effective) than multiple imputation (cf. https://goo.gl/x0r02s). This would be another reason for using a wide-data format, since FIML with long data does not account for the clustering within subjects. Note that Mplus is one of the few programs that offers FIML.
- One can run an MLM (or 'twolevel' in Mplus) with wide data; you need not be constrained to long data. However, a two-level growth model with time at the within level and subject at the between level can be more computationally efficient.
- I hope I'm not seen as a traitor here, but Stata has a nice function for converting data from wide to long and vice versa, so you can try both approaches in Mplus!


I just discovered that Mplus has a neat long-to-wide and wide-to-long conversion function (see UG for Mplus 8.0, Chapter 15, pp. 580-586). Please excuse the treachery in the last comment!
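
For anyone who wants to see what the wide/long conversion actually does to the data, here is a minimal sketch in plain Python. It is not the Mplus or Stata implementation; the function names, the variable names (`id`, `y1` to `y4`, `time`) and the toy values are all made up for illustration.

```python
def wide_to_long(rows, id_key, wide_keys, long_name, time_name="time"):
    """Turn one record per subject into one record per subject-occasion."""
    long_rows = []
    for row in rows:
        for occasion, key in enumerate(wide_keys, start=1):
            long_rows.append(
                {id_key: row[id_key], time_name: occasion, long_name: row.get(key)}
            )
    return long_rows


def long_to_wide(long_rows, id_key, wide_keys, long_name, time_name="time"):
    """Turn one record per subject-occasion back into one record per subject."""
    wide = {}
    for rec in long_rows:
        # Start each subject with every occasion missing, then fill in what exists.
        subject = wide.setdefault(
            rec[id_key], {id_key: rec[id_key], **{k: None for k in wide_keys}}
        )
        subject[wide_keys[rec[time_name] - 1]] = rec[long_name]
    return list(wide.values())


# Hypothetical toy data: two subjects, four timepoints, None marks a missing wave.
wide_data = [
    {"id": 1, "y1": 2.0, "y2": 2.5, "y3": 3.1, "y4": 3.8},
    {"id": 2, "y1": 1.4, "y2": None, "y3": 2.2, "y4": None},
]
occasions = ["y1", "y2", "y3", "y4"]

long_data = wide_to_long(wide_data, "id", occasions, "y")
round_trip = long_to_wide(long_data, "id", occasions, "y")
```

Each subject contributes one row per timepoint in the long layout, and converting back recovers the original wide records, so nothing is lost by switching layouts to try both approaches.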


Just to continue the discussion about using a wide or long data format:

- A long data format can be less computationally taxing. For instance, imagine estimating a bifactor CFA with 3 specific factors and 1 general factor at four timepoints, and then some latent growth factors for each CFA factor. With a wide data format, you would have to estimate 24 latent factors (i.e., 4 CFA factors by 4 timepoints, plus 4 intercepts and 4 slopes for the latent growth factors). With the long data format, however, you would only need to estimate the bifactor CFA once and then regress each factor on time (and perhaps estimate random slopes for each).
- With the long data format, you can use all available data points without estimating the missing data. This is useful in an intent-to-treat analysis, where you use all available data once subjects have been randomized, regardless of whether they completed the treatment earlier, withdrew, were noncompliant, etc. The wide data format with an ML estimator 'fills in' the missing data points based on other data for that subject by default, but you may not want to estimate the missing data.

For more advantages of the long data format, I would strongly recommend reading a paper by Kwok et al. (2008) (see https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2613314/#R27)
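
The factor count in the bifactor example can be checked with a few lines of Python. This is only a bookkeeping sketch: the function name is hypothetical, and it assumes two growth factors (an intercept and a slope) per CFA factor, as in the example.

```python
def count_wide_latent_factors(n_cfa_factors, n_timepoints, growth_factors_per_process=2):
    """Latent factors needed when every timepoint gets its own copy of the CFA."""
    measurement_factors = n_cfa_factors * n_timepoints            # one CFA per timepoint
    growth_factors = n_cfa_factors * growth_factors_per_process   # intercept + slope each
    return measurement_factors + growth_factors


N_CFA_FACTORS = 4   # 3 specific factors + 1 general factor
N_TIMEPOINTS = 4

wide_total = count_wide_latent_factors(N_CFA_FACTORS, N_TIMEPOINTS)

# In the long layout the CFA is specified once; any random intercepts and
# slopes for time are extra and are not counted here.
long_measurement_factors = N_CFA_FACTORS

print(wide_total)                # 24
print(long_measurement_factors)  # 4
```

The wide total grows with the number of timepoints, while the long measurement model stays at the size of a single CFA.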


Regarding the CFA example, one drawback of the long two-level format is that you have to assume measurement invariance across time, whereas this can be tested in the wide format.


This is a fantastic thread.

Matthew: regressing each factor (or information function) on time establishes your curves, and random slopes around those curves would allow for predicting, say, group (e.g., depressed, conduct DO) differences on those curves. This is my understanding. Is this your understanding?

Bengt: If the MLM and SEM approaches should be mathematically the same (in most cases), could MI not be inferred if the estimates that support configural, metric, and scalar invariance over time in the SEM framework are equivalent to the L1 fixed effects in MLM for the factor structure (i.e., the average factor structure parameter estimates across individuals over time)?


^re above: just realized that my query regarding MI doesn't help if one only wants to run a model using one approach, namely MLM. 


Right, if computationally you could do SEM-wide invariance testing and get MI confirmed, then you can follow with long analysis. See also the new possibilities in long format that allow some measurement changes over time using cross-classified analysis. This is described both in my SMEP talk and in our Topic 10 Short Course video and handout.


