The assumption is normality conditional on the covariates. For a factor model with binary and continuous indicators and no covariates, the assumption is met if the factors are normal and the residuals of the continuous indicators are normal. If the continuous indicators are non-normal, chi-square tests and standard errors that are robust to non-normality are useful. Finite mixture modeling with latent classes can also be used to capture non-normality.
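The point that normality is required only conditional on the covariates can be illustrated with a small simulation. This is a sketch under made-up assumptions (a binary covariate present for about 15% of cases, an effect of 3.0, standard normal residuals, an arbitrary seed), not a factor model: the outcome is clearly skewed marginally, yet its residuals given the covariate are normal, which is all the model asks for.

```python
import random

def sample_skewness(values):
    """Moment-based sample skewness: m3 / m2**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

random.seed(1)
n = 20_000
# Hypothetical setup: binary covariate for ~15% of cases, effect 3.0, N(0, 1) residuals.
x = [1 if random.random() < 0.15 else 0 for _ in range(n)]
y = [3.0 * xi + random.gauss(0.0, 1.0) for xi in x]

# With a single binary covariate, the fitted values are simply the two group means.
group0 = [yi for yi, xi in zip(y, x) if xi == 0]
group1 = [yi for yi, xi in zip(y, x) if xi == 1]
mean0 = sum(group0) / len(group0)
mean1 = sum(group1) / len(group1)
residuals = [yi - (mean1 if xi == 1 else mean0) for yi, xi in zip(y, x)]

marginal_skew = sample_skewness(y)
conditional_skew = sample_skewness(residuals)
print(f"skewness of y:          {marginal_skew:.2f}")
print(f"skewness of y given x:  {conditional_skew:.2f}")
```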
I plan an LGMM on my data T1-T5. However, especially at the first measurements my outcome is moderately skewed (1.3-1.5); later on it improves (<1.0). Reviewers often argue, in line with Bauer et al., that non-normal data can easily lead to the retention of spurious latent classes. How can one argue against that, given that one does have non-normal data, as I do? Is it necessary to address non-normality, e.g. by applying a transformation, and would that help solve the problem raised by Bauer and others? I'm always a little reluctant to use transformations because of the loss of interpretability.
To start with your last question, I don’t think you should transform your data. Mixture modeling is designed to capture non-normal outcomes and you would lose information on substantively important latent classes that are potentially present.
Regarding the Bauer-related argument that you mention, it is correct that non-normality of an outcome can be represented by latent classes and in some cases those latent classes have no substantive meaning or usefulness. Having BIC point to more than one class does not imply that the classes are substantively meaningful. So that is a good warning.
Having said that, let me deviate from the rule against multiple postings on the same topic and expand on my answer in my next post.
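The Bauer point itself is easy to reproduce. A minimal sketch, assuming a single homogeneous population with a lognormal outcome (a hypothetical choice of distribution, sample size and seed) and using a plain EM algorithm written out by hand rather than any particular package: BIC prefers a two-class normal mixture even though no latent classes exist in the generating model.

```python
import math
import random

def normal_logpdf(x, mu, var):
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mu) ** 2 / var)

def fit_one_class(xs):
    """ML fit of a single normal distribution: (log-likelihood, #parameters)."""
    n = len(xs)
    mu = sum(xs) / n
    var = sum((x - mu) ** 2 for x in xs) / n
    return sum(normal_logpdf(x, mu, var) for x in xs), 2

def fit_two_class(xs, max_iter=500, tol=1e-6):
    """EM for a two-class normal mixture with class-specific means and variances."""
    n = len(xs)
    ordered = sorted(xs)
    lo, hi = ordered[: n // 2], ordered[n // 2:]
    # Start values from a median split; w is the proportion in class 2.
    w = 0.5
    mu1, mu2 = sum(lo) / len(lo), sum(hi) / len(hi)
    var1 = sum((x - mu1) ** 2 for x in lo) / len(lo)
    var2 = sum((x - mu2) ** 2 for x in hi) / len(hi)
    prev = -math.inf
    for _ in range(max_iter):
        # E-step: posterior probability of class 2, and the log-likelihood.
        post, loglik = [], 0.0
        for x in xs:
            a = math.log(1.0 - w) + normal_logpdf(x, mu1, var1)
            b = math.log(w) + normal_logpdf(x, mu2, var2)
            m = max(a, b)
            total = m + math.log(math.exp(a - m) + math.exp(b - m))
            loglik += total
            post.append(math.exp(b - total))
        if loglik - prev < tol:
            break
        prev = loglik
        # M-step: posterior-weighted proportion, means and variances.
        n2 = sum(post)
        n1 = n - n2
        w = n2 / n
        mu1 = sum((1.0 - p) * x for p, x in zip(post, xs)) / n1
        mu2 = sum(p * x for p, x in zip(post, xs)) / n2
        var1 = max(sum((1.0 - p) * (x - mu1) ** 2 for p, x in zip(post, xs)) / n1, 1e-8)
        var2 = max(sum(p * (x - mu2) ** 2 for p, x in zip(post, xs)) / n2, 1e-8)
    return loglik, 5

def bic(loglik, n_params, n):
    return -2.0 * loglik + n_params * math.log(n)

random.seed(2)
# One homogeneous population with a right-skewed (lognormal) outcome: no true classes.
xs = [random.lognormvariate(0.0, 0.5) for _ in range(1000)]

ll1, k1 = fit_one_class(xs)
ll2, k2 = fit_two_class(xs)
bic1, bic2 = bic(ll1, k1, len(xs)), bic(ll2, k2, len(xs))
print(f"BIC, 1 class:   {bic1:.1f}")
print(f"BIC, 2 classes: {bic2:.1f}")
```

A lower BIC here only says that two normal components approximate the skewed density better than one; it says nothing about whether the components are meaningful subgroups.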
I do, however, think the Bauer warning has been overstated and may lead some researchers to stay with regular single-class growth modeling, with potentially distorted results. It should be self-evident that it is the researcher's responsibility to argue for the substantive meaningfulness of the classes. And this is more easily done with GMM applied to longitudinal data such as yours than with cross-sectional data. I have argued in several articles, the first two listed below, that the statistical analysis must be augmented by substantive/theoretical arguments and/or auxiliary variable information in order to claim that the classes are meaningful. One example is placebo response in antidepressant trials. When I find two classes in the placebo group, with a flat non-response class curve and a response class whose curve has the shape predicted by the ample theory and substantive reasoning on placebo response in depression, then the two classes most likely have substantive meaning. Their existence generates the non-normality of the depression outcomes. Another example is math achievement, where I include a predictive validity aspect in the model, namely how sharply the failing math trajectory class differs in its proportion of later high school dropout. Both examples, included in the Muthen-Asparouhov (2008) chapter below, show how to avoid frivolously imposing a substantive interpretation or simply assuming that the classes are meaningful.
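Why a single-class analysis can distort results can be seen with made-up numbers. A minimal sketch, assuming two placebo-group classes with hypothetical mean depression scores (a flat non-response curve and a response curve with an early drop) and an assumed 40% responders: the single-class mean curve is a moderate decline whose shape matches neither class.

```python
# Hypothetical mean depression scores at weeks 0-4 for two placebo-group classes.
weeks = [0, 1, 2, 3, 4]
p_responder = 0.4                                # assumed proportion in the response class
non_responders = [24.0, 24.0, 24.0, 24.0, 24.0]  # flat curve
responders = [24.0, 18.0, 14.0, 12.0, 12.0]      # early drop, then leveling off

# The single-class mean curve is the proportion-weighted average of the class curves.
overall = [
    (1.0 - p_responder) * a + p_responder * b
    for a, b in zip(non_responders, responders)
]

for week, a, b, c in zip(weeks, non_responders, responders, overall):
    print(f"week {week}: non-response {a:5.1f}  response {b:5.1f}  single-class mean {c:5.1f}")
```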
To relate this to your specific question, the skewness you see at your first measurement points that subsequently vanishes could have a 2-class explanation. A subgroup of individuals may have high engagement already from the start, and perhaps stay high or increase more over time than a second subgroup, resulting in a qualitatively different trajectory shape. In that case, the presence of more than one class generates the skewness. If you have a theory for why there would be two or more classes, then the skewness you observe is what you would expect. But that argument has to be supported by substantive theory and/or auxiliary variables. If you lack that, you may want to design a new study to shed more light on your exploratory findings.
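The 2-class explanation can be checked with a little arithmetic. For two normal classes with proportions w_low and w_high, a mean gap d (high minus low) and a common within-class variance s_t^2, the skewness of the mixed distribution is w_low * w_high * (w_low - w_high) * d^3 / (s_t^2 + w_low * w_high * d^2)^(3/2). In a growth model the within-class variance is s_t^2 = psi_I + t^2 * psi_S + theta (assuming zero intercept-slope covariance), so it grows over time. A sketch with hypothetical values, chosen only to resemble the skewness range in the question: a small class that starts high and stays high (parallel curves) yields skewness that fades as within-class spread grows.

```python
def mixture_skewness(w_high, gap, within_var):
    """Skewness of a two-class normal mixture with a common within-class variance.

    w_high: proportion in the high class; gap: high-class mean minus low-class mean.
    """
    w_low = 1.0 - w_high
    third_moment = w_low * w_high * (w_low - w_high) * gap ** 3
    total_var = within_var + w_low * w_high * gap ** 2
    return third_moment / total_var ** 1.5

# Hypothetical values, times T1..T5 coded t = 0..4.
w_high = 0.15        # proportion already high at the start
gap = 4.0            # equal slope means: the high class stays high (parallel curves)
psi_intercept = 0.3  # within-class intercept variance
psi_slope = 0.3      # within-class slope variance (intercept-slope covariance set to 0)
theta = 0.2          # residual variance

skews = []
for t in range(5):
    within_var = psi_intercept + t ** 2 * psi_slope + theta
    skews.append(mixture_skewness(w_high, gap, within_var))
    print(f"T{t + 1}: skewness {skews[-1]:.2f}")
```

The same class structure thus produces strong skewness early and mild skewness later, without any change in class membership.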
Muthen, B. (2003). Statistical and substantive checking in growth mixture modeling. Psychological Methods, 8, 369-377.
Muthen, B. (2004). Latent variable analysis: Growth mixture modeling and related techniques for longitudinal data. In D. Kaplan (ed.), Handbook of quantitative methodology for the social sciences (pp. 345-368). Newbury Park, CA: Sage Publications.
Muthen, B. & Asparouhov, T. (2008). Growth mixture modeling: Analysis with non-Gaussian random effects. In Fitzmaurice, G., Davidian, M., Verbeke, G. & Molenberghs, G. (eds.), Longitudinal Data Analysis, pp. 143-165. Boca Raton: Chapman & Hall/CRC Press.
Finally, an additional article is useful in contrasting regular single-class growth modeling ("LGM" or "HLM"), Nagin's LCGA, and the GMM we are discussing. All articles mentioned are on the Mplus web site under Papers, Growth Mixture Modeling.
Kreuter, F. & Muthen, B. (2008). Analyzing criminal trajectory profiles: Bridging multilevel and group-based approaches using growth mixture modeling. Journal of Quantitative Criminology, 24, 1-31.
If I may add to this very interesting post (thank you very much for this, Dr Muthén; it will hopefully help clarify things "out there"), I would say that the Bauer issue is important but confusing, since it creates a "which comes first" dilemma: 1) Are spurious latent classes appearing in order to portray non-normality in the data, OR 2) Are the data non-normal because latent classes exist? For the moment, I don't think there is any way to resolve this other than relying on theory. And if I'm wrong here, it would be nice to know :-)
I think you can do more than rely on theory to gauge if your classes are substantively meaningful - and practically useful, which is another important matter. We can build our theoretical predictions into the model as background variables, concurrent variables, and distal variables.
The math example illustrates that. Dropout theory in education describes a group of students who are "disengaged" from school life and are therefore more likely to drop out. Based on that theory, we can add the variable dropout to our GMM, with dropout predicted by latent class membership. Poorly developing mathematics performance can be seen as an indicator of disengagement. If membership in the poorly developing class goes with a much higher dropout percentage, then that is a form of predictive validity for the latent class formation. Wouldn't it be great to be able to predict high school dropout early? This is from my 2004 chapter listed earlier.
So build models based on your theories to see if the theories hold up.
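The predictive-validity idea in the dropout example amounts to comparing a distal outcome across classes. The approach described above puts dropout into the GMM itself; as a rough descriptive stand-in only, the sketch below weights each student by a hypothetical posterior class probability and computes a dropout proportion per class from toy data. Such after-the-fact tabulations are only an approximation and no substitute for including dropout in the model.

```python
# Hypothetical toy output: each student's posterior probability of belonging to the
# poorly developing math class, and later high school dropout (1 = dropped out).
posterior_failing = [0.95, 0.90, 0.85, 0.80, 0.10, 0.05, 0.05, 0.02, 0.15, 0.08]
dropout = [1, 1, 0, 1, 0, 0, 0, 0, 1, 0]

def weighted_rate(weights, outcome):
    """Dropout proportion with each student weighted by a class probability."""
    return sum(w * y for w, y in zip(weights, outcome)) / sum(weights)

rate_failing = weighted_rate(posterior_failing, dropout)
rate_other = weighted_rate([1.0 - p for p in posterior_failing], dropout)
print(f"dropout, failing class: {rate_failing:.2f}")
print(f"dropout, other class:   {rate_other:.2f}")
```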
All models come with assumptions that you have to be aware of and challenge based on your substantive theory and data exploration. In the context of binary growth modeling, the standard (single-class) model assumes normality for the random effects (the growth factors). This is the case in all multilevel software. That assumption can be challenged, but the normal model is probably a good baseline - and it is used all the time. It may be wrong, since a skewed distribution may better represent the subject matter, but hopefully using the normal model does not lead you too far astray.
The same holds for growth mixture modeling of binary (and continuous) outcomes, where the assumption is within-class normality of the random effects. The subject matter may be better portrayed by some type of within-class non-normality, or by a single class with a non-normal distribution. Still, doing the analyses in conjunction with substantive theory and data exploration using auxiliary information will hopefully lead you to learn something useful.
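To make the standard single-class binary growth model concrete, here is a sketch that computes the population-average probability implied by a logistic model with a normal random intercept, by numerical integration over the random effect. For illustration only, it also swaps in a right-skewed random intercept (an assumed standardized lognormal with the same mean 0 and variance 1) while holding the hypothetical fixed effects beta0 = -1.0 and beta1 = 0.4 constant. It uses a random intercept only (no random slope) and compares implied probabilities, not estimates from a fitted, misspecified model.

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

# Midpoint rule for U ~ N(0, 1), truncated to [-8, 8] (negligible tail mass).
N_POINTS = 4000
STEP = 16.0 / N_POINTS
GRID = [-8.0 + (k + 0.5) * STEP for k in range(N_POINTS)]
WEIGHTS = [STEP * math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi) for u in GRID]

def expect(f):
    """Approximate E[f(U)] for U ~ N(0, 1)."""
    return sum(w * f(u) for u, w in zip(GRID, WEIGHTS))

# Assumed skewed alternative: a lognormal(0, SIGMA) variable standardized
# to mean 0 and variance 1 (skewness about 1.75 for SIGMA = 0.5).
SIGMA = 0.5
LN_MEAN = math.exp(SIGMA ** 2 / 2.0)
LN_SD = math.sqrt((math.exp(SIGMA ** 2) - 1.0) * math.exp(SIGMA ** 2))

def normal_intercept(u):
    return u

def skewed_intercept(u):
    return (math.exp(SIGMA * u) - LN_MEAN) / LN_SD

# Hypothetical fixed effects: logit P(y_t = 1 | eta) = beta0 + beta1 * t + eta.
beta0, beta1 = -1.0, 0.4
p_normal, p_skewed = [], []
for t in range(5):
    lin = beta0 + beta1 * t
    p_normal.append(expect(lambda u: logistic(lin + normal_intercept(u))))
    p_skewed.append(expect(lambda u: logistic(lin + skewed_intercept(u))))
    print(f"t={t}: P(y=1) normal {p_normal[-1]:.3f}  skewed {p_skewed[-1]:.3f}")
```

Because the fixed effects, mean and variance are held equal, any gap between the two columns reflects only the shape of the random-effect distribution.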