

Hi, I would like to ask whether it is possible to include a latent class variable in a multilevel model. For instance, in school effectiveness research, some characteristics could possibly distinguish 5 classes of schools. Would it be possible to use this latent class variable to predict student outcomes at the lowest level, or is this impossible? If it is possible, are there references to check for more details about this option? Thank you for the information.


Version 3 of Mplus can estimate multilevel mixture models. The Version 3 User's Guide will describe how to set these models up. 

shige posted on Monday, March 29, 2004  6:32 pm



Hi Linda, In order to estimate a multilevel mixture model (that is, a multilevel model with a finite mixture random component), I need the base package, the multilevel add-on, and the mixture add-on. Am I correct about this? Thanks! Shige


You would need the combination add-on to estimate a multilevel mixture model.

Joyce T. posted on Monday, April 04, 2005  6:36 am



I am running a multilevel model (using ML) which contains 20 dependent variables, 12 independent variables, and 3 continuous latent variables. I would like to know how Mplus computes the degrees of freedom for both the chi-square test of model fit and the chi-square test of model fit for the baseline model. Thanks.

BMuthen posted on Wednesday, April 06, 2005  3:03 am



The degrees of freedom are the number of parameters in the H1 model minus the number of parameters in the H0 model. The chi-square test of model fit for ML uses as H1 a model with free means, and free variances and covariances for both within and between. The baseline model is a model of free means and variances for between and within.
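The counting rule above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions that are not in the post: every outcome varies on both levels and there are no covariates. The function names are hypothetical, and the parameter counts Mplus reports for a model with covariates or latent variables may differ.

```python
def h1_parameter_count(p):
    """Free parameters in an unrestricted two-level H1 model for p outcomes.

    Assumption: p means, plus a full covariance matrix on within and on between.
    """
    covariances_per_level = p * (p + 1) // 2
    return p + 2 * covariances_per_level


def baseline_parameter_count(p):
    """Free parameters in the baseline model: p means, plus p variances per level."""
    return p + 2 * p


def degrees_of_freedom(h1_params, h0_params):
    """Degrees of freedom as H1 parameters minus H0 parameters."""
    return h1_params - h0_params


p = 4  # hypothetical number of outcomes
h1 = h1_parameter_count(p)
baseline = baseline_parameter_count(p)
baseline_df = degrees_of_freedom(h1, baseline)
print(h1, baseline, baseline_df)
```

For the model test itself, the same subtraction applies with the number of free parameters of the fitted H0 model in place of the baseline count.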


I am working on a multilevel LCA in a school-based data set. The variables of interest are 1) parent involvement (a child-level variable) and 2) classroom quality (a classroom-level variable). The ultimate goal is to use the latent classes as independent variables to predict child outcomes such as school readiness.
1) One question that has come up in our discussions is how to deal with important covariates (such as maternal education, child age, child sex, child ethnicity). I was wondering if you could speak to the differences between a) including the covariates in the model that estimates the latent class memberships vs. b) running post-hoc ANOVAs to examine the distribution of the profiles on these important covariates.
2) Also, I am interested in how using the latent classes as independent variables to predict other outcomes would shift class memberships. Does this often happen when adding a predictive step into LCA/LPA models?
3) Finally, we are using a national database with sampling weights. Will weighting the data influence the LCA/LPA outcomes?
Thank you so much for your time.


1. Generally it is best to estimate the full model simultaneously. See the following paper, which is available on the website, for further information: Relating latent class analysis results to variables not included in the analysis. Submitted for publication.
2. If you add outcomes other than the original set, this will most likely change class membership, and perhaps it should. See the following related paper, which is available on the website: Muthén, B. (2004). Latent variable analysis: Growth mixture modeling and related techniques for longitudinal data. In D. Kaplan (ed.), Handbook of quantitative methodology for the social sciences (pp. 345-368). Newbury Park, CA: Sage Publications.
3. You can and should include complex survey data features in the analysis. See the user's guide under complex survey data to see the options available in Mplus.


Thank you for your quick reply and for the references. Best, AnnMarie 


Hello Linda I have two follow up questions for the multilevel LCA we are working on. 1) Standardized Scores: Do you suggest running the models with standardized continuous indicators? Or is it acceptable to keep the indicators of profiles in their original metric (even if variances are different among indicators?) 2) Predicting Outcomes from Profile Membership: Also, for estimating the relationship between different profiles and an outcome (say literacy achievement) we have been including the outcome of interest as an indicator of profile membership. We then ran a Wald statistic to examine profile differences on the mean estimates of the outcome. Is this how you would suggest estimating profile differences on an outcome? Thank you AnnMarie 


1. I would not standardize. 2. Yes. 


I have a question regarding a multilevel LCA too. I am using a sample of twins in which I would like to identify the genetic-environmental etiology of class membership. I also have predictors at the within and the between level. Since regressing on the measured environmental variable doesn't seem to modify the ACE results (as seen in Turkheimer et al., 2005), I would like to use another strategy, developed by Rasbash, O'Connor and Jenkins, in which the genetic resemblance is a fixed effect. Although I think this strategy is the best, I'm not sure how to express the equation in an Mplus input. The equation is: y(ij) = Beta(0) + u(j) + e(ij) + g(ij). Basically, the only thing that changes from a regular multilevel model is g(ij), which is the genetic effect for child i in the j'th family and which varies across individuals according to behavior genetic assumptions (it can be used with complex family pedigrees). I am wondering what to write in the input. I think they use a single group for the analyses, which departs from the multiple-group analyses I am used to with twin samples. Maybe the paper would make it clearer: http://www.cmm.bristol.ac.uk/publications/jrgenetics.pdf The equation is on page 8.


Page 8, bottom, suggests that a covariance for the g(ij) term is a function of known constants. This reminds me of "QTL" modeling, which is shown in UG ex 5.23. This UG example shows how to use the Constraint= approach to moderate a covariance using read-in values. Perhaps that is a path towards doing what you want.


In running the multilevel LPA I mentioned earlier, we see from the output that the means of the indicators in the level 2 profiles are constrained to be equal across profiles. It appears that this is the default in Mplus. Is it possible to simultaneously estimate a latent profile of level 1 (child-level variables) and a latent profile of level 2 (classroom-level variables) without the level 2 means being fixed across level 2 profiles? I attempted to override this with starting values, but am getting repeated error messages:
    The following MODEL statements are ignored:
    * Statements in Class %CB#1% of MODEL CB on the BETWEEN level:
      SSCS98 LSCS98 ECPERSS98 ECFURNS98 ECLANGS98 ECMOTRS98 ECCREAS98 ECSOCLS98 LTARNS98 INTERS
    * Statements in Class %CB#2% of MODEL CB on the BETWEEN level:
      SSCS98 LSCS98 ECPERSS98 ECFURNS98 ECLANGS98 ECMOTRS98 ECCREAS98 ECSOCLS98 INTERS
    *** ERROR
    One or more MODEL statements were ignored. These statements may be incorrect.
Do you have any suggestions? Thank you again, AnnMarie


Please send the full output and your license number to support@statmodel.com. 


Dear Dr. Muthén, I want to perform LCA on a complex dataset (teachers were rating students), and want to control for clustering effect. However I cannot define Type=complex, since this is already done with Type=mixture. How can I use the clustering or stratification options in this type of analysis? Thanks a lot. Robert 


You can use TYPE = MIXTURE COMPLEX; 


I am working on a two-level LCA where the variables of interest include both child-level (observed child interactions) and class-level variables (classroom quality). I have specified two classes at each level. I would like to see if the resulting four profiles differ on child-level school readiness outcomes. Is there a way to get an outcome mean for each profile? I am only able to get two means (one for each of the within classes) rather than four means (one for each of the four profiles). Thank you!


Take a look at the 2010 Henry-Muthen article in the SEM journal. For the models of figures 1-4, the between classes only make the within classes more or less likely but don't change the profiles of the observed items. In contrast, for figure 5 and on there are item-specific differences across the between-level classes, so that would give the profile differences you expect.

K Frampton posted on Wednesday, March 16, 2011  11:23 am



Hello, I am running a multilevel LPA with continuous parenting indicators, with children nested within families. My goal is to identify parenting profiles, observe how they differ across various factors (mainly SES), and then use profile membership to predict a distal outcome (children's prosocial skills) in interaction with SES. I first fit the model as a single-level model, and then as a multilevel model. 4 classes were identified. Entropy is .86. I then added covariates of interest (e.g., age of child, SES variables) to identify what distinguishes these groups. When I do this, the structure of the classes changes significantly. I know this is because of measurement variance issues. When I regress parenting indicators on covariate(s) in the single-level model, it improves the fit of the model. However, when I do the same thing in the multilevel model, computation time was more than 2 hours, and it did not converge. Any suggestions on how to get around this measurement variance issue in a multilevel model? Because entropy is high, is it feasible in a multilevel framework to save the classes identified and then work with them as an observed variable, as you might do in a single-level model? Also, with all this in mind, how would you suggest answering my final question: how SES X profile predicts a distal outcome? I am grateful for any recommendations. Thanks. Kristen


It sounds like you have measurement noninvariance and that you add direct effects from covariates to indicators to take this into account. Note that you cannot identify a model with all direct effects. To see your multilevel problem, you would have to send your input, output, data, and license number to support@statmodel.com. SES X profile influencing a distal outcome can be handled by regressing the distal outcome on SES with different slopes in the different profiles.
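To make the last point concrete, here is a minimal Python sketch of what "different slopes in the different profiles" means numerically. The intercepts and slopes are made-up illustration values, not estimates from any model, and the names are hypothetical. The difference between the profile-specific slopes plays the role of the SES X profile interaction.

```python
# Hypothetical profile-specific regression parameters for the distal outcome on SES.
class_params = {
    "profile_1": {"intercept": 2.0, "slope": 0.10},
    "profile_2": {"intercept": 2.5, "slope": 0.40},
}


def expected_distal(profile, ses):
    """Expected distal outcome for a given profile and SES value."""
    params = class_params[profile]
    return params["intercept"] + params["slope"] * ses


def slope_difference(profile_a, profile_b):
    """Difference in SES slopes between two profiles (the interaction effect)."""
    return class_params[profile_a]["slope"] - class_params[profile_b]["slope"]


for profile in class_params:
    print(profile, expected_distal(profile, ses=1.0))
print("slope difference:", slope_difference("profile_2", "profile_1"))
```

If the slopes were equal across profiles, the slope difference would be zero and SES would have the same effect on the distal outcome in every profile.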

IYH Boon posted on Monday, June 27, 2011  12:58 pm



Are there any examples/code snippets available for situations like the one K Frampton describes, above, where the goal is to (1) identify latent profiles at level two and (2) relate these profiles to a distal outcome observed at level one? I'm working on a similar problem and am unsure about how to specify the model statement. Thanks in advance for the help, IYH 


I don't think we have that in script or paper form, but you would work along the lines of the below. This creates a between-level (say school) latent class variable cb from the between-level z indicators, and cb influences the mean of the random intercept for the distal outcome d (which is, say, a student variable varying on both within and between), which is how the cb influence carries over to the student's distal outcome.
    VARIABLE:
    Between = z1-z10 cb;
    classes = cb(2);
    MODEL:
    %within%
    d on x;
    %between%
    d;
I don't think you have to say more in MODEL because the z means vary across the cb classes as the default, and so does the d mean, where on between d is the random intercept in the regression of d on x. Hope this start helps.

Junqing Liu posted on Thursday, August 25, 2011  12:50 pm



I am new to Mplus and LPA. I am working on a two-level LPA in a workforce data set. The variables of interest are 1) organization culture (a level 2 latent variable based on five level 2 continuous indicators) and 2) worker demography and practices (level 1 observed variables). The goal is to use the latent classes as independent variables to predict workers' practice, such as using a type of therapy.
1) Do I need to run the LPA first to get the classes (say there will be 2 or 3 categories) before including the latent class variable in the final model to predict the worker outcome?
2) In the final model, using the org. culture class membership to predict the worker outcome, do I need to include the observed organization id as a predictor to declare that this is a two-level model?
3) What is the output of the final model? Is it separate regression models for each category of org. culture, or is it one regression?
4) Is there any empirical research reference on cross-sectional multilevel LPA analysis that I can read?
Many thanks!


1) Although not necessary, it's a good idea. 2) Organization would be your Cluster= variable; see UG. 3), 4) You should read Henry, K. & Muthén, B. (2010). Multilevel latent class analysis: An application of adolescent smoking typologies with individual and contextual predictors. Structural Equation Modeling, 17, 193-215, which you can find on our web site.

Junqing Liu posted on Friday, August 26, 2011  1:48 pm



Hi Bengt, This is extremely helpful! I can compare different LPA models and pick the one that fits best as the final latent profile model for further analysis. I have some follow-up questions about the further analysis.
1. How common is it to use the latent profile variable as a predictor along with other covariates, rather than as a dependent variable?
2. If it is common, should a level 1 latent profile variable be included as a regular categorical covariate (along with other level 1 and level 2 predictors) to predict a level 1 outcome, or does the way to include it depend on how the latent profile variable is modeled, such as a two-level latent profile model with a level 2 factor on random latent class intercepts and a level 2 factor on random latent class indicators?
3. Is there any empirical research reference on using a multilevel latent profile variable as a predictor that I can read?
Thank you for your patience with my long-windedness.


1. It is getting more used now that software is available for easy use. There are papers on our web site showing this. But I would not say that it is common yet. 2. A latent class variable should be included as a predictor if substantive theory warrants that. Note, however, that you don't say "y ON c" (for a distal outcome y), but Mplus lets the y means change over the latent classes. 3. I have not seen multilevel latent profile used as a predictor yet in the literature, but there is nothing precluding it. The approach used in Henry & Muthen can easily be expanded to that using Mplus. 

Junqing Liu posted on Friday, September 09, 2011  6:56 am



Thanks, Bengt. This is very helpful. I tried the following three-class two-level LPA of org. culture. However, the output does not include results on latent classes. All the results are about correlations and covariances. The five latent indicators are the mean scores of scales, with values ranging from 1 to 5. 1) How may I change the following syntax to get output on latent classes? 2) Is it OK to use the means of scales as latent indicators, or is it better to use the items within the scales as indicators? Thank you very much.
    File:xxx
    VARIABLE:
    Usevariables = l_coh_ag L_aut M_Coll M_FocOut M_RoCn ORGID;
    Classes = c(3);
    Cluster=ORGID;
    Within=l_coh_ag L_aut M_Coll M_FocOut M_RoCn;
    ANALYSIS:
    type = mixture twolevel;
    starts=20 10;
    Process=8(STARTS);
    Model:
    %WITHIN%
    %OVERALL%
    %BETWEEN%
    %OVERALL%
    C#1; C#2;
    C#1 WITH C#2;
    output: tech11;


Please send your output to support. 

Junqing Liu posted on Tuesday, September 13, 2011  12:49 pm



Thanks, Bengt and Linda. The technical problem is solved. I have a couple of questions related to the results of the two-level model I mentioned earlier. 1) Are tech11 and tech14 applicable to a two-level LPA model? If so, when the LMR p-value of tech11 is not significant but the p-value of tech14 is, should I pick the k-class model as opposed to the k-1 class model? 2) The BIC and AIC of the two-level model are smaller than those of a single-level model, but not by much (e.g., the adjusted BIC is 2542.96 for the two-level model and 2559.12 for the single-level model). In this case, should I still choose the two-level model? 3) What does the following specification mean, especially C#1 WITH C#2?
    %BETWEEN%
    %OVERALL%
    C#1; C#2;
    C#1 WITH C#2;
Thank you very much!


1) In principle tech14 is more reliable; however, you might also want to look at BIC and AIC for the two models. 2) You should also consider the size of the variance on the between level and see if it is significant. Note also that the performance of BIC would depend on the number of clusters (two-level units), which drives the asymptotics. Ultimately a simulation study would show if AIC and BIC are useful in this context. This is not well studied. 3) These are the interaction parameters in a loglinear model for contingency tables. For example in http://www.education.umd.edu/EDMS/fac/Hancock/Course_Materials/EDMS771/readings/LogLinearModels%20reading.pdf these are the lambda_IJ parameters.


Actually the answer to #3 above is not correct at all. Here is the correct answer. C#1 and C#2 are alpha1j and alpha2j in formula (4) in http://statmodel.com/download/MultilevelMixtures2.pdf These are normally distributed random effects that vary over clusters and allow the class proportions to change over clusters. The covariance term C#1 WITH C#2; is just the covariance between the two random effects and C#1; C#2; are the two variances. 
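As a rough illustration of how cluster-level random effects let class proportions change over clusters, the Python sketch below applies a multinomial logit with the last class as reference. The function name, the fixed intercepts, and the random-effect values are hypothetical, and the parameterization (fixed intercept plus cluster effect) is an assumption about the general form, not a transcription of formula (4).

```python
import math


def class_probabilities(fixed_intercepts, random_effects):
    """Class probabilities for one cluster from a multinomial logit.

    Both arguments have K-1 entries; the last class is the reference
    class with its logit fixed at zero.
    """
    logits = [a + u for a, u in zip(fixed_intercepts, random_effects)] + [0.0]
    largest = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(value - largest) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


fixed = [0.5, -0.2]  # hypothetical average logits for classes 1 and 2 versus class 3
cluster_a = class_probabilities(fixed, [0.0, 0.0])   # a cluster with random effects at zero
cluster_b = class_probabilities(fixed, [1.0, -0.5])  # a cluster with nonzero random effects
print(cluster_a)
print(cluster_b)
```

In this sketch, the two variances (C#1; C#2;) describe how much those random effects spread across clusters, and the covariance (C#1 WITH C#2;) describes whether clusters with a high value on one effect tend to have a high value on the other.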

Mary Campa posted on Thursday, September 15, 2011  4:40 pm



Hello Dr. Muthen, I am reading your paper with Dr. Henry (2010, SEM 17, 193-215) and trying to replicate the analysis titled: Three classes at level 1, Two classes at level 2 random effects model: nonparametric approach (Model 4a from Table 1). I am using 4 classes at level one, but otherwise the model statements are the same. However, I continue to get this error message:
    *** ERROR in MODEL command
    Unknown class model name CW specified in C-specific MODEL command.
I am not sure what I am missing. Thank you for your assistance.


This can only be answered if you send the output to support. 

Junqing Liu posted on Friday, September 16, 2011  9:03 am



Thank you very much, Tihomir. I have a couple of follow-up questions regarding the interpretation of C#1 and C#2. 1) Does the within-level mean of c#1 represent the random intercept of c#1 as compared to c#3? What does a significant p-value for the within-level mean of c#1 indicate? 2) What does a nonsignificant p-value for the between-level variance of c#1 mean? 3) Regarding your earlier response on examining the significance of the between-level variance, should I do a log-likelihood ratio test of the one-level and two-level models, or should I use the p-values mentioned above, those of the between-level variances of c#1 and c#2? Thanks again!


1) Yes. That p-value is typically not of interest because it tests that the logit is zero, which with 3 classes is not a meaningful point. 2) That indicates that there is zero between-level variance for class 1. 3) Neither test is optimal because of testing variances at zero, the border of the admissible parameter space. I would just leave them as is and report them.
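A small numeric sketch may help show why a logit of zero is not a meaningful point with 3 classes. The code below is an illustration in Python with hypothetical logit values: a zero logit for class 1 only says that class 1 and the reference class are equally likely, and the actual probability still depends on the logit of class 2.

```python
import math


def probs_from_logits(logits_vs_last):
    """Class probabilities from logits defined relative to the last (reference) class."""
    logits = list(logits_vs_last) + [0.0]
    denominator = sum(math.exp(value) for value in logits)
    return [math.exp(value) / denominator for value in logits]


# Logit of class 1 versus class 3 is zero; the logit of class 2 is a made-up value.
p = probs_from_logits([0.0, 1.2])
print(p)
```

Here classes 1 and 3 receive the same probability, but that shared probability is neither 0.5 nor 1/3, so rejecting or not rejecting a zero logit says little on its own.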


Happy New Year! I hope you had great holidays! I'm a beginner in the field of multilevel analysis with a lot of questions. My aim is to reveal a typology of musicians who were rated by audience members. I have read your very interesting and inspiring paper (Henry & Muthen, 2010). First of all, I have some questions about your paper which could help me resolve some misunderstandings:
1) To which item or construct does your cluster variable, named LEAID, refer?
2) It would be much easier to follow the steps reported in your article if I could run your code on the dataset. I wrote to Mrs. Henry, but she advised me to simulate such a dataset with the Monte Carlo technique. Unfortunately, I am not able to do this. How does it work?
3) I would like to use a model for my own dataset that is the same as the one presented in Figure 7 (multilevel latent class model – non-parametric approach with level 2 factor on random latent class indicators). Would you give me a hint on how to write this in Mplus code?
4) The final model also seems attractive to me, but it is hard to understand your reported code without any comments on it, especially the lower part (model constraint). I would be happy if you could give us some comments on the code.
I would be very happy if you could help me answer these questions. Thank you!


You can look at Examples 10.6 and 10.7 in the user's guide. These are similar to what is done in the Henry and Muthen paper and there are data available with each example. A note of caution, if you are a beginner in multilevel analysis, starting with a multilevel mixture model may not be a good idea. Studying both multilevel and mixture modeling as a first step is a good idea. 


Thanks a lot. Which literature would you recommend as a good introduction to these topics? All the best!


You can see our Topics 5, 6, 7, and 8 course handouts which cover these topics and contain references. 


Thank you! 

Angela Urick posted on Thursday, January 19, 2012  12:46 pm



I’m working on a two-level LCA (types of teachers in schools with types of principals) with a cw and cb that have different indicators (I have labeled them uw’s and ub’s below). These two sets of indicators have dichotomous and continuous measures. Finally, I want to regress cw on cb with a random intercept (similar to ex. 10.7). Here is my basic code:
    ...
    CATEGORICAL = uw1-uw15 ub1-ub13;
    CLASSES = cb(3) cw(4);
    WITHIN = uw1-uw30 x;
    BETWEEN = cb ub1-ub30 w;
    ...
    MODEL:
    %WITHIN%
    %OVERALL%
    cw on x;
    %BETWEEN%
    %OVERALL%
    cb on w;
    cw#1-cw#3 on cb;
    MODEL CW:
    %WITHIN%
    %CW#1%
    [uw1$1-uw15$1];
    [uw16-uw30];
    %CW#2%
    [uw1$1-uw15$1];
    [uw16-uw30];
    %CW#3%
    [uw1$1-uw15$1];
    [uw16-uw30];
    %CW#4%
    [uw1$1-uw15$1];
    [uw16-uw30];
Here are my questions:
1. Is an LRT (TECH 11, 14) possible for this model with different indicators for cw and cb? If not, how would you suggest that I assess class fit?
2. Theoretically, there is a two-way relationship between cw and cb. How would you suggest that this be modeled?


1. No. It is an open research question how to determine the number of classes in a multilevel setting such as this. The ordinary BIC, for example, may not be the best approach. Interpretability can be helpful. 2. The relationship between cw and cb is captured by the between-level statement cw#1-cw#3 on cb; where the cw random intercepts are influenced by cb.

Angela Urick posted on Saturday, January 21, 2012  10:33 am



Thank you, Dr. Muthen. 


Good afternoon, Dr. Muthen, I have another question in reference to the model mentioned above. In the results, the means of the level 1 indicators (uw/u) vary across classes as expected. However, the means of the level 2 indicators (ub/z) are the same across all classes. I ran cb as a single level LCA—there should be three different classes. Why would these means remain the same across the between level classes? Do I need to free the between level indicators or make other edits to the code? Thanks again, Angela 


Try Model cb: using %Between% and then mention the intercepts of the ub indicators in each of the cb classes. 


Thanks, it worked. 

Mary Campa posted on Saturday, May 26, 2012  12:13 pm



Hello, I am building a model similar to the Henry & Muthen, 2010 model 2, a two-level random effects LCA. I have selected a four-class model as best fitting. My question is about the between-level variances produced by this code:
    %BETWEEN%
    %OVERALL%
    C#1; C#2; C#3;
    C#1 WITH C#2 C#3;
    C#2 WITH C#3;
I used starting values to switch the ordering of the classes, and the estimates and standard errors of the between-level variances (C#1-C#3) changed. For example, on the initial run, the variance for class 2 was significant, but on the next run the variance for the same class (although now with a new number) was not. This happened for multiple classes, where the parameter estimates changed based on which class I selected as the reference. My understanding from the Henry & Muthen paper is that these parameters represent the between-level variance in class membership. The classes (proportions, probabilities of indicators) remain the same regardless of which is the reference class, so I am not clear why these are changing. Does this suggest there is something wrong with my model, or am I wrong in the interpretation? Thank you for your help.


The multinomial regression has the coefficients at zero for the last class, the reference class. The between-level variance components add to the coefficients for all but the last class. It makes sense that the size and SEs of these variance components change when you change the order of the classes because it is all relative to the reference class.
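The dependence on the reference class can be sketched with a little matrix arithmetic. The Python code below is an illustration only: the covariance matrix is made up, the helper names are hypothetical, and it assumes jointly normal random effects with all variances and covariances free. With four classes and class 4 as reference, switching the reference to class 1 turns the random effects into differences (class 2 minus class 1, class 3 minus class 1, and minus class 1 for the old reference), so their variances change even though the implied class probabilities do not.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(m):
    """Transpose a matrix given as a list of rows."""
    return [list(row) for row in zip(*m)]


# Hypothetical covariance matrix of the random effects for classes 1-3,
# defined relative to class 4 as the reference class.
sigma = [
    [1.0, 0.3, 0.2],
    [0.3, 0.5, 0.1],
    [0.2, 0.1, 0.8],
]

# Rows give the new effects for classes 2, 3 and 4 relative to class 1.
change_of_reference = [
    [-1, 1, 0],
    [-1, 0, 1],
    [-1, 0, 0],
]

new_sigma = matmul(matmul(change_of_reference, sigma), transpose(change_of_reference))
for row in new_sigma:
    print([round(value, 3) for value in row])
```

In this made-up example the variance attached to class 2 moves from 0.5 to 0.9 purely because it is now measured relative to class 1 rather than class 4.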

sunY posted on Friday, June 08, 2012  3:44 pm



Hi Dr. Muthen, Thank you so much for your help in advance. I'm running a two-level LCA in which the latent class variable and the independent variable (treat) are at the school level and the dependent variable is at the student level. Given that the number of clusters is 32, isn't the sum of the latent class counts supposed to be 32 as well? Instead, the results show class counts at the student level, not the school level, as below:
    CLASSIFICATION OF INDIVIDUALS BASED ON THEIR MOST LIKELY LATENT CLASS MEMBERSHIP
    Class Counts and Proportions
    Latent Classes
    1 1437 0.53720
    2 1238 0.46280

    CLUSTER = school;
    BETWEEN = treat c;
    CLASSES = c(2);
    ANALYSIS:
    TYPE IS TWOLEVEL random MIXTURE;
    STARTS = 150 25;
    MODEL:
    %WITHIN%
    %OVERALL%
    dv;
    %BETWEEN%
    %OVERALL%
    dv on treat;
    [dv];
    %c#1%
    dv;
    dv on treat;
    [dv];
    %c#2%
    [dv];
    dv;


Although the latent class variable is a between-level variable (varying across schools only), the class counts and proportions printed say how many students are in each class. But there are only 32 schools and the latent classes refer to them.
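To see the difference between the printed student counts and the number of schools per class, one can collapse saved classifications to one row per school. The Python sketch below uses made-up data and hypothetical variable names, and it assumes that all students in a school share the same most likely class, as they would for a between-level class variable.

```python
from collections import Counter

# Hypothetical rows: (school_id, most_likely_class) for each student.
students = [
    (1, 1), (1, 1), (1, 1),
    (2, 2), (2, 2),
    (3, 1), (3, 1), (3, 1), (3, 1),
]

# Counts as printed in the output: one entry per student.
student_counts = Counter(cls for _, cls in students)

# Counts at the school level: keep one class per school, then tally schools.
school_class = {school: cls for school, cls in students}
school_counts = Counter(school_class.values())

print("students per class:", dict(student_counts))
print("schools per class:", dict(school_counts))
```

The student-level counts sum to the number of students, while the school-level counts sum to the number of clusters.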

sunY posted on Friday, June 08, 2012  11:29 pm



Thank you so much for the prompt response. 


Dear Prof Muthen, I fit a multilevel finite mixture model for count data. I want to get variances across classes at level 1 and level 2. I ran the model with this code:
    %within%
    %overall%
    y on x1;
    %c#2%
    y on x1;
    %between%
    %overall%
    y on w;
    %c#2%
    y on w;
With this model, Mplus gives me only one variance across classes and levels. How should I interpret these variances? Can Mplus 6.1 report variances across classes and levels? Thanks very much for your help.


Please send your output and license number to support@statmodel.com. 

xiaoshu zhu posted on Wednesday, July 25, 2012  10:56 am



Hello, I have a question regarding the class membership at the group level. I followed the codes for the nonparametric MLCA in Henry and Muthen (2010) and specified a model with two student LCs and three group LCs. The output showed that some groups were assigned to two group LCs, simultaneously. How can we deal with this problem? Should we decide the group class membership based on the one with largest proportion of students within the group? Thanks in advance! 


Please send the output and saved data along with your license number to support@statmodel.com. Point to specifically what you see as the problem. 

Mike Todd posted on Sunday, January 20, 2013  5:14 pm



Hello there: We have a multilevel dataset (individuals nested within census tracts) that we would like to use in a multilevel latent profile analysis. The census tracts (Level 2 units), and in turn the individuals (Level 1 units), can be grouped into two categories. What we are wondering is how, and whether, measurement invariance across the two categories can be tested in a multilevel LPA/LCA. Would we test this in the same manner that we would for a standard "single-level" LPA? If not, can you point us to a relevant approach that could be applied to results generated by Mplus? Thanks so much!


It sounds like you have an observed grouping variable on level 2. This can be handled by defining a between-level latent class variable that is exactly the same as the grouping variable. See UG ex 7.24 for how this is done in the single-level case. The between-level latent class variable has to be declared on the Between = list as in UG chapter 10. Then you can specify and test various degrees of measurement invariance across these between-level classes.

Mike Todd posted on Monday, January 21, 2013  9:11 am



Great! Thanks, Bengt! 


first part: hello everybody! i am running a latent class analysis with complex survey data on 9 items aiming at political alienation and willingness to participate in the democratic process. in my early steps i ran the analysis without type = complex mixture (only type = mixture) and it turned out that the 3- and 4-class solutions seemed the most reasonable (loglikelihood-based fit indices were all pretty sobering, but interpretation was consistent with substantive theory of the construct). later i realised i should be using type = complex mixture, since the data is clustered with n(cluster) = 27 and differing cluster sizes. hence, i reran the analysis for 3, 4 and 5 latent classes. what struck me was that the estimates did not change at all for the 3- and 5-class solutions, but changed considerably for the 4-class solution, which was the best model interpretation-wise when using only type = complex and now unfortunately makes far less sense.


second part: (sorry for the long message!) how is it possible that the 3- and 5-class solutions did not change, but the 4-class solution did? the estimator is mlr, which i assume somehow weights by cluster size? could that be "playing against" my beloved 4-class solution, because one of the classes is quite small in comparison to the other 3? maybe if people in this class come from cluster units with a small weight (due to small cluster unit size), this class cannot be detected well enough? as you can tell, i only have a very vague understanding of how the estimators work. i apologize for any painful stupidity in my thoughts expressed above. do you have any recommendations for choosing an estimator when using type = complex mixture? i've heard there are different options: mlr, uls ... are there any papers where i could look things up? thanks so much for help!



Unless you have weights, your classes should not change when adding COMPLEX. Perhaps you are not replicating the best loglikelihood in all analyses. Or perhaps the order of the classes changed. If you can't see the problem, send the relevant outputs and your license number to support@statmodel.com. The only estimator available for TYPE=COMPLEX is MLR. Please limit future posts to one window. If they are longer than that, they are not appropriate for Mplus Discussion. 


thanks for the answer. i want to run the blrt but i am using the type = complex mixture option. if i compare the 5-class solution using type = complex mixture to the 5-class solution using only type = mixture, estimates don't change and p-values do, but only slightly. would it be acceptable to run the blrt using type = mixture even though the data is in fact clustered? thanks.


I would not use BLRT using TYPE=MIXTURE if you have clustered data. I would use BIC. 


hi, i'm running this multilevel latent class model:
    Variable:
    Names are indir INDIR1 INDIR2 id GENERE REGOLARE CITT CITT2 NUCLEO1 NUCLEO2 NUCLEO3 LIBRI SP_DOM PC F_ISCED M_ISCED PARED PROF_P PROF_M BFMJ BMMJ HISEI SP_SCOL sp_sc_d cod_scu;
    Missing are all (9999);
    auxiliary are id;
    usevariables are GENERE REGOLARE CITT2 NUCLEO3 LIBRI SP_DOM PC PARED HISEI;
    categorical are GENERE REGOLARE CITT2 NUCLEO3 LIBRI SP_DOM PC PARED HISEI;
    Classes = CB(2) CW(3);
    within = GENERE REGOLARE CITT2 NUCLEO3 LIBRI SP_DOM PC PARED HISEI;
    between = CB;
    cluster = cod_scu;
    Analysis:
    Type = Mixture Twolevel;
    Model:
    %within%
    %overall%
    %between%
    %overall%
    CW on CB;
And i have two questions: 1) i get different threshold estimates for the same within class, i.e. the threshold estimates of latent class pattern 1 1 are different from those of latent class pattern 2 1. Is this correct? 2) How can i calculate the threshold estimates on the probability scale?


I'm sorry: i forgot to say "Thanks" 


1) The thresholds vary across the between-level CB classes as the default. You should think of thresholds as a between-level quantity, in line with having means appear on between in regular multilevel modeling. 2) This is tricky because the probabilities involve the random effects and therefore require numerical integration.


Dear Bengt, thank you for your kind reply. I have two minor remarks, just to be sure I have properly understood your comments. 1) The following is part of my output. According to your examples, the first coefficients in each group of thresholds (e.g. the pairs of GENERE$1) should be equal. As you can see, mine are not. Is there a mistake, or is there a reason I cannot see underlying these results?
Latent Class Pattern 1 1
Thresholds
GENERE$1 0.461 0.087 5.272 0.000
REGOLARE$1 0.877 0.079 11.060 0.000
Latent Class Pattern 2 1
Thresholds
GENERE$1 0.763 0.095 8.036 0.000
REGOLARE$1 0.200 0.103 1.934 0.053
2) OK, I understand, but it is very difficult to interpret the characteristics of the classes by looking at the threshold estimates. Would it be a good idea to save the class probabilities and then analyze the classes with descriptive statistics?


Which of my examples are you referring to? 


I'm sorry: not exactly "your" example, actually... I'm referring to the User's Guide example 10.7. Thanks a lot


Yes, ex 10.7 has the thresholds equal across the cb classes. If you are saying you don't get that, then I would have to see your full output. Send it to Support along with your license number.

B posted on Saturday, July 19, 2014  10:24 am



Hi, I can't find a model similar to what I want to explore in Mplus in the User's Guide. I have 2 specific questions. Here goes:
Question 1: I'm creating a multilevel latent profile model for students in classrooms. I want to see if the random effects (intercepts) for the profiles' latent means, constituted by a battery of student indicators, are predicted by a level-2 latent factor for classroom environment. Would I specify that latent factor for classroom environment in this part of the model statement?
%OVERALL%
%BETWEEN%
schoolfactor by indicators;
c#1 on schoolfactor;
c#2 on schoolfactor;
Question 2: Finally, how would I specify a multilevel mixture model where the profile is constituted by child (level-1) and classroom (level-2) indicators? The profile might have measures of cognitive and social outcomes for children as well as measures of classroom environment.


Q1. You would follow the ideas in the Henry-Muthen multilevel LCA paper on our website. Declare a between-level latent class variable. On between you simply say schoolfac BY ....; which will give you the schoolfactor means in the different between classes. Q2. Just declare the classroom environment variables as Between variables and include them in the BY statement.

CMP posted on Thursday, February 26, 2015  3:03 am



Hi, I am running a multilevel mixture analysis with random effects. My variables are y x1 x2 x3 x4 x5. On level 1: Y on x1 x2 x3 x4 On level 2, I would like to identify latent classes (cb) using only the random slopes from level 1 (s1 s2 s3 s4). I do not want to include random intercept (y) as it does not make substantive sense, in my case. Following your example on the User’s Guide (example 10.2), I notice the random intercept is included as an indicator of cb. How can I specify that only the random slopes be used? Thank you in advance for your response. 


What you want implies that the mean of the random intercept y would not vary across the cb classes. To specify that you would need to hold those means equal across the cb classes: %cb#1% [y] (1); %cb#2% [y] (1); 

CMP posted on Thursday, March 05, 2015  2:31 am



Hello, I posted earlier about the multilevel analyses I am running. My further questions are: 1) Repeated measures nested in individuals are 2-level by other multilevel standards but are considered a 1-level model by Mplus, as I understand from the UG. In my input I specified the model as TYPE = TWOLEVEL MIXTURE RANDOM; is this correct? 2) The ICs (BIC, aBIC) and entropy favour a 3-class solution but the LRT does not. So I am trying to use TECH14 to test the different class solutions by bootstrap, but the model keeps running and never stops (2 days!). What could the problem be? ANALYSIS: lrtstarts = 50 20 50 20; 3) I had done the multilevel analyses without modelling latent classes in HLM. When I run this same model in Mplus, some random slope variances which were significant in HLM become non-significant in Mplus. Why is this so? 4) Could this be due to the multivariate non-normal nature of the data? I wanted to change the estimator to MLM but got an error message. Is this impossible to do? Your help with these questions will be much appreciated.


1) In Mplus you can do growth as 2-level in long format or as 1-level in wide format. We recommend the latter whenever possible. So if you take the latter approach, Type=Twolevel would refer to some other clustering like students in classrooms. 2) I would simply go by BIC. 3) Perhaps you used MLR in Mplus and ML in HLM. 4) Use MLR.

CMP posted on Thursday, March 05, 2015  12:33 pm



Thank you very much for your response. When I used ML as the estimator in Mplus, my results were similar to those obtained with HLM using ML. However, there were still slight differences in the p-values. Could you point me to any paper that I can reference in which the information provided by the ICs, LRT and entropy was contradictory? Thank you once again for your help.


Perhaps this paper is useful: Morgan, G. B. (2014). Mixed mode latent class analysis: An examination of fit index performance for classification. Structural Equation Modeling: A Multidisciplinary Journal, DOI: 10.1080/10705511.2014.935751 

E. Cohen posted on Monday, August 24, 2015  6:57 am



Dear Drs. Muthén, I have the following multilevel LCA problem: I want to identify subtypes of related cases in a multilevel sampling context (three levels), using a set of seven categorical variables measured on levels 1 and 2 (4 L1 variables, 3 L2 variables). Two questions: 1. Is it possible (yet) to estimate an MLCA model with latent classes based on indicators/variables measured on different levels? (Henry & Muthén (2010) as well as other articles on MLCA look at models comprising L1 latent classes only.) 2. Is it possible (yet) in Mplus to estimate a three-level LCA (with covariates on two levels predicting class membership)? If not, is there a general estimation/data preparation strategy you would suggest (e.g., ignoring – while acknowledging – L3 clustering, or aggregating scores on L1 variables and treating them as L2 variables, then omitting L1 and shifting the level of analysis upwards)? Thanks in advance for your help!


1. Mplus handles latent class variables on both level 1 and level 2. See for example, the paper on our website: Muthén, B. & Asparouhov, T. (2009). Growth mixture modeling: Analysis with non-Gaussian random effects. In Fitzmaurice, G., Davidian, M., Verbeke, G. & Molenberghs, G. (eds.), Longitudinal Data Analysis, pp. 143-165. Boca Raton: Chapman & Hall/CRC Press. and also Muthén, B. & Asparouhov, T. (2009). Multilevel regression mixture analysis. Journal of the Royal Statistical Society, Series A, 172, 639-657. 2. I think level 1 and level 2 latent class variables can have their random intercepts be predicted by level 2 and level 3 covariates.


See also UG ex 10.5 and 10.8 where you see "cb". cb can have observed level 2 indicators as well. 


I ran the following LPA:
Analysis:
Type = Mixture Complex;
algorithm = integration;
integration = montecarlo;
Starts = 100 10;
stiterations = 10;
k1starts = 100 10;
processors = 8(starts);
Model:
%overall%
c on age_oa age_ma;
age_oa age_ma;
Why, in the output, are the covariates (age group, dummy coded for middle and older adults) listed within each latent class?
Latent Class 1
Means
SSAVOID 0.771 0.035 22.122 0.000
SSLEAVE 0.303 0.064 4.733 0.000
SMMOD 0.509 0.051 9.917 0.000
ADNEG 0.041 0.060 0.678 0.498
ADPOS 0.545 0.050 10.940 0.000
REDET 0.019 0.051 0.376 0.707
REDIS 0.130 0.048 2.720 0.007
REPOS 0.688 0.034 20.051 0.000
RERUM 0.222 0.100 2.211 0.027
REACC 0.349 0.048 7.196 0.000
RMSUP 0.027 0.059 0.464 0.642
RMPOS 0.250 0.059 4.205 0.000
RMPHYS 0.106 0.068 1.559 0.119
AGE_OA 0.223 0.024 9.272 0.000
AGE_MA 0.303 0.027 11.410 0.000


Because you change the status of those age dummies from covariates to variables that have parameters in the model by your statement: age_oa age_ma; So they are just like any other "Y" variables. 


Thank you, Dr. Muthen, for your response. I had to include that statement for the model to run; otherwise I received the "missing on X" error message. Given this, are this output and its conclusions fine for interpreting these latent classes? Thank you so much for your help!


I have to see the full output to say. Send it to Support along with your license number.


Hello, I'm trying like mad to change the reference category of the between-level latent class in a nonparametric two-level LCA as described in Henry & Muthén's paper. My syntax for the VARIABLE and MODEL sections is:
usevariables = TOLER PROT;
cluster = cntry;
within = TOLER PROT;
classes = cb (5) cw (3);
between = cb;
MODEL:
%within%
%OVERALL%
%between%
%OVERALL%
CW on CB;
model cw:
%within%
%CW#1%
[ toler@0.820 ];
[ prot@0.411 ];
%CW#2%
[ toler@0.527 ];
[ prot@0.411 ];
%CW#3%
[ toler@0.683 ];
[ prot@0.589 ];
I can change the level-1 class order using the model CW part, but I can't find where and how to specify it for CB. Thank you! Davide


Dear Dr. Muthen, I am running a multilevel LPA model, akin to the model developed by Henry & Muthen (2010). With your kind permission, I want to pose three questions about the interpretation of my output with reference to the main article. 1. The probability estimates of level-2 predictors over the level-1 latent class solutions (i.e. Table 3, Henry & Muthen, 2010) refer to the "C#2 on [Level-2 predictor name]" statements in the output, right? 2. The effects of level-2 predictors on the level-1 latent class indicator intercepts (i.e. Table 4, Henry & Muthen, 2010) refer to the estimates from "model constraints" such as "POP30, GRW30"? Then what is the function of the total effect variables (i.e. C2POPLOG, C2TOBGRW and C2POVLEV) in the output, given that their significance and effect sizes are not reported in the article? Or should we first check the significance of the total effect (e.g. C2POP*LOG) and direct effect estimates (e.g. C2_POP) on the latent class factors, and then report the effect sizes of the individual cross-level variables (e.g. POP30)? 3. I use LPA with continuous indicators, so I interpret the linear regression outcomes. Should I pay extra attention to any estimate while interpreting the output? Thank you in advance for your patience in reading my message. Now I cross my fingers for your answer. Best regards.


Please contact the first author about these questions. She's at Colorado State Univ. 


Does anyone have any suggestions for the problem I posted above?


Which Figure and Appendix model are you trying to use? 


My model is:
usevariables = TOLER PROT;
cluster = cntry;
within = TOLER PROT;
classes = cb (5) cw (3);
between = cb;
ANALYSIS:
type = MIXTURE TWOLEVEL;
MODEL:
%within%
%OVERALL%
%between%
%OVERALL%
CW on CB;
model cw:
%within%
%CW#1%
[ toler@0.820 ];
[ prot@0.411 ];
%CW#2%
[ toler@0.527 ];
[ prot@0.411 ];
%CW#3%
[ toler@0.683 ];
[ prot@0.589 ];
I need to change the reference class from CB#5 (the last class) to CB#1. I know how to do it for the within-level classes (by fixing the means as in the example above), but I can't find what I should do for the between-level one.


It helps me if you connect your model to the models numbered in Henry-Muthen.


that would be Model 4a at p.214 of the SEM article Thank you in advance Davide 


Use SVALUES in your original run to save the estimates. Then use them in a Starts=0 run where you switch the regression coefficients for CW on CB so that CB#1 values are given for the last CB class. You may also have to recalculate the logit for CB called [cb#1] etc. You do that by first computing all the CB class probabilities from the logits as described in Chapter 14 and then switch the probabilities and then compute the new logits. 
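The last step described here (logits to probabilities, switch, probabilities back to logits) can be sketched numerically as follows. This is a minimal Python illustration, assuming the usual multinomial logit parameterization in which the last class is the reference with its logit fixed at 0; the starting logits are made-up values and the helper names are hypothetical. It covers only the [cb#k] logits; the CW on CB coefficients still have to be switched separately as described above.

```python
import math

def logits_to_probs(logits):
    """Class probabilities from multinomial logits; the last class is the
    reference class, with its logit fixed at 0."""
    expo = [math.exp(a) for a in logits] + [1.0]
    total = sum(expo)
    return [e / total for e in expo]

def probs_to_logits(probs):
    """Logits of every class except the last, relative to the last class."""
    ref = probs[-1]
    return [math.log(p / ref) for p in probs[:-1]]

# Hypothetical [cb#1]..[cb#4] logits for a 5-class CB variable.
old_logits = [0.8, -0.3, 0.1, -1.2]
probs = logits_to_probs(old_logits)

# Swap the first and last class so that the old CB#1 becomes the reference.
probs[0], probs[-1] = probs[-1], probs[0]
new_logits = probs_to_logits(probs)
print([round(a, 3) for a in new_logits])
```

The new logits would then be used as the starting values for [cb#1] through [cb#4] in the Starts=0 run.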


With model 2 in table 1 in Henry and Muthen (2010), the parametric approach to a multilevel 3 class model, do I need the following in the syntax for a 2 class model? Would I just limit that to only "C#1"? %BETWEEN% %OVERALL% C#1; C#2; C#1 WITH C#2; 


Yes, with 3 classes you have 2 random intercepts: c#1 and c#2. And you want them to be able to correlate. 


Hi Bengt, sorry for bothering again, but it's still not clear how I can switch the regression coefficients for CW on CB so that CB#1 values are given for CB#5 (the last class) as I cannot mention the last class directly. 


Send your run with the original solution and your best attempt at switching the classes to Support along with your license number. Briefly describe where you get stuck. 

Brian Knop posted on Monday, March 07, 2016  2:26 pm



I have looked through the user guide and Henry and Muthen (2010), but can't figure out how to get odds ratios for level 2 variable effects on latent classes (such as effect of tobacco growing state on smoking type). I can get coefficients (for level 2 variables), but not odds ratios. Thank you. 


The categorical variables are random intercepts on level 2. So they are continuous.

Brian Knop posted on Tuesday, March 08, 2016  8:23 am



Thank you for your quick response. That explains why the output gives odds ratios for within-level effects, but not between-level effects, on the latent classes. But Henry and Muthen (2010) present between-level odds ratios, so it is possible, yes?


The variable would need to be on the BETWEEN list and the CATEGORICAL list to get an odds ratio on between.
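When only a coefficient and its standard error are printed, an odds ratio is sometimes computed by hand. The following is a minimal Python sketch of that arithmetic, under the assumption that the coefficient is on the logit (log-odds) scale; whether that holds for a given between-level effect depends on the model, so this is an illustration rather than a description of what Mplus reports. The estimate and standard error are made-up values.

```python
import math

def odds_ratio(coef, se, z=1.96):
    """Odds ratio and approximate 95% confidence limits from a
    logit-scale coefficient and its standard error."""
    return math.exp(coef), math.exp(coef - z * se), math.exp(coef + z * se)

# Hypothetical estimate and standard error (not from any output in this thread).
or_, lower, upper = odds_ratio(0.693, 0.250)
print(round(or_, 2), round(lower, 2), round(upper, 2))
```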

Brian Knop posted on Wednesday, March 09, 2016  9:17 am



OK, for some reason I still can't get odds ratios when I do that. If I were hypothetically looking at school-level effects on latent classes of test scores and all variables used are categorical, my code would look something like this:
usevariables = schooltype mathscore engscore histscore;
categorical are schooltype mathscore engscore histscore;
between = cb schooltype;
cluster = schoolid;
classes = cb (3);
Analysis:
TYPE = mixture twolevel;
Model:
%between%
%overall%
cb#1 cb#2 ON schooltype;
Would that produce odds ratios in the output?


Please send the output and your license number to support@statmodel.com.


Hello, we are trying to run a two-level LCA. The syntax we are using is:
VARIABLE:
NAMES ARE ID agency directservices ServYouth TimeWork gender age RaceBWO education fieldstudy sector RacePriv InstDisc BlatantRace;
MISSING are all (999);
USEVARIABLES = RacePriv InstDisc BlatantRace;
CLASSES = C(4);
CLUSTER = agency;
ANALYSIS:
TYPE = TWOLEVEL MIXTURE;
STARTS = 60 30;
PROCESS = 8(STARTS);
MODEL:
%between%
%overall%
output: tech11 tech14;
Our output AIC and BIC are nearly identical to the AIC and BIC we received when not running it with the second level. Additionally, the output gives us within-level variances but not between-level variances. Is this syntax correct for running a multilevel LCA? Is it actually running it with two levels or just reproducing the single-level output? Thank you very much for your help!


You have to mention the random intercepts for the latent class variables on Between. For an example, see UG ex 10.6. 

Allison posted on Monday, August 15, 2016  9:48 am



Hello Bengt and Linda: I have a multilevel LPA, with events (Level 1) nested within people (Level 2). Currently, I have a model with 8 within-person profiles, and nothing specified at the between-person level of analysis. In traditional multilevel research with a similar structure (events nested within people), it is common to group-mean center the L1 constructs prior to analyses. Would you recommend group-mean centering the L1 variables being specified as profile indicators? Would other centering decisions, such as grand-mean centering or even standardizing the profile indicators within-person, be appropriate? Any guidance is greatly appreciated!


I would not do any centering or standardization for this model. 

Fang Fang posted on Sunday, April 16, 2017  12:12 pm



Dear Drs. Muthen: I am trying to run a two-level LCA. The syntax is:
Variable:
Names are ID conflict CID fcontact pcontact live mpc mcp inpc incp emoclose;
missing are .;
USEVARIABLES = live fcontact pcontact mpc mcp inpc incp emoclose conflict ID;
CATEGORICAL = live fcontact pcontact mpc mcp inpc incp emoclose conflict;
CLASSES = c (3);
CLUSTER = ID;
Analysis:
TYPE = TWOLEVEL MIXTURE;
MODEL:
%BETWEEN%
%OVERALL%
c#1 c#2;
Output: TECH11 TECH8;
However, the output doesn't show the "RESULTS IN PROBABILITY SCALE". Is there anything missing in my syntax to run a two-level LCA? I am using Mplus 6.12. Thank you so much for your help!


If we don't give it automatically, it is not available. 


Dear Drs. Bengt and Linda Muthen, I want to run a two-level bifactor IRT model with random effects. 1. Is this the right syntax? 2. How do I calculate the effect of the clustering factor, for example gender? Or is the gender impact significantly different from zero?
VARIABLE:
NAMES ARE gender lang i3-i148;
USEVARIABLES ARE i3-i27;
CATEGORICAL ARE i3-i27;
MISSING are all (9);
cluster = gender;
ANALYSIS:
TYPE = twolevel random;
estimator = ML;
ALGORITHM = INTEGRATION;
MODEL:
%within%
G by i3-i27;
F1 BY i3@ i4-i15 (4-15);
f2 by i16@1 i17-i27 (17-27);
G with f1-f2@0;
F1 with F2@0;
%between%
G by i3-i27;
F1 BY i3@ i4-i15 (4-15);
f2 by i16@1 i17-i27 (17-27);
G@1; F1@1; F2@1;
G with f1-f2@0;
F1 with F2@0;
OUTPUT: TECH1 TECH8;
Best regards, Muhammad Ahmadi


Try it and if you don't get what you want, send to Support along with your license number. Or ask on SEMNET. 


Hello! I am trying to run a 1-2-1 multilevel mediation model and have variables that are both between-level and within-level, so I did not specify them in the BETWEEN ARE or WITHIN ARE statements. When I run my syntax I get this error:
*** ERROR in MODEL command Observed variable on the right-hand side of a between-level ON statement must be a BETWEEN variable. Problem with: QOLITOT
*** ERROR in MODEL command Observed variable on the right-hand side of a between-level ON statement must be a BETWEEN variable. Problem with: LFQTOT
When I added those variables to the BETWEEN ARE statement, I then got this error:
*** ERROR in MODEL command Observed variable on the right-hand side of a within-level ON statement must be a WITHIN variable. Problem with: QOLITOT
*** ERROR in MODEL command Observed variable on the right-hand side of a within-level ON statement must be a WITHIN variable. Problem with: LFQTOT
Please advise. Thank you!


Send your full output to Support along with your license number. 


Hello Dr. Muthen, I am doing a multilevel latent profile analysis (students within classes within schools), but I only want to identify the profiles at the individual level (while accounting for the sampling design). Is this the correct code?
idvariable is studID;
cluster=idschl;
...
type=mixture complex;
! no within or between statements
The output for this code provides summary data identifying the correct number of schools, but the model output is modeling solely at a single level. In this case, am I getting a model that is adjusted for school? Or am I incorrectly coding for a 2-level (student within school) model? When I change it to:
cluster=ischl studID;
within=v1 v2 v3 v4 v5 v6 v7;
I get almost identical loglikelihoods (143857.684 one cluster and 143848.173 two cluster), similar AIC/BIC/LRT, and identical class counts, proportions of posterior probabilities and entropy. Since I only specified indicator variables at the individual level, are this model and the former similarly adjusting for schools? If I also want to include class, it states that a three-level model is not possible for LPA. Is there another way to also account for a third level? Thanks again for all your help


I would use Type = twolevel complex where cluster = idschl idclss; So level 1 is student, level 2 is classroom, and the complex part is school. See the Cluster option on page 621 in the UG. 


Great, thank you so much! That worked.

MLsem posted on Wednesday, May 29, 2019  7:02 am



Hello, I am conducting MLCA. What is the procedure to assign units at Level 2 to the most likely class at Level 2 based on the latent class posterior distribution? Would you please provide me with the example of Mplus code? Thank you 


You can use User's guide example 10.5 to see how this works. Add the following command: savedata: file=1.dat; save=cprob; Take a look at the SAVEDATA INFORMATION section in the output file. The classification variable for the cluster would be CB and you can find those values in the saved data file 1.dat. 
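The assignment rule behind a most-likely-class variable is simply the class with the largest posterior probability. A minimal Python sketch of that rule follows; the cluster labels and posterior probabilities are made up, and the layout of the saved file is not assumed here (the SAVEDATA INFORMATION section of the output gives the actual column order).

```python
def most_likely_class(posteriors):
    """Return the 1-based index of the class with the highest posterior
    probability (modal assignment)."""
    return max(range(len(posteriors)), key=lambda k: posteriors[k]) + 1

# Hypothetical posterior probabilities for a 2-class CB, one row per cluster.
cluster_posteriors = {
    "school_A": [0.91, 0.09],
    "school_B": [0.22, 0.78],
    "school_C": [0.55, 0.45],
}

assigned = {cid: most_likely_class(p) for cid, p in cluster_posteriors.items()}
print(assigned)
```

A cluster like school_C in this example is assigned to class 1 even though its posterior is close to 0.5, which is worth keeping in mind when the assigned classes are used in later analyses.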

shonnslc posted on Tuesday, August 13, 2019  12:36 pm



Hello, I am conducting LPA and the clustering variables come from different classrooms (nested data). However, there are only 6 clusters. In this case, I don't think MLPA is appropriate? What can I do to address the nested data structure? Thanks.


You have too few clusters for MLPA. Use 5 dummy variables as covariates. 
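With 6 clusters, one cluster serves as the reference and each of the other 5 gets its own 0/1 indicator. A minimal Python sketch of that coding, done before the data file is written for Mplus, is shown below; the classroom labels, the choice of reference and the dum1-dum5 names are assumptions made for the illustration.

```python
def cluster_dummies(cluster_ids, reference):
    """Code k clusters as k-1 dummy variables (0/1), leaving out the
    reference cluster, which is coded 0 on every dummy."""
    levels = sorted(set(cluster_ids))
    others = [lv for lv in levels if lv != reference]
    return [
        {f"dum{j + 1}": int(cid == lv) for j, lv in enumerate(others)}
        for cid in cluster_ids
    ]

# Hypothetical classroom membership for eight students in six classrooms.
classrooms = ["c1", "c2", "c3", "c4", "c5", "c6", "c2", "c6"]
rows = cluster_dummies(classrooms, reference="c1")
print(rows[0])  # student in the reference classroom
print(rows[1])  # student in classroom c2
```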

shonnslc posted on Tuesday, August 13, 2019  6:48 pm



Thank you, Dr. Muthen! I am wondering if the following two inputs are both correct for including 5 dummies as covariates in LPA:
(1)
VARIABLE:
NAMES = u1-u4 dum1 dum2 dum3 dum4 dum5;
CLASSES = c(3);
AUXILIARY = dum1-dum5 (R3STEP);
DATA:
FILE = 3step.dat;
ANALYSIS:
TYPE = MIXTURE;
(2)
VARIABLE:
NAMES = u1-u4 dum1 dum2 dum3 dum4 dum5;
CLASSES = c(3);
model:
%overall%
c#1-c#2 on dum1-dum5;
DATA:
FILE = 3step.dat;
ANALYSIS:
TYPE = MIXTURE;


They are correct. (1) is 3step and (2) is 1step. See also the appendix of our web note 15: http://www.statmodel.com/download/AppendicesOct28.pdf 

shonnslc posted on Wednesday, August 14, 2019  11:04 am



Thank you, Dr. Muthen! One follow-up question: I found that the LPA results are identical for LPA without dummies (without controlling for clustering effects) and LPA with dummies using R3STEP. Is this normal? I thought bringing in dummies as covariates should change the LPA solution to some extent.


Yes, it is the point of R3STEP that they are the same. Read web note 15. Covariates do change the LPA solution when you don't do 3-step like R3STEP.

shonnslc posted on Thursday, August 15, 2019  1:33 pm



Thank you, Dr. Muthen! 1. So, I guess if my goal is to use covariates (i.e., dummies) to control for the nested data structure, the 1-step approach is preferred, since I want the LPA solution to account for the clustering effect. Am I right? 2. I am wondering why we specify the model by adding paths from the dummy variables to the latent class variable (i.e., c on dum1-dum5) but not to the clustering indicators (u1-u4 on dum1-dum5). Aren't the indicators the dependent variables, and in fixed effects models, aren't dummies added as predictors of the dependent variables? Thank you!


1. Yes. 2. C on dum1-dum5 implies that the dummies influence the indicators indirectly via c. This is a fundamental aspect of the model. You cannot identify both c on dummies and indicators on dummies.

shonnslc posted on Thursday, August 15, 2019  8:46 pm



Thank you, Dr. Muthen! This is really helpful.
