I would like to ask whether it is possible to include a latent class variable in a multilevel model. For instance, in school effectiveness research, some characteristics could possibly distinguish 5 classes of schools. Would it be possible to use this latent class variable to predict student outcomes at the lowest level? Or is this impossible? If it is possible, are there references to check for more details about this option?
In order to estimate a multilevel mixture model (that is, a multilevel model with a finite mixture random component), I need the base package, the multilevel add-on, and the mixture add-on. Am I correct about this? Thanks!
You would need the combination add-on to estimate a multilevel mixture model.
Joyce T. posted on Monday, April 04, 2005 - 6:36 am
I am running a multilevel model (using ML) which contains 20 dependent variables, 12 independent variables and 3 continuous latent variables. I would like to know how Mplus computes the degrees of freedom for both the chi-square test of model fit and the chi-square test of model fit for the baseline model. Thanks.
BMuthen posted on Wednesday, April 06, 2005 - 3:03 am
The degrees of freedom is the number of parameters in the H1 model minus the number of parameters in the H0 model. The chi-square test of model fit for ML uses as H1 a model with free means, and free variances and covariances for both within and between. The baseline model is a model of free means and variances for between and within.
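As a rough illustration of the counting rule above, the following Python sketch tallies parameters for the unrestricted (H1) model and the baseline model. It assumes that all p variables are modeled on both the within and between levels; the function names are ours, and the counts will differ when some variables are within-only, between-only, or treated as covariates.

```python
def h1_parameter_count(p: int) -> int:
    """Unrestricted two-level model: p means plus a free covariance
    matrix (p variances and p(p-1)/2 covariances) on each level."""
    cov_elements = p * (p + 1) // 2
    return p + 2 * cov_elements


def baseline_parameter_count(p: int) -> int:
    """Baseline model: p means plus p variances on each level, no covariances."""
    return p + 2 * p


def degrees_of_freedom(n_h1: int, n_h0: int) -> int:
    """Degrees of freedom = parameters in H1 minus parameters in H0."""
    return n_h1 - n_h0


if __name__ == "__main__":
    p = 4  # hypothetical number of variables
    df_baseline = degrees_of_freedom(h1_parameter_count(p), baseline_parameter_count(p))
    print(df_baseline)
```

Under these assumptions, the degrees of freedom for the fitted model would be obtained the same way, with the number of free parameters in your H0 model in place of the baseline count.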
I am working on a multi-level LCA in a school-based data set. The variables of interest are 1) parent involvement (a child level variable) and 2) classroom quality (a classroom level variable). The ultimate goal is to use the latent classes as independent variables to predict child outcomes such as school readiness.
1) One question that has come up in our discussions is how to deal with important covariates (such as maternal education, child age, child sex, child ethnicity). I was wondering if you could speak to the differences between a) including the covariates in the model that estimates the latent class memberships vs. b) running post-hoc ANOVAs to examine the distribution of the profiles on these important covariates.
2) Also I am interested in how using the Latent Classes as independent variables to predict other outcomes would shift class memberships. Does this often happen when adding a predictive step into LCA/LPA models?
3) Finally we are using a national database with sampling weights, will weighting the data influence the LCA/LPA outcomes?
1. Generally it is best to estimate the full model simultaneously. See the following paper which is available on the website for further information:
Relating latent class analysis results to variables not included in the analysis. Submitted for publication.
2. If you add outcomes other than the original set, this will most likely change class membership and perhaps it should. See the following related paper which is available on the website:
Muthén, B. (2004). Latent variable analysis: Growth mixture modeling and related techniques for longitudinal data. In D. Kaplan (ed.), Handbook of quantitative methodology for the social sciences (pp. 345-368). Newbury Park, CA: Sage Publications.
3. You can and should include complex survey data features in the analysis. See the user's guide under complex survey data to see the options available in Mplus.
I have two follow up questions for the multilevel LCA we are working on.
1) Standardized Scores: Do you suggest running the models with standardized continuous indicators? Or is it acceptable to keep the indicators of profiles in their original metric (even if variances are different among indicators?)
2) Predicting Outcomes from Profile Membership: Also, for estimating the relationship between different profiles and an outcome (say literacy achievement) we have been including the outcome of interest as an indicator of profile membership. We then ran a Wald statistic to examine profile differences on the mean estimates of the outcome. Is this how you would suggest estimating profile differences on an outcome?
I have a question regarding a multi-level LCA too.
I am using a sample of twins in which I would like to identify the genetic-environmental etiology of class membership. I also have predictors at the within and the between level.
Since regressing the measured environmental variable doesn't seem to modify the ACE results (as seen in Turkheimer et al., 2005), I would like to use another strategy developed by Rasbash, O'Connor and Jenkins in which the genetic resemblance is a fixed effect.
Although I think this strategy is the best, I'm not sure how to translate the equation into an Mplus input. The equation is: y(ij) = Beta(0) + u(j) + e(ij) + g(ij)
Basically, the only thing that changes from a regular multilevel model is the g(ij) term, which is the genetic effect for child i in the j'th family. It varies across individuals according to behavior genetic assumptions (it can be used with complex family pedigrees). I am wondering what I should write in the input. I think they use a single group to do the analyses, which departs from the multiple-group analyses I am used to with twin samples.
Page 8, bottom, suggests that a covariance for the g(ij) term is a function of known constants. This reminds me of "QTL" modeling which is shown in the UG ex 5.23. This UG example shows how to use the Constraint= approach to moderate a covariance using read-in values. Perhaps that is a path towards doing what you want.
In running the multi-level LPA I mentioned earlier, we see from the output that the means of the indicators in the level 2 profiles are constrained to be equal across profiles. It appears that this is the default in Mplus.
Is it possible to simultaneously estimate a latent profile at level 1 (child-level variables) and a latent profile at level 2 (classroom-level variables) without the level 2 means being fixed across level 2 profiles?
I attempted to override this default with starting values, but am getting repeated error messages:
The following MODEL statements are ignored:
* Statements in Class %CB#1% of MODEL CB on the BETWEEN level:
  SSCS98 LSCS98 ECPERSS98 ECFURNS98 ECLANGS98 ECMOTRS98 ECCREAS98 ECSOCLS98 LTARNS98 INTERS
* Statements in Class %CB#2% of MODEL CB on the BETWEEN level:
  SSCS98 LSCS98 ECPERSS98 ECFURNS98 ECLANGS98 ECMOTRS98 ECCREAS98 ECSOCLS98 INTERS
*** ERROR
  One or more MODEL statements were ignored. These statements may be incorrect.
Dear Dr. Muthén, I want to perform LCA on a complex dataset (teachers were rating students), and want to control for clustering effect. However I cannot define Type=complex, since this is already done with Type=mixture. How can I use the clustering or stratification options in this type of analysis? Thanks a lot. Robert
I am working on a two-level LCA where the variables of interest include both child-level (observed child interactions) and class-level variables (classroom quality). I have specified two classes at each level. I would like to see if the resulting four profiles differ on child-level school readiness outcomes. Is there a way to get an outcome mean for each profile? I am only able to get two means (one for each of the within classes) rather than four means (one for each of the four profiles). Thank you!
Take a look at the 2010 Henry-Muthen article in the SEM journal. For the models of figures 1-4 the between classes only make the within classes more or less likely but don't change the profiles of the observed items. In contrast, for figure 5 and on there are item-specific differences across the between-level classes so that would give the profile differences you expect.
K Frampton posted on Wednesday, March 16, 2011 - 11:23 am
I am running a multilevel LPA with continuous parenting indicators, with children nested within families. My goal is to identify parenting profiles, observe how they differ across various factors (mainly SES), and then use profile membership to predict a distal outcome (children's prosocial skills) in interaction with SES.
I first fit the model in a single level, and then in a multilevel. 4 classes were identified. Entropy is .86.
I then added covariates of interest (e.g., age of child, SES variables) to identify what distinguishes these groups. When I do this, the structure of the classes changes significantly. I know this is because of measurement non-invariance. When I regress the parenting indicators on the covariate(s) in a single-level model, it improves the fit of the model. However, when I do the same thing in a multilevel model, computation time exceeded 2 hours and the model did not converge.
Any suggestions on how to get around this measurement non-invariance issue in a multilevel model? Because entropy is high, is it feasible in a multilevel framework to save the classes identified and then work with them as an observed variable, as you might do in a single-level model?
Also, with all this in mind - how would you suggest answering my final question - how SES X profile predicts a distal outcome?
It sounds like you have measurement non-invariance and that you add direct effects from covariates to indicators to take this into account. Note that you cannot identify a model with all direct effects.
To see your multilevel problem, you would have to send your input, output, data, and license number to email@example.com.
SES X profile influencing a distal outcome can be handled by regressing the distal outcome on SES with different slopes in the different profiles.
IYH Boon posted on Monday, June 27, 2011 - 12:58 pm
Are there any examples/code snippets available for situations like the one K Frampton describes, above, where the goal is to (1) identify latent profiles at level two and (2) relate these profiles to a distal outcome observed at level one?
I'm working on a similar problem and am unsure about how to specify the model statement.
I don't think we have that in script or paper form, but you would work along the lines of the below. This creates a between-level (say school) latent class variable cb from the between-level z indicators and cb influences the means of the random intercept for the distal outcome d (which is say a student variable varying on both within and between), which is how the cb influence carries over to the student's distal outcome.
Between = z1-z10 cb; classes = cb(2);
d on x;
I don't think you have to say more in MODEL because the z means vary across the cb classes as the default, and so does the d mean, where on between d is the random intercept in the regression of d on x.
Hope this start helps.
Junqing Liu posted on Thursday, August 25, 2011 - 12:50 pm
I am new to Mplus and LPA. I am working on a two-level LPA in a workforce data set. The variables of interest are 1) organization culture (a level 2 latent variable based on five level 2 continuous indicators) and 2) worker demography and practices (level 1 observed variables). The goal is to use the latent classes as independent variables to predict workers' practice, such as using a type of therapy. 1) One question is whether I need to run the LPA first to get the classes (say there will be 2 or 3 categories) before including the latent class variable in the final model to predict the worker outcome.
2) In the final model using the org. culture class membership to predict the worker outcome, do I need to include the observed organization id as a predictor to declare that this is a two-level model?
3) What is the output of the final model? Is it separate regression models for each category of org. culture? Or is it one regression?
4) Is there any empirical research reference on cross-sectional multilevel LPA analysis that I can read?
2) Organization would be your Cluster= variable - see UG.
3)-4) You should read
Henry, K. & Muthén, B. (2010). Multilevel latent class analysis: An application of adolescent smoking typologies with individual and contextual predictors. Structural Equation Modeling, 17, 193-215.
which you can find on our web site.
Junqing Liu posted on Friday, August 26, 2011 - 1:48 pm
This is extremely helpful!
I can compare different LPA models and pick one that fits the best as the final latent profile model to do further analysis. I have some follow-up questions about the further analysis.
1. How common is it to use the latent profile variable as a predictor along with other covariates, rather than as a dependent variable?
2. If it is common, should a level 1 latent profile variable be included as a regular categorical covariate (along with other level 1 and level 2 predictors) to predict a level 1 outcome? Or does the way to include it depend on how the latent profile variable is modeled, such as a two-level latent profile model with a level 2 factor on random latent class intercepts and a level 2 factor on random latent class indicators?
3. Is there any empirical research reference on using multilevel latent profile variable as predictor that I can read?
Thank you for your patience with my long-windedness.
1. It is getting more used now that software is available for easy use. There are papers on our web site showing this. But I would not say that it is common yet.
2. A latent class variable should be included as a predictor if substantive theory warrants that. Note, however, that you don't say "y ON c" (for a distal outcome y), but Mplus lets the y means change over the latent classes.
3. I have not seen multilevel latent profile used as a predictor yet in the literature, but there is nothing precluding it. The approach used in Henry & Muthen can easily be expanded to that using Mplus.
Junqing Liu posted on Friday, September 09, 2011 - 6:56 am
Thanks, Bengt. This is very helpful.
I tried the following three-class two-level LPA of org. culture. However, the output does not include results on latent classes. All the results are about correlations and covariances. The five latent class indicators are scale mean scores, with values ranging from 1 to 5.
1) How may I change the following syntax to get output on latent classes?
2) Is it ok to use the mean of scales as latent indicators? Or is it better to use the items within the scales as indicators?
Junqing Liu posted on Tuesday, September 13, 2011 - 12:49 pm
Thanks, Bengt and Linda. The technical problem is solved.
I have a couple of questions related to the results of the two-level model i mentioned earlier.
1) Are TECH11 and TECH14 applicable to a two-level LPA model? If so, when the LMR p-value from TECH11 is not significant but the p-value from TECH14 is, should I pick the k-class model as opposed to the (k-1)-class model?
2) The BIC and AIC of the two-level model are smaller than those of a single-level model, but not by much (e.g. the adjusted BIC is 2542.96 for the two-level model and 2559.12 for the single-level model). In this case, should I still choose the two-level model?
3) What does the following specification mean, especially C#1 WITH C#2?
1) In principle TECH14 is more reliable; however, you might also want to look at BIC and AIC for the two models.
2) You should also consider the size of the variance on the between level and see if it is significant. Note also that the performance of BIC would depend on the number of clusters (two-level units) which drives the asymptotics. Ultimately a simulation study would show if AIC and BIC are useful in this context. This is not well studied.
3) These are the interaction parameters in a log-linear model for contingency tables. For example in
These are normally distributed random effects that vary over clusters and allow the class proportions to change over clusters. The covariance term C#1 WITH C#2; is just the covariance between the two random effects and C#1; C#2; are the two variances.
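To make the role of these random effects concrete, here is a minimal Python sketch of how cluster-specific class proportions could arise in a three-class model. The intercepts, variances, and covariance below are hypothetical values chosen for illustration, and the function names are ours, not Mplus output or syntax.

```python
import math
import random


def class_probabilities(intercepts, random_effects):
    """Multinomial logit with the last class as reference (its logit is fixed at 0)."""
    logits = [a + u for a, u in zip(intercepts, random_effects)] + [0.0]
    largest = max(logits)  # subtract the maximum for numerical stability
    weights = [math.exp(value - largest) for value in logits]
    total = sum(weights)
    return [w / total for w in weights]


def draw_random_effects(var1, var2, cov12, rng):
    """Draw (u1, u2) from a bivariate normal using a 2x2 Cholesky factor."""
    l11 = math.sqrt(var1)
    l21 = cov12 / l11
    l22 = math.sqrt(var2 - l21 ** 2)
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    return l11 * z1, l21 * z1 + l22 * z2


if __name__ == "__main__":
    rng = random.Random(1)
    intercepts = [0.5, -0.2]  # hypothetical means of C#1 and C#2
    for cluster in range(3):
        u = draw_random_effects(var1=1.0, var2=0.5, cov12=0.3, rng=rng)
        print(cluster, [round(p, 3) for p in class_probabilities(intercepts, u)])
```

Each cluster gets its own draw of the two random effects, so its class proportions differ from those of other clusters; setting both variances to zero would give every cluster the same proportions.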
Mary Campa posted on Thursday, September 15, 2011 - 4:40 pm
Hello Dr. Muthen,
I am reading your paper with Dr. Henry (2010, SEM 17, 193-215) and trying to replicate the analysis titled: Three classes at level 1, Two classes at level 2 random effects model: nonparametric approach (Model 4a from Table 1).
I am using 4 classes at level one but otherwise the model statements are the same. However, I continue to get this error message:
** ERROR in MODEL command Unknown class model name CW specified in C-specific MODEL command.
This can only be answered if you send the output to support.
Junqing Liu posted on Friday, September 16, 2011 - 9:03 am
Thank you very much, Tihomir.
I have a couple follow up questions regarding the interpretation of c#1 and C#2.
1) Does the within-level mean of c#1 represent the random intercept of c#1 as compared to c#3? What does a significant p-value for the within-level mean of c#1 indicate?
2) What does a non-significant p-value for the between-level variance of c#1 mean?
3) Regarding your earlier response on examining the significance of the between-level variance, should I do a log-likelihood ratio test of the one-level and two-level models, or should I use the p-values mentioned above, that is, the p-values of the between-level variances of c#1 and c#2?
Happy New Year! I hope you had great holidays! I'm a beginner in the field of multilevel analysis with a lot of questions. My aim is to derive a typology of musicians who were rated by audience members. I have read your very interesting and inspiring paper (Henry & Muthen, 2010). First of all, I have some questions about your paper that could help me clear up some misunderstandings:
1) What item or construct does your cluster variable, named LEAID, refer to?
2) It would be much easier to follow the steps reported in your article if I could run your code on the dataset. I wrote to Mrs. Henry, but she advised me to simulate such a dataset using Monte Carlo techniques. Unfortunately, I am not able to do this. How does it work?
3) I would like to use a model for my own dataset that is the same as the one presented in Figure 7 (multilevel latent class model, nonparametric approach with level 2 factor on random latent class indicators). Would you give me a hint on how to write this in Mplus code?
4) The final model also seems attractive to me, but it is hard to understand your reported code without any comments on it, especially the lower part (MODEL CONSTRAINT). I would be happy if you could give us some comments on the code.
I would be very happy, if you could help me answering these questions.
You can look at Examples 10.6 and 10.7 in the user's guide. These are similar to what is done in the Henry and Muthen paper and there are data available with each example. A note of caution, if you are a beginner in multilevel analysis, starting with a multilevel mixture model may not be a good idea. Studying both multilevel and mixture modeling as a first step is a good idea.
Angela Urick posted on Thursday, January 19, 2012 - 12:46 pm
I’m working on a two-level LCA (types of teachers in schools with types of principals) with a cw and cb that have different indicators (I have labeled them uw’s and ub’s below). These two sets of indicators have dichotomous and continuous measures. Finally, I want to regress cw on cb with a random intercept (similar to ex. 10.7). Here is my basic code:
Here are my questions:
1. Is an LRT (TECH 11, 14) possible for this model with different indicators for cw and cb? If not, how would you suggest that I assess class fit?
2. Theoretically, there is a two-way relationship between cw and cb. How would you suggest that this be modeled?
1. No. It is an open research question on how to determine number of classes in a multilevel setting such as this. The ordinary BIC, for example, may not be the best approach. Interpretability can be helpful.
2. The relationship between cw and cb is captured by the between-level statement
cw#1-cw#3 on cb;
where the cw random intercepts are influenced by cb.
Angela Urick posted on Saturday, January 21, 2012 - 10:33 am
Good afternoon, Dr. Muthen, I have another question in reference to the model mentioned above. In the results, the means of the level 1 indicators (uw/u) vary across classes as expected. However, the means of the level 2 indicators (ub/z) are the same across all classes. I ran cb as a single level LCA—there should be three different classes. Why would these means remain the same across the between level classes? Do I need to free the between level indicators or make other edits to the code? Thanks again, Angela
Mary Campa posted on Saturday, May 26, 2012 - 12:13 pm
Hello, I am building a model similar to the Henry & Muthen, 2010 model 2, a two-level random effects LCA. I have selected a four-class model as best fitting. My question is about the between-level variances produced by this code:
%BETWEEN%
%OVERALL%
C#1; C#2; C#3;
C#1 WITH C#2 C#3;
C#2 WITH C#3;
I used starting values to switch the ordering of the classes and the estimates and standard errors of the between-level variances (C#1 -C#3) changed. For example, on the initial run, the variance in class 2 was significant but on the next run the variance for the same class (although now a new number) was not. This happened for multiple classes, where the parameter estimates changed based on what class I selected as the reference.
My understanding from the Henry & Muthen paper is that these parameters represent the between-level variance in class membership. The classes (proportions, probabilities of indicators) remain the same regardless of which class is the reference, so I am not clear why these estimates are changing.
Does this suggest there is something wrong with my model or am I wrong in the interpretation?
The multinomial regression has the coefficients at zero for the last class, the reference class. The between-level variance components add to the coefficients for all but the last class. It makes sense that the size and SEs of these variance components change when you change the order of the classes because it is all relative to the reference class.
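A small numeric sketch in Python may help show why this happens. It assumes a three-class model whose random effects u1 and u2 are defined relative to class 3, and then re-expresses them relative to class 1; the function names and all numeric values are hypothetical.

```python
import math


def softmax(logits):
    """Convert multinomial logits to class probabilities."""
    largest = max(logits)
    weights = [math.exp(value - largest) for value in logits]
    total = sum(weights)
    return [w / total for w in weights]


def rereference_to_class1(var1, var2, cov12):
    """Given Var(u1), Var(u2), Cov(u1, u2) with class 3 as reference,
    return the variances and covariance of the random effects for
    classes 2 and 3 when class 1 is the reference instead."""
    var2_new = var1 + var2 - 2 * cov12  # Var(u2 - u1)
    var3_new = var1                     # Var(-u1)
    cov23_new = var1 - cov12            # Cov(u2 - u1, -u1)
    return var2_new, var3_new, cov23_new


if __name__ == "__main__":
    a1, a2 = 0.4, -0.3  # hypothetical intercepts relative to class 3
    u1, u2 = 0.8, -0.2  # hypothetical random effects for one cluster
    with_ref3 = softmax([a1 + u1, a2 + u2, 0.0])
    with_ref1 = softmax([0.0, (a2 + u2) - (a1 + u1), -(a1 + u1)])
    print(all(math.isclose(p, q) for p, q in zip(with_ref3, with_ref1)))
    print(rereference_to_class1(var1=1.0, var2=0.5, cov12=0.25))
```

The class probabilities are identical under either reference class, but the variance attached to class 2 changes because it now describes the contrast between class 2 and class 1 rather than between class 2 and class 3.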
Hi Dr. Muthen, Thank you so much for your help in advance. I'm running a two-level LCA in which the latent class variable and the independent variable (treat) are at the school level, and the dependent variable is at the student level. Since the number of clusters is 32, isn't the sum of the latent class counts supposed to be 32 as well? However, the results show class counts at the student level rather than the school level, as below: CLASSIFICATION OF INDIVIDUALS BASED ON THEIR MOST LIKELY LATENT CLASS MEMBERSHIP
Although the latent class variable is a between-level variable (varying across schools only), the class counts and proportions printed say how many students are in each class. But there are only 32 schools and the latent classes refer to them.
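If school-level counts are wanted, one option is to collapse the student-level classification to one record per school. The Python sketch below assumes you have already read saved class memberships into (cluster id, between-level class) pairs, one per student; the function name and the data are hypothetical.

```python
from collections import defaultdict


def clusters_per_class(records):
    """Count distinct clusters in each between-level class.

    records: iterable of (cluster_id, between_class) pairs, one per student.
    """
    clusters = defaultdict(set)
    for cluster_id, between_class in records:
        clusters[between_class].add(cluster_id)
    return {cls: len(ids) for cls, ids in clusters.items()}


if __name__ == "__main__":
    # Hypothetical data: three schools, five students.
    students = [(101, 1), (101, 1), (102, 2), (103, 1), (103, 1)]
    print(clusters_per_class(students))
```

Because the latent class variable varies only across schools, every student in a school should carry the same class, so the per-class school counts should sum to the number of clusters.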
xiaoshu zhu posted on Wednesday, July 25, 2012 - 10:56 am
I have a question regarding class membership at the group level. I followed the code for the nonparametric MLCA in Henry and Muthen (2010) and specified a model with two student LCs and three group LCs. The output showed that some groups were assigned to two group LCs simultaneously.
How can we deal with this problem? Should we decide the group class membership based on the one with largest proportion of students within the group?
Please send the output and saved data along with your license number to firstname.lastname@example.org. Point to specifically what you see as the problem.
Mike Todd posted on Sunday, January 20, 2013 - 5:14 pm
We have a multilevel dataset (individuals nested with census tracts) that we would like to use in a multilevel latent profile analysis.
The census tracts (Level 2 units), and in turn the individuals (Level 1 units), can be grouped into two categories. What we are wondering is whether and how measurement invariance across the two categories can be tested in a multilevel LPA/LCA. Would we test this in the same manner that we would for standard "single-level" LPA? If not, can you point us to a relevant approach that could be applied to results generated by Mplus?
It sounds like you have an observed grouping variable on level 2. This can be handled by defining a between-level latent class variable that is exactly the same as the grouping variable. See UG ex 7.24 for how this is done in the single-level case. The between-level latent class variable has to be declared on the Between = list as in UG chapter 10. Then you can specify and test various degrees of measurement invariance across these between-level classes.
Mike Todd posted on Monday, January 21, 2013 - 9:11 am
i am running a latent analysis with complex survey data on 9 items aiming at political alienation and willingness to participate in the democratic process.
in my early steps i ran the analysis without type = complex mixture (only type = mixture) and it turned out that a 3- and 4-class solution seemed the most reasonable solutions (log-likelihood based fit indices were all pretty sobering, but interpretation was consistent with substantive theory of the construct).
later i realised i should be using type = complex mixture, since the data is clustered with n(cluster)= 27 and differing cluster sizes. hence, i re-ran the analysis for 3,4 and 5 latent classes.
what struck me was that the estimates did not change at all for the 3 and 5 class solutions, but changed considerably for the 4 class solution, which was the best model interpretation-wise when using only type = mixture and now unfortunately makes far less sense.
how is it possible that the 3 and 5 class solutions did not change, but the 4 class solution did? the estimator is mlr, which i assume somehow weights by cluster size? could that be "playing against" my beloved 4 class solution, because one of the classes is quite small in comparison to the other 3? maybe if people in this class come from cluster units with a small weight (due to small cluster unit size), this class cannot be detected well enough?
as you can tell, i only have a very vague understanding of how the estimators work. i apologize for painful stupidity in my thoughts expressed above.
do you have any recommendations for choosing an estimator when using type = complex mixture? i've heard there are different options: mlr, uls ... are there any papers where i could look things up?
Unless you have weights, your classes should not change when adding COMPLEX. Perhaps you are not replicating the best loglikelihood in all analyses. Or perhaps the order of the classes changed. If you can't see the problem, send the relevant outputs and your license number to email@example.com.
The only estimator available for TYPE=COMPLEX MIXTURE is MLR.
Please limit future posts to one window. If they are longer than that, they are not appropriate for Mplus Discussion.
thanks for the answer. i want to run the blrt but i am using the type = complex mixture option. if i compare the 5 class solution using type = complex mixture to the 5 class solution using only type = mixture, the estimates don't change and the p-values do, but only slightly. would it be acceptable to run the blrt using type = mixture even though the data is in fact clustered?
hi, i'm running this multilevel latent class model:
Variable:
  Names are indir INDIR1 INDIR2 id GENERE REGOLARE CITT CITT2 NUCLEO1 NUCLEO2
    NUCLEO3 LIBRI SP_DOM PC F_ISCED M_ISCED PARED PROF_P PROF_M BFMJ BMMJ
    HISEI SP_SCOL sp_sc_d cod_scu;
  Missing are all (-9999);
  auxiliary are id;
  usevariables are GENERE REGOLARE CITT2 NUCLEO3 LIBRI SP_DOM PC PARED HISEI;
  categorical are GENERE REGOLARE CITT2 NUCLEO3 LIBRI SP_DOM PC PARED HISEI;
  Classes = CB(2) CW(3);
  within = GENERE REGOLARE CITT2 NUCLEO3 LIBRI SP_DOM PC PARED HISEI;
  between = CB;
  cluster = cod_scu;
Analysis:
  Type = Mixture Twolevel;
Model:
  %within%
  %overall%
  %between%
  %overall%
  CW on CB;
And i have two questions: 1) i have different threshold estimates in the same within classes, i.e. the threshold estimates of latent class 1 1 are different from those of latent class 2 1. Is it correct?
2) How can i calculate thresholds' estimates in probability scale?
1) The thresholds vary across the between-level CB classes as the default. You should think of thresholds as a between-level quantity in line with having means appear on between in regular multilevel modeling.
2) This is tricky because the probabilities involve the random effects and therefore require numerical integration.
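For intuition only, the Python sketch below shows the two steps involved for a binary item under a logit link: converting a threshold to a probability, and averaging that probability over a normally distributed random effect by simple numerical integration. Where the random effect enters the logit, the grid settings, and all function names are our assumptions; the probabilities implied by a particular Mplus model depend on how that model is specified.

```python
import math


def prob_from_threshold(tau):
    """P(u = 1) for a binary item with threshold tau under a logit link."""
    return 1.0 / (1.0 + math.exp(tau))


def marginal_prob(tau, variance, n_points=2001, width=8.0):
    """Average P(u = 1) over a N(0, variance) random effect added to the logit,
    using the trapezoidal rule on a grid spanning +/- width standard deviations."""
    if variance <= 0:
        return prob_from_threshold(tau)
    sd = math.sqrt(variance)
    step = 2.0 * width * sd / (n_points - 1)
    total = 0.0
    for i in range(n_points):
        u = -width * sd + i * step
        density = math.exp(-0.5 * (u / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))
        weight = 0.5 if i in (0, n_points - 1) else 1.0  # trapezoid end points
        total += weight * density * prob_from_threshold(tau - u)
    return total * step


if __name__ == "__main__":
    tau = 2.0  # hypothetical threshold estimate
    print(round(prob_from_threshold(tau), 4))
    print(round(marginal_prob(tau, variance=4.0), 4))
```

With a nonzero variance the averaged probability is pulled toward 0.5 relative to the value computed from the threshold alone, which is why the two numbers should not be used interchangeably.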
Dear Bengt, thank you for your kind reply. I would have two minor remarks just to be sure to have properly understood your comments.
1) the following is part of my output. According to your examples, the first coefficients in each group of thresholds (e.g. the pairs of GENERE$1) should be equal. As you can see, mine are not. Is there a mistake, or is there a reason underlying these results that I cannot see?
2) ok, i understand, but it's very difficult to interpret the characteristics of the classes by looking at threshold estimates. could it be a good idea to save the class probabilities and then analyze the classes with descriptive statistics?
I don't think I'm finding a model similar to what I want to explore in Mplus in the user's guide. I have 2 specific questions.
-Question 1: I'm creating a multilevel latent profile model for students in classrooms. I want to see if the random effects (intercepts) for the profiles' latent means, constituted by a battery of student indicators, are predicted by a level 2 latent factor for classroom environment.
Would I specify that latent factor for classroom environment in this part of the model statement?
%BETWEEN%
%OVERALL%
schoolfactor BY indicators;
c#1 ON schoolfactor;
c#2 ON schoolfactor;
- Question 2: Finally, how would I specify a multilevel mixture model where the profile is constituted by child (level1) and classroom (level2) indicators? The profile might have measures of cognitive and social outcomes for children as well as measures of classroom environment.
Q1. You would follow the ideas in the Henry-Muthen multilevel LCA paper on our website. Declare a between-level latent class variable. On between you simply say
schoolfac BY ....;
which will give you the schoolfactor means in the different between classes.
Q2. Just declare the classroom environment variables as Between variables and include them in the BY statement.
CMP posted on Thursday, February 26, 2015 - 3:03 am
Hi, I am running a multilevel mixture analysis with random effects. My variables are y x1 x2 x3 x4 x5. On level 1: Y on x1 x2 x3 x4 On level 2, I would like to identify latent classes (cb) using only the random slopes from level 1 (s1 s2 s3 s4). I do not want to include random intercept (y) as it does not make substantive sense, in my case. Following your example on the User’s Guide (example 10.2), I notice the random intercept is included as an indicator of cb. How can I specify that only the random slopes be used?