
J.W. posted on Monday, October 25, 2004  1:44 pm



Including covariates (Xs) in an LCA is likely to change the latent class distribution. If so, this implies that the Xs may affect the indicators (Us) and thereby influence class membership classification. My questions are: 1) Does this mean the Xs should be specified to predict the latent class variable C as well as the categorical indicators (the Us)? If the Xs are specified to predict only C, not the Us, is the model misspecified? 2) When I regressed one indicator, U1, on the Xs, no conditional probabilities were provided in the Mplus output. Is there an option to print conditional probabilities in this case? 3) I also regressed a categorical indicator U2 (a 3-level ordinal measure) on the Xs. I got one set of coefficients in each class. Are these cumulative logistic regression coefficients? Thank you very much for your help!


1. If a u ON x coefficient is significant, this means that the classes will be different if x is not included in the model. 2. Conditional probabilities are not computed when there are x's because they vary with the value of x. 3. If you get one set for each class, you must have mentioned this ON statement in the class-specific part of the MODEL. If I understand the term cumulative, these are not. They are simply the logistic regression coefficients for each class with no order implied.

J.W. posted on Monday, November 01, 2004  2:25 pm



Thank you very much for your prompt answers to my questions. 1) It is very likely that covariates, such as sociodemographics, are related to the u indicators, which are often outcome measures. If we want to use sociodemographics to predict latent class membership, do we have to specify the relationships between all x's and all u's? i) That would make model specification very tedious and difficult when there are many u indicators and x covariates; ii) no conditional probabilities would be reported, as you pointed out, because they vary with the values of the covariates. iii) If we use sociodemographic covariates to predict class membership without relating them to the u indicators, would the LCA model be misspecified? 2) Sorry, I did not make my question 3 clear in my last message. Let me try again. Regressing a K-level categorical u indicator (e.g., 3 categories) on x covariates, I was expecting K-1 (e.g., 2) sets of multinomial logit coefficients for each class. However, I got only one set of coefficients when I regressed a 3-level categorical u variable on the x covariates. I was wondering whether the coefficients are from a proportional odds logit model (sometimes called "cumulative logistic regression"), since the u variable is an ordinal measure.

bmuthen posted on Monday, November 01, 2004  4:00 pm



1) A significant direct effect from a covariate to a u indicator shows that the u measurement is not invariant with respect to the groups of people represented by the values of the covariates. Such a measurement noninvariance check cannot be done by including all direct effects in addition to the effect of the covariates on the latent class variable, because that model is not identified. One can investigate one u at a time, allowing the covariates to influence this u directly and not only via the latent class variable. (i) Yes, measurement noninvariance investigations can be tedious, but they can be important (and are unfortunately seldom done). (ii) Correct. (iii) Yes. 2) Yes, you are right, these estimates are from a proportional odds model, since polytomous u's are taken to be ordinal when the categorical = option in the Variable command is used. If you want to specify the u's as nominal, use the nominal = option.
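
As a rough illustration of the proportional odds model mentioned in the reply above, the sketch below computes category probabilities for a 3-category ordinal u from two thresholds and one shared slope. It assumes the cumulative logit form P(u <= k | x) = 1 / (1 + exp(-(tau_k - beta*x))); the threshold and slope values are invented for the example and are not taken from any output in this thread.

```python
import math

def logistic(z):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-z))

def ordinal_probs(thresholds, beta, x):
    """Category probabilities under a proportional odds (cumulative logit) model.

    thresholds: increasing list of K-1 thresholds (hypothetical values here)
    beta:       the single slope shared by all thresholds
    x:          covariate value
    """
    # Cumulative probabilities P(u <= k | x); the last category closes at 1.
    cum = [logistic(t - beta * x) for t in thresholds] + [1.0]
    # Differences of adjacent cumulative probabilities give category probabilities.
    return [cum[0]] + [cum[k] - cum[k - 1] for k in range(1, len(cum))]

taus = [-0.5, 1.2]   # hypothetical thresholds for a 3-category u
beta = 0.8           # hypothetical slope of u ON x within one class

for x in (0, 1):
    print(x, [round(p, 3) for p in ordinal_probs(taus, beta, x)])
```

One slope serves both thresholds, which is why only one set of coefficients appears per class rather than K-1 sets.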

J.W. posted on Wednesday, November 03, 2004  11:30 am



When LCA is used to assess the pattern of outcome measures, such as diagnosed symptoms or risk behaviors, it has very often been conducted without covariates, or the relationships between class membership and covariates have been assessed separately after class membership was estimated. This was inappropriate, because the LCA model was misspecified. To my understanding, covariates that in theory influence latent class membership should be included in the LCA. Estimation of latent class membership and of the relationships between class membership and covariates should be done simultaneously. What bothers me is that covariates that influence class membership (e.g., sociodemographic characteristics) would also very likely influence the u outcome indicators. Is there any covariate that influences class membership but not the u outcome indicators? The difficulties are: 1) If several or all of the u indicators are significantly related to the covariates, we can't regress all these measurement-noninvariant u indicators on the covariates when the covariates are also used to predict class membership because, as you pointed out in your last message, the "model is not identified." 2) Even with just one measurement-noninvariant u indicator, it would be difficult to define the latent classes, because the conditional probabilities would not be available when covariates are used to predict the u indicator. Now I find myself in a dilemma. Excluding covariates, the LCA is misspecified; including covariates, I have the above difficulties. Any solution? Many thanks in advance.

bmuthen posted on Wednesday, November 03, 2004  11:46 am



1) The u indicators are in fact correlated with the covariates even when the covariates only point to the latent class variable and not directly to the u's. This is because the covariates then have an indirect effect on the u's. So this model has strong correlations between the covariates and the u's (even without direct effects). 2) Conditional probabilities are available even with a direct effect to a u. You can compute the conditional probability for each class and each value of the covariate (Mplus doesn't do it, but it can be done by hand using the estimates). Hope this answers your questions.
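
The by-hand calculation described in the reply above might look like the following sketch for a binary u with a direct effect from one covariate. It assumes the logit form logit P(u = 1 | c, x) = -threshold_c + slope * x; the thresholds and slope are made-up numbers standing in for estimates from an output.

```python
import math

def prob_u1(threshold, slope, x):
    """P(u = 1 | class, x), assuming logit = -threshold + slope * x."""
    return 1.0 / (1.0 + math.exp(threshold - slope * x))

# Hypothetical estimates: one threshold per class, one direct effect of x on u.
class_thresholds = {1: -1.0, 2: 1.5}
direct_effect = 0.7   # assumed equal across classes in this sketch

# One conditional probability per class and per covariate value.
for c, tau in class_thresholds.items():
    for x in (0, 1):
        print(f"class {c}, x = {x}: P(u=1) = {prob_u1(tau, direct_effect, x):.3f}")
```

The result is a small table of probabilities per class and covariate value, rather than the single probability per class that is reported when no x points to u.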

J.W. posted on Thursday, November 04, 2004  11:10 am



Thank you so much! 

ADC posted on Monday, April 25, 2005  2:51 pm



I'm doing an LCA using two demographic covariates to predict latent class membership for 8 indicator variables. I'd like to compare models with various numbers of latent classes using AIC, BIC, etc. This seems pretty straightforward for 2 or more latent classes. But I also want to compare a model with a single latent class, and I can't figure out how to model a single latent class with covariates. If I include my covariates in the USEVARIABLES line without specifying an ON statement for the relation between covariates and latent class, does Mplus know that these variables are covariates? Does it even make sense to include covariates in a single-class model? Mplus calculates parameters for them, but I'm not sure it's doing what I think it's doing. Thanks for any help you can provide.


With one class, there is no latent class membership to predict. Everyone is in one class. 

bmuthen posted on Monday, April 25, 2005  3:38 pm



If you only include the covariates in the USEV list and not in the model, the covariates are treated as variables that are uncorrelated among themselves and with the other observed variables (see the warning you got). This is not what you want; don't include the covariates in the USEV list when you only have 1 class (unless you want them to influence the outcomes).

ADC posted on Tuesday, April 26, 2005  6:45 am



That makes much more sense. Thanks for your help! 

anonymous posted on Friday, January 13, 2006  12:05 pm



When conducting LCA with covariates, can a covariate have more than 2 categories, so that I substitute different values for x (e.g., 0, 1, 2) when calculating the probabilities from the logistic regression coefficients? Or do I need to create 3 dummy variables to represent the 3 categories of this covariate?


Covariates can be continuous or a set of dummy variables. You would create two dummy variables to represent three categories. 

anonymous posted on Wednesday, January 18, 2006  7:34 am



Hi, can the odds ratios be interpreted for covariates that are not binary, i.e., that have 3 or 4 nominal categories? Also, if I have 2 or 3 covariates, is it possible to look at the odds ratios for each covariate in turn for each class? For example, when including sex and religion, do I say that the odds of being in class 1 are higher for females than for males, and that the odds of being in class 1 are higher for Catholics than for Protestants, etc.?


If you have nominal covariates, you need to create a set of dummy variables. Covariates can be continuous or binary, as in regular regression. You would want to add to "for males" the words "holding other covariates constant".

anonymous posted on Wednesday, January 18, 2006  10:36 am



Thanks. I included a set of dummy variables, but got the following message: "ONE OR MORE MULTINOMIAL LOGIT PARAMETERS WERE FIXED TO AVOID SINGULARITY OF THE INFORMATION MATRIX. THE SINGULARITY IS MOST LIKELY BECAUSE THE MODEL IS NOT IDENTIFIED, OR BECAUSE OF EMPTY CELLS IN THE JOINT DISTRIBUTION OF THE CATEGORICAL LATENT VARIABLES AND ANY INDEPENDENT VARIABLES. THE FOLLOWING PARAMETERS WERE FIXED: 119 123 127 131 135 139" I'm not sure how to rectify this. I'm assuming it has something to do with the inclusion of the dummy variables, so I may have done something wrong. For a nominal variable with 3 categories, I created 3 dummy variables to represent membership (0) or nonmembership (1) for each of the 3 categories respectively.


For a nominal variable with three categories, you would create two dummy variables just like in regular regression. 

anonymous posted on Wednesday, January 18, 2006  12:08 pm



Sorry, just to clarify once more. Given that I will have one reference category against which I will be comparing the other 2 categories (dummy variables), does this mean that I will not be able to calculate the probabilities of class membership for this reference category? Or do I simply regard the slope as 0 for the reference category, so that the logit equals the intercept?


I think you should read the section in Chapter 13 called Calculating Probabilities From Logistic Regression Coefficients. This should answer your questions. 
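
For readers without the User's Guide at hand, here is a minimal sketch of the kind of calculation that chapter covers, written for a 3-class model with one 3-category covariate coded as two dummy variables. It assumes the usual multinomial logit setup in which the last class is the reference class with logit 0; all intercepts and slopes are hypothetical.

```python
import math

def class_probs(intercepts, slopes, x):
    """Class membership probabilities from multinomial logit coefficients.

    intercepts: one intercept per non-reference class
    slopes:     one list of slopes (one per covariate) per non-reference class
    x:          covariate values, here the two dummy variables
    The last class is treated as the reference class with logit 0.
    """
    logits = [a + sum(b_j * x_j for b_j, x_j in zip(b, x))
              for a, b in zip(intercepts, slopes)]
    logits.append(0.0)                       # reference class
    m = max(logits)                          # subtract the max for numerical stability
    expo = [math.exp(l - m) for l in logits]
    total = sum(expo)
    return [e / total for e in expo]

# Hypothetical estimates for c#1 and c#2 (class 3 is the reference class).
intercepts = [0.4, -0.2]
slopes = [[0.9, -0.5],    # c#1 ON d1 d2
          [0.3, 1.1]]     # c#2 ON d1 d2

# (0, 0) is the reference category of the covariate: each logit is just the intercept.
for label, x in (("reference", (0, 0)), ("category 2", (1, 0)), ("category 3", (0, 1))):
    print(label, [round(p, 3) for p in class_probs(intercepts, slopes, x)])
```

Under these assumptions the reference category of the covariate still gets class probabilities; both dummies are 0, so the slopes drop out.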


This may be a dumb question, but can you explain to me why an LCA model would not be identified if you have direct effects from a covariate to all u indicators and to the latent class variable?


Think of p u indicators and q x covariates. What identifies relationships between u's and x's can be thought of as logistic regression relationships for u on x. There are p × q such slopes. Even with a binary latent class variable c, we already use up q slopes for c ON x, so there isn't enough information left for p × q more slopes.


For my LCA model with 2 dummy variables to reflect a 3-category covariate RACE, I get the warning:

ONE OR MORE MULTINOMIAL LOGIT PARAMETERS WERE FIXED TO AVOID SINGULARITY OF THE INFORMATION MATRIX. SINGULARITY IS MOST LIKELY BECAUSE THE MODEL IS NOT IDENTIFIED, OR BECAUSE OF EMPTY CELLS IN THE JOINT DISTRIBUTION OF CATEGORICAL LATENT VARIABLES AND ANY INDEPENDENT VARIABLES. FOLLOWING PARAMETERS WERE FIXED: 90 93

Syntax is:

    USEVARIABLES are ..... BLK HISP;
    Classes = c(6);
    Analysis: Type=mixture;
    MODEL:
    %OVERALL%
    c ON BLK HISP;

Fixed parameters seem to be BLK in C#4 and HISP in C#5.

    GAMMA(C)
              BLK    HISP
    C#4        90      91
    C#5        92      93

    Categorical Latent Variables
    C#4 ON
      BLK     -23.313    0.000   999.000   999.000
      HISP      0.914    0.234     3.908     0.000
    C#5 ON
      BLK      22.400    0.351    63.901     0.000
      HISP     20.592    0.000   999.000   999.000

    LOGISTIC REGRESSION ODDS RATIOS
    C#4 ON
      BLK       0.000
      HISP      2.493
    C#5 ON
      BLK       *********
      HISP      *********

Am I simply getting this warning to alert me that there are no BLK in Class 4 nor BLK/HISP in Class 5? If so, is this a warning I can ignore?


Yes. Yes. 
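
To see how the odds ratios in the posted output relate to the logit estimates, the short sketch below exponentiates the four coefficients. The negative sign on the BLK coefficient for C#4 is an assumption inferred from its reported odds ratio of 0.000; the other values are used as posted.

```python
import math

def odds_ratio(logit_coefficient):
    """An odds ratio is the exponentiated logit coefficient."""
    return math.exp(logit_coefficient)

# Estimates as posted; the minus sign on C#4 ON BLK is inferred, not shown in the post.
estimates = {
    ("C#4", "BLK"): -23.313,
    ("C#4", "HISP"): 0.914,
    ("C#5", "BLK"): 22.400,
    ("C#5", "HISP"): 20.592,
}

for (cls, cov), b in estimates.items():
    print(f"{cls} ON {cov}: logit = {b:8.3f}, odds ratio = {odds_ratio(b):.3e}")
```

The only moderate estimate, 0.914, gives an odds ratio close to the reported 2.493. The extreme logits give odds ratios that are effectively zero or too large to print, which is what one would expect when a covariate category is empty or nearly empty in a class.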


Good morning, I was wondering if anyone on the discussion forum could help. I am running the following 4-class LCA model with a direct effect (age):

    Variable:
    Names are scrser area gor6 nrf2005 education age swimming snooker darts football fishing newoutdo wintersp water tennis badmin squash cycling fitness cricket tabten golf horserid yoga tenpin jog bats rackets rackets2;
    Missing are all (9999);
    USEVARIABLES swimming snooker football newoutdo wintersp water cycling fitness golf tenpin jog cricket rackets2 age;
    CATEGORICAL swimming snooker football newoutdo wintersp water cycling fitness golf tenpin jog cricket rackets2;
    CLASSES = C (4);
    ANALYSIS: TYPE = MIXTURE;
    MODEL:
    %OVERALL%
    swimming-rackets2 on newage1;
    ...........

My question is: the age variable is categorical (5 categories), so how do I let Mplus know that this variable is categorical? I can't put it in the CATEGORICAL list, as that would place it in the latent class model. Any help would be great. Cheers, Paul


You don't need to put it on the CATEGORICAL list. This is only for dependent variables. All covariates are treated as continuous in regression analysis. 


Hello, I am doing a two-level mixture model with binary indicators and I would like to know how to get the conditional probabilities for each indicator. For example, in the regular mixture model, TECH1 produces conditional probabilities and thresholds; however, in the two-level model, there are only thresholds, no conditional probabilities for the indicators. Is there a way to use the thresholds to compute conditional probabilities in the two-level model? Thanks, Vernon


We don't provide conditional probabilities when numerical integration is involved. You cannot compute the conditional probabilities by hand because they must be computed using numerical integration. 
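
To give a feel for why a threshold alone is not enough here, the sketch below integrates a logistic probability over a normally distributed random intercept with a simple trapezoid rule. This is only an illustration under an assumed model (one normal random intercept on the indicator's logit, with a made-up between-level variance); it does not claim to reproduce how Mplus handles the integration.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def marginal_prob(threshold, between_var, n_points=2001, bound=8.0):
    """P(u = 1) averaged over a N(0, between_var) random intercept.

    Assumes logit P(u = 1 | random effect) = -threshold + random effect.
    Uses a trapezoid rule on the standardized effect z over [-bound, bound].
    """
    sd = math.sqrt(between_var)
    step = 2.0 * bound / (n_points - 1)
    total = 0.0
    for i in range(n_points):
        z = -bound + i * step
        weight = 0.5 if i in (0, n_points - 1) else 1.0     # trapezoid end points
        density = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
        total += weight * logistic(-threshold + sd * z) * density * step
    return total

tau = 1.0   # hypothetical threshold
print("plug-in, ignoring the random effect:", round(logistic(-tau), 4))
print("integrated over the random effect:  ", round(marginal_prob(tau, 4.0), 4))
```

With a nonzero between-level variance the integrated probability differs from the plug-in value, so the single-level formula does not carry over directly.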


Hello, I have a follow-up to my previous question. If I am unable to compute conditional probabilities by hand and I cannot estimate them in the two-level model, would you suggest estimating them in a single-level model? I am using complex data with clustering at 2 levels, and the single-level mixture model only allows one cluster variable. How concerned should I be if I am unable to estimate the conditional probabilities with both clustering variables in the model?


I would not change my model. I would instead interpret the sign and significance of the latent class indicators and look at the profiles. You will not learn anything more from the probabilities than from the parameter estimates. 

nina chien posted on Wednesday, May 07, 2008  11:14 am



When I include covariates, all cases with missing data on just ONE of the covariates are dropped from the analysis. I am using the TYPE = MISSING command. The output says: "Data set contains cases with missing on x-variables. These cases were not included in the analysis. Number of cases with missing on x-variables: 418" Is this supposed to happen? Thanks very much for your help. Nina

nina chien posted on Wednesday, May 07, 2008  3:59 pm



I am doing an LPA with covariates. There is one covariate, poverty status, that I know influences profile membership. But I also want to use poverty status, later down the line, as a predictor to test for interaction effects with profile membership (profile x poverty status) on some outcome variables. My options are: 1) Include poverty status as a covariate in the LPA. Then use it again later as a predictor variable when testing the interaction of profile x poverty status. 2) Do not include poverty status as a covariate in the LPA. Use it later as a predictor variable when testing the interaction. 3) Include poverty status as a covariate in the LPA. I cannot use it as a predictor variable later (i.e., I must drop my research question about the interaction entirely). Which is the correct one (I really hope not 3)? Or is there another option entirely? Thanks very much for your help.


A model is estimated conditioned on the covariates. As a result no distributional assumptions are made about them. If you don't want the observations with missing values on the covariates deleted from the analysis, you must bring these variables into the model and thereby make distributional assumptions about them. You can do this by mentioning their variances in the MODEL command. 


I would include the covariate in the LPA and also when the distal outcome is added. You would regress both the categorical latent variable and the distal outcome on poverty. By allowing the regression to vary across classes, you would capture the interaction you are interested in. 

Kaigang Li posted on Tuesday, May 27, 2008  8:35 pm



Hello Linda, Based on your answer to Nina Chien's question on May 07, could you please clarify how to mention the variances in the MODEL command? Should I compute the variances using other stats package and fix the variances in the MODEL command using @? Thanks, Kaigang 


If the variable is y1, you refer to the variance as: y1; You should not fix the variances. 


I performed a cross-sectional LCA using early adult factors as the indicators. The optimal number of classes was 4. I now want to see how these 4 classes vary on a specific set of adolescent factors. One such variable is high-school GPA. It seems that there are two ways to examine such class differences. One is to include "C on gpa" in the %Overall% model statement. A second way is to include "Auxiliary = gpa(e)" in the Variable section of the code. Each approach appears to provide results that are consistent with the other. The relative risks (change in probability of class membership for an increase of 1 in gpa) calculated from the "C on gpa" approach are consistent with the gpa means provided for each class from the "Auxiliary" approach. My questions are: (1) Do these two approaches provide equivalent results (though in different metrics)? Am I comparing "apples to apples"? (2) If they are not equivalent, in the "C on gpa" method does heterogeneity in gpa contribute to/influence actual class structure/assignment?


1. AUXILIARY (e) and (r) are meant to be used as screening tools. Once the covariates are selected, they should be included in the model. 2. Yes. 


Dr. Muthen, thanks for the quick response. So, say my LCA has 5 "u" variables or indicators. Would a model with the 5 "u" variables in the class-specific model statements (i.e., %C#1%, %C#2%, etc.) and with "C on gpa" yield the same class structure and probabilities as a model with the 5 "u" variables AND GPA in the class-specific model statements? (The obvious difference between the two models is that the former also provides changes in membership probability given changes in levels of GPA.) Here are the two versions in syntax form:

    %Overall%
    c on GPA;
    %C#1%
    [U1$1*1.435]; [U2$1*.045]; [U3$2*.317]; [U4$1*.664]; [U5$1*1.765];
    %C#2%
    [U1$1*1.435]; [U2$1*.045]; [U3$2*.317]; [U4$1*.664]; [U5$1*1.765];

VERSUS

    %Overall%
    %C#1%
    [U1$1*1.435]; [U2$1*.045]; [U3$2*.317]; [U4$1*.664]; [U5$1*1.765]; [GPA];
    %C#2%
    [U1$1*1.435]; [U2$1*.045]; [U3$2*.317]; [U4$1*.664]; [U5$1*1.765]; [GPA];


I think these are equivalent parametrizations where conditioned on a categorical latent variable the u's and x are independent. I think you will obtain the same loglikelihood for each model. 


Hi Dr. Muthen, I posted a couple of days ago (on another thread) about changing reference classes in a 3 class lca model in order to be able to have my desired ORs reported in the output. Your recommendation was to use the ending values as starting values for the classes in a new analysis. This works very well, thank you. The challenge I am having now is when I add my three covariates to the model the classes return to the order of when I am not using the ending values as starting values. It is like the covariates nullify the command to switch the order. Is it possible to reorder the classes with the ending values and to simultaneously add covariates? If so, is there additional syntax I am missing? Thank you for your help. 


The starting values from the model without covariates may not be correct for the model with covariates. There may be direct effects needed between the covariates and the latent class indicators that make the classes change when covariates are added. See the following paper, which is available on the website, for a discussion of this: Muthén, B. (2004). Latent variable analysis: Growth mixture modeling and related techniques for longitudinal data. In D. Kaplan (ed.), Handbook of quantitative methodology for the social sciences (pp. 345-368). Newbury Park, CA: Sage Publications.


Dear, I'm doing LCA (which works great), but now I want to include two covariates (age and sex). This is my syntax:

    ...
    VARIABLE:
    NAMES ARE idnr it1 it2 it3 it4 it5 it6 it7 it8 age sex;
    USEVARIABLES ARE it1 it2 it3 it4 it5 it6 it7 it8 age sex;
    CLASSES = c (2);
    CATEGORICAL = it1 it2 it3 it4 it5 it6 it7 it8 sex;
    ANALYSIS:
    TYPE = MIXTURE;
    ALGORITHM = INTEGRATION;
    MODEL:
    %OVERALL%
    c ON age sex;
    OUTPUT: SAMPSTAT STANDARDIZED MODINDICES;
    SAVEDATA:
    FILE IS prob2class_710mergeAGESEXok.dat;
    SAVE IS CPROB;

Is "c ON age sex;" the right way to use age and sex as covariates? I don't think so, because my fit indices change when I leave "sex" out of the CATEGORICAL list. And I can't put "age" in CATEGORICAL because it is not a categorical variable. So my question is how I can control for age and/or sex. Thank you, Elien


c ON age sex; is correct. The CATEGORICAL list is for dependent variables only. You should not put age and sex on this list. 


Thank you very much for your quick answer! I suspect that I must use "c ON age" if I only want to control for age, and "c ON sex" if I only want to control for sex? Thank you, Elien


Yes. 


Thank you very much!!! 


Dear, to come back to my previous question: is the following correct? Age and sex are still listed as continuous dependent variables:

    Number of dependent variables 10
    Number of independent variables 0
    Number of continuous latent variables 0
    Number of categorical latent variables 1
    Observed dependent variables
      Continuous: AGE SEX
      Binary and ordered categorical (ordinal): IT1 IT2 IT3 IT4 IT5 IT6 IT7 IT8
    Categorical latent variables: C


The example above was LCA with 1 class. The following is with 2 classes, and now age and sex are seen as independent variables. I suspect this is correct?

    Number of groups 1
    Number of observations 477
    Number of dependent variables 8
    Number of independent variables 2
    Number of continuous latent variables 0
    Number of categorical latent variables 1
    Observed dependent variables
      Binary and ordered categorical (ordinal): IT1 IT2 IT3 IT4 IT5 IT6 IT7 IT8
    Observed independent variables: AGE SEX
    Categorical latent variables: C

Thank you!


Please send the two full outputs and your license number to support@statmodel.com. The information provided is not sufficient to answer your question. 


Dear, When I control for age and sex (c ON age sex;), I put age and sex in USEVARIABLES. My question is: When I control for only 1 variable, do I have to put the other one also in USEVARIABLES? E.g. when I only control for sex (which is of course in USEVARIABLES), do I also have to put age in USEVARIABLES? Kind regards, Elien 


Only the analysis variables should be included on the USEVARIABLES list. 


Thank you very much for your answer! 


Hi, I am running an LCA using covariates by regressing class membership on these variables. I would also like to examine class differences on an outcome (i.e., using the auxiliary function), but I also wanted to control for covariates at this step. Is this possible to do? I'm assuming that including the covariates in the creation of the classes is not the same as controlling for them when examining how those classes relate to an outcome. I haven't seen any examples of this and wanted to know if you can help. Thanks! William 


The AUXILIARY (e) option is used for screening purposes, not for model estimation. You should include the distal outcome on the USEVARIABLES list. To control for the covariates, regress the distal outcome on the covariates. The effect of the latent class variable on the distal outcome is seen in how the intercepts of the distal outcome vary across classes.


Thank you for your help! Would you by any chance know of a good LPA/LCA reference that used a distal outcome and covariates in this manner? I can't seem to find any on your website. I am mostly interested in seeing how the results should be presented in general and interpreted. 


See the following paper, which is available on the website: Muthén, B., Brown, C.H., Masyn, K., Jo, B., Khoo, S.T., Yang, C.C., Wang, C.P., Kellam, S., Carlin, J., & Liao, J. (2002). General growth mixture modeling for randomized preventive interventions. Biostatistics, 3, 459-475. See also papers by Hanno Petras.


Thanks! I have another question I was hoping you could help me with. Is there a way to specify my models so that when I run various analyses (i.e., with different covariates, outcomes, etc.) the classes are extracted in the same order, so that I can more easily make comparisons between different models? 


You can give key starting values. 


Hi, I am a bit confused. In this thread Linda said that the "auxiliary" function is to be used for exploratory purposes only. However, I understood from Asparouhov & Bengt Muthen (2012) and the related ppt presentation that "auxiliary" is meant to be used for substantive purposes. Could someone please clarify? Thank you, Davide Morselli


These are the new functions of AUXILIARY that will be in Version 7 not the current functions. 


Thank you, Linda. So I cannot use the (r) specification for hypothesis testing? Even if the entropy is quite high (> .8)? Davide


The (r) specification is an ad hoc approach. You should use the new 3step approach for hypothesis testing. 


Thank you again, Linda. Given that Mplus 7 has not been released yet, how can I do it? The Muthen and Clark paper showed only (e) and (r). Davide


I mean: can I simply use the most likely class as a nominal DV, or should I weight it by its probability relative to the probabilities of the other classes? In that case, is there a smart way to do it with Mplus (other than the R3STEP option)?


You should wait for R3STEP or do it by hand according to the instructions given in the handout where R2STEP is described. 

anonymous posted on Tuesday, March 12, 2013  9:10 am



I'm running an LPA with three covariates. The covariates appear not to differ significantly across classes, but when I enter the covariates in the model using the 1-step method and look at a plot of the estimated probability as a function of covariate #1, it looks like this differs by gender, such that covariate #1 appears to differ across classes for females but not for males. Is there a way to empirically demonstrate whether two covariates interact in predicting the classes? Would it be appropriate to add an interaction term as a covariate, or to replace gender and covariate #1 with the interaction term? Or would it be more appropriate to run this as a multigroup LPA, with males and females in separate groups?


Just create a product term between gender and covariate #1 and include all 3 variables in the prediction of c. 
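
As a small illustration of the product term suggested above, the sketch below adds an interaction column to a toy data set before it would be passed on for analysis. The variable names (`gender`, `cov1`, `gen_x_c1`) and the data values are hypothetical.

```python
def add_product_term(rows, a, b, name):
    """Return copies of the rows with a new column holding rows[a] * rows[b]."""
    return [{**row, name: row[a] * row[b]} for row in rows]

# Toy data: gender coded 0/1 and a continuous covariate (values invented).
data = [
    {"gender": 0, "cov1": 1.5},
    {"gender": 1, "cov1": 1.5},
    {"gender": 1, "cov1": -0.4},
]

with_product = add_product_term(data, "gender", "cov1", "gen_x_c1")
for row in with_product:
    print(row)
```

All three variables (gender, covariate #1, and their product) would then serve as predictors of c, so the main effects stay in the model alongside the interaction.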

ian jantz posted on Thursday, May 09, 2013  7:32 am



Hi, I selected a 4-class model based on 10 indicator variables. When I introduced covariates, it seems that a substantial number of individuals were reclassified. As a result, class prevalences in the model without covariates differed from those in the model with covariates. Is there a resource that provides some guidance for conducting measurement noninvariance investigations of indicator variables? The first posts in this thread (October 25 through November 4, 2004, from J.W. and professors Muthen) were very helpful. But I have some basic questions about diagnosing measurement noninvariance, what counts as substantial reclassification, the steps to isolate the indicator variables and covariates responsible for the noninvariance, and some potential solutions for addressing the issue. Any guidance is much appreciated.


Have you read the following paper, which is available on the website? Muthén, B. (2004). Latent variable analysis: Growth mixture modeling and related techniques for longitudinal data. In D. Kaplan (ed.), Handbook of quantitative methodology for the social sciences (pp. 345-368). Newbury Park, CA: Sage Publications.

ian jantz posted on Friday, May 10, 2013  1:50 pm



Hi Dr. Muthen, I read Muthen (2004) and it was very useful. I guess one basic question I have is what counts as substantial reclassification. When I introduce one combination of covariates, class prevalence changes very little, say 2 or 3%. However, when I introduce all covariates of interest, prevalence for one class changes by about 12%. Thanks so much.


This is really a substantive decision. Besides looking at the percent changes in the classes, you should look at the individual changes in posterior probabilities. 
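
One way to look at individual changes in posterior probabilities, sketched under assumptions: the posterior probabilities from the two runs (without and with covariates) are available per person, persons are in the same order, and class labels have already been aligned between runs. The numbers are invented.

```python
def most_likely_class(probs):
    """1-based index of the class with the largest posterior probability."""
    return max(range(len(probs)), key=probs.__getitem__) + 1

def compare_posteriors(before, after):
    """Per-person largest absolute shift and the number of people reclassified.

    before/after: lists of per-person posterior probability lists, with
    persons and classes in the same order in both runs (an assumption).
    """
    max_shift = [max(abs(b - a) for b, a in zip(pb, pa))
                 for pb, pa in zip(before, after)]
    moved = sum(most_likely_class(pb) != most_likely_class(pa)
                for pb, pa in zip(before, after))
    return max_shift, moved

# Hypothetical posterior probabilities for three people in a 2-class model.
without_cov = [[0.90, 0.10], [0.60, 0.40], [0.20, 0.80]]
with_cov    = [[0.85, 0.15], [0.45, 0.55], [0.25, 0.75]]

shifts, n_moved = compare_posteriors(without_cov, with_cov)
print("largest shift per person:", [round(s, 2) for s in shifts])
print("people whose most likely class changed:", n_moved)
```

Looking at the shifts person by person can show whether a change in class prevalence comes from a few borderline cases or from broad movement across the sample.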

anonymous posted on Sunday, February 23, 2014  8:33 pm



In response to the following user question "ONE OR MORE MULTINOMIAL LOGIT PARAMETERS WERE FIXED TO AVOID SINGULARITY OF THE INFORMATION MATRIX. SINGULARITY IS MOST LIKELY BECAUSE THE MODEL IS NOT IDENTIFIED, OR BECAUSE OF EMPTY CELLS IN THE JOINT DISTRIBUTION OF CATEGORICAL LATENT VARIABLES AND ANY INDEPENDENT VARIABLES. FOLLOWING PARAMETERS WERE FIXED: 90 93 .... Am I simply getting this warning to alert me that there are no BLK in Class 4 nor BLK/HISP in Class 5? If so, is this a warning I can ignore?" Bengt responded "yes and yes" on December 14, 2007 (see above). I have a follow-up question. I am in a similar situation to that user: I have one class with no males (gender was entered as a covariate). However, this makes sense based on the characteristics of the class, so I haven't been concerned. Is there a way to still calculate and report inferential statistics for gender, though? Thanks!


Please send the output and your license number to support@statmodel.com. It depends on which parameters are fixed and why. I would need to see the output to be sure. If you have no variability of a covariate in a class, you cannot estimate the regression coefficient.


Good afternoon. I don't think I have received a response to my question. 1) What is the difference between active covariates and inactive covariates? 2) Is it possible to use active covariates, then estimate posterior membership and assess the ORs between the classes (the dependent variable) and the covariates (the X variables, the same active covariates that contribute to defining the classes)? Or is this an error, and is it better to use the posterior probabilities from the model without active covariates? 3) When is it better to use active or inactive covariates? Thanks very much, E


I don't know what active and inactive covariates are.


Dear Prof. Muthen, I have seen active and inactive covariates used in LGold. When covariates are inactive, they do not affect the sizes or estimates of the classes, as they are not included in the model; however, when they are active, the estimates for the indicators may differ. So I did not know whether covariates should be used as active or inactive. I have this problem because I think that my covariates are associated with the indicators, which could explain why the definition of the classes differs. Excuse me for my bad English. Best regards, Emilie

Jon Heron posted on Wednesday, February 26, 2014  9:22 am



I've always felt that LG's inactive-covariates approach was the same as Mplus' pseudo-class draws (auxiliary with "(r)"). You may not get perfect agreement across the packages, though, because LG does something to deal with covariate missingness.


Adding to Jon's comments, it sounds to me like the inactive covariates would be best handled by the Mplus 3-step method done by the Auxiliary R3STEP option. But if you suspect direct effects from some covariates to some latent class indicators, you would want to take a regular (1-step) approach where all covariates are "active".

anonymous posted on Friday, February 28, 2014  12:01 pm



As a follow-up to my February 23rd question above, if a parameter is fixed in, say, a four-class model but was not in the three-class model, can likelihood ratio tests such as the Lo-Mendell-Rubin test (TECH11) still be interpreted? 


No. 


Dear Prof. Muthen, thanks for your previous answer. I think the covariates are associated with the indicators and have direct effects on the estimates as well as on the number of classes. However, can I take the classification obtained after including the covariates and then use posterior membership (as in the 3-step approach) to assess associations (ORs) between the classes and other variables such as mortality, hospitalisation, or other characteristics? Or is that an error, and must we use only the classification obtained without the covariates? 


You can use a model where covariates have both an effect on the latent class variable and some (but not all) of the latent class indicators. If you have strong direct effects, 3-step methods are not suitable; we describe this in our Oct 28 3-step paper on our website. 

MJKim posted on Friday, March 07, 2014  11:09 am



Dear Drs. Muthen, I have a question about running a regression mixture model, which is presented as Example 7.1. In the example, although the correlation between X1 and X2 is included in the figure, the "X1 with X2" statement is not included in the Model statement. When I included the statement, I didn't get results because the program told me that I had to add ALGORITHM=INTEGRATION;. After adding it to the Analysis command, I could run the model, but the results were different from what I got using the original statement (as shown in the example). 1. Which one should be used if I want to know the relationship between X1 and X2? 2. If I need to add "X1 with X2", does the Model statement in the example not correspond to the presented figure? Thank you so much for your help. 


In regression, the model is estimated conditioned on the observed exogenous variables. Their means, variances, and covariances are not model parameters. We show the covariance between x1 and x2 because it is not zero during model estimation. If you want to know its value, ask for SAMPSTAT or TYPE=BASIC. 


Dear Prof. Muthen, I have a problem in an LCA of health profiles in patients with a specific disease, with concomitant variables. 1/ In the model without covariates, BIC and BLRT both indicate the 3-class solution. 2/ Including concomitant variables (age, sex, tumor site), BIC says 3 classes is better but BLRT favors 4 classes (which also appears more realistic). Moreover, the prevalence of the classes is different with and without covariates. I have noted that the concomitant variables were strongly associated with the indicators (all p<0.001). So which is better, and how should I deal with this problem: keep the 4-class solution with covariates, keep the 3-class solution with covariates, keep the 3-class solution without covariates, or turn the covariates into indicators (age, sex, and tumor site as indicators)? Moreover, if I want to use the 3-step method to assess associations between the classes and other variables (not included in the model), such as healthcare utilization, how should I do it? Should I use the classification obtained with covariates or without covariates? Thanks very much for any answers you could provide to these questions. Best regards, Emilie 


I would go with BIC for simplicity. It is not necessarily a problem that the concomitant variables (I call them covariates) are strongly related to the latent class indicators; if the covariates influence the latent class variable, that implies that the two sets of variables are related. The easy way out to handle covariates is to use the new 3-step approach of R3STEP described in our Web Note 15 (or DCAT/DCON for distal outcomes). But if you really want to understand the class prevalence differences you mention, you would want to explore direct effects from covariates to latent class indicators. 


Thank you, Prof. Muthen, but I did not understand the last sentence. If I want to assess ORs between the classes and other variables (different from the covariates, e.g., institutionalisation), should I use the classification obtained with covariates (i.e., age, sex) or the classification obtained without covariates? Thank you. 


Excuse me, Prof. Muthen, how can one establish whether there is a relationship between a covariate and the latent class variable? With the p-value between the covariate and the LC? Or because adding the covariates changes the estimates (but at what point can we assert this)? 


The answers are in Web Note 15 on our website: Asparouhov & Muthén (2013). Auxiliary Variables in Mixture Modeling: 3-Step Approaches Using Mplus. Accepted for publication in Structural Equation Modeling. An earlier version of this paper is posted as web note 15. Appendices with Mplus scripts are available here. 


Good evening, I have read this article, but (because of my difficulties with English) it is not clear to me how to assess whether covariates have strong or no effects on the LC, or how to assess the relationship between the LC and the covariates (p-value? something else?). This is important because I have read that in the case of strong effects, the one-step approach is better than the 3-step approach, as you also answered in a previous question. Furthermore, it matters for my research because without covariates BLRT and AIC3 both point to 3 classes, whereas with active covariates BLRT and AIC3 both point to 4 classes, so the interpretation is not the same. (BIC always points to 3 classes, but Nylund showed that BLRT performed best.) Thank you very much. Best regards, Emilie (from France) 


To check for direct effects from covariates to latent class indicators, you include in your model a regression of one indicator at a time on all the covariates and check for significant effects. You then include all such significant effects when you assess BIC for that number of classes. 

CB posted on Monday, October 20, 2014  1:34 pm



I'm conducting LCA with 4 indicator variables, but I want to include a binary exogenous variable that has a direct effect only on one indicator variable (with 3 categories). I have tried coding it as indicator (U) on exogenous variable (X): U on X, but the output reports the error that a nominal variable may not appear on the right-hand side of an ON statement. (Currently, I have my binary exogenous/independent variable declared as a nominal variable.) My questions are: 1) Why can a nominal variable not appear on the right-hand side of an ON statement? 2) How else can I code the exogenous variable to have an effect on my indicator? 


Covariates must be binary or continuous. You need to create a set of dummy variables for your nominal variable. It should not be put on the NOMINAL list; that list is for dependent variables. In regression, covariates are treated as continuous variables. 
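As an illustration of the dummy-variable advice above, here is a minimal sketch of K-1 dummy coding done outside Mplus before the data file is written. The function name `dummy_code`, the variable `site`, and its categories are hypothetical and chosen only for the example.

```python
def dummy_code(values, reference):
    """Return K-1 dummy (0/1) columns for a nominal variable with K categories.

    The reference category gets no column; it is the category for which
    every dummy equals 0.
    """
    categories = sorted(set(values))
    if reference not in categories:
        raise ValueError("reference category not found in data")
    # One 0/1 column per non-reference category.
    return {
        f"d_{cat}": [1 if v == cat else 0 for v in values]
        for cat in categories
        if cat != reference
    }


# Hypothetical nominal covariate with K = 3 categories.
site = ["colon", "lung", "breast", "lung", "colon"]
dummies = dummy_code(site, reference="breast")
for name, column in dummies.items():
    print(name, column)
```

The resulting 0/1 columns can then be used on the right-hand side of an ON statement like any other binary covariate. A covariate that is already binary needs only a single 0/1 column.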

CB posted on Monday, October 20, 2014  7:34 pm



Thanks for your quick response! I have an additional follow-up question. I've read that CATEGORICAL includes binary and polytomous indicators, but the Mplus User's Guide says that CATEGORICAL includes binary and ordered categorical indicators and NOMINAL includes polytomous indicators. So I'm confused about how polytomous indicators are coded and how both binary and polytomous indicators can be coded to conduct LCA. With that said, I'm conducting LCA with 4 indicators: 2 binary (X1 and X2) and 2 categorical indicators with 3 levels each (X3 and X4). What would the appropriate code be for this? 


CATEGORICAL is for binary and ordered categorical (ordered polytomous) variables. NOMINAL is for binary or unordered polytomous variables. 

Daniel Lee posted on Friday, April 24, 2015  8:46 pm



Hi Dr. Muthen, I obtained odds ratios in my conditional growth mixture model (looking at between-class effects). However, I could not find significance tests (p-values, confidence intervals) for these odds ratios. Is there a command I should type in the input to obtain these significance tests? Thank you so much, as always! 


If they are not printed, you have to compute these. See the 2 FAQs on our website: "Odds ratio confidence interval from logOR estimate and SE" and "Odds ratio interpretation with a nominal DV in multinomial logistic regression". 
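The hand computation referred to above can be sketched as follows, assuming the usual Wald interval: the odds ratio is the exponentiated logit coefficient, and the interval is formed on the log-odds scale and then exponentiated. The coefficient 0.5 and standard error 0.2 are made-up values, not taken from any output in this thread.

```python
import math


def odds_ratio_ci(log_or, se, z=1.96):
    """Odds ratio and Wald confidence interval from a logit coefficient and its SE.

    z = 1.96 gives an approximate 95% interval.
    """
    odds_ratio = math.exp(log_or)
    lower = math.exp(log_or - z * se)
    upper = math.exp(log_or + z * se)
    return odds_ratio, lower, upper


# Hypothetical estimate and standard error of a "c#1 ON x" coefficient.
or_, lo, hi = odds_ratio_ci(0.5, 0.2)
print(f"OR = {or_:.3f}, 95% CI = ({lo:.3f}, {hi:.3f})")
```

If the interval excludes 1, the odds ratio differs significantly from 1 at roughly the 5% level, which matches the z-test of the logit coefficient against 0.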

Brian C posted on Friday, January 29, 2016  2:27 pm



Hi Drs. Muthen, In published papers I have read on LCA, the researchers usually identify the "best"-fitting number of classes at the model-building stage by starting with 1 class and working up with just the indicators (without covariates), comparing the BIC, etc. Then the class prevalences and posterior probabilities are presented and interpreted. After the best class solution is identified (e.g., 3-class), I would see an analysis of the association between the classes and covariates (e.g., logistic/multinomial logistic regression), which I understand can be done via the auxiliary option or in a single-step approach. But what is unclear to me is that when the researchers present the association analysis, they don't address the fact that the 3 classes are not necessarily the same 3 classes identified in the model-building stage (when no covariates were considered). Can I assume that this is because they used the auxiliary option (but which one, R?) such that the analysis of the association between classes and covariates does NOT change the unconditional 3-class model? Conversely, if the single-step approach is used, does that mean the covariates need to be included at the model-building stage (i.e., from 1 class up)? I haven't seen this done, as all the LCA papers I have seen simply start with the indicators and venture into covariates only after the best solution is identified. Thanks! 


That's a long story. Part of it is described in the paper on our website: Asparouhov, T. & Muthén, B. (2014). Auxiliary variables in mixture modeling: Three-step approaches using Mplus. Structural Equation Modeling: A Multidisciplinary Journal, 21:3, 329-341. The posted version corrects several typos in the published version. An earlier version of this paper was posted as web note 15. Download appendices with Mplus scripts. You are right that a 3-step method (such as using R3STEP) can overlook matters that change the class formation when including covariates in the model (one-step). This includes cases where there are some direct effects of covariates on some of the LCA indicators, which is a case of measurement non-invariance that would seem to be quite common. Unfortunately, this is often overlooked. 

Brian C posted on Sunday, January 31, 2016  12:08 pm



Thanks, Dr. Muthen. 


I am running a series of mixture models in several steps to build a GMM of social skills over 5 time points, with the following recommended steps: a) testing an unconditional single-class model; b) specifying an LCGA model comparing 2-, 3-, and 4-class solutions (comparing BIC, entropy, theoretical considerations, etc.); c) testing a conditional LCGM with the best-fitting model from step b), adding our hypothesized covariates; d) specifying a conditional growth mixture model (GMM), entering the covariates; e) testing whether the covariates have different effects on the growth factors within each class; and f) finally, testing whether parameter estimates are replicated using OPTSEED. Q1: How do I determine whether the model in e) is better or worse than the one in d)? Do I look at the significance of the class-specific estimates of the covariates? Or the BIC, entropy, etc.? Or all of these combined? In the modification indices, I get (among many other modification suggestions):
Class 1
Means/Intercepts/Thresholds
[ INSSRS4 ] 30.381 4.866 4.866 0.351
[ INSSRS5 ] 23.759 5.291 5.291 0.390
[ INSSRS6 ] 112.658 19.717 19.717 1.432
Q2: Should I change my input to allow for these modification suggestions in my model, and if so, how? 


Hi again Drs. Muthen, to follow up on my previous questions: I have found that a 3-class solution works best according to the various indices. In the output under MODEL RESULTS, I get, for Class 1, that I ON x is 0.96 (p<0.01) and S ON x is 0.08 (p = 0.6). Q3: Does this mean that the greater the starting-out score for Class 1, the lower the score on x? And that the greater the score on x, the more upward the slope goes (almost sig.)? Q4: When I look at the plots for my 3 classes, they have clearly different trajectories: one starting high and going down, one starting in the middle and staying stable, and one starting low and going up. When I look at the intercepts for the slope factors for the 3 classes, none of them reach significance. For class 2 that makes sense, but not for the other two. I think the 'problem' is that all the SEs are big. Q5: Does that indicate that the individuals in each of these classes are too dissimilar in their slopes? And how can I remedy this situation? Thank you! 


Please don't double post. Q1. Use BIC. Q2. No, we have found that modification indices are not quite reliable for mixture models. Q3. First question: No, you have the causality reversed. Second question: Yes. Q4. This significance is not needed. Q5. This is not a problem. 

Ali posted on Monday, February 15, 2016  4:53 am



I am using an LCA model with 4 nominal indicators, including covariates (country and gender). In the output, I want to see whether country or gender significantly predicts latent class membership. So I took a look at:
LOGISTIC REGRESSION ODDS RATIO RESULTS
Categorical Latent Variables
C#1 ON
CNT 0.976
GENDER 0.295
C#2 ON
CNT 1.172
GENDER 0.316
But there seems to be no significance test to tell whether the covariates are significant. Or should I look at another part of the output? 


Please send the output and your license number to support@statmodel.com. 


Hello, If I have two covariates in my LCA model, one categorical and the other continuous, do I need to standardize the continuous covariate if I want to calculate the class probabilities by hand or with Model Constraint (e.g., logit = intercept + b1(covariate1) + b2(covariate2); exp(logit)/sum)? Thank you! 


I'd like to investigate whether a covariate predicts latent class prevalence by gender, but not latent class structure. I included gender as a grouping variable and fixed item response probabilities across gender. Next, I: 1) compared a model where class prevalences were freely estimated across gender to a model where they were fixed across gender. Both models had the covariate predicting class (c on covariate). Does this model comparison accurately test whether the covariate is a better predictor when class prevalences are free vs. fixed across gender? 2) I compared the best-fitting model from step 1 to a model without the covariate to see if the covariate was a significant predictor of class membership. Do these comparisons answer my question of whether my covariate predicts class membership by gender? 


First post: No. Second post: I am not quite sure what you are doing. You say "fixed" but I wonder if you really fix or just hold equal. The class prevalence is different across gender if you regress your latent class variable (c) on gender (cg). The item probabilities are different across gender if you let them vary across cg classes. 
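For the hand calculation asked about in the first post, a minimal sketch of the multinomial-logit arithmetic is given below. It assumes the last class is the reference class with its logit fixed at 0, and it uses made-up intercepts and slopes. The covariates enter on the same scale on which the coefficients were estimated, so no standardization is involved. The function name `class_probabilities` is hypothetical.

```python
import math


def class_probabilities(intercepts, slopes, x):
    """Class probabilities from multinomial-logit coefficients.

    intercepts: one intercept per non-reference class
    slopes:     one list of slopes (one per covariate) per non-reference class
    x:          covariate values, on the scale used when the model was estimated
    The last class is the reference class, with its logit fixed at 0.
    """
    logits = [
        a + sum(b * xi for b, xi in zip(bs, x))
        for a, bs in zip(intercepts, slopes)
    ]
    logits.append(0.0)  # reference class
    denom = sum(math.exp(l) for l in logits)
    return [math.exp(l) / denom for l in logits]


# Hypothetical 3-class model with a binary covariate (1) and a continuous one (45).
probs = class_probabilities(
    intercepts=[0.5, -0.2],
    slopes=[[0.8, 0.03], [-0.4, 0.01]],
    x=[1, 45],
)
print([round(p, 3) for p in probs])
```

The probabilities sum to 1 by construction. Standardizing the continuous covariate would only be appropriate if the coefficients themselves came from a model fitted to the standardized variable.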

Jin Qu posted on Wednesday, August 10, 2016  11:52 am



Dr. Muthen, I am conducting an LPA and have obtained 4 classes. I then included an interaction term (AxB, both continuous variables) as a predictor of class membership. The results showed that this interaction term significantly predicts the likelihood of being in class 2 compared with being in class 1. How can I probe this interaction further? Can I do it in Mplus? 


Look at our page http://www.statmodel.com/Mediation.shtml under the heading "Moderated mediation plot based on User's Guide ex 3.18". This shows that you can use the LOOP and PLOT options to plot the interaction and its confidence intervals. Although this is for linear regression, the same principles apply for a nominal DV. This plotting is further described in our new book: http://www.statmodel.com/Mplus_Book.shtml 

Jin Qu posted on Wednesday, August 10, 2016  7:24 pm



Dr. Muthen, Thanks for your quick reply. I am a bit confused by the model described at this link: http://www.statmodel.com/Mediation.shtml. From my understanding, example 3.18 describes a model in which m mediates the link between x and y, and z moderates the link between x and m (y on m; m on x z xz). The model in this Addendum is: y on m xz z; m on z xz x; Is this still the same model as 3.18? Also, am I understanding correctly that the analysis I want to pursue is a multinomial regression, so I should get the membership variable (1,2,3,4) out of the LPA and use this new variable as my dependent variable, rather than running the LPA and probing the interaction at the same time? 


Your model is not the same as 3.18 but the ideas for exploring interactions are the same. You can probe interactions in a single analysis. 

Jin Qu posted on Friday, August 12, 2016  4:10 pm



Dr. Muthen, Thanks for your reply. I am able to obtain the plot for the interaction. However, when I added the BOOTSTRAP option to see whether the slopes in the interaction graph are significant, I received an error message that says "BOOTSTRAP is not available for estimators MLM, MLMV, MLF and MLR." Would you mind taking a look at my code? (c is the variable that I obtained from LPA using "cprobabilities". c has 4 classes. I want to use mRS, mSC and mRSxmSC to predict c.)
nominal is c;
define: center mRS_6 mSC_6 (grand mean);
mSCpRS = mRS_6*mSC_6;
analysis: bootstrap=500;
Model: c#1 on mRS_6 (b1) mSC_6 (b2) mSCpRS (b3);
c#2-c#3 on mRS_6 mSC_6 mSCpRS;
MODEL CONSTRAINT:
PLOT(lowSC highSC);
LOOP(mRS_6,-2,2,0.5);
lowSC = (b1+b3*(-2.26))*mRS_6+b2*(-2.26);
highSC = (b1+b3*(2.26))*mRS_6+b2*(2.26);
Plot: TYPE = PLOT2;
output: CINTERVAL(Bootstrap);
I assume that in this input, I am testing the significance (confidence intervals) of the interaction in predicting class 1 vs. class 4? 


You should use ML with bootstrap. The other estimators have their own particular standard errors. ML with bootstrap will give ML parameter estimates and bootstrap standard errors. The reference class is 4. The confidence intervals are for each parameter estimate. 
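To make the LOOP/PLOT idea concrete, here is a minimal sketch of the arithmetic behind the two plotted lines: the covariate part of the logit for class 1 versus the reference class, evaluated at a low and a high value of the moderator. The coefficient values are made up, and the moderator values -2.26 and 2.26 and the grid from -2 to 2 in steps of 0.5 are assumptions for illustration. The intercept is left out here, as it is in the constraint discussed above.

```python
def simple_slope_line(b1, b2, b3, moderator, x_values):
    """Covariate part of the class-1 logit along x at a fixed moderator value.

    logit part = (b1 + b3 * moderator) * x + b2 * moderator
    b1: slope of x, b2: slope of the moderator, b3: slope of their product.
    """
    return [(b1 + b3 * moderator) * x + b2 * moderator for x in x_values]


# Grid of predictor values: -2 to 2 in steps of 0.5 (9 points).
x_grid = [-2 + 0.5 * i for i in range(9)]

# Hypothetical coefficients for "c#1 ON x z xz".
b1, b2, b3 = 0.4, -0.3, 0.2
low = simple_slope_line(b1, b2, b3, -2.26, x_grid)   # moderator low
high = simple_slope_line(b1, b2, b3, 2.26, x_grid)   # moderator high

for x, lo_val, hi_val in zip(x_grid, low, high):
    print(f"x = {x:5.2f}   low = {lo_val:7.3f}   high = {hi_val:7.3f}")
```

The slope of each line is b1 + b3 times the moderator value, so the two lines differ in slope whenever b3 is nonzero. Exponentiating a difference in these logits gives the corresponding odds ratio relative to the reference class.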


Dear Linda and Bengt, I have a few basic questions about LCA: 1. What is the difference between regressing a latent class variable c on an independent variable x1 (as in example 7.1 in the UG), and treating x1 as a latent class predictor (using AUXILIARY and R3STEP)? 2. What is the difference between treating a binary variable g (indicating sex for instance) as a categorical latent variable which has known class (group) membership, using KNOWNCLASS (as in example 7.21 in the UG) and treating it as a latent class predictor (using AUXILIARY and R3STEP)? 3. Must COUNT variables have only positive integer values? 4. What criteria or cut-off should one use to decide whether a COUNT variable should be treated as zero-inflated? 5. What criteria should one use to decide whether a variable should be treated as truncated? (e.g., should a percentage be treated as truncated?) 6. Can the differences in parameter estimates (means, probabilities) across classes be tested to see if the difference is statistically significant? Thanks, 'Alim 


Mplus Discussion is not really the place to learn about basic LCA but I will give some quick answers: 1. No difference if you have only 1 x. But if you have more x's, the difference is that c on x does not assume that the x's are uncorrelated. But as indicators, LCA assumes that the x's are uncorrelated within class. 2. None unless the predictor also has a direct effect on the LCA indicators. 3. Positive or zero. 4. BIC. 5. Usually by having a strong floor or ceiling effect, say 25% or more. 6. Yes, for instance using Model Constraint. 


Thank you, Bengt. I have rerun my LCA, recasting 4 variables that had a floor effect of 25% or more as censored from below. This has led to a lower BIC, AIC, and aBIC. I also tried declaring all 4 censored variables as inflated. For the 2-class solution this also produces lower indices, but for the 3-class solution they are almost identical. In the 3-class solution, 2 means for the inflated variables were set at 15, one is not significant, and one is significant (this is the same within all 3 classes). Can I conclude from this whether it is better to treat all 4 (or some) of these censored variables as inflated or not in the 3-class solution? Thanks! 


Hard to say. I would probably not use inflation here given your results. 


Dear Bengt and Linda, I have an ordinal variable (a measure of organization size) which I wish to include as a control variable in my LCA, as one-way ANOVAs have shown that there are significant differences in most of the indicators across the levels of this variable. 1) Is the correct approach to simply include it as another indicator or should it be declared as auxiliary? 2) When including it as an indicator, I have tried both the CATEGORICAL and NOMINAL types (in case the effect of size is not monotonic). I planned to use BIC to decide which type to use, but both produced the same indices (BIC, AIC, etc.). Is this always the case? Or does this mean that in this particular case the results are unaffected by which type I use for this variable? Thanks and happy Thanksgiving! 

Guiyun Hou posted on Friday, November 25, 2016  3:19 am



Dear Bengt and Linda, I have some problems in analysing an LTA. I have 3 time points and 20 items per time point; all of the items are continuous. When I run Mplus, it does not output the BIC, and it gives me the warning shown below. Following the warning, I set STARTS to 500 50, but it still does not work. When I reduce the number of items to 16, however, values for AIC and BIC do appear. Looking forward to your reply. Syntax:
TITLE: LTA Model
DATA: FILE is 99.dat;
VARIABLE: NAMES ARE a1-a20 b1-b20 c1-c20;
CATEGORICAL = ;
CLASSES = L1(3) L2(3) L3(3);
USEVAR = a1-a20 b1-b20 c1-c20;
ANALYSIS: TYPE=MIXTURE;
MODEL:
MODEL L1:
%L1#1% [a1-a20] (1-20);
%L1#2% [a1-a20] (21-40);
%L1#3% [a1-a20] (41-60);
MODEL L2:
%L2#1% [b1-b20] (1-20);
%L2#2% [b1-b20] (21-40);
%L2#3% [b1-b20] (41-60);
MODEL L3:
%L3#1% [c1-c20] (1-20);
%L3#2% [c1-c20] (21-40);
%L3#3% [c1-c20] (41-60);
OUTPUT: TECH1 TECH8;
SAVEDATA: FILE is 99out.dat; SAVE = cprob;
WARNING: THE BEST LOGLIKELIHOOD VALUE WAS NOT REPLICATED. THE SOLUTION MAY NOT BE TRUSTWORTHY DUE TO LOCAL MAXIMA. INCREASE THE NUMBER OF RANDOM STARTS. STARTING VALUES. PROBLEM INVOLVING PARAMETER 77. 


Hard to say without seeing the full output; you can send it to Support along with your license number. I assume you have done an LCA for each time point and had no problems with the 20 items. 


Answer to Alim: 1) I see a control variable as a covariate that influences the latent class variable (and perhaps some indicators directly). 2) Categorical and Nominal typically produce different numbers of parameters (unless the outcome is binary), in which case the BICs would not be the same. I can't say what's going on in your case without seeing the full output. 

Guiyun Hou posted on Friday, November 25, 2016  7:13 pm



Dear Bengt and Linda, my output is as follows, thanks for your help.
WARNING: WHEN ESTIMATING A MODEL WITH MORE THAN TWO CLASSES, IT MAY BE NECESSARY TO INCREASE THE NUMBER OF RANDOM STARTS USING THE STARTS OPTION TO AVOID LOCAL MAXIMA.
WARNING: THE BEST LOGLIKELIHOOD VALUE WAS NOT REPLICATED. THE SOLUTION MAY NOT BE TRUSTWORTHY DUE TO LOCAL MAXIMA. INCREASE THE NUMBER OF RANDOM STARTS.
THE MODEL ESTIMATION DID NOT TERMINATE NORMALLY DUE TO AN ILL-CONDITIONED FISHER INFORMATION MATRIX. CHANGE YOUR MODEL AND/OR STARTING VALUES.
THE MODEL ESTIMATION DID NOT TERMINATE NORMALLY DUE TO A NON-POSITIVE DEFINITE FISHER INFORMATION MATRIX. THIS MAY BE DUE TO THE STARTING VALUES BUT MAY ALSO BE AN INDICATION OF MODEL NONIDENTIFICATION. THE CONDITION NUMBER IS 0.260D-20.
THE STANDARD ERRORS OF THE MODEL PARAMETER ESTIMATES COULD NOT BE COMPUTED. THIS IS OFTEN DUE TO THE STARTING VALUES BUT MAY ALSO BE AN INDICATION OF MODEL NONIDENTIFICATION. CHANGE YOUR MODEL AND/OR STARTING VALUES. PROBLEM INVOLVING PARAMETER 77. 


Please send the output and your license number to support@statmodel.com. 


Dear Bengt, Thanks for your response. I believe I now understand how to proceed, but I want to double-check that I have understood correctly based on your responses in this forum and various articles on LCA. 1. Because I suspect the covariates have a direct effect on some indicators, I should use single-step regression (one-step method). 2. However, for class enumeration I should first only use the indicators. Once I have determined the number of classes, I should include covariates via single-step regression. Not only is this common practice, it is the best approach according to Nylund-Gibson & Masyn's (2016) recent MC simulation study. 3. Covariates must be binary or continuous. Whether I want to treat the categorical covariate 'size' as ordinal or nominal, I should create K-1 dummies for it. 4. Because I suspect a direct effect between my covariates and indicators, I should include in my model "a regression of one indicator at a time on all the covariates and check for significant effects" (you wrote this above). The final model should only retain significant effects. Have I gotten it right? Thanks for your help. 


Correct on all 5; you get an A. 
