

I have a question about the item labeled "LATENT CLASS REGRESSION MODEL PART" in mixture model output. Suppose one has a latent profile analysis with four standardized continuous variables and the categorical latent variable has two levels. In this case, how does one interpret the LCRMP coefficient and what does it mean if est/SE is greater than abs(2)? 


The latent class regression model part refers to the regression of the latent class variable on covariates, that is intercepts and slopes. If there are no covariates, which seems to be the case in your application, the coefficients given under this heading are just the intercepts. In this context, intercepts are logit coefficients determining the probabilities of the classes. In this case, the est/se ratio is not of much interest since the ratio tests against a zero logit, which translates to a probability of 0.5. So, these ratios can be ignored when there are no covariates. 
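To make the point about a zero logit concrete, here is a minimal sketch in Python (not Mplus input; it only does the arithmetic). It assumes the usual multinomial-logit parameterization in which the last class is the reference class with its logit fixed at 0. The function name and the 0.8 intercept are illustrative and are not taken from any output.

```python
import math

def class_probabilities(intercepts):
    """Turn logit intercepts for classes 1..K-1 into K class probabilities.

    Assumes the last class is the reference class with its logit fixed at 0.
    """
    logits = list(intercepts) + [0.0]
    denom = sum(math.exp(v) for v in logits)
    return [math.exp(v) / denom for v in logits]

# Two classes, intercept 0: both classes get probability 0.5,
# which is the null value the est/SE ratio tests against.
equal_split = class_probabilities([0.0])

# Hypothetical intercept of 0.8 for class 1: class 1 is the larger class.
unequal_split = class_probabilities([0.8])
print(equal_split, unequal_split)
```

Under these assumptions, a "significant" intercept only says that the class split differs from 50/50, which is rarely a hypothesis of interest.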


I have 4 years of panel data and two variables of interest. The first variable, P, is an ordinal level variable with 3 categories. The second variable, A, is continuous. What I am trying to estimate is the change over the four years in the predicted probabilities for each category of P for individual i, given a change in A. Given that there are 9 possible paths to get from year 1 to year 4 (excluding intermediate points) for variable P, is there a way to estimate a path given a change in A? In SAS, there is a procedure PROC TRAJ that handles this, and I was told that there may be a way to do it with MPLUS. Thank you. 


Sorry for the delay in answering this; I was out of town. You mention TRAJ, which concerns latent classes of development, i.e. mixture modeling. The mixture modeling part of Mplus currently does not allow polytomous outcomes or trends in development over time in categorical outcomes. Your message, however, does not describe theories of latent classes of development. If instead the chief concern is describing longitudinal development of the polytomous outcome over time as a function of the continuous predictor variable, you can use the regular, non-mixture part of Mplus for the analysis. You can use either autoregressive models or growth models, the latter having growth factors, i.e. continuous latent variables, that influence the outcome. The continuous predictor variable can influence either the growth factors or the outcome directly. Hope this helps.


I have a model with three latent classes (C#1, C#2, C#3). Under the model statement, I included the command "C#1 on BLK" where BLK is a binary race variable. In the output under LATENT CLASS REGRESSION MODEL PART, there is the slope for C#1 ON BLK and then under intercepts there are two intercepts, C#1 and C#2. Why are there two intercepts? I believe I am modeling Pr(C#1) = B0 + B1*BLK, so there should only be one intercept. Thank you.


The categorical latent class variable has three categories (classes). If you want to specify the multinomial logistic regression of the categorical latent variable on BLK, you need two statements: c#1 ON BLK; c#2 ON BLK; This is in line with regular multinomial logistic regression. See the Agresti reference on our website. 
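As a rough illustration of why three classes come with two intercepts and two slopes, the following Python sketch (not Mplus input) computes class probabilities from a multinomial logistic regression on a binary covariate. It assumes the last class is the reference class with intercept and slope fixed at 0. All numeric values are hypothetical.

```python
import math

def class_probs(intercepts, slopes, x):
    """Class probabilities for K classes given K-1 intercepts and slopes.

    The last class is assumed to be the reference (logit fixed at 0).
    """
    logits = [a + b * x for a, b in zip(intercepts, slopes)] + [0.0]
    denom = sum(math.exp(v) for v in logits)
    return [math.exp(v) / denom for v in logits]

# Hypothetical estimates: one intercept and one slope each for c#1 and c#2.
intercepts = [-0.5, 0.3]
slopes = [0.7, -0.2]

p_blk0 = class_probs(intercepts, slopes, 0)  # BLK = 0
p_blk1 = class_probs(intercepts, slopes, 1)  # BLK = 1
print(p_blk0, p_blk1)
```

Each intercept/slope pair describes the log odds of one class relative to the reference class, so a probability is not a linear function B0 + B1*BLK.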

Gil posted on Tuesday, August 21, 2001  7:24 pm



I'd like to identify latent segments that differ with respect to price elasticity for many commodities. The model, say a mixture regression model, allows these price elasticities to differ across segments, and consists of two parts: 1. explaining the latent variable as a function of covariates, and 2. predicting a dependent variable as a function of predictors. The problem is that the dataset contains repeated observations (time series) for each commodity (cross-sectional), and the model is closely tied to these longitudinal data. Suppose a regression model for estimating price elasticity, i.e., a double-log model of quantity on price. We estimate price elasticities for N commodities using T repeated observations for each, but within the framework of latent segments. This might be accomplished using the GROUPING command to identify the commodity. But GROUPING cannot be used with MIXTURE. Could somebody please give me some suggestions on how to fit these models using Mplus? Thanks in advance for explanations, comments and tips.


It sounds like your observations are commodities. I'm not sure why you want to use the GROUPING option. If you want to identify latent segments of the commodities, the TYPE=MIXTURE is appropriate. GROUPING is used when there are observed not latent groups or segments. A growth mixture model would probably be appropriate here. If you check the References and Examples posted here on the website, you may find something to help. 

Hana Kim posted on Sunday, November 11, 2001  2:56 pm



Hello! My research involves conjoint marketing studies in which respondents (cases) were repeatedly asked to state their personal purchase intention under several different scenarios. Please note that there are several records per respondent. My interest is to classify cases into several homogeneous groups and to develop regression models for each segment. Since differing demographic profiles most likely underlie these segments, I then want to show how to classify each respondent into the most appropriate segment. (A very similar example to my study can be seen at http://www.statisticalinnovations.com/tutorials/tut2.htm) Does Mplus meet these requirements for cases involving repeated measurements?

bmuthen posted on Tuesday, November 13, 2001  8:23 am



I don't know conjoint analysis, but it sounds like you can use the Mplus mixture analysis to let the repeated measures for respondents be the latent class indicators (u variables in Mplus language) that measure the latent class variable (c in Mplus), and regress c on covariates (x). Here, the latent classes of c give you the different segments, and segment probabilities are expressed as a function of the x's by multinomial logistic regression. All of this is carried out in a single analysis using maximum-likelihood estimation. This analysis relates to what is referred to as Latent Class Growth Analysis in paper number 86 on the Mplus web site. We'd be happy to send this to you.

Anonymous posted on Wednesday, January 16, 2002  8:57 am



Hello, thank you for making this forum available. I have fitted a three-class mixture, with class 1 being the least prevalent and class 3 the most. One of the models I need to run is a logistic model of a binary outcome based on class. The issue here is that the probability of the outcome in classes one and two is 1, or as near to certainty as you get. Therefore, there is no variance in the dependent variable for classes one and two. However, a person in class three has only a hypothesized 60% chance of the event (as supported by empirical frequency results). What is interesting is to 1. categorize subjects into class three, and 2. calculate their probability of the event. I think I can do this by first assigning kids to class using one model and then running an additional logistic regression for the kids in class three in a second model (weighting by the probability of being in class three from the output data set), but this seems like an inelegant solution. Is there a way in Mplus to code the regression model into the first mixture model, such that it runs only for class three?

Anonymous posted on Wednesday, January 16, 2002  2:02 pm



Sorry for 2 questions in one day, and thank you for answering them. Is there a way to tell Mplus a dependent variable of interest and have it output predicted values for it based on the structural model? I'm pretty sure I can do this by hand, but obviously it would be easier if Mplus were kind enough to do it for me. Many thanks. David


The way to handle the problem of zero variability of a dependent variable in two of the three classes is as follows, where u1-u5 are your binary latent class indicators and u6 is the binary outcome. Here u6 has probability of 1 in the first two classes and its probability is estimated in the third class. Note that u6 is seen simply as yet another latent class indicator. As usual, the probability of u6 given class 3 is in a logit scale. This is, of course, a partial input.
VARIABLE: categorical = u1-u6;
MODEL:
%OVERALL%
%c#1%
[u1$1-u1$5*1]; [u6$1@15];
%c#2%
[u1$1-u1$5*0]; [u6$1@15];
%c#3%
[u1$1-u1$5*1]; [u6$1*.5];


Regarding your second question regarding predicted values of an observed dependent variable, Mplus does not do this automatically. 

Anonymous posted on Thursday, June 27, 2002  8:29 am



I have 5 years of data collection on a continuous outcome. Using MPLUS I was able to identify 4 trajectory classes with a quadratic model. A reviewer suggested that I re-examine these findings using PROC TRAJ in SAS. I don't know what assumptions are made in MPLUS as compared to PROC TRAJ, or whether the classes could be different. Has anybody compared both approaches? Are they similar? Is there any literature available? Thanks

bmuthen posted on Thursday, June 27, 2002  12:28 pm



There are articles related to this topic authored by me and listed under References, Growth Mixture Modeling on this web site, e.g. papers 82, 85 and 86. PROC TRAJ assumes no within-class variability of the trajectories, which is a special case of Mplus, restricting the growth factor covariance matrix to zero (i fixed at 0, s fixed at 0, i with s fixed at zero). My experience with real-data analyses is that this specification often does not fit the data well.

Anonymous posted on Thursday, June 27, 2002  2:01 pm



It appears that one can only include a single latent variable in an Mplus MIXTURE model. Is this due to methodological restrictions? Are you planning on expanding on this capability in later versions ? Thank you. 

bmuthen posted on Thursday, June 27, 2002  3:25 pm



You can have as many continuous latent variables as you want in mixture modeling. As for categorical latent variables, the program is intended for a single variable, but can also be used with several variables. The multiple latent categorical variable approach is described briefly on page 11 of paper #86. New Mplus developments are in progress for more efficient handling of multiple latent categorical variables.

Anonymous posted on Friday, August 23, 2002  12:45 pm



Can mixture modeling in Mplus analyze a set of regressions, or simply one regression at a time? In other words, if I were interested in a set of regressions such as the following:
d = a + b + c + error
f = d + e + error
h = f + g + error
would I need to analyze each regression separately, or could I have the procedure analyze the set of regressions concurrently?

bmuthen posted on Friday, August 23, 2002  1:57 pm



The set of regressions can be analyzed in a single analysis. 

Bonnie posted on Monday, April 11, 2005  3:35 pm



To Whom It May Concern: I have a question about SEM with a categorical latent variable. Outcome Y and mediators M1 and M2 are all continuous variables. But U1 and U2 are both binary variables, so the latent variable C is also categorical. X1-X4 are covariates, not shown in the graph. The following is my code. My question is how to write the code for the Model section: should I say "Y on C M1-M2 X1-X4" or "Y on C#1 M1-M2 X1-X4"? It reports an error if I use the former. And how should these statements be ordered? What is shown below is not actually working; I just hope to provide some info. I would greatly appreciate your help!
VARIABLE: NAMES ARE X1-X4 M1 M2 Y u1 u2 ;
USEVARIABLES ARE X1-X4 M1 M2 Y u1 u2 ;
CATEGORICAL = u1 u2 ;
CLASSES = c(4);
ANALYSIS: TYPE IS Mixture; ALGORITHM=INTEGRATION;
Model:
%overall%
Y on u1 u2 M1-M2 X1-X4;
c#1 on u1 u2;
c#2 on u1 u2;
c#3 on u1 u2;
Best Regards, Bonnie


The latent variable c for u1 and u2 does not have to be categorical because u1 and u2 are categorical. The factors in SEM are continuous not categorical. They can, however, have indicators that are continuous, categorical, or other scales. Do you want a traditional SEM model where the factors have categorical indicators or are you interested in a mixture model where the factors are categorical? 

Huabin posted on Monday, May 09, 2005  12:26 pm



I am trying to use Mplus for mixture modeling. I am confused by the CLASSES statement on p. 109 of the User's Guide. Looking at the data file for Ex. 7.1, I noticed that the classes of all the observations have been specified (1 or 2), not "latent". But the User's Guide on p. 109 says, "... there is one categorical latent variable c that has two latent classes." I cannot understand this. Sorry for raising such a basic question. Sincerely, Huabin Luo


The 2 is the number of latent classes. Perhaps I don't understand your question. It is not necessary to specify latent. 

Lilian posted on Sunday, December 04, 2005  7:08 pm



Hello, I was wondering whether we can change the reference category in Mplus when running a latent class regression. I am running a 6-class model and regressing the latent outcome on a few covariates, and I would like the reference category to be the class with the lowest symptom probability. Is that possible? Thanks!


You can use the ending values of the class you want to be last as starting values for the last class in a subsequent run and that class will be last. 

pete posted on Thursday, February 09, 2006  11:53 am



Hello, I am trying to fit a mixed logistic regression model with covariates on both the regression part and the latent class part at the individual level. The model runs and no error messages appear. Since such models tend to be unidentifiable, does the lack of error messages indicate that the model is identified, or is there no guarantee of identification in Mplus?

bmuthen posted on Thursday, February 09, 2006  12:09 pm



That is a notoriously difficult model and I would be a bit suspicious. Look at your condition number; if it is close to 10^-10 I would be wary. You can also try a high starts = value to investigate the trustworthiness of the solution. You can also do an Mplus Monte Carlo study using your parameter estimate values to see if the model can be recovered. If the model still holds up, I'd like to use it as an example...

Shane Allua posted on Saturday, February 03, 2007  5:25 am



Hello, I see that odds ratios for the regression of the latent class variable on covariates are new in V4.2. What syntax is required to get this information, and can the ORs be output to a dataset? Thanks!


It is done automatically. 
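For readers who want to check such odds ratios by hand, here is a small Python sketch (not Mplus input). It relies only on the standard relationship that an odds ratio is the exponentiated logit slope. The Wald-type interval and the numbers used are assumptions for illustration and are not a description of what Mplus prints.

```python
import math

def odds_ratio(slope, se=None, z=1.96):
    """Return exp(slope) and, if se is given, a Wald-type interval.

    The interval exponentiates slope +/- z * se; this is an illustrative
    convention, not a statement about any program's output.
    """
    point = math.exp(slope)
    if se is None:
        return point, None
    return point, (math.exp(slope - z * se), math.exp(slope + z * se))

# Hypothetical slope of c#1 ON x and its standard error.
or_point, or_interval = odds_ratio(0.7, se=0.25)
print(or_point, or_interval)
```

A slope of 0 corresponds to an odds ratio of 1, i.e. the covariate does not change the odds of being in that class relative to the reference class.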

xi li posted on Tuesday, August 11, 2009  9:56 pm



Hi, I just ran a simple mixture model but kept getting the error below. Could you tell me what went wrong? I am using Mplus 4.2. Thanks!
Title: mixture
Data: a.dta.dat;
Variable: Names are a1 a2 a3;
Categorical are a1 a2
Classes = c(4);
Analysis: Type=mixture;
Model:
%OVERALL%
a3 ON a1 a2;
c ON a1 a2;
Output: tech1 tech8;
*** ERROR in Model command
Unknown variable(s) in an ON statement: C


Xi Li, Add a ; to the end of: Categorical are a1 a2 /Amir 


I have fit a series of LCGA models on 13 infant growth measures. I have decided on the number of classes that I favor and would now like to examine associations between class membership and a series of distal (continuous) outcomes. I have tried a number of variations in code and received error messages. My only success was with the 2-class model (which is not my favored model), where I added the following line at the end of my model statement: bmia ON C; 1. What code is needed for a regression with a 4-class model? 2. If I have a series of outcomes I am interested in, can I put them all in one model statement? Thank you, Meghan


The effect of a distal outcome is seen in the means or thresholds varying across classes. The ON option is not used. If you have more than one distal outcome, the assumption of conditional independence is imposed. If this is not what you want, you may want to do one at a time. 


Hello, I would like to fit an LCA model with categorical covariates. I am not sure how to specify that covariates are categorical. If I specify them as categorical, this is the error message I am getting:
*** ERROR The following MODEL statements are ignored:
* Statements in the OVERALL class: C#1 ON X5
*** ERROR One or more MODEL statements were ignored. These statements may be incorrect or are only supported by ALGORITHM=INTEGRATION.
Please find my code with just one categorical covariate below:
variable: names are X1-X6 u1-u7;
usevariables are u1-u7 x5;
categorical are u1-u7 x5;
classes = c (2);
analysis: type=mixture;
model:
%overall%
c on x5;
Could you please help me to sort out this problem? Many thanks! Olga


The CATEGORICAL list is for dependent variables. In regression, covariates must be binary or continuous. In both cases, they are treated as continuous. 


Many thanks, Linda! Is there a way to include a categorical covariate with more than 2 categories into regression without creating dummy variables? Thank you very much, Olga 


Not if the categories are unordered. 
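Since an unordered covariate with more than two categories has to enter as dummy variables, the following Python sketch shows one way the coding could be prepared before the data are passed to the analysis. The helper name, the variable `region`, and its levels are hypothetical. The reference level is simply the one left without its own column.

```python
def dummy_code(values, reference):
    """Create 0/1 indicator columns for every level except the reference."""
    levels = sorted(set(values) - {reference})
    return {
        f"is_{level}": [1 if v == level else 0 for v in values]
        for level in levels
    }

# Hypothetical three-category covariate; "north" is the reference level.
region = ["north", "south", "west", "south"]
dummies = dummy_code(region, reference="north")
print(dummies)
```

With K categories this yields K-1 binary covariates, each of which can then be used on the right-hand side of an ON statement as a binary covariate.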


I am using 2 covariates in my LPA. These covariates are correlated because of shared method variance (same rater). I was wondering whether the logistic regression is a standard one (rather than backward/stepwise) and whether I can assume that the shared variance is not attributed to either of those two variables. I have one variable at an earlier timepoint, but this doesn't lead to the same results. Basically, can I use those two variables (they do not predict the same class membership)?


I forgot to say that these 2 variables measure two members of a dyad (mother-child). Maybe I should use the earlier measure of the mother, because I would be partialling out variance attributable to the child as well as the observer if I take a measure that was taken during the same interaction.


Mplus does standard logistic regression. I am not sure how you should approach the situation with your covariates. 


How would one do a liability threshold model with the latent class variables as (latent) dichotomous dependent variables for analyses with twin data? It was mentioned in the Twin Research and Human Genetics 2006. I have found how to use it with "normal" variables but wonder how to do it with classes. 


I chose to post it here since I want to use regressions in a multilevel model to get heritability estimates. This strategy was shown by McArdle and Prescott in the same issue of Twin Research and Human Genetics. Sorry if it didn't seem logical to post here at first sight. 


Andre, The third chapter in my dissertation addresses how to use latent classes in a liability threshold model (Clark, S.L. (2010). Mixture modeling with behavioral data. Doctoral dissertation, University of California, Los Angeles.). The appendix for that chapter includes example Mplus code for this model. In order to do this model in Mplus you will need to be using version 6. A copy of my dissertation can be found on the Mplus website under the factor mixture modeling tab of the papers section. If you have any questions feel free to email me at shclark@ucla.edu 

mpduser1 posted on Monday, October 04, 2010  11:45 am



Is it possible to use MODEL PRIORS in Mplus 6.0 to specify weakly informative priors to aid in the identification of a latent class regression analysis when one of the latent classes is small (see, for example, the procedure mentioned by Collins & Lanza, 2010)?


You can provide informative priors for every model parameter in Mplus and yes it should be possible to use informative priors to help identify small classes. 

Sarah Ryan posted on Tuesday, March 29, 2011  5:06 pm



I'm trying to figure out how best to go about analyzing a mediation model, which can be described as follows:
1) Secondary data set (N = approx. 9,000), using three waves of data
2) Several background covariates
3) 5 exogenous measures: 4 latent factors and 1 manifest (continuous) indicator (a student colleague has suggested using bifactor analysis to treat these as one "general factor", as each of the 5 measures could be considered submeasures, and suggests this may simplify interpretation of any mediation effect)
4) Latent mediator (arrived at through latent class analysis)
5) One manifest DV (6 ordinal levels, treated as continuous)
The more I read on these discussion boards, the less convinced I am that using a latent class variable as the mediator actually is doable, or that it is a match theoretically (aside from the fact that this may be an unnecessarily complicated model). It almost seems like what I'd end up with is more a moderation analysis (DV would actually be DV means as a function of class membership). Am I right in this thinking?


A latent class mediator makes for a more complex model, just like an observed nominal mediator would. What should mediation mean in this case? Perhaps the following. The latent class membership can be influenced by exogenous variables, including factors, and latent class membership can influence DVs (by changing their means if continuous, as you said). One can also add the restriction of having no direct effects from IVs to DVs. That formulation seems reasonable and the modeling can be done in Mplus because you can have latent variables influencing class membership. But how an indirect effect should be quantified is not clear; it is not just a product of two slopes as with a continuous mediator. Re 5), perhaps you are thinking about a second-order factor model where only that general factor is an IV. With the bifactor model the general and the specific factors all can be IVs.

Sarah Ryan posted on Wednesday, March 30, 2011  2:50 pm



Thanks for this response; it is very helpful. I also just read your 2009 paper with Clark, "Relating LCA Results...", and this gives me more food for thought. It is theoretically conceivable, with the indicators I'm using and the construct I'm testing, that the latent CLASS mediator could function as a latent FACTOR, making it continuous and reducing a bit of the complexity. My committee suggested considering the latent class mediator, but I'm not sure they realized that they were sending me off into relatively uncharted waters (as far as I can tell, though perhaps I'm wrong). I need to go back to my model and the literature to think more about which (factor or class) I believe is more likely. I'll keep plugging away here, and VERY much appreciate this board and your advice.


Hi, I am trying to regress a continuous variable on a categorical latent variable (c = 3) and on a continuous latent variable. Here is my code:
TITLE: SEM WITH CATEGORICAL LATENT VARIABLE
DATA: FILE IS wpa4.dat;
VARIABLE: NAMES ARE U1-U14;
USEVARIABLES U1-U13;
CATEGORICAL = U2-U13;
CLASSES = C (3);
ANALYSIS: TYPE = MIXTURE; ALGORITHM = INTEGRATION;
MODEL:
%OVERALL%
F BY U7-U13;
C ON F;
U1 ON F;
U1 ON C F;
%C#1%
[U2$1-U6$1];
%C#2%
[U2$1-U6$1];
%C#3%
[U2$1-U6$1];
OUTPUT: TECH1 TECH14;
The statement "U1 ON C F;" gives me the following error message:
*** ERROR The following MODEL statements are ignored:
* Statements in the OVERALL class: U1 ON C#1 U1 ON C#2
*** ERROR One or more MODEL statements were ignored. These statements may be incorrect.
Please help! Thanks, Igor


You cannot regress an observed variable on a categorical latent variable. The results you are interested in are the means or thresholds of the observed variable varying across classes. 


Linda/Bengt, I have a single binary covariate predicting different trajectories for emotions over time (10 measures), which in turn are expected to predict differences in consumption. Bengt was kind enough to direct me to examples of mixture modeling with distal outcomes, and I have experimented with many variations, including keeping factor variances and residual variances class-invariant. My question now is simple: how can I estimate the indirect effect of the covariate on the distal outcome? Unlike normal mediation, there is no a*b effect to be estimated. How can I assert that the effect of the covariate on the outcome is mediated by the differences in trajectories? Thanks, Suresh


Are you saying that your binary covariate influences the latent class membership, where latent class membership gives class specific means for the outcome? 


Right, that is what I am saying. So, the binary covariate influences the intercept, linear slope and quadratic trend, and these in turn are predicted to lead to differences in the distal outcome. I included the statement c#1 on x to estimate the effect of the covariate on latent class membership, and I know that the beta for the effect of c on the distal outcome is given by the difference in the class means. What I now need to know is whether there is an indirect effect of the covariate on the distal outcome.


Please give your MODEL command statements so I am sure of your model. 


Bengt, this is what I have. I hope I am doing this right. My x variable is litdar (0/1 variable). My distal outcome is consum (range 0-50, treated as continuous). What I find is the following:
a) A 2-class model with class-varying psis and thetas fits the data well, better than a growth model. The two classes are high versus low guilt.
b) The effect of the covariate on class membership is not significant.
c) However, within the high guilt class, the covariate predicts differences in trajectories. One group (x = 0) has rising guilt while the other (x = 1) has falling guilt, and the linear and quadratic growth factors are significantly different for the two groups. Further, the distal outcome is significantly different for these two groups within the high guilt class.
d) There are no differences in trajectories for the low guilt class, and the distal outcome also does not differ for these two groups.


This is the set of MODEL commands I used. MODEL: %OVERALL% ac bc qc ag1@0 ag2@1 ag3@2 ag4@3 ag5@4 ag6@5 ag7@6 ag8@7 ag9@8 ag10@9; [ag1ag10@0]; ac*5048.14; bc*607.81; qc*5.90; ac WITH bc*678.04; ac WITH qc*58.72; bc WITH qc*57.52; ag1*779.60 ag2*240.52 ag3*428.66 ag4*363.95 ag5*369.10 ag6*511.33 ag7*645.82 ag8*682.81 ag9*265.19 ag10*582.09; ac ON litdar; bc ON litdar; qc ON litdar; c#1 ON litdar; consum ON litdar; %c#1% [ac*51.48 bc*5.39 qc*.71]; ac*5048.14; bc*607.81; qc*5.90; ac WITH bc*678.04; ac WITH qc*58.72; bc WITH qc*57.52; ag1*779.60 ag2*240.52 ag3*428.66 ag4*363.95 ag5*369.10 ag6*511.33 ag7*645.82 ag8*682.81 ag9*265.19 ag10*582.09; ac ON litdar; bc ON litdar; qc ON litdar; consum ON litdar; %c#2% [ac*120.51 bc*24.93 qc*2.1]; 


So you have a direct effect of litdar (your binary X) on consum (your Y). But you don't have any indirect effect, because litdar is not significantly influencing the class membership, and although litdar influences the growth factors within class, your model doesn't say that the growth factors influence consum.

Jamie Vaske posted on Tuesday, January 17, 2012  9:32 am



Hello, a colleague and I were recently looking over the Jung & Wickrama (2008) article on Latent Class Growth Analysis and Growth Mixture Modeling with MPLUS. In their article, they have an LCGA and they directly regress the slope factor on a covariate. Here is their syntax:
Model:
%OVERALL%
i s | t1@0 t2@1 t3@2;
i-s@0;
i s ON x;
c#1 ON x;
c#2 ON x;
Our question pertains to how to interpret the effect of the covariate on the growth factors. The variation in the growth factors is set to 0, so the covariate is not explaining variation in the growth factors within a class. What does the effect of X on the growth factors represent when the variation in growth factors is constrained to zero? Thanks!


With a conditional model, the residual variances are fixed at zero. When i and s are regressed on x, it is a shift in means for each gender if x is, for example, gender. 

Regan posted on Friday, February 03, 2012  11:32 pm



"This brings up two issues which may not always be well understood in mixture modeling. First, modeling the influence of a latent class variable c on a distal outcome y is not done by saying y ON c, but what is done gives information equivalent to having used ON..." Dr. B. Muthen, the above was a response you gave to someone some years back. I am new to LCA and want to clarify some things with regard to this comment: 1) Do I interpret your comment correctly if I say that when adding a distal outcome y to see the effect of class membership on y, instead of using a y on c command, we should just add the outcome variable to the 'usevariable' statement, so that it is technically a covariate but interpreted as an outcome? 2) Similar, but regarding dependent variables: Is there a substantive difference between adding the dependent variable to the 'usevariable' statement and using the 'knownclass' statement and adding a regression statement of c on x? If there is a significant association between x and c (for instance if x is gender), should we move to a multiple group analysis?


1. It is still an outcome. In reality it is another latent class indicator. 2. Please send outputs that illustrate what you are asking to make it clear. Also send your license number to support@statmodel.com. 

Regan posted on Monday, February 06, 2012  9:46 am



Thank you for the clarification. I am just starting out with the analysis and trying to understand what I have learned from attending your sessions and put it to practical use. I have not gotten any output yet, but wanted to understand more about the different command statements in order to obtain the correct output.


A good way to answer your question and get experience is to run the analysis various ways and compare the results. 


Hello, my LVSEM model involves two steps: first, completing a latent profile analysis to derive a latent class variable of community adversity; and THEN, using that latent class variable as a 'predictor' in an LVSEM model. Is there an example of how to do this somewhere? The results of my LPA confirmed a 3-class solution. So I thought that in order to use my new latent class variable in my model, all I would have to do is put CLASS = C(3); in the variable command, then define my latent class variable in the %overall% model command with its continuous indicators, while using the TYPE=MIXTURE analysis command. Then I could regress my latent class C variable onto my outcome of interest. However, an error message is saying that my latent class variable cannot also be defined as a continuous latent variable. Any help that you can offer would be appreciated. ***Melissa


Please send the output and your license number to support@statmodel.com. 


Hi Dr. Muthen, I cannot, it is government protected data where each output has to be vetted through security and you are only allowed to vet twice. So, I need to save it for when my model works. Any ideas would be helpful. 


I'm guessing you have y ON c. This is not the correct specification. Remove that. What you want to look at is the varying of the means of y across classes. 

Gail Smith posted on Friday, March 30, 2012  7:22 am



I am doing a LCA with 3 classes and want to change my reference class from class 3 to class 2. In earlier posts, you have suggested to use the ending values for the parameters in the class that you want to be last as starting values for the parameters in the last class in a subsequent analysis. My question is: where do I find these ending values? Thank you for your help. 


They are your results in the analysis where class 3 is the reference class. You can use the SVALUES option of the OUTPUT command to generate input with starting values and then change the class labels. You also need to change the means of the categorical latent variables. 


It is clear that one cannot treat a latent class/profile as an independent variable by regressing X on C; instead you recommend including the X outcomes in the analysis to see how they may vary across the classes. My model is a 6-profile solution (LPA of 7 continuous indicators); however, I am not interested in how the 6 profiles are differentially related to a dependent variable. Rather, I want to compare the predictive abilities of certain profiles to other independent variables (e.g., controlling for closely related diagnoses, do profiles 1 and 2 predict impairment?), e.g., X on C#1 C#2 DX1 DX2; Is there a way to run such an analysis in a single step in Mplus? I understand that it is not recommended to export the posterior probabilities and run the analysis in a second step. Could you also explain why it is not possible to regress X on C in Mplus? Do you expect this to be possible in future versions?


We don't specify x ON c but the intercept difference of x across classes is this regression. You can use MODEL TEST to test the intercept differences. 


So you should say x on dx1 dx2; and then the x intercept differences across the latent classes are equivalent to the slopes for "x on c". 
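
To illustrate the equivalence described above, here is a minimal sketch in Python (not Mplus). It assumes hypothetical class-specific intercepts for x and shows that subtracting the reference-class intercept gives what dummy-coded slopes for "x on c" would be. The function name and the numbers are made up for illustration.

```python
def slopes_from_intercepts(intercepts, reference):
    """Dummy-coded 'x ON c' slopes implied by class-specific intercepts.

    intercepts: dict mapping class label -> intercept of x in that class
    reference:  label of the class treated as the reference category
    """
    ref_value = intercepts[reference]
    return {
        label: value - ref_value
        for label, value in intercepts.items()
        if label != reference
    }

# Hypothetical intercepts of x in three classes, with class 3 as reference.
x_intercepts = {1: 3.5, 2: 2.0, 3: 1.0}
implied_slopes = slopes_from_intercepts(x_intercepts, reference=3)
print(implied_slopes)
```

Testing whether one of these differences is zero is what a MODEL TEST on the labeled intercepts does.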

Regan posted on Tuesday, June 12, 2012  11:53 am



Drs. Muthen, I have one question. In the sample code below, drug use is used as a predictor of class membership: %OVERALL% c#1 on drug; Is the following the correct code if I want to use drug use as a distal outcome, testing if class membership predicts drug use? %OVERALL% drug on c#1; Thank you! 


For a distal outcome, simply include drug on the USEVARIABLES list. The varying of the means of drug across classes captures drug ON c#1. 

Regan posted on Friday, June 15, 2012  10:42 am



Thank you! 

Regan posted on Thursday, June 21, 2012  3:13 pm



Hello Professors, My first question: In your handout on LCA, slide 126 shows that the predictor variable "black" is not significant in the regression equation for class 1 but is significant for classes 2 and 3. Should this be interpreted as "being black is a significant predictor of class membership only for classes 2 and 3, and not for class 1"? Also, would this imply that a multiple-group model should be run for black and nonblack respondents? My second question: In using a distal outcome, I know I need to compare the means across groups and use the MODEL TEST command. However, if I have 3 groups, do I need to run the model three times to obtain 3 different Wald tests (p1=p2; p2=p3; p1=p3)? Thank you so much. 


1. No. For the interpretation, see page 445 of the Mplus User's Guide. 2. To test the three separately, you need to run MODEL TEST three times. 

Regan posted on Friday, June 22, 2012  11:22 am



Thank you again 

Regan posted on Friday, July 27, 2012  2:08 pm



Professors: When conducting the test of mean differences on a distal outcome, I am using the MODEL TEST command. I am running this several times because I have four groups. I am wondering if a post hoc Bonferroni correction needs to be applied in this context, and if so, how it is conducted in Mplus. Thank you. 


Whenever you do several tests you should consider some type of correction. I would suggest being conservative about the p-values. There is no formal approach taken. 
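
Since the correction is left to the analyst, one conservative option is to apply a Bonferroni adjustment by hand to the Wald-test p-values from the separate MODEL TEST runs. The sketch below is plain Python, not Mplus; the p-values are hypothetical placeholders, and Bonferroni is only one of several possible corrections.

```python
from itertools import combinations

def bonferroni(p_values):
    """Return Bonferroni-adjusted p-values (raw p times number of tests, capped at 1)."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

# Four classes give six pairwise comparisons of the distal-outcome means.
pairs = list(combinations(range(1, 5), 2))

# Hypothetical Wald-test p-values, one per pair (illustrative only).
raw_p = [0.004, 0.030, 0.200, 0.008, 0.450, 0.012]

adjusted = bonferroni(raw_p)
for (a, b), p, p_adj in zip(pairs, raw_p, adjusted):
    print(f"class {a} vs class {b}: raw p = {p:.3f}, adjusted p = {p_adj:.3f}")
```

Equivalently, each raw p-value can be compared against 0.05 divided by the number of tests.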


Hello, I run a regression of the latent class variable on covariates. In the MODEL RESULTS part of the output, for some covariates the S.E. is 0 (and the p-value is 999). What is the problem and how can it be avoided? Many thanks for your help. 


That means that the slope cannot be determined. This happens when a class has zero variance for a covariate: everyone in that class has the same covariate value. It is the same issue as in ordinary logistic regression. It is not really a problem in that it is useful to know that people in that class are homogeneous with respect to that covariate. 


Hello, I have encountered the following error while running an LCA model without covariates on a dataset that contains both continuous and dichotomous outcomes (class indicators) with no missing values. I'm not sure what to do with this message since I don't have any covariates.

THE STANDARD ERRORS OF THE MODEL PARAMETER ESTIMATES MAY NOT BE TRUSTWORTHY FOR SOME PARAMETERS DUE TO A NON-POSITIVE DEFINITE FIRST-ORDER DERIVATIVE PRODUCT MATRIX. THIS MAY BE DUE TO THE STARTING VALUES BUT MAY ALSO BE AN INDICATION OF MODEL NONIDENTIFICATION. THE CONDITION NUMBER IS 0.297D-16. PROBLEM INVOLVING PARAMETER 62.

This parameter refers to: STARTING VALUES FOR LATENT CLASS REGRESSION MODEL PART ALPHA(C) C#1 


Please send the output and your license number to support@statmodel.com. 


Thanks for your past responses. In the following input I can regress an observed variable, YT2, on a categorical latent variable, CW. Model estimation terminates normally, although I do receive a message about a non-positive definite matrix. Several users report receiving Mplus output errors when they try this regression. The advice in response is that an observed variable cannot be regressed on a latent class variable. Instead, include the observed variable on the USEVARIABLES statement and then examine class-specific means. What is the reason why I am able to run the regression? Is it because the observed variable YT2 is a dependent variable in the model?

VARIABLE:
  USEVARIABLES = YT1 YT2 U1-U7 W1 W2 WGT NBRHD;
  WEIGHT = WGT;
  CLUSTER = NBRHD;
  CLASSES = CB(2) CW(2);
  BETWEEN = CB W1 W2; !W1 and W2 are between-level covariates.
  WITHIN = YT1 U1-U7; !YT1 is a covariate.
ANALYSIS:
  TYPE = MIXTURE TWOLEVEL;
MODEL:
  %WITHIN%
  %OVERALL%
  CW YT2 ON YT1 ; !YT2 is on both levels. Omit from BETWEEN= and WITHIN= above.
  YT2 ON YT1;
  %BETWEEN%
  %OVERALL%
  CW ON CB ;
  CB ON W1 W2;
  YT2 ON CW; !The regression in question.
MODEL CW:
  %WITHIN%
  %CW#1%
  [U1-U7];
  %CW#2%
  [U1-U7]; 


Please send your output where you see this YT2 ON CW regression to support@statmodel.com. 


For the input below:
(1) YT2 ON CW is modeled on the between level. Does this mean that between-level clusters influence the association between within-level latent class membership (CW) and YT2?
(2) Is the first regression of YT2 on YT1 (under MODEL: %WITHIN%) expected to produce an intercept for YT2? There were slope estimates in the Within Level results but I did not see intercept estimates. YT2 intercept estimates are in the Between Level output, and I assume they are for the regression on CW.
(3) If I uncommented the regression lines under MODEL CW, would I be allowing the regression of YT2 ON YT1 to be different for each class?
Thank you.

VARIABLE:
  USEVARIABLES = YT1 YT2 U1-U7 W1 W2 WGT NBRHD;
  WEIGHT = WGT;
  CLUSTER = NBRHD;
  CLASSES = CB(2) CW(2);
  BETWEEN = CB W1 W2;
  WITHIN = YT1 U1-U7;
ANALYSIS:
  TYPE = MIXTURE TWOLEVEL;
  STARTS = 100 20;
MODEL:
  %WITHIN%
  %OVERALL%
  CW ON YT1 ;
  YT2 ON YT1;
  %BETWEEN%
  %OVERALL%
  CW ON CB ;
  CB ON W1 W2;
  YT2 ON CW;
MODEL CW:
  %WITHIN%
  %CW#1%
  [U1-U7];
  !YT2 ON YT1;
  %CW#2%
  [U1-U7];
  !YT2 ON YT1; 


(1) You have %BETWEEN% %OVERALL% CW ON CB ; which should not be used because it regresses a continuous between-level random effect on a categorical latent class variable, which is not the Mplus design. The CW means can vary across CB classes without saying this.
(2) An observed variable has only one intercept and this is by default printed on Between.
(3) Yes, but you can't have it in the Overall part as well because that would lead to nonidentification. 

S Elaine posted on Wednesday, March 18, 2015  12:42 pm



Van Horn et al. (2009) stated: "In general, we believe that regression mixture models are best viewed as a large-sample technique, though further methodological research is needed before sample size guidelines are provided." Are there any recent guidelines about sample size for Latent Class Regression? I am unable to locate recent articles addressing this issue. I ask because, in exploring this technique, our research team found three distinct groups based on differential effects of 3 risk factors on 3 mental health outcomes; however, we only have 291 children in our sample. I am wondering if it is reasonable to proceed with examining two predictors of group differences. Thank you for your help. 


I am not aware of such guidelines. You can easily do your own Monte Carlo study in Mplus to find out. 


I am running a multigroup latent class model with medical conditions defining the latent class (c) and HIV serostatus as the knownclass. I have a set of covariates on which I want to regress both the latent and known classes. I do not want to treat them as distal variables but do want to allow them to influence class membership and within-class conditional probabilities. The categorical covariates are fine. The issue is with the covariate age in years, measured on an interval level. To get the model to run and converge, I have to use these statements in the model statement:

Model:
  %Overall%
  c with age;
  c on newrace2;
  c on orient;
  c on k6cat;
  c on alcuse2;
  c on mjuse2;
  c on methuse2;
  c on eduse2;
  c on popuse2;

This produces means for age and ORs for the categorical predictors. If I flip the statement "c with age" to be "c on age" to get ORs for each increased year of age, as I might in an ordinary LR, the model does not converge and/or runs forever. I suspect the distribution of age, which has few cases at the upper end, might contribute to this. Is there a better way to get ORs with CIs for the age variable? If not, is there a way to get significance tests to compare mean ages across the different latent classes, or do I just have to use the CIs for age in the printout to figure that out myself? Thanks for any help. 

Jon Heron posted on Monday, October 19, 2015  9:17 am



I guess in a more general setting you could derive the directional association from the ratio of the covariance and the variance of the independent variable; however, I am struggling to envisage this when C is latent nominal. What does a covariance even represent in this situation? Also, when you say "tests to compare mean ages across the different latent classes" it sounds like you are now thinking of age as being dependent. There are discussions in the technical appendices regarding continuous dependent variables causing problems when their distribution is non-normal; however, I'm not aware of this problem when the variable is a predictor (indeed its use as a predictor was used as one solution to this (LTB)). 


Just to add to Jon's answer, perhaps you want to scale down the age variable, e.g. centering it and/or dividing it by 10. 
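
As a rough illustration of the rescaling suggested above, the following Python sketch centers a covariate and divides it by 10, then shows how an odds ratio estimated per 10 years converts back to a per-year odds ratio. The ages and the slope are hypothetical, and the helper names are made up for this example.

```python
import math

def center_and_scale(values, divisor=10.0):
    """Center values at their mean, then divide by `divisor`."""
    mean = sum(values) / len(values)
    return [(v - mean) / divisor for v in values]

def odds_ratio_per_original_unit(slope_scaled, divisor=10.0):
    """Convert a logit slope for the scaled covariate to an odds ratio per original unit."""
    return math.exp(slope_scaled / divisor)

ages = [22, 31, 38, 45, 64]      # hypothetical ages in years
age10 = center_and_scale(ages)   # centered age in decades

slope_per_decade = 0.40          # hypothetical logit slope for age10
or_per_decade = math.exp(slope_per_decade)
or_per_year = odds_ratio_per_original_unit(slope_per_decade)
print(age10, or_per_decade, or_per_year)
```

Rescaling changes only the units of the slope, not the fit of the model, so the per-year odds ratio raised to the 10th power equals the per-decade odds ratio.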

Chris Giebe posted on Tuesday, November 15, 2016  2:33 am



Hello, I'm trying to include a covariate in my two-level model, with class 1 as a reference class. I've been using example 10.6 of the user's guide and your ASB example from the topic 5 part 3 video as a reference to create the following model:

MODEL:
  %WITHIN%
  %OVERALL%
  c ON PGCASMIN;
  %c#1%
  [PLB0357$1-PLB0350$1*0];
  %c#2%
  [PLB0357$1-PLB0350$1*1];
  %c#3%
  [PLB0357$1-PLB0350$1*2];
  %c#4%
  [PLB0357$1-PLB0350$1*3];
  %BETWEEN%
  %OVERALL%
  f BY c#2-c#4;
  f ON w;

but am getting this error:

*** ERROR in MODEL command
Unknown variable(s) in a BY statement: C#2-C#4

What am I doing wrong? Thanks in advance 


I don't know how you make class 1 your reference class. Mplus reacts against the mentioning of c#4 in the BY statement. Instead say f BY c#1-c#3; If this doesn't help, send output to Support along with license number. 

Chris Giebe posted on Thursday, March 30, 2017  6:29 am



Hello, Using manual BCH, I have successfully run an LCR with a single covariate and a single continuous distal outcome, comparing class means using the Wald test. Is it possible to run several distal outcomes that are all part of a questionnaire? In my dataset, I have several items of a health questionnaire that I would like to combine into an "overall health score" that is specific to each latent class and compare classes on that score. Is that possible? Comparing the different classes on an overall health score is just so much more meaningful to what I'm trying to do than comparing every single item in the questionnaire. Thanks 


Why not use the total score as the observed distal outcome? Running several outcomes gives the same result as running one at a time. 

Chris Giebe posted on Saturday, April 01, 2017  7:06 am



Thanks for the quick response. I guess it makes sense to create a summary score beforehand and then include that in the model. I do have a follow-up question: In the output, under MODEL RESULTS, I am seeing the class-specific Estimates, S.E., Est./S.E., and p-value columns. Am I correct in understanding that the intercepts are the class means of my outcome variable? I'm noticing under RESIDUAL OUTPUT (I'm assuming this is Tech4?) ESTIMATED MODEL AND RESIDUALS (......) that there are also Model Estimated Means for my covariate and outcome. These are vastly different from the intercepts under MODEL RESULTS. Which ones do I report? The Model Estimated Means under RESIDUAL OUTPUT or the intercepts under MODEL RESULTS? Thanks. 


The intercepts for the outcomes are not the means for the outcomes, just like in regular regression. 
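
A small numeric sketch may help show why the two sets of numbers differ. In an ordinary linear regression, the model-implied mean of the outcome is the intercept plus each slope times the mean of its covariate. The Python below uses hypothetical values; the function name is made up, and the same arithmetic is assumed to apply within each class using that class's intercept and covariate mean.

```python
def model_implied_mean(intercept, slopes, covariate_means):
    """Mean of the outcome implied by a linear regression:
    intercept + sum of slope_j * mean(x_j)."""
    return intercept + sum(b * m for b, m in zip(slopes, covariate_means))

# Hypothetical values for one class: intercept 2.0, one covariate with
# slope 0.5 and a class-specific covariate mean of 4.0.
class_mean = model_implied_mean(2.0, [0.5], [4.0])
print(class_mean)
```

The intercept equals the outcome mean only when there are no covariates or every covariate has mean zero, for example after centering.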

Jenny Chang posted on Monday, November 13, 2017  4:32 am



I am trying to use the 3-step approach to compare the difference of a distal outcome, PND. My preliminary result by auto BCH is consistent with those by traditional 3-step. Then I further control for the effect of the covariate AG on PND, which was not assumed to vary across classes.
1. The result of manual BCH shows that the Classification Probabilities matrix had negative values and also values above 1. The result is very different from that of auto BCH. Does it fail in this case? Web note 21 mentioned that equal variance of the distal outcome may solve this problem, but I did not find the sample code.
2. I followed the syntax in Appendix E in Asparouhov & Muthén (2014) to use the manual 3-step (Vermunt, 2010), and compared the intercepts of PND by Wald test (syntax as follows) in order to compare the difference of their means. Does it make sense?
3. If my step 2 makes sense, I found that the result has fewer significant pairwise comparisons than those by auto BCH and traditional 3-step. Is the result by this manual 3-step robust, since the association between C and the distal outcome is even lower than with the traditional approach?
4. Based on the current result, which approach is recommended?

Model:
  %overall%
  PND on AG;
  %C#1%
  PND (a1);
  %C#2%
  PND (a2);
  ...
MODEL TEST:
  0 = a1 - a2; 


Send the 2 outputs (manual 3 step and auto BCH) to Support along with your license number and these questions. 

QianLi Xue posted on Thursday, January 11, 2018  7:58 am



When fitting a latent class regression, I got the following:

LOGISTIC REGRESSION ODDS RATIO RESULTS

Categorical Latent Variables

RISK#1 ON
  R1LOWACT     0.318
  R1SLEEP      1.472
  R1FINFIT     0.873
  R1HEALTHYR   0.671
  R1COMMENG1   1.010
  R1MEDS       6.985
  R1TECH       1.958

RISK#2 ON
  R1LOWACT     0.543
  R1SLEEP      1.263
  R1FINFIT     0.843
  R1HEALTHYR   0.797
  R1COMMENG1   1.001
  R1MEDS       1.983
  R1TECH       1.558

The question is why the z scores and confidence intervals are not shown. They were shown in the Alternative parameterization table. Do we expect to get p-values from there? 


The p-values are not given for these odds ratio results. You can compute confidence intervals for them using the approach shown in our FAQ on our website: Odds ratio confidence interval from logOR estimate and SE 
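
For readers who want to do the calculation by hand, here is a minimal Python sketch of the standard approach: form the interval on the log odds ratio scale, then exponentiate its limits. It is assumed here that the FAQ describes this same calculation. The slope and standard error below are hypothetical; in practice they would be the logit coefficient and S.E. from MODEL RESULTS.

```python
import math

def odds_ratio_ci(log_or, se, z=1.96):
    """Return (odds ratio, lower limit, upper limit).

    The interval is built on the log-odds scale as log_or +/- z * se
    and then exponentiated; z = 1.96 gives an approximate 95% interval.
    """
    lower = math.exp(log_or - z * se)
    upper = math.exp(log_or + z * se)
    return math.exp(log_or), lower, upper

# Hypothetical logit slope and standard error for one covariate.
or_est, or_lower, or_upper = odds_ratio_ci(0.385, 0.150)
print(f"OR = {or_est:.3f}, 95% CI = ({or_lower:.3f}, {or_upper:.3f})")
```

Note that the resulting interval is not symmetric around the odds ratio, and it excludes 1 exactly when the interval for the logit slope excludes 0.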


I want to ask about MODEL RESULTS in the case of a 2-class latent variable and binary observed variables. What do the Thresholds in each class refer to? Do they refer directly to the coefficients of the logit function (alpha and beta)? logit(prob(y=1|z)) = alpha + beta*z 


The threshold parameter for a binary variable is the same as the logit intercept with a sign change. The comparison category is 0 for the binary variable. 
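
To make the sign change concrete, the Python sketch below converts a class-specific threshold into the probability of the binary item being 1 in that class, assuming the default logit link. The threshold values are hypothetical and the function name is made up for illustration.

```python
import math

def prob_from_threshold(tau):
    """P(u = 1 | class) for a binary item with threshold tau under a logit link.

    The logit intercept is -tau, so P(u = 1) = 1 / (1 + exp(tau)).
    """
    return 1.0 / (1.0 + math.exp(tau))

# Hypothetical thresholds for the same item in two classes.
for label, tau in [("class 1", -1.0), ("class 2", 2.0)]:
    print(label, round(prob_from_threshold(tau), 3))
```

A negative threshold therefore corresponds to a probability above 0.5 of endorsing the item, and a positive threshold to a probability below 0.5.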


Part 1: Hello, I'm trying to run a multigroup LCA analysis. My groups are designated as known classes, g, and I want to estimate five classes. I want to regress each of the classes onto a simple dichotomous education covariate and analyse group-specific results. I think this is correct?

Input:
  classes = g(2) c(5);
  knownclass = g (group=0 group=1);
Analysis:
  Type = mixture;
  Starts = 25 25;
  Stseed = 54321;
Model:
  %overall%
  c on eduts;
  c on g;
Model g:
  %g#1%
  c on eduts 


Looks good. 


Hi, I am trying to assess the influence of a latent class variable c on distal outcomes "AGG_PHYS" and "AGG_MENT". I used the following syntax:

DATA:
  FILE IS C:\Users\Owis Eilayyan\Desktop\PhD\Scoring\Dec2017\LCA\DATA\LBP_Datav6A.dat;
VARIABLE:
  NAMES ARE ID red age gender marital children educ empl social Ethnicity hand AGG_PHYS AGG_MENT PainS PainInt ODI HADS_D HADS_A PHQ PF RP BP GH VT RE SF MH Effic FABQph FABQw KeelT KeelS;
  usevariables are AGG_PHYS AGG_MENT PainS PainInt ODI HADS_D HADS_A Effic FABQph FABQw;
  missing = .;
  CLASSES = c (3);
  AUXILIARY = AGG_PHYS (DU3STEP) AGG_MENT (DU3STEP);
ANALYSIS:
  TYPE = MIXTURE;
MODEL:
  %OVERALL%
  AGG_PHYS ON C;
  AGG_MENT ON C;
PLOT:
  TYPE = PLOT3;
OUTPUT:
  TECH1 TECH8 TECH10 TECH11 TECH14;

However, I got this error message: "Unknown variable(s) in an ON statement: AGG_PHYS". How can I run a regression analysis with distal outcomes using the DU3STEP command? Thank you, Owis 


You don't say "...ON C" in Mplus, just like you don't regress anything on a nominal variable. For correct use of 3-step with a distal outcome, see the 2 papers on our website:

Asparouhov, T. & Muthén, B. (2014). Auxiliary variables in mixture modeling: Three-step approaches using Mplus. Structural Equation Modeling: A Multidisciplinary Journal, 21:3, 329-341. The posted version corrects several typos in the published version. An earlier version of this paper was posted as web note 15. (Download appendices with Mplus scripts).

Asparouhov, T. & Muthén, B. (2014). Auxiliary variables in mixture modeling: Using the BCH method in Mplus to estimate a distal outcome model and an arbitrary second model. Web note 21. 


Thank you Dr. Muthen, Owis 

Lan Luo posted on Thursday, August 23, 2018  7:07 pm



Hello, I also encountered the same problem. Can I get some advice? I want to evaluate the relationship between a binary dependent variable and 5 latent classes plus some other independent variables. I have read the paper: Asparouhov, T. & Muthén, B. (2014). Auxiliary variables in mixture modeling: Using the BCH method in Mplus to estimate a distal outcome model and an arbitrary second model. Web note 21. However, this paper mentions that BCH is executed when the dependent variable is continuous. I have also read the paper: Asparouhov, T. & Muthén, B. (2014). Auxiliary variables in mixture modeling: Three-step approaches using Mplus. Structural Equation Modeling: A Multidisciplinary Journal, 21:3, 329-341. However, this paper only provides an example for linear regression. Are there any methods that suit my case? In my case, the dependent variable is binary (death or not), with 5 latent classes and 5 independent variables (agegrp, gender, insurance, seifa, single). Your help will be greatly appreciated! Lan 


See Table 6 of Web Note 21 which recommends the DCAT auxiliary setting for a categorical distal outcome. 

Lan Luo posted on Sunday, August 26, 2018  4:57 pm



Hi Bengt, Thank you for your advice! Yes, the DCAT auxiliary setting is recommended for the categorical distal outcome. I tried the following syntax:

Variable:
  names = death single female agegrp finyear icu priv char seifa rural dis1-dis11;
  categorical = dis1-dis11;
  usevariables = dis1-dis11;
  classes = c(5);
  Auxiliary = death(DCAT);
Analysis:
  Type = mixture;
  starts = 200 50;

However, I found it can only evaluate the relationship between the outcome (death) and the latent class variable. Can you advise me how to bring other covariates (single female agegrp finyear icu priv char seifa rural) into the regression as independent variables? In my case, the latent class variable and the other covariates (single female agegrp finyear icu priv char seifa rural) are expected to be independent variables. Thanks a lot! Lan 


See the manual approach described in Appendix E from http://www.statmodel.com/download/AppendicesOct28.pdf as referred to in the paper on our website: Asparouhov, T. & Muthén, B. (2014). Auxiliary variables in mixture modeling: Three-step approaches using Mplus. Structural Equation Modeling: A Multidisciplinary Journal, 21:3, 329-341. The posted version corrects several typos in the published version. An earlier version of this paper was posted as web note 15. (Download appendices with Mplus scripts). 

Lan Luo posted on Monday, August 27, 2018  4:30 pm



Thanks, Bengt O! The manual approach in Appendix E is an example for a linear regression. According to this paper, can I add a setting "categorical = death" in step 3 to define my dependent variable to be binary? So I can get the result for the logistic regression? 

Lan Luo posted on Monday, August 27, 2018  4:32 pm



I tried the following syntax in step 3:

Variable:
  names = dis1-dis11 death single female agegrp finyear C1-C5 n;
  usevariables = death single female agegrp finyear n;
  classes = c(5);
  categorical = death;
  nominal = n;
Analysis:
  Type = mixture;
  Starts = 200 120;
Model:
  %overall%
  death on single female agegrp finyear;
  %C#1%
  [N#1@1.727]; [N#2@0.533]; [N#3@2.956]; [N#4@0.026];
  death on single female agegrp finyear;
  %C#2%
  [N#1@1.302]; [N#2@1.813]; [N#3@0.841]; [N#4@11.541];
  death on single female agegrp finyear;
  .....
  death on single female agegrp finyear;
  %C#5%
  [N#1@2.528]; [N#2@1.863]; [N#3@2.970]; [N#4@3.208];
  death on single female agegrp finyear; 

Lan Luo posted on Monday, August 27, 2018  4:33 pm



Is such a syntax feasible for my case? Your help will be greatly appreciated! Thanks again! Lan 


Send your outputs from the first and last step to Support along with your license number. 

Daniel Lee posted on Thursday, June 06, 2019  9:55 am



Hi, I am trying to conduct a mixture model for a very simple regression with a categorical dependent variable (i.e., identifying subgroups for Y on X). My code is not working and I was wondering if you could help me revise it. Thank you so much, as always:

Model:
  %OVERALL%
  NMENT4 on ORG_REL4;
  %C#1%
  NMENT4 on ORG_REL4;
  NMENT4;
  [NMENT4$2*1] (1);
  %C#2%
  NMENT4 on ORG_REL4;
  NMENT4;
  [NMENT4$2*1] (1); 


We need to see your full output to be able to say; send it to Support along with your license number. 
