I am trying to use a 6-class LCA solution to predict continuous outcomes (e.g., depression, or t2dep) at Time 2 while controlling for several Time 1 variables (Time 1 depression, or t1dep, sex, age, income, education, and marital status). I found a lot of information about the syntax for predicting class membership, but not much about how to use class membership to predict outcomes. When I used the following syntax, I got the ERROR message below it:

Classes = c(6);
Analysis: Type = Mixture Missing; ALGORITHM=INTEGRATION; Starts = 500 40;
MODEL: %Overall% T2dep on C T1dep rc001re sexrsp rb003red t1age;

ERROR: The following MODEL statements are ignored: * Statements in the OVERALL class: T2DEP ON C#1 T2DEP ON C#2 T2DEP ON C#3 T2DEP ON C#4 T2DEP ON C#5

Can you tell me what I am doing wrong? I also tried writing out C#1-C#5 in the MODEL statement, but I got the same error message. What is the appropriate syntax for using class to predict an outcome, controlling for various covariates? 


You do not use ON to do this. Instead, you allow the means or thresholds of the distal outcome to vary across classes. The distal outcome is simply included in the USEVARIABLES statement. See Example 8.6. 


Thanks, Linda. I am looking at Example 8.6, however, and it refers to "GMM with a categorical distal outcome using automatic starting values and random starts", but my distal outcome is CONTINUOUS (e.g., depression). It is my predictor (class) that is categorical. Am I looking at the wrong example? 


Also, I don't have a growth mixture model. My classes are based on variables that are all from Time 1, and I am trying to predict a continuous Time 2 outcome. 


It does not matter whether the distal outcome is continuous or categorical or what the model is. The distal outcome is connected with the categorical latent variable. For a continuous distal outcome, the means vary over classes. For a categorical distal outcome, the thresholds vary across classes. 
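
As a minimal sketch of what this could look like for the 6-class example above: the indicator names u1-u8 (assumed binary here) and the file name are hypothetical, while the outcome and covariate names are taken from the original post. In Mplus mixture models the intercept of a continuous dependent variable varies across classes by default, so the class-specific [t2dep] statements below only make that explicit.

```
DATA:
  FILE = example.dat;            ! hypothetical file name
VARIABLE:
  NAMES = u1-u8 t2dep t1dep rc001re sexrsp rb003red t1age;
  USEVARIABLES = u1-u8 t2dep t1dep rc001re sexrsp
                 rb003red t1age;
  CATEGORICAL = u1-u8;           ! assumed binary class indicators
  CLASSES = c(6);
ANALYSIS:
  TYPE = MIXTURE;
  STARTS = 500 40;
MODEL:
  %OVERALL%
  ! covariate effects on the distal outcome, equal across classes
  t2dep ON t1dep rc001re sexrsp rb003red t1age;
  ! optional: let the covariates also predict class membership
  ! c ON t1dep rc001re sexrsp rb003red t1age;
  ! class "predicts" the outcome through class-specific
  ! intercepts, not through an ON statement
  %c#1%
  [t2dep];
  %c#2%
  [t2dep];
  %c#3%
  [t2dep];
  %c#4%
  [t2dep];
  %c#5%
  [t2dep];
  %c#6%
  [t2dep];
```

With this setup, differences among the six estimated t2dep intercepts play the role of the class effect, adjusted for the Time 1 covariates.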

Anjali Gupta posted on Saturday, October 17, 2009 - 10:22 am



Hello, I'm attempting a similar model. I've decided on a 3-class model and would now like to use class membership to predict a distal outcome (depression), and also use depression to predict class. However, I do not want depression to be involved in determining class membership. Is this possible? And is there an example of such a model? It's critical that class membership is not based on depression. 


In your first paragraph you mention depression twice: as a DV and as an IV. I assume you mean depression measured at T1 and T2; otherwise you need to clarify. I hear this question often: you have a distal outcome that you want to predict from latent class membership, but you don't want the distal to influence your classes. It is interesting that this question never seems to come up in SEM when the latent class variable is replaced by a factor, even though the factor is also determined in part by the distal. My feeling is that the problem should be reconceptualized: the distal should influence the latent class membership in a first model step. Then, using this model, in a second step a new sample can be considered for which you classify people into the classes without using the distal information, fixing the parameters at the values from the first step. True, you can do the analysis without the distal in the first step and then do the second step. But that first step would not draw on the strength that the latent classes are informed by their relationship to the distal. If you think the distal has different means/probabilities for different classes, why not use that information in forming the classes? Also, you would not have an estimate of how the classes influence the distal. 


Hello, Thank you. Yes, there are 2 depression time points, sorry. I'm new to SEM/Mplus and haven't used factors/latent variables. However, my motivation for separating the two (distal and class) is that I want to see whether the latent class variable (composed of economic variables) is related to depression. I don't wish to confound the classes by including depression as an indicator. I'd like to see whether people who are 'more' successful economically differ in depression from people who are less successful economically. And when you suggest using different samples, I'm unclear whether you mean an entirely different set of records (people). I only have my main sample of 632 persons. If at all possible, could you further explain the steps required for both of your suggestions? Thank you. 


You can do LCA on the economic variables only and then (1) fix all those parameters when adding the distal, or (2) get the most likely class membership (if the entropy is big) and let that be an observed variable that predicts the distal. That's one line of thinking. The other line is saying that you can get economic latent classes that are more pertinent to predicting depression if you include the depression distal in the modeling. You can, for example, split your sample and estimate that full model in the first step, and then in the second step use the other half of the sample and use the econ items only to form latent classes with parameters fixed at the values from the first run, and see how that latent class membership relates to depression. We teach about similar approaches in "Topic 6" of our short courses; see videos and handouts on our web site. 


Professor Muthén, Reading your very helpful comments above brought a question to my mind. In a standard LCA with ordered categorical outcomes, would one only need to fix the thresholds prior to adding the distal outcomes (and other covariates)? Sincerely, Amir 


Thresholds and also class probabilities, i.e. [c#]. 
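
A minimal sketch of such a second-step input, assuming a hypothetical 2-class LCA with three ordered categorical items u1-u3 (three categories each, so two thresholds per item) and a continuous distal outcome y. All names and all numbers after @ are placeholders that stand in for the estimates from a first LCA run without the distal.

```
VARIABLE:
  NAMES = u1-u3 y;               ! hypothetical names
  USEVARIABLES = u1-u3 y;
  CATEGORICAL = u1-u3;           ! three ordered categories each
  CLASSES = c(2);
ANALYSIS:
  TYPE = MIXTURE;
  STARTS = 0;                    ! no random starts are requested
MODEL:
  %OVERALL%
  [c#1@0.35];                    ! class logit fixed (placeholder)
  %c#1%
  [u1$1@-1.20 u1$2@0.40];        ! thresholds fixed (placeholders)
  [u2$1@-0.90 u2$2@0.75];
  [u3$1@-1.50 u3$2@0.10];
  [y];                           ! distal mean, freely estimated
  %c#2%
  [u1$1@0.30 u1$2@1.80];
  [u2$1@0.55 u2$2@2.10];
  [u3$1@-0.20 u3$2@1.40];
  [y];
```

Only the distal outcome parameters are estimated here, so the classes stay as defined by the items alone.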


Thank you. Where might I read a basic explanation of how to fix parameters after getting LCA results? I have found some information on growth models with fixed parameters, but I would appreciate a simpler example as I'm not familiar with growth curve modeling. 


Hello, If at all possible, I'd appreciate some starting steps to accomplish your suggestions above of: "You can do LCA on the economic variables only and then (1) fix all those parameters when adding the distal, or (2) get the most likely class membership (if the entropy is big) and let that be an observed variable that predicts the distal." From the follow-up emails I understand I need to fix both thresholds and probabilities, but unfortunately I haven't found an example I understand. I 'think' the related section of Topic 6 starts on page 142, but I am unclear how to translate the results of my LCA into the 2nd step with the distal. Thank you, Anjali 


It is a little hard to teach this topic in a short Mplus Discussion format; it is better to come to our short courses or watch their movies (the Topic 5 mixture movie will shortly be available from our recent Berlin teaching). But let me say a few things. For the simplest approach (2) above, you get the most likely class membership by requesting cprob in the Save option of the Savedata command; see p. 649 of our UG. To fix parameters, see UG ex 7.5, where instead of * you use @. The ex shows the item parameters; you also have to add the class prob-related parameters, for example %overall% [c#1@x c#2@y]; for a 3-class solution. 
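
For approach (2), a first-step input might look like the sketch below. The variable names (id, econ1-econ5, dep2), the file names, and the choice of binary indicators are all assumptions made for illustration.

```
DATA:
  FILE = econ.dat;               ! hypothetical file name
VARIABLE:
  NAMES = id econ1-econ5 dep2;   ! hypothetical names
  USEVARIABLES = econ1-econ5;    ! class indicators only
  CATEGORICAL = econ1-econ5;     ! assumed binary
  CLASSES = c(3);
  IDVARIABLE = id;
  AUXILIARY = dep2;              ! saved with the results, not modeled
ANALYSIS:
  TYPE = MIXTURE;
SAVEDATA:
  FILE = econ_cprob.dat;         ! hypothetical file name
  SAVE = CPROBABILITIES;
```

The saved file then holds, for each person, the posterior probability of each class and the most likely class membership. That last variable can be used as an observed (e.g., dummy-coded) predictor of depression in a separate run, which ignores classification uncertainty and is why high entropy matters for this approach.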

Jinseok Kim posted on Wednesday, October 21, 2009 - 4:46 am



Hi Bengt, I am conducting LCA with a number of binary class indicators and a continuous distal outcome variable. I have obtained thresholds for the class indicators, and means and SEs of the distal outcome variable for each class. My question now is whether there is any way to test whether the estimated means of the distal outcome variable differ across the classes. 


You can use MODEL TEST or do loglikelihood difference testing where 2 times the loglikelihood difference is distributed as chi-square. 
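
A minimal sketch of the MODEL TEST route, assuming a hypothetical 3-class LCA with binary indicators u1-u6 and a continuous distal outcome y. The labels m1-m3 are arbitrary names attached to the class-specific means.

```
VARIABLE:
  NAMES = u1-u6 y;               ! hypothetical names
  USEVARIABLES = u1-u6 y;
  CATEGORICAL = u1-u6;           ! binary class indicators
  CLASSES = c(3);                ! assumed number of classes
ANALYSIS:
  TYPE = MIXTURE;
MODEL:
  %OVERALL%
  %c#1%
  [y] (m1);
  %c#2%
  [y] (m2);
  %c#3%
  [y] (m3);
MODEL TEST:
  0 = m1 - m2;
  0 = m1 - m3;
```

This gives a joint Wald test, with 2 degrees of freedom, of equal means across the three classes. For the loglikelihood route, the same model would be rerun with one shared label on all three means, e.g. [y] (m); in every class, and 2 times the loglikelihood difference between the two runs referred to a chi-square distribution with 2 degrees of freedom. With the MLR estimator the difference should additionally be adjusted using the scaling correction factors printed in the output.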


Hello, Your previous responses have been most helpful. I've since used 'hard coded' class membership in regressions with distal predictors per the suggestion: "(2) get the most likely class membership (if the entropy is big) and let that be an observed variable that predicts the distal." I have 2 follow-up questions. First, I've learned how to interpret results using classes (2 of the 3 classes) as dependent variables. Could class be an independent variable predicting a distal 'outcome'? And how would that be interpreted? Would I use class as a nominal variable, or test an excluded class by including the 2 remaining classes as independent variables? Second, I can't get my head around the choice between (1) using covariates in class creation and (2) creating classes without covariates and including covariates when finding the relations between class membership and distal predictors. I understand this may be an issue specific to the research questions, but any insight would be helpful. Thank you 


Class predicting a distal outcome is seen as the change in the means or thresholds of the distal outcome across classes. There is no ON statement involved. If classes change when you add covariates, it means that there is some model misspecification, for example, the need for direct effects between the covariates and the latent class indicators. 


Hi, I have a question similar to those in this thread but a little more elaborate. I want to use latent classes formed from variables X1-X8 as "baseline" covariates in a growth model of Y1-Y4, i.e. growth of Y over 4 time points. Specifically: 1) The "baseline" latent classes are formed from the baseline variables X1-X8 only. 2) All individuals are followed up post-baseline with repeated measurements of Y: Y1-Y4. I would like to fit a growth model for Y for each individual, allowing the intercepts and slopes to depend on the latent classes formed in (1). I would like uncertainty in class formation to be taken into account when doing the growth model regression, but I definitely do *not* want the Y's to be used in the formation of the latent classes. In my specific example it makes no sense to have class formation use future measurements (i.e. the classes are "baseline" only). Can this be modelled in Mplus? If so, where should I look for such examples? In the worst case I guess I could fix each individual's class and do the second-stage regression as if the classes were fixed in advance, and then try to concoct a post-hoc misclassification correction of the effects of each class based on the posterior class probabilities for each individual. I haven't seen this done but I suspect it may have been. Any pointers would be appreciated. Thanks in advance! 


This is a complex topic. In one sense the Y's should contribute to the classes for X1-X8 because the Y's are correlated with the X classes, so why not use all the information pertaining to the X classes? One approach that might help is to add a latent class variable for the Y's, so having two latent class variables, and then let those two be related. Another approach is to impute m data sets for the X class variable (most likely class) using only X1-X8 information, and then do the analyses m times relating X class to the Y's. I think some object to using both X and Y information in forming the X classes because they want to use the X classes for prediction of Y in later stages where only X information is available. If that's the case, I would conjecture that you get a better prediction model when the X classes have been formed based on both X and Y. 
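
The first suggestion, two related latent class variables, could be sketched roughly as follows. The class counts, the assumption that X1-X8 are continuous, and the equally spaced linear growth model are all illustrative choices and not part of the original question.

```
VARIABLE:
  NAMES = x1-x8 y1-y4;
  USEVARIABLES = x1-x8 y1-y4;
  CLASSES = cx(3) cy(2);         ! class counts are assumptions
ANALYSIS:
  TYPE = MIXTURE;
MODEL:
  %OVERALL%
  i s | y1@0 y2@1 y3@2 y4@3;     ! linear growth, equal spacing assumed
  cy ON cx;                      ! relate the two class variables
MODEL cx:
  %cx#1%
  [x1-x8];                       ! X means vary over cx classes only
  %cx#2%
  [x1-x8];
  %cx#3%
  [x1-x8];
MODEL cy:
  %cy#1%
  [i s];                         ! growth factor means vary over cy only
  %cy#2%
  [i s];
```

Here the X items measure only cx and the growth factors vary only over cy, with the relation between baseline classes and trajectories carried by cy ON cx. The two parts are still estimated jointly, so this does not completely wall off the X classes from the Y information.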


Dear Bengt and Linda Muthen, I have read the above discussion and have a follow-up question about fixing class probabilities. I chose the option of estimating latent classes in one half of the sample (i.e. social support from friends and parents at one measurement), then using the other half of the sample to fix the class solution and use this latent class variable to predict a distal outcome (a latent growth curve of social anxiety). You state that: "To fix parameters, see UG ex 7.5 where instead of * you use @. The ex shows the item parameters; you also have to add the class prob-related parameters, for example %overall% [c#1@x c#2@y]; for a 3-class solution." What do x and y stand for, exactly? I understand from example 7.5 that I would need to fix the estimated means and variances for each of the latent classes, but from your response I additionally (!) need to fix the class probabilities. Is that what x and y stand for? How do I calculate/get these? Thanks so much for your help! Maarten van Zalk 


These are the logit values for the intercepts of the categorical latent variable. 
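
For reference, the relation between these logits and the class proportions can be written out as follows. Here $\pi_k$ denotes the proportion in class $k$ of a $K$-class model without covariates, the last class is the reference class with its logit set to 0, and the numbers in the example are made up.

```latex
\[
  \operatorname{logit}_k = \ln\!\left(\frac{\pi_k}{\pi_K}\right),
  \quad k = 1, \dots, K-1,
  \qquad
  \pi_k = \frac{\exp(\operatorname{logit}_k)}
               {1 + \sum_{j=1}^{K-1} \exp(\operatorname{logit}_j)} .
\]
% Made-up 3-class example with proportions (0.5, 0.3, 0.2):
\[
  x = \ln(0.5 / 0.2) \approx 0.916,
  \qquad
  y = \ln(0.3 / 0.2) \approx 0.405 .
\]
```

In practice no hand calculation should be needed: the estimated logits are printed in the first-step output as the means of C#1 and C#2 in the categorical latent variable part of the model results, and those values are what go after the @ signs.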

Hyunzee Jung posted on Tuesday, September 02, 2014 - 2:56 pm



Hi, My 4class latent variable has behaved in a stable manner throughout a series of LCA with covariates. However, when distal outcome variables were introduced, the latent classes changed. I used both DU3STEP (DE3STEP) and DCON. Any insights or inputs to give me some guidance forward will be very much appreciated. Thanks! 


DCON should not change the classes. You may want to send to support with your license number. 

Hyunzee Jung posted on Tuesday, September 02, 2014 - 4:30 pm



Thanks so much, Bengt. One additional question is whether it makes sense to split my model into a two-step process. I have covariates (including a predictor treated as IV), a focal latent class variable, and continuous distal outcome variables. I analyzed the model of covariates and the latent class using the one-step approach. A single run took a very long time (over 56 hours). Assuming the latent classes are formed in a stable manner in all situations, I wonder if I can do the analyses in separate runs: first covariates to latent class, and second latent class to distal outcome. This is a broad, general question, but I'd very much appreciate any advice you can give! Thank you again. 


You should be able to do it in steps using DCON. But I don't see why the run would take so long; perhaps you want to send output and data to support along with your license number. 
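
A minimal sketch of a DCON run for the class-to-distal step, assuming hypothetical binary indicators u1-u6 and continuous distal outcomes y1 and y2. The covariate part of the model discussed above is left out of this sketch.

```
DATA:
  FILE = example.dat;            ! hypothetical file name
VARIABLE:
  NAMES = u1-u6 y1 y2;           ! hypothetical names
  USEVARIABLES = u1-u6;          ! class indicators only
  CATEGORICAL = u1-u6;           ! assumed binary
  CLASSES = c(4);
  AUXILIARY = y1 (DCON) y2 (DCON);
ANALYSIS:
  TYPE = MIXTURE;
```

Because y1 and y2 appear only on the AUXILIARY list and not in USEVARIABLES, they are not part of the model that forms the classes. The output then reports class-specific means of each distal outcome together with tests of their equality across classes.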

Hyunzee Jung posted on Tuesday, September 02, 2014 - 8:06 pm



Thank you again, Bengt. I will send my data and output. Apart from that, would it generally make sense for an analyst to perform analyses in two steps for him or herself as described above? 

Hyunzee Jung posted on Wednesday, September 03, 2014 - 12:11 am



To give you more information, my IV is a factor (3 indicators) predicting the latent classes (I also have 4 covariates). Further, I am estimating the moderating effect of gender. I used "intx | male XWITH f1" and regressed the latent class on 'intx' along with the other covariates.

Q1. The reason I asked about separate runs for the front part (IV, intx, and covariates to latent class) and the back part (latent classes to continuous distal outcomes) is that interaction variables for TYPE=MIXTURE are available only if ALGORITHM=INTEGRATION is specified in the ANALYSIS command, whereas auxiliary variables with DCON are not available with TYPE=MIXTURE in conjunction with ALGORITHM=INTEGRATION. It seems that the front part and the back part of my model are not compatible with each other. My question is: does it make sense, or is it a fine choice, to go through two separate runs?

Q2. I am planning to introduce gender as a moderator in the back part of my model as well. The effects of the latent classes on my distal outcomes are hypothesized to differ across gender. In that case, interaction variables for TYPE=MIXTURE would come in and ALGORITHM=INTEGRATION would also have to be specified, which can't be done with DCON. What options would I have for estimating an interaction effect in this latter path?

I am posting this because it seems like a general question rather than one specific to a certain dataset. Thanks so much again! 


Q1. A "manual" approach (as defined in our paper) is fine as long as the class formation doesn't change. Q2. In a manual approach you can use alg=int. 
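
One common form of the final step of a manual approach is sketched below. Whether it matches the paper referred to above in every detail is an assumption, as are all names and numbers. Here n is the most likely class membership saved from the earlier run, y is a continuous distal outcome, and the values after @ are placeholders for the classification-probability logits printed in the output of the run that formed the classes.

```
VARIABLE:
  NAMES = y n;                   ! hypothetical; n = most likely class
  USEVARIABLES = y n;
  NOMINAL = n;
  CLASSES = c(4);
ANALYSIS:
  TYPE = MIXTURE;
  STARTS = 0;
MODEL:
  %OVERALL%
  %c#1%
  [n#1@3.1 n#2@0.2 n#3@-0.8];    ! placeholder logits
  [y];
  %c#2%
  [n#1@0.3 n#2@2.7 n#3@-0.5];
  [y];
  %c#3%
  [n#1@-0.9 n#2@0.1 n#3@2.4];
  [y];
  %c#4%
  [n#1@-3.0 n#2@-2.2 n#3@-1.9];
  [y];
```

Fixing the logits of n within each class keeps the class definitions tied to the earlier run while accounting for classification error. Since this is an ordinary TYPE=MIXTURE model, ALGORITHM=INTEGRATION can be added to the ANALYSIS command if the distal part needs an XWITH interaction.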


Hi Muthens, I have read all of the posts in this thread; very helpful. I have a 3-profile latent categorical variable created from 4 continuous IVs at time 1. To predict my DV at time 2, I have used the strategy of fixing the LPA parameters to their measurement model solution and then adding in the DV at time 2 as an additional "indicator" of the latent categorical variable. I then used the Wald test to determine whether the DV at time 2 means were significantly different across the 3 latent profiles. A reviewer requested that I control for the DV at time 1 in my analysis. I am not sure how to go about doing this in Mplus. Would I 1) include the regression of DV at time 2 ON DV at time 1 within every profile, and 2) then test for significantly different DV at time 2 intercepts across profiles? Here is the (abbreviated) syntax associated with my guess, which runs in Mplus; however, I am not sure whether the Wald test is answering the reviewer's research question.

MODEL:
%OVERALL%
DV2 on DV1; [DV2]; DV2;
%C#1%
DV2 on DV1; [DV2] (m1); DV2;
%C#2%
DV2 on DV1; [DV2] (m2); DV2;
%C#3%
DV2 on DV1; [DV2] (m3); DV2;
MODEL TEST:
0 = m1 - m2;
0 = m1 - m3;

I would greatly appreciate any guidance you can give on how to control for the DV at time 1 in Mplus. 


Have a look at our Mplus Web Note 21 and its Section 3.2. 
