
Lewina Lee posted on Wednesday, October 31, 2012  4:09 pm



Dear Drs. Muthen, I would like to do (1) an LPA of 21 continuous variables, and (2) test class membership in relation to 9 covariates (x's) and 2 distal outcomes (y's). I intend to use the procedure for manually implementing the 3-step approach described in M+ webnotes #15 v5. In the 3-step approach, given that the measurement model is estimated independently of the auxiliary variables, does it make sense to proceed in the following manner?
1. Do class enumeration in Step 1 (e.g., do Step 1 with 1-8 classes) to identify the best one or two models while specifying AUXILIARY = x1, x2,..x9, y1, y2.
2. Do Step 2 (calculating measurement error for most likely class variable) for the best one or two models identified in class enumeration.
3. Do Step 3 (estimating the auxiliary model while specifying the latent class model with measurement errors obtained in Step 2) for the best one or two models from class enumeration.
Can latent class membership (C) be regressed on covariates? Can distal outcomes be regressed on class membership in Step 3? Is it accurate to say that class membership will not shift regardless of modifications in the auxiliary model (e.g., adding/removing covariates and distal outcomes)? Thank you, Lewina

Lewina Lee posted on Wednesday, October 31, 2012  4:18 pm



One more short question in addition to the above: In M+ webnotes #15 v5 Appendix F, Step 3 of the manually implemented 3-step model is specified with STARTS=0. Are users supposed to follow that in actual analyses? (In Step 1, the authors noted that STARTS=0 was only specified to retain the order of classes as in the data generation step, and that users should remove that in actual analyses.) Thank you, Lewina


Regarding your questions "Can latent class membership (C) be regressed on covariates? Can distal outcomes be regressed on class membership in Step 3?" - you can do that automatically using R3STEP and DU3STEP, but you can also do it manually. The class membership will not change due to the auxiliaries. For Step 3, STARTS=0 should be used because the class membership is essentially known.
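
[Editor's sketch] Step 2 of the manual approach, as described in the question above, amounts to turning the classification probabilities for the most likely class variable N into logits that are then fixed in Step 3 with statements such as [N#1@...]. The Python below is a minimal illustration of that arithmetic, not Mplus code. It assumes a table with one row per latent class and one column per most likely class, and takes the last column as the reference category. The function name and the probabilities are hypothetical.

```python
import math

def classification_logits(class_probs):
    """Convert P(N = n | C = c) into logits relative to the last category of N.

    class_probs[c][n] is assumed to hold P(N = n+1 | C = c+1):
    rows are latent classes, columns are most likely classes.
    Zero probabilities are not handled (log(0) is undefined).
    """
    logits = []
    for row in class_probs:
        reference = row[-1]  # last category of N is the reference
        logits.append([math.log(p / reference) for p in row])
    return logits

# Hypothetical 3-class classification table (each row sums to 1).
probs = [
    [0.90, 0.06, 0.04],
    [0.05, 0.85, 0.10],
    [0.02, 0.08, 0.90],
]

logits = classification_logits(probs)

# Print the values in the shape of class-specific Step 3 statements.
for c, row in enumerate(logits, start=1):
    fixed = " ".join(f"[N#{n}@{v:.3f}];" for n, v in enumerate(row[:-1], start=1))
    print(f"%C#{c}% {fixed}")
```

The reference category always has a logit of zero, so only K-1 values per class are fixed in a K-class model.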

Lewina Lee posted on Thursday, November 01, 2012  12:01 pm



Thank you for your quick response, Dr. Muthen. In Step 3, if any of the distal outcomes and/or covariates are binary, do I need to specify that in a "CATEGORICAL ARE x1 x2 y1" statement? 


No, that is not available. They are treated as continuous so for binary distals you will get proportions. 

Lewina Lee posted on Wednesday, November 07, 2012  8:15 pm



Dr. Muthen, Regarding my question on 11/1/2012 12:01pm on using the CATEGORICAL ARE statement on distal outcomes, could you please clarify what you meant by "not available"? I tried Step 3 of the manual 3-step approach by specifying:
MODEL:
  %OVERALL%
  Y on x1 x2 x3;
  c on x1 x2 x3;
  %C#1%
  [N#1@ 5.04]; [N#2@ 2.38];
  %C#2%
  [N#1@ 0.53]; [N#2@ 3.28];
  Y on x1 x2 x3;
  %C#3%
  [N#1@ 4.42]; [N#2@ 2.64];
  Y on x1 x2 x3;
I tried it with vs. without the "CATEGORICAL = Y" statement. The regression results (p-values) for Y on X1-X3 are comparable in both cases. I see that I got an intercept for Y in each class when Y was modeled as continuous, as opposed to a threshold. Could you please help me understand why I could not use "CATEGORICAL ARE" with binary distals? What do I need to do to obtain P(distal=1), or odds of distal=1 in one class versus another class? Thank you very much for all your help. Lewina


All variables on the AUXILIARY list are treated as continuous variables for the AUXILIARY functions whether they are on the CATEGORICAL list or not. 

Lewina Lee posted on Thursday, November 08, 2012  1:42 pm



If I am doing the manual 3-step approach, I do not need to use the AUXILIARY= statement according to WebNote 15. (I only need to use AUXILIARY= in the automatic 3-step approach, e.g., using DU3STEP). Does that mean, in Step 3 of the manual 3-step approach, I can specify covariates and outcomes with CATEGORICAL= ? Because when I used CATEGORICAL= at Step 3 of the manual 3-step approach with a binary outcome, I was able to get an output on "LOGISTIC REGRESSION ODDS RATIO RESULTS." I just want to verify that this is ok. Thank you, Lewina


Yes, it is. But you should not put any observed exogenous covariates on the CATEGORICAL list. This list is for dependent variables only. 

Lewina Lee posted on Thursday, November 08, 2012  4:32 pm



Thank you for the clarification, Linda! 


I am interested in testing whether means on a set of distal outcomes differ across growth trajectory classes (GMM), controlling for a set of covariates. The covariates have direct effects on growth factor means (class indicators) and the class indicators have direct effects on the outcomes within class (constrained equal across class). I used a one-step approach, but a reviewer suggested a 3-step approach. Two questions: (1) Can I test whether *adjusted* means for the distal outcomes differ between classes with a manual 3-step approach, and (2) Given the direct effects of covariates on class indicators (and class indicator effects on the distal outcomes) with entropy = .63 (obtained from the 1-step final model), would the 1-step approach be better suited than the 3-step approach based on simulation results in webnote 15? Thanks!


Scott, I don't think it is possible to do a 3-step approach for this model because you have "class indicator effects on the distal outcomes". Since the class indicators are latent variables in stage 3, you can't use them (these latent variables, the growth factors, are measured and created in stage 1 only, so they won't be available in stage 3). Tihomir

cogdev posted on Monday, January 28, 2013  7:24 pm



I would like to use latent class membership from one series of indicators (along with a few other continuous covariates), to predict latent profile membership derived from a separate series of indicators. Clustering independently is theoretically important (separate domains), which is why the 3-step procedure is appealing. I can manually run the 3-step procedure separately for each latent class analysis (at least up to the 2nd step), to get the misclassification stats for each one. A 4-profile/class solution fits best in both cases. Something along the lines of Ex7.14 appears to be close to what I need, except that I have a directed prediction from theory (actually more similar to Ex7.19, with a separate clustering variable instead of the factor), and a number of continuous covariates. So, what type of specification am I dealing with here, and how can I implement it (the auxiliary option doesn't seem designed to support this)? I can imagine that an analysis with categorical misclassification might be probability-based/fuzzy, or might need some sort of MCMC sampling? As a fallback, entropy is high (>.90) in both cases, so I guess I could 'hard-code' most likely cluster membership and run something like a multinomial logistic regression with covariates? Any help or direction here would be greatly appreciated.


With entropy of .9 or greater, you can use most likely class membership. You do not need the 3-step procedure.
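
[Editor's sketch] For readers who want to see what the entropy value summarizes, the Python below computes the usual relative entropy statistic from a matrix of posterior class probabilities: one minus the total classification uncertainty, scaled by its maximum n ln K. This is a minimal illustration under the assumption that per-case posterior probabilities have been saved. The function name and inputs are hypothetical, and the sketch is not a substitute for the value printed in the output.

```python
import math

def relative_entropy(posteriors):
    """Relative entropy in [0, 1]; 1 means perfectly separated classes.

    posteriors[i][k] is assumed to be the posterior probability that
    case i belongs to class k (each row sums to 1).
    """
    n = len(posteriors)
    k = len(posteriors[0])
    uncertainty = 0.0
    for row in posteriors:
        for p in row:
            if p > 0.0:  # the term p * ln(p) is taken as 0 when p == 0
                uncertainty -= p * math.log(p)
    return 1.0 - uncertainty / (n * math.log(k))

# Hypothetical posteriors for four cases in a 2-class model.
example = [
    [0.98, 0.02],
    [0.95, 0.05],
    [0.03, 0.97],
    [0.10, 0.90],
]
print(round(relative_entropy(example), 3))
```

Values near 1 indicate that most cases are assigned to a single class with high probability, which is the situation in which most likely class membership is treated as adequate in the reply above.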


We are using the manual 3-step method to test predictors in a latent transition analysis with three timepoints. Changing the predictors changes our class sizes, particularly for the third timepoint. Do you know why this is happening?


Are you following the approach in Web Note 15 shown in Appendices L, M, N, O? 


We have measurement noninvariance, so each LCA is estimated separately. The nominal most likely class variable is obtained from each LCA estimation, without constraining any of the item thresholds. Other than that, we are following the approach in Web Note 15 shown in Appendices L, M, N, and O. 


If you don't constrain item thresholds all bets are off for keeping the same class formation. 


You constrained the item thresholds in your individual LCAs so that they would be the same as the LCAs from the initial LTA, which had measurement invariance. You then used the most likely class variable from the individual LCAs to run a second LTA, this time with predictors. When you change your predictors, do the class sizes change? This is the problem we are experiencing. We don't need to constrain item thresholds in our individual LCAs because they don't need to be the same as the LCAs from an initial LTA, because we don't want measurement invariance. The class sizes are changing considerably when we use the most likely class variable to run an LTA with predictors. We aren't changing the most likely class variable, we're just changing the predictors. Does that make sense? 


I actually meant to refer to fixing the nominal parameters (not the threshold invariance) - I assume you are doing that fixing in the 3rd step. So you are following appendices H, I, J. Those don't have a covariate. You could do your analyses on the Appendix K generated data which include x and follow the HIJ steps to see what happens. I don't think we have explored that.


That's correct, we're fixing the nominal most likely class variable in the 3rd step. You think we should generate data with measurement invariance and a covariate, estimate LCAs without assuming measurement invariance, and then test the influence of that same covariate on our class solutions? Wouldn't that illuminate the consequences of not assuming measurement invariance when it exists? We would like to test the predictive strength of variables without changing the class solutions, like in r3step. It might be useful to generate data with two covariates and then test each independently to see if the class sizes change. However, we already have data and multiple covariates, and we already know the class sizes are changing. Why would this happen when we've fixed the probability of being in one class versus another? 


On your first question, I think it a useful exercise to make sure that class formation doesn't change in this simple case. On your second question, please send inputs, outputs, data, and license number to support so we can diagnose it. 


Will do! 


I see that you are using multiple imputation data. There seems to be a lot of variation across imputations given those huge SEs. A first step would be to analyze only one of those data sets. Also, which variables are imputed - the x's? So that there is no missing on the latent class indicators and the nominal logits don't vary across imputations.


I am running a manual 3-step approach with auxiliary variables following the web note 15, version 7. I have some missing data on my y variable, and I am using weights. When I run the syntax for step 3, I get the following error message:
Invalid symbol in data file: "*" at record #: 2, field #: 44
Below is the code, what am I doing wrong?
NAMES ARE (...);
MISSING ARE all (9999);
USEVARIABLES ARE C1_7FP0 RACE3 SEX 7CONCPT N;
CLASSES = c(4);
NOMINAL = N;
CATEGORICAL ARE RACE3 SEX;
WEIGHT IS C1_7FP0;
Analysis:
  TYPE = mixture;
  ESTIMATOR = MLR;
  STARTS = 600 120;
  PROCESSORS = 4(STARTS);
Model:
  %overall%
  C7CONCPT on RACE3 SEX;
  %C#1%
  [N#1@5.010]; [N#2@8.791]; [N#3@0.161];
  C7CONCPT on RACE3 SEX; C7CONCPT;
  %C#2%
  [N#1@9.378]; [N#2@4.421]; [N#3@0.889];
  C7CONCPT on RACE3 SEX; C7CONCPT;
  %C#3%
  [N#1@1.031]; [N#2@0.496]; [N#3@4.920];
  C7CONCPT on RACE3 SEX; C7CONCPT;
  %C#4%
  [N#1@3.593]; [N#2@3.888]; [N#3@4.885];
  C7CONCPT on RACE3 SEX; C7CONCPT;
Many thanks in advance


Hello, I figured out my previous question, but now I have come across another error message (same syntax as above):
*** ERROR in MODEL command
  Unknown threshold for NOMINAL variable N: N#2
*** ERROR in MODEL command
  Unknown threshold for NOMINAL variable N: N#3
*** ERROR in MODEL command
  Unknown threshold for NOMINAL variable N: N#2
*** ERROR in MODEL command
  Unknown threshold for NOMINAL variable N: N#3
*** ERROR in MODEL command
  Unknown threshold for NOMINAL variable N: N#2
*** ERROR in MODEL command
  Unknown threshold for NOMINAL variable N: N#3
*** ERROR in MODEL command
  Unknown threshold for NOMINAL variable N: N#2
*** ERROR in MODEL command
  Unknown threshold for NOMINAL variable N: N#3
*** ERROR
  The following MODEL statements are ignored:
  * Statements in Class 1: [ N#2 ] [ N#3 ]
  * Statements in Class 2: [ N#2 ] [ N#3 ]
  * Statements in Class 3: [ N#2 ] [ N#3 ]
  * Statements in Class 4: [ N#2 ] [ N#3 ]
*** ERROR
  One or more MODEL statements were ignored. These statements may be incorrect or are only supported by ALGORITHM=INTEGRATION.
I took the parameters from the Logits for the Classification Probabilities table from step 1. What is producing the error? Thank you


P.S: When I add ALGORITHM=INTEGRATION I still get the error message. Thank you (and apologies for all the postings). 


In the future, please limit posts to one window. If you require more space it is not an appropriate question for Mplus Discussion. Please send your output and license number to support@statmodel.com. 


Dear Drs. Muthen, I would like to use the 3-step approach to examine the effects of some covariates on latent trajectory classes. Because I have censored data, I cannot use the auxiliary function (R3STEP), as it does not run with algorithm = integration. I would like to use the manual 3-step approach but I do not know how to obtain the logits for the classification probabilities. In my output I only have the average latent class probabilities. How could I obtain classification probabilities or logits in growth mixture models? Thank you so much!


All covariates are treated as continuous so this is not a problem. Download Version 7.11 to obtain the values you want. 


Thank you so much! I downloaded version 7.11 and got the logits. 
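
[Editor's sketch] When only the "Average Latent Class Probabilities" table is at hand, the classification probabilities can in principle be recovered with Bayes' rule, using the number of cases assigned to each most likely class. The Python below is a minimal illustration of that conversion, not a reproduction of the Mplus output. It assumes the table has rows for most likely class and columns for latent class. The function name and the numbers are hypothetical.

```python
def classification_probs(avg_probs, n_most_likely):
    """Turn P(C = c | N = n) into P(N = n | C = c) via Bayes' rule.

    avg_probs[n][c]  : average posterior probability of latent class c
                       among cases whose most likely class is n.
    n_most_likely[n] : number of cases whose most likely class is n.
    Returns q with q[c][n] = P(N = n | C = c); each row of q sums to 1.
    """
    k = len(avg_probs)
    total = sum(n_most_likely)
    # Joint probability P(N = n, C = c) = P(C = c | N = n) * P(N = n).
    joint = [
        [avg_probs[n][c] * n_most_likely[n] / total for c in range(k)]
        for n in range(k)
    ]
    q = []
    for c in range(k):
        p_c = sum(joint[n][c] for n in range(k))  # marginal P(C = c)
        q.append([joint[n][c] / p_c for n in range(k)])
    return q

# Hypothetical 2-class example: 60 and 40 cases by most likely class.
avg = [
    [0.90, 0.10],
    [0.20, 0.80],
]
q = classification_probs(avg, [60, 40])
for c, row in enumerate(q, start=1):
    print(f"latent class {c}:", [round(p, 4) for p in row])
```

The logits fixed in Step 3 are then the natural log of each entry in a row divided by the last entry of that row.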


I am running a model predicting continuous and categorical outcomes using a latent class variable and several covariates based on the manual 3-step procedure. I am freely estimating the means and thresholds of the DVs for each class. I was wondering whether the means and thresholds of the DVs for each class are, by default, estimated at the mean value of all covariates in the model, or whether I have to mean-center the covariates. We want to present the estimated means in a table for each class, but are unsure how they should be interpreted. Thanks.


The means of the covariates have an influence if the covariates have direct effects on the outcomes, in which case you can center them. 


Hi, We are interested in using latent class as a predictor of a distal binary outcome (using the manual version of the 3-step approach). Specifically, in step 3 we want to test whether the thresholds for the binary outcome differ between our two classes. As far as we understand, we can do this with a Wald test using the Model Test statement, and that's what we did. In our first model, we use only latent class as a predictor. We get the Wald test as requested, but we also get an odds ratio and its corresponding significance test, which is exactly what we want. In our second model, we use latent class and some covariates as predictors of the outcome. When we include the covariates, the odds ratio we are interested in is no longer provided, just the requested Wald test. Would it be accurate to compute an odds ratio ourselves using the covariate-adjusted thresholds from the output? Or if not, is there another way we can get this information from Mplus? We would like to be able to present the degree to which the odds ratio changes after taking into account the effects of the covariates on the outcome. Thank you very much!


For both of your models you can express the odds ratio in Model Constraint using parameter labels from Model. That also gives you SEs so you can get a test. With covariates the odds ratio would be based on only the thresholds as you say. 
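
[Editor's sketch] The arithmetic behind an odds ratio based only on the thresholds can be illustrated as follows. The Python below assumes the logistic parameterization in which P(u = 1) = 1 / (1 + exp(threshold)) within a class, so the odds of u = 1 equal exp(-threshold). With covariates in the model this corresponds to covariate values of zero, or to any covariate value when the slopes are held equal across classes. The function names and threshold values are hypothetical, and the sketch gives no standard error, which is what Model Constraint provides.

```python
import math

def prob_from_threshold(tau):
    """P(u = 1) in a class with threshold tau (logit link assumed)."""
    return 1.0 / (1.0 + math.exp(tau))

def odds_ratio_from_thresholds(tau_a, tau_b):
    """Odds of u = 1 in class a relative to class b."""
    return math.exp(tau_b - tau_a)

# Hypothetical class-specific thresholds for a binary distal outcome.
tau_class1 = -1.0
tau_class2 = 0.5

p1 = prob_from_threshold(tau_class1)
p2 = prob_from_threshold(tau_class2)
odds_ratio = odds_ratio_from_thresholds(tau_class1, tau_class2)

print(f"P(u=1 | class 1) = {p1:.3f}")
print(f"P(u=1 | class 2) = {p2:.3f}")
print(f"OR (class 1 vs class 2) = {odds_ratio:.3f}")
```

The same quantity can be cross-checked from the two probabilities as (p1 / (1 - p1)) / (p2 / (1 - p2)).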


Thank you very much for your response, that was very helpful. From that, I have a follow-up question. I'm noticing that when I use Model Test to test threshold #2 = threshold #1 and compare that result to the "Latent Class Odds Ratio Results" that Mplus outputs automatically, or to the OR result from Model Constraint, I get quite different significance test results. The two OR significance tests are identical, but the test for the difference in thresholds from Model Test is quite different from the test of the OR. The only thought I've had is that the OR is being tested against a null value of 0 and not 1, but I'm not sure. I'm hoping someone can shed some light on my confusion. Thanks!


You are right that the printed OR significance testing is the usual Mplus ratio: (Est - 0)/SE(Est). With ORs, the relevant ratio is instead (Est - 1)/SE(Est), so you have to do that by hand. Related to your testing you might be interested in the paper on our website: Muthén, B., Brown, C.H., Masyn, K., Jo, B., Khoo, S.T., Yang, C.C., Wang, C.P., Kellam, S., Carlin, J., & Liao, J. (2002). General growth mixture modeling for randomized preventive interventions. Biostatistics, 3, 459-475. Section 3.2 deals with thresholds for a binary distal outcome.
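
[Editor's sketch] The by-hand test described in the reply above can be written out as follows. The Python below forms the ratio (Est - 1)/SE(Est) and converts it to a two-sided p-value under a standard normal reference distribution. That reference distribution is an assumption of this illustration, and the function name and the estimate and standard error are hypothetical.

```python
import math

def or_z_test(est, se, null=1.0):
    """Return (z, two-sided p) for H0: OR = null, assuming a normal reference."""
    z = (est - null) / se
    # Two-sided tail area of the standard normal: 2 * (1 - Phi(|z|)).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p

# Hypothetical odds ratio estimate and standard error.
est, se = 2.0, 0.5

z_vs_one, p_vs_one = or_z_test(est, se)             # relevant test for an OR
z_vs_zero, p_vs_zero = or_z_test(est, se, null=0.0)  # the printed Est/SE ratio

print(f"vs 1: z = {z_vs_one:.2f}, p = {p_vs_one:.4f}")
print(f"vs 0: z = {z_vs_zero:.2f}, p = {p_vs_zero:.6f}")
```

Comparing the two lines of output shows why the printed ratio can look far more significant than the test of OR = 1.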


Hi. How do I produce the table, "Logits for the Classification Probabilities for Most Likely Latent Class Membership (Column) by Latent Class (Row)"? 


That is included in the output since I think Version 7. 


Thanks. I have Mplus Version 7.2. My output does not have the table of logits I asked about. It also does not have the table "Classification Probabilities for the Most Likely Latent Class Membership (Row) by Latent Class (Column)". The output does have the table "Average Latent Class Probabilities..." The output is from nonparametric two-level mixture models that have categorical latent variables on both the within and between levels. These were demonstrated in Henry & Muthen, 2010, "Multilevel Latent Class Analysis: An Application...", Structural Equation Modeling, vol 17. The output states that Auxiliary= is not available for R3STEP, DCON, etc., when there are latent variables on the between and within levels. The desired model is: (1) use Time 1 indicators to derive latent classes, (2) regress a Time 2 distal outcome on Time 1 class membership, (3) use the Time 1 version of the Time 2 distal outcome as a latent class covariate/auxiliary. Can this be done with the manual 3-step method if I have the table of logits (or table of classification probabilities to calculate the logits)? Thanks.


The 3-step methodology has not been developed and used yet for TYPE=TWOLEVEL MIXTURE.

Laura posted on Tuesday, April 07, 2015  6:43 am



Hi, It is said previously on this page that "With entropy of .9 or greater, you can use most likely class membership. You do not need the 3-step procedure." Does this also apply to situations when analysing the predictors of latent classes in LCGA (using multinomial logistic regression, for example)? Is it possible to use the R3STEP method with Mplus version 7, or do I need a newer version for that? I tried out an analysis with the auxiliary(r3step) in LCGA. The results were completely different from those obtained with auxiliary(r), and the standard errors were zeros for many of the predictors. Thank you in advance!


Q1. Yes. Q2. If you don't get stopped when mentioning the R3STEP option then it is available. But you should always use the latest Mplus version. The tables at the end of our BCH paper explain that aux(r) is superseded by aux(r3step).

Laura posted on Wednesday, April 08, 2015  9:01 am



Ok, thank you. Is there any reference for using the most likely latent class when the entropy is high (over 0.9)? At least in Clark & Muthen (2009) this was compared to some other methods. 


This reference should do it: Asparouhov, T. & Muthén, B. (2014). Auxiliary variables in mixture modeling: Three-step approaches using Mplus. Structural Equation Modeling: A Multidisciplinary Journal, 21:3, 329-341. The posted version corrects several typos in the published version. An earlier version of this paper was posted as web note 15.

db40 posted on Friday, May 15, 2015  6:16 am



Dear Bengt, might I add to Laura's question? The entropy of my latent class model is 0.85. Would you consider this high enough that I don't have to use the 3-step procedure?

Jon Heron posted on Friday, May 15, 2015  11:50 am



Well, I have a paper in press showing bias even up to entropy of 0.9, and I'm sure you could simulate the odd bizarre example where some estimates were biased at even higher values of entropy depending on the class distribution. I spent the best part of ten years avoiding the one-step model but I'm quite amenable to it now.


In many cases we found that 0.85 is probably high enough, but as Jon says you cannot be sure. You should read his new paper. 


Drs. Muthen, I am trying to implement a 3-step LTA model to ensure that the latent class variable measurements are not affected by inclusion of covariates in the model. In order to do so, I am using Mplus Web Note 15: Appendix K-N and the Nylund-Gibson paper titled "A Latent Transition Mixture Model Using the Three-Step Specification". I ran the LTA model with measurement invariance and separately calculated the most likely class variables N1 and N2 for the Step 1 LCA at each of my two time points. However, I do not see Step 1 for the overall LTA in either the web note or the paper. I am not clear on how to obtain the nominal variable N1, N2 thresholds for latent class variables C1 and C2 in the LTA. Would I use the "Logits for the Classification Probabilities for the Most Likely Latent Class Membership" in the output of step 1 for C1 and C2 (Appendix L and M) or will these values change when running the LTA? Thanks, Raghav


Answer to your paragraph 2: Use appendices L, M, and N. There is no "overall LTA". Answer to your last paragraph: Yes, you would use those logits. 


I am trying to do the 3rd step of the manual 3-step approach for an LPA with a distal outcome and I have a couple of questions: (1) I can't figure out the auxiliary model input that I need to include so that I get the same output that I would have gotten using DU3STEP for a distal outcome (which I can't use because missing data on covariates requires integration = montecarlo). (2) If I included covariates of the classes in Step 1, do I still include those covariates in the overall model statement in step 3? Thanks!


1) Take Appendix E and remove "Y on X": http://statmodel.com/download/AppendicesOct28.pdf
2) No
