
Lewina Lee posted on Wednesday, October 31, 2012  4:09 pm



Dear Drs. Muthen, I would like to (1) do an LPA of 21 continuous variables, and (2) test class membership in relation to 9 covariates (x's) and 2 distal outcomes (y's). I intend to use the procedure for manually implementing the 3-step approach described in M+ webnotes #15 v5. In the 3-step approach, given that the measurement model is estimated independently of the auxiliary variables, does it make sense to proceed in the following manner?
1. Do class enumeration in Step 1 (e.g., do Step 1 with 1-8 classes) to identify the best one or two models while specifying AUXILIARY = x1, x2,..x9, y1, y2.
2. Do Step 2 (calculating measurement error for the most likely class variable) for the best one or two models identified in class enumeration.
3. Do Step 3 (estimating the auxiliary model while specifying the latent class model with the measurement errors obtained in Step 2) for the best one or two models from class enumeration.
Can latent class membership (C) be regressed on covariates? Can distal outcomes be regressed on class membership in Step 3? Is it accurate to say that class membership will not shift regardless of modifications in the auxiliary model (e.g., adding/removing covariates and distal outcomes)? Thank you, Lewina

Lewina Lee posted on Wednesday, October 31, 2012  4:18 pm



One more short question in addition to the above: In M+ webnotes #15 v5 Appendix F, Step 3 of the manually implemented 3-step model is specified with STARTS=0. Are users supposed to follow that in actual analyses? (In Step 1, the authors noted that STARTS=0 was only specified to retain the order of classes as in the data generation step, and that users should remove that in actual analyses.) Thank you, Lewina


Regarding your questions "Can latent class membership (C) be regressed on covariates? Can distal outcomes be regressed on class membership in Step 3?": you can do that automatically using R3STEP and DU3STEP, but you can also do it manually. The class membership will not change due to the auxiliaries. For Step 3, STARTS=0 should be used because the class membership is essentially known.

Lewina Lee posted on Thursday, November 01, 2012  12:01 pm



Thank you for your quick response, Dr. Muthen. In Step 3, if any of the distal outcomes and/or covariates are binary, do I need to specify that in a "CATEGORICAL ARE x1 x2 y1" statement? 


No, that is not available. They are treated as continuous so for binary distals you will get proportions. 

Lewina Lee posted on Wednesday, November 07, 2012  8:15 pm



Dr. Muthen, Regarding my question on 11/1/2012 (12:01 pm) on using the CATEGORICAL ARE statement on distal outcomes: could you please clarify what you meant by "not available"? I tried Step 3 of the manual 3-step approach by specifying:
    MODEL:
    %OVERALL%
    Y on x1 x2 x3;
    c on x1 x2 x3;
    %C#1%
    [N#1@ 5.04]; [N#2@ 2.38];
    %C#2%
    [N#1@ 0.53]; [N#2@ 3.28];
    Y on x1 x2 x3;
    %C#3%
    [N#1@ 4.42]; [N#2@ 2.64];
    Y on x1 x2 x3;
I tried it with vs. without the "CATEGORICAL = Y" statement. The regression results (p-values) for Y on X1-X3 are comparable in both cases. I see that I got an intercept for Y in each class when Y was modeled as continuous, as opposed to a threshold. Could you please help me understand why I could not use "CATEGORICAL ARE" with binary distals? What do I need to do to obtain P(distal=1), or the odds of distal=1 in one class versus another class? Thank you very much for all your help. Lewina


All variables on the AUXILIARY list are treated as continuous variables for the AUXILIARY functions whether they are on the CATEGORICAL list or not. 

Lewina Lee posted on Thursday, November 08, 2012  1:42 pm



If I am doing the manual 3-step approach, I do not need to use the AUXILIARY= statement according to Web Note 15. (I only need to use AUXILIARY= in the automatic 3-step approach, e.g., using DU3STEP.) Does that mean that, in Step 3 of the manual 3-step approach, I can specify covariates and outcomes with CATEGORICAL= ? When I used CATEGORICAL= at Step 3 of the manual 3-step approach with a binary outcome, I was able to get output on "LOGISTIC REGRESSION ODDS RATIO RESULTS." I just want to verify that this is ok. Thank you, Lewina


Yes, it is. But you should not put any observed exogenous covariates on the CATEGORICAL list. This list is for dependent variables only. 

Lewina Lee posted on Thursday, November 08, 2012  4:32 pm



Thank you for the clarification, Linda! 


I am interested in testing whether means on a set of distal outcomes differ across growth trajectory classes (GMM), controlling for a set of covariates. The covariates have direct effects on growth factor means (class indicators) and the class indicators have direct effects on the outcomes within class (constrained equal across class). I used a one-step approach, but a reviewer suggested a 3-step approach. Two questions: (1) Can I test whether *adjusted* means for the distal outcomes differ between classes with a manual 3-step approach? (2) Given the direct effects of covariates on class indicators (and class indicator effects on the distal outcomes) with entropy = .63 (obtained from the 1-step final model), would the 1-step approach be better suited than the 3-step approach based on the simulation results in Web Note 15? Thanks!


Scott, I don't think it is possible to do a 3-step approach for this model because you have "class indicator effects on the distal outcomes". Since the class indicators are latent variables in stage 3, you can't use them (these latent variables, the growth factors, are measured and created in stage 1 only, so they won't be available in stage 3). Tihomir

cogdev posted on Monday, January 28, 2013  7:24 pm



I would like to use latent class membership from one series of indicators (along with a few other continuous covariates) to predict latent profile membership derived from a separate series of indicators. Clustering independently is theoretically important (separate domains), which is why the 3-step procedure is appealing. I can manually run the 3-step procedure separately for each latent class analysis (at least up to the 2nd step) to get the misclassification stats for each one. A 4-profile/class solution fits best in both cases. Something along the lines of Ex7.14 appears to be close to what I need, except that I have a directed prediction from theory (actually more similar to Ex7.19, with a separate clustering variable instead of the factor) and a number of continuous covariates. So, what type of specification am I dealing with here, and how can I implement it (the auxiliary option doesn't seem designed to support this)? I can imagine that an analysis with categorical misclassification might be probability-based/fuzzy, or might need some sort of MCMC sampling? As a fallback, entropy is high (>.90) in both cases, so I guess I could 'hard-code' most likely cluster membership and run something like a multinomial logistic regression with covariates? Any help or direction here would be greatly appreciated.


With entropy of .9 or greater, you can use most likely class membership. You do not need the 3-step procedure.


We are using the manual 3-step method to test predictors in a latent transition analysis with three timepoints. Changing the predictors changes our class sizes, particularly for the third timepoint. Do you know why this is happening?


Are you following the approach in Web Note 15 shown in Appendices L, M, N, O? 


We have measurement noninvariance, so each LCA is estimated separately. The nominal most likely class variable is obtained from each LCA estimation, without constraining any of the item thresholds. Other than that, we are following the approach in Web Note 15 shown in Appendices L, M, N, and O. 


If you don't constrain item thresholds all bets are off for keeping the same class formation. 


You constrained the item thresholds in your individual LCAs so that they would be the same as the LCAs from the initial LTA, which had measurement invariance. You then used the most likely class variable from the individual LCAs to run a second LTA, this time with predictors. When you change your predictors, do the class sizes change? This is the problem we are experiencing. We don't need to constrain item thresholds in our individual LCAs because they don't need to be the same as the LCAs from an initial LTA, because we don't want measurement invariance. The class sizes are changing considerably when we use the most likely class variable to run an LTA with predictors. We aren't changing the most likely class variable, we're just changing the predictors. Does that make sense? 


I actually meant to refer to fixing the nominal parameters (not the threshold invariance); I assume you are doing that fixing in the 3rd step. So you are following appendices H, I, J. Those don't have a covariate. You could do your analyses on the Appendix K generated data, which include x, and follow the H-I-J steps to see what happens. I don't think we have explored that.


That's correct, we're fixing the nominal most likely class variable in the 3rd step. You think we should generate data with measurement invariance and a covariate, estimate LCAs without assuming measurement invariance, and then test the influence of that same covariate on our class solutions? Wouldn't that illuminate the consequences of not assuming measurement invariance when it exists? We would like to test the predictive strength of variables without changing the class solutions, like in r3step. It might be useful to generate data with two covariates and then test each independently to see if the class sizes change. However, we already have data and multiple covariates, and we already know the class sizes are changing. Why would this happen when we've fixed the probability of being in one class versus another? 


On your first question, I think it is a useful exercise to make sure that class formation doesn't change in this simple case. On your second question, please send inputs, outputs, data, and license number to support so we can diagnose it.


Will do! 


I see that you are using multiple imputation data. There seems to be a lot of variation across imputations given those huge SEs. A first step would be to analyze only one of those data sets. Also, which variables are imputed: only the x's, so that there is no missing data on the latent class indicators and the nominal logits don't vary across imputations?


I am running a manual 3-step approach with auxiliary variables following Web Note 15, version 7. I have some missing data on my y variable, and I am using weights. When I run the syntax for step 3, I get the following error message:
    Invalid symbol in data file: "*" at record #: 2, field #: 44
Below is the code. What am I doing wrong?
    NAMES ARE (...);
    MISSING ARE all (9999);
    USEVARIABLES ARE C1_7FP0 RACE3 SEX C7CONCPT N;
    CLASSES = c(4);
    NOMINAL = N;
    CATEGORICAL ARE RACE3 SEX;
    WEIGHT IS C1_7FP0;
    Analysis:
    TYPE = mixture;
    ESTIMATOR = MLR;
    STARTS = 600 120;
    PROCESSORS = 4(STARTS);
    Model:
    %overall%
    C7CONCPT on RACE3 SEX;
    %C#1%
    [N#1@5.010]; [N#2@8.791]; [N#3@0.161];
    C7CONCPT on RACE3 SEX; C7CONCPT;
    %C#2%
    [N#1@9.378]; [N#2@4.421]; [N#3@0.889];
    C7CONCPT on RACE3 SEX; C7CONCPT;
    %C#3%
    [N#1@1.031]; [N#2@0.496]; [N#3@4.920];
    C7CONCPT on RACE3 SEX; C7CONCPT;
    %C#4%
    [N#1@3.593]; [N#2@3.888]; [N#3@4.885];
    C7CONCPT on RACE3 SEX; C7CONCPT;
Many thanks in advance


Hello, I figured out my previous question, but now I have come across another error message (same syntax as above):
    *** ERROR in MODEL command
    Unknown threshold for NOMINAL variable N: N#2
    *** ERROR in MODEL command
    Unknown threshold for NOMINAL variable N: N#3
    *** ERROR in MODEL command
    Unknown threshold for NOMINAL variable N: N#2
    *** ERROR in MODEL command
    Unknown threshold for NOMINAL variable N: N#3
    *** ERROR in MODEL command
    Unknown threshold for NOMINAL variable N: N#2
    *** ERROR in MODEL command
    Unknown threshold for NOMINAL variable N: N#3
    *** ERROR in MODEL command
    Unknown threshold for NOMINAL variable N: N#2
    *** ERROR in MODEL command
    Unknown threshold for NOMINAL variable N: N#3
    *** ERROR
    The following MODEL statements are ignored:
    * Statements in Class 1: [ N#2 ] [ N#3 ]
    * Statements in Class 2: [ N#2 ] [ N#3 ]
    * Statements in Class 3: [ N#2 ] [ N#3 ]
    * Statements in Class 4: [ N#2 ] [ N#3 ]
    *** ERROR
    One or more MODEL statements were ignored. These statements may be incorrect or are only supported by ALGORITHM=INTEGRATION.
I took the parameters from the "Logits for the Classification Probabilities" table from step 1. What is producing the error? Thank you


P.S: When I add ALGORITHM=INTEGRATION I still get the error message. Thank you (and apologies for all the postings). 


In the future, please limit posts to one window. If you require more space it is not an appropriate question for Mplus Discussion. Please send your output and license number to support@statmodel.com. 


Dear Drs. Muthen, I would like to use the 3-step approach to examine the effects of some covariates on latent trajectory classes. Because I have censored data, I cannot use the auxiliary function (R3STEP), as it does not run with ALGORITHM = INTEGRATION. I would like to use the manual 3-step approach but I do not know how to obtain the logits for the classification probabilities. In my output I only have the average latent class probabilities. How can I obtain classification probabilities or logits in growth mixture models? Thank you so much!


All covariates are treated as continuous so this is not a problem. Download Version 7.11 to obtain the values you want. 


Thank you so much! I downloaded version 7.11 and got the logits. 


I am running a model predicting continuous and categorical outcomes using a latent class variable and several covariates based on the manual 3-step procedure. I am freely estimating the means and thresholds of the DVs for each class. I was wondering whether the means and thresholds of the DVs for each class are, by default, estimated at the mean value of all covariates in the model, or whether I have to mean-center the covariates. We want to present the estimated means in a table for each class, but are unsure how they should be interpreted. Thanks.


The means of the covariates have an influence if the covariates have direct effects on the outcomes, in which case you can center them. 


Hi, We are interested in using latent class as a predictor of a distal binary outcome (using the manual version of the 3-step approach). Specifically, in step 3 we want to test whether the thresholds for the binary outcome differ between our two classes. As far as we understand, we can do this with a Wald test using the Model Test statement, and that's what we did. In our first model, we use only latent class as a predictor. We get the Wald test as requested, but we also get an odds ratio and its corresponding significance test, which is exactly what we want. In our second model, we use latent class and some covariates as predictors of the outcome. When we include the covariates, the odds ratio we are interested in is no longer provided, just the requested Wald test. Would it be accurate to compute an odds ratio ourselves using the covariate-adjusted thresholds from the output? Or if not, is there another way we can get this information from Mplus? We would like to be able to present the degree to which the odds ratio changes after taking into account the effects of the covariates on the outcome. Thank you very much!


For both of your models you can express the odds ratio in Model Constraint using parameter labels from Model. That also gives you SEs so you can get a test. With covariates the odds ratio would be based on only the thresholds as you say. 
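For readers who want to check a Model Constraint result by hand, here is a minimal sketch in Python. It assumes the usual logit parameterization for a binary outcome, in which the logit of P(u=1) within a class is minus the threshold (plus any covariate terms, which cancel in the ratio when the slopes are held equal across classes). The function names and the threshold values in the comments are made up for illustration; they are not from any output in this thread.

```python
from math import exp


def prob_from_threshold(tau):
    """P(u = 1) in a class with threshold tau, assuming logit P(u=1) = -tau."""
    return 1.0 / (1.0 + exp(tau))


def odds_ratio_from_thresholds(tau_reference, tau_comparison):
    """Odds of u = 1 in the comparison class relative to the reference class.

    Under the assumed parameterization the odds in a class are exp(-tau),
    so the ratio is exp(tau_reference - tau_comparison).
    """
    return exp(tau_reference - tau_comparison)


# Hypothetical thresholds: 0.5 in class 1 (reference), -0.5 in class 2.
# odds_ratio_from_thresholds(0.5, -0.5) is then exp(1.0).
```

This only reproduces the point estimate. The standard error still has to come from Model Constraint (or a delta-method calculation), as described in the reply above.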


Thank you very much for your response, that was very helpful. From that, I have a follow-up question. I'm noticing that when I use Model Test to test threshold #2 = threshold #1 and compare that result to the "Latent Class Odds Ratio Results" that Mplus outputs automatically, or to the OR result from Model Constraint, I get quite different significance test results. The two OR significance tests are identical, but the test for the difference in thresholds from Model Test is quite different from the test of the OR. The only thought I've had is that the OR is being tested against a null value of 0 and not 1, but I'm not sure. I'm hoping someone can shed some light on my confusion. Thanks!


You are right that the printed OR significance testing is the usual Mplus ratio: (Est - 0)/SE(Est). With ORs, the relevant ratio is instead (Est - 1)/SE(Est), so you have to do that by hand. Related to your testing, you might be interested in the paper on our website: Muthén, B., Brown, C.H., Masyn, K., Jo, B., Khoo, S.T., Yang, C.C., Wang, C.P., Kellam, S., Carlin, J., & Liao, J. (2002). General growth mixture modeling for randomized preventive interventions. Biostatistics, 3, 459-475. Section 3.2 deals with thresholds for a binary distal outcome.
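The by-hand calculation can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, the inputs are whatever OR estimate and standard error your output reports, and the p-value uses a normal approximation for the ratio, which is an assumption of this sketch rather than something stated in the thread.

```python
from math import erfc, sqrt


def odds_ratio_z_tests(or_estimate, or_se):
    """Return (z against 0, z against 1, two-sided p for the test against 1).

    z against 0 is the ratio printed by default; z against 1 is the ratio
    that is relevant for an odds ratio, whose null value is 1.
    """
    z_vs_0 = (or_estimate - 0.0) / or_se
    z_vs_1 = (or_estimate - 1.0) / or_se
    # Two-sided normal p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_vs_1 = erfc(abs(z_vs_1) / sqrt(2.0))
    return z_vs_0, z_vs_1, p_vs_1
```

With a hypothetical OR of 2.0 and SE of 0.5, the printed ratio would be 4.0 while the ratio against the null of 1 is 2.0, which shows how the two tests can disagree.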


Hi. How do I produce the table, "Logits for the Classification Probabilities for Most Likely Latent Class Membership (Column) by Latent Class (Row)"? 


That is included in the output since I think Version 7. 


Thanks. I have Mplus Version 7.2. My output does not have the table of logits that I asked about. It also does not have the table "Classification Probabilities for the Most Likely Latent Class Membership (Row) by Latent Class (Column)". The output does have the table "Average Latent Class Probabilities...". The output is from nonparametric two-level mixture models that have categorical latent variables on both the within and between levels. These were demonstrated in Henry & Muthen, 2010, "Multilevel Latent Class Analysis: An Application...", Structural Equation Modeling, vol 17. The output states that AUXILIARY= is not available for R3STEP, DCON, etc., when there are latent variables on the between and within levels. The desired model is: (1) use Time 1 indicators to derive latent classes, (2) regress a Time 2 distal outcome on Time 1 class membership, (3) use the Time 1 version of the Time 2 distal outcome as a latent class covariate/auxiliary. Can this be done with the manual 3-step method if I have the table of logits (or the table of classification probabilities to calculate the logits)? Thanks.


The 3-step methodology has not been developed and used yet for TYPE=TWOLEVEL MIXTURE.

Laura posted on Tuesday, April 07, 2015  6:43 am



Hi, It was said previously on this page that "With entropy of .9 or greater, you can use most likely class membership. You do not need the 3-step procedure." Does this also apply when analysing the predictors of latent classes in LCGA (using multinomial logistic regression, for example)? Is it possible to use the R3STEP method with Mplus version 7, or do I need a newer version for that? I tried out an analysis with auxiliary(r3step) in LCGA. The results were completely different from those obtained with auxiliary(r), and the standard errors were zero for many of the predictors. Thank you in advance!


Q1. Yes. Q2. If you don't get stopped when mentioning the R3STEP option then it is available. But you should always use the latest Mplus version. The tables at the end of our BCH paper explain that aux(r) is superseded by aux(r3step).

Laura posted on Wednesday, April 08, 2015  9:01 am



Ok, thank you. Is there any reference for using the most likely latent class when the entropy is high (over 0.9)? At least in Clark & Muthen (2009) this was compared to some other methods. 


This reference should do it: Asparouhov, T. & Muthén, B. (2014). Auxiliary variables in mixture modeling: Three-step approaches using Mplus. Structural Equation Modeling: A Multidisciplinary Journal, 21:3, 329-341. The posted version corrects several typos in the published version. An earlier version of this paper was posted as Web Note 15.

db40 posted on Friday, May 15, 2015  6:16 am



Dear Bengt, might I add to Laura's question? The entropy of my latent class model is 0.85. Would you consider this high enough that I don't have to use the 3-step procedure?

Jon Heron posted on Friday, May 15, 2015  11:50 am



Well, I have a paper in press showing bias even up to entropy of 0.9, and I'm sure you could simulate the odd bizarre example where some estimates were biased at even higher values of entropy, depending on the class distribution. I spent the best part of ten years avoiding the one-step model but I'm quite amenable to it now.


In many cases we found that 0.85 is probably high enough, but as Jon says you cannot be sure. You should read his new paper. 


Drs. Muthen, I am trying to implement a 3-step LTA model to ensure that the latent class variable measurements are not affected by inclusion of covariates in the model. In order to do so, I am using Mplus Web Note 15, Appendices K-N, and the Nylund-Gibson paper titled "A Latent Transition Mixture Model Using the Three-Step Specification". I ran the LTA model with measurement invariance and separately calculated the most likely class variables N1 and N2 for the Step 1 LCA at each of my two time points. However, I do not see Step 1 for the overall LTA in either the web note or the paper. I am not clear on how to obtain the nominal variable N1, N2 thresholds for latent class variables C1 and C2 in the LTA. Would I use the "Logits for the Classification Probabilities for the Most Likely Latent Class Membership" in the output of step 1 for C1 and C2 (Appendix L and M), or will these values change when running the LTA? Thanks, Raghav


Answer to your paragraph 2: Use appendices L, M, and N. There is no "overall LTA". Answer to your last paragraph: Yes, you would use those logits. 


I am trying to do the 3rd step of the manual 3-step approach for an LPA with a distal outcome and I have a couple of questions: (1) I can't figure out the auxiliary model input that I need to include so that I get the same output that I would have gotten using DU3STEP for a distal outcome (which I can't use because missing data on covariates requires INTEGRATION = MONTECARLO). (2) If I included covariates of the classes in Step 1, do I still include those covariates in the overall model statement in step 3? Thanks!


1) Take Appendix E and remove "Y on X": http://statmodel.com/download/AppendicesOct28.pdf
2) No

John Woo posted on Wednesday, August 12, 2015  12:38 pm



Hi, adding to Laura's and db40's questions, do the average latent class probabilities for most likely latent class membership also matter? If entropy is below 0.9 but the average probabilities for the most likely class are above 0.9 for all classes (i.e., the diagonals), can I still consider skipping the 3-step approach? Thank you.


Probably. 

Jon Heron posted on Thursday, August 13, 2015  7:58 am



My paper is still not published, but I found the off-diagonal elements of the D-matrix (Mplus' second classification matrix) to be important. Setting entropy aside, if you have three classes and the off-diagonal elements of D relating to classes 1 & 2 are effectively zero (i.e. elements [1,2] and [2,1]) whilst those for classes 2 & 3 are not, then a parameter such as a covariate effect comparing classes 1 and 2 (e.g. risk of class 2 relative to class 1) would be less biased than the same effect for class 3 relative to class 2. Bottom line: (1) entropy and this "class separation" are both important; (2) it's easier to do a one-step or new 3-step than to work out whether a simpler approach is adequate.

John Woo posted on Wednesday, August 19, 2015  11:50 am



Hi, if I understand correctly, one of the rationales for the 3-step approach (or even the 1-step approach) is the idea that latent class formation is independent of the influences of the covariates. [I am leaving aside the matter of distal outcomes.] In the case of GMM, where covariates have potential paths to both the growth factors and the latent class structure, does this 3-step rationale apply only to the paths towards the latent class structure and not the growth factors? That is, when I run the 3-step approach, is it consistent with the 3-step rationale that I include the covariates predicting the growth factors? I am a bit confused because it seems class formation can be influenced by growth factors, and, if the covariates influence the growth factors, then they indirectly influence the class formation as well? Thank you in advance.


In GMM with effects from covariates on not only c but also i and s, you shouldn't use the R3STEP approach because your model has direct effects beyond those on c. 

Ali posted on Friday, March 11, 2016  2:40 am



I used the 3-step estimation. First, I estimated an LCA and got 3 classes. Second, I fixed the parameters [N#1] and [N#2] at the log ratios in classes 1, 2, and 3. In the third step I ran the linear regression auxiliary model, which is almost the same as the one in Web Note 15 (page 15), but I have two predictors (extrinsic and intrinsic motivation) and five plausible values for math achievement (y). So I ran the third step separately for intrinsic and extrinsic motivation. The slopes are significant in all classes, whether I use intrinsic or extrinsic motivation as the predictor. Later, I entered extrinsic and intrinsic motivation together as predictors (the correlation between the predictors is 0.67), and it turned out that neither is significant in class 2. So I am wondering whether it is better to run the model with one predictor or two.


You can use the likelihood ratio test to decide this. 

Ali posted on Thursday, March 17, 2016  3:19 am



But I have five plausible values for the dependent variable, so I got the mean of the likelihood across the five plausible values. So maybe the likelihood ratio test would not work for comparing the model with one predictor to the model with two predictors. Is it possible to compare models with five plausible values?


Because of the plausible values LRT is much more complicated and not available in Mplus. You can use Model Test though (include both predictors and test various combinations of coefficients). 

Ali posted on Wednesday, April 13, 2016  12:53 am



Hello, I posted on March 11th. I am using the LCA 3-step estimation manually. I have two predictors to predict the five plausible values of students' math scores. But I have missing data on the two predictors (missing at random). I tried to do imputation on the two predictors, but it seems that I cannot, because I already typed "TYPE=Imputation" in the DATA command for the five plausible values. After I ran it, I still had a sample size of around 12000, but the sample size had decreased by around 50%. Is there any way to deal with the missing data on the predictors?


Run the 5 data sets one at a time and combine the results using the usual Imputation rules. see page 5 http://www.ats.ucla.edu/stat/sas/library/multipleimputation.pdf 

Ali posted on Thursday, April 14, 2016  12:06 am



Do you mean that I first impute the missing data on the predictors five times? For example, take the first data set with the first plausible value and impute the missing data on the predictors, and do this five times. Then run the five data sets at the same time?


Yes, it is standard MI. You would have to combine the results from the 5 runs manually as described in the section "Combining Inferences from Imputed Data Sets" in the above link.
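As an illustration of the manual combining step, here is a minimal Python sketch of the standard multiple-imputation combining rules (Rubin's rules) for a single parameter: the pooled estimate is the mean of the per-data-set estimates, and the total variance is the average squared SE plus (1 + 1/m) times the between-data-set variance of the estimates. The function name is hypothetical, and the inputs would be the estimate and SE for the same parameter taken from each of the 5 runs.

```python
from math import sqrt


def pool_imputed_results(estimates, standard_errors):
    """Combine one parameter's estimates and SEs across m imputed data sets.

    Returns (pooled estimate, pooled standard error).
    """
    m = len(estimates)
    if m < 2 or m != len(standard_errors):
        raise ValueError("need matching estimates and SEs from at least 2 data sets")
    pooled = sum(estimates) / m
    # Within-imputation variance: average of the squared standard errors.
    within = sum(se ** 2 for se in standard_errors) / m
    # Between-imputation variance: sample variance of the estimates.
    between = sum((est - pooled) ** 2 for est in estimates) / (m - 1)
    total = within + (1.0 + 1.0 / m) * between
    return pooled, sqrt(total)
```

The sketch handles one parameter at a time, so it would be applied separately to each slope or mean of interest.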


Hi, I need the odds ratios and confidence intervals from the latent variable multinomial logistic regression using the 3-step procedure (R3STEP). I could calculate the odds ratios manually by exponentiating the estimates, but I still need the confidence intervals. When I add CINTERVAL to the OUTPUT: statement I only get the CIs for "Model Results", "The probability scale" and the "Latent Class Odds Ratio", but none from the multinomial logistic regression analysis. Is there a way to get the odds ratios and confidence intervals using the AUXILIARY R3STEP procedure?

Jon Heron posted on Friday, April 22, 2016  5:46 am



Just derive the confidence interval for the log-odds and exponentiate that too:
OR = exp(Estimate)
lower bound = exp(Estimate - 1.96*S.E.)
upper bound = exp(Estimate + 1.96*S.E.)
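The same three formulas, written as a small Python helper for anyone who wants to apply them to a column of estimates. The function name is hypothetical; the inputs are the log-odds estimate and its standard error as printed in the R3STEP multinomial regression section of the output.

```python
from math import exp


def odds_ratio_with_ci(estimate, se, z=1.96):
    """Return (OR, lower bound, upper bound) from a log-odds estimate and SE.

    The interval is built on the log-odds scale and then exponentiated,
    so it is not symmetric around the odds ratio.
    """
    return exp(estimate), exp(estimate - z * se), exp(estimate + z * se)
```

The default z of 1.96 gives an approximate 95% interval; another multiplier can be passed for a different coverage level.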


See also our FAQs on odds ratios. 

Megan Ames posted on Monday, July 25, 2016  12:42 pm



Hello Drs. Muthen, I am attempting to run a manual 3-step latent class growth model with several distal outcomes and covariates. I am confused about how to get output for individual pairwise comparisons between the 5 classes. Below is my attempt at the syntax. I would like to be able to compare each of the class means for the distal outcome (dep13). Also, is there a way to include more than one outcome in the model?
    Model:
    %OVERALL%
    dep13 on sexw1 phhhc dep3;
    %C#1%
    [N#1@2.602];... dep13 (m1);
    %C#2%
    [N#1@1.274];... dep13 (m2);
    %C#3%
    [N#1@1.828];... dep13 (m3);
    %C#4%
    [N#1@1.606];... dep13 (m4);
    %C#5%
    [N#1@2.751];...
    MODEL TEST:
    m1=m2; m1=m3; m1=m4; m1=m5; etc...


You can use Model Constraint to express any difference you like, such as pdiff = p1 - p2; The Model Test approach you show tests all equalities at once. You can have several distals but they are done one at a time.

Megan Ames posted on Friday, July 29, 2016  12:44 pm



Thank you for your quick reply. I am still unclear how/where the pairwise comparisons are requested. In the one-step approach, chi-square statistics for each of the class mean comparisons are provided at the end of the output; however, in the manual three-step approach this output is not provided. I attempted the following syntax without success:
    USEVAR ARE dep13 sexw1 phhhc dep3 n;
    MISSING = all (999);
    IDVARIABLE = responid;
    classes = c(5);
    nominal = n;
    Analysis:
    TYPE = MIXTURE;
    estimator = MLR;
    ALGORITHM=INTEGRATION;
    Model:
    %OVERALL%
    dep13 on sexw1 phhhc dep3;
    %C#1%
    [N#1@2.602]; etc. dep13; [dep13] (p1);
    %C#2%
    [N#1@1.274]; etc. dep13; [dep13] (p2);
    %C#3%
    [N#1@1.828]; etc. dep13; [dep13] (p3);
    %C#4%
    [N#1@1.606]; etc. dep13; [dep13] (p4);
    %C#5%
    [N#1@2.751]; etc. dep13; [dep13] (p5);
    Model constraint:
    p1=p2; p1=p3; p1=p4; etc....


Send your output and license number to Support. 

Yajing Zhu posted on Monday, August 01, 2016  7:36 am



Dear Profs, I have 2 questions regarding the 3-step approach with 1 continuous distal outcome, using modal assignment.
1. In step 3, essentially only the relationship between the modal class and the latent class is fixed, so it is again a latent class model with one loading fixed. How can the latent class proportions remain unchanged in step 3? (They are not fixed at the values from the previous steps.)
2. Is step 3 also subject to the assumption of conditional independence between the distal outcome and the modal class? (It seems so to me!)
Thanks for your support, Yajing


1. Failure to maintain the latent class proportions is discussed in Section 7.1 of Web Note 15. See also the better BCH method in Web Note 21. 2. Yes.

Yajing Zhu posted on Tuesday, August 02, 2016  8:37 am



Thanks, Prof Muthen, for confirming this. Following item 1 of my last post, I wonder why in Step 3 you do not fix P(C=latent class), but rather use the estimates from step 1 as starting values. (Is this what you suggested in Web Note 15 when you discuss the difference between the automatic and manual approaches?) If this were implemented in the manual step 3, class proportions would be maintained, wouldn't they? [I am saying this because in step 1 we can calculate P(C|M=modal class), and we have P(M) as M is observed, so can't we use the empirical distribution of M and the law of total probability (Bayes) to get P(C)?] Please correct me if I am wrong somewhere. Thanks for your support. Best wishes, Yajing


The 3-step method tries to classify each subject in the correct way. It is not enough that the class sizes are unchanged; the subjects occupying the classes also need to be unchanged.
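To make the arithmetic behind this exchange concrete, here is a hedged Python sketch of the calculation being discussed: given the average latent class probabilities P(C=c | N=m) (rows are most likely class, columns are latent class) and the observed proportions P(N=m), the law of total probability gives P(C=c), and Bayes' rule gives the classification probabilities P(N=m | C=c). The function name and the numbers in the comment are hypothetical. As the reply above notes, reproducing the class sizes this way says nothing about which subjects end up in each class.

```python
def class_and_classification_probs(avg_probs, modal_props):
    """Return (P(C), P(N | C)) from P(C | N) and P(N).

    avg_probs[m][c] is P(C = c | N = m); modal_props[m] is P(N = m).
    The second return value is indexed [c][m], i.e. P(N = m | C = c).
    """
    n_modal = len(modal_props)
    n_class = len(avg_probs[0])
    # Law of total probability: P(C=c) = sum over m of P(C=c | N=m) * P(N=m).
    class_props = [
        sum(avg_probs[m][c] * modal_props[m] for m in range(n_modal))
        for c in range(n_class)
    ]
    # Bayes' rule: P(N=m | C=c) = P(C=c | N=m) * P(N=m) / P(C=c).
    classification = [
        [avg_probs[m][c] * modal_props[m] / class_props[c] for m in range(n_modal)]
        for c in range(n_class)
    ]
    return class_props, classification


# Hypothetical two-class example:
# class_and_classification_probs([[0.9, 0.1], [0.2, 0.8]], [0.5, 0.5])
```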

Fan Xizhen posted on Tuesday, September 27, 2016  7:58 pm



Dear Drs. Muthen, I am attempting to run a 3-step latent profile model with several distal outcomes using R3STEP. I got the output "EQUALITY TESTS OF MEANS ACROSS CLASSES", but I am confused about why it provides a chi-square instead of an F value. Is it the same as the F value in ANOVA? Are the "EQUALITY TESTS" here similar to ANOVA, so that they can be used to conduct pairwise comparisons? Thank you!


Q1. Yes, they are analogous. Q2. Yes. 

Fan Xizhen posted on Wednesday, September 28, 2016  5:27 pm



Thank you for your quick answer, Prof Muthen. 


Hello, I am new to Mplus and am attempting to perform an LCGA and then relate predictor variables to the latent classes using the 3-step method. I tried to use the syntax for the automatic 3-step method. Part of my syntax for "step 1":
VARIABLE: NAMES ARE numero EDS1-EDS7;
  USEVAR = EDS1-EDS7;
  CENSORED = EDS1-EDS7 (b);
  MISSING = all (999);
  CLASSES = c(3);
ANALYSIS: TYPE = MIXTURE;
  ALGORITHM = INTEGRATION;
  STARTS = 10 2;
  STITERATIONS = 10;
MODEL: %OVERALL%
  i s q | EDS1@0 EDS2@1 EDS3@2 EDS4@3 EDS5@4 EDS6@5 EDS7@6;
  i-q@0;
Part of my syntax for the automatic 3-step:
VARIABLE: NAMES ARE numero ocpd EDS1-EDS7;
  USEVAR = EDS1-EDS7;
  CENSORED = EDS1-EDS7 (b);
  MISSING = all (999);
  CLASSES = c(3);
  AUXILIARY = ocpd (R3STEP);
ANALYSIS: TYPE = MIXTURE;
I can't recover the model that I found before, when I was performing my LCGA. The classes change (mean scores and class sizes), whereas I thought the advantage of the 3-step method was that the classes wouldn't change anymore? 


In addition: maybe it is better to use the manual 3-step method? However, I don't quite understand the lower part of the syntax provided in the M+ web notes: %c#1% [n#1@1.901]; [n#2@0.990]; %c#2% [n#1@0.486]; [n#2@1.936]; %c#3% [n#1@2.100]; [n#2@2.147]; What are these numbers, e.g. 1.901 and 0.990? I think I am doing something wrong; I'm just not sure what it is. Thank you in advance. 


You want to send your 2 outputs to Support along with your license number: Step 1 and the R3Step run. 
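
On the question above about values such as 1.901 and 0.990: in the manual 3-step approach of Web Note 15, the values fixed in step 3 are logits computed from the step 1 classification probabilities for the most likely class given the latent class, with the last class as the reference category. A minimal Python sketch of that calculation follows. The probability matrix is hypothetical (it is not the poster's output), and the function name is made up for illustration.

```python
import math

def classification_logits(probs):
    """Logits of classification probabilities, last class as reference.

    probs[c][j] = P(most likely class = j | latent class = c), rows sum to 1.
    Returns, for each row c, log(probs[c][j] / probs[c][K-1]) for j = 0..K-2.
    A zero probability would make the logarithm undefined, so this sketch
    assumes every entry is strictly positive.
    """
    return [
        [math.log(row[j] / row[-1]) for j in range(len(row) - 1)]
        for row in probs
    ]

# Hypothetical step 1 classification probabilities (rows: latent class).
probs = [
    [0.80, 0.10, 0.10],
    [0.05, 0.90, 0.05],
    [0.02, 0.08, 0.90],
]

logits = classification_logits(probs)

# Print the values in the shape of the step 3 statements quoted above.
for c, row in enumerate(logits, start=1):
    fixed = " ".join(f"[n#{j}@{v:.3f}];" for j, v in enumerate(row, start=1))
    print(f"%c#{c}% {fixed}")
```

A well-separated solution gives large positive logits for the matching class and negative logits elsewhere, so a step 3 specification in which every value is positive is worth checking against the step 1 output for lost minus signs.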


I'm trying to run a two-level multinomial logistic regression on latent class assignments, as the third step in the three-step procedure. I saw a post above from 2015 saying it hasn't been developed. Can I ask if it has been developed now? Thanks! 


No 3-step for 2-level yet. 

Chris Giebe posted on Sunday, August 27, 2017  7:12 am



Hello, I'm still fairly new to Mplus and LCA and may need some help with terminology. I'm using the manual BCH method of estimating an LCA with 4 distal outcomes, constant slopes across 4 classes, a Wald test of mean equivalence on my outcomes, and one control variable (covariate). I went through the enumeration process without covariates with no problem. Both steps of the manual BCH seem to be working fine as well. However, when I look at the class-specific means in the output for each of the 4 distal outcome variables, I'm unsure whether the classes are still the same as they were before (i.e., whether c#1 is still c#1, etc.). I've read something about using SVALUES to fix classes, but I'm not sure what that means. I've seen people post fragments of syntax containing [N@0.2232] or whatever number, and it seems like that is what I need, but I'm neither sure of that nor of how to get what I need from the Mplus output. I'd like my c#1-c#4 to be the same for all 4 distal outcomes in step 3. Thanks for your help. 


Dear Dr. Muthén, When I run the analysis for the third step, I receive an error for a 4-class GMM model with a binary outcome variable. I would like to test the differences in the variable schoprob (0 or 1) across the 4 classes while controlling for covariates. I receive the following error messages:
THE ESTIMATED COVARIANCE MATRIX FOR THE Y VARIABLES IN CLASS 1 COULD NOT BE INVERTED. PROBLEM INVOLVING VARIABLE SCHOPROB. COMPUTATION COULD NOT BE COMPLETED IN ITERATION 5. CHANGE YOUR MODEL AND/OR STARTING VALUES. THIS MAY BE DUE TO A ZERO ESTIMATED VARIANCE, THAT IS, NO WITHIN-CLASS VARIATION FOR THE VARIABLE.
THE MODEL ESTIMATION DID NOT TERMINATE NORMALLY DUE TO AN ERROR IN THE COMPUTATION. CHANGE YOUR MODEL AND/OR STARTING VALUES.
When I look at the counts of schoprob across the four classes, they are fairly well distributed:
Class   schoprob=0   schoprob=1
1       2333         1012
2       117          130
3       75           64
4       78           48
I would say that the variable has within-class variation. The starting values are zero, as specified for this step. Do you have any other ideas why the model does not run and how I could make it work? Sincerely, Jasmijn 


Please send to Support along with your license number. 

DavidBoyda posted on Tuesday, February 27, 2018  8:35 am



Dear Professor, I am conducting a linear regression auxiliary model with 7 covariates and have regressed both of my distal outcomes on the 7 covariates (in the MODEL command and under each class, as per Appendix F of the 3-step approach). The estimation process is taking a long time. Is there any way I can speed this up? Can I use the OPTSEED option in step 3 after step 1, if I have ascertained the optimal class solution during step 1? Kind regards, D 


Perhaps you are not using STARTS=0. Send your step 3 output to Support so we can see what's going on; include your license number. 

DavidBoyda posted on Monday, March 05, 2018  1:05 pm



Hi, are the threshold estimates between the classes and the distals standardised or unstandardised? If they are unstandardised, is there a way to convert them? I have requested standardised estimates. Class 1 Thresholds Y1$1 2.149 0.293 7.327 0.000 


They are unstandardized unless they are printed in the standardized section. If they don't appear in the standardized section Mplus doesn't provide them in this particular case. You can standardize them using the y* mean and variance. 

DavidBoyda posted on Monday, March 05, 2018  10:02 pm



Hi, apologies; actually, except for age, these are all binary, so I am supposed to use StdY. I will convert age to StdYX. Thanks. 


I have a question about estimated means of covariates using the manual 3-step approach. I am using the manual 3-step method to look at the effect of some covariates on class membership. There are four latent classes. My code for the final step looks like:
MODEL:
%overall%
c#1 on x1; c#1 on x2; c#1 on x3;
c#2 on x1; c#2 on x2; c#2 on x3;
c#3 on x1; c#3 on x2; c#3 on x3;
%c#1%
[n#1@4.889]; etc.
I am interested in presenting the estimated mean of my key covariate x1 in each latent class, adjusted for the other covariates in the model. TECH4 provides model-estimated means and standard errors for the covariates in each class under ‘estimated means for the latent variables’. I am wondering (1) whether these values are appropriate to use as model-estimated means of the covariates in each class, and (2) how these means and SEs in TECH4 are calculated. 


(1) Yes (2) They are estimated from the H1 model ML information matrix where the sample size for each class is estimated from the mixture model and the sample mean and variance are posterior probability weighted. 
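
The weighting idea in answer (2) can be illustrated with a short Python sketch: the estimated class size is the sum of the posterior probabilities for that class, and the mean and variance weight each subject by that posterior probability. This only illustrates the weighting, under the assumption that the posterior probabilities are available as a plain list; it does not reproduce the TECH4 computation itself or its standard errors, and the function name and data are hypothetical.

```python
def weighted_class_summary(x, post):
    """Posterior-probability-weighted summary of covariate x for one class.

    x    : covariate values, one per subject
    post : posterior probability of membership in the class, one per subject
    Returns (estimated class size, weighted mean, weighted variance).
    """
    n_c = sum(post)                                   # estimated class size
    mean = sum(p * xi for p, xi in zip(post, x)) / n_c
    var = sum(p * (xi - mean) ** 2 for p, xi in zip(post, x)) / n_c
    return n_c, mean, var

# Hypothetical data: four subjects, posterior probabilities for one class.
x = [1.0, 2.0, 3.0, 4.0]
post = [0.9, 0.8, 0.2, 0.1]

n_c, mean, var = weighted_class_summary(x, post)
print(f"estimated n = {n_c:.2f}, mean = {mean:.3f}, variance = {var:.3f}")
```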

Anna Austin posted on Tuesday, September 18, 2018  8:25 am



Hello! I am conducting the 3-step approach manually to examine predictors in a 3-class LCGA model. Is the code below correct for the MODEL statement in the third step? Is there anything else I need to include? Thank you!
MODEL:
%OVERALL%
C on x1 x2 x3 x4; !x1-x4 are predictors of interest, C is class;
%c#1%
[ModalC#1@5.824 ModalC#2@2.519]; !ModalC is modal class assignment, here fixed logits based on output from step 1 to account for classification uncertainty;
%c#2%
[ModalC#1@1.168 ModalC#2@3.009]; !ModalC is modal class assignment, here fixed logits based on output from step 1 to account for classification uncertainty;
%c#3%
[ModalC#1@0.397 ModalC#2@0.010]; !ModalC is modal class assignment, here fixed logits based on output from step 1 to account for classification uncertainty; 


See Web Note 15's Appendix with Mplus scripts, especially Appendix A at http://www.statmodel.com/download/AppendicesOct28.pdf 

Anna Austin posted on Wednesday, September 19, 2018  10:25 am



Thank you! An additional question: In the output from step 3 of the 3-step approach, there is output labeled "LOGISTIC REGRESSION ODDS RATIO RESULTS" and output labeled "ALTERNATIVE PARAMETERIZATIONS FOR THE CATEGORICAL LATENT VARIABLE REGRESSION". Which is the appropriate output to use for reporting odds ratios of covariates associated with the latent classes? I understand that for the "ALTERNATIVE PARAMETERIZATIONS FOR THE CATEGORICAL LATENT VARIABLE REGRESSION" output I will need to exponentiate the parameter estimates to get odds ratios. However, these two outputs seem to provide different SEs. Thank you! 


Those will appear in the next Mplus version, 8.2. For now, please see the FAQ on our website which shows how to do it yourself: Odds ratio confidence interval from logOR estimate and SE 
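
The calculation that FAQ title describes, going from a log odds ratio estimate and its SE to an odds ratio with a confidence interval, can be sketched in Python as follows. The sketch assumes a normal-approximation interval built on the log-odds scale and then exponentiated; the estimate and SE shown are hypothetical numbers, not values from any output in this thread, and the function name is made up for illustration.

```python
import math

def odds_ratio_ci(log_or, se, z=1.96):
    """Odds ratio and confidence limits from a log odds ratio and its SE.

    The interval is formed on the log-odds scale (estimate +/- z * SE)
    and then exponentiated; z = 1.96 gives an approximate 95% interval.
    """
    odds_ratio = math.exp(log_or)
    lower = math.exp(log_or - z * se)
    upper = math.exp(log_or + z * se)
    return odds_ratio, lower, upper

# Hypothetical logistic regression coefficient and SE.
estimate, se = 0.50, 0.20
odds_ratio, lower, upper = odds_ratio_ci(estimate, se)
print(f"OR = {odds_ratio:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

Because the interval is built on the log scale, it is not symmetric around the odds ratio itself.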

Anna Austin posted on Wednesday, September 19, 2018  10:43 am



Thank you! The FAQ is helpful. My question is whether I use the SE provided under "LOGISTIC REGRESSION ODDS RATIO RESULTS" or under "ALTERNATIVE PARAMETERIZATIONS FOR THE CATEGORICAL LATENT VARIABLE REGRESSION" for calculating the odds ratios. I think "ALTERNATIVE PARAMETERIZATIONS FOR THE CATEGORICAL LATENT VARIABLE REGRESSION" is correct, but I wanted to check. 


"Alternative Parameterizations..." use different reference classes than the default last class that is presented under "Logistic Reg...". So if you want one of the alternatives, that's where you would get the ingredients to compute their ORs and SEs. 
