Message/Author 

Lewina Lee posted on Wednesday, October 31, 2012 - 4:09 pm



Dear Drs. Muthen, I would like to do (1) an LPA of 21 continuous variables, and (2) test class membership in relation to 9 covariates (x's) and 2 distal outcomes (y's). I intend to use the procedure for manually implementing the 3-step approach described in Mplus Web Note 15, v5. In the 3-step approach, given that the measurement model is estimated independently of the auxiliary variables, does it make sense to proceed in the following manner?
1. Do class enumeration in Step 1 (e.g., run Step 1 with 1-8 classes) to identify the best one or two models while specifying AUXILIARY = x1, x2, ..., x9, y1, y2.
2. Do Step 2 (calculating measurement error for the most likely class variable) for the best one or two models identified in class enumeration.
3. Do Step 3 (estimating the auxiliary model while specifying the latent class model with the measurement errors obtained in Step 2) for the best one or two models from class enumeration.
Can latent class membership (C) be regressed on covariates? Can distal outcomes be regressed on class membership in Step 3? Is it accurate to say that class membership will not shift regardless of modifications to the auxiliary model (e.g., adding/removing covariates and distal outcomes)? Thank you, Lewina

Lewina Lee posted on Wednesday, October 31, 2012 - 4:18 pm



One more short question in addition to the above: in Mplus Web Note 15 v5, Appendix F, Step 3 of the manually implemented 3-step model is specified with STARTS=0. Are users supposed to follow that in actual analyses? (In Step 1, the authors noted that STARTS=0 was only specified to retain the order of classes from the data-generation step, and that users should remove it in actual analyses.) Thank you, Lewina


Regarding your questions "Can latent class membership (C) be regressed on covariates? Can distal outcomes be regressed on class membership in Step 3?" - you can do that automatically using R3STEP and DU3STEP, but you can also do it manually. The class membership will not change due to the auxiliaries. For Step 3, STARTS=0 should be used because the class membership is essentially known.
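For reference, the manual Step 3 setup described in the web note combines fixed nominal logits with the auxiliary regressions. A rough sketch for a hypothetical 3-class model with one distal outcome Y and two covariates follows; the @ values are placeholders standing in for the logits you compute in Step 2 from your own Step 1 classification table:

```
VARIABLE:  USEVARIABLES = y x1 x2 n;
           NOMINAL = n;           ! most likely class from Step 1
           CLASSES = c(3);
ANALYSIS:  TYPE = MIXTURE;
           STARTS = 0;            ! class membership is essentially known
MODEL:
  %OVERALL%
  y ON x1 x2;
  c ON x1 x2;
  %c#1%
  [n#1@2.9];  [n#2@-1.2];        ! placeholder logits - use your Step 1 values
  %c#2%
  [n#1@-1.5]; [n#2@2.6];
  %c#3%
  [n#1@-2.0]; [n#2@-1.8];
```

The key pieces (NOMINAL=, fixed [n#k@...] logits per class, STARTS=0) match the appendices in Web Note 15; everything else here is illustrative.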

Lewina Lee posted on Thursday, November 01, 2012 - 12:01 pm



Thank you for your quick response, Dr. Muthen. In Step 3, if any of the distal outcomes and/or covariates are binary, do I need to specify that in a "CATEGORICAL ARE x1 x2 y1" statement? 


No, that is not available. They are treated as continuous so for binary distals you will get proportions. 

Lewina Lee posted on Wednesday, November 07, 2012 - 8:15 pm



Dr. Muthen, regarding my question of 11/1/2012, 12:01 pm, on using the CATEGORICAL ARE statement with distal outcomes - could you please clarify what you meant by "not available"? I tried Step 3 of the manual 3-step approach by specifying:

MODEL:
%OVERALL%
Y ON x1 x2 x3;
c ON x1 x2 x3;
%C#1%
[N#1@ 5.04]; [N#2@ 2.38];
%C#2%
[N#1@ 0.53]; [N#2@ 3.28];
Y ON x1 x2 x3;
%C#3%
[N#1@ 4.42]; [N#2@ 2.64];
Y ON x1 x2 x3;

I tried it with vs. without the "CATEGORICAL = Y" statement. The regression results (p-values) for Y on X1-X3 are comparable in both cases. I see that I got an intercept for Y in each class when Y was modeled as continuous, as opposed to a threshold. Could you please help me understand why I could not use "CATEGORICAL ARE" with binary distals? What do I need to do to obtain P(distal = 1), or the odds of distal = 1 in one class versus another class? Thank you very much for all your help. Lewina


All variables on the AUXILIARY list are treated as continuous variables for the AUXILIARY functions whether they are on the CATEGORICAL list or not. 

Lewina Lee posted on Thursday, November 08, 2012 - 1:42 pm



If I am doing the manual 3-step approach, I do not need to use the AUXILIARY= statement, according to Web Note 15. (I only need AUXILIARY= in the automatic 3-step approach, e.g., when using DU3STEP.) Does that mean that, in Step 3 of the manual 3-step approach, I can specify covariates and outcomes with CATEGORICAL=? When I used CATEGORICAL= at Step 3 of the manual 3-step approach with a binary outcome, I was able to get "LOGISTIC REGRESSION ODDS RATIO RESULTS" in the output. I just want to verify that this is OK. Thank you, Lewina


Yes, it is. But you should not put any observed exogenous covariates on the CATEGORICAL list. This list is for dependent variables only. 

Lewina Lee posted on Thursday, November 08, 2012 - 4:32 pm



Thank you for the clarification, Linda! 


I am interested in testing whether means on a set of distal outcomes differ across growth trajectory classes (GMM), controlling for a set of covariates. The covariates have direct effects on the growth factor means (class indicators), and the class indicators have direct effects on the outcomes within class (constrained equal across classes). I used a one-step approach, but a reviewer suggested a 3-step approach. Two questions: (1) Can I test whether *adjusted* means for the distal outcomes differ between classes with a manual 3-step approach? (2) Given the direct effects of covariates on class indicators (and class indicator effects on the distal outcomes), with entropy = .63 (obtained from the one-step final model), would the one-step approach be better suited than the 3-step approach, based on the simulation results in Web Note 15? Thanks!


Scott - I don't think it is possible to do a 3-step approach for this model, because you have class indicator effects on the distal outcomes. Since the class indicators are latent variables, you can't use them in stage 3 (these latent variables, the growth factors, are measured and created in stage 1 only, so they won't be available in stage 3). Tihomir

cogdev posted on Monday, January 28, 2013 - 7:24 pm



I would like to use latent class membership from one series of indicators (along with a few other continuous covariates) to predict latent profile membership derived from a separate series of indicators. Clustering independently is theoretically important (separate domains), which is why the 3-step procedure is appealing. I can manually run the 3-step procedure separately for each latent class analysis (at least up to the 2nd step) to get the misclassification statistics for each one. A 4-profile/class solution fits best in both cases. Something along the lines of Ex7.14 appears to be close to what I need, except that I have a directed prediction from theory (actually more similar to Ex7.19, with a separate clustering variable instead of the factor) and a number of continuous covariates. So, what type of specification am I dealing with here, and how can I implement it (the AUXILIARY option doesn't seem designed to support this)? I can imagine that an analysis with categorical misclassification might be probability-based/fuzzy, or might need some sort of MCMC sampling? As a fallback, entropy is high (>.90) in both cases, so I guess I could 'hard-code' most likely cluster membership and run something like a multinomial logistic regression with covariates? Any help or direction here would be greatly appreciated.


With entropy of .9 or greater, you can use most likely class membership. You do not need the 3step procedure. 


We are using the manual 3step method to test predictors in a latent transition analysis with three timepoints. Changing the predictors changes our class sizes, particularly for the third timepoint. Do you know why this is happening? 


Are you following the approach in Web Note 15 shown in Appendices L, M, N, O? 


We have measurement noninvariance, so each LCA is estimated separately. The nominal most likely class variable is obtained from each LCA estimation, without constraining any of the item thresholds. Other than that, we are following the approach in Web Note 15 shown in Appendices L, M, N, and O. 


If you don't constrain item thresholds all bets are off for keeping the same class formation. 


You constrained the item thresholds in your individual LCAs so that they would be the same as the LCAs from the initial LTA, which had measurement invariance. You then used the most likely class variable from the individual LCAs to run a second LTA, this time with predictors. When you change your predictors, do the class sizes change? This is the problem we are experiencing. We don't need to constrain item thresholds in our individual LCAs because they don't need to be the same as the LCAs from an initial LTA, because we don't want measurement invariance. The class sizes are changing considerably when we use the most likely class variable to run an LTA with predictors. We aren't changing the most likely class variable, we're just changing the predictors. Does that make sense? 


I actually meant to refer to fixing the nominal parameters (not the threshold invariance) - I assume you are doing that fixing in the 3rd step. So you are following Appendices H, I, J. Those don't have a covariate. You could do your analyses on the Appendix K generated data, which include x, and follow the H-I-J steps to see what happens. I don't think we have explored that.


That's correct, we're fixing the nominal most likely class variable in the 3rd step. You think we should generate data with measurement invariance and a covariate, estimate LCAs without assuming measurement invariance, and then test the influence of that same covariate on our class solutions? Wouldn't that illuminate the consequences of not assuming measurement invariance when it exists? We would like to test the predictive strength of variables without changing the class solutions, like in r3step. It might be useful to generate data with two covariates and then test each independently to see if the class sizes change. However, we already have data and multiple covariates, and we already know the class sizes are changing. Why would this happen when we've fixed the probability of being in one class versus another? 


On your first question, I think it is a useful exercise to make sure that class formation doesn't change in this simple case. On your second question, please send inputs, outputs, data, and license number to support so we can diagnose it.


Will do! 


I see that you are using multiple-imputation data. There seems to be a lot of variation across imputations, given those huge SEs. A first step would be to analyze only one of those data sets. Also, which variables are imputed - the x's? That way there is no missingness on the latent class indicators and the nominal logits don't vary across imputations.


I am running a manual 3-step approach with auxiliary variables following Web Note 15, version 7. I have some missing data on my y variable, and I am using weights. When I run the syntax for Step 3, I get the following error message:

Invalid symbol in data file: "*" at record #: 2, field #: 44

Below is the code; what am I doing wrong?

NAMES ARE (...);
MISSING ARE all (9999);
USEVARIABLES ARE C1_7FP0 RACE3 SEX C7CONCPT N;
CLASSES = c(4);
NOMINAL = N;
CATEGORICAL ARE RACE3 SEX;
WEIGHT IS C1_7FP0;
Analysis:
TYPE = mixture;
ESTIMATOR = MLR;
STARTS = 600 120;
PROCESSORS = 4(STARTS);
Model:
%overall%
C7CONCPT on RACE3 SEX;
%C#1%
[N#1@5.010]; [N#2@8.791]; [N#3@0.161];
C7CONCPT on RACE3 SEX;
C7CONCPT;
%C#2%
[N#1@9.378]; [N#2@4.421]; [N#3@0.889];
C7CONCPT on RACE3 SEX;
C7CONCPT;
%C#3%
[N#1@1.031]; [N#2@0.496]; [N#3@4.920];
C7CONCPT on RACE3 SEX;
C7CONCPT;
%C#4%
[N#1@3.593]; [N#2@3.888]; [N#3@4.885];
C7CONCPT on RACE3 SEX;
C7CONCPT;

Many thanks in advance


Hello, I figured out my previous question, but now I have come across another error message (same syntax as above):

*** ERROR in MODEL command
Unknown threshold for NOMINAL variable N: N#2
*** ERROR in MODEL command
Unknown threshold for NOMINAL variable N: N#3
(the same pair of errors is repeated for each of the four classes)
*** ERROR
The following MODEL statements are ignored:
* Statements in Class 1: [ N#2 ] [ N#3 ]
* Statements in Class 2: [ N#2 ] [ N#3 ]
* Statements in Class 3: [ N#2 ] [ N#3 ]
* Statements in Class 4: [ N#2 ] [ N#3 ]
*** ERROR
One or more MODEL statements were ignored. These statements may be incorrect or are only supported by ALGORITHM=INTEGRATION.

I took the parameters from the "Logits for the Classification Probabilities" table from Step 1. What is producing the error? Thank you


P.S.: When I add ALGORITHM=INTEGRATION I still get the error message. Thank you (and apologies for all the postings).


In the future, please limit posts to one window. If you require more space it is not an appropriate question for Mplus Discussion. Please send your output and license number to support@statmodel.com. 


Dear Drs. Muthen, I would like to use the 3-step approach to examine the effects of some covariates on latent trajectory classes. Because I have censored data, I cannot use the auxiliary function (R3STEP), as it does not run with ALGORITHM = INTEGRATION. I would like to use the manual 3-step approach, but I do not know how to obtain the logits for the classification probabilities. In my output I only have the average latent class probabilities. How can I obtain the classification probabilities or logits in growth mixture models? Thank you so much!


All covariates are treated as continuous so this is not a problem. Download Version 7.11 to obtain the values you want. 
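For anyone who has the classification probability table but not the logit table: the logits Mplus prints are just the classification probabilities expressed relative to the last class, ln(p_k / p_K), row by row. A small sketch (the probabilities below are made up for illustration):

```python
import math

# Hypothetical classification probabilities: rows are latent classes,
# columns are most-likely-class categories (each row sums to 1).
probs = [
    [0.90, 0.07, 0.03],
    [0.05, 0.88, 0.07],
    [0.04, 0.06, 0.90],
]

def row_logits(row):
    """Nominal logits relative to the last category: ln(p_k / p_K)."""
    ref = row[-1]
    return [math.log(p / ref) for p in row]

for i, row in enumerate(probs, start=1):
    print("class", i, [round(v, 3) for v in row_logits(row)])
```

The last entry of each row is always 0, since the last category is the reference; the other entries are the values fixed with [N#k@...] in Step 3.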


Thank you so much! I downloaded version 7.11 and got the logits. 


I am running a model predicting continuous and categorical outcomes from a latent class variable and several covariates, based on the manual 3-step procedure. I am freely estimating the means and thresholds of the DVs for each class. I was wondering whether the means and thresholds of the DVs for each class are, by default, estimated at the mean values of all covariates in the model, or whether I have to mean-center them. We want to present the estimated means in a table for each class, but are unsure how they should be interpreted. Thanks.


The means of the covariates have an influence if the covariates have direct effects on the outcomes, in which case you can center them. 


Hi, we are interested in using latent class as a predictor of a distal binary outcome (using the manual version of the 3-step approach). Specifically, in Step 3 we want to test whether the thresholds for the binary outcome differ between our two classes. As far as we understand, we can do this with a Wald test using the MODEL TEST statement - that's what we did. In our first model, we use only latent class as a predictor. We get the Wald test as requested, but we also get an odds ratio and its corresponding significance test, which is exactly what we want. In our second model, we use latent class and some covariates as predictors of the outcome. When we include the covariates, the odds ratio we are interested in is no longer provided, just the requested Wald test. Would it be accurate to compute an odds ratio ourselves using the covariate-adjusted thresholds from the output? Or, if not, is there another way we can get this information from Mplus? We would like to present the degree to which the odds ratio changes after taking into account the effects of the covariates on the outcome. Thank you very much!


For both of your models you can express the odds ratio in Model Constraint using parameter labels from Model. That also gives you SEs so you can get a test. With covariates the odds ratio would be based on only the thresholds as you say. 


Thank you very much for your response, that was very helpful. From that, I have a followup question. I'm noticing that when I use Model Test to test threshold #2 = threshold #1 and compare that result to the "Latent Class Odds Ratio Results" that Mplus outputs automatically, or to the OR result from Model Constraint, I get quite different significance test results. The two OR significance tests are identical, but the test for the difference in thresholds from Model Test is quite different from the test of the OR. The only thought I've had is that the OR is being tested against a null value of 0 and not 1, but I'm not sure. I'm hoping someone can shed some light on my confusion. Thanks! 


You are right that the printed OR significance testing is the usual Mplus ratio: (Est - 0)/SE(Est). With ORs, the relevant ratio is instead (Est - 1)/SE(Est), so you have to do that by hand. Related to your testing, you might be interested in this paper on our website: Muthén, B., Brown, C.H., Masyn, K., Jo, B., Khoo, S.T., Yang, C.C., Wang, C.P., Kellam, S., Carlin, J., & Liao, J. (2002). General growth mixture modeling for randomized preventive interventions. Biostatistics, 3, 459-475. Section 3.2 deals with thresholds for a binary distal outcome.
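In numbers, the difference between the two nulls looks like this (the estimate and SE below are hypothetical, not from any actual output):

```python
import math

def two_sided_p(z):
    """Two-sided p-value for a standard-normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical OR estimate and SE from a "Latent Class Odds Ratio Results" line.
est, se = 1.90, 0.50

z_printed = (est - 0) / se   # the ratio the output prints: tests OR = 0
z_relevant = (est - 1) / se  # the relevant null for an odds ratio: OR = 1

print(round(z_printed, 2), round(two_sided_p(z_printed), 4))
print(round(z_relevant, 2), round(two_sided_p(z_relevant), 4))
```

With these numbers the printed test looks highly significant while the test against OR = 1 is not, which is exactly the discrepancy described above.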


Hi. How do I produce the table, "Logits for the Classification Probabilities for Most Likely Latent Class Membership (Column) by Latent Class (Row)"? 


That has been included in the output since Version 7, I believe.


Thanks. I have Mplus Version 7.2. My output does not have the logits table I asked about. It also does not have the table "Classification Probabilities for the Most Likely Latent Class Membership (Row) by Latent Class (Column)". The output does have the "Average Latent Class Probabilities..." table. The output is from nonparametric two-level mixture models that have categorical latent variables on both the within and between levels. These were demonstrated in Henry & Muthen (2010), "Multilevel Latent Class Analysis: An Application...", Structural Equation Modeling, vol. 17. The output states that AUXILIARY= is not available for R3STEP, DCON, etc., when there are latent variables on the between and within levels. The desired model is: (1) use Time 1 indicators to derive latent classes, (2) regress a Time 2 distal outcome on Time 1 class membership, (3) use the Time 1 version of the Time 2 distal outcome as a latent class covariate/auxiliary. Can this be done with the manual 3-step method if I have the table of logits (or the table of classification probabilities, from which to calculate the logits)? Thanks.


The 3-step methodology has not yet been developed and used for TYPE=TWOLEVEL MIXTURE.

Laura posted on Tuesday, April 07, 2015 - 6:43 am



Hi, it is said previously on this page that "With entropy of .9 or greater, you can use most likely class membership. You do not need the 3-step procedure." Does this also apply when analysing the predictors of latent classes in LCGA (using multinomial logistic regression, for example)? Is it possible to use the R3STEP method with Mplus Version 7, or do I need a newer version for that? I tried out an analysis with AUXILIARY(R3STEP) in LCGA. The results were completely different from those obtained with AUXILIARY(R), and the standard errors were zeros for many of the predictors. Thank you in advance!


Q1. Yes. Q2. If you don't get stopped when mentioning the R3STEP option, then it is available. But you should always use the latest Mplus version. The tables at the end of our BCH paper explain that AUX(R) is superseded by AUX(R3STEP).

Laura posted on Wednesday, April 08, 2015 - 9:01 am



Ok, thank you. Is there any reference for using the most likely latent class when the entropy is high (over 0.9)? At least in Clark & Muthen (2009) this was compared to some other methods. 


This reference should do it: Asparouhov, T. & Muthén, B. (2014). Auxiliary variables in mixture modeling: Three-step approaches using Mplus. Structural Equation Modeling: A Multidisciplinary Journal, 21:3, 329-341. The posted version corrects several typos in the published version. An earlier version of this paper was posted as Web Note 15.

db40 posted on Friday, May 15, 2015 - 6:16 am



Dear Bengt, might I add to Laura's question: the entropy of my latent class model is 0.85. Would you consider this high enough that I don't have to use the 3-step procedure?

Jon Heron posted on Friday, May 15, 2015 - 11:50 am



Well, I have a paper in press showing bias even up to an entropy of 0.9, and I'm sure you could simulate the odd bizarre example where some estimates were biased at even higher values of entropy, depending on the class distribution. I spent the best part of ten years avoiding the one-step model, but I'm quite amenable to it now.


In many cases we found that 0.85 is probably high enough, but as Jon says you cannot be sure. You should read his new paper. 


Drs. Muthen, I am trying to implement a 3-step LTA model to ensure that the latent class variable measurements are not affected by the inclusion of covariates in the model. To do so, I am using Mplus Web Note 15, Appendices K-N, and the Nylund-Gibson paper titled "A Latent Transition Mixture Model Using the Three-Step Specification". I ran the LTA model with measurement invariance and separately calculated the most-likely-class variables N1 and N2 for the Step 1 LCA at each of my two time points. However, I do not see a Step 1 for the overall LTA in either the web note or the paper. I am not clear on how to obtain the nominal-variable N1, N2 thresholds for latent class variables C1 and C2 in the LTA. Would I use the "Logits for the Classification Probabilities for the Most Likely Latent Class Membership" in the Step 1 output for C1 and C2 (Appendices L and M), or will these values change when running the LTA? Thanks, Raghav


Answer to your paragraph 2: Use appendices L, M, and N. There is no "overall LTA". Answer to your last paragraph: Yes, you would use those logits. 


I am trying to do the 3rd step of the manual 3-step approach for an LPA with a distal outcome, and I have a couple of questions: (1) I can't figure out the auxiliary model input I need so that I get the same output I would have gotten using DU3STEP for a distal outcome (which I can't use, because missing data on the covariates forces INTEGRATION = MONTECARLO). (2) If I included covariates of the classes in Step 1, do I still include those covariates in the overall MODEL statement in Step 3? Thanks!


(1) Take Appendix E and remove "Y on X": http://statmodel.com/download/AppendicesOct28.pdf
(2) No.

John Woo posted on Wednesday, August 12, 2015 - 12:38 pm



Hi, adding to Laura's and db40's questions: do the average latent class probabilities for most likely latent class membership also matter? If entropy is below 0.9, but the average probabilities for the most likely class are above 0.9 for all classes (i.e., the diagonals), can I still consider skipping the 3-step approach? Thank you.


Probably. 

Jon Heron posted on Thursday, August 13, 2015 - 7:58 am



My paper is still not published, but I found the off-diagonal elements of the D-matrix (Mplus' second classification matrix) to be important. Setting entropy aside, if you have three classes and the off-diagonal elements of D relating to classes 1 & 2 are effectively zero (i.e., elements [1,2] and [2,1]) whilst those for classes 2 & 3 are not, then a parameter such as a covariate effect comparing classes 1 and 2 (e.g., risk of class 2 relative to class 1) would be less biased than the same effect for class 3 relative to class 2. Bottom line: (1) entropy and this "class separation" are both important; (2) it's easier to do a one-step or the new 3-step than to work out whether a simpler approach is adequate.

John Woo posted on Wednesday, August 19, 2015 - 11:50 am



Hi, if I understand correctly, one of the rationales for the 3-step approach (or even the one-step approach) is the idea that latent class formation is independent of the influences of the covariates. [I am leaving aside the matter of distal outcomes.] In the case of GMM, where covariates have potential paths to both the growth factors and the latent class structure, does this 3-step rationale apply only to the paths toward the latent class structure and not the growth factors? That is, when I run the 3-step approach, is it consistent with the 3-step rationale to include the covariates predicting the growth factors? I am a bit confused because it seems class formation can be influenced by the growth factors, and, if the covariates influence the growth factors, then they indirectly influence class formation as well? Thank you in advance.


In GMM with effects from covariates on not only c but also i and s, you shouldn't use the R3STEP approach because your model has direct effects beyond those on c. 

Ali posted on Friday, March 11, 2016 - 2:40 am



I used the 3-step estimation. First, I estimated an LCA and got 3 classes. Second, I fixed the logits for the parameters [N#1] and [N#2] in classes 1, 2, and 3. Third, I ran the linear regression auxiliary model, which is almost the same as in Web Note 15 (p. 15), but I have two predictors (extrinsic and intrinsic motivation) and five plausible values on math achievement (y). So, I ran the 3rd step separately for intrinsic and extrinsic motivation. The slopes are significant in all classes, no matter whether I used intrinsic or extrinsic motivation as the predictor. Later, I put extrinsic and intrinsic motivation in together as predictors (the correlation between the predictors is 0.67), and it turned out that extrinsic and intrinsic motivation are not significant in class 2. So, I am wondering whether it is better to run the model with one predictor or two predictors.


You can use the likelihood ratio test to decide this. 

Ali posted on Thursday, March 17, 2016 - 3:19 am



But I have five plausible values for the dependent variable, so I got the mean of the likelihood across the five plausible values. So maybe the likelihood ratio test cannot be used to compare the model with one predictor versus two predictors. Is it possible to compare models with five plausible values?


Because of the plausible values, the LRT is much more complicated and not available in Mplus. You can use MODEL TEST, though (include both predictors and test various combinations of coefficients).

Ali posted on Wednesday, April 13, 2016 - 12:53 am



Hello, I posted on March 11th. I am using the manual LCA 3-step estimation. I have two predictors to predict the five plausible values of students' math scores, but I have missing data on the two predictors (missing at random). I tried to impute the two predictors, but it seems that I cannot, because I typed "TYPE = IMPUTATION" in the DATA command for the five plausible values. After I ran it, the sample size, which should be around 12,000, decreased by around 50%. Is there any way to deal with the missing data on the predictors?


Run the 5 data sets one at a time and combine the results using the usual imputation rules; see page 5 of http://www.ats.ucla.edu/stat/sas/library/multipleimputation.pdf

Ali posted on Thursday, April 14, 2016 - 12:06 am



Do you mean that I should first impute the missing data on the predictors five times? For example, for the first data set with the first plausible values, I impute the missing data on the predictors, and do this five times. Then, run the five data sets at the same time?


Yes - it is standard MI. You would have to combine the results from the 5 runs manually, as described in the section "Combining Inferences from Imputed Data Sets" in the above link.
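The combining step (Rubin's rules) is simple enough to do by hand for each parameter: average the estimates, then combine the within- and between-imputation variances. A sketch with made-up per-run estimates and SEs:

```python
import statistics

def pool_mi(estimates, std_errors):
    """Pool one parameter across m imputation runs using Rubin's rules."""
    m = len(estimates)
    q_bar = statistics.mean(estimates)                     # pooled estimate
    u_bar = statistics.mean(se ** 2 for se in std_errors)  # within-imputation variance
    b = statistics.variance(estimates)                     # between-imputation variance
    t = u_bar + (1 + 1 / m) * b                            # total variance
    return q_bar, t ** 0.5                                 # pooled estimate and SE

# Hypothetical slope estimates and SEs from the 5 plausible-value runs.
est, se = pool_mi([0.42, 0.45, 0.40, 0.44, 0.43],
                  [0.10, 0.11, 0.10, 0.10, 0.11])
print(round(est, 3), round(se, 3))
```

Note the pooled SE is always at least as large as the average within-run SE, because the between-imputation spread is added on top.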


Hi, I need the odds ratios and confidence intervals from the latent-variable multinomial logistic regression using the 3-step procedure (R3STEP). I could calculate the odds ratio manually by exponentiating the estimate, but I still need the confidence intervals. When I add CINTERVAL to the OUTPUT: statement, I only get CIs for "Model Results", the probability scale, and the "Latent Class Odds Ratio", but none for the multinomial logistic regression. Is there a way to get the odds ratios and confidence intervals using the AUXILIARY R3STEP procedure?

Jon Heron posted on Friday, April 22, 2016 - 5:46 am



Just derive the confidence interval for the log-odds and exponentiate that too:

OR = exp(Estimate)
lower bound = exp(Estimate - 1.96*S.E.)
upper bound = exp(Estimate + 1.96*S.E.)
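Concretely, with a hypothetical log-odds estimate and standard error from an R3STEP run:

```python
import math

# Hypothetical multinomial-regression slope (log-odds scale) and its SE.
logit_est, se = 0.85, 0.30

odds_ratio = math.exp(logit_est)
ci_lower = math.exp(logit_est - 1.96 * se)   # exponentiate the log-odds bounds
ci_upper = math.exp(logit_est + 1.96 * se)

print(round(odds_ratio, 2), round(ci_lower, 2), round(ci_upper, 2))
```

Because exp() is monotonic, the transformed bounds remain a valid 95% interval for the OR; the interval is asymmetric around the OR, which is expected.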


See also our FAQs on odds ratios. 
