
Ann Nguyen posted on Friday, October 17, 2014  3:27 pm



I am using the manual three-step latent class analysis approach (Asparouhov & Muthén, 2013) to predict a series of auxiliary distal outcomes. In step 1, I use my entire dataset to estimate the latent class model. Thus, the “most likely class” variable derived in step 2 is based on the entire dataset as well. In step 3, I use the “most likely class” variable from step 2 to predict multiple distal outcomes. (Step 3 was run separately for each distal outcome.) All but one of my distal outcome variables include all respondents in the dataset. The one exception is the distal outcome variable “depression severity.” For a respondent in this dataset to have a “depression severity” score, they must first be diagnosed with depression. Only a small proportion of respondents in this dataset have a depression diagnosis. Thus, in step 3, I am only using a subsample (i.e., respondents with a depression diagnosis) of the original dataset to predict “depression severity.” When predicting the “depression severity” distal outcome, should I use only the ‘depressive subsample’ in step 1 (and subsequently step 2) rather than the entire sample?


It is better to use the entire sample in Step 1. This is in line with a one-step approach. For a new distal outcome method called BCH that was introduced in Mplus Version 7.3, see the new Mplus Web Note 21 on our website: Asparouhov, T. & Muthén, B. (2014). Auxiliary variables in mixture modeling: Using the BCH method in Mplus to estimate a distal outcome model and an arbitrary second model. Mplus Web Notes: No. 21.

Ann Nguyen posted on Tuesday, October 21, 2014  10:28 am



Thank you for your help, Dr. Muthen. As always, I am very impressed with (and appreciate) your and your colleagues' quick response. I have a follow-up question for you. If I use the entire sample in step 1, is there a way to request the N for the regression analysis (with "depression severity" as the distal outcome) in step 3? The “SUMMARY OF ANALYSIS” section in step 3's output gives me “Number of observations,” but this appears to be the N for the entire sample and not the depressed subsample, which would have been used for the regression analysis.


If you run the DCON option you get this information. 


I’m attempting to use the manual 3-step approach (as presented in web note 15 and the associated appendices) to examine the relationships among latent class membership, several auxiliary covariates, and one categorical distal outcome. 3,940 students are nested within 372 schools; all of my indicators and auxiliaries are categorical. Further, I am interested in using the parametric approach to modeling the latent class random intercepts (UG ex 10.6). I also have several predictors at the within and between levels. Questions: (1) For steps one and two of the manual 3-step, should I use TYPE=TWOLEVEL MIXTURE and then gather the resulting misclassification logits for use in the secondary model? (2) If 1 is correct, should I specify the between-level factor (as in UG ex 10.6) before I gather the logits or after, as part of my secondary model? I’ve compared class proportions, BIC, classification logits, etc. using TYPE=COMPLEX, TYPE=TWOLEVEL MIXTURE (without the between-level factor) and TYPE=TWOLEVEL MIXTURE with the between-level factor. Each method suggests the same number of classes, but the entropy and classification logits are quite variable across the three approaches. Do you have any advice on how best to use the manual three-step process (or another process) in the case of multilevel data? Thanks in advance for your time and expertise!


I can recommend only TYPE=COMPLEX MIXTURE. The 3 step methodology has not been developed and used yet for TYPE=TWOLEVEL MIXTURE. 

Jen Doty posted on Saturday, January 03, 2015  1:59 pm



I have used GMM to identify 3 classes of closeness with mother over time between generation 1 and generation 2. I would like to use these classes to predict a distal outcome in the children of generation 3, but I have multiple children in families. Would it be possible to use TYPE = MIXTURE COMPLEX in step 3 of the manual 3-step procedure, or would I need to use only 1 child from each family?


Not sure that's implemented; try it. But wouldn't you want to consider similar age categories of children and therefore maybe use only 1 child from each family (and delete families who have no child in a certain category)?

db40 posted on Tuesday, May 12, 2015  6:25 am



Dear Bengt, I am somewhat confused. I am looking to estimate a 4-class solution which has a number of covariates but also 6 distal outcomes. I have been looking over web note 15 to familiarise myself with it (manual approach). Then I stumbled upon web note 21 (Using the BCH method to estimate a distal outcome model...). Could you clarify which of these procedures I need to estimate this model?


Our recommendations are shown in Tables 6 and 7 of the BCH paper. The covariates can be handled in an R3STEP run. How to do the distals including covariates is described in a section of the BCH paper.

db40 posted on Tuesday, May 12, 2015  8:47 am



Oh thank you. I guess since my variables are categorical I need to use the DCAT option. 

db40 posted on Wednesday, May 13, 2015  7:24 am



Dear Bengt, I have a query regarding one of your examples in the Asparouhov & Muthen (2014) paper "...using the BCH Method..." On page 10 the variable command details 10 variables for step 1: U1-U8 y x ; In the second step (page 11) there are now five more variables detailed, presumably for the BCH weights: U1-U8 y x w1-w4 MLC ; I am unable to get the second step to run, and I am guessing it's because there are now five extra variables which do not match the .dat file, since I get this error: "Unexpected end of file reached in data file." May I ask if the second step should be calling the .dat file that is saved out in step 1 (named as 2.dat)?


Drs. Muthen and Asparouhov, Is there a recommended method for examining latent classes as a predictor of time-until-event?


db40: Yes, there is a typo; it should say 2.dat.


Answer to Thomas Olino: The recommended method is illustrated with User's Guide example 8.17 and is related to the Basehazard=OFF(equal) setting, which is actually the default setting. The effect of the latent class is summarized in the class-specific mean for the survival variable, which also has a mean of zero in the reference/last class. The method originates in Larsen, K. (2004), “Joint Analysis of Time-to-Event and Multiple Binary Indicators of Latent Classes,” Biometrics, 60(1), 85–92.

Ann Nguyen posted on Tuesday, June 30, 2015  10:33 am



I am using the manual three-step latent class analysis approach (Asparouhov & Muthén, 2013) to predict an auxiliary distal outcome (DISTRESS). My goal is to examine whether the latent classes predict DISTRESS. I understand how to set up this syntax. However, I am also interested in how the relationship between the latent classes and DISTRESS varies across 3 age groups. I think I need to add KNOWNCLASS to the analysis in order to analyze by age groups, but I do not know how to set up the syntax so that I would get auxiliary distal outcome analyses for each age group/“known class.” Is this possible?


Use the dot option: %known#1.unknown#1% [distress]; etc 

Ann Nguyen posted on Thursday, July 02, 2015  8:42 am



Thank you, Dr. Muthen. 

Ann Nguyen posted on Thursday, September 10, 2015  8:36 am



I am using a manual 3-step LCA approach to predict the effects of demographic covariates (sex & educ) on latent class membership. I'm also interested in understanding how the relationships between the covariates and class membership vary by age. Thus, in step 3 I am using age (3 categories) as a knownclass variable. However, with the syntax below, I end up getting regression results for age by class (e.g., pattern 1 1), but I am only interested in results by age group. That is, I would like multinomial logistic regression results (similar to those generated by the R3STEP command) for latent classes regressed on covariates for each age group. Could you please advise me how to correct the following syntax to achieve this?
CLASSES = age(3) C(5);
KNOWNCLASS = age (age3cat=1 age3cat=2 age3cat=3);
...
MODEL:
%OVERALL%
c#1-c#4 on sex educ;
MODEL C:
%c#1%
[n#1@2.421];
...
%c#2%
[n#1@0.052];
...
%c#3%
[n#1@0.572];
...
%c#4%
[n#1@0.012];
...
%c#5%
[n#1@2.722];
...
MODEL AGE:
%age#1%
c#1-c#4 on sex educ ;
%age#2%
c#1-c#4 on sex educ ;
%age#3%
c#1-c#4 on sex educ ;


I don't see any problems with your syntax, except for this: %age#3% c#1-c#4 on sex educ ; You should remove that. The last class results are what you get as %overall%, while the other classes' effects are added to the overall. Consider the User's Guide table on page 499, which explains the parameterization. Also consider Section 4.2 in web note 15: http://statmodel.com/download/webnotes/webnote15.pdf If you are still in doubt, run each regression separately, using 3 separate runs, one for each age group.

Dina Dajani posted on Monday, March 07, 2016  10:36 am



Hello, I am using the manual 3-step approach to determine the effect of a latent class variable on a distal outcome (with a covariate). So, my paths are Y on C and Y on X. When I run the first step, I get in my output the average latent class probabilities for the most likely latent class membership, but I do not get the table with the logits for the classification probabilities, which I need to run step 3. Is there a command that I am missing to specify that output? Thank you, Dina Dajani


I think the output should be there. Send output to Support along with your license number. 

Dina Dajani posted on Tuesday, March 08, 2016  8:19 am



I figured out that I was using an older version of Mplus, but when I used Mplus 7.4 I got the logit output. I had a question about whether I am specifying my model correctly. To run the regression described above, my syntax is below. Is it correct?
USEVARIABLES are Y X N;
NOMINAL=N;
MODEL:
%overall%
Y on X;
C#1 on X;
C#2 on X;
%C#1%
[N#1@3.605];
[N#2@.037];
[Y] (m1);
%C#2%
[N#1@.589];
[N#2@2.423];
[Y] (m2);
%C#3%
[N#1@4.535];
[N#2@3.855];
[Y] (m3);
Model test:
m1=m2;
m1=m3;
m2=m3;


Looks right, but also check against section 3.2 of our BCH paper. 

Dina Dajani posted on Tuesday, March 08, 2016  10:37 am



Thank you very much for your quick reply. The issue I am having is with a "singular covariance matrix", so the Wald test cannot be computed (which is the main result I am interested in). I believe the issue is that there is no variance of some covariates in certain classes. Is there any way to resolve this?
WARNING: THE SAMPLE VARIANCE OF X1 IN CLASS 1 IS 0.000.
WARNING: THE SAMPLE CORRELATION OF X2 AND X1 IN CLASS 3 IS 1.000.
ONE OR MORE MULTINOMIAL LOGIT PARAMETERS WERE FIXED TO AVOID SINGULARITY OF THE INFORMATION MATRIX...
THE FOLLOWING PARAMETERS WERE FIXED:
Parameter 11, C#1 ON X1
Parameter 15, C#2 ON X2
THE MODEL ESTIMATION TERMINATED NORMALLY
WALD'S TEST COULD NOT BE COMPUTED BECAUSE OF A SINGULAR COVARIANCE MATRIX.


Please send the output and your license number to support@statmodel.com. 
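
A side note for readers hitting the same message: the thread does not say what caused it here, and the warnings above point to the covariates. One thing that can be ruled out independently, though, is redundancy in the tested restrictions. Three pairwise equalities among three means (m1=m2, m1=m3, m2=m3) contain only two independent restrictions, which the following Python sketch checks. Whether Mplus reports a singular covariance matrix for this reason is an assumption, not something confirmed in this thread; `rank` is a hypothetical helper written for this illustration.

```python
def rank(matrix, tol=1e-12):
    """Rank of a small matrix via Gaussian elimination with partial pivoting."""
    rows = [[float(v) for v in r] for r in matrix]
    r = 0  # number of pivot rows found so far
    for col in range(len(rows[0])):
        # Pick the remaining row with the largest entry in this column.
        pivot = max(range(r, len(rows)), key=lambda i: abs(rows[i][col]))
        if abs(rows[pivot][col]) < tol:
            continue  # no usable pivot in this column
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r

# Each row is one restriction written as a contrast on (m1, m2, m3).
three_pairwise = [
    [1, -1, 0],   # m1 = m2
    [1, 0, -1],   # m1 = m3
    [0, 1, -1],   # m2 = m3 (implied by the two rows above)
]
two_contrasts = three_pairwise[:2]

print(rank(three_pairwise), rank(two_contrasts))
```

Both restriction sets have rank 2, so the third equality adds no information beyond the first two.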

Ali posted on Wednesday, September 28, 2016  3:54 am



I am trying the 3-step manual approach to see the relationship between class membership and 11 countries. However, I get the error message "Unknown class label: %C#1%". Here is my code:
Names are ST53Q01 ST53Q02 ST53Q03 ST53Q04 SENWGT_S SCHOOLID STIDSTD ST04Q01 ESCS PV1MATH PV2MATH PV3MATH PV4MATH PV5MATH COUNTRY CPROB1 CPROB2 CPROB3 n;
Usevariables are n ;
Classes=CNT(11)c(3);
Nominal is n;
Knownclass = CNT(COUNTRY=1 COUNTRY=2 COUNTRY=3 COUNTRY=4 COUNTRY=5 COUNTRY=6 COUNTRY=7 COUNTRY=8 COUNTRY=9 COUNTRY=10 COUNTRY=11);
WEIGHT=SENWGT_S;
Analysis:
Type=Mixture;
Starts=0;
Model:
%overall%
c on CNT;
Model C;
%C#1%
[N#1@1.734];
[N#2@1.446];
%C#2%
[N#1@2.435];
[N#2@1.202];
%C#3%
[N#1@0.289];
[N#2@2.901];


You have a typo: "Model C;". This line should end with a colon, not a semicolon.

Ali posted on Thursday, September 29, 2016  1:34 am



Thank you! I added a colon after Model C, but now I get the error message "*** ERROR in MODEL command Unknown class model name C specified in C-specific MODEL command." Here is my code:
Variable:
Names are ST53Q01 ST53Q02 ST53Q03 ST53Q04 SENWGT_S SCHOOLID STIDSTD ST04Q01 ESCS PV1MATH PV2MATH PV3MATH PV4MATH PV5MATH COUNTRY CPROB1 CPROB2 CPROB3 n;
Usevariables are n COUNTRY ;
Classes=CNT(11)c(3);
Nominal is n;
Knownclass = CNT(COUNTRY=1 COUNTRY=2 COUNTRY=3 COUNTRY=4 COUNTRY=5 COUNTRY=6 COUNTRY=7 COUNTRY=8 COUNTRY=9 COUNTRY=10 COUNTRY=11);
WEIGHT=SENWGT_S;
Analysis:
Type=Mixture;
Starts=0;
Model:
%overall%
c on CNT;
Model C:
%C#1%
[N#1@1.734];
[N#2@1.446];
%C#2%
[N#1@2.435];
[N#2@1.202];
%C#3%
[N#1@0.289];
[N#2@2.901];


You say: Classes=CNT(11)c(3); Try adding a space before c(3). 

RuoShui posted on Tuesday, January 10, 2017  3:32 pm



Dear Dr. Muthen, May I please confirm with you that the equality tests of means across classes using the BCH procedure were based on Wald's test? Thank you. 


Yes. 
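
For readers who want to see what such a test computes, here is a minimal Python sketch of a pairwise Wald test for equal class-specific means. It is a generic textbook illustration and not Mplus's internal code; the means, sampling variances, and covariance below are made-up numbers, and `wald_equal_means` is a hypothetical helper.

```python
import math

def wald_equal_means(m1, m2, var1, var2, cov12=0.0):
    """Wald chi-square (1 df) and p-value for H0: m1 == m2.

    var1, var2 and cov12 are the sampling variances and covariance of the
    two estimated means (squared standard errors), not within-class variances.
    """
    diff = m1 - m2
    var_diff = var1 + var2 - 2.0 * cov12
    if var_diff <= 0:
        raise ValueError("variance of the difference must be positive")
    w = diff ** 2 / var_diff
    # Upper tail of a chi-square with 1 df: P(X > w) = erfc(sqrt(w / 2)).
    p = math.erfc(math.sqrt(w / 2.0))
    return w, p

# Hypothetical estimates for two classes.
w, p = wald_equal_means(m1=1.20, m2=0.50, var1=0.04, var2=0.05)
print(f"Wald = {w:.3f}, p = {p:.4f}")
```

An overall test across more than two classes uses several contrasts at once and a chi-square with more degrees of freedom; the sketch covers only the single-contrast case.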

db40 posted on Monday, November 27, 2017  1:22 pm



Dear Dr Muthen, 1) If I were to conduct a 3-class LCA with covariates using R3STEP for age and gender, should I expect the estimates etc. to be the same when conducting the same model manually via the 3-step procedure? 2) Also, should I expect the class counts and proportions from the 1st step to be the same as those of the most likely class nominal variable? Thank you.


Q1: Yes. Q2: No, but the Most Likely Class values should be the same between the first and last steps.
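
To illustrate why the two sets of counts in Q2 need not agree, the following Python sketch compares counts obtained by summing posterior probabilities with counts obtained by assigning each person to the most likely class. The posterior probabilities are invented for the illustration, and the function names are hypothetical.

```python
# Hypothetical posterior class probabilities for four respondents, two classes.
posteriors = [
    [0.90, 0.10],
    [0.60, 0.40],
    [0.55, 0.45],
    [0.20, 0.80],
]

def soft_counts(post):
    """Class counts obtained by summing posterior probabilities."""
    n_classes = len(post[0])
    return [sum(row[k] for row in post) for k in range(n_classes)]

def modal_counts(post):
    """Class counts obtained by assigning each case to its most likely class."""
    counts = [0] * len(post[0])
    for row in post:
        counts[row.index(max(row))] += 1
    return counts

print("posterior-based counts:", soft_counts(posteriors))
print("most-likely-class counts:", modal_counts(posteriors))
```

With these numbers the posterior-based counts are about 2.25 and 1.75, while the most likely class assignment gives 3 and 1. The less well separated the classes, the larger this gap tends to be.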

nidhi gupta posted on Wednesday, April 04, 2018  6:35 am



Dear Dr Muthen, I am performing a manual 3-step approach following the web note by T. Asparouhov (2014). I want to get regression results at step 3 stratified by one variable. However, when I tried modelling, I got the following error: "External training variables are not supported when KNOWNCLASS option is used." This is my code:
VARIABLE:
NAMES are V1-V8 a g b f w s d al i mi W1-W4 C1-C4 m;
Usevar are a g b s d al W1 W2 W3 W4;
Classes = C(4) cg(2);
KNOWNCLASS=cg(mi=0, mi=1);
Missing=*;
Training=W1, W2, W3, W4(bch);
Data:
file=step2.dat;
Analysis:
OPTSEED = 851945;
TYPE = MIXTURE;
ESTIMATOR = MLR;
Model:
%overall%
C on a g s d al ;
b on a g s d al ;
%C#1%
b on a g s d al;
%C#2%
b on a g s d al;
%C#3%
b on a g s d al;
%C#4%
b on a g s d al;
Could you please help me?


Then you may need to represent the groups using dummy variable covariates. 


Dear Dr. Muthen, Hello, I have two latent class variables (i.e., c1#1, c1#2, c1#3; c2#1, c2#2, c2#3) and a continuous distal outcome. I am trying to test the effects (main, or possibly interaction?) of the two class variables on the outcome. Do you have any suggestions for this analysis? Although I have tried BCH, DCON, the 3-step analysis (manual step estimation), etc., I couldn't get what I want. Thanks as always for your help. Warmly, Tacksoo


The effects of the latent class variables are seen in the class-varying intercepts of the distal outcomes.


Thank you for the quick reply. I had already checked the class-varying intercepts of the distal outcome. Interestingly, I got an error message when I tested the effects of both variables simultaneously. My code is:
MODEL C1:
%C1#1%
[score](a1);
[N1#1@2.9];
[N1#2@0.7];
%C1#2%
[score](a2);
[N1#1@11.8];
[N1#2@13.8];
%C1#3%
[score](a3);
[N1#1@2.4];
[N1#2@13.7];
MODEL C2:
%C2#1%
[score] (a4);
[N2#1@3.1];
[N2#2@1.4];
%C2#2%
[score] (a5);
[N2#1@0.04];
[N2#2@2.6];
%C2#3%
[score] (a6);
[N2#1@3.8];
[N2#2@1.9];
MODEL TEST:
0 = a1-a2;
0 = a1-a3;
0 = a4-a5;
0 = a4-a6;
I got the result for "score on C1" when I removed the lines regarding "C2" (i.e., [score] (a4), [score] (a5), [score] (a6), 0 = a4-a5, 0 = a4-a6). But when I tried to run the whole model, there was an error message: "Unknown parameter label in MODEL TEST: A1". Or should I just run the two models separately and report the results from the two runs?


Send the output for your run with the error message so we can see what you did wrong. Send it to Support along with your license number.


Hello: I created profiles of families based on parents' characteristics and now I am interested in examining child outcomes. We have siblings in the data so the outcome variables are nested. Is there a way to use the BCH method to account for nesting in the dependent variables even though this is not relevant for creating the profiles? 


You can use type=complex mixture with the manual BCH to account for the nesting of observations in the second stage estimation using cluster=familyID. 


I have used the 3-step manual approach (ml) before for an LCA model with covariates and a distal outcome, but I have a question about whether I can use the BCH manual method as well for a particular question. If I only have binary covariates (distal outcomes are continuous), would BCH be ok to use? It's unclear whether BCH is recommended for binary vs. continuous covariates. Thank you.


Yes, BCH is ok to use here. The scale of the covariates is not relevant for that. 

Livia S. posted on Thursday, February 27, 2020  3:37 am



Dear Drs. Muthen and Asparouhov, Is the R3STEP approach applicable to a class-invariant (CI) growth mixture model? I cannot find examples; most are on GMM-CV. My thought is that only the manual approach is possible for GMM-CIs. This is because, after running the unconditional GMM-CI, the uncertainty rates obtained need to be added in the third step. Is that correct? Otherwise, which syntax should I add to the R3STEP run to specify that my model has class invariance? Thank you. Yours Sincerely, L.


What is class-invariant GMM?

Livia S. posted on Thursday, February 27, 2020  2:01 pm



Dear Dr. Muthen, Sorry, I forgot to mention I am using an approach suggested in the book "HIGHER-ORDER GROWTH CURVES AND MIXTURE MODELING WITH MPLUS" by Wickrama et al. (2016). "The second approach is a GMM-CI, where variances and covariances are constrained to be the same across all classes (class-invariant variances and covariances). This is the default model in Mplus. The other approach is a GMM-CV, where variances and covariances are freed to be estimated for all classes (class-varying variances and covariances)." (p. 225) Hope this makes my question clearer. Best regards, L.


I don't see offhand why R3STEP would be problematic for GMM-CI. The GMM-CI is the first step and is also used in the last step. You can specify that model.

shonnslc posted on Monday, May 18, 2020  10:54 pm



Hi, I am trying the 3-step approach for the arbitrary secondary model. I have some questions:
1. I used Appendix D in Web Note 15 to generate the data. However, in step 1, the "Classification Probabilities for the Most Likely Latent Class Membership (Column) by Latent Class (Row)" in the Mplus output do not match the numbers on p. 18. This is what I had:
.830 .046 .124
.072 .811 .117
.099 .094 .807
2. Can I directly use the values in the 3x3 matrix of the "Logits for the Classification Probabilities for the Most Likely Latent Class Membership (Column) by Latent Class (Row)" in the Mplus output instead of computing the measurement error based on the formula on p. 4?
3. When I used the formula on p. 4 to compute the log ratios for each class c (p. 17) from the probabilities below, the values do not match what is provided in the Mplus output:
N1 = 346; N2 = 306; N3 = 348
q11 = .836; log(q11/q31) = 2.12
q21 = .064; log(q21/q31) = -.45
q31 = .100
q12 = .053; log(q12/q32) = -.72
q22 = .836; log(q22/q32) = 2.03
q32 = .110
q13 = .119; log(q13/q33) = -1.88
q23 = .100; log(q23/q33) = -2.06
q33 = .781
4. I found that there is still a shift in the number of people in each class:
Step 1: N1 = 346, N2 = 306, N3 = 348
Step 3: N1 = 340, N2 = 304, N3 = 356
Thank you very much!!


1. You are getting the middle section of Table 1. To get the top section of Table 1 use output:ppclass;
2. Yes.
3. See the footnotes on page 7 of https://www.statmodel.com/download/3stepOct28.pdf These explain which table to use at which step. You started with the wrong table. You would have to start with the table that you get with output:ppclass;
4. Small shifts are expected.
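
As an illustration of point 2, the Python sketch below shows how a logits table of this kind relates to the classification probabilities quoted in the question. It assumes each logit is the log ratio of a row entry to the last entry in the same row (latent class in rows, most likely class in columns), which is what the output heading suggests. The printed values should be checked against the logits table in your own output rather than taken as a substitute for it, and `classification_logits` is a hypothetical helper.

```python
import math

# Classification probabilities quoted in the question:
# rows = latent class, columns = most likely class membership.
class_probs = [
    [0.830, 0.046, 0.124],
    [0.072, 0.811, 0.117],
    [0.099, 0.094, 0.807],
]

def classification_logits(probs):
    """Row-wise log ratios, using the last column of each row as reference."""
    logits = []
    for row in probs:
        ref = row[-1]
        logits.append([math.log(p / ref) for p in row])
    return logits

for c, row in enumerate(classification_logits(class_probs), start=1):
    # The first K-1 values in each row are the ones fixed with @ in step 3.
    print(f"latent class {c}: " + "  ".join(f"{v:.3f}" for v in row))
```

The last value in every row is zero by construction, which is why only the first K-1 logits per class appear as fixed [N#k@...] values in step 3 syntax.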

mboer posted on Wednesday, May 20, 2020  2:20 am



Dear Prof. Muthen, I have estimated an unconditional 3-class GMM-ZIP model with intercept variance. Next, I would like to predict class membership (N) with gender using the manual R3STEP approach. Please find my syntax below.
variable:
NAMES = IGD_A IGD_B IGD_C IGD_D RESP FEMALE I S II SI C_I C_S C_II C_SI CPROB1 CPROB2 CPROB3 N;
analysis:
TYPE=MIXTURE;
model:
%OVERALL%
C#1 C#2 ON FEMALE;
%C#1%
[N#1@1.800];
[N#2@0.395];
%C#2%
[N#1@1.332];
[N#2@0.016];
%C#3%
[N#1@8.098];
[N#2@3.164];
The output generated from this syntax shows that the univariate counts for N match the final class counts observed in the first step (i.e., the GMM-ZIP model). However, the final class counts observed in the third step differ from the final class counts in the first step. Why do these final class counts differ? Did I perhaps misspecify the third step, or is this approach not appropriate for random intercept models?


We need to see the outputs from all 3 steps. Send them to Support along with your license number.
