Ann Nguyen posted on Friday, October 17, 2014 - 3:27 pm
I am using the manual three-step latent class analysis approach (Asparouhov & Muthén, 2013) in order to predict a series of auxiliary distal outcomes.
In step 1, I use my entire dataset to estimate the latent class model. Thus, the “most likely class” variable derived in step 2 is based on the entire dataset as well.
In step 3, I use the “most likely class” variable from step 2 to predict multiple distal outcomes. (Step 3 was run separately for each distal outcome.) All but one of my distal outcome variables include all respondents in the dataset. The one exception is the distal outcome variable “depression severity.” For a respondent in this dataset to have a “depression severity” score, they must first be diagnosed with depression. Only a small proportion of respondents in this dataset have a depression diagnosis. Thus, in step 3, I am using only a subsample (i.e., respondents with a depression diagnosis) of the original dataset to predict “depression severity.”
I am wondering if, in this instance of predicting the “depression severity” distal outcome, I am supposed to use only the ‘depressive subsample’ in step 1 (and subsequently step 2) rather than the entire sample.
It is better to use the entire sample in Step 1. This is in line with a one-step approach.
For a new distal outcome method called BCH that was introduced in Mplus Version 7.3, see the new Mplus Web Note 21 on our website:
Asparouhov, T. & Muthén B. (2014). Auxiliary variables in mixture modeling: Using the BCH method in Mplus to estimate a distal outcome model and an arbitrary second model. Mplus Web Notes: No. 21.
Ann Nguyen posted on Tuesday, October 21, 2014 - 10:28 am
Thank you for your help, Dr. Muthen. As always, I am very impressed with (and appreciate) your and your colleagues' quick response. I have a follow-up question for you. If I use the entire sample in step 1, is there a way to request the N for the regression analysis (with "depression severity" as the distal outcome) in step 3? The “SUMMARY OF ANALYSIS” section in step 3's output gives me “Number of observations,” but this appears to be the N for the entire sample and not the depressed subsample, which would have been used for the regression analysis.
I’m attempting to use the manual 3-step approach (as presented in Web Note 15 and the associated appendices) to examine the relationships among latent class membership, several auxiliary covariates, and one categorical distal outcome. 3,940 students are nested within 372 schools; all of my indicators and auxiliaries are categorical. Further, I am interested in using the parametric approach to modeling the latent class random intercepts (UG ex 10.6). I also have several predictors at the within and between levels. Questions: (1) For steps one and two of the manual 3-step, should I use TYPE=TWOLEVEL MIXTURE and then gather the resulting misclassification logits for use in the secondary model? (2) If 1 is correct, should I specify the between-level factor (as in UG ex 10.6) before I gather the logits or after, as part of my secondary model? I’ve compared class proportions, BIC, classification logits, etc. using TYPE=COMPLEX, TYPE=TWOLEVEL MIXTURE (without the between-level factor), and TYPE=TWOLEVEL MIXTURE with the between-level factor. Each method suggests the same number of classes, but the entropy and classification logits are quite variable across the three approaches. Do you have any advice on how best to use the manual three-step process (or another process) in the case of multilevel data? Thanks in advance for your time and expertise!
I can recommend only TYPE=COMPLEX MIXTURE. The 3 step methodology has not been developed and used yet for TYPE=TWOLEVEL MIXTURE.
Jen Doty posted on Saturday, January 03, 2015 - 1:59 pm
I have used GMM to identify 3 classes of closeness with mother over time between generation 1 and generation 2. I would like to use these classes to predict a distal outcome in the children of generation 3, but I have multiple children in families.
Would it be possible to use TYPE = MIXTURE COMPLEX in step 3 of the manual 3-step procedure, or would I need to use only 1 child from each family?
Not sure that's implemented; try it. But wouldn't you want to consider similar age categories of children and therefore maybe use only 1 child from each family (and delete families who have no child in a certain category)?
The recommended method is illustrated with User's Guide example 8.17 and is related to the Basehazard=OFF(equal) setting, which is actually the default. The effect of the latent class is summarized in the class-specific mean for the survival variable, which also has a mean of zero in the reference/last class. The method originates in
Larsen, K. (2004), “Joint Analysis of Time-to-Event and Multiple Binary Indicators of Latent Classes,” Biometrics, 60(1), 85–92.
Ann Nguyen posted on Tuesday, June 30, 2015 - 10:33 am
I am using the manual three-step latent class analysis approach (Asparouhov & Muthén, 2013) in order to predict an auxiliary distal outcome (DISTRESS). My goal is to examine whether the latent classes predict DISTRESS. I understand how to set up this syntax. However, I am also interested in how the relationship between the latent classes and DISTRESS varies across 3 age groups. I think I need to add KNOWNCLASS to the analysis in order to analyze by age groups, but I do not know how to set up the syntax so that I would get auxiliary distal outcome analyses for each age group/”known class.” Is this possible?
Ann Nguyen posted on Thursday, July 02, 2015 - 8:42 am
Thank you, Dr. Muthen.
Ann Nguyen posted on Thursday, September 10, 2015 - 8:36 am
I am using a manual 3 step LCA approach to predict the effects of demographic covariates (sex & educ) on latent class membership. I'm also interested in understanding how the relationships between the covariates and class membership vary by age. Thus, in step 3 I am using age (3 categories) as a knownclass variable. However, with the syntax below, I end up getting regression results for age by class (e.g., pattern 1 1), but I am only interested in results by age groups. That is, I would like multinomial logistic regression results (similar to those generated from the R3STEP command) for latent classes regressed on covariates for each age group. Could you please advise me how to correct the following syntax to achieve this?
I don't see any problems with your syntax, except for this statement: %age#3% c#1-c#4 on sex educ; You should remove it. The last class results are what you get as %overall%, while the other classes' effects are added to the overall.
Consider User's Guide Table on page 499 - which explains the parameterization.
If you are still in doubt, run each regression separately - using 3 separate runs, one for each age group.
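Read literally, the reply above describes an additive parameterization. The following is only a sketch of that reading; the symbols are introduced here for illustration and do not come from the thread. Let beta_k be the coefficient of a covariate for class c#k given under %overall%, let delta_gk be the coefficient reported under known class g, and let G be the number of known classes (here, age groups).

```latex
\beta_k^{(g)} =
\begin{cases}
  \beta_k + \delta_{gk}, & g = 1, \dots, G-1, \\
  \beta_k,               & g = G \quad \text{(last known class)}.
\end{cases}
```

Under this reading, the slope for the last age group is the %overall% estimate itself, which would explain why a separate %age#3% statement is redundant. The User's Guide table cited above remains the authoritative description of the parameterization.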
Dina Dajani posted on Monday, March 07, 2016 - 10:36 am
I am using the manual 3-step approach to determine the effect of a latent class variable on a distal outcome (with a covariate). So, my paths are Y on C and Y on X. When I run the first step, my output includes the average latent class probabilities for the most likely latent class membership, but not the table with the logits for the classification probabilities, which I need to run step 3. Is there a command that I am missing to request that output?
Send output to Support along with your license number.
Dina Dajani posted on Tuesday, March 08, 2016 - 8:19 am
I figured out that I was using an older version of Mplus but when I used Mplus 7.4 I got the logit output. I had a question about whether I am specifying my model correctly. To run the regression described above, my syntax is below. Is it correct?
Y X N; NOMINAL=N; MODEL: %overall% Y on X; C#1 on X; C#2 on X;
Looks right, but also check against section 3.2 of our BCH paper.
Dina Dajani posted on Tuesday, March 08, 2016 - 10:37 am
Thank you very much for your quick reply. The issue I am having is with a "singular covariance matrix" and so the Wald test cannot be computed (which is the main result I am interested in). I believe the issue is that there is no variance of some covariates in certain classes. Is there any way to resolve this?
WARNING: THE SAMPLE VARIANCE OF X1 IN CLASS 1 IS 0.000.
WARNING: THE SAMPLE CORRELATION OF X2 AND X1 IN CLASS 3 IS -1.000.
ONE OR MORE MULTINOMIAL LOGIT PARAMETERS WERE FIXED TO AVOID SINGULARITY OF THE INFORMATION MATRIX... THE FOLLOWING PARAMETERS WERE FIXED: Parameter 11, C#1 ON X1 Parameter 15, C#2 ON X2
THE MODEL ESTIMATION TERMINATED NORMALLY
WALD'S TEST COULD NOT BE COMPUTED BECAUSE OF A SINGULAR COVARIANCE MATRIX.
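The first warning above points to a covariate with no variation inside one class. Outside Mplus, that can be checked by grouping the step-2 data by most likely class and computing each covariate's variance. The following is a minimal Python sketch under stated assumptions: the function name, the toy rows, and the use of the most likely class variable N as the grouping key are all hypothetical, and it does not cover the perfect-correlation warning.

```python
from collections import defaultdict
from statistics import pvariance


def zero_variance_covariates(rows, class_key, covariates, tol=1e-12):
    """Return (covariate, class) pairs whose within-class variance is ~0."""
    by_class = defaultdict(list)
    for row in rows:
        by_class[row[class_key]].append(row)

    flagged = []
    for cls, members in sorted(by_class.items()):
        for name in covariates:
            values = [m[name] for m in members]
            # A single observation or identical values both mean no variation.
            if len(values) < 2 or pvariance(values) <= tol:
                flagged.append((name, cls))
    return flagged


# Hypothetical toy data (not from the thread): N is the most likely class,
# X1 and X2 are binary covariates.
toy = [
    {"N": 1, "X1": 0, "X2": 1},
    {"N": 1, "X1": 0, "X2": 0},
    {"N": 2, "X1": 1, "X2": 0},
    {"N": 2, "X1": 0, "X2": 1},
    {"N": 2, "X1": 1, "X2": 1},
]

flagged = zero_variance_covariates(toy, "N", ["X1", "X2"])
print(flagged)  # [('X1', 1)]
```

A pair flagged this way corresponds to a multinomial logit parameter that cannot be estimated freely, which is consistent with Mplus fixing parameters before reporting that the Wald test could not be computed.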
Ali posted on Wednesday, September 28, 2016 - 3:54 am
I am trying the 3-step manual approach to see the relationship between class membership and 11 countries. However, I get the error message "Unknown class label: %C#1%". Here is my code:
Names are ST53Q01 ST53Q02 ST53Q03 ST53Q04 SENWGT_S SCHOOLID STIDSTD ST04Q01 ESCS PV1MATH PV2MATH PV3MATH PV4MATH PV5MATH COUNTRY CPROB1 CPROB2 CPROB3 n;
Usevariables are n;
Classes=CNT(11)c(3);
Nominal is n;
Knownclass = CNT(COUNTRY=1 COUNTRY=2 COUNTRY=3 COUNTRY=4 COUNTRY=5 COUNTRY=6 COUNTRY=7 COUNTRY=8 COUNTRY=9 COUNTRY=10 COUNTRY=11);
WEIGHT=SENWGT_S;
Analysis: Type=Mixture; Starts=0;
Model: %overall% c on CNT;
Model C;
%C#1% [N#1@...]; [N#2@-1.446];
%C#2% [N#1@-2.435]; [N#2@-1.202];
%C#3% [N#1@...]; [N#2@-2.901];
Q2: No - but the Most Likely Class values should be the same between the first and last steps.
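One way to check the statement above is to cross-tabulate the most likely class saved in the first step against the one saved in the last step. The following is a minimal Python sketch, assuming both runs saved class assignments for the same respondents in the same order; the function name and the two lists are hypothetical and not from the thread.

```python
from collections import Counter


def crosstab_most_likely_class(step1, step3):
    """Count (step-1 class, step-3 class) pairs for respondents in the same order."""
    if len(step1) != len(step3):
        raise ValueError("both runs must cover the same respondents")
    return Counter(zip(step1, step3))


# Hypothetical class assignments for six respondents.
step1_classes = [1, 1, 2, 3, 3, 2]
step3_classes = [1, 1, 2, 3, 3, 2]

table = crosstab_most_likely_class(step1_classes, step3_classes)
# Off-diagonal cells are respondents whose most likely class changed.
mismatches = sum(n for (a, b), n in table.items() if a != b)
print(table)
print("mismatches:", mismatches)
```

If any off-diagonal cell is non-zero, the class assignments shifted between the two runs, which is the situation the reply says should not occur.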
nidhi gupta posted on Wednesday, April 04, 2018 - 6:35 am
Dear Dr. Muthen, I am performing a manual 3-step approach following the web note by T. Asparouhov (2014). I want to get regression results at step 3 stratified by one variable. However, when I tried modelling this, I got the following error: "External training variables are not supported when KNOWNCLASS option is used."
This is my code:
VARIABLE: NAMES are V1-V8 a g b f w s d al i mi W1-W4 C1-C4 m;
Usevar are a g b s d al W1 W2 W3 W4;
Classes = C(4) cg(2);
KNOWNCLASS=cg(mi=0, mi=1);
Missing=*;
Training=W1, W2, W3, W4(bch);
Data: file=step2.dat;
Analysis: OPTSEED = 851945; TYPE = MIXTURE; ESTIMATOR = MLR;
Model: %overall% C on a g s d al; b on a g s d al;
%C#1% b on a g s d al;
%C#2% b on a g s d al;
%C#3% b on a g s d al;
%C#4% b on a g s d al;
I have two latent class variables (i.e., c1#1, c1#2, c1#3; c2#1, c2#2, c2#3) and a continuous distal outcome. I am trying to test the effects (main or possibly interaction?) of the two class variables on the outcome.
Do you have any suggestion for this analysis? Although I have done BCH, DCON, 3-step analysis (manual step estimation), etc, I couldn't get what I want.
Thank you for the quick reply. By the way, I already checked the class-varying intercepts of the distal outcomes. Interestingly, I got an error message when I simultaneously tested the effects of both variables.
I got the result for "score on C1" when I removed some lines regarding "C2" (i.e., [score] (a4), [score] (a5), [score] (a6), 0=a4-a5, 0=a4-a6). But when I tried to run the whole model, there was an error message: "Unknown parameter label in MODEL TEST: A1". Or should I just run the two models separately and report the results from each?
I have used the 3-step manual approach (ml) before for a LCA model with covariates and a distal outcome, but I have a question about whether I can use the BCH manual method as well for a particular question.
If I only have binary covariates (distal outcomes are continuous), would BCH be ok to use? It's unclear whether BCH is recommended for binary vs. continuous covariates.
Livia S. posted on Thursday, February 27, 2020 - 2:01 pm
Dear Dr. Muthen,
Sorry, I forgot to mention I am using an approach suggested in the book "HIGHER-ORDER GROWTH CURVES AND MIXTURE MODELING WITH MPLUS" by Wickrama et al. (2016).
"The second approach is a GMM-CI, where variances and covariances are constrained to be the same across all classes (class-invariant variances and covariances). This is the default model in Mplus. The other approach is a GMM-CV, where variances and covariances are freed to be estimated for all classes (class-varying variances and covariances)." (p.225)
Hope this makes my question clearer. Best regards, L.
I don't see off-hand why R3STEP would be problematic for GMM-CI. The GMM-CI is the first step and is also used in the last step. You can specify that model.
shonnslc posted on Monday, May 18, 2020 - 10:54 pm
Hi, I am trying the 3-step approach for the arbitrary secondary model. I have some questions:
1. I used Appendix D in Web Note 15 to generate the data. However, in step 1, the "Classification Probabilities for the Most Likely Latent Class Membership (Column) by Latent Class (Row)" in the Mplus output do not match the numbers on p. 18. This is what I had:
.830 .046 .124
.072 .811 .117
.099 .094 .807
2. Can I directly use the values in the 3x3 matrix of the "Logits for the Classification Probabilities for the Most Likely Latent Class Membership (Column) by Latent Class (Row)" in the Mplus output instead of computing the measurement error based on the formula on p. 4?
3. When I used the formula on p. 4 to compute the log ratios for each class c (p. 17) from the probabilities above, the values did not match what is provided in the Mplus output.
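For the matrix quoted in question 1, the log ratios can be recomputed by hand as a cross-check against the Mplus logit table. The following is a minimal Python sketch, assuming that rows are latent classes and columns are most likely class membership (as the table title states) and that each logit is taken relative to the last column; the function name is hypothetical.

```python
import math

# Classification probabilities quoted in the post:
# rows = latent class, columns = most likely class membership.
probs = [
    [0.830, 0.046, 0.124],
    [0.072, 0.811, 0.117],
    [0.099, 0.094, 0.807],
]


def classification_logits(rows):
    """Return log(p[r][c] / p[r][last]) for every cell; the last column is 0."""
    return [[math.log(p / row[-1]) for p in row] for row in rows]


logits = classification_logits(probs)
for row in logits:
    print(" ".join(f"{value:7.3f}" for value in row))
```

Because the input probabilities are rounded to three decimals, small differences from the logits printed by Mplus are to be expected; larger differences would suggest that a different table or a different reference column was used.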
I have estimated an unconditional 3-class GMM-ZIP model with intercept variance. Next, I would like to predict class membership (N) with gender using the manual R3step approach. Please find my syntax below.
variable: NAMES = IGD_A IGD_B IGD_C IGD_D RESP FEMALE I S II SI C_I C_S C_II C_SI CPROB1 CPROB2 CPROB3 N;
analysis: TYPE=MIXTURE;
model: %OVERALL% C#1 C#2 ON FEMALE;
%C#1% [N#1@...]; [N#2@...];
%C#2% [N#1@-1.332]; [N#2@-0.016];
%C#3% [N#1@-8.098]; [N#2@-3.164];
The output generated from this syntax shows that the univariate counts for N match the final class counts observed in the first step (i.e., the GMM-ZIP model). However, the final class counts observed in the third step differ from those in the first step. Why do these final class counts differ? Did I perhaps misspecify the third step, or is this approach not appropriate for random intercept models?