Ann Nguyen posted on Friday, October 17, 2014 - 3:27 pm
I am using the manual three-step latent class analysis approach (Asparouhov & Muthén, 2013) in order to predict a series of auxiliary distal outcomes.
In step 1, I use my entire dataset to estimate the latent class model. Thus, the “most likely class” variable derived in step 2 is based on the entire dataset as well.
In step 3, I use the “most likely class” variable from step 2 to predict multiple distal outcomes. (Step 3 was run separately for each distal outcome.) All but one of my distal outcome variables include all respondents in the dataset. The one exception is the distal outcome variable “depression severity.” For a respondent in this dataset to have a “depression severity” score, they must first be diagnosed with depression. Only a small proportion of respondents in this dataset have a depression diagnosis. Thus, in step 3, I am using only a subsample of the original dataset (i.e., respondents with a depression diagnosis) to predict “depression severity.”
I am wondering if, in this instance of predicting the “depression severity” distal outcome, I am supposed to use only the ‘depressive subsample’ in step 1 (and subsequently step 2) rather than the entire sample.
It is better to use the entire sample in Step 1. This is in line with a one-step approach.
For a new distal outcome method called BCH that was introduced in Mplus Version 7.3, see the new Mplus Web Note 21 on our website:
Asparouhov, T., & Muthén, B. (2014). Auxiliary variables in mixture modeling: Using the BCH method in Mplus to estimate a distal outcome model and an arbitrary second model. Mplus Web Notes: No. 21.
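As a rough sketch of how the automatic BCH option mentioned above might be requested: the indicator names u1-u5, the outcome name depsev, the file name, the missing-value code, and the three-class solution are all assumptions made for illustration, not details from this thread.

```mplus
TITLE: Sketch only - automatic BCH for one distal outcome
DATA: FILE = full.dat;          ! hypothetical file with the entire sample
VARIABLE:
  NAMES = u1-u5 depsev;         ! hypothetical variable names
  USEVARIABLES = u1-u5;         ! only the indicators define the classes
  CATEGORICAL = u1-u5;
  CLASSES = c(3);               ! assumed number of classes
  AUXILIARY = depsev (BCH);     ! distal outcome compared across classes
  MISSING = ALL (-999);         ! assumed missing-value code
ANALYSIS:
  TYPE = MIXTURE;
```

Because depsev is listed only under AUXILIARY, it does not influence the class formation, which still uses the entire sample.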
Ann Nguyen posted on Tuesday, October 21, 2014 - 10:28 am
Thank you for your help, Dr. Muthen. As always, I am very impressed with (and appreciate) your and your colleagues' quick responses. I have a follow-up question for you. If I use the entire sample in step 1, is there a way to request the N for the regression analysis (with "depression severity" as the distal outcome) in step 3? The “SUMMARY OF ANALYSIS” section in step 3's output gives me “Number of observations,” but this appears to be the N for the entire sample and not the depressed subsample, which would have been used for the regression analysis.
I’m attempting to use the manual 3-step approach (as presented in Web Note 15 and the associated appendices) to examine the relationships among latent class membership, several auxiliary covariates, and one categorical distal outcome. 3,940 students are nested within 372 schools; all of my indicators and auxiliaries are categorical. Further, I am interested in using the parametric approach to modeling the latent class random intercepts (UG ex 10.6). I also have several predictors at the within and between levels. Questions: (1) For steps one and two of the manual 3-step, should I use TYPE=TWOLEVEL MIXTURE and then gather the resulting misclassification logits for use in the secondary model? (2) If 1 is correct, should I specify the between-level factor (as in UG ex 10.6) before I gather the logits, or after, as part of my secondary model? I’ve compared class proportions, BIC, classification logits, etc. using TYPE=COMPLEX, TYPE=TWOLEVEL MIXTURE (without the between-level factor), and TYPE=TWOLEVEL MIXTURE with the between-level factor. Each method suggests the same number of classes, but the entropy and classification logits are quite variable across the three approaches. Do you have any advice on how best to use the manual three-step process (or another process) in the case of multilevel data? Thanks in advance for your time and expertise!
I can recommend only TYPE=COMPLEX MIXTURE. The 3 step methodology has not been developed and used yet for TYPE=TWOLEVEL MIXTURE.
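A minimal sketch of a step 1 run along the lines recommended above, assuming a school identifier named schoolid, six categorical indicators u1-u6, a three-class solution, and hypothetical file names (none of these come from the post):

```mplus
TITLE: Sketch only - step 1 with clustering handled by TYPE=COMPLEX
DATA: FILE = students.dat;      ! hypothetical file name
VARIABLE:
  NAMES = schoolid u1-u6;       ! hypothetical variable names
  USEVARIABLES = u1-u6;
  CATEGORICAL = u1-u6;
  CLUSTER = schoolid;           ! students nested within schools
  CLASSES = c(3);               ! assumed number of classes
ANALYSIS:
  TYPE = COMPLEX MIXTURE;       ! standard errors adjusted for clustering
SAVEDATA:
  FILE = step1.dat;             ! hypothetical output file
  SAVE = CPROB;                 ! saves most likely class membership
```

The classification logits needed for step 3 would then be read from the output of this run.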
Jen Doty posted on Saturday, January 03, 2015 - 1:59 pm
I have used GMM to identify 3 classes of closeness with mother over time between generation1 & generation2. I would like to use these classes to predict a distal outcome in the children of generation3, but I have multiple children in families.
Would it be possible to use TYPE = MIXTURE COMPLEX in step 3 of the manual 3-step procedure, or would I need to use only 1 child from each family?
Not sure that's implemented; try it. But wouldn't you want to consider similar age categories of children and therefore maybe use only 1 child from each family (and delete families who have no child in a certain category)?
The recommended method is illustrated with User's Guide example 8.17 and is related to the Basehazard=OFF(equal) setting, which is actually the default. The effect of the latent class is summarized in the class-specific mean for the survival variable, which has a mean of zero in the reference (last) class. The method originates in
Larsen, K. (2004), “Joint Analysis of Time-to-Event and Multiple Binary Indicators of Latent Classes,” Biometrics, 60(1), 85–92.
Ann Nguyen posted on Tuesday, June 30, 2015 - 10:33 am
I am using the manual three-step latent class analysis approach (Asparouhov & Muthén, 2013) in order to predict an auxiliary distal outcome (DISTRESS). My goal is to examine whether the latent classes predict DISTRESS. I understand how to set up this syntax. However, I am also interested in how the relationship between the latent classes and DISTRESS varies across 3 age groups. I think I need to add KNOWNCLASS to the analysis in order to analyze by age groups, but I do not know how to set up the syntax so that I would get auxiliary distal outcome analyses for each age group/”known class.” Is this possible?
Ann Nguyen posted on Thursday, July 02, 2015 - 8:42 am
Thank you, Dr. Muthen.
Ann Nguyen posted on Thursday, September 10, 2015 - 8:36 am
I am using a manual 3 step LCA approach to predict the effects of demographic covariates (sex & educ) on latent class membership. I'm also interested in understanding how the relationships between the covariates and class membership vary by age. Thus, in step 3 I am using age (3 categories) as a knownclass variable. However, with the syntax below, I end up getting regression results for age by class (e.g., pattern 1 1), but I am only interested in results by age groups. That is, I would like multinomial logistic regression results (similar to those generated from the R3STEP command) for latent classes regressed on covariates for each age group. Could you please advise me how to correct the following syntax to achieve this?
I don't see any problems with your syntax, except for this: %age#3% c#1-c#4 on sex educ; You should remove that. The last class results are what you get as %overall%, while the other classes' effects are added to the overall.
See the table on page 499 of the User's Guide, which explains the parameterization.
If you are still in doubt, run each regression separately - using 3 separate runs, one for each age group.
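The separate-runs suggestion could be sketched as follows. Only the VARIABLE command is shown; it assumes the age variable is named age and coded 1 to 3, that the step 2 most likely class variable is named n, and that there are five latent classes (inferred from the c#1-c#4 statement quoted above).

```mplus
! Sketch only - run 1 of 3; rerun with age EQ 2 and then age EQ 3
VARIABLE:
  NAMES = sex educ age n;       ! hypothetical variable order
  USEVARIABLES = sex educ n;
  NOMINAL = n;                  ! most likely class from step 2
  CLASSES = c(5);               ! assumed from c#1-c#4 in the post
  USEOBSERVATIONS = age EQ 1;   ! restricts this run to one age group
```

With KNOWNCLASS dropped, the MODEL command would keep c ON sex educ in %OVERALL% and the fixed step 1 logits in each class, so each run yields one set of multinomial logistic regression coefficients for its age group.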
Dina Dajani posted on Monday, March 07, 2016 - 10:36 am
I am using the manual 3-step approach to determine the effect of a latent class variable on a distal outcome (with a covariate). So, my paths are Y on C and Y on X. When I run the first step, I get in my output the average latent class probabilities for the most likely latent class membership, but I do not get the table with the logits for classification probabilities, which I need to run step 3. Is there a command that I am missing to specify that output?
Send output to Support along with your license number.
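For reference, the logits in question are a transformation of the classification probabilities for most likely class membership N given latent class C, with the last category of N as the reference. The worked numbers below are made up for illustration and assume K = 3 classes.

```latex
\[
  \ell_{c,s} \;=\; \log\!\left(\frac{P(N = s \mid C = c)}{P(N = K \mid C = c)}\right),
  \qquad s = 1, \dots, K-1 .
\]
% Made-up row for c = 1: P(N=1|C=1) = 0.90, P(N=2|C=1) = 0.06, P(N=3|C=1) = 0.04
\[
  \ell_{1,1} = \log(0.90/0.04) \approx 3.114,
  \qquad
  \ell_{1,2} = \log(0.06/0.04) \approx 0.405 .
\]
```

In step 3 these would be entered as fixed values in the first class, for example [n#1@3.114]; [n#2@0.405];.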
Dina Dajani posted on Tuesday, March 08, 2016 - 8:19 am
I figured out that I was using an older version of Mplus but when I used Mplus 7.4 I got the logit output. I had a question about whether I am specifying my model correctly. To run the regression described above, my syntax is below. Is it correct?
Y X N; NOMINAL=N; MODEL: %overall% Y on X; C#1 on X; C#2 on X;
Looks right, but also check against section 3.2 of our BCH paper.
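A fuller step 3 input for this kind of model might look like the sketch below. The file name and every fixed logit value are placeholders rather than results from any real step 1 output, and the intercept labels m1-m3 with the MODEL TEST command are one assumed way of requesting a Wald test of equal class-specific intercepts.

```mplus
TITLE: Sketch only - manual step 3 with Y on C and Y on X
DATA: FILE = step3.dat;         ! hypothetical file name
VARIABLE:
  NAMES = y x n;
  USEVARIABLES = y x n;
  NOMINAL = n;                  ! most likely class from step 2
  CLASSES = c(3);
ANALYSIS:
  TYPE = MIXTURE;
  STARTS = 0;
MODEL:
  %OVERALL%
  y ON x;
  c ON x;                       ! same as C#1 on X; C#2 on X;
  %c#1%
  [n#1@3.114 n#2@0.405];        ! placeholder logits
  [y] (m1);
  %c#2%
  [n#1@-1.200 n#2@1.900];       ! placeholder logits
  [y] (m2);
  %c#3%
  [n#1@-2.400 n#2@-1.200];      ! placeholder logits
  [y] (m3);
MODEL TEST:
  0 = m1 - m2;                  ! Wald test: equal intercepts of y
  0 = m2 - m3;
```

The class-specific intercepts of y carry the effect of C on Y, so the two MODEL TEST constraints jointly test whether the distal outcome differs across classes after adjusting for X.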
Dina Dajani posted on Tuesday, March 08, 2016 - 10:37 am
Thank you very much for your quick reply. The issue I am having is with a "singular covariance matrix" and so the Wald test cannot be computed (which is the main result I am interested in). I believe the issue is that there is no variance of some covariates in certain classes. Is there any way to resolve this?
WARNING: THE SAMPLE VARIANCE OF X1 IN CLASS 1 IS 0.000.
WARNING: THE SAMPLE CORRELATION OF X2 AND X1 IN CLASS 3 IS -1.000.
ONE OR MORE MULTINOMIAL LOGIT PARAMETERS WERE FIXED TO AVOID SINGULARITY OF THE INFORMATION MATRIX... THE FOLLOWING PARAMETERS WERE FIXED: Parameter 11, C#1 ON X1 Parameter 15, C#2 ON X2
THE MODEL ESTIMATION TERMINATED NORMALLY
WALD'S TEST COULD NOT BE COMPUTED BECAUSE OF A SINGULAR COVARIANCE MATRIX.
Ali posted on Wednesday, September 28, 2016 - 3:54 am
I am trying the manual 3-step approach to examine the relationship between class membership and 11 countries. However, I get the error message "Unknown class label: %C#1%". Here is my code:
Names are ST53Q01 ST53Q02 ST53Q03 ST53Q04 SENWGT_S SCHOOLID STIDSTD ST04Q01 ESCS PV1MATH PV2MATH PV3MATH PV4MATH PV5MATH COUNTRY CPROB1 CPROB2 CPROB3 n;
Usevariables are n;
Classes=CNT(11)c(3);
Nominal is n;
Knownclass = CNT(COUNTRY=1 COUNTRY=2 COUNTRY=3 COUNTRY=4 COUNTRY=5 COUNTRY=6 COUNTRY=7 COUNTRY=8 COUNTRY=9 COUNTRY=10 COUNTRY=11);
WEIGHT=SENWGT_S;
Analysis: Type=Mixture; Starts=0;
Model: %overall% c on CNT;
Model C;
%C#1% [N#1@...]; [N#2@-1.446];
%C#2% [N#1@-2.435]; [N#2@-1.202];
%C#3% [N#1@...]; [N#2@-2.901];