Manual 3-step approach
 Ann Nguyen posted on Friday, October 17, 2014 - 3:27 pm
I am using the manual three-step latent class analysis approach (Asparouhov & Muthén, 2013) in order to predict a series of auxiliary distal outcomes.

In step 1, I use my entire dataset to estimate the latent class model. Thus, the “most likely class” variable derived in step 2 is based on the entire dataset as well.

In step 3, I use the “most likely class” variable from step 2 to predict multiple distal outcomes. (Step 3 was run separately for each distal outcome.) All but one of my distal outcome variables include all respondents in the dataset. The one exception is the distal outcome variable “depression severity.” For a respondent in this dataset to have a “depression severity” score, they must first be diagnosed with depression. Only a small proportion of respondents in this dataset have a depression diagnosis. Thus, in step 3, I am using only a subsample of the original dataset (i.e., respondents with a depression diagnosis) to predict “depression severity.”

I am wondering if, in this instance of predicting the “depression severity” distal outcome, I am supposed to use only the ‘depressive subsample’ in step 1 (and subsequently step 2) rather than the entire sample.
 Bengt O. Muthen posted on Friday, October 17, 2014 - 4:45 pm
It is better to use the entire sample in Step 1. This is in line with a one-step approach.

For a new distal outcome method called BCH that was introduced in Mplus Version 7.3, see the new Mplus Web Note 21 on our website:

Asparouhov, T. & Muthén B. (2014). Auxiliary variables in mixture modeling: Using the BCH method in Mplus to estimate a distal outcome model and an arbitrary second model. Mplus Web Notes: No. 21.
 Ann Nguyen posted on Tuesday, October 21, 2014 - 10:28 am
Thank you for your help, Dr. Muthen. As always, I am very impressed with (and appreciate) your and your colleagues' quick response. I have a follow-up question for you. If I use the entire sample in step 1, is there a way to request the N for the regression analysis (with "depression severity" as the distal outcome) in step 3? The “SUMMARY OF ANALYSIS” section in step 3's output gives me “Number of observations,” but this appears to be the N for the entire sample and not the depressed subsample, which would have been used for the regression analysis.
 Bengt O. Muthen posted on Tuesday, October 21, 2014 - 12:16 pm
If you run the DCON option you get this information.
 Ryan Cartnal posted on Monday, December 15, 2014 - 5:02 pm
I’m attempting to use the manual 3 step approach (as presented in webnote 15 and the associated appendices) to examine the relationships among latent class membership and several auxiliary covariates and one categorical distal outcome. 3,940 students are nested within 372 schools; all of my indicators and auxiliaries are categorical. Further, I am interested in using the parametric approach to modeling the latent class random intercepts (UG ex 10.6). I also have several predictors at the within and between levels.
Questions:
(1) For steps one and two of the manual 3 step, should I use TYPE=TWOLEVEL MIXTURE and then gather the resulting misclassification logits for use in the secondary model?
(2) If 1 is correct, should I specify the between level factor (as in UG ex 10.6) before I gather the logits or after as part of my secondary model?
I’ve compared class proportions, BIC, classification logits, etc. using TYPE=COMPLEX, TYPE=TWOLEVEL MIXTURE (without the between level factor) and TYPE=TWOLEVEL MIXTURE with the between level factor. Each method suggests the same number of classes, but the entropy and classification logits are quite variable across the three approaches. Do you have any advice on how best to use the manual three step process (or another process) in the case of multilevel data?
Thanks in advance for your time and expertise!
 Tihomir Asparouhov posted on Tuesday, December 16, 2014 - 2:09 pm
I can recommend only TYPE=COMPLEX MIXTURE. The 3 step methodology has not been developed and used yet for TYPE=TWOLEVEL MIXTURE.
 Jen Doty posted on Saturday, January 03, 2015 - 1:59 pm
I have used GMM to identify 3 classes of closeness with mother over time between generation1 & generation2. I would like to use these classes to predict a distal outcome in the children of generation3, but I have multiple children in families.

Would it be possible to use TYPE = MIXTURE COMPLEX in step 3 of the manual 3-step procedure, or would I need to use only 1 child from each family?
 Bengt O. Muthen posted on Saturday, January 03, 2015 - 5:28 pm
Not sure that's implemented; try it. But wouldn't you want to consider similar age categories of children, and therefore maybe use only 1 child from each family (and delete families who have no child in a certain category)?
 db40 posted on Tuesday, May 12, 2015 - 6:25 am
Dear Bengt,

I am somewhat confused. I am looking to estimate a 4-class solution which has a number of covariates but also 6 distal outcomes.

I have been looking over webnote 15 to familiarise myself with it (manual approach). Then I stumbled upon webnote 21 (Using the BCH method to estimate a distal outcome model...)

Could you clarify which of these procedures I need to estimate this model?
 Bengt O. Muthen posted on Tuesday, May 12, 2015 - 7:45 am
Our recommendations are shown in Tables 6 and 7 of the BCH paper. The covariates can be handled in an R3STEP run. How to do the distals including covariates is described in a section of the BCH paper.
 db40 posted on Tuesday, May 12, 2015 - 8:47 am
Oh thank you. I guess since my variables are categorical I need to use the DCAT option.
 db40 posted on Wednesday, May 13, 2015 - 7:24 am
Dear Bengt,

I have a query regarding one of your examples in the Asparouhov & Muthen (2014) paper "...using the BCH Method..."

On page 10 the variable command details 10 variables for step 1.

U1-U8 y x ;

On the second step (page 11) there are now five more variables detailed, presumably for the BCH weights.

U1-U8 y x w1-w4 MLC ;

I am unable to get the second step to run, and I am guessing it's because there are now five extra variables which do not match the .dat file, since I get this error.

"Unexpected end of file reached in data file."

May I ask if the second step should be calling the .dat file that is saved out in step 1 (named as 2.dat)?
 Thomas Olino posted on Thursday, May 14, 2015 - 6:42 am
Drs. Muthen and Asparouhov,

Is there a recommended method for examining latent classes as a predictor of time-until event?
 Bengt O. Muthen posted on Thursday, May 14, 2015 - 11:11 am
db40:

Yes, there is a typo - it should say 2.dat.
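
For readers following along, the two BCH runs discussed above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the exact input from Web Note 21: the file name 1.dat, the binary (CATEGORICAL) indicators, the 4 classes (inferred from w1-w4), and the secondary model "y ON x" are all assumed here. The point it shows is that the second run reads the file written by SAVEDATA in the first run, so its NAMES list must cover every saved column.

```mplus
! ---- Run 1 (separate input file): estimate the LCA, save BCH weights ----
DATA:     FILE = 1.dat;            ! assumed name of the raw data file
VARIABLE: NAMES = u1-u8 y x;
          USEVARIABLES = u1-u8;
          CATEGORICAL = u1-u8;     ! assumption: binary indicators
          CLASSES = c(4);          ! assumption: 4 classes, matching w1-w4
          AUXILIARY = y x;         ! carried into the saved file
ANALYSIS: TYPE = MIXTURE;
SAVEDATA: FILE = 2.dat;
          SAVE = BCHWEIGHTS;

! ---- Run 2 (separate input file): secondary model using the weights ----
DATA:     FILE = 2.dat;            ! the file saved by run 1, not 1.dat
VARIABLE: NAMES = u1-u8 y x w1-w4 mlc;
          USEVARIABLES = y x w1-w4;
          CLASSES = c(4);
          TRAINING = w1-w4(BCH);
ANALYSIS: TYPE = MIXTURE;
          STARTS = 0;
MODEL:    %OVERALL%
          y ON x;                  ! assumed secondary model
```

The order of the saved columns is listed in the SAVEDATA section of the first run's output; checking the NAMES list of the second run against it is a quick way to rule out an "Unexpected end of file" error.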
 Tihomir Asparouhov posted on Thursday, May 14, 2015 - 11:42 am
Answer to Thomas Olino:

The recommended method is illustrated with User's Guide example 8.17 and is related to the Basehazard=OFF(equal) setting, which is actually the default setting. The effect of the latent class is summarized in the class-specific mean for the survival variable, which also has a mean of zero in the reference/last class. The method originates in

Larsen, K. (2004), “Joint Analysis of Time-to-Event and Multiple Binary Indicators of Latent Classes,” Biometrics, 60(1), 85–92.
 Ann Nguyen posted on Tuesday, June 30, 2015 - 10:33 am
I am using the manual three-step latent class analysis approach (Asparouhov & Muthén, 2013) in order to predict an auxiliary distal outcome (DISTRESS). My goal is to examine whether the latent classes predict DISTRESS. I understand how to set up this syntax. However, I am also interested in how the relationship between the latent classes and DISTRESS varies across 3 age groups. I think I need to add KNOWNCLASS to the analysis in order to analyze by age groups, but I do not know how to set up the syntax so that I would get auxiliary distal outcome analyses for each age group/”known class.” Is this possible?
 Bengt O. Muthen posted on Wednesday, July 01, 2015 - 5:26 pm
Use the dot option:

%known#1.unknown#1%
[distress];
etc
 Ann Nguyen posted on Thursday, July 02, 2015 - 8:42 am
Thank you, Dr. Muthen.
 Ann Nguyen posted on Thursday, September 10, 2015 - 8:36 am
I am using a manual 3 step LCA approach to predict the effects of demographic covariates (sex & educ) on latent class membership. I'm also interested in understanding how the relationships between the covariates and class membership vary by age. Thus, in step 3 I am using age (3 categories) as a knownclass variable. However, with the syntax below, I end up getting regression results for age by class (e.g., pattern 1 1), but I am only interested in results by age groups. That is, I would like multinomial logistic regression results (similar to those generated from the R3STEP command) for latent classes regressed on covariates for each age group. Could you please advise me how to correct the following syntax to achieve this?

CLASSES = age(3) C(5);
KNOWNCLASS = age (age3cat=1 age3cat=2 age3cat=3); ...

MODEL:
%OVERALL%
c#1-c#4 on sex educ;

MODEL C:
%c#1%
[n#1@2.421]; ...

%c#2%
[n#1@-0.052]; ...

%c#3%
[n#1@0.572]; ...

%c#4%
[n#1@-0.012]; ...

%c#5%
[n#1@-2.722]; ...


MODEL AGE:
%age#1%
c#1-c#4 on sex educ ;

%age#2%
c#1-c#4 on sex educ ;

%age#3%
c#1-c#4 on sex educ ;
 Tihomir Asparouhov posted on Thursday, September 10, 2015 - 4:22 pm
I don't see any problems with your syntax
(except this
%age#3%
c#1-c#4 on sex educ ;
You should remove that: the last class's results are what you get under %overall%, while the other classes' effects are added to the overall.)

Consider User's Guide Table on page 499 - which explains the parameterization.

Also consider Section 4.2 in web note 15
http://statmodel.com/download/webnotes/webnote15.pdf

If you are still in doubt, run each regression separately - using 3 separate runs, one for each age group.
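
A minimal sketch of what one of those three separate runs might look like, assuming the nominal most-likely-class variable is named n and the age variable is age3cat as in the post above. The USEVARIABLES list is an assumption, and the elided parts of the original input (the NAMES list and the fixed logits beyond n#1) are left as comments rather than filled in.

```mplus
VARIABLE:
  ! NAMES = ... ;                   ! as in the original step-3 input
  USEVARIABLES = n sex educ;        ! assumed list
  NOMINAL = n;
  CLASSES = c(5);                   ! age is no longer a KNOWNCLASS variable
  USEOBSERVATIONS = age3cat EQ 1;   ! use EQ 2 and EQ 3 in the other two runs
ANALYSIS:
  TYPE = MIXTURE;
  STARTS = 0;
MODEL:
  %OVERALL%
  c#1-c#4 ON sex educ;
  ! %c#1% through %c#5%: keep the fixed [n#k@logit] statements
  ! from the original step-3 input unchanged.
```

Each run then gives one set of multinomial logistic regression coefficients for its own age group, which can be compared against the KNOWNCLASS results.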
 Dina Dajani posted on Monday, March 07, 2016 - 10:36 am
Hello,

I am using the manual 3-step approach to determine the effect of a latent class variable on a distal outcome (with a covariate). So, my paths are Y on C and Y on X. When I run the first step, I get in my output the average latent class probabilities for the most likely latent class membership, but I do not get the table with the logits for classification probabilities, which I need to run step 3. Is there a command that I am missing to specify that output?

Thank you,
Dina Dajani
 Bengt O. Muthen posted on Monday, March 07, 2016 - 5:51 pm
I think the output should be there.

Send output to Support along with your license number.
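
For reference, the logits in that table can be read as follows (this is a reading of Web Note 15, with the notation N, C and K introduced here only for illustration): with N the most likely class variable, C the latent class, and K the number of classes, the entry for latent class c and category k is the log-odds of being assigned to k rather than to the last category K.

```latex
\[
\operatorname{logit}_{c,k}
  \;=\; \log \frac{P(N = k \mid C = c)}{P(N = K \mid C = c)},
  \qquad k = 1, \dots, K-1 .
\]
```

With K = 3 classes, for example, there are two such logits per latent class, and step 3 fixes the nominal variable's intercepts at these values within each class.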
 Dina Dajani posted on Tuesday, March 08, 2016 - 8:19 am
I figured out that I was using an older version of Mplus but when I used Mplus 7.4 I got the logit output. I had a question about whether I am specifying my model correctly. To run the regression described above, my syntax is below. Is it correct?

USEVARIABLES are

Y
X
N;
NOMINAL=N;
MODEL:
%overall%
Y on X;
C#1 on X;
C#2 on X;

%C#1%
[N#1@3.605];
[N#2@-.037];
[Y] (m1);

%C#2%
[N#1@-.589];
[N#2@2.423];
[Y] (m2);

%C#3%
[N#1@-4.535];
[N#2@-3.855];
[Y] (m3);

Model test:
m1=m2;
m1=m3;
m2=m3;
 Bengt O. Muthen posted on Tuesday, March 08, 2016 - 10:06 am
Looks right, but also check against section 3.2 of our BCH paper.
 Dina Dajani posted on Tuesday, March 08, 2016 - 10:37 am
Thank you very much for your quick reply. The issue I am having is with a "singular covariance matrix" and so the Wald test cannot be computed (which is the main result I am interested in). I believe the issue is that there is no variance of some covariates in certain classes. Is there any way to resolve this?

WARNING: THE SAMPLE VARIANCE OF X1 IN CLASS 1 IS 0.000.


WARNING: THE SAMPLE CORRELATION OF X2 AND X1
IN CLASS 3 IS -1.000.

ONE OR MORE MULTINOMIAL LOGIT PARAMETERS WERE FIXED TO AVOID SINGULARITY OF THE INFORMATION MATRIX... THE FOLLOWING PARAMETERS WERE FIXED:
Parameter 11, C#1 ON X1
Parameter 15, C#2 ON X2


THE MODEL ESTIMATION TERMINATED NORMALLY

WALD'S TEST COULD NOT BE COMPUTED BECAUSE OF A SINGULAR COVARIANCE MATRIX.
 Linda K. Muthen posted on Tuesday, March 08, 2016 - 2:07 pm
Please send the output and your license number to support@statmodel.com.
 Ali posted on Wednesday, September 28, 2016 - 3:54 am
I am trying the 3-step manual approach to see the relationship between class membership and 11 countries. However, I get the error message "Unknown class label: %C#1%".
Here is my code:
Names are ST53Q01 ST53Q02 ST53Q03 ST53Q04 SENWGT_S SCHOOLID STIDSTD ST04Q01
ESCS PV1MATH PV2MATH PV3MATH PV4MATH PV5MATH COUNTRY CPROB1 CPROB2 CPROB3 n;
Usevariables are n ;
Classes=CNT(11)c(3);
Nominal is n;
Knownclass = CNT(COUNTRY=1
COUNTRY=2
COUNTRY=3
COUNTRY=4
COUNTRY=5
COUNTRY=6
COUNTRY=7
COUNTRY=8
COUNTRY=9
COUNTRY=10
COUNTRY=11);
WEIGHT=SENWGT_S;
Analysis: Type=Mixture;
Starts=0;
Model:
%overall%
c on CNT;
Model C;
%C#1%
[N#1@1.734];
[N#2@-1.446];
%C#2%
[N#1@-2.435];
[N#2@-1.202];
%C#3%
[N#1@0.289];
[N#2@-2.901];
 Bengt O. Muthen posted on Wednesday, September 28, 2016 - 3:00 pm
You have a typo

Model C;

This line should end with a colon, not a semicolon.
 Ali posted on Thursday, September 29, 2016 - 1:34 am
Thank you! I added a colon after Model C, but now I get this error message:
"*** ERROR in MODEL command
Unknown class model name C specified in C-specific MODEL command."

Here is my code:
Variable:
Names are ST53Q01 ST53Q02 ST53Q03 ST53Q04 SENWGT_S SCHOOLID STIDSTD ST04Q01
ESCS PV1MATH PV2MATH PV3MATH PV4MATH PV5MATH COUNTRY CPROB1 CPROB2 CPROB3 n;
Usevariables are n COUNTRY ;
Classes=CNT(11)c(3);
Nominal is n;
Knownclass = CNT(COUNTRY=1
COUNTRY=2
COUNTRY=3
COUNTRY=4
COUNTRY=5
COUNTRY=6
COUNTRY=7
COUNTRY=8
COUNTRY=9
COUNTRY=10
COUNTRY=11);
WEIGHT=SENWGT_S;
Analysis: Type=Mixture;
Starts=0;
Model:
%overall%
c on CNT;
Model C:
%C#1%
[N#1@1.734];
[N#2@-1.446];
%C#2%
[N#1@-2.435];
[N#2@-1.202];
%C#3%
[N#1@0.289];
[N#2@-2.901];
 Bengt O. Muthen posted on Thursday, September 29, 2016 - 9:59 am
You say:

Classes=CNT(11)c(3);

Try adding a space before c(3).
 RuoShui posted on Tuesday, January 10, 2017 - 3:32 pm
Dear Dr. Muthen,

May I please confirm with you that the equality tests of means across classes using the BCH procedure were based on Wald's test?

Thank you.
 Bengt O. Muthen posted on Tuesday, January 10, 2017 - 3:47 pm
Yes.
 db40 posted on Monday, November 27, 2017 - 1:22 pm
Dear Dr Muthen,

1) If I were to conduct a 3-class LCA with covariates (age and gender) using R3STEP, should I expect the estimates etc. to be the same when running the same model manually via the 3-step procedure?

2) Also, should I expect the class counts and proportions from the 1st step to be the same as those of the most likely class nominal variable?

thank you
 Bengt O. Muthen posted on Monday, November 27, 2017 - 5:11 pm
Q1: Yes

Q2: No - but the Most Likely Class values should be the same between the first and last steps.