Manual 3-step approach

Mplus Discussion > Latent Variable Mixture Modeling >

Ann Nguyen posted on Friday, October 17, 2014 - 3:27 pm

I am using the manual three-step latent class analysis approach (Asparouhov & Muthén, 2013) in order to predict a series of auxiliary distal outcomes.

In step 1, I use my entire dataset to estimate the latent class model. Thus, the “most likely class” variable derived in step 2 is based on the entire dataset as well.

In step 3, I use the “most likely class” variable from step 2 to predict multiple distal outcomes. (Step 3 was run separately for each distal outcome.) All except for one of my distal outcome variables includes all respondents in the dataset. The one exception is the distal outcome variable “depression severity.” In order for a respondent in this dataset to have a “depression severity” score, they must first be diagnosed with depression. Only a small proportion of respondents in this dataset have a depression diagnosis. Thus, in step 3, I am only using a subsample (i.e., respondents with a depression diagnosis) of the original dataset to predict “depression severity.”

I am wondering if, in this instance of predicting the “depression severity” distal outcome, I am supposed to use only the ‘depressive subsample’ in step 1 (and subsequently step 2) rather than the entire sample.

Bengt O. Muthen posted on Friday, October 17, 2014 - 4:45 pm

It is better to use the entire sample in Step 1. This is in line with a one-step approach.

For a new distal outcome method called BCH that was introduced in Mplus Version 7.3, see the new Mplus Web Note 21 on our website:

Asparouhov, T. & Muthén B. (2014). Auxiliary variables in mixture modeling: Using the BCH method in Mplus to estimate a distal outcome model and an arbitrary second model. Mplus Web Notes: No. 21.

Ann Nguyen posted on Tuesday, October 21, 2014 - 10:28 am

Thank you for your help, Dr. Muthen. As always, I am very impressed with (and appreciate) your and your colleagues' quick response. I have a follow-up question for you. If I use the entire sample in step 1, is there a way to request the N for the regression analysis (with "depression severity" as the distal outcome) in step 3? The “SUMMARY OF ANALYSIS” section in step 3's output gives me “Number of observations,” but this appears to be the N for the entire sample and not the depressed subsample, which would have been used for the regression analysis.

Bengt O. Muthen posted on Tuesday, October 21, 2014 - 12:16 pm

If you run the DCON option you get this information.
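
For readers unfamiliar with the option, here is a minimal sketch of how DCON could be requested. It assumes the automatic AUXILIARY route in the latent class input (not the manual step 3 file). The file name, variable names, number of classes and missing-data code are all hypothetical and are not taken from the poster's setup.

```mplus
! Sketch only: every name and value below is a hypothetical placeholder.
DATA:     FILE = lca.dat;
VARIABLE: NAMES = u1-u6 depsev;
          USEVARIABLES = u1-u6;
          CATEGORICAL = u1-u6;
          MISSING = ALL (-999);        ! depsev is missing if undiagnosed
          CLASSES = c(3);
          AUXILIARY = depsev (DCON);   ! continuous distal outcome
ANALYSIS: TYPE = MIXTURE;
```

The distal outcome is listed only on the AUXILIARY line, so it does not influence the class formation, which still uses the entire sample.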

Ryan Cartnal posted on Monday, December 15, 2014 - 5:02 pm

I’m attempting to use the manual 3 step approach (as presented in webnote 15 and the associated appendices) to examine the relationships among latent class membership and several auxiliary covariates and one categorical distal outcome. 3,940 students are nested within 372 schools; all of my indicators and auxiliaries are categorical. Further, I am interested in using the parametric approach to modeling the latent class random intercepts (UG ex 10.6). I also have several predictors at the within and between levels.
Questions:
(1) For steps one and two of the manual 3 step, should I use TYPE=TWOLEVEL MIXTURE and then gather the resulting misclassification logits for use in the secondary model?
(2) If 1 is correct, should I specify the between level factor (as in UG ex 10.6) before I gather the logits or after as part of my secondary model?
I’ve compared class proportions, BIC, classification logits, etc. using TYPE=COMPLEX, TYPE=TWOLEVEL MIXTURE (without the between level factor) and TYPE=TWOLEVEL MIXTURE with the between level factor. Each method suggests the same number of classes, but the entropy, and classification logits are quite variable across the three approaches. Do you have any advice on how best to use the manual three step process (or another process) in the case of multilevel data?
Thanks in advance for your time and expertise!

Tihomir Asparouhov posted on Tuesday, December 16, 2014 - 2:09 pm

I can recommend only TYPE=COMPLEX MIXTURE. The 3 step methodology has not been developed and used yet for TYPE=TWOLEVEL MIXTURE.

Jen Doty posted on Saturday, January 03, 2015 - 1:59 pm

I have used GMM to identify 3 classes of closeness with mother over time between generation 1 and generation 2. I would like to use these classes to predict a distal outcome in the children of generation 3, but I have multiple children in families.

Would it be possible to use TYPE = MIXTURE COMPLEX in step 3 of the manual 3-step procedure, or would I need to use only 1 child from each family?

Bengt O. Muthen posted on Saturday, January 03, 2015 - 5:28 pm

Not sure that's implemented; try it. But wouldn't you want to consider similar age categories of children and therefore maybe use only 1 child from each family (and delete families who have no child in a certain category)?

db40 posted on Tuesday, May 12, 2015 - 6:25 am

Dear Bengt,

I am somewhat confused. I am looking to estimate a 4-class solution which has a number of covariates but also 6 distal outcomes.

I have been looking over webnote 15 to familiarise myself with it (manual approach). Then I stumbled upon webnote 21 (Using the BCH method to estimate a distal outcome model...)

Could you clarify which of these procedures I need to estimate this model?

Bengt O. Muthen posted on Tuesday, May 12, 2015 - 7:45 am

Our recommendations are shown in Tables 6 and 7 of the BCH paper. The covariates can be handled in an R3STEP run. How to do the distals including covariates is described in a section of the BCH paper.

db40 posted on Tuesday, May 12, 2015 - 8:47 am

Oh thank you. I guess since my variables are categorical I need to use the DCAT option.

db40 posted on Wednesday, May 13, 2015 - 7:24 am

Dear Bengt,

I have a query regarding one of your examples in the Asparouhov & Muthen (2014) paper "...using the BCH Method..."

On page 10 the variable command details 10 variables for step 1.

U1-U8 y x ;

In the second step (page 11) there are now five more variables listed, presumably for the BCH weights.

U1-U8 y x w1-w4 MLC ;

I am unable to get the second step to run, and I am guessing it's because there are now five extra variables that do not match the .dat file, since I get this error.

"Unexpected end of file reached in data file."

May I ask if the second step should be calling the .dat file that is saved out in step 1 (named as 2.dat)?

Thomas Olino posted on Thursday, May 14, 2015 - 6:42 am

Drs. Muthen and Asparouhov,

Is there a recommended method for examining latent classes as a predictor of time-until event?

Bengt O. Muthen posted on Thursday, May 14, 2015 - 11:11 am

db40:

Yes, there is a typo - it should say 2.dat.
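
To make the file hand-off concrete, here is a rough sketch of the two runs as separate input files. It follows the variable lists quoted above (u1-u8 y x, then w1-w4 and MLC). The name 1.dat for the original data file, the CATEGORICAL line, the four classes (inferred from w1-w4) and the class-specific regressions are assumptions for illustration, not a copy of the paper's input.

```mplus
! ---------- Input file 1: latent class model, save BCH weights ----------
DATA:     FILE = 1.dat;            ! assumed name of the original data
VARIABLE: NAMES = u1-u8 y x;
          USEVARIABLES = u1-u8;
          CATEGORICAL = u1-u8;     ! assumes categorical indicators
          CLASSES = c(4);
          AUXILIARY = y x;         ! saved with the data, not in the model
ANALYSIS: TYPE = MIXTURE;
SAVEDATA: FILE = 2.dat;
          SAVE = BCHWEIGHTS;

! ---------- Input file 2: secondary model, reads the saved file ---------
DATA:     FILE = 2.dat;            ! the file written by input file 1
VARIABLE: NAMES = u1-u8 y x w1-w4 mlc;
          USEVARIABLES = y x w1-w4;
          CLASSES = c(4);
          TRAINING = w1-w4(BCH);
ANALYSIS: TYPE = MIXTURE;
          STARTS = 0;
          ESTIMATOR = MLR;
MODEL:    %OVERALL%
          y ON x;
          %c#1%
          y ON x;
          %c#2%
          y ON x;
          %c#3%
          y ON x;
          %c#4%
          y ON x;
```

The "Unexpected end of file" error is what one would expect if the second input lists 15 variables but still points at the original 10-column file. The SAVEDATA information at the end of the first output lists the actual order of the saved variables, which is worth checking against the NAMES statement.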

Tihomir Asparouhov posted on Thursday, May 14, 2015 - 11:42 am

Answer to Thomas Olino:

The recommended method is illustrated in User's Guide example 8.17 and is related to the setting Basehazard=OFF(equal), which is actually the default setting. The effect of the latent class is summarized in the class-specific mean for the survival variable, which also has a mean of zero in the reference/last class. The method originates in

Larsen, K. (2004), “Joint Analysis of Time-to-Event and Multiple Binary Indicators of Latent Classes,” Biometrics, 60(1), 85–92.

Ann Nguyen posted on Tuesday, June 30, 2015 - 10:33 am

I am using the manual three-step latent class analysis approach (Asparouhov & Muthén, 2013) in order to predict an auxiliary distal outcome (DISTRESS). My goal is to examine whether the latent classes predict DISTRESS. I understand how to set up this syntax. However, I am also interested in how the relationship between the latent classes and DISTRESS varies across 3 age groups. I think I need to add KNOWNCLASS to the analysis in order to analyze by age groups, but I do not know how to set up the syntax so that I would get auxiliary distal outcome analyses for each age group/”known class.” Is this possible?

Bengt O. Muthen posted on Wednesday, July 01, 2015 - 5:26 pm

Use the dot option:

%known#1.unknown#1%
[distress];
etc

Ann Nguyen posted on Thursday, July 02, 2015 - 8:42 am

Thank you, Dr. Muthen.

Ann Nguyen posted on Thursday, September 10, 2015 - 8:36 am

I am using a manual 3 step LCA approach to predict the effects of demographic covariates (sex & educ) on latent class membership. I'm also interested in understanding how the relationships between the covariates and class membership vary by age. Thus, in step 3 I am using age (3 categories) as a knownclass variable. However, with the syntax below, I end up getting regression results for age by class (e.g., pattern 1 1), but I am only interested in results by age groups. That is, I would like multinomial logistic regression results (similar to those generated from the R3STEP command) for latent classes regressed on covariates for each age group. Could you please advise me how to correct the following syntax to achieve this?

CLASSES = age(3) C(5);
KNOWNCLASS = age (age3cat=1 age3cat=2 age3cat=3); ...

MODEL:
%OVERALL%
c#1-c#4 on sex educ;

MODEL C:
%c#1%
[n#1@2.421]; ...

%c#2%
[n#1@-0.052]; ...

%c#3%
[n#1@0.572]; ...

%c#4%
[n#1@-0.012]; ...

%c#5%
[n#1@-2.722]; ...

MODEL AGE:
%age#1%
c#1-c#4 on sex educ ;

%age#2%
c#1-c#4 on sex educ ;

%age#3%
c#1-c#4 on sex educ ;

Tihomir Asparouhov posted on Thursday, September 10, 2015 - 4:22 pm

I don't see any problems with your syntax
(except this
%age#3%
c#1-c#4 on sex educ ;
You should remove that - the last class results are what you get as %overall%, while the other classes' effects are added to the overall)

Consider User's Guide Table on page 499 - which explains the parameterization.

Also consider Section 4.2 in web note 15
http://statmodel.com/download/webnotes/webnote15.pdf

If you are still in doubt, run each regression separately - using 3 separate runs, one for each age group.

Dina Dajani posted on Monday, March 07, 2016 - 10:36 am

Hello,

I am using the manual 3-step approach to determine the effect of a latent class variable on a distal outcome (with a covariate). So, my paths are Y on C and Y on X. When I run the first step, I get in my output the average latent class probabilities for the most likely latent class membership, but I do not get the table with the logits for classification probabilities, which I need to run step 3. Is there a command that I am missing to specify that output?

Thank you,
Dina Dajani

Bengt O. Muthen posted on Monday, March 07, 2016 - 5:51 pm

I think the output should be there.

Send output to Support along with your license number.

Dina Dajani posted on Tuesday, March 08, 2016 - 8:19 am

I figured out that I was using an older version of Mplus but when I used Mplus 7.4 I got the logit output. I had a question about whether I am specifying my model correctly. To run the regression described above, my syntax is below. Is it correct?

USEVARIABLES are

Y
X
N;
NOMINAL=N;
MODEL:
%overall%
Y on X;
C#1 on X;
C#2 on X;

%C#1%
[N#1@3.605];
[N#2@-.037];
[Y] (m1);

%C#2%
[N#1@-.589];
[N#2@2.423];
[Y] (m2);

%C#3%
[N#1@-4.535];
[N#2@-3.855];
[Y] (m3);

Model test:
m1=m2;
m1=m3;
m2=m3;

Bengt O. Muthen posted on Tuesday, March 08, 2016 - 10:06 am

Looks right, but also check against section 3.2 of our BCH paper.

Dina Dajani posted on Tuesday, March 08, 2016 - 10:37 am

Thank you very much for your quick reply. The issue I am having is with a "singular covariance matrix" and so the Wald test cannot be computed (which is the main result I am interested in). I believe the issue is that there is no variance of some covariates in certain classes. Is there any way to resolve this?

WARNING: THE SAMPLE VARIANCE OF X1 IN CLASS 1 IS 0.000.

WARNING: THE SAMPLE CORRELATION OF X2 AND X1
IN CLASS 3 IS -1.000.

ONE OR MORE MULTINOMIAL LOGIT PARAMETERS WERE FIXED TO AVOID SINGULARITY OF THE INFORMATION MATRIX... THE FOLLOWING PARAMETERS WERE FIXED:
Parameter 11, C#1 ON X1
Parameter 15, C#2 ON X2

THE MODEL ESTIMATION TERMINATED NORMALLY

WALD'S TEST COULD NOT BE COMPUTED BECAUSE OF A SINGULAR COVARIANCE MATRIX.

Linda K. Muthen posted on Tuesday, March 08, 2016 - 2:07 pm

Please send the output and your license number to support@statmodel.com.

Ali posted on Wednesday, September 28, 2016 - 3:54 am

I am trying the manual 3-step approach to see the relationship between class membership and 11 countries. However, I get the error message "Unknown class label: %C#1%".
Here is my code:
Names are ST53Q01 ST53Q02 ST53Q03 ST53Q04 SENWGT_S SCHOOLID STIDSTD ST04Q01
ESCS PV1MATH PV2MATH PV3MATH PV4MATH PV5MATH COUNTRY CPROB1 CPROB2 CPROB3 n;
Usevariables are n ;
Classes=CNT(11)c(3);
Nominal is n;
Knownclass = CNT(COUNTRY=1
COUNTRY=2
COUNTRY=3
COUNTRY=4
COUNTRY=5
COUNTRY=6
COUNTRY=7
COUNTRY=8
COUNTRY=9
COUNTRY=10
COUNTRY=11);
WEIGHT=SENWGT_S;
Analysis: Type=Mixture;
Starts=0;
Model:
%overall%
c on CNT;
Model C;
%C#1%
[N#1@1.734];
[N#2@-1.446];
%C#2%
[N#1@-2.435];
[N#2@-1.202];
%C#3%
[N#1@0.289];
[N#2@-2.901];

Bengt O. Muthen posted on Wednesday, September 28, 2016 - 3:00 pm

You have a typo

Model C;

This line should end with a colon, not a semicolon.

Ali posted on Thursday, September 29, 2016 - 1:34 am

Thank you! I added a colon after Model C, but now I get the error message "*** ERROR in MODEL command Unknown class model name C specified in C-specific MODEL command."

Here is my code:
Variable:
Names are ST53Q01 ST53Q02 ST53Q03 ST53Q04 SENWGT_S SCHOOLID STIDSTD ST04Q01
ESCS PV1MATH PV2MATH PV3MATH PV4MATH PV5MATH COUNTRY CPROB1 CPROB2 CPROB3 n;
Usevariables are n COUNTRY ;
Classes=CNT(11)c(3);
Nominal is n;
Knownclass = CNT(COUNTRY=1
COUNTRY=2
COUNTRY=3
COUNTRY=4
COUNTRY=5
COUNTRY=6
COUNTRY=7
COUNTRY=8
COUNTRY=9
COUNTRY=10
COUNTRY=11);
WEIGHT=SENWGT_S;
Analysis: Type=Mixture;
Starts=0;
Model:
%overall%
c on CNT;
Model C:
%C#1%
[N#1@1.734];
[N#2@-1.446];
%C#2%
[N#1@-2.435];
[N#2@-1.202];
%C#3%
[N#1@0.289];
[N#2@-2.901];

Bengt O. Muthen posted on Thursday, September 29, 2016 - 9:59 am

You say:

Classes=CNT(11)c(3);

Try adding a space before c(3).
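
Putting the two suggestions from this exchange together, the affected lines would look roughly like the fragment below. It is not a complete input: the remaining VARIABLE and ANALYSIS statements are unchanged from the post above. The thread does not confirm whether this resolves the error.

```mplus
! Fragment only: other statements stay as posted.
VARIABLE:
  CLASSES = CNT(11) c(3);     ! space between the two class variables
MODEL:
  %OVERALL%
  c ON CNT;
MODEL c:                      ! ends with a colon, not a semicolon
  %c#1%
  [n#1@1.734];
  [n#2@-1.446];
  %c#2%
  [n#1@-2.435];
  [n#2@-1.202];
  %c#3%
  [n#1@0.289];
  [n#2@-2.901];
```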

RuoShui posted on Tuesday, January 10, 2017 - 3:32 pm

Dear Dr. Muthen,

May I please confirm with you that the equality tests of means across classes using the BCH procedure were based on Wald's test?

Thank you.

Bengt O. Muthen posted on Tuesday, January 10, 2017 - 3:47 pm

Yes.

db40 posted on Monday, November 27, 2017 - 1:22 pm

Dear Dr Muthen,

If I were to conduct a 3-class LCA with covariates using (R3STEP) for age and gender, should I expect the estimates etc. to be the same when conducting the same model manually via the 3-step procedure?

2) Also, should I expect the class counts and proportions from the 1st step to be the same as those of the most likely class nominal variable?

thank you

Bengt O. Muthen posted on Monday, November 27, 2017 - 5:11 pm

Q1: Yes

Q2: No - but the Most Likely Class values should be the same between the first and last steps.

nidhi gupta posted on Wednesday, April 04, 2018 - 6:35 am

Dear Dr Muthen
I am performing a manual 3-step approach, following the web note by T. Asparouhov (2014).
I want to get regression results at step 3 stratified by one variable. However, when I tried modelling, I got the following error:
"External training variables are not supported when KNOWNCLASS option is used."

This is my code:

VARIABLE: NAMES are V1-V8 a g b f w s d al i mi W1-W4 C1-C4 m;
Usevar are a g b s d al W1 W2 W3 W4;
Classes = C(4) cg(2);
KNOWNCLASS=cg(mi=0, mi=1);
Missing=*;
Training=W1, W2, W3, W4(bch);
Data: file=step2.dat;
Analysis:
OPTSEED = 851945;
TYPE = MIXTURE;
ESTIMATOR = MLR;
Model:
%overall%
C on a g s d al ;
b on a g s d al ;
%C#1%
b on a g s d al;
%C#2%
b on a g s d al;
%C#3%
b on a g s d al;
%C#4%
b on a g s d al;

Could you please help me?

Bengt O. Muthen posted on Wednesday, April 04, 2018 - 3:46 pm

Then you may need to represent the groups using dummy variable covariates.

Shin, Tacksoo posted on Wednesday, April 11, 2018 - 9:07 am

Dear Dr. Muthen,

Hello,

I have two latent class variables (i.e., c1#1, c1#2, c1#3; c2#1, c2#2, c2#3) and a continuous distal outcome. I am trying to test the effects (main or possibly interaction?) of the two class variables on the outcome.

Do you have any suggestion for this analysis? Although I have done BCH, DCON, 3-step analysis (manual step estimation), etc, I couldn't get what I want.

Always thanks for your help.

Warmly,
Tacksoo

Bengt O. Muthen posted on Wednesday, April 11, 2018 - 4:27 pm

The effects of the latent class variables are seen in the class-varying intercepts of the distal outcomes.

Shin, Tacksoo posted on Wednesday, April 11, 2018 - 5:33 pm

Thank you for quick reply. By the way, I already checked the class varying intercepts of the distal outcomes. Interestingly, I got an error message when I simultaneously tested the effects of both variables.

My code is;

MODEL C1:
%C1#1%
[score](a1); [N1#1@2.9]; [N1#2@-0.7];
%C1#2%
[score](a2); [N1#1@11.8]; [N1#2@13.8];
%C1#3%
[score](a3); [N1#1@-2.4]; [N1#2@-13.7];

MODEL C2:
%C2#1%
[score] (a4); [N2#1@3.1]; [N2#2@1.4];
%C2#2%
[score] (a5); [N2#1@-0.04]; [N2#2@2.6];
%C2#3%
[score] (a6); [N2#1@-3.8]; [N2#2@-1.9];

MODEL TEST:
0 = a1-a2; 0 = a1-a3;
0= a4-a5; 0= a4-a6;

I had the result of "score on C1", if I removed some lines regarding "C2" (i.e., [score] (a4), [score] (a5), [score] (a6), 0=a4-a5, 0=a4-a6) . But, when I tried to run the whole, there was an error message "Unknown parameter label in MODEL TEST: A1". Or, just run separately and report results from two different ones?

Bengt O. Muthen posted on Thursday, April 12, 2018 - 12:27 pm

Send the output for your run with the error message so we can see what you did wrong - send to Support along with your license number.

Olivenne Skinner posted on Friday, May 31, 2019 - 5:52 am

Hello:

I created profiles of families based on parents' characteristics and now I am interested in examining child outcomes. We have siblings in the data so the outcome variables are nested.

Is there a way to use the BCH method to account for nesting in the dependent variables even though this is not relevant for creating the profiles?

Tihomir Asparouhov posted on Friday, May 31, 2019 - 9:35 am

You can use type=complex mixture with the manual BCH to account for the nesting of observations in the second stage estimation using cluster=familyID.
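
A minimal sketch of what such a second-stage input might look like, assuming a three-profile solution and a child-level data file into which each family's step 1 BCH weights have already been merged. The file name, variable names and number of classes are hypothetical.

```mplus
! Sketch only: file name, variable names and class count are hypothetical.
DATA:     FILE = child_bch.dat;
VARIABLE: NAMES = famid outc w1-w3 mlc;
          USEVARIABLES = outc w1-w3;
          CLASSES = c(3);
          TRAINING = w1-w3(BCH);
          CLUSTER = famid;            ! siblings share a family ID
ANALYSIS: TYPE = COMPLEX MIXTURE;
          STARTS = 0;
          ESTIMATOR = MLR;
MODEL:    %OVERALL%
          outc;
          %c#1%
          [outc] (m1);
          %c#2%
          [outc] (m2);
          %c#3%
          [outc] (m3);
MODEL TEST:
          m1 = m2;                    ! two constraints are enough here:
          m2 = m3;                    ! m1 = m3 is implied by them
```

The MODEL TEST block requests a Wald test of equal outcome means across the three profiles. Listing all three pairwise equalities would add a redundant constraint.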

Isabella Lanza posted on Tuesday, October 15, 2019 - 8:01 am

I have used the 3-step manual approach (ml) before for a LCA model with covariates and a distal outcome, but I have a question about whether I can use the BCH manual method as well for a particular question.

If I only have binary covariates (distal outcomes are continuous), would BCH be ok to use? It's unclear whether BCH is recommended for binary vs. continuous covariates.

Thank you.

Bengt O. Muthen posted on Wednesday, October 16, 2019 - 9:43 am

Yes, BCH is ok to use here. The scale of the covariates is not relevant for that.

Livia S. posted on Thursday, February 27, 2020 - 3:37 am

Dear Drs. Muthen and Asparouhov,

Is the R3STEP approach applicable to a class-invariant (CI) growth mixture model? I cannot find examples, mostly are on GMM-CV.

My thought is that only the manual one is possible for GMM-CIs. This is because, after running the unconditional GMM-CI, the uncertainty rates obtained need to be added in the third step.

Is that correct? Otherwise, which syntax should I add to the R3STEP to specify that my model has class invariance?

Thank you.
Yours Sincerely,
L.

Bengt O. Muthen posted on Thursday, February 27, 2020 - 11:43 am

What is class-invariant GMM?

Livia S. posted on Thursday, February 27, 2020 - 2:01 pm

Dear Dr. Muthen,

Sorry, I forgot to mention I am using an approach suggested in the book "HIGHER-ORDER GROWTH CURVES AND MIXTURE MODELING WITH MPLUS" by Wickrama et al. (2016).

"The second approach is a GMM-CI, where variances and covariances are constrained to be the same across all classes (class-invariant variances and covariances). This is the default model in Mplus. The other approach is a GMM-CV, where variances and covariances are freed to be estimated for all classes (class-varying variances and covariances)." (p.225)

Hope this makes my question clearer.
Best regards,
L.

Bengt O. Muthen posted on Thursday, February 27, 2020 - 5:36 pm

I don't see off-hand why R3STEP would be problematic for GMM-CI. The GMM-CI is the first step and is also used in the last step. You can specify that model.

shonnslc posted on Monday, May 18, 2020 - 10:54 pm

Hi
I am trying the 3-step approach for the arbitrary secondary model. I have some questions:

1. I used Appendix D in Web Notes #15 to generate the data. However, in step 1, "Classification Probabilities for the Most Likely Latent Class Membership (Column) by Latent Class (Row)" in the Mplus output do not match the number on p. 18. This is what I had:

.830 .046 .124
.072 .811 .117
.099 .094 .807

2. Can I directly use the values in the 3x3 matrix of the "Logits for the Classification Probabilities for the Most Likely Latent Class Membership (Column) by Latent Class (Row)" in the Mplus output instead of computing the measurement error based on the formula on p. 4?

3. When I used the formula on p. 4 to compute the log ratios for each class c (p. 17) from the probabilities provided above, the values do not match what is provided in the Mplus output:

N1 = 346; N2 = 306; N3 = 348
q11 =.836;log(q11/q31)= 2.12
q21 =.064;log(q21/q31)= -.45
q31 =.100
q12 =.053;log(q12/q32)= -.72
q22 =.836;log(q22/q32)= 2.03
q32 =.110
q13 =.119;log(q13/q33)= -1.88
q23 =.100;log(q23/q33)= -2.06
q33 =.781

4. I found that there is still shift in number of people in each class:
Step 1
N1: 346
N2: 306
N3: 348

Step 3
N1: 340
N2: 304
N3: 356

Thank you very much!!

Tihomir Asparouhov posted on Tuesday, May 19, 2020 - 3:50 pm

1. You are getting the middle section of Table 1. To get the top section of Table 1 use
output:ppclass;

2. Yes

3. See the footnotes on page 7
https://www.statmodel.com/download/3stepOct28.pdf
This explains which table to use at which step. You started with the wrong Table. You would have to start with the Table that you get with
output:ppclass;

4. Small shifts are expected.
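
As an illustration of how the logit table relates to the classification probabilities, here is a worked computation. It assumes that the 3x3 matrix quoted in question 1 is the "Classification Probabilities for the Most Likely Latent Class Membership (Column) by Latent Class (Row)" table, and that each logit is the log of a row entry divided by the last entry in the same row. Mplus works from unrounded probabilities, so the printed logits may differ slightly in the third decimal.

```latex
% Row i = latent class C, column j = most likely class N; reference column is N = 3.
\ell_{ij} = \ln \frac{P(N = j \mid C = i)}{P(N = 3 \mid C = i)}, \qquad i = 1,2,3, \quad j = 1,2

\begin{aligned}
C = 1:&\quad \ln(0.830/0.124) \approx 1.901,  &\quad \ln(0.046/0.124) \approx -0.992 \\
C = 2:&\quad \ln(0.072/0.117) \approx -0.486, &\quad \ln(0.811/0.117) \approx 1.936 \\
C = 3:&\quad \ln(0.099/0.807) \approx -2.098, &\quad \ln(0.094/0.807) \approx -2.150
\end{aligned}
```

Under these assumptions, the step 3 input would fix [N#1@1.901] and [N#2@-0.992] in %C#1%, and likewise for the other two classes.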

mboer posted on Wednesday, May 20, 2020 - 2:20 am

Dear Prof. Muthen,

I have estimated an unconditional 3-class GMM-ZIP model with intercept variance. Next, I would like to predict class membership (N) with gender using the manual R3step approach. Please find my syntax below.

variable: NAMES =
IGD_A IGD_B IGD_C IGD_D RESP FEMALE
I S II SI C_I C_S C_II C_SI
CPROB1 CPROB2 CPROB3 N;
analysis: TYPE=MIXTURE;
model: %OVERALL%
C#1 C#2 ON FEMALE;
%C#1%
[N#1@1.800];
[N#2@0.395];
%C#2%
[N#1@-1.332];
[N#2@-0.016];
%C#3%
[N#1@-8.098];
[N#2@-3.164];

The output generated from this syntax shows that the univariate counts for N match the final class counts as observed in the first step (i.e. the GMM ZIP model). However, the final class counts as observed in the third step are different from the final class counts in the first step. Why do these final class counts differ? Or did I perhaps misspecify the third step, or is this approach not appropriate for random intercept models?

Bengt O. Muthen posted on Wednesday, May 20, 2020 - 4:58 pm

We need to see the outputs from all 3 steps - send to Support along with your license number.