Message/Author 

Lewina Lee posted on Wednesday, October 31, 2012 - 4:09 pm



Dear Drs. Muthen, I would like to do (1) an LPA of 21 continuous variables, and (2) test class membership in relation to 9 covariates (x's) and 2 distal outcomes (y's). I intend to use the procedure for manually implementing the 3-step approach described in Mplus Web Note 15, v5. In the 3-step approach, given that the measurement model is estimated independently of the auxiliary variables, does it make sense to proceed in the following manner?
1. Do class enumeration in Step 1 (e.g., run Step 1 with 1-8 classes) to identify the best one or two models while specifying AUXILIARY = x1, x2, ..., x9, y1, y2.
2. Do Step 2 (calculating measurement error for the most likely class variable) for the best one or two models identified in class enumeration.
3. Do Step 3 (estimating the auxiliary model while specifying the latent class model with the measurement errors obtained in Step 2) for the best one or two models from class enumeration.
Can latent class membership (C) be regressed on covariates? Can distal outcomes be regressed on class membership in Step 3? Is it accurate to say that class membership will not shift regardless of modifications to the auxiliary model (e.g., adding/removing covariates and distal outcomes)? Thank you, Lewina

Lewina Lee posted on Wednesday, October 31, 2012 - 4:18 pm



One more short question in addition to the above: in Mplus Web Note 15 v5, Appendix F, Step 3 of the manually implemented 3-step model is specified with STARTS=0. Are users supposed to follow that in actual analyses? (In Step 1, the authors noted that STARTS=0 was only specified to retain the order of classes from the data-generation step, and that users should remove it in actual analyses.) Thank you, Lewina


Regarding your questions "Can latent class membership (C) be regressed on covariates? Can distal outcomes be regressed on class membership in Step 3?" - you can do that automatically using R3STEP and DU3STEP, but you can also do it manually. The class membership will not change due to the auxiliaries. For Step 3, STARTS=0 should be used because the class membership is essentially known.
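For reference, the manual Step 3 setup described in the web note combines fixed nominal logits with the auxiliary regressions. A rough sketch for a hypothetical 3-class model with one distal outcome Y and two covariates follows; the @ values are placeholders standing in for the logits you compute in Step 2 from your own Step 1 classification table:

```
VARIABLE:  USEVARIABLES = y x1 x2 n;
           NOMINAL = n;           ! most likely class from Step 1
           CLASSES = c(3);
ANALYSIS:  TYPE = MIXTURE;
           STARTS = 0;            ! class membership is essentially known
MODEL:
  %OVERALL%
  y ON x1 x2;
  c ON x1 x2;
  %c#1%
  [n#1@2.9];  [n#2@-1.2];        ! placeholder logits - use your Step 1 values
  %c#2%
  [n#1@-1.5]; [n#2@2.6];
  %c#3%
  [n#1@-2.0]; [n#2@-1.8];
```

The key pieces (NOMINAL=, fixed [n#k@...] logits per class, STARTS=0) match the appendices in Web Note 15; everything else here is illustrative.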

Lewina Lee posted on Thursday, November 01, 2012 - 12:01 pm



Thank you for your quick response, Dr. Muthen. In Step 3, if any of the distal outcomes and/or covariates are binary, do I need to specify that in a "CATEGORICAL ARE x1 x2 y1" statement? 


No, that is not available. They are treated as continuous so for binary distals you will get proportions. 

Lewina Lee posted on Wednesday, November 07, 2012 - 8:15 pm



Dr. Muthen, regarding my question of 11/1/2012, 12:01 pm, on using the CATEGORICAL ARE statement with distal outcomes - could you please clarify what you meant by "not available"? I tried Step 3 of the manual 3-step approach by specifying:

MODEL:
%OVERALL%
Y ON x1 x2 x3;
c ON x1 x2 x3;
%C#1%
[N#1@ 5.04]; [N#2@ 2.38];
%C#2%
[N#1@ 0.53]; [N#2@ 3.28];
Y ON x1 x2 x3;
%C#3%
[N#1@ 4.42]; [N#2@ 2.64];
Y ON x1 x2 x3;

I tried it with vs. without the "CATEGORICAL = Y" statement. The regression results (p-values) for Y on X1-X3 are comparable in both cases. I see that I got an intercept for Y in each class when Y was modeled as continuous, as opposed to a threshold. Could you please help me understand why I could not use "CATEGORICAL ARE" with binary distals? What do I need to do to obtain P(distal = 1), or the odds of distal = 1 in one class versus another class? Thank you very much for all your help. Lewina


All variables on the AUXILIARY list are treated as continuous variables for the AUXILIARY functions whether they are on the CATEGORICAL list or not. 

Lewina Lee posted on Thursday, November 08, 2012 - 1:42 pm



If I am doing the manual 3-step approach, I do not need to use the AUXILIARY= statement, according to Web Note 15. (I only need AUXILIARY= in the automatic 3-step approach, e.g., when using DU3STEP.) Does that mean that, in Step 3 of the manual 3-step approach, I can specify covariates and outcomes with CATEGORICAL=? When I used CATEGORICAL= at Step 3 of the manual 3-step approach with a binary outcome, I was able to get "LOGISTIC REGRESSION ODDS RATIO RESULTS" in the output. I just want to verify that this is OK. Thank you, Lewina


Yes, it is. But you should not put any observed exogenous covariates on the CATEGORICAL list. This list is for dependent variables only. 

Lewina Lee posted on Thursday, November 08, 2012 - 4:32 pm



Thank you for the clarification, Linda! 


I am interested in testing whether means on a set of distal outcomes differ across growth trajectory classes (GMM), controlling for a set of covariates. The covariates have direct effects on the growth factor means (class indicators), and the class indicators have direct effects on the outcomes within class (constrained equal across classes). I used a one-step approach, but a reviewer suggested a 3-step approach. Two questions: (1) Can I test whether *adjusted* means for the distal outcomes differ between classes with a manual 3-step approach? (2) Given the direct effects of covariates on class indicators (and class indicator effects on the distal outcomes), with entropy = .63 (obtained from the one-step final model), would the one-step approach be better suited than the 3-step approach, based on the simulation results in Web Note 15? Thanks!


Scott - I don't think it is possible to do a 3-step approach for this model, because you have class indicator effects on the distal outcomes. Since the class indicators are latent variables, you can't use them in stage 3 (these latent variables, the growth factors, are measured and created in stage 1 only, so they won't be available in stage 3). Tihomir

cogdev posted on Monday, January 28, 2013 - 7:24 pm



I would like to use latent class membership from one series of indicators (along with a few other continuous covariates) to predict latent profile membership derived from a separate series of indicators. Clustering independently is theoretically important (separate domains), which is why the 3-step procedure is appealing. I can manually run the 3-step procedure separately for each latent class analysis (at least up to the 2nd step) to get the misclassification statistics for each one. A 4-profile/class solution fits best in both cases. Something along the lines of Ex7.14 appears to be close to what I need, except that I have a directed prediction from theory (actually more similar to Ex7.19, with a separate clustering variable instead of the factor) and a number of continuous covariates. So, what type of specification am I dealing with here, and how can I implement it (the AUXILIARY option doesn't seem designed to support this)? I can imagine that an analysis with categorical misclassification might be probability-based/fuzzy, or might need some sort of MCMC sampling? As a fallback, entropy is high (>.90) in both cases, so I guess I could 'hard-code' most likely cluster membership and run something like a multinomial logistic regression with covariates? Any help or direction here would be greatly appreciated.


With entropy of .9 or greater, you can use most likely class membership. You do not need the 3step procedure. 


We are using the manual 3step method to test predictors in a latent transition analysis with three timepoints. Changing the predictors changes our class sizes, particularly for the third timepoint. Do you know why this is happening? 


Are you following the approach in Web Note 15 shown in Appendices L, M, N, O? 


We have measurement noninvariance, so each LCA is estimated separately. The nominal most likely class variable is obtained from each LCA estimation, without constraining any of the item thresholds. Other than that, we are following the approach in Web Note 15 shown in Appendices L, M, N, and O. 


If you don't constrain item thresholds all bets are off for keeping the same class formation. 


You constrained the item thresholds in your individual LCAs so that they would be the same as the LCAs from the initial LTA, which had measurement invariance. You then used the most likely class variable from the individual LCAs to run a second LTA, this time with predictors. When you change your predictors, do the class sizes change? This is the problem we are experiencing. We don't need to constrain item thresholds in our individual LCAs because they don't need to be the same as the LCAs from an initial LTA, because we don't want measurement invariance. The class sizes are changing considerably when we use the most likely class variable to run an LTA with predictors. We aren't changing the most likely class variable, we're just changing the predictors. Does that make sense? 


I actually meant to refer to fixing the nominal parameters (not the threshold invariance) - I assume you are doing that fixing in the 3rd step. So you are following Appendices H, I, J. Those don't have a covariate. You could do your analyses on the Appendix K generated data, which include x, and follow the H-I-J steps to see what happens. I don't think we have explored that.


That's correct, we're fixing the nominal most likely class variable in the 3rd step. You think we should generate data with measurement invariance and a covariate, estimate LCAs without assuming measurement invariance, and then test the influence of that same covariate on our class solutions? Wouldn't that illuminate the consequences of not assuming measurement invariance when it exists? We would like to test the predictive strength of variables without changing the class solutions, like in r3step. It might be useful to generate data with two covariates and then test each independently to see if the class sizes change. However, we already have data and multiple covariates, and we already know the class sizes are changing. Why would this happen when we've fixed the probability of being in one class versus another? 


On your first question, I think it is a useful exercise to make sure that class formation doesn't change in this simple case. On your second question, please send inputs, outputs, data, and license number to support so we can diagnose it.


Will do! 


I see that you are using multiple-imputation data. There seems to be a lot of variation across imputations, given those huge SEs. A first step would be to analyze only one of those data sets. Also, which variables are imputed - the x's? That way there is no missingness on the latent class indicators and the nominal logits don't vary across imputations.


I am running a manual 3-step approach with auxiliary variables following Web Note 15, version 7. I have some missing data on my y variable, and I am using weights. When I run the syntax for Step 3, I get the following error message:

Invalid symbol in data file: "*" at record #: 2, field #: 44

Below is the code; what am I doing wrong?

NAMES ARE (...);
MISSING ARE all (9999);
USEVARIABLES ARE C1_7FP0 RACE3 SEX C7CONCPT N;
CLASSES = c(4);
NOMINAL = N;
CATEGORICAL ARE RACE3 SEX;
WEIGHT IS C1_7FP0;
Analysis:
TYPE = mixture;
ESTIMATOR = MLR;
STARTS = 600 120;
PROCESSORS = 4(STARTS);
Model:
%overall%
C7CONCPT on RACE3 SEX;
%C#1%
[N#1@5.010]; [N#2@8.791]; [N#3@0.161];
C7CONCPT on RACE3 SEX;
C7CONCPT;
%C#2%
[N#1@9.378]; [N#2@4.421]; [N#3@0.889];
C7CONCPT on RACE3 SEX;
C7CONCPT;
%C#3%
[N#1@1.031]; [N#2@0.496]; [N#3@4.920];
C7CONCPT on RACE3 SEX;
C7CONCPT;
%C#4%
[N#1@3.593]; [N#2@3.888]; [N#3@4.885];
C7CONCPT on RACE3 SEX;
C7CONCPT;

Many thanks in advance


Hello, I figured out my previous question, but now I have come across another error message (same syntax as above):

*** ERROR in MODEL command
Unknown threshold for NOMINAL variable N: N#2
*** ERROR in MODEL command
Unknown threshold for NOMINAL variable N: N#3
(the same pair of errors is repeated for each of the four classes)
*** ERROR
The following MODEL statements are ignored:
* Statements in Class 1: [ N#2 ] [ N#3 ]
* Statements in Class 2: [ N#2 ] [ N#3 ]
* Statements in Class 3: [ N#2 ] [ N#3 ]
* Statements in Class 4: [ N#2 ] [ N#3 ]
*** ERROR
One or more MODEL statements were ignored. These statements may be incorrect or are only supported by ALGORITHM=INTEGRATION.

I took the parameters from the "Logits for the Classification Probabilities" table from Step 1. What is producing the error? Thank you


P.S.: When I add ALGORITHM=INTEGRATION I still get the error message. Thank you (and apologies for all the postings).


In the future, please limit posts to one window. If you require more space it is not an appropriate question for Mplus Discussion. Please send your output and license number to support@statmodel.com. 


Dear Drs. Muthen, I would like to use the 3-step approach to examine the effects of some covariates on latent trajectory classes. Because I have censored data, I cannot use the auxiliary function (R3STEP), as it does not run with ALGORITHM = INTEGRATION. I would like to use the manual 3-step approach, but I do not know how to obtain the logits for the classification probabilities. In my output I only have the average latent class probabilities. How can I obtain the classification probabilities or logits in growth mixture models? Thank you so much!


All covariates are treated as continuous so this is not a problem. Download Version 7.11 to obtain the values you want. 
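For anyone who has the classification probability table but not the logit table: the logits Mplus prints are just the classification probabilities expressed relative to the last class, ln(p_k / p_K), row by row. A small sketch (the probabilities below are made up for illustration):

```python
import math

# Hypothetical classification probabilities: rows are latent classes,
# columns are most-likely-class categories (each row sums to 1).
probs = [
    [0.90, 0.07, 0.03],
    [0.05, 0.88, 0.07],
    [0.04, 0.06, 0.90],
]

def row_logits(row):
    """Nominal logits relative to the last category: ln(p_k / p_K)."""
    ref = row[-1]
    return [math.log(p / ref) for p in row]

for i, row in enumerate(probs, start=1):
    print("class", i, [round(v, 3) for v in row_logits(row)])
```

The last entry of each row is always 0, since the last category is the reference; the other entries are the values fixed with [N#k@...] in Step 3.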


Thank you so much! I downloaded version 7.11 and got the logits. 


I am running a model predicting continuous and categorical outcomes from a latent class variable and several covariates, based on the manual 3-step procedure. I am freely estimating the means and thresholds of the DVs for each class. I was wondering whether the means and thresholds of the DVs for each class are, by default, estimated at the mean values of all covariates in the model, or whether I have to mean-center them. We want to present the estimated means in a table for each class, but are unsure how they should be interpreted. Thanks.


The means of the covariates have an influence if the covariates have direct effects on the outcomes, in which case you can center them. 


Hi, we are interested in using latent class as a predictor of a distal binary outcome (using the manual version of the 3-step approach). Specifically, in Step 3 we want to test whether the thresholds for the binary outcome differ between our two classes. As far as we understand, we can do this with a Wald test using the MODEL TEST statement - that's what we did. In our first model, we use only latent class as a predictor. We get the Wald test as requested, but we also get an odds ratio and its corresponding significance test, which is exactly what we want. In our second model, we use latent class and some covariates as predictors of the outcome. When we include the covariates, the odds ratio we are interested in is no longer provided, just the requested Wald test. Would it be accurate to compute an odds ratio ourselves using the covariate-adjusted thresholds from the output? Or, if not, is there another way we can get this information from Mplus? We would like to present the degree to which the odds ratio changes after taking into account the effects of the covariates on the outcome. Thank you very much!


For both of your models you can express the odds ratio in Model Constraint using parameter labels from Model. That also gives you SEs so you can get a test. With covariates the odds ratio would be based on only the thresholds as you say. 


Thank you very much for your response, that was very helpful. From that, I have a followup question. I'm noticing that when I use Model Test to test threshold #2 = threshold #1 and compare that result to the "Latent Class Odds Ratio Results" that Mplus outputs automatically, or to the OR result from Model Constraint, I get quite different significance test results. The two OR significance tests are identical, but the test for the difference in thresholds from Model Test is quite different from the test of the OR. The only thought I've had is that the OR is being tested against a null value of 0 and not 1, but I'm not sure. I'm hoping someone can shed some light on my confusion. Thanks! 


You are right that the printed OR significance testing is the usual Mplus ratio: (Est - 0)/SE(Est). With ORs, the relevant ratio is instead (Est - 1)/SE(Est), so you have to do that by hand. Related to your testing, you might be interested in this paper on our website: Muthén, B., Brown, C.H., Masyn, K., Jo, B., Khoo, S.T., Yang, C.C., Wang, C.P., Kellam, S., Carlin, J., & Liao, J. (2002). General growth mixture modeling for randomized preventive interventions. Biostatistics, 3, 459-475. Section 3.2 deals with thresholds for a binary distal outcome.
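In numbers, the difference between the two nulls looks like this (the estimate and SE below are hypothetical, not from any actual output):

```python
import math

def two_sided_p(z):
    """Two-sided p-value for a standard-normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical OR estimate and SE from a "Latent Class Odds Ratio Results" line.
est, se = 1.90, 0.50

z_printed = (est - 0) / se   # the ratio the output prints: tests OR = 0
z_relevant = (est - 1) / se  # the relevant null for an odds ratio: OR = 1

print(round(z_printed, 2), round(two_sided_p(z_printed), 4))
print(round(z_relevant, 2), round(two_sided_p(z_relevant), 4))
```

With these numbers the printed test looks highly significant while the test against OR = 1 is not, which is exactly the discrepancy described above.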


Hi. How do I produce the table, "Logits for the Classification Probabilities for Most Likely Latent Class Membership (Column) by Latent Class (Row)"? 


That has been included in the output since Version 7, I believe.


Thanks. I have Mplus Version 7.2. My output does not have the logits table I asked about. It also does not have the table "Classification Probabilities for the Most Likely Latent Class Membership (Row) by Latent Class (Column)". The output does have the "Average Latent Class Probabilities..." table. The output is from nonparametric two-level mixture models that have categorical latent variables on both the within and between levels. These were demonstrated in Henry & Muthen (2010), "Multilevel Latent Class Analysis: An Application...", Structural Equation Modeling, vol. 17. The output states that AUXILIARY= is not available for R3STEP, DCON, etc., when there are latent variables on the between and within levels. The desired model is: (1) use Time 1 indicators to derive latent classes, (2) regress a Time 2 distal outcome on Time 1 class membership, (3) use the Time 1 version of the Time 2 distal outcome as a latent class covariate/auxiliary. Can this be done with the manual 3-step method if I have the table of logits (or the table of classification probabilities, from which to calculate the logits)? Thanks.


The 3-step methodology has not yet been developed and used for TYPE=TWOLEVEL MIXTURE.

Laura posted on Tuesday, April 07, 2015 - 6:43 am



Hi, it is said previously on this page that "With entropy of .9 or greater, you can use most likely class membership. You do not need the 3-step procedure." Does this also apply when analysing the predictors of latent classes in LCGA (using multinomial logistic regression, for example)? Is it possible to use the R3STEP method with Mplus Version 7, or do I need a newer version for that? I tried out an analysis with AUXILIARY(R3STEP) in LCGA. The results were completely different from those obtained with AUXILIARY(R), and the standard errors were zeros for many of the predictors. Thank you in advance!


Q1. Yes. Q2. If you don't get stopped when mentioning the R3STEP option, then it is available. But you should always use the latest Mplus version. The tables at the end of our BCH paper explain that AUX(R) is superseded by AUX(R3STEP).

Laura posted on Wednesday, April 08, 2015 - 9:01 am



Ok, thank you. Is there any reference for using the most likely latent class when the entropy is high (over 0.9)? At least in Clark & Muthen (2009) this was compared to some other methods. 


This reference should do it: Asparouhov, T. & Muthén, B. (2014). Auxiliary variables in mixture modeling: Three-step approaches using Mplus. Structural Equation Modeling: A Multidisciplinary Journal, 21:3, 329-341. The posted version corrects several typos in the published version. An earlier version of this paper was posted as Web Note 15.

db40 posted on Friday, May 15, 2015 - 6:16 am



Dear Bengt, might I add to Laura's question: the entropy of my latent class model is 0.85. Would you consider this high enough that I don't have to use the 3-step procedure?

Jon Heron posted on Friday, May 15, 2015 - 11:50 am



Well, I have a paper in press showing bias even up to an entropy of 0.9, and I'm sure you could simulate the odd bizarre example where some estimates were biased at even higher values of entropy, depending on the class distribution. I spent the best part of ten years avoiding the one-step model, but I'm quite amenable to it now.


In many cases we found that 0.85 is probably high enough, but as Jon says you cannot be sure. You should read his new paper. 


Drs. Muthen, I am trying to implement a 3-step LTA model to ensure that the latent class variable measurements are not affected by the inclusion of covariates in the model. To do so, I am using Mplus Web Note 15, Appendices K-N, and the Nylund-Gibson paper titled "A Latent Transition Mixture Model Using the Three-Step Specification". I ran the LTA model with measurement invariance and separately calculated the most-likely-class variables N1 and N2 for the Step 1 LCA at each of my two time points. However, I do not see a Step 1 for the overall LTA in either the web note or the paper. I am not clear on how to obtain the nominal-variable N1, N2 thresholds for latent class variables C1 and C2 in the LTA. Would I use the "Logits for the Classification Probabilities for the Most Likely Latent Class Membership" in the Step 1 output for C1 and C2 (Appendices L and M), or will these values change when running the LTA? Thanks, Raghav


Answer to your paragraph 2: Use appendices L, M, and N. There is no "overall LTA". Answer to your last paragraph: Yes, you would use those logits. 


I am trying to do the 3rd step of the manual 3-step approach for an LPA with a distal outcome, and I have a couple of questions: (1) I can't figure out the auxiliary model input I need so that I get the same output I would have gotten using DU3STEP for a distal outcome (which I can't use, because missing data on the covariates forces INTEGRATION = MONTECARLO). (2) If I included covariates of the classes in Step 1, do I still include those covariates in the overall MODEL statement in Step 3? Thanks!


(1) Take Appendix E and remove "Y on X": http://statmodel.com/download/AppendicesOct28.pdf
(2) No.

John Woo posted on Wednesday, August 12, 2015 - 12:38 pm



Hi, adding to Laura's and db40's questions: do the average latent class probabilities for most likely latent class membership also matter? If entropy is below 0.9, but the average probabilities for the most likely class are above 0.9 for all classes (i.e., the diagonals), can I still consider skipping the 3-step approach? Thank you.


Probably. 

Jon Heron posted on Thursday, August 13, 2015 - 7:58 am



My paper is still not published, but I found the off-diagonal elements of the D-matrix (Mplus' second classification matrix) to be important. Setting entropy aside, if you have three classes and the off-diagonal elements of D relating to classes 1 & 2 are effectively zero (i.e., elements [1,2] and [2,1]) whilst those for classes 2 & 3 are not, then a parameter such as a covariate effect comparing classes 1 and 2 (e.g., risk of class 2 relative to class 1) would be less biased than the same effect for class 3 relative to class 2. Bottom line: (1) entropy and this "class separation" are both important; (2) it's easier to do a one-step or the new 3-step than to work out whether a simpler approach is adequate.

John Woo posted on Wednesday, August 19, 2015 - 11:50 am



Hi, if I understand correctly, one of the rationales for the 3-step approach (or even the one-step approach) is the idea that latent class formation is independent of the influences of the covariates. [I am leaving aside the matter of distal outcomes.] In the case of GMM, where covariates have potential paths to both the growth factors and the latent class structure, does this 3-step rationale apply only to the paths toward the latent class structure and not the growth factors? That is, when I run the 3-step approach, is it consistent with the 3-step rationale to include the covariates predicting the growth factors? I am a bit confused because it seems class formation can be influenced by the growth factors, and, if the covariates influence the growth factors, then they indirectly influence class formation as well? Thank you in advance.


In GMM with effects from covariates on not only c but also i and s, you shouldn't use the R3STEP approach because your model has direct effects beyond those on c. 

Ali posted on Friday, March 11, 2016 - 2:40 am



I used the 3-step estimation. First, I estimated an LCA and got 3 classes. Second, I fixed the logits for the parameters [N#1] and [N#2] in classes 1, 2, and 3. Third, I ran the linear regression auxiliary model, which is almost the same as in Web Note 15 (p. 15), but I have two predictors (extrinsic and intrinsic motivation) and five plausible values on math achievement (y). So, I ran the 3rd step separately for intrinsic and extrinsic motivation. The slopes are significant in all classes, no matter whether I used intrinsic or extrinsic motivation as the predictor. Later, I put extrinsic and intrinsic motivation in together as predictors (the correlation between the predictors is 0.67), and it turned out that extrinsic and intrinsic motivation are not significant in class 2. So, I am wondering whether it is better to run the model with one predictor or two predictors.


You can use the likelihood ratio test to decide this. 

Ali posted on Thursday, March 17, 2016 - 3:19 am



But I have five plausible values for the dependent variable, so I got the mean of the likelihood across the five plausible values. So maybe the likelihood ratio test cannot be used to compare the model with one predictor versus two predictors. Is it possible to compare models with five plausible values?


Because of the plausible values, the LRT is much more complicated and not available in Mplus. You can use MODEL TEST, though (include both predictors and test various combinations of coefficients).

Ali posted on Wednesday, April 13, 2016 - 12:53 am



Hello, I posted on March 11th. I am using the manual LCA 3-step estimation. I have two predictors to predict the five plausible values of students' math scores, but I have missing data on the two predictors (missing at random). I tried to impute the two predictors, but it seems that I cannot, because I typed "TYPE = IMPUTATION" in the DATA command for the five plausible values. After I ran it, the sample size, which should be around 12,000, decreased by around 50%. Is there any way to deal with the missing data on the predictors?


Run the 5 data sets one at a time and combine the results using the usual imputation rules; see page 5 of http://www.ats.ucla.edu/stat/sas/library/multipleimputation.pdf

Ali posted on Thursday, April 14, 2016 - 12:06 am



Do you mean that I should first impute the missing data on the predictors five times? For example, for the first data set with the first plausible values, I impute the missing data on the predictors, and do this five times. Then, run the five data sets at the same time?


Yes - it is standard MI. You would have to combine the results from the 5 runs manually, as described in the section "Combining Inferences from Imputed Data Sets" in the above link.
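The combining step (Rubin's rules) is simple enough to do by hand for each parameter: average the estimates, then combine the within- and between-imputation variances. A sketch with made-up per-run estimates and SEs:

```python
import statistics

def pool_mi(estimates, std_errors):
    """Pool one parameter across m imputation runs using Rubin's rules."""
    m = len(estimates)
    q_bar = statistics.mean(estimates)                     # pooled estimate
    u_bar = statistics.mean(se ** 2 for se in std_errors)  # within-imputation variance
    b = statistics.variance(estimates)                     # between-imputation variance
    t = u_bar + (1 + 1 / m) * b                            # total variance
    return q_bar, t ** 0.5                                 # pooled estimate and SE

# Hypothetical slope estimates and SEs from the 5 plausible-value runs.
est, se = pool_mi([0.42, 0.45, 0.40, 0.44, 0.43],
                  [0.10, 0.11, 0.10, 0.10, 0.11])
print(round(est, 3), round(se, 3))
```

Note the pooled SE is always at least as large as the average within-run SE, because the between-imputation spread is added on top.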


Hi, I need the odds ratios and confidence intervals from the latent-variable multinomial logistic regression using the 3-step procedure (R3STEP). I could calculate the odds ratio manually by exponentiating the estimate, but I still need the confidence intervals. When I add CINTERVAL to the OUTPUT: statement, I only get CIs for "Model Results", the probability scale, and the "Latent Class Odds Ratio", but none for the multinomial logistic regression. Is there a way to get the odds ratios and confidence intervals using the AUXILIARY R3STEP procedure?

Jon Heron posted on Friday, April 22, 2016 - 5:46 am



Just derive the confidence interval for the log-odds and exponentiate that too:

OR = exp(Estimate)
lower bound = exp(Estimate - 1.96*S.E.)
upper bound = exp(Estimate + 1.96*S.E.)
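Concretely, with a hypothetical log-odds estimate and standard error from an R3STEP run:

```python
import math

# Hypothetical multinomial-regression slope (log-odds scale) and its SE.
logit_est, se = 0.85, 0.30

odds_ratio = math.exp(logit_est)
ci_lower = math.exp(logit_est - 1.96 * se)   # exponentiate the log-odds bounds
ci_upper = math.exp(logit_est + 1.96 * se)

print(round(odds_ratio, 2), round(ci_lower, 2), round(ci_upper, 2))
```

Because exp() is monotonic, the transformed bounds remain a valid 95% interval for the OR; the interval is asymmetric around the OR, which is expected.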


See also our FAQs on odds ratios. 
