Lewina Lee posted on Wednesday, October 31, 2012 - 4:09 pm
Dear Drs. Muthen,
I would like to do (1) an LPA of 21 continuous variables and (2) test class membership in relation to 9 covariates (x's) and 2 distal outcomes (y's). I intend to use the procedure for manually implementing the 3-step approach described in Mplus Web Note 15, version 5.
In the 3-step approach, given that the measurement model is estimated independently of the auxiliary variables, does it make sense to proceed in the following manner?
1. Do class enumeration in Step 1 (e.g., run Step 1 with 1 to 8 classes) to identify the best one or two models while specifying AUXILIARY = x1, x2, ..., x9, y1, y2.
2. Do Step 2 (calculating measurement error for most likely class variable) for the best one or two models identified in class enumeration.
3. Do Step 3 (estimating the auxiliary model while specifying the latent class model with measurement errors obtained in Step 2) for the best one or two models from class enumeration.
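Step 2 comes down to arithmetic on the classification-probability table from Step 1. The Python sketch below is not Mplus code. It assumes the table has latent classes in rows and most likely classes in columns, and it converts each row into logits relative to the last class. Those logits would be the values fixed as [N#1@...], [N#2@...], and so on in that latent class's %C#c% section in Step 3. The function name and the example probabilities are hypothetical. A zero cell makes the logit undefined, and this sketch simply raises an error in that case.

```python
import math


def classification_logits(prob_table):
    """Convert each row of a classification-probability table into logits.

    Assumed layout (hypothetical helper, not an Mplus function):
    prob_table[c][k] = P(most likely class = k | latent class = c).
    Each row becomes log(p_ck / p_cK), using the last column as reference,
    giving K - 1 values per latent class.
    """
    logits = []
    for row in prob_table:
        if any(p <= 0 for p in row):
            raise ValueError("zero or negative probability: logit undefined")
        ref = row[-1]
        logits.append([math.log(p / ref) for p in row[:-1]])
    return logits


# Hypothetical 3-class table (rows sum to 1).
table = [
    [0.90, 0.05, 0.05],
    [0.10, 0.80, 0.10],
    [0.05, 0.05, 0.90],
]
for c, row in enumerate(classification_logits(table), start=1):
    print(f"%C#{c}%", " ".join(f"[N#{k}@{v:.3f}];" for k, v in enumerate(row, start=1)))
```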
Can latent class membership (C) be regressed on covariates?
Can distal outcomes be regressed on class membership in Step 3?
Is it accurate to say that class membership will not shift regardless of modifications in the auxiliary model (e.g., adding/removing covariates and distal outcomes)?
Thank you, Lewina
Lewina Lee posted on Wednesday, October 31, 2012 - 4:18 pm
One more short question in addition to the above:
In Mplus Web Note 15, version 5, Appendix F, Step 3 of the manually implemented 3-step model is specified with STARTS=0. Are users supposed to follow that in actual analyses?
(In Step 1, the authors noted that STARTS=0 was only specified to retain the order of classes as in the data generation step, and that users should remove that in actual analyses.)
No, that is not available. They are treated as continuous, so for binary distals you will get proportions.
Lewina Lee posted on Wednesday, November 07, 2012 - 8:15 pm
Regarding my question on 11/1/2012 - 12:01pm on using the CATEGORICAL ARE statement on distal outcomes -- could you please clarify what you meant by "not available"?
I tried Step 3 of the manual 3-step approach by specifying:

MODEL:
  %OVERALL%
  Y on x1 x2 x3;
  c on x1 x2 x3;
  %C#1%
  [N#1@5.04];
  [N#2@2.38];
  %C#2%
  [N#1@0.53];
  [N#2@3.28];
  Y on x1 x2 x3;
  %C#3%
  [N#1@-4.42];
  [N#2@-2.64];
  Y on x1 x2 x3;

I tried it with and without the "CATEGORICAL = Y" statement. The regression results (p-values) for Y on x1-x3 are comparable in both cases. When Y was modeled as continuous, I got an intercept for Y in each class rather than a threshold.
Could you please help me understand why I could not use "CATEGORICAL ARE" with binary distals?
What do I need to do to obtain P(distal=1), or odds of distal=1 in one class versus another class?
All variables on the AUXILIARY list are treated as continuous variables for the AUXILIARY functions whether they are on the CATEGORICAL list or not.
Lewina Lee posted on Thursday, November 08, 2012 - 1:42 pm
If I am doing the manual 3-step approach, I do not need to use the AUXILIARY= statement according to Web Note 15 (I only need AUXILIARY= in the automatic 3-step approach, e.g., when using DU3STEP). Does that mean that in Step 3 of the manual 3-step approach I can specify covariates and outcomes with CATEGORICAL=?
Because when I used CATEGORICAL= at Step 3 of the manual 3-step approach with a binary outcome, I was able to get an output on "LOGISTIC REGRESSION ODDS RATIO RESULTS." I just want to verify that this is ok.
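When Y is on the CATEGORICAL list, its class-specific parameters are thresholds on the logit scale rather than intercepts. Below is a minimal Python sketch, not Mplus output, of turning such thresholds into P(Y=1) and into an odds ratio between two classes. It assumes the logit parameterization logit P(Y=1 | x) = -tau + b'x, in which the threshold enters with a minus sign. With covariates in the model, the probability depends on the covariate values you choose. The threshold values are hypothetical.

```python
import math


def prob_y1(tau, slopes=(), x=()):
    """P(Y = 1 | x) under logit P(Y = 1 | x) = -tau + sum(b * x)."""
    if len(slopes) != len(x):
        raise ValueError("slopes and x must have the same length")
    eta = -tau + sum(b * xi for b, xi in zip(slopes, x))
    return 1.0 / (1.0 + math.exp(-eta))


def odds_ratio(tau_ref, tau_other):
    """Odds of Y = 1 in the 'other' class relative to the reference class.

    The odds of Y = 1 are exp(-tau), so the ratio is exp(tau_ref - tau_other).
    Covariate terms cancel only if both classes share the same slopes and x.
    """
    return math.exp(tau_ref - tau_other)


# Hypothetical thresholds for classes 1 and 2, covariates at 0.
tau1, tau2 = 1.2, -0.4
print(round(prob_y1(tau1), 3), round(prob_y1(tau2), 3), round(odds_ratio(tau1, tau2), 3))
```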
I am interested in testing whether means on a set of distal outcomes differ across growth trajectory classes (GMM), controlling for a set of covariates. The covariates have direct effects on the growth factor means (class indicators), and the class indicators have direct effects on the outcomes within class (constrained equal across classes). I used a one-step approach, but a reviewer suggested a 3-step approach. Two questions: (1) Can I test whether *adjusted* means for the distal outcomes differ between classes with a manual 3-step approach? (2) Given the direct effects of the covariates on the class indicators (and the class indicator effects on the distal outcomes), with entropy = .63 (obtained from the final 1-step model), would the 1-step approach be better suited than the 3-step approach, based on the simulation results in Web Note 15? Thanks!
I don't think it is possible to do a 3-step approach for this model because you have class indicator effects on the distal outcomes. The class indicators are latent variables (the growth factors) that are measured and created in Step 1 only, so they won't be available in Step 3 and you can't use them there.
cogdev posted on Monday, January 28, 2013 - 7:24 pm
I would like to use latent class membership from one series of indicators (along with a few other continuous covariates), to predict latent profile membership derived from a separate series of indicators. Clustering independently is theoretically important (separate domains), which is why the 3-step procedure is appealing.
I can manually run the 3-step procedure separately for each latent class analysis (at least up to the 2nd step), to get the misclassification stats for each one. A 4-profile/class solution fits best in both cases.
Something along the lines of Ex7.14 appears to be close to what I need, except that I have a directed prediction from theory (actually more similar to Ex7.19, with a separate clustering variable instead of the factor), and a number of continuous covariates.
So, what type of specification am I dealing with here, and how can I implement it (the auxiliary option doesn't seem designed to support this)?
I can imagine that an analysis with categorical misclassification might be probability-based/fuzzy, or might need some sort of MCMC sampling? As a fallback, since entropy is high (> .90) in both cases, I guess I could 'hard-code' most likely cluster membership and run something like a multinomial logistic regression with covariates?
Any help or direction here would be greatly appreciated.
We are using the manual 3-step method to test predictors in a latent transition analysis with three time-points. Changing the predictors changes our class sizes, particularly for the third time-point. Do you know why this is happening?
We have measurement non-invariance, so each LCA is estimated separately. The nominal most likely class variable is obtained from each LCA estimation, without constraining any of the item thresholds. Other than that, we are following the approach in Web Note 15 shown in Appendices L, M, N, and O.
You constrained the item thresholds in your individual LCAs so that they would be the same as the LCAs from the initial LTA, which had measurement invariance. You then used the most likely class variable from the individual LCAs to run a second LTA, this time with predictors. When you change your predictors, do the class sizes change?
This is the problem we are experiencing. We don't need to constrain the item thresholds in our individual LCAs to be the same as in an initial LTA, because we don't want measurement invariance. The class sizes change considerably when we use the most likely class variable to run an LTA with predictors. We aren't changing the most likely class variable; we're just changing the predictors. Does that make sense?
I actually meant to refer to fixing the nominal parameters (not the threshold invariance); I assume you are doing that fixing in the 3rd step. So you are following Appendices H, I, and J. Those don't have a covariate. You could do your analyses on the Appendix K generated data, which include x, and follow the H-I-J steps to see what happens. I don't think we have explored that.
That's correct, we're fixing the nominal most likely class variable in the 3rd step.
You think we should generate data with measurement invariance and a covariate, estimate LCAs without assuming measurement invariance, and then test the influence of that same covariate on our class solutions? Wouldn't that illuminate the consequences of not assuming measurement invariance when it exists?
We would like to test the predictive strength of variables without changing the class solutions, like in r3step. It might be useful to generate data with two covariates and then test each independently to see if the class sizes change. However, we already have data and multiple covariates, and we already know the class sizes are changing. Why would this happen when we've fixed the probability of being in one class versus another?
I am running a manual 3-step approach with auxiliary variables following Web Note 15, version 7. I have some missing data on my y variable, and I am using weights. When I run the syntax for Step 3, I get the following error message:

Invalid symbol in data file: "*" at record #: 2, field #: 44

Below is the code. What am I doing wrong?

NAMES ARE (...);
MISSING ARE all (-9999);
USEVARIABLES ARE C1_7FP0 RACE3 SEX 7CONCPT N;
CLASSES = c(4);
NOMINAL = N;
CATEGORICAL ARE RACE3 SEX;
WEIGHT IS C1_7FP0;
Analysis:
  TYPE = mixture;
  ESTIMATOR = MLR;
  STARTS = 600 120;
  PROCESSORS = 4(STARTS);
Model:
  %overall%
  C7CONCPT on RACE3 SEX;
  %C#1%
  [N#1@...]; [N#2@-8.791]; [N#3@...];
  C7CONCPT on RACE3 SEX;
  C7CONCPT;
  %C#2%
  [N#1@-9.378]; [N#2@...]; [N#3@-0.889];
  C7CONCPT on RACE3 SEX;
  C7CONCPT;
  %C#3%
  [N#1@...]; [N#2@-0.496]; [N#3@...];
  C7CONCPT on RACE3 SEX;
  C7CONCPT;
  %C#4%
  [N#1@-3.593]; [N#2@-3.888]; [N#3@-4.885];
  C7CONCPT on RACE3 SEX;
  C7CONCPT;

Many thanks in advance
Hello, I figured out my previous question, but now I have come across another error message (same syntax as above):

*** ERROR in MODEL command
  Unknown threshold for NOMINAL variable N: N#2
*** ERROR in MODEL command
  Unknown threshold for NOMINAL variable N: N#3
*** ERROR in MODEL command
  Unknown threshold for NOMINAL variable N: N#2
*** ERROR in MODEL command
  Unknown threshold for NOMINAL variable N: N#3
*** ERROR in MODEL command
  Unknown threshold for NOMINAL variable N: N#2
*** ERROR in MODEL command
  Unknown threshold for NOMINAL variable N: N#3
*** ERROR in MODEL command
  Unknown threshold for NOMINAL variable N: N#2
*** ERROR in MODEL command
  Unknown threshold for NOMINAL variable N: N#3
*** ERROR
  The following MODEL statements are ignored:
  * Statements in Class 1: [ N#2 ] [ N#3 ]
  * Statements in Class 2: [ N#2 ] [ N#3 ]
  * Statements in Class 3: [ N#2 ] [ N#3 ]
  * Statements in Class 4: [ N#2 ] [ N#3 ]
*** ERROR
  One or more MODEL statements were ignored. These statements may be incorrect or are only supported by ALGORITHM=INTEGRATION.
I took the parameters from the "Logits for the Classification Probabilities" table from Step 1. What is producing the error? Thank you
I would like to use the 3-step approach to examine the effects of some covariates on latent trajectory classes. Because I have censored data, I cannot use the auxiliary function (R3STEP), as it does not run with algorithm = integration. I would like to use the manual 3-step approach but I do not know how to obtain the logits for the classification probabilities. In my output I only have the average latent class probabilities. How could I obtain classification probabilities or logits in growth mixture models?
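If the classification table is not printed, one possible route is to save the posterior class probabilities (e.g., SAVEDATA: FILE = ...; SAVE = CPROBABILITIES;) and compute the table yourself. Whether this works for your censored GMM is an assumption you would need to check. The Python sketch below follows the formula in the 3-step paper as I read it. P(most likely class = k | latent class = c) is the sum of the class-c posteriors over people whose most likely class is k, divided by the sum of all class-c posteriors. Each row is then turned into logits relative to the last class. The function names and the four-person example are hypothetical.

```python
import math


def classification_table(posteriors):
    """Estimate D[c][k] = P(most likely class = k | latent class = c).

    posteriors: one list of K posterior class probabilities per person.
    Ties in the most likely class go to the lowest class number.
    """
    K = len(posteriors[0])
    numer = [[0.0] * K for _ in range(K)]
    denom = [0.0] * K
    for p in posteriors:
        modal = max(range(K), key=lambda k: p[k])
        for c in range(K):
            numer[c][modal] += p[c]
            denom[c] += p[c]
    return [[numer[c][k] / denom[c] for k in range(K)] for c in range(K)]


def logits_vs_last(table):
    """Row-wise logits log(D[c][k] / D[c][K]) for k = 1..K-1."""
    return [[math.log(row[k] / row[-1]) for k in range(len(row) - 1)] for row in table]


# Hypothetical posteriors for four people in a 2-class model.
post = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.1, 0.9]]
D = classification_table(post)
print([[round(v, 3) for v in row] for row in D])
print([[round(v, 3) for v in row] for row in logits_vs_last(D)])
```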
I am running a model predicting continuous and categorical outcomes using a latent class variable and several covariates, based on the manual 3-step procedure. I am freely estimating the means and thresholds of the DVs for each class. Are the means and thresholds of the DVs for each class estimated at the mean value of all covariates in the model by default, or do I have to mean-center the covariates? We want to present the estimated means for each class in a table but are unsure how they should be interpreted.
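Within a class, the intercept of a DV regressed on covariates is the model-implied value when every covariate equals 0, not its value at the covariate means. The same holds for a threshold on the logit scale. Below is a minimal sketch with hypothetical numbers. The class-specific mean at chosen covariate values is the intercept plus the slope-weighted covariate values. Centering the covariates before the analysis makes the intercept itself equal that value at the sample means.

```python
def adjusted_mean(intercept, slopes, covariate_values):
    """Model-implied DV mean in one class at the given covariate values."""
    if len(slopes) != len(covariate_values):
        raise ValueError("one covariate value per slope is required")
    return intercept + sum(b * x for b, x in zip(slopes, covariate_values))


# Hypothetical class-specific estimates for one continuous DV.
intercept = 2.0          # value of the DV when both covariates are 0
slopes = [0.5, -1.0]     # effects of covariates x1 and x2
x_means = [4.0, 1.0]     # sample means of x1 and x2

print(adjusted_mean(intercept, slopes, x_means))   # mean at the covariate means
# After centering (x - mean), the covariate means become 0,
# so the intercept alone would equal the adjusted mean above.
centered_intercept = adjusted_mean(intercept, slopes, x_means)
print(adjusted_mean(centered_intercept, slopes, [0.0, 0.0]))
```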
We are interested in using latent class as a predictor of a distal binary outcome (using the manual version of the 3-step approach). Specifically, in step 3 we want to test whether the thresholds for the binary outcome differ between our two classes. As far as we understand, we can do this with a Wald test using the Model Test statement -- that's what we did.
In our first model, we use only latent class as a predictor. We get the Wald test as requested, but we also get an odds ratio and its corresponding significance test, which is exactly what we want. In our second model, we use latent class and some covariates as predictors of the outcome. When we include the covariates, the odds ratio we are interested in is no longer provided, just the requested Wald test. Would it be accurate to compute an odds ratio ourselves using the covariate-adjusted thresholds from the output? Or if not, is there another way we can get this information from Mplus? We would like to be able to present the degree to which the odds ratio changes after taking into account the effects of the covariates on the outcome.
For both of your models you can express the odds ratio in Model Constraint using parameter labels from Model. That also gives you SEs so you can get a test. With covariates the odds ratio would be based on only the thresholds as you say.
Thank you very much for your response, that was very helpful.
From that, I have a follow-up question. I'm noticing that when I use Model Test to test threshold #2 = threshold #1 and compare that result to the "Latent Class Odds Ratio Results" that Mplus outputs automatically, or to the OR result from Model Constraint, I get quite different significance test results. The two OR significance tests are identical, but the test for the difference in thresholds from Model Test is quite different from the test of the OR. The only thought I've had is that the OR is being tested against a null value of 0 and not 1, but I'm not sure. I'm hoping someone can shed some light on my confusion.
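One way to see why the two tests can disagree is to look at what each one tests. The Wald test from Model Test checks tau1 - tau2 = 0, which is OR = 1 on the log scale. An Est./S.E. ratio printed next to an odds ratio, if it is a test against 0 (the poster's guess, assumed here), checks OR = 0, a value an odds ratio can never take. The Python sketch below assumes the odds ratio is exp(tau1 - tau2) and that its standard error comes from the delta method. The thresholds, variances, and covariance are hypothetical and would come from the output (e.g., TECH3). Under these assumptions, the z against 0 reduces to 1/SE(tau1 - tau2), which does not depend on the size of the difference at all.

```python
import math


def threshold_contrast(tau1, tau2, var1, var2, cov12=0.0):
    """Compare two class thresholds and the implied odds ratio exp(tau1 - tau2)."""
    diff = tau1 - tau2
    se_diff = math.sqrt(var1 + var2 - 2.0 * cov12)
    odds_ratio = math.exp(diff)
    se_or = odds_ratio * se_diff  # delta method: derivative of exp(d) is exp(d)
    return {
        "odds_ratio": odds_ratio,
        "wald_z": diff / se_diff,                  # H0: tau1 = tau2 (log OR = 0)
        "wald_chi2": (diff / se_diff) ** 2,        # 1-df Wald chi-square
        "z_or_vs_0": odds_ratio / se_or,           # H0: OR = 0 (equals 1 / se_diff)
        "z_or_vs_1": (odds_ratio - 1.0) / se_or,   # H0: OR = 1 on the OR scale
    }


# Hypothetical estimates.
res = threshold_contrast(tau1=0.5, tau2=0.0, var1=0.04, var2=0.05)
for name, value in res.items():
    print(f"{name}: {value:.3f}")
```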
Thanks. I have Mplus Version 7.2. My output does not have the table I asked about, logits. It does not have the table, "Classification Probabilities for the Most Likely Latent Class Membership (Row) by Latent Class (Column)". The output does have the table, "Average Latent Class Probabilities..."
The output is from nonparametric two-level mixture models that have categorical latent variables on both the within and between levels. These were demonstrated in Henry & Muthen, 2010, "Multilevel Latent Class Analysis: An Application...", Structural Equation Modeling, vol 17.
The output states that Auxiliary= is not available for R3STEP, DCON, etc., when there are latent variables on the between and within levels.
The desired model is: (1) use Time 1 indicators to derive latent classes, (2) regress a Time 2 distal outcome on Time 1 class membership, (3) use the Time 1 version of the Time 2 distal outcome as a latent class covariate/auxiliary.
Can this be done with the manual 3-step method if I have the table of logits (or table of classification probabilities to calculate the logits)? Thanks.
"With entropy of .9 or greater, you can use most likely class membership. You do not need the 3-step procedure."
Does this also apply to situations when analysing the predictors of latent classes in LCGA (using multinomial logistic regression, for example)?
Is it possible to use the R3STEP method with Mplus version 7, or do I need a newer version for that? I tried an analysis with auxiliary(r3step) in an LCGA. The results were completely different from those obtained with auxiliary(r), and the standard errors were zero for many of the predictors.
Q2. If Mplus doesn't stop with an error when you specify the R3STEP option, then it is available. But you should always use the latest Mplus version. The tables at the end of our BCH paper explain that aux(r) is superseded by aux(r3step).
Laura posted on Wednesday, April 08, 2015 - 9:01 am
Ok, thank you. Is there any reference for using the most likely latent class when the entropy is high (over 0.9)? At least in Clark & Muthen (2009) this was compared to some other methods.
Asparouhov, T. & Muthén, B. (2014). Auxiliary variables in mixture modeling: Three-step approaches using Mplus. Structural Equation Modeling: A Multidisciplinary Journal, 21:3, 329-341. The posted version corrects several typos in the published version. An earlier version of this paper was posted as web note 15.
Might I add to Laura's question: the entropy of my latent class solution is 0.85. Would you consider this high enough that I don't have to use the 3-step procedure?
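For reference, below is a minimal Python sketch of the relative entropy commonly reported for mixture models, which as far as I know is the statistic Mplus prints. It is 1 minus the total posterior-probability entropy divided by n·ln K. The posteriors are hypothetical. Because the statistic averages over everyone, the same value can hide poor separation between one particular pair of classes.

```python
import math


def relative_entropy(posteriors):
    """1 - sum_i sum_k (-p_ik * ln p_ik) / (n * ln K); 1 means perfect separation."""
    n = len(posteriors)
    K = len(posteriors[0])
    if K < 2:
        raise ValueError("entropy is undefined for a single class")
    total = 0.0
    for p in posteriors:
        for pk in p:
            if pk > 0:  # 0 * ln(0) is taken as 0
                total -= pk * math.log(pk)
    return 1.0 - total / (n * math.log(K))


# Hypothetical posteriors for three people in a 3-class model.
post = [[0.95, 0.04, 0.01], [0.10, 0.85, 0.05], [0.02, 0.30, 0.68]]
print(round(relative_entropy(post), 3))
```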
Jon Heron posted on Friday, May 15, 2015 - 11:50 am
Well, I have a paper in press showing bias even at entropy values up to 0.9, and I'm sure you could simulate the odd bizarre example where some estimates were biased at even higher values of entropy, depending on the class distribution.
I spent the best part of ten years avoiding the one-step model but I'm quite amenable to it now.
I am trying to implement a 3-step LTA model to ensure that the latent class variable measurements are not affected by the inclusion of covariates in the model. To do so, I am using Mplus Web Note 15, Appendices K-N, and the Nylund-Gibson paper titled "A Latent Transition Mixture Model Using the Three-Step Specification".
I ran the LTA model with measurement invariance and separately calculated the most-likely class variables N1 and N2 for the Step 1 LCA at each of my two time points separately. However, I do not see Step 1 for the overall LTA in either the web note or the paper.
I am not clear on how to obtain the nominal variable N1, N2 thresholds for latent class variables C1 and C2 in the LTA. Would I use the "Logits for the Classification Probabilities for the Most Likely Latent Class Membership" in the output of step 1 for C1 and C2 (Appendix L and M) or will these values change when running the LTA?
I am trying to do the 3rd step of the manual 3-step approach for a LPA with a distal outcome and I have a couple of questions:
(1) I can't figure out what auxiliary model input I need to include to get the same output I would have gotten using DU3STEP for a distal outcome. I can't use DU3STEP because missing data on the covariates require INTEGRATION = MONTECARLO.
(2) If I included covariates of the classes in Step 1, do I still include those covariates in the overall model statement in step 3?
John Woo posted on Wednesday, August 12, 2015 - 12:38 pm
Hi, adding to Laura's and db40's questions: do the average latent class probabilities for most likely latent class membership also matter? If entropy is below 0.9 but the average probabilities for the most likely class are above 0.9 for all classes (i.e., the diagonals), can I still consider skipping the 3-step approach? Thank you.
Jon Heron posted on Thursday, August 13, 2015 - 7:58 am
My paper is still not published, but I found the off-diagonal elements of the D-matrix (Mplus' second classification matrix) to be important.
Setting entropy aside, suppose you have three classes, the off-diagonal elements of D relating to classes 1 and 2 (i.e., elements [1,2] and [2,1]) are effectively zero, and those for classes 2 and 3 are not. Then a parameter such as a covariate effect comparing classes 1 and 2 (e.g., risk of class 2 relative to class 1) would be less biased than the same effect for class 3 relative to class 2.
(1) entropy and this "class separation" are both important
(2) it's easier to do a one-step or new 3-step than work out whether a simpler approach is adequate.
John Woo posted on Wednesday, August 19, 2015 - 11:50 am
Hi, if I understand correctly, one of the rationales for the 3-step approach (or even the 1-step approach) is the idea that latent class formation is independent of the influences of the covariates. [I am leaving aside the matter of distal outcomes.] In GMM, covariates can have paths to both the growth factors and the latent class structure. Does the 3-step rationale apply only to the paths to the latent class structure and not to the growth factors? That is, when I run the 3-step approach, is it consistent with the 3-step rationale to include the covariates predicting the growth factors? I am a bit confused because it seems class formation can be influenced by the growth factors, and if the covariates influence the growth factors, then they indirectly influence class formation as well. Thank you in advance.
I used the 3-step estimation. First, I estimated an LCA and got 3 classes. Second, I fixed the parameters [N#1] and [N#2] to the logits (log ratios) in classes 1, 2, and 3. Third, I ran the linear regression auxiliary model, which is almost the same as the one in Web Note 15 (page 15), except that I have two predictors (extrinsic and intrinsic motivation) and five plausible values for math achievement (y). I ran the 3rd step separately for intrinsic and extrinsic motivation. The slopes are significant in all classes, whether I used intrinsic or extrinsic motivation as the predictor. Later, I included both extrinsic and intrinsic motivation as predictors (the correlation between the predictors is 0.67), and neither was significant in class 2.
So, I am wondering whether it is better to run the model with one predictor or two predictors.
But because I have five plausible values for the dependent variable, I got the mean of the likelihood across the five plausible values, so maybe the likelihood ratio test cannot be used to compare the model with one predictor to the model with two predictors. Is it possible to compare models with five plausible values?
Because of the plausible values LRT is much more complicated and not available in Mplus. You can use Model Test though (include both predictors and test various combinations of coefficients).
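For context on how results are combined across plausible values: as I understand it, Mplus's TYPE=IMPUTATION combines the per-data-set results following Rubin (1987). Estimates are averaged, and the standard error combines the within-analysis sampling variance with the between-analysis variation in the estimates. Below is a minimal Python sketch of Rubin's rules for a single parameter, using hypothetical estimates from five plausible-value runs. The function name is hypothetical.

```python
import math
import statistics


def pool_rubin(estimates, std_errors):
    """Pool one parameter over m analyses with Rubin's rules.

    Returns (pooled estimate, pooled standard error).
    """
    m = len(estimates)
    if m < 2 or m != len(std_errors):
        raise ValueError("need at least two analyses and one SE per estimate")
    q_bar = statistics.mean(estimates)
    within = statistics.mean([se ** 2 for se in std_errors])  # average sampling variance
    between = statistics.variance(estimates)                  # spread across data sets
    total = within + (1.0 + 1.0 / m) * between
    return q_bar, math.sqrt(total)


# Hypothetical slope estimates from five plausible-value runs.
est = [0.42, 0.45, 0.39, 0.44, 0.41]
se = [0.10, 0.11, 0.10, 0.10, 0.12]
print(pool_rubin(est, se))
```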
Ali posted on Wednesday, April 13, 2016 - 12:53 am
Hello, I posted on March 11th. I am using the LCA 3-step estimation manually, with two predictors to predict the five plausible values of students' math scores. But I have missing data on the two predictors (missing at random). I tried to impute the two predictors, but it seems that I cannot, because I already specified "TYPE=IMPUTATION" in the DATA command for the five plausible values. When I ran the analysis, I still had a sample of around 12,000, but it had decreased by around 50%. Is there any way to deal with the missing data on the predictors?
Do you mean that I should first impute the missing data on the predictors five times? For example, take the first data set with the first plausible value and impute the missing data on the predictors, and do this five times. Then run the five data sets at the same time?
Hi, I need the odds ratios and confidence intervals from the latent variable multinomial logistic regression using the 3-step procedure (R3STEP).
I could calculate the odds ratios manually by exponentiating the estimates, but I still need the confidence intervals. When I add CINTERVAL to the OUTPUT command, I only get confidence intervals for "Model Results", "The probability scale", and the "Latent Class Odds Ratio", but none for the multinomial logistic regression analysis.
Is there a way to get the odds ratios and confidence intervals using the AUXILIARY (R3STEP) procedure?
Jon Heron posted on Friday, April 22, 2016 - 5:46 am
Just derive the confidence interval for the log-odds and exponentiate its limits too.
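Following that suggestion, here is a minimal Python sketch. It builds the interval on the log-odds scale from an R3STEP coefficient and its standard error, then exponentiates the estimate and both limits. The 95% normal critical value and the example numbers are assumptions. The resulting interval is asymmetric around the odds ratio.

```python
import math


def odds_ratio_ci(log_odds, se, z_crit=1.959963984540054):
    """Odds ratio and Wald CI obtained by exponentiating a log-odds CI."""
    lower = log_odds - z_crit * se
    upper = log_odds + z_crit * se
    return math.exp(log_odds), math.exp(lower), math.exp(upper)


# Hypothetical R3STEP coefficient (log-odds vs. the reference class) and SE.
print(odds_ratio_ci(0.35, 0.12))
```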
Megan Ames posted on Monday, July 25, 2016 - 12:42 pm
Hello Drs. Muthen, I am attempting to run a manual 3-step latent class growth model with several distal outcomes and covariates. I am confused about how to get output for individual pairwise comparisons between the 5 classes. Below is my attempt at the syntax. I would like to be able to compare each of the class means for the distal outcome (dep13). Also, is there a way to include more than one outcome in the model?
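In Mplus itself, pairwise comparisons would typically be set up by labeling the class-specific means. You can then use MODEL TEST, which as I understand it gives one joint test per run, or MODEL CONSTRAINT with NEW parameters for the differences. The Python sketch below only mirrors the arithmetic of the 10 pairwise Wald tests among 5 classes. It assumes you have the class-specific mean estimates and their covariance matrix (e.g., from TECH3). All numbers are hypothetical, and no multiple-comparison adjustment is applied.

```python
import itertools
import math


def pairwise_wald(means, cov):
    """Pairwise 1-df Wald tests for differences between class-specific means.

    means: K estimates; cov: K x K covariance matrix of those estimates.
    Returns {(a, b): (difference, chi-square, two-sided p)} with 1-based classes.
    """
    results = {}
    for a, b in itertools.combinations(range(len(means)), 2):
        diff = means[a] - means[b]
        var = cov[a][a] + cov[b][b] - 2.0 * cov[a][b]
        z = diff / math.sqrt(var)
        p = math.erfc(abs(z) / math.sqrt(2.0))  # 2 * (1 - Phi(|z|))
        results[(a + 1, b + 1)] = (diff, z * z, p)
    return results


# Hypothetical means of dep13 in 5 classes; estimates assumed uncorrelated.
means = [10.2, 11.0, 9.5, 12.3, 10.8]
cov = [[0.09 if i == j else 0.0 for j in range(5)] for i in range(5)]
for pair, (diff, chi2, p) in pairwise_wald(means, cov).items():
    print(pair, round(diff, 2), round(chi2, 2), round(p, 4))
```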