LCA with Covariates, Distal, and Impu...
Mplus Discussion > Latent Variable Mixture Modeling >
 James Swartz posted on Thursday, May 25, 2017 - 9:30 am
I am running an LCA with 19 binary predictors and an N of 223. The best fitting model has two classes (entropy = .87). I have regressed this model on three covariate predictors. I am fine with and, in fact, want the predictors to influence class membership.

c#1 on catag2 k6cat aids_eve;

My questions:

1) These predictors reduce the N to 189 so I have rerun the model using multiple imputation. Using MI, I can't get the conditional probabilities of the indicators, only thresholds. Is there a way to get the results in probability scale?

2) I have two binary outcomes included using the auxiliary statement. Mplus won't run these using DCAT, which I use because I want odds ratios for the distal variables. What other option works?

3) I want to regress the distal outcomes on the covariates to obtain direct effects (as well as the indirect effects through latent class). How is this specified in the model statement?

Once the model is running on non-imputed data, can I run the exact same model using imputed data? I presently get this error message for the imputed data:

Auxiliary variables with E, R, DU3STEP, DE3STEP, or BCH are not available for TYPE=IMPUTATION.

Thanks for any help.
 Bengt O. Muthen posted on Thursday, May 25, 2017 - 6:44 pm
By 19 binary predictors I assume you mean 19 latent class indicators.

1) Mplus does not do this automatically, but you can easily compute them by the usual translation from logits to probabilities.

2) See the tables at the end of web note 21

3) In the 1-step approach you simply say y ON x in each class.

Last question - the message correctly states that this is not provided in Mplus.
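
As an illustration of the logit-to-probability translation mentioned in answer 1, here is a minimal Python sketch. It assumes binary indicators with a logit link, where the probability of the higher category in a class is 1/(1+exp(threshold)). The indicator names and threshold values are hypothetical, not taken from any output in this thread.

```python
import math


def threshold_to_probability(threshold: float) -> float:
    """Return P(u = 1 | class) for a binary indicator given its logit threshold."""
    return 1.0 / (1.0 + math.exp(threshold))


# Hypothetical class-specific thresholds (for example, pooled over imputations).
thresholds = {"ind1": -1.2, "ind2": 0.0, "ind3": 2.5}

probabilities = {name: threshold_to_probability(t) for name, t in thresholds.items()}

for name, prob in probabilities.items():
    print(f"{name}: P(u=1 | class) = {prob:.3f}")
```

Because the transformation is nonlinear, converting a pooled threshold is not the same as averaging the per-data-set probabilities, so the result is best read as an approximation.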
 James Swartz posted on Friday, May 26, 2017 - 12:16 pm
Thank you Dr. Muthen,

I did mean indicators...sorry.

With respect to number 3, perhaps I was unclear...

In the current statements I have

usevar ind1-ind19 age aids;
auxiliary (e) hospital er;

Then in the model statement, as indicated, I have:

c#1 on age aids;

If I then add to that statement (as I interpreted from your suggestion):

hospital on age aids;

I get this error message:

*** ERROR in MODEL command
Unknown variable(s) in an ON statement: HOSPITAL

If I try to include hospital on the usevar AND auxiliary command, I get the same error message.

In other words, it seems to be telling me I can't have a variable that is defined as a distal outcome on the auxiliary command and also include that variable in the model as a DV regressed on one (or more) of the covariate predictors of latent class.

Is there a way to do this?
 Bengt O. Muthen posted on Friday, May 26, 2017 - 5:46 pm
First, note that Auxiliary (e) is outdated - see the tables at the end of Mplus Web Note 21.

To add that direct effect you have to take "a manual" approach to 3-step as discussed in Web Note 15 and 21.
 James Swartz posted on Friday, May 26, 2017 - 5:56 pm
Thank you again. Understood.
 Soyoung Kim posted on Monday, December 18, 2017 - 5:02 am
Dear Dr. Muthen.

I would like to ask a question about multiple imputation (MI) with distal outcome in mixture model.

When I use BCH with a latent class model, I get an error message:
*** ERROR in VARIABLE command
Auxiliary variables with E, R, DU3STEP, DE3STEP, or BCH are not available for TYPE=IMPUTATION.

It seems that they cannot be used together.

When doing a BCH analysis, if I want the final pooled result of the multiple imputation,
is it possible to calculate the mean value of the estimates manually?

In other words,
I have made imputed data sets, and I got the results of each data set.
Is it possible to calculate the means of chi-square values and p-values for BCH results from the multiple data sets?

Thank you in advance.
 Tihomir Asparouhov posted on Tuesday, December 19, 2017 - 10:31 am
Yes, you can combine the results using the usual formula, see the bottom two formulas on page 3

Alternatively you can use the manual BCH approach with imputation. The manual BCH approach is described in Section 3
 Soyoung Kim posted on Tuesday, December 19, 2017 - 7:57 pm
Thank you so much for your response Tihomir.
it was very helpful. :-)

Best wishes,
 Soyoung Kim posted on Wednesday, December 20, 2017 - 1:06 am
Dear Dr. Asparouhov.
How can I combine the p-values?

I understand that I can calculate the mean of the chi-squares values from multiple results.
Is it okay to calculate the means of p-values for BCH results from the multiple data sets?

Thank you in advance,
 Tihomir Asparouhov posted on Wednesday, December 20, 2017 - 9:23 am
You cannot simply average the chi-square values or the p-values. Instead see the bottom two formulas on page 3
Using these you can compute the point estimate P as well as the SE for that joint point estimate (the formula there is for the variance, so you have to square the SE for each individual imputed data set and at the end take the square root of the total variance). From there you can use the standard method p-value=2*Phi(-|P/SE|), where Phi is the standard normal distribution function. For example, if P/SE=1.96 the p-value=0.05.
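
The p-value step can be sketched in Python as follows. This is only an illustration: it takes an already pooled estimate and SE and computes the two-sided p-value 2*Phi(-|P/SE|), with Phi the standard normal distribution function. The function name is hypothetical.

```python
import math


def two_sided_p_value(estimate: float, se: float) -> float:
    """Two-sided normal p-value for the ratio estimate / se.

    Uses the identity 2 * Phi(-|z|) = erfc(|z| / sqrt(2)).
    """
    z = estimate / se
    return math.erfc(abs(z) / math.sqrt(2.0))


# Check against the example in the post: P/SE = 1.96 gives p of about 0.05.
print(round(two_sided_p_value(1.96, 1.0), 3))
```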
 Angela Nickerson posted on Sunday, February 18, 2018 - 8:52 pm
Dear Dr. Asparouhov,

I have used BCH for distal outcomes over 10 imputed datasets with 3 latent classes. I have combined the Means and SEs from the datasets for each distal outcome for each Class, using the formulas that you mentioned on page 3. So now I have a combined Mean and SE of each distal outcome for each separate Class, similar to the output under EQUALITY TESTS OF MEANS ACROSS CLASSES USING THE BCH PROCEDURE. However, I'm still a little unclear about how to use these combined estimates to compare means between latent classes. From my understanding, if I were interested in whether there was a significant difference between Class 1 and 2 on Distal outcome X, I would compute (Class 1 combined M - Class 2 combined M)/combined SE. Is this correct? And if so, since I have three combined SEs, one from each Class, would I need to pool these estimates again, and what would be the proper method for doing this?

Thank you for your assistance.
 Tihomir Asparouhov posted on Tuesday, February 20, 2018 - 12:59 pm
For each imputed data set compute the difference parameter - you can use model constraints statement to form that, like this

[x] (m1);
[x] (m2);

model constraints: new(p); p=m1-m2;

Then combine the estimates of p over the imputed data sets.

Section 2 explains how to combine the three estimates. There are actually two, not three: class1 - class2 and class1 - class3. The third one is just the difference of these two and should not be included because it is a dependent statement, i.e., the hypothesis class1=class2 and class1=class3 is the same as the hypothesis class1=class2, class1=class3, class2=class3.
 Angela Nickerson posted on Wednesday, February 21, 2018 - 3:44 pm
Dear Dr. Asparouhov,

Thanks for your response. I have a follow up question.

For 3 classes, I understand theoretically how the difference between the first comparison (Class 1-Class 2) and the second comparison (Class 1 -Class 3) gives you the estimates for the 3rd comparison (Class 2-Class 3). But I’m not sure I understand how to test that comparison without putting it into the model. In this analysis, we are interested in finding out if classes differ on depression symptoms. So using this procedure, we would be able to tell if Class 1 had significantly higher depression symptoms than Class 2, and whether Class 1 had significantly higher depression symptoms than Class 3. However, I’m not sure how we would determine the statistical significance of the difference in depression between Class 2 and 3. To get this result, should we put all three comparisons in the model?
For example my syntax is:

MODEL: %C#1%
[DEP_MEAN] (m1);
[DEP_MEAN] (m2);
[DEP_MEAN] (m3);
model constraints: new(DEP_Diff1);

Thanks again for your help.
 Tihomir Asparouhov posted on Wednesday, February 21, 2018 - 4:57 pm
Yes - that will work
 Angela Nickerson posted on Wednesday, February 21, 2018 - 5:29 pm
Thank you so much for your help Dr Asparouhov. One final question - in order to analyse the means of variables by latent classes, I'm assuming we would need to use a manual two step BCH process, where the latent classes were fixed with BCH weights in step 1, and then the model constraints are added in step 2 for each dataset? Or is there a more straightforward way of doing this?

Thank you again.
 Tihomir Asparouhov posted on Thursday, February 22, 2018 - 8:15 am
Because of the imputation, there is no easier approach. If you can avoid the imputation and use simple FIML estimation that accounts for the missing data, then you can simply use the auxiliary command.
 Angela Nickerson posted on Thursday, February 22, 2018 - 4:40 pm
Thank you very much Dr Asparouhov. Your advice has been extremely helpful.
 Dong Shuyang posted on Tuesday, April 24, 2018 - 2:48 am
Dear Dr Asparouhov and Dr Muthen,
I attempt to use the 1st-step syntax from Webnotes #21 Section 3.2 for each of my imputed datasets. However, I always get the ERROR as follows:
The following MODEL statements are ignored:
* Statements in Class 1:
[ EEC38$1 ]
[ EHT38$1 ]
[ EPA38$1 ]
[ ESD38$1 ]
[ EUI38$1 ]
* Statements in Class 2:
[ EEC38$1 ]
[ EHT38$1 ]
[ EPA38$1 ]
[ ESD38$1 ]
[ EUI38$1 ]
One or more MODEL statements were ignored. These statements may be
incorrect or are only supported by ALGORITHM=INTEGRATION.

My syntax is this and I'm using Mplus V8.0.

ESD38 EUI38...etc;
CENSORED = EEC38(a) EHT38-EUI38(b);

STARTS = 50 5;



I also notice that SAVE=BCHWEIGHTS is not available with ALGORITHM=INTEGRATION.
Could you help me out?
 Bengt O. Muthen posted on Tuesday, April 24, 2018 - 2:35 pm
The [y$1] expression refers to a threshold of a categorical variable, not to a mean/intercept of a continuous variable. You don't declare any of your variables as categorical.
 Dong Shuyang posted on Monday, April 30, 2018 - 2:51 pm
Thank you, Dr. Muthen. Another question is that following Dr. Asparouhov's suggestion in this topic, I computed the difference parameter for each imputed data set. However, I just noticed that the classification for each imputed data set is different. In some sets, class 1 refers to the high-concern group, while in some other sets, class 2 refers to the high-concern group, at least according to the means and variances of the indicators.
Do I need to swap the difference parameter between the two classes (groups) for some datasets when combining the estimates over the imputed datasets using the formulas?
Or does this actually indicate that the classification solution is not good based on the MI datasets with AUXILIARY in LCA models?
 Tihomir Asparouhov posted on Monday, April 30, 2018 - 4:28 pm
You have to align the classes. We actually do this internally when Mplus combines the imputations. The way to do this is as follows. Run the first data set with the option "output:svalues". This will give you a model statement with great starting values. Copy that model statement and use it to run the rest of the imputations with the analysis:starts=0. This should align the classes. If the classification is not good for a particular run (this has nothing to do with the imputations) you will get a separate warning message from Mplus but I don't think this is the case.
 Tihomir Asparouhov posted on Monday, April 30, 2018 - 4:31 pm
Also your auxiliary command maybe should be something like this
 Dong Shuyang posted on Monday, May 14, 2018 - 3:10 am
Dear Dr. Tihomir Asparouhov,
Thank you for the helpful advice. One more question is to check whether I am using the formula correctly (perhaps also help other newbies like me). Correct me if I am wrong in any steps.

(1) I define and use model constraint to obtain the mean-difference between class 1 and class 2 (let’s say the 2-class solution is chosen) on variable A (x = A1-A2), so I will use the adjusted Wald test to decide whether x = 0.

Now I have x1, x2, x3…x30 for 30 imputed datasets. (i = 1, 2, 3, …30)

(2)Thus, I will have mean_x = 1/30 * SUM (x1, x2, …,x30);

(3) For each of x1, x2, x3…x30, I also have SE1, SE2, SE3,…, SE30 from the output. Thus, I have Vi = SEi^2 (V1 = SE1^2, V2 = SE2^2, …, V30 = SE30^2).

(4) Then to calculate V = 1/30 * SUM (V1, V2, V3…,V30) + (30+1)/[30*(30-1)] * SUM [(xi – mean_x)^2].

(5) Finally, W^2 (written as 'W' in the text of webnote MI7, pp. 3-4, for the Wald test) is said to follow a chi-square distribution. We need to calculate
W^2 (df = 1) = [(mean_x - 0)^2] / V
to obtain the p-value and decide whether it is significant or not.
 Tihomir Asparouhov posted on Monday, May 14, 2018 - 5:25 pm
All correct.

Instead of step 5 you can just use the Z-test: Z=mean_x/sqrt(V). If abs(Z)>1.96 the mean difference is statistically significant.
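
Steps (2) to (5) and the Z-test above can be sketched in Python as follows. This is a minimal illustration under the assumptions stated in the thread (normal reference distribution, cutoff 1.96). The function name and the input values are hypothetical.

```python
import math


def pool_estimates(estimates, standard_errors):
    """Pool one parameter over m imputed data sets, following steps (2)-(5)."""
    m = len(estimates)
    mean_x = sum(estimates) / m                                  # step (2)
    within = sum(se ** 2 for se in standard_errors) / m          # step (3): mean of Vi = SEi^2
    between = sum((x - mean_x) ** 2 for x in estimates) / (m - 1)
    # Step (4): (1 + 1/m) * between equals (m+1)/[m*(m-1)] * SUM[(xi - mean_x)^2].
    total_variance = within + (1.0 + 1.0 / m) * between
    z = mean_x / math.sqrt(total_variance)                       # Z-test
    return {
        "mean": mean_x,
        "variance": total_variance,
        "se": math.sqrt(total_variance),
        "z": z,
        "wald": z ** 2,                                          # step (5), df = 1
    }


# Hypothetical class 1 minus class 2 differences and their SEs from 5 data sets.
differences = [0.42, 0.51, 0.38, 0.47, 0.44]
ses = [0.15, 0.16, 0.14, 0.15, 0.17]

result = pool_estimates(differences, ses)
print(result)
print("significant at 5%:", abs(result["z"]) > 1.96)
```

The sketch assumes the classes have already been aligned across data sets, so that each difference has the same sign convention.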
 Katharine Buek posted on Wednesday, November 21, 2018 - 7:03 am
I am using the 3-step method for regressing a distal outcome onto 10 latent classes as per Asparouhov & Muthen 2014, using the logits from the LCA to account for classification uncertainty in my regression. However, I notice that the logit for my 10th class is 0; the whole column has only zeroes. And Mplus doesn't like this when I put it in the regression. Why is this occurring? Thank you!
 Bengt O. Muthen posted on Wednesday, November 21, 2018 - 2:35 pm
You don't use the last zero column. See the appendices for the Mplus Web Note 15 on our web site.
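
A minimal Python sketch of why the last column is all zeros: each logit is taken relative to the last category, so the last entry in every row is log(1) = 0. The sketch assumes a table of classification probabilities with one row per latent class and one column per most likely class membership. The 3-class matrix below is hypothetical.

```python
import math


def classification_logits(prob_rows):
    """Return log(p / p_last) for every entry of every row.

    Each row is assumed to hold classification probabilities that sum to 1.
    The last column of the result is always 0 because it is the reference.
    """
    logits = []
    for row in prob_rows:
        reference = row[-1]
        logits.append([math.log(p / reference) for p in row])
    return logits


# Hypothetical 3-class classification probabilities.
probs = [
    [0.90, 0.07, 0.03],
    [0.05, 0.85, 0.10],
    [0.02, 0.08, 0.90],
]

logits = classification_logits(probs)

# Only the first K-1 columns carry information; the zero column is dropped.
usable = [row[:-1] for row in logits]
for row in usable:
    print([round(value, 3) for value in row])
```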
 fred posted on Saturday, March 30, 2019 - 11:22 pm
I have a very basic question about the use of the BCH method.
What do the class-specific (threshold) values of 1, -1, and 1* in the example below denote, and how is the order of these values given? I have a 2 class model with 7 binary indicators (U1-U7) that I want to include in a 3-step BCH model and struggle to set these.

[ U1$1-U8$1*-1.0]
[ U1$1-U4$1*1.0 U5$1-U8$1*-1.0]
 Bengt O. Muthen posted on Sunday, March 31, 2019 - 3:32 pm
Perhaps you are looking at page 11 of web note 21. Note that it says:

"Starting values are provided so that the class order does not reverse from the
generated order. In real data analysis starting values are not needed. Instead, a
large number of random starting value should be set using the starts command."
 fred posted on Sunday, March 31, 2019 - 10:08 pm
Thanks, yes, I am looking at that example.
So the whole model command:
[U1$1-U8$1*-1.0 ] etc. is not necessary in the first step?

Also how does one specify the large number of starting values, would you kindly point to an example?
 Bengt O. Muthen posted on Monday, April 01, 2019 - 5:16 pm
Q1: That's right.

Q2: E.g. Starts = 100 20;
 fred posted on Monday, April 01, 2019 - 9:53 pm
Thank you very much! To follow up, the example provides class specific regression, is it possible to also include Y on C in the model overall command at the same time if the interest is in checking the effect of C on the outcome AND the class specific regressions of Y on Xs?
 Bengt O. Muthen posted on Tuesday, April 02, 2019 - 4:36 pm
You don't say Y ON C but instead the effect of C on Y is captured by different Y means for the different classes.
 fred posted on Wednesday, April 03, 2019 - 11:37 pm
I see, so the means are given in the output (by the outline of the example on P11) without having to specify a command such as D3step or DCON?
 Bengt O. Muthen posted on Thursday, April 04, 2019 - 5:34 pm
Yes. They are part of the model when you take the manual approach.
 fred posted on Tuesday, April 09, 2019 - 12:32 pm
Thanks, now I have run the analysis as suggested by Webnote 21 and get the error message that Type=Mixture is not available for multi group analysis. What does this indicate, since I have used the exact code on page 11 (without starting values)?
 Bengt O. Muthen posted on Tuesday, April 09, 2019 - 5:42 pm
Send your output to Support along with your license number.
 Mary M Mitchell posted on Thursday, June 20, 2019 - 4:42 pm
I am trying to estimate a latent class regression (3 classes on a single covariate) using the auxiliary command from the webnotes. I am using the R3STEP but it does not retain the class proportions from the unconditional model. I'm using the following syntax:

USEVARIABLES ARE [list of variables];
classes = c(3);
AUXILIARY = race_1(R3STEP) ;
MISSING are all (-6 9999);
TYPE = Mixture;
starts = 0;

Can you please tell me what I am doing incorrectly?

Thanks for your help!

 Bengt O. Muthen posted on Saturday, June 22, 2019 - 6:30 am
Send the outputs from your 2 steps to Support along with your license number.
 Steven Hope posted on Friday, July 03, 2020 - 12:26 am
Dear Prof. Muthen

Is there an equivalent of the FAQ sheet "Odds ratios from thresholds of binary distal outcomes in mixtures" for nominal distal outcomes? Otherwise please could you advise how I might specify this?

Many thanks

 Bengt O. Muthen posted on Saturday, July 04, 2020 - 6:03 am
No, but end of UG chapter 14 describes how to produce probabilities for a nominal DV. That can then be expanded to odds and odds ratios. All of this can be expressed in Model Constraint.
 Steven Hope posted on Thursday, July 09, 2020 - 7:40 am
Many thanks! Can I please just check my code for 3 classes assuming a nominal distal with 3 categories would be:




Model Constraint:
New(prob1 prob2 prob3 prob4 prob5 prob6
odds1 odds2 odds3 odds4 odds5 odds6
or15 or26 or35 or46);

prob1 = 1/(1+exp(d1));
prob2 = 1/(1+exp(d2));
prob3 = 1/(1+exp(d3));
prob4 = 1/(1+exp(d4));
prob5 = 1/(1+exp(d5));
prob6 = 1/(1+exp(d6));
odds1 = prob1/(1-prob1);
odds2 = prob2/(1-prob2);
odds3 = prob3/(1-prob3);
odds4 = prob1/(1-prob4);
odds5 = prob2/(1-prob5);
odds6 = prob3/(1-prob6);
or15 = odds1/odds5;
or26 = odds2/odds6;
or35 = odds3/odds5;
or46 = odds4/odds6;
 Bengt O. Muthen posted on Thursday, July 09, 2020 - 6:10 pm
The denominators of the probs are wrong - there are more terms; look at how it's done in chapter 14.
 Steven Hope posted on Tuesday, July 21, 2020 - 8:19 am
Many thanks for your suggestions! I have made the following changes - would you mind confirming whether this is now correct (again for 3 classes assuming a nominal distal with 3 categories):




Model Constraint:
New(prob1 prob2 prob3 prob4 prob5 prob6
odds1 odds2 odds3 odds4 odds5 odds6
rrr15 rrr26 rrr35 rrr46);

prob1 = exp(d1)/(1+exp(d1)+exp(d2));
prob2 = exp(d2)/(1+exp(d1)+exp(d2));
prob3 = exp(d3)/(1+exp(d3)+exp(d4));
prob4 = exp(d4)/(1+exp(d3)+exp(d4));
prob5 = exp(d5)/(1+exp(d5)+exp(d6));
prob6 = exp(d6)/(1+exp(d5)+exp(d6));
odds1 = prob1/(1-prob1-prob2);
odds2 = prob2/(1-prob1-prob2);
odds3 = prob3/(1-prob3-prob4);
odds4 = prob4/(1-prob3-prob4);
odds5 = prob5/(1-prob5-prob6);
odds6 = prob6/(1-prob5-prob6);
rrr15 = odds1/odds5;
rrr26 = odds2/odds6;
rrr35 = odds3/odds5;
rrr46 = odds4/odds6;

Many thanks

 Bengt O. Muthen posted on Wednesday, July 22, 2020 - 6:03 pm
It will be clearer if you compute not only

prob1 = exp(d1)/(1+exp(d1)+exp(d2));
prob2 = exp(d2)/(1+exp(d1)+exp(d2));

but also the third probability which is 1-(prob1+prob2) = 1-prob1-prob2.

When you compute odds1, you are contrasting prob1 with the above 3rd-category prob - is that what you wanted?
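
A minimal Python sketch of the three category probabilities and the resulting odds, assuming a 3-category nominal outcome with the last category as reference. The labels d1, d2, d5, d6 follow the posted Model Constraint code; their numeric values and the function name are hypothetical.

```python
import math


def nominal_probabilities(logit_1: float, logit_2: float):
    """Probabilities of categories 1, 2, 3 given two logits (category 3 is the reference)."""
    denom = 1.0 + math.exp(logit_1) + math.exp(logit_2)
    prob_1 = math.exp(logit_1) / denom
    prob_2 = math.exp(logit_2) / denom
    prob_3 = 1.0 - prob_1 - prob_2   # the third probability made explicit
    return prob_1, prob_2, prob_3


# Hypothetical class-specific logits: d1, d2 for class 1 and d5, d6 for class 3.
d1, d2 = 0.4, -0.3
d5, d6 = -0.2, 0.1

p1, p2, p3_class1 = nominal_probabilities(d1, d2)
p5, p6, p3_class3 = nominal_probabilities(d5, d6)

odds1 = p1 / p3_class1   # category 1 vs category 3 in class 1
odds5 = p5 / p3_class3   # category 1 vs category 3 in class 3
rrr15 = odds1 / odds5    # class 1 vs class 3, category 1 vs category 3

print(round(odds1, 4), round(odds5, 4), round(rrr15, 4))
```

With the third category as the base, odds1 reduces to exp(d1) and rrr15 to exp(d1 - d5), which gives a quick check on the Model Constraint expressions.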
 Steven Hope posted on Friday, July 24, 2020 - 4:39 am
Many thanks for your response. Yes, as in a standard multinomial regression, where the third outcome category is the base category for the odds, and class 3 is base for the relative risk ratio.

Am I right in thinking that your suggestion is to calculate the probabilities for the third category for clarity only, so that odds1 would become:


Otherwise the original syntax is all correct?
 Bengt O. Muthen posted on Friday, July 24, 2020 - 3:39 pm
 Steven Hope posted on Monday, July 27, 2020 - 3:25 am
Brilliant - many thanks again!