LCA with Covariates, Distal, and Impu...
Mplus Discussion > Latent Variable Mixture Modeling >
 James Swartz posted on Thursday, May 25, 2017 - 9:30 am
I am running an LCA with 19 binary predictors and an N of 223. The best fitting model has two classes (entropy = .87). I have regressed this model on three covariate predictors. I am fine with and, in fact, want the predictors to influence class membership.

c#1 on catag2 k6cat aids_eve;

My questions:

1) These predictors reduce the N to 189 so I have rerun the model using multiple imputation. Using MI, I can't get the conditional probabilities of the indicators, only thresholds. Is there a way to get the results in probability scale?

2) I have two binary outcomes included using the auxiliary statement. Mplus won't run these using DCAT, which I use because I want odds ratios for the distal variables. What other option works?

3) I want to regress the distal outcomes on the covariates to obtain direct effects (as well as the indirect effects through latent class). How is this specified in the model statement?

Once the model is running on non-imputed data, can I run the exact same model using imputed data? I presently get this error message for the imputed data:

Auxiliary variables with E, R, DU3STEP, DE3STEP, or BCH are not available for TYPE=IMPUTATION.

Thanks for any help.
 Bengt O. Muthen posted on Thursday, May 25, 2017 - 6:44 pm
By 19 binary predictors I assume you mean 19 latent class indicators.

1) No auto in Mplus but you can easily compute them by the usual translation from logits to probabilities.

2) See the tables at the end of web note 21

3) In the 1-step approach you simply say y ON x in each class.

Last question - the message correctly states that this is not provided in Mplus.
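
For point 1 above, a minimal sketch of the logit-to-probability translation in Python. It assumes the usual Mplus logit parameterization for a binary indicator, in which the probability of the second (higher) category given class membership is 1/(1+exp(threshold)); it is worth checking this against the probability-scale output of the non-imputed run. The indicator names and threshold values are hypothetical, not taken from this thread.

```python
import math

def threshold_to_probability(threshold: float) -> float:
    """Probability of the higher category of a binary indicator,
    assuming P(u = 1 | class) = 1 / (1 + exp(threshold))."""
    return 1.0 / (1.0 + math.exp(threshold))

# Hypothetical pooled thresholds for one latent class (illustration only).
thresholds = {"ind1": -1.2, "ind2": 0.0, "ind3": 2.5}

probabilities = {
    name: threshold_to_probability(tau) for name, tau in thresholds.items()
}

for name, prob in probabilities.items():
    print(f"{name}: P(higher category | class) = {prob:.3f}")
```

A negative threshold corresponds to a probability above 0.5 and a positive threshold to a probability below 0.5. Note that converting a pooled threshold is not the same as averaging probabilities computed separately in each imputed data set.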
 James Swartz posted on Friday, May 26, 2017 - 12:16 pm
Thank you Dr. Muthen,

I did mean indicators...sorry.

With respect to number 3, perhaps I was unclear...

In the current statements I have

usevar ind1-ind19 age aids;
auxiliary (e) hospital er;

Then in the model statement, as indicated, I have:

c#1 on age aids;

If I then add to that statement (as I interpreted from your suggestion):

hospital on age aids;

I get this error message:

*** ERROR in MODEL command
Unknown variable(s) in an ON statement: HOSPITAL

If I try to include hospital on the usevar AND auxiliary command, I get the same error message.

In other words, it seems to be telling me I can't have a variable that is defined as a distal outcome on the auxiliary command and also include that variable in the model as a DV regressed on one (or more) of the covariate predictors of latent class.

Is there a way to do this?
 Bengt O. Muthen posted on Friday, May 26, 2017 - 5:46 pm
First, note that Auxiliary (e) is outdated - see the tables at the end of Mplus Web Note 21.

To add that direct effect you have to take a "manual" approach to 3-step, as discussed in Web Notes 15 and 21.
 James Swartz posted on Friday, May 26, 2017 - 5:56 pm
Thank you again. Understood.
 Soyoung Kim posted on Monday, December 18, 2017 - 5:02 am
Dear Dr. Muthen.

I would like to ask a question about multiple imputation (MI) with distal outcome in mixture model.

When I use BCH with a latent class model, I get an error message:
*** ERROR in VARIABLE command
Auxiliary variables with E, R, DU3STEP, DE3STEP, or BCH are not available for TYPE=IMPUTATION.

It seems that the two cannot be used together.

For the BCH analysis, if I want a final pooled result from the multiple imputation,
is it possible to calculate the mean value of the estimates manually?

In other words,
I have made imputed data sets, and I got the results of each data set.
Is it possible to calculate the means of chi-square values and p-values for BCH results from the multiple data sets?

Thank you in advance.
 Tihomir Asparouhov posted on Tuesday, December 19, 2017 - 10:31 am
Yes, you can combine the results using the usual formula, see the bottom two formulas on page 3

Alternatively you can use the manual BCH approach with imputation. The manual BCH approach is described in Section 3
 Soyoung Kim posted on Tuesday, December 19, 2017 - 7:57 pm
Thank you so much for your response Tihomir.
it was very helpful. :-)

Best wishes,
 Soyoung Kim posted on Wednesday, December 20, 2017 - 1:06 am
Dear Dr. Asparouhov.
How can I combine the p-values?

I understand that I can calculate the mean of the chi-squares values from multiple results.
Is it okay to calculate the means of p-values for BCH results from the multiple data sets?

Thank you in advance,
 Tihomir Asparouhov posted on Wednesday, December 20, 2017 - 9:23 am
You cannot combine the results by averaging the chi-square values or the p-values. Instead see the bottom two formulas on page 3
Using these you can compute the point estimate P as well as the SE for that joint point estimate (the formula there is for the variance, so you have to square the SE for the individual imputed data sets and at the end take the square root of the total variance). From there you can use the standard method p-value=2*Phi(-|P/SE|), where Phi is the standard normal distribution function. For example, if P/SE=1.96 the p-value=0.05.
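
A minimal sketch of this pooling in Python. It assumes the two formulas referred to are the standard multiple-imputation combining rules (pooled estimate = mean of the estimates; total variance = mean of the squared SEs plus (1 + 1/m) times the between-imputation variance), which matches the variance expression written out later in this thread. The function name and the example numbers are hypothetical.

```python
import math
from statistics import NormalDist

def pool_estimates(estimates, standard_errors):
    """Pool one parameter over m imputed data sets.

    Returns (pooled estimate, pooled SE, two-sided p-value)."""
    m = len(estimates)
    pooled = sum(estimates) / m
    # Within-imputation variance: average of the squared SEs.
    within = sum(se ** 2 for se in standard_errors) / m
    # Between-imputation variance of the point estimates.
    between = sum((est - pooled) ** 2 for est in estimates) / (m - 1)
    total_variance = within + (1 + 1 / m) * between
    pooled_se = math.sqrt(total_variance)
    # Two-sided p-value from the standard normal distribution function.
    p_value = 2 * NormalDist().cdf(-abs(pooled / pooled_se))
    return pooled, pooled_se, p_value

# Hypothetical estimates and SEs from five imputed data sets.
estimates = [0.42, 0.51, 0.38, 0.47, 0.45]
standard_errors = [0.15, 0.16, 0.14, 0.15, 0.17]

pooled, pooled_se, p_value = pool_estimates(estimates, standard_errors)
print(f"pooled = {pooled:.3f}, SE = {pooled_se:.3f}, p = {p_value:.4f}")
```

The pooled SE is never smaller than the square root of the average squared SE, because the between-imputation term only adds variance.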
 Angela Nickerson posted on Sunday, February 18, 2018 - 8:52 pm
Dear Dr. Asparouhov,

I have used BCH for distal outcomes over 10 imputed datasets with 3 latent classes. I have combined the Means and SEs from the datasets for each distal outcome for each Class, using the formulas that you mentioned on page 3. So now I have a combined Mean and SE of each distal outcome for each separate Class, similar to the output under EQUALITY TESTS OF MEANS ACROSS CLASSES USING THE BCH PROCEDURE. However, I'm still a little unclear about how to use these combined estimates to compare means between latent classes. From my understanding, if I were interested in whether there was a significant difference between Class 1 and 2 on Distal outcome X, I would have (Class 1 combined M - Class 2 combined M)/combined SE. Is this correct? And if this is the case, as I have three combined SEs (one from each Class), would I need to pool these estimates again, and what would be the proper method for doing this?

Thank you for your assistance.
 Tihomir Asparouhov posted on Tuesday, February 20, 2018 - 12:59 pm
For each imputed data set compute the difference parameter - you can use the MODEL CONSTRAINT command to form it, like this

%c#1%
[x] (m1);
%c#2%
[x] (m2);

model constraint: new(p); p=m1-m2;

Then combine the estimates of p over the imputed data sets.

Section 2 explains how to combine the three estimates. (There are actually two, not three: class1 - class2 and class1 - class3. The third one is just the difference of these two and should not be included, as it is a dependent statement, i.e., the hypothesis class1=class2 and class1=class3 is the same as the hypothesis class1=class2, class1=class3, class2=class3.)
 Angela Nickerson posted on Wednesday, February 21, 2018 - 3:44 pm
Dear Dr. Asparouhov,

Thanks for your response. I have a follow up question.

For 3 classes, I understand theoretically how the difference between the first comparison (Class 1-Class 2) and the second comparison (Class 1 -Class 3) gives you the estimates for the 3rd comparison (Class 2-Class 3). But I’m not sure I understand how to test that comparison without putting it into the model. In this analysis, we are interested in finding out if classes differ on depression symptoms. So using this procedure, we would be able to tell if Class 1 had significantly higher depression symptoms than Class 2, and whether Class 1 had significantly higher depression symptoms than Class 3. However, I’m not sure how we would determine the statistical significance of the difference in depression between Class 2 and 3. To get this result, should we put all three comparisons in the model?
For example my syntax is:

MODEL:
%C#1%
[DEP_MEAN] (m1);
%C#2%
[DEP_MEAN] (m2);
%C#3%
[DEP_MEAN] (m3);
model constraints: new(DEP_Diff1);

Thanks again for your help.
 Tihomir Asparouhov posted on Wednesday, February 21, 2018 - 4:57 pm
Yes - that will work
 Angela Nickerson posted on Wednesday, February 21, 2018 - 5:29 pm
Thank you so much for your help Dr Asparouhov. One final question - in order to analyse the means of variables by latent classes, I'm assuming we would need to use a manual two-step BCH process, where the latent classes are fixed with BCH weights in step 1, and then the model constraints are added in step 2 for each dataset? Or is there a more straightforward way of doing this?

Thank you again.
 Tihomir Asparouhov posted on Thursday, February 22, 2018 - 8:15 am
Because of the imputation, there is no easier approach. If you can avoid the imputation and use simple FIML estimation that accounts for the missing data, then you can simply use the auxiliary command.
 Angela Nickerson posted on Thursday, February 22, 2018 - 4:40 pm
Thank you very much Dr Asparouhov. Your advice has been extremely helpful.
 Dong Shuyang posted on Tuesday, April 24, 2018 - 2:48 am
Dear Dr Asparouhov and Dr Muthen,
I attempt to use the 1st-step syntax from Webnotes #21 Section 3.2 for each of my imputed datasets. However, I always get the ERROR as follows:
The following MODEL statements are ignored:
* Statements in Class 1:
[ EEC38$1 ]
[ EHT38$1 ]
[ EPA38$1 ]
[ ESD38$1 ]
[ EUI38$1 ]
* Statements in Class 2:
[ EEC38$1 ]
[ EHT38$1 ]
[ EPA38$1 ]
[ ESD38$1 ]
[ EUI38$1 ]
One or more MODEL statements were ignored. These statements may be
incorrect or are only supported by ALGORITHM=INTEGRATION.

My syntax is this and I'm using Mplus V8.0.

ESD38 EUI38...etc;
CENSORED = EEC38(a) EHT38-EUI38(b);

STARTS = 50 5;



I also notice that SAVE=BCHWEIGHTS is not available with ALGORITHM=INTEGRATION.
Could you help me out?
 Bengt O. Muthen posted on Tuesday, April 24, 2018 - 2:35 pm
The [y$1] expression refers to a threshold of a categorical variable, not to a mean/intercept of a continuous variable. You don't declare any of your variables as categorical.
 Dong Shuyang posted on Monday, April 30, 2018 - 2:51 pm
Thank you, Dr. Muthen. Another question is that following Dr. Asparouhov's suggestion in this topic, I computed the difference parameter for each imputed data set. However, I just noticed that the classification for each imputed data set is different. In some sets, class 1 refers to the high-concern group, while in some other sets, class 2 refers to the high-concern group, at least according to the means and variances of the indicators.
Do I need to swap the sign of the difference parameter for the datasets in which the two classes (groups) are reversed when combining the estimates over the imputed datasets with the formulas mentioned above?
Or does this actually indicate that the classification solution is not good based on the MI datasets with AUXILIARY in LCA models?
 Tihomir Asparouhov posted on Monday, April 30, 2018 - 4:28 pm
You have to align the classes. We actually do this internally when Mplus combines the imputations. The way to do this is as follows. Run the first data set with the option "output:svalues". This will give you a model statement with great starting values. Copy that model statement and use it to run the rest of the imputations with the analysis:starts=0. This should align the classes. If the classification is not good for a particular run (this has nothing to do with the imputations) you will get a separate warning message from Mplus but I don't think this is the case.
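
As a cross-check on the alignment, a minimal Python sketch that compares class profiles between two runs. This is not the svalues procedure described above and not what Mplus does internally; it is a swapped-in post-hoc check that picks the class ordering whose indicator means are closest (in squared distance) to those of a reference run. The function name and the example profiles are hypothetical.

```python
from itertools import permutations

def align_to_reference(reference, candidate):
    """Return a tuple `order` such that candidate[order[k]] is the class
    that best matches reference[k].

    Both arguments are lists of per-class profiles (lists of indicator means)."""
    n_classes = len(reference)

    def distance(order):
        # Sum of squared differences between matched class profiles.
        return sum(
            (ref_value - cand_value) ** 2
            for ref_profile, cand_index in zip(reference, order)
            for ref_value, cand_value in zip(ref_profile, candidate[cand_index])
        )

    return min(permutations(range(n_classes)), key=distance)

# Hypothetical indicator means: the high-concern class is listed first in
# the reference run but second in the candidate run.
reference = [[3.1, 2.9], [1.0, 1.2]]
candidate = [[1.1, 1.1], [3.0, 3.0]]

order = align_to_reference(reference, candidate)
print(order)
```

With two classes, a returned order that reverses the classes means the difference parameter from that run has its sign reversed relative to the reference run. Trying all permutations is only practical for a small number of classes.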
 Tihomir Asparouhov posted on Monday, April 30, 2018 - 4:31 pm
Also your auxiliary command maybe should be something like this
 Dong Shuyang posted on Monday, May 14, 2018 - 3:10 am
Dear Dr. Tihomir Asparouhov,
Thank you for the helpful advice. One more question is to check whether I am using the formula correctly (perhaps also help other newbies like me). Correct me if I am wrong in any steps.

(1) I define and use model constraint to obtain the mean-difference between class 1 and class 2 (let’s say the 2-class solution is chosen) on variable A (x = A1-A2), so I will use the adjusted Wald test to decide whether x = 0.

Now I have x1, x2, x3…x30 for 30 imputed datasets. (i = 1, 2, 3, …30)

(2) Thus, I will have mean_x = 1/30 * SUM (x1, x2, …, x30);

(3) For each of x1, x2, x3…x30, I also have SE1, SE2, SE3,…, SE30 from the output. Thus, I have Vi = SEi^2 (V1 = SE1^2, V2 = SE2^2, …, V30 = SE30^2).

(4) Then to calculate V = 1/30 * SUM (V1, V2, V3…,V30) + (30+1)/[30*(30-1)] * SUM [(xi – mean_x)^2].

(5) Finally, W^2 (written as 'W' in the text of webnote MI7, pp. 3-4, for the Wald test) is said to follow a chi-square distribution. We need to calculate
W^2 (df = 1) = [(mean_x - 0)^2] / V
to get the p-value and decide whether it is significant or not.
 Tihomir Asparouhov posted on Monday, May 14, 2018 - 5:25 pm
All correct.

Instead of step 5 you can just use the Z-test: Z=mean_x/sqrt(V). If abs(Z)>1.96 the mean difference is statistically significant.
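
A minimal Python sketch showing that step 5 and the Z-test lead to the same p-value when there is a single difference parameter (df = 1), since W^2 = Z^2. The values of mean_x and V are hypothetical placeholders for the results of steps (2) and (4), and the function names are illustrative.

```python
import math

def wald_p_value(mean_x: float, total_variance: float) -> float:
    """Upper-tail p-value of W^2 = mean_x^2 / V under chi-square with df = 1."""
    w_squared = mean_x ** 2 / total_variance
    # For df = 1, P(chi-square > w) = erfc(sqrt(w / 2)).
    return math.erfc(math.sqrt(w_squared / 2))

def z_p_value(mean_x: float, total_variance: float) -> float:
    """Two-sided p-value of Z = mean_x / sqrt(V) under the standard normal."""
    z = mean_x / math.sqrt(total_variance)
    # 2 * Phi(-|z|) written with the complementary error function.
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical pooled mean difference and total variance.
mean_x = 0.45
V = 0.04

p_wald = wald_p_value(mean_x, V)
p_z = z_p_value(mean_x, V)
print(f"Z = {mean_x / math.sqrt(V):.3f}, p (Wald) = {p_wald:.4f}, p (Z) = {p_z:.4f}")
```

Both routes give the same decision, so the Z-test is simply the more convenient form for a single mean difference.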