LCA and binary outcome variable
 Anonymous posted on Friday, April 19, 2002 - 9:41 am
Dear Linda & Bengt,

I am wondering if it is legitimate to use a binary instead of a continuous variable as an outcome in LCA? Thanks.


 bmuthen posted on Friday, April 19, 2002 - 6:11 pm
LCA is for categorical outcomes. LPA for continuous outcomes. Maybe you are thinking of something else?
 Anonymous posted on Monday, April 22, 2002 - 6:37 am
Dear Bengt,

Sorry for the misunderstanding. What I meant to ask was the following: in an LCA where I use a set of categorical latent class indicators (u's), is it legitimate to use categorical instead of continuous variables as outcomes (y's)? Example 25.9B on page 268 uses a continuous outcome. I am looking forward to your reply.


 bmuthen posted on Monday, April 22, 2002 - 9:35 am
I see what you mean. Yes, you can use a categorical "distal outcome" instead of a continuous one. The distal is then simply one more indicator of the latent class variable, because statistically there is no difference between this and regressing the distal on class. When you have more than one categorical distal, however, you need to watch out for the fact that the 2 distals become independent given class, which may not be what you want.
 Anonymous posted on Monday, April 22, 2002 - 10:47 am
Dear Bengt,

Thank you for your reply. If I understand you correctly, the categorical outcome also needs to be specified in the list of categorical variables. Additionally, instead of adding a line for the mean and variance (as for continuous y's), I just provide start values for the threshold.


 bmuthen posted on Tuesday, April 23, 2002 - 12:35 pm
Yes and yes.
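
A minimal sketch of the setup confirmed above, written in current Mplus input syntax. The file name lca.dat, the variable names u1-u4 and y, the number of classes, and all starting values are hypothetical and only illustrate where the distal outcome goes: it is listed in CATEGORICAL and gets a threshold starting value in each class, just like the indicators.

```mplus
TITLE:    LCA with a binary distal outcome treated as an indicator
          (hypothetical sketch);
DATA:     FILE = lca.dat;            ! hypothetical file name
VARIABLE: NAMES = u1-u4 y;           ! hypothetical variable names
          CATEGORICAL = u1-u4 y;     ! the distal y is listed here too
          CLASSES = c(2);
ANALYSIS: TYPE = MIXTURE;
MODEL:
  %OVERALL%
  %c#1%
  [u1$1-u4$1*1];                     ! illustrative threshold starting values
  [y$1*0.5];                         ! threshold start for the distal, no mean/variance
  %c#2%
  [u1$1-u4$1*-1];
  [y$1*-0.5];
```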
 Anonymous posted on Tuesday, March 09, 2004 - 9:59 am
I would like to model two periods, where in period 1 a person can be in one of three mutually exclusive states (c1, c2, c3), and in period 2 they can again be in one of three mutually exclusive states. My interest is in predicting class membership in period 2 given class membership in period 1 and covariates. Is this possible in Mplus?
 bmuthen posted on Wednesday, March 10, 2004 - 7:28 am
I think you are considering an unordered categorical (nominal) variable at two time points and want to relate these two variables. This can be done by letting the variables be represented by perfectly measured latent class variables, one at each time point. In Mplus 2.14 this is done by a single latent class variable with 3 x 3 = 9 classes (see paper #86 on the Mplus home page), while in the soon to be released Version 3 it is done by regressing the time 2 variable on the time 1 latent class variable.
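
A rough sketch of the second approach (regressing the time 2 class variable on the time 1 class variable), written in later Mplus syntax. The file name, the variable names s1, s2 and x, and the choice of fixing the nominal logits at +/-15 to make each class match one observed state are assumptions for illustration, not something stated in the post.

```mplus
TITLE:    Two time points, each a perfectly measured 3-class variable
          (hypothetical sketch);
DATA:     FILE = states.dat;         ! hypothetical file name
VARIABLE: NAMES = s1 s2 x;           ! s1, s2: observed states; x: covariate
          NOMINAL = s1 s2;
          CLASSES = c1(3) c2(3);
ANALYSIS: TYPE = MIXTURE;
MODEL:
  %OVERALL%
  c2 ON c1 x;                        ! time 2 class on time 1 class and covariate
MODEL c1:
  %c1#1%
  [s1#1@15 s1#2@-15];                ! class 1 is tied to the first state
  %c1#2%
  [s1#1@-15 s1#2@15];                ! class 2 is tied to the second state
  %c1#3%
  [s1#1@-15 s1#2@-15];               ! class 3 is tied to the last state
MODEL c2:
  %c2#1%
  [s2#1@15 s2#2@-15];
  %c2#2%
  [s2#1@-15 s2#2@15];
  %c2#3%
  [s2#1@-15 s2#2@-15];
```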
 Anonymous posted on Wednesday, July 28, 2004 - 7:32 am
Dear Linda & Bengt,

I would like to perform LCA, where each mixture is an IRT model (logistic regression model with random intercept). My question is whether Mplus is capable of handling such models.

 BMuthen posted on Wednesday, July 28, 2004 - 9:01 am
Yes, this can be done in Mplus Version 3. Our experience to date shows that with binary observed variables, it can be hard to estimate such a model unless the mixture is very clear, while with ordered polytomous observed variables it is easier.
 Anonymous posted on Wednesday, July 28, 2004 - 9:27 am
Dear Bengt,

thank you very much for your prompt answer. I actually would like to fit discrete mixture models, where each mixture is a 2PL model, with random person parameter, and compare the models with different restrictions. (My intention is to do it with both dichotomously and polytomously scored items. But in separate analyses, not mixing the two types of scoring.)

 bmuthen posted on Thursday, July 29, 2004 - 8:21 am
It will be interesting to see how this works out. I am looking for good examples to illustrate these new methods, so please let me know of any successes.
 JISUN CHOI posted on Wednesday, August 25, 2010 - 12:40 pm
Dear Linda,

Hello. I am interested in using Mplus to do a mixture regression analysis and have a couple of basic questions.

1. I saw Example 7.1 (mixture regression analysis for a continuous dependent variable).

My dependent variable is binary. Can I test this model using a binary dependent variable instead of a continuous one?

2. I am also interested in whether individuals in different latent classes differ in their background (covariates such as race and education level) with respect to the relationship between the independent and dependent variables.

Can I get profile information (i.e., frequencies or proportions) for these covariates by latent class membership?

Or can I get a plot of the findings to show the different group memberships visually?

I would appreciate it if you could point me to some references on this.

 Linda K. Muthen posted on Wednesday, August 25, 2010 - 3:45 pm
1. See Example 7.3.

2. You can regress the latent class variable on a set of covariates to see which are related to class membership. You can also use the AUXILIARY (e) option. See the user's guide for details. I don't know how you would plot this information.
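
A minimal sketch of regressing the latent class variable on covariates. The file name, the indicator names u1-u4, and the covariates nonwhite (a 0/1 dummy) and educ are hypothetical; the AUXILIARY alternative is not shown.

```mplus
TITLE:    Latent class variable regressed on covariates
          (hypothetical sketch);
DATA:     FILE = lca.dat;            ! hypothetical file name
VARIABLE: NAMES = u1-u4 nonwhite educ;
          CATEGORICAL = u1-u4;
          CLASSES = c(2);
ANALYSIS: TYPE = MIXTURE;
MODEL:
  %OVERALL%
  c ON nonwhite educ;                ! multinomial logistic regression of class
                                     ! membership on the covariates
```

The estimated slopes are log odds of membership in each class relative to the last class.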
 JISUN CHOI posted on Thursday, August 26, 2010 - 10:28 am
Dear Linda,

Thank you very much for responding so quickly. Your response raised one more question.

I looked at Example 7.3. As I understand it, I can do this latent class analysis if all dependent variables are binary latent class indicators.

What I am trying to do is a mixture regression analysis. I am interested in the relationship between job satisfaction (a binary dependent variable) and several continuous covariates.

I think it might be closer to Example 7.2, and I thought that I could use logistic regression with a categorical latent variable. Is that possible?

Thanks a lot for your time, advice, and suggestions.
 Linda K. Muthen posted on Thursday, August 26, 2010 - 10:51 am
I'm sorry. You should just add the CATEGORICAL option to Example 9.1 if your dependent variable is binary. For this model it is difficult to have slopes vary across classes. You may only be able to allow intercepts to vary.
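
A minimal sketch of a mixture regression with a binary dependent variable, following the advice to add the CATEGORICAL option and to let only the intercepts (thresholds) vary across classes. The file name and the variable names jobsat, x1 and x2 are hypothetical.

```mplus
TITLE:    Mixture regression with a binary dependent variable
          (hypothetical sketch);
DATA:     FILE = jobsat.dat;         ! hypothetical file name
VARIABLE: NAMES = jobsat x1 x2;
          CATEGORICAL = jobsat;      ! binary dependent variable
          CLASSES = c(2);
ANALYSIS: TYPE = MIXTURE;
MODEL:
  %OVERALL%
  jobsat ON x1 x2;                   ! logistic slopes, equal across classes
  %c#1%
  [jobsat$1];                        ! class-specific threshold
  %c#2%
  [jobsat$1];
```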
 JISUN CHOI posted on Thursday, August 26, 2010 - 1:02 pm
No problem.
I guess you mean Example 7.1?

Many thanks.
 Linda K. Muthen posted on Thursday, August 26, 2010 - 3:23 pm
 Alana Steffen posted on Wednesday, July 09, 2014 - 1:34 pm
I have a 4 class model based on 13 binary indicators and have included some residual associations. I would like to have class predict binary distal outcomes. Is it valid to use auxiliary and (e)? If not, could you please direct me to an appropriate example?
Many thanks!
 Bengt O. Muthen posted on Wednesday, July 09, 2014 - 3:59 pm
I would use the DCAT option of Auxiliary.
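
A minimal sketch of the DCAT specification, assuming a hypothetical file lca13.dat with indicators u1-u13 and a binary distal outcome named dist. The residual associations mentioned in the question are left out.

```mplus
TITLE:    4-class LCA with a categorical distal outcome via DCAT
          (hypothetical sketch);
DATA:     FILE = lca13.dat;          ! hypothetical file name
VARIABLE: NAMES = u1-u13 dist;
          USEVARIABLES = u1-u13;     ! the distal is not an analysis variable
          CATEGORICAL = u1-u13;
          CLASSES = c(4);
          AUXILIARY = dist (DCAT);   ! distal outcome handled by DCAT
ANALYSIS: TYPE = MIXTURE;
```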
 Alana Steffen posted on Wednesday, July 09, 2014 - 6:07 pm
Thank you for the suggestion. I had tried that previously but got an error message that did not seem to apply to my analysis.

are not available with TYPE=MIXTURE in conjunction with ALGORITHM=INTEGRATION.

However, I am not using algorithm=integration but rather parameterization=rescov. Any other suggestions?
 Bengt O. Muthen posted on Wednesday, July 09, 2014 - 6:36 pm
Please send output and license number to
 May Chen posted on Sunday, May 28, 2017 - 12:01 pm

I would like to test whether class membership predicts a categorical distal outcome. Based on the table in Mplus webnote 21, I've gathered that the best way to do this is through the DCAT option. However, I would also like to include covariates so that the effect of latent class on the distal outcome is controlled for by those covariates. Do you have any suggestions on how to proceed? Can the manual versions of BCH or DU3STEP be used in this case?

Thank you!
 Bengt O. Muthen posted on Monday, May 29, 2017 - 6:15 pm
Use the manual DCAT version to do this; see web note 15.
 Bo Rolander posted on Tuesday, July 25, 2017 - 1:40 am
Dear Dr Muthén
We are new to LCA and have run an analysis with 8 variables belonging to the same factor. Each variable has 5 response options, and there are 649 respondents.

We have used the following analytical code with examples for three classes:

TITLE: Latent class analysis with
Continuous latent (ordinal) variables;
DATA: FILE = Database_fa1.DAT;
VARIABLE: Auxiliary = ID;
CLASSES = C (3);

SAVEDATA: file = Database_fa1_3;
Save = cprobabilities;

We wonder if this is the right method of analysis.

Then we wonder about the model fit values. See below

Number of Free Parameters 34


H0 Value -15087.483
H0 Scaling Correction Factor 18.8377
Information Criteria

Akaike (AIC) 30242.966
Bayesian (BIC) 30397.752
Sample-Size Adjusted BIC 30289.795
(n* = (n + 2) / 24)

What we have read about BIC, for example, says it should be low. Are we looking at the wrong values, or is the analysis incorrect? If the analysis is correct, should we report the AIC and BIC values when describing the analysis?
Grateful for help
Bo Rolander
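
For orientation, the posted information criteria can be checked against their standard definitions, where p is the number of free parameters (34) and n is the sample size. This is only an arithmetic sketch based on the numbers in the post.

```latex
\begin{align*}
-2\log L &= -2 \times (-15087.483) = 30174.966,\\
\mathrm{AIC} &= -2\log L + 2p = 30174.966 + 2 \times 34 = 30242.966,\\
\mathrm{BIC} &= -2\log L + p \ln n,\\
\mathrm{SABIC} &= -2\log L + p \ln n^{*}, \qquad n^{*} = \frac{n + 2}{24}.
\end{align*}
```

The AIC line reproduces the posted value. Backing n out of the posted BIC gives ln n = (30397.752 - 30174.966) / 34, or n of roughly 701 rather than 649, so it may be worth checking the number of observations reported in the output. "Low" is relative: the values are compared across models with different numbers of classes fitted to the same data, and the model with the lower value is preferred.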
 Bengt O. Muthen posted on Tuesday, July 25, 2017 - 6:05 pm
I don't know what you mean when you say "belonging to the same factor". Factors are typically continuous latent variables. Perhaps you just mean the same construct (irrespective of scale type).

My best advice is to study the handout and video of the Topic 5 short course on our website - see
 Viktoria Vibhakar posted on Thursday, September 14, 2017 - 3:36 am
Hi, I have run an equality test of means/probabilities across classes for a 3-class LCA with binary indicators and 3 binary distal outcomes.

The reference class, Class 3, is the highest symptom class, and the first 2 classes are compared against it. But I need the odds ratios for the outcomes for the 2 higher symptom classes, using the low symptom class as the reference class.
I would like to change the reference class to Class 1, the low symptom class, so that the ORs on the binary outcomes are calculated relative to it. How is this done? Thanks.
 Bengt O. Muthen posted on Thursday, September 14, 2017 - 4:10 pm
You can use the SVALUES option to save the estimates and then use a class-reordered version of those as starting values in a STARTS=0 run.
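
A sketch of the two-run procedure, assuming (as in the question) that the last class serves as the reference class. The file name, the indicators u1-u3 and every number are placeholders; in practice the starting-value statements come from the first run's SVALUES output.

```mplus
! Run 1: add "OUTPUT: SVALUES;" to the original input to obtain the
! estimates written out as starting-value statements.

! Run 2: paste those statements into MODEL with the class blocks swapped,
! so that the low symptom class becomes the last class.
TITLE:    Class-reordered rerun (hypothetical sketch);
DATA:     FILE = lca.dat;            ! hypothetical file name
VARIABLE: NAMES = u1-u3;
          CATEGORICAL = u1-u3;
          CLASSES = c(3);
ANALYSIS: TYPE = MIXTURE;
          STARTS = 0;                ! no random starts; use the values below
MODEL:
  %OVERALL%
  %c#1%                              ! values taken from the old class 3
  [u1$1*-1.5 u2$1*-1.2 u3$1*-0.9];
  %c#2%                              ! unchanged
  [u1$1*0.1 u2$1*0.3 u3$1*0.2];
  %c#3%                              ! values taken from the old class 1
  [u1$1*2.0 u2$1*1.8 u3$1*2.2];
```

The starting values for the class logits in %OVERALL% are omitted here; if they are pasted in as well, they need to be reordered to match.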
 Lauren Armstrong posted on Thursday, June 07, 2018 - 3:52 am

I am looking at whether there are differences in a categorical variable depending on which latent class a participant is in.

I run the analysis and it produces the equality tests of means/probabilities across classes, which includes prob(SE), odds ratio(SE) and confidence intervals.

How do I also see the N for each category in each latent class?

 Bengt O. Muthen posted on Thursday, June 07, 2018 - 4:16 pm
If you don't have any missing data on the distal outcome, the N's will be the same as in the Step 1 class frequencies.
 Lauren Armstrong posted on Friday, June 08, 2018 - 4:48 am
Maybe I haven't explained properly.

I have the following syntax:

Data:
file = LCA_Visibility.dat;
listwise = on;

Variable:
names = Vis Prepare Blame Group Independ
Ruminate Distance Avoid Aware Enjoy Secrecy

Usevariable = Vis Prepare Blame Group
Independ Ruminate Distance Avoid Aware Enjoy Secrecy Accept;
missing = all (999);

classes = c(3);
auxiliary = Vis(DCAT);

Analysis:
type = mixture;

In the output I am presented with:

Class 1
Category 1
Category 2
Category 3
Category 4

(plus extra columns with prob, SE, odds ratio etc)

The categories relate to each of the 4 levels of my categorical variable. I can see the probability of each category belonging to each class, but I can't see the N of each category in each class.

I can't seem to find this anywhere else in the output.
 Tihomir Asparouhov posted on Saturday, June 09, 2018 - 7:41 pm
You would have to multiply the numbers from these two tables:


Class 1
Category 1 0.975
Category 2 0.025
Category 3 0.000
Class 2
Category 1 0.000
Category 2 0.551
Category 3 0.449
Class 3
Category 1 0.073
Category 2 0.445
Category 3 0.482




1 348.97900 0.69796
2 94.55895 0.18912
3 56.46206 0.11292

So in class 1, category 1 occurs for 348.97900 * 0.975 = 340.254525 individuals.
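
The same multiplication in general form, as a sketch: N_k is the estimated count for class k from the second table and p_{jk} is the probability of category j in class k from the first table. It assumes the classes are numbered the same way in both tables and that there is no missing data on the distal outcome.

```latex
\[
N_{jk} = N_k \, p_{jk}, \qquad
N_{11} = 348.97900 \times 0.975 \approx 340.25, \qquad
N_{22} = 94.55895 \times 0.551 \approx 52.10 .
\]
```

Because these are estimated class counts, the results are generally not whole numbers.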
 Lauren Armstrong posted on Monday, June 11, 2018 - 5:35 am
Ah thank you, of course that does make sense!

Just one final thing I wish to clarify:

If I have labelled my variable in the dataset as
0 = no
1 = yes
2 = other

where the variable refers to presence of mental health disorder, then 'category 1' will represent the probability of someone without the trait being in the class and 'category 2' will represent the probability of someone with the trait being in the latent class and category 3 will represent 'other'?
 Bengt O. Muthen posted on Monday, June 11, 2018 - 5:49 pm
Are you referring to your distal outcome "Vis"? Are you saying that it is a nominal variable?
 Lauren Armstrong posted on Wednesday, June 13, 2018 - 12:47 am
Yes and yes
 Bengt O. Muthen posted on Wednesday, June 13, 2018 - 11:51 am
With your statement

auxiliary = Vis(DCAT);

your Vis variable will be treated as "categorical", that is ordinal with more than 2 categories, not nominal. You might want to create 2 binary variables out of your 3-category nominal Vis variable and run one at a time.
 Lauren Armstrong posted on Friday, June 15, 2018 - 3:06 am
Ah, sorry, I made a mistake. I am referring to a new distal outcome which IS categorical.

So if I have a categorical variable that I have coded in my data set as described above, does Category 1 represent 0 = no, Category 2 represent 1 = yes, and Category 3 represent 2 = other?
 Bengt O. Muthen posted on Friday, June 15, 2018 - 2:25 pm
Yes. Which sounds like a nominal variable, not ordinal.
 Lauren Armstrong posted on Wednesday, June 20, 2018 - 2:13 am
Thanks- you would be correct, yes.

One final question (hopefully!):

I am running the LCA with a continuous variable. It is a measure of coping, where each subscale of the measure represents how often the participant engages in a particular coping strategy.

It suggests that 3 classes fit best. Is there a way to see whether the mean score for each coping subscale differs between the 3 classes? I can see that class one has the highest mean level of 'avoidance', for example, but how can I tell whether this mean is statistically different from the lower means for this subscale in classes two and three?
 Bengt O. Muthen posted on Wednesday, June 20, 2018 - 11:54 am
Give parameter labels for the class-specific means in the Model command and then use the Model Constraint command to express any difference in means that you are interested in, e.g.

diff = mean2 - mean1;
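
A minimal sketch of this approach for one subscale. The file name, the variable list and the labels m1-m3 are hypothetical; the names given in NEW are arbitrary.

```mplus
TITLE:    Testing class differences in a subscale mean
          (hypothetical sketch);
DATA:     FILE = coping.dat;         ! hypothetical file name
VARIABLE: NAMES = avoid distance ruminate;
          CLASSES = c(3);
ANALYSIS: TYPE = MIXTURE;
MODEL:
  %OVERALL%
  %c#1%
  [avoid] (m1);                      ! label for the class 1 mean of avoid
  %c#2%
  [avoid] (m2);
  %c#3%
  [avoid] (m3);
MODEL CONSTRAINT:
  NEW(d12 d13 d23);                  ! new parameters for the differences
  d12 = m1 - m2;
  d13 = m1 - m3;
  d23 = m2 - m3;
```

Each new parameter is reported with an estimate, a standard error and a p-value, which gives the test of that pairwise difference.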
 Lauren Armstrong posted on Thursday, June 21, 2018 - 3:44 am
Thank you.

Do I have to use a TECH command to get the parameter labels?
 Bengt O. Muthen posted on Thursday, June 21, 2018 - 3:22 pm
No. See Model Constraint in the UG index for how to do it - the first part describing NEW is sufficient.
 Nicholas Barr posted on Friday, June 21, 2019 - 1:57 pm

I am fitting an LCA with a categorical outcome variable (nonhon) with the Auxiliary DCAT option. I've used parameterization=rescov to allow for residual covariances for class indicator variables where local independence assumptions are violated. However, when I use this option, I am no longer able to obtain standard errors for model estimated probabilities for latent class indicators. Is there a way I can obtain these standard errors?

Thank you

TITLE: LCA of discharge predictors 3 class;
DATA: FILE IS C:DisPredictMplus.csv;
VARIABLE: NAMES ARE ptsddx tbidx depdx subdx chronpdx
speed carac drreck fight weap tobac
duib duid duia arrestb arrestd arresta
dvb dvd dva sxb sxd sxa pclt plccut phq9t phq9cut
alct alctrd alccutre phq15t phq15cut
male female nonhon id;
USEVARIABLES ARE duib duid arrestb arrestd dvb dvd sxb sxd;
CATEGORICAL ARE duib duid arrestb arrestd dvb dvd sxb sxd;
CLASSES = class(3);

ANALYSIS: STARTS = 200 20;

MODEL: %OVERALL%
dvb WITH dvd;
arrestb WITH arrestd;

OUTPUT: tech10 tech8;
 Bengt O. Muthen posted on Saturday, June 22, 2019 - 6:48 am
You would have to express the probabilities in Model Constraint and thereby get those SEs. Also try the RESIDUAL option of the Output command.
 Nicholas Barr posted on Monday, June 24, 2019 - 10:43 am
Great, thank you Dr. Muthen. I'm not 100% clear about what you mean by "express the probabilities in Model Constraint". Would you be able to point me towards a resource for that?

I'll include the residual option as well.

 Bengt O. Muthen posted on Monday, June 24, 2019 - 3:39 pm
If y is a binary latent class indicator and you have 2 classes, you specify parameter labels in the class-specific parts of the Model command:

%c#1%
[y$1] (t1);
%c#2%
[y$1] (t2);

and then in the Model Constraint command:

new(prob1 prob2);
prob1 = 1/(1+exp(t1));
prob2 = 1/(1+exp(t2));

This gives you the probability estimates and their SEs and p-values.
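
The formula follows from the logit threshold parameterization. As a sketch, assuming the default logit link and writing \tau_k for the threshold of y in class k:

```latex
\begin{align*}
P(y = 0 \mid c = k) &= \frac{1}{1 + e^{-\tau_k}},\\
P(y = 1 \mid c = k) &= 1 - \frac{1}{1 + e^{-\tau_k}} = \frac{1}{1 + e^{\tau_k}}.
\end{align*}
```

For example, a threshold of 0 gives a probability of 0.5, a threshold of -1 gives about 0.731, and a threshold of 1 gives about 0.269. So prob1 and prob2 are the class-specific probabilities of the second (higher) category of y.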
 Nicholas Barr posted on Monday, June 24, 2019 - 6:26 pm
Great, thank you!