LCA with binary and categorical indic...
Mplus Discussion > Latent Variable Mixture Modeling >
 dagmar posted on Thursday, February 07, 2002 - 5:17 pm
Is it possible to do LCA with binary and categorical indicators, as well as continuous outcomes? The Mplus manual includes examples of LCA with binary indicators and continuous outcomes and of LCA with 3-category latent class indicators. How would the syntax be combined? Thanks. Dagmar
 Linda K. Muthen posted on Thursday, February 07, 2002 - 5:47 pm
Yes, this is possible although it would not technically be Latent Class Analysis or Latent Profile Analysis. Indicators that are on the CATEGORICAL list would be binary or polytomous and their thresholds would be referred to using the $ convention, while variables that are not on the CATEGORICAL list would be assumed to be continuous and would be referred to by their variable names.
 dagmar posted on Sunday, February 17, 2002 - 5:09 pm
Hello, I tried writing the syntax for this nameless analysis, but can't get it to run. Some of the categorical variables are binary, others have three to five categories. I don't know whether this is the right way to deal with this. Thanks very much for your help--the error message is at the end of the syntax file. Dagmar

TITLE: Categorical and Continuous Variables; MS Survey

FILE IS "C:\My Documents\work\MS\Profile Analysis\ms.dat";

NAMES ARE id msmh9 es1 uhw2 speech thinking pain1 pain4 edss
marryliv cours2 female educ lwith vscat age_10 dur_10 wk_40
perinc cesd_10 hhinc50 volhalf hhwk_5 fat_5 socsup10;
USEVARIABLES ARE msmh9 es1 uhw2 speech thinking pain1 pain4
edss marryliv cours2 female educ lwith vscat age_10 dur_10
wk_40 perinc cesd_10 hhinc50 volhalf hhwk_5 fat_5 socsup10;
CLASSES = c (1);
CATEGORICAL = msmh9 es1 uhw2 speech thinking pain1 pain4
marryliv cours2 female educ lwith vscat;

LOGHIGH = +15;
LOGLOW = -15;
LOGCRITERION = 0.0000001;
CONVERGENCE = 0.000001;
MCONVERGENCE = 0.000001;


FILE IS rawdatams1;
FILE (SAMPLE) IS samplems1;
FILE (RESULTS) IS resultsms1;
FILE (TECH3) IS tech3ms1;
FILE (TECH4) IS tech4ms1;


[msmh9$1*6 es1$1*5 uhw2$1*4 speech$1*3 thinking$1*2 pain1$1*1
pain4$1*0 marryliv$1*-1 cours2$1*-2 female$1*-3 educ$1*-4
lwith$1*-5 vscat$1*-6];
[msmh9$2*7 uhw2$2*6 speech$2*5 thinking$2*4 pain4$2*3
cours2$2*2 educ$2*1 lwith$2*0];
[speech$3*6 thinking$3*5 pain4$3*4 educ$3*3 lwith$3*2];
[thinking$4*6 educ$4*5];

[edss*-1 age_10*-1 dur_10*-1 wk_40*-1 perinc*-1 cesd_10*-1
hhinc50*-1 volhalf*-1 hhwk_5*-1 fat_5*-1 socsup10*-1];
edss age_10 dur_10 wk_40 perinc cesd_10 hhinc50 volhalf hhwk_5 fat_5 socsup10;

Error message:
*** WARNING in Model command
All variables are assumed to be y-variables. Check that the covariances
between these and other variables are as intended.
*** ERROR in Model command
Unknown threshold value 2 for variable MSMH9
*** ERROR in Model command
Ordered thresholds 3 and 4 for class indicator PAIN4 are not
increasing. Check your starting values.
The following MODEL statement(s) in Class 1 are ignored:
[ MSMH9$2 ]
 Linda K. Muthen posted on Tuesday, February 19, 2002 - 9:36 am
What the error message is trying to tell you is that PAIN4 has four thresholds. You are giving starting values for three, so we are assigning zero to the fourth by default and zero is smaller than the value assigned to threshold 3.

Regarding MSMH9, it must be binary but you are assigning two thresholds so we are ignoring the second.

It might be a good idea to run a regular TYPE=BASIC on your data to see how many thresholds each variable has according to Mplus to make sure this is what you expect.
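As a rough illustration of the check suggested above (this is plain Python, not Mplus code), the sketch below counts the observed categories of each categorical indicator and compares them with the supplied starting values. An ordinal variable with k categories has k - 1 thresholds, and the starting values must be increasing. The toy data and the helper names `n_thresholds` and `check_starts` are assumptions made for illustration; the starting values are the ones in the posted input.

```python
def n_thresholds(values):
    """An ordinal variable with k observed categories has k - 1 thresholds."""
    return len(set(values)) - 1


def check_starts(n_thresh, starts):
    """Return a list of problems with threshold starting values.

    Thresholds without a supplied start are treated as 0, mirroring the
    default described in the reply above.
    """
    problems = []
    if len(starts) > n_thresh:
        extra = len(starts) - n_thresh
        problems.append(f"{extra} extra threshold(s) would be ignored")
    filled = list(starts[:n_thresh]) + [0.0] * max(0, n_thresh - len(starts))
    for i in range(1, len(filled)):
        if filled[i] <= filled[i - 1]:
            problems.append(f"thresholds {i} and {i + 1} are not increasing")
    return problems


# Hypothetical toy data: msmh9 is binary, pain4 has five categories.
data = {
    "msmh9": [0, 1, 1, 0, 1],
    "pain4": [1, 2, 3, 4, 5, 3],
}
# Starting values taken from the posted input.
starts = {"msmh9": [6, 7], "pain4": [0, 3, 4]}

report = {
    name: check_starts(n_thresholds(vals), starts[name])
    for name, vals in data.items()
}
for name, problems in report.items():
    print(name, n_thresholds(data[name]), problems)
```

With these inputs the sketch flags the same two problems as the error messages: a second threshold given for a binary variable, and a defaulted fourth threshold for PAIN4 that falls below the third.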
 Howard Degenholtz posted on Monday, October 21, 2002 - 1:06 pm
I am new to IRT/LCA methods, and have a question. I have data that were collected where respondents had a choice between a dichotomous (yes/no) response and 4 point likert type (degree of agreement) response.

Is this something that can be done using MPlus?

Thank you,
 bmuthen posted on Monday, October 21, 2002 - 2:28 pm
Do you mean that the subject chose the response format or that 2 response types were randomly offered to subjects? If the former, I guess one has to figure out which type of person tended to choose which format and whether the responses of similar people measured by the different formats relate the same way or differently to other variables. I think I have seen an analogous situation in achievement testing.
 Howard posted on Monday, October 21, 2002 - 5:47 pm
The former: individual respondents chose the response format. Respondents in this case are elderly nursing home residents. Those who were more cognitively impaired were more likely to use the dichotomous response format. (There may be other variables, but we have not done thorough exploration of this.) We have done some multiple group CFA on our instruments to see if the same multidimensional structure holds up. We found evidence of stability, but also some differences. However, this could only be done by recoding the dichotomous data onto the 1-4 scale (using a z-score). IRT has been recommended as a preferred approach to calibrating the response formats. It seems like we could do a 2-class LCA model using cognitive status as a predictor of class membership. Part of my confusion has to do with the fact that we want to do CFA on our multidimensional structure, then proceed to causal modeling. However, I think the LCA part and the factor structure are interdependent. For example, what if a particular item should be deleted or assigned to a different latent variable? Won't that change the associated IRT parameters?

As I write this, I wonder if it matters - once we have fit the best model, won't we optimize both the factor structure and the IRT part?

Thank you for your reply this afternoon. I appreciate your further advice.
 bmuthen posted on Tuesday, October 22, 2002 - 10:15 am
I think the measurement modeling is strongly intertwined with the structural modeling in this situation. It is hard to give general advice without knowing the situation better, but I'll give you some quick thoughts. One can certainly jointly analyze the people who took the 2-point questions and the 4-point questions - without reformatting the 4-point data. And one can see if descriptively similar factor structures arise. And then relate factors to other variables. Such a 2-group analysis is, however, unusual in that both variables and people are different across groups, whereas typically only the people are different (and they are also typically obtained through random sampling which is not the case here because people choose group membership) - so at the end we are not sure we are measuring the same thing. This would also be true for the LCA. You mention CFA, factor structure, IRT, and LCA - it is not clear to me if you think of the first 3 as different - I don't. I see LCA and CFA as complementary procedures and both would be affected by the non-random choice of response formats, a non-random choice that may distort structural conclusions. I don't know if one can assume in this case that conditional on cognitive impairment status, the choice between the 2 formats is random, and I am not sure how this could be incorporated into the modeling.
 Howard posted on Tuesday, October 22, 2002 - 11:15 am
Well, I put a lot of ideas out there, but perhaps I should be more specific about my goals. We have collected the data in this way to allow people with limited cognitive abilities to provide as much information as they can (instead of being coded as missing). We now want to combine the data together. So my question is, can we use LCA to do that? If so, how? and what assumptions would we have to make about the multidimensional structure?
Thank you.
 bmuthen posted on Tuesday, October 22, 2002 - 1:44 pm
Tell me a bit more. How are you thinking about the LCA - e.g. how would the LCA classes be different from the known groups of individuals (those who chose the 2-cat format and those who chose the 4-cat format)? Are you thinking that there are 2 classes of individuals, having to do with cognitive impairment, and the tendency to choose one format over the other is related to those 2 classes?
 Howard posted on Wednesday, October 23, 2002 - 10:41 am
Yes, I am thinking that there is a tendency to use one response format or the other. You are right that the class is observed: we know their choices. What is not observed is why.

One point to clarify: The choice is at the item level - some people choose 2-cat for only a small number of items, using 4-cat mostly. Others use 2-cat mostly.
 bmuthen posted on Wednesday, October 23, 2002 - 6:03 pm
Let me think aloud although it may not get at what you are after. It sounds like it would be valuable to achieve a calibration of the 2 formats, so that one can translate the response to one of the formats into the other format. If not, you don't know if relationships between the response and covariates differ across people because of the format difference or because of people differences.

The problem of getting a calibration is that nobody has taken both formats. Optimally, a random subset of individuals should take both formats. I wonder if it is possible to calibrate even without this. Perhaps you have to first study (by logistic regression with a binary dependent outcome and with cognitive impairment etc as predictors) who chooses which format. And then see, for example if impairment is the key factor, how format outcomes relate for people with the same impairment, but who chose different formats. Once the responses on different formats are calibrated, you can get to the structural part.

But, my answer may be beside the point. I remember that ETS had a related situation where people could choose a reading topic and then answer questions on it. Perhaps other readers remember this, or have input to give.
 Howard posted on Wednesday, October 23, 2002 - 7:25 pm
If we were to do a logit to predict response format, how would that help us calibrate the data? It sounds like we would still need to do an experimental study where people did both response formats.

If we were to plan such an experiment where we had a sample of people respond to both forms (in random order), the following questions arise: (1) how would we analyze the data to develop the proper calibration parameters; and (2) how would we determine the sample size requirements for such a study? In some places I have read that to do this with IRT requires very large samples eg n > 1000.

It occurs to me that within each level of cognitive impairment (on a 6 point scale), there are some people who used each format. Could we use these groups for calibration? There may be some heterogeneity with respect to cognitive impairment within each level of our cognitive impairment score. However, this would give us at least a rough estimate of the parameters we are interested in.

Thank you very much for all of your advice so far.
 bmuthen posted on Thursday, October 24, 2002 - 6:53 am
Your paragraph "It occurs to me..." refers to the approximate calibration I had in mind in my paragraph "The problem of...". The logistic regression would make it plausible that impairment was a key factor in the choice.

If you had an experiment with a random subsample of people responding to both forms, you have 3 subgroups of people (one group has the 2-cat format, 1 has the 4-cat format, 1 has both formats). You could calibrate within a 3-group analysis while you are also doing the structural analysis. By which I mean that the random subsample has 2 indicators (1 2-cat, 1 4-cat) which enables identification of a factor. Each of the groups with only one indicator sets its threshold and loading equal to the loadings and thresholds of these 2 indicators. I don't think you need n> 1000; it depends on the number of variables. But I think I will now step aside from this topic and leave room for a consultant who can get more familiar with your particular situation.
 Howard posted on Thursday, October 24, 2002 - 9:01 am
Thank you very much for all of your careful attention to these questions.

If I understand your last comment correctly, if we had a sample that did both forms, we could set the parameters as you mention. But in the absence of such a sample we need to use an 'approximate' calibration approach.

This has been very helpful - I feel much better about what questions to ask of a consultant.

Thank you,
 bmuthen posted on Thursday, October 24, 2002 - 9:46 am
Yes. Good.
 Sarah Varughese posted on Monday, November 15, 2004 - 9:46 pm
I am new to MPLUS and LCA. The problem at hand is that I have to assign values ranging from 0 to 100, called disability weights, to a sample of 200 individuals based on 5 continuous variables which can be considered as indicator variables. Can this be handled under the umbrella of latent structure analysis and with MPLUS?

Thank You
 Linda K. Muthen posted on Tuesday, November 16, 2004 - 10:24 am
It sounds like you may want factor analysis rather than latent class analysis. LCA is used to group individuals. Factor analysis is used to group variables.
 Sarah Varughese posted on Tuesday, November 16, 2004 - 8:46 pm
Thank you for your prompt reply. Is it possible to consider it as a case of grouping individuals into 100 different classes? What I ultimately require is a disability score for each individual that could range from 0 to 100 based on the 5 variables which are indicators of the patient's health condition.

Thank You
 Linda K. Muthen posted on Wednesday, November 17, 2004 - 6:12 am
Perhaps the following paper can help you understand the difference between factor analysis and latent class analysis. It can be downloaded from the Mplus website under Mplus papers.

Muthén, B. & Muthén, L. (2000). Integrating person-centered and variable-centered analyses: Growth mixture modeling with latent trajectory classes. Alcoholism: Clinical and Experimental Research, 24, 882-891.
 Tom Hildebrandt posted on Tuesday, November 23, 2004 - 6:21 am
I was wondering if there were guidelines for the local dependence assumption in LCA, in terms of how to handle indicators that may be correlated (approx = .65). I have 10 indicators I'm interested in using, but 2 of the variables are correlated. The rest of the indicators don't correlate higher than .35. It is my understanding that local dependence is likely violated here. Should those that have a higher correlation be combined? Does MPLUS have any features for handling local dependence?
 Linda K. Muthen posted on Tuesday, November 23, 2004 - 6:46 am
I don't think that the assumption of conditional independence in LCA says anything about the correlation of the observed latent class indicators. I think that the assumption means that the model is estimated such that the residual covariances among the observed latent class indicators are zero.
 Tom Hildebrandt posted on Tuesday, November 23, 2004 - 7:47 am
Thank you for your prompt reply. I had recently received a review on a manuscript that used LCA to define subtypes of weightlifters in terms of body image disturbance and the reviewer claimed that the correlations between observed indicator variables suggested violations of conditional independence. I will look into the residual covariances. Do you know of any useful references on this assumption in LCA?
 bmuthen posted on Tuesday, November 23, 2004 - 8:19 am
Just to add to this discussion. Note that the LCA model - with its assumption of conditional (local) independence - tries to explain correlations among observed variables. With zero observed correlations there is nothing to explain. So high correlations among observed variables are not a sign that the model assumptions are violated, on the contrary. Think about how 2 variables become correlated by mixing 2 classes of uncorrelated individuals, one class low on both variables and one class high on both variables (plot it and you will see).

You may also want to model local DEpendencies among some variables within class if that makes substantive sense - and this can be done using Mplus. For a good reference, see the edited book

Hagenaars, J.A & McCutcheon, A. (2002). Applied latent class analysis. Cambridge: Cambridge University Press.
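The point above about mixing two classes can be checked with a small simulation. This is a minimal Python sketch, not Mplus code: the class means (0 and 3), unit variances, class sizes, and seed are all assumptions chosen for illustration. Within each class the two variables are drawn independently, so any correlation in the pooled sample comes only from the mixing.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(0)


def draw_class(mean, n):
    # Within a class the two variables are independent (local independence).
    return [(rng.gauss(mean, 1.0), rng.gauss(mean, 1.0)) for _ in range(n)]


low = draw_class(0.0, 2000)   # class that is low on both variables
high = draw_class(3.0, 2000)  # class that is high on both variables

r_low = pearson(*zip(*low))
r_high = pearson(*zip(*high))
r_mixed = pearson(*zip(*(low + high)))
print(round(r_low, 2), round(r_high, 2), round(r_mixed, 2))
```

The within-class correlations come out near zero while the pooled correlation is substantial, which is the pattern the latent class model is built to explain.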
 Tom Hildebrandt posted on Tuesday, November 23, 2004 - 9:53 am
This is very helpful indeed. How would I go about modeling conditional (local) dependencies in Mplus?

I will order the book today.
 bmuthen posted on Tuesday, November 23, 2004 - 10:24 am
For continuous outcomes it is easy, just say y1 with y2 within a certain class. For other outcomes it is harder but the Version 3 User's Guide has an example. Note, however, that you should have a good substantive reason for making this deviation from the standard model.
 Salma Ayis posted on Tuesday, June 06, 2006 - 3:52 am
How can I detect for local dependency among the indicators while fitting a Latent class model?
 Linda K. Muthen posted on Tuesday, June 06, 2006 - 10:24 am
You can look at the standardized bivariate residuals in TECH10. If you have many large standardized residuals, you can increase the number of classes and see if that reduces the residuals. You can also add a factor to the model that has as indicators the latent class indicators.
 Alexandre Morin posted on Monday, January 28, 2008 - 10:36 am

I would like to predict continuous outcomes (out1-out4) from a categorical latent variable (latent profile analysis, 7 continuous indicators - 5 classes). I tried the following syntax (with numerical integration) but it does not work: out1 out2 out3 out4 ON C#1-C#4

Is there any other way to do it? I imagine I could just add the outcomes to the usevariables list without any other specifications but then they will be used in the clustering algorithm and I do not want these variables to influence the classification (as outcomes).

Thank you in advance for your time.
 Linda K. Muthen posted on Monday, January 28, 2008 - 11:51 am
You would add the variables to the USEVARIABLES list as shown in Example 8.6. These variables will influence the classes.
 Alexandre Morin posted on Monday, January 28, 2008 - 12:30 pm
Greetings and thank you for the prompt answer.

However, what if I do not want the outcomes to influence the classes?

The only thing I can think of is to work from the best fitting model without outcomes and then to fix "@" class characteristics (means, variances) before including the outcomes. However, given the fact that I have almost as many class indicators as I have outcomes, I'm afraid it may reflect badly on the model.

Otherwise, I will have to work from the saved class membership and analyse the outcomes via MANOVA. But then, I will lose the information from the posterior class probabilities...

Any advice ?

Thanks again.
 Linda K. Muthen posted on Monday, January 28, 2008 - 12:40 pm
I would not recommend either of these approaches. When you have a distal outcome, the parameters of interest are the means and how they vary across classes. I would instead use the AUXILIARY option with the e setting to test the equality of the means of the distal outcome variables across classes. See the user's guide for more information about this option.
 Alexandre Morin posted on Monday, January 28, 2008 - 1:19 pm
Thank you again Linda,

That is a very nice improvement from version 5!

So nice, that I have a new question.

Reading the technical appendix on this function, I saw that it can be used for any covariate. When would you suggest the use of this method versus the direct inclusion of covariates as predictors of class membership? In my case, I have a theoretical/practical rationale (antecedents are included as predictors of c#1-c#4 in the model and outcomes, which arrive after, are kept out of the model so as not to influence something occurring before) but are there any other arguments to consider in this choice (other than the slower computation and harder interpretability already noted in the appendix)?
 Linda K. Muthen posted on Monday, January 28, 2008 - 3:57 pm
The AUXILIARY option is good to use to select covariates when you have many covariates to choose from. It is quicker than including them all in the model to assess their importance.

A distal outcome is a dependent variable not a covariate.
 Alexandre Morin posted on Tuesday, January 29, 2008 - 6:27 am
Thanks a lot Linda,

You are right, as usual. Excuse the confusion, I got carried away by the new feature.
 Sara posted on Sunday, February 17, 2008 - 12:16 pm
I have 2 questions regarding the "new" function of the AUXILIARY option to test the equality of means for variables not used in the mixture model (auxiliary variables).
First, I have 3 classes and used this function ("auxiliary" with "e" behind auxiliary variables of interest) to produce "EQUALITY TESTS OF MEANS ACROSS CLASSES USING POSTERIOR PROBABILITY-BASED MULTIPLE IMPUTATIONS WITH 2 DEGREE(S) OF FREEDOM" and the corresponding class means. I can see that the omnibus test for a variable is significant. Is there a way to produce Class by Class tests of mean differences (post-hoc tests)?

Second, where can I find more information regarding exactly how these means and tests are being computed using "POSTERIOR PROBABILITY-BASED MULTIPLE IMPUTATIONS". I couldn't find anything regarding this in the MPLUS Manual 5.

Thanks so much.
 Linda K. Muthen posted on Sunday, February 17, 2008 - 5:32 pm
There is not currently a way to do class by class tests. See the following link for a description of the method:
 Alexandre Morin posted on Monday, March 10, 2008 - 7:37 am

Is it possible, with the auxiliary (e) command, to obtain estimates of class-specific variances (in addition to class-specific means and Wald test) for the auxiliary variables? If yes, how?
 Bengt O. Muthen posted on Monday, March 10, 2008 - 8:00 am
Not in the current version - such information is obtained only if the variable is in the model (e.g. as a covariate). What would you use the class-specific variance for - are you aiming to get the standard error for each mean?
 Alexandre Morin posted on Monday, March 10, 2008 - 8:08 am

I was mostly trying to report the results as completely as possible. With means and variances, box plots would be possible. Would aiming to get the SEs of the means be a good idea?
 Bengt O. Muthen posted on Monday, March 10, 2008 - 8:11 am
In the upcoming Mplus version 5.1 the SEs for the means will be provided.
 Herb Marsh posted on Wednesday, March 12, 2008 - 2:28 am
I have conducted an LPA based on 8 y variables used to define 5 groups (arrows from ys to latent groups). I also have a set of 10 criterion variables that might be considered as auxiliary variables in MPLUS terminology. I have done separate analyses based on:

I have a latent profile analysis with 8 y variables that define 5 groups and 10 criterion variables as auxiliary variables.

I can test the equality of the criterion variable means across the five latent groups. Although I understand that these are based on pseudo-class draws (equality test of means across latent classes using Wald chi-square based on draws from posterior probabilities, a strategy that seems to be a combination of relating the auxiliary variables to group dichotomies and group probabilities), I am not sure how to interpret them.

The values for auxiliary variables are, I assume, means for each variable on the corresponding pseudo groups (classes in the output).

How do these relate to the corresponding path coefficients or odds ratios in analyses in which the criterion variables are treated as covariates?

Can I determine the amount of variance in one set of variables (group dichotomies or probabilities) explained by the other (auxiliary variables)? – or even the amount of variance in each grouping variable (dichotomy or probability) explained by the set of auxiliary variables?
 Bengt O. Muthen posted on Wednesday, March 12, 2008 - 9:25 am
Regarding the equality testing of means for auxiliary variables (criterion variables; I'll call them x's in the following) across classes, you are correct in your interpretation. I see this as univariate information and not informative for including all the auxiliary variables as predictors of latent class membership (c in the following). So this is just a first step for seeing which x's might be useful. This information does not directly relate to having several of the x's as predictors of c. In Mplus Version 5.1 coming out next month we will have a second-stage approach where we do pseudo-class draws and provide output for "c on x" multinomial regressions. This new output is relevant for choosing a set of x's.

Regarding the amount of variance explained, I don't think that is a common usage in logistic regression or multinomial logistic regression. Perhaps a better way to understand how important x's are is to plot the probability curves for c on x as can be done in Mplus graphics.
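To make the idea of pseudo-class draws mentioned above concrete, here is a minimal Python sketch. It is not the Mplus implementation: it only illustrates drawing a class for each person from that person's posterior class probabilities, computing class-specific means of an auxiliary variable within each draw, and averaging over draws. The function name `pseudo_class_means`, the number of draws, and the toy posteriors are assumptions, and the Wald test that Mplus reports is not reproduced.

```python
import random


def pseudo_class_means(posteriors, x, n_draws=20, seed=0):
    """Average class-specific means of x over pseudo-class draws."""
    rng = random.Random(seed)
    n_classes = len(posteriors[0])
    totals = [0.0] * n_classes  # sum of per-draw class means
    used = [0] * n_classes      # draws in which the class was non-empty
    for _ in range(n_draws):
        sums = [0.0] * n_classes
        counts = [0] * n_classes
        for probs, xi in zip(posteriors, x):
            # Assign this person to one class, with probability equal to
            # the posterior probability of each class.
            k = rng.choices(range(n_classes), weights=probs)[0]
            sums[k] += xi
            counts[k] += 1
        for k in range(n_classes):
            if counts[k]:
                totals[k] += sums[k] / counts[k]
                used[k] += 1
    return [
        totals[k] / used[k] if used[k] else float("nan")
        for k in range(n_classes)
    ]


# Hypothetical posteriors for 100 people and a two-class model.
posteriors = [(0.9, 0.1)] * 50 + [(0.2, 0.8)] * 50
x = [1.0] * 50 + [3.0] * 50  # hypothetical auxiliary variable

means = pseudo_class_means(posteriors, x)
print([round(m, 2) for m in means])
```

Because class membership is drawn rather than fixed at the most likely class, the uncertainty in classification carries through to the class-specific means.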
 Jinseok Kim posted on Thursday, May 29, 2008 - 6:36 am

I conducted an LCA with covariates (7 indicators; 12 covariates) using complex survey data, which seems to work fine. However, when I read the saved data (from the SAVEDATA command) into Stata for further analyses, I found that the raw class counts based on the most likely latent class membership in the Mplus output differ from the counts computed from the saved data. Later I found that the proportion in Mplus (based on most likely class membership) is the same as the estimated proportion in the saved data (using the Stata survey estimation module). Now, the question is which class counts would you recommend to report? Would you rather not report the class counts at all in survey data? It seems to me that the class counts in complex survey data may be kind of misleading. Any suggestion would be most appreciated. Thanks.
 Linda K. Muthen posted on Thursday, May 29, 2008 - 2:10 pm
If you are not using Version 5.1, please do. If you are, please send your input, data, output and license number to
 Argyris posted on Thursday, August 14, 2008 - 12:33 pm
I would like to estimate categorical variables, such as gender or diagnostic group, as potential class predictors in an LCA. I know I can test the equality of means for continuous variables using the "auxiliary" option. Is there an option for categorical variables? Is it robust/legitimate to use the same?
Also, is there a way to get the proportions of categorical variables per class (as in a contingency table) in the output without having to read them from the histogram?
Many thanks.
 Linda K. Muthen posted on Thursday, August 14, 2008 - 1:57 pm
Covariates are either binary or continuous. In both cases they are treated as continuous. So you would use the AUXILIARY option the same for both.

If you ask for SAMPSTAT in the OUTPUT command, you will obtain these values.
 Argyris posted on Thursday, August 14, 2008 - 2:34 pm
Thanks very much.
Sampstat seems to only provide information on those variables that are an explicit part of the model and not the auxiliary variables. But I might be missing something.
 Linda K. Muthen posted on Thursday, August 14, 2008 - 2:50 pm
You are correct. Then you can get them from the AUXILIARY option. The means are given for each class.
 Argyris posted on Friday, August 15, 2008 - 5:28 am
Thank you.
Yes, true about the means in Auxiliary. It does not, however, seem to provide the proportions of an (auxiliary) categorical variable per latent class, say the proportion of males per latent class 1. I seem to be able to get this through the histogram by resting the mouse on the relevant bar. I just wanted to ensure that is right and find out whether there was an alternative. Sorry to persevere.
 Linda K. Muthen posted on Friday, August 15, 2008 - 8:22 am
The mean of a binary variable is a proportion.
 Argyris posted on Friday, August 15, 2008 - 8:51 am
Quite. These means/proportions of auxiliary categorical variables do not appear anywhere in the output though--only in the histograms, it would seem. The means provided in the equality of means part of the output are the means of the binary codings of the variables (e.g. 1.43 for a variable coded between 1 and 2; see copied output below). The query is more about confirming that what I get by resting the cursor on the bars is, indeed, the mean/proportion, given it appears nowhere else. Sorry about the confusion.
Thanks very much.

           Mean     S.E.

Class 1    1.321    0.053
Class 2    1.454    0.055
Class 3    1.537    0.037
Class 4    1.520    0.031
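A small Python sketch of how the class means pasted above map to proportions, assuming (as the post states) that the variable is coded 1 and 2. For a two-valued variable the proportion in the higher category is (mean - low) / (high - low), so with 1/2 coding it is simply the mean minus 1. The helper name `proportion_high` is hypothetical.

```python
def proportion_high(mean, low=1.0, high=2.0):
    """Proportion in the higher category of a two-valued variable."""
    return (mean - low) / (high - low)


# Class means copied from the output above.
class_means = {1: 1.321, 2: 1.454, 3: 1.537, 4: 1.520}

proportions = {c: round(proportion_high(m), 3) for c, m in class_means.items()}
print(proportions)
```

Shifting the coding by a constant leaves the standard errors unchanged, so with 1/2 coding the reported S.E. values apply to the proportions as well.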
 Linda K. Muthen posted on Friday, August 15, 2008 - 9:03 am
You must not be using the most recent version of the program. It has means.
 Nan Zhang posted on Monday, February 23, 2009 - 9:45 pm
I want to save the posterior probabilities and the related syntax is as following:

savedata: file is D:\09spring\RA(take action)\data\binaryresults;

There is error in the output:
*** ERROR in SAVEDATA command
The syntax for the FILE option has changed. Please refer to the
Mplus User's Guide for available options.

Is there any change for the savedata option or anything wrong with my command?
 Linda K. Muthen posted on Tuesday, February 24, 2009 - 6:45 am
This looks correct to me. Please send the input, data, output, and license number to for further help.
 Blair Beadnell posted on Tuesday, March 24, 2009 - 1:43 pm
When there are categorical indicators in an LCA, the output includes the LATENT CLASS ODDS RATIO RESULTS section. I understand that these odds ratios represent the comparisons of each pair of classes. However, I'm not sure how to interpret the statistical significance test. The way it's calculated, it appears to be a test of whether the odds ratio differs significantly from 0. Is that correct? (In other words, one should not interpret these to mean that the odds ratio is significantly different from 1?) Is there a way to use these results to calculate whether the odds ratio differs from 1?
 Bengt O. Muthen posted on Tuesday, March 24, 2009 - 7:13 pm
Yes, the 3rd column of the Mplus output is always Est./S.E., so this tests against zero. To test against the more relevant value of 1 in this case, you simply consider

(Est. - 1)/S.E.

But it is more common to provide a confidence interval around the odds ratio point estimate. This interval can be requested in the Mplus OUTPUT command by saying CINTERVAL. As in the literature, these intervals are derived from those of the log odds and then exponentiated.
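The arithmetic above can be sketched in a few lines of Python. The estimate and standard error are hypothetical numbers, not output from any model. The interval is built on the log-odds scale and exponentiated, as described, but the standard error of the log odds ratio is approximated here by the delta method (S.E. divided by the estimate), which is an assumption of this sketch; Mplus works from the log-odds estimates directly, so its interval may differ.

```python
import math


def odds_ratio_tests(or_est, or_se, z_crit=1.96):
    """Return z against 0, z against 1, and an approximate 95% interval."""
    z_vs_zero = or_est / or_se          # what the Est./S.E. column reports
    z_vs_one = (or_est - 1.0) / or_se   # test against the null value of 1
    log_or = math.log(or_est)
    log_se = or_se / or_est             # delta-method approximation (assumption)
    lower = math.exp(log_or - z_crit * log_se)
    upper = math.exp(log_or + z_crit * log_se)
    return z_vs_zero, z_vs_one, (lower, upper)


# Hypothetical odds ratio estimate and standard error.
z0, z1, (lower, upper) = odds_ratio_tests(2.5, 0.8)
print(round(z0, 3), round(z1, 3), round(lower, 3), round(upper, 3))
```

The exponentiated interval is not symmetric around the point estimate and can never include values below zero, which is one reason it is usually preferred to a z-test on the odds-ratio scale. The two approaches need not lead to exactly the same conclusion.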
 Jinseok Kim posted on Tuesday, October 20, 2009 - 11:47 pm
I conducted an LCA with the auxiliary option. It ran with no problem but presented no SE values for a variable, only "*******". Does this mean that the SE of the auxiliary variable is not computed or something like "out of range"? In either case, is there any way that I can get the SE values for the variable for each class?
 Linda K. Muthen posted on Wednesday, October 21, 2009 - 6:20 am
Asterisks would mean the value is too large to fit in the space allowed. For further information, please send your output and license number to
 Richard Silverwood posted on Thursday, July 01, 2010 - 2:38 am
I am conducting a pretty simple LCA on complete cases only (LISTWISE IS ON). When I use the AUXILIARY option to compare means across classes for some auxiliary variables the number of observations used in the model is reduced - presumably because Mplus is now using the subset of observations for which all the model variables *and* all the auxiliary variables are non-missing. Is it possible to make Mplus fit the model to the same subset of observations it would if AUXILIARY was not specified, but still compare means for the auxiliary variables (where available)?

Many thanks.
 Linda K. Muthen posted on Thursday, July 01, 2010 - 9:09 am
Before Version 6 the listwise deletion was done on the full set of variables including the auxiliary variables. This has been changed in Version 6 so that the listwise deletion is done on only the analysis variables.
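The difference between the two deletion rules described above can be seen on a toy data set. This Python sketch only illustrates the counting; the variable names `u1`, `u2`, `aux` and the data are hypothetical, with `None` standing for a missing value.

```python
rows = [
    {"u1": 1, "u2": 0, "aux": 2.3},
    {"u1": 0, "u2": 1, "aux": None},   # missing only on the auxiliary variable
    {"u1": None, "u2": 1, "aux": 1.1}, # missing on an analysis variable
    {"u1": 1, "u2": 1, "aux": 0.7},
]
analysis_vars = ["u1", "u2"]
aux_vars = ["aux"]


def complete(row, names):
    """True if the row has no missing value on any of the named variables."""
    return all(row[name] is not None for name in names)


# Rule described for versions before 6: delete on analysis + auxiliary variables.
n_before_v6 = sum(complete(r, analysis_vars + aux_vars) for r in rows)
# Rule described for Version 6: delete on the analysis variables only.
n_v6 = sum(complete(r, analysis_vars) for r in rows)
print(n_before_v6, n_v6)
```

Under the second rule the case that is missing only on the auxiliary variable stays in the model, so the estimation sample matches the one used without AUXILIARY.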
 K Frampton posted on Thursday, April 21, 2011 - 10:07 am
Hello. I have a 5-class LCA with 7 continuous indicators. I'm trying to determine whether there are significant class differences in particular indicators' means (e.g., whether class 1 and class 2 are significantly different from each other on indicator 1). This seems so simple, but I'm having a hard time interpreting the output. Is this provided? If not, any guidance on how to calculate this is much appreciated! Thank you!
 Linda K. Muthen posted on Thursday, April 21, 2011 - 11:45 am
You can use MODEL TEST to do this. See the user's guide for further information.
 Patchara Popaitoon posted on Monday, July 04, 2011 - 11:02 am
Dear Linda,

I would like to test a relationship using a binary variable as a dependent variable and a few continuous variables as the predictors. I have read from the manual that this type of analysis is allowed. Could you please confirm this.

Also, which chapter should I read if I'd like to explore the above analysis further?

Many thanks.

 Linda K. Muthen posted on Monday, July 04, 2011 - 1:28 pm
You can do this using probit or logistic regression. Both examples are shown in Chapter 3.
 anne marie mauricio posted on Friday, December 02, 2011 - 11:47 am

I am running a LCGA model with several continuous covariates and a continuous distal outcome-Time 2 Teacher report of Externalizing. I am controlling for Time 1 Teacher report of Externalizing on Time 2 report, however Time 1 report of externalizing is not currently included as a covariate in the model.

To estimate class differences on Time 2 externalizing controlling for Time 1 externalizing, is it enough to have "Extern2 on extern1" in the model or should I also include Time 1 externalizing as a covariate of class.

Thank you for your help,

 Bengt O. Muthen posted on Friday, December 02, 2011 - 2:22 pm
To clarify, are you talking about 2 different variables - "externalizing" and "Teacher report of Externalizing"? Perhaps the former is self-reported? So that you have these 2 variables at time 1 and also these 2 variables at time 2?
 anne marie mauricio posted on Friday, December 02, 2011 - 3:45 pm
Thank you for your quick response Dr. Muthen. My apologies for not being clear. I am talking about the same variable- both Time 1 and Time 2 externalizing scores are teacher report. I want to assess between class differences on teacher report of externalizing at Time 2 (Extern2) controlling for teacher report of externalizing at Time 1 (Extern1). I have 4 classes.

My syntax includes "Extern2 on Extern1*" and I allow this estimate to vary between classes. But do I also need to include Extern1 as a covariate of class membership? That is, do I also need to include:

C#1 on Extern1;
C#2 on Extern1;
C#3 on Extern1;

Thank you,

 Bengt O. Muthen posted on Monday, December 05, 2011 - 9:22 am
That depends on your substantive theory. Does latent class membership "exist" before Extern1 or can it be viewed as being influenced by Extern1?
 anne marie mauricio posted on Tuesday, December 06, 2011 - 10:15 am
Thank you so much for your response Dr. Muthen.
 Jaclyn Harron posted on Wednesday, February 08, 2012 - 3:32 am
Hi there,

I'm trying to save posterior probabilities and class assignments in a LCGM but I'm getting the following error:

*** ERROR in SAVEDATA command
The syntax for the FILE option has changed. Please refer to the
Mplus User's Guide for available options.

This is my syntax:

SAVEDATA: FILE IS C:\Users\Jaclyn\Documents\PhD\Analysis\General delinquency\
Linear LCGA - with FIML FINAL MODELS with assignment\LCGA count
ZIP (inc non-offenders FIML) FINAL\males;
save = cprobabilities;

 Linda K. Muthen posted on Wednesday, February 08, 2012 - 8:12 am
I can't see the problem. Please send the output and your license number to
 Ann-marie Faria posted on Thursday, March 15, 2012 - 8:27 am
I am running a multilevel latent profile analysis with three profiles at level 2 and only 1 profile at level 1. Students are level 1 and schools are level 2.

I am trying to determine if profile membership at level 2 distinguishes between the means (by profile) on level 1 auxiliary variables. However, when I add in the auxiliary command

Auxiliary = (2) X1 X2 X3;

I receive the following error message:
*** ERROR in VARIABLE command
Auxiliary variables with 'e' or 'r' specifier are not available with TYPE=MIXTURE
with more than one categorical latent variable.

I currently have the within variables listed in the Use Variables section as “within”- I tried to take them out there and still received the same message.

Could you please advise?
 Ann-marie Faria posted on Thursday, March 15, 2012 - 8:28 am
please excuse my typo, the auxiliary code actually reads

Auxiliary = (e) x1 x2 x3;
 Linda K. Muthen posted on Thursday, March 15, 2012 - 9:44 am
The error message refers to the fact that you have more than one categorical latent variable on the CLASSES list. I am not sure that AUXILIARY (e) is available with multilevel models.
 Ann-marie Faria posted on Friday, March 16, 2012 - 9:33 am
Hi Linda-
Thank you for your response. If AUXILIARY (e) is not possible given the multiple categorical latent variables, would you suggest setting up a series of Wald tests to see if the means at level 1 are significantly different by level 2 profile?

Thank you,
 Linda K. Muthen posted on Saturday, March 17, 2012 - 10:44 am
Yes, you can use Wald tests.
 Ann-marie Faria posted on Wednesday, March 21, 2012 - 7:12 am
Hi Linda-
When I tried to use the Wald tests of equality, I am running into trouble because the variables I would like to test using model statements are at level 1. I am trying to test if auxiliary variables at level 1 are significantly different across the level 2 profiles. However, because my latent profile model has 3 between and 1 within profile, Mplus only sees one group/profile at level 1 and I cannot test the mean differences at Level 1. Also, it seems that Mplus will not let me call for the tests of equality on level 1 variables in the model between statements. Is there a way to get these mean differences at level 1 based on level 2 profile membership? Without outputting the profile memberships or predicted probabilities?
Thank you for your help,
 Bengt O. Muthen posted on Wednesday, March 21, 2012 - 9:00 pm
I don't think it is possible to do user-specified Wald tests based on auxiliary variables - that is, variables that are not part of the model - you have to have model parameter labels that you refer to.
 Mario Mueller posted on Thursday, August 09, 2012 - 3:38 am
Dear Linda,

I would like to run a LCA with four binary and three 3-level categorical indicators in a sample of n=60.000.

Can I use the following commands for the simplest model without any covariates?

categorical=tri1 tri2 tri3 bi1 bi2 bi3 bi4;

missing are all (-999);

STARTS = 5000 500;

I didn't find any examples in the UG.

Thanks, Mario
 Linda K. Muthen posted on Thursday, August 09, 2012 - 5:52 am
That looks fine.
 Jung-Ah Choi posted on Thursday, August 23, 2012 - 2:18 am
Dear Linda,

I'm trying to do LCA with binary variables. I performed multiple imputation to deal with missing data.

However, my output does not include the "results in probability scale" and "odds ratio results" sections. I guess this is because I used 5 imputed data files. How, then, can I handle missing data when doing LCA? My understanding is that the data file must not contain blanks (missing data).

I need your help.

 Linda K. Muthen posted on Thursday, August 23, 2012 - 8:32 am
These are not available with multiple imputation. If you want them, you can use MODEL CONSTRAINT to compute them. See the user's guide.
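For readers who want to see the arithmetic behind those quantities, here is a minimal Python sketch. It assumes the usual logit parameterization for a binary indicator, in which the probability of the second category given the class is 1/(1 + exp(threshold)); the function names and input values are hypothetical. The same expressions could be written with labeled thresholds in MODEL CONSTRAINT.

```python
import math

def prob_from_threshold(tau):
    """P(u = 1 | class) for a binary item with logit threshold tau."""
    return 1.0 / (1.0 + math.exp(tau))

def odds_ratio_between_classes(tau_a, tau_b):
    """Odds of u = 1 in class a relative to the odds of u = 1 in class b."""
    return math.exp(tau_b - tau_a)

# Hypothetical thresholds for one item in two classes.
tau_class1, tau_class2 = -1.2, 0.8
print(prob_from_threshold(tau_class1))
print(prob_from_threshold(tau_class2))
print(odds_ratio_between_classes(tau_class1, tau_class2))
```

A lower threshold corresponds to a higher probability of endorsing the item, which is why the sign flips when moving from thresholds to odds.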
 Jung-Ah Choi posted on Saturday, August 25, 2012 - 6:22 am
Thanks so much, Dr. Linda,

I'm trying to do LCA using dataset with missing data. My syntax is as follows:

DATA: FILE=bulelem0825.dat;
VARIABLE: NAMES=id sex b1-b5 v1-v5;
CATEGORICAL= b1-b5 v1-v5;

However, I got error message,
"Categorical variable B1 contains 59 categories. This exceeds the maximum allowed of 10."

I can't understand this message because b1 is a binary variable with 2 categories (0=no, 1=yes).

Could you tell me what this error message means and how to fix it?

I do appreciate your help.
 Linda K. Muthen posted on Saturday, August 25, 2012 - 12:20 pm
You are reading your data incorrectly. You either have more names in the NAMES list than you have columns of data or you have blanks in your data set and are reading it as free format.
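One way to check for this before running the analysis is to count the fields on each row of the data file. The Python sketch below is only an illustration: the function name and sample rows are hypothetical, and it assumes a whitespace-delimited free-format file. With blanks standing in for missing values, later values shift left, so values from other columns can end up being read as b1.

```python
def find_bad_rows(lines, n_names):
    """Return (line_number, field_count) for rows whose whitespace-separated
    field count differs from the number of variables on the NAMES list."""
    bad = []
    for i, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # skip completely empty lines
        n_fields = len(line.split())
        if n_fields != n_names:
            bad.append((i, n_fields))
    return bad

# Hypothetical rows; NAMES = id sex b1-b5 v1-v5 gives 12 variables.
rows = [
    "1 0 1 0 1 1 0 1 0 0 1 1",
    "2 1 0 1   1 0 0 1 1 0 1",  # a blank where a value is missing
]
print(find_bad_rows(rows, n_names=12))
```

Any row reported here would need an explicit missing value code (declared with the MISSING option) rather than a blank if the file is to be read in free format.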
 Chia-Yi Chiu posted on Saturday, September 07, 2013 - 1:58 pm
I am fitting binary data with the LCA in the context of cognitive diagnostic models. Because the model is a reparametrization of the logistic model (not directly a logistic model), I need to impose some constraints to the parameters. When the highest interaction term in the logistic model is 3-way, the iterations converge. However, when the highest interaction term goes up to 4-way, Mplus gives the following error message:



It looks like the starting value is an issue. However, I did not specify any starting value, but used


I then changed the number of random starts to

STARTS = 100 10;

But it did not work either. Any suggestion is welcome. Thanks!!
 Linda K. Muthen posted on Saturday, September 07, 2013 - 3:23 pm
Please send the output and your license number to
 Wen-Hsu Lin posted on Tuesday, September 22, 2015 - 7:00 am

Example 8.6 indicates that the latent class variable c predicts the distal outcome u. How do we know whether the thresholds of u, which vary across classes, are significantly different? Or, for a continuous distal outcome, whether the means are significantly different across classes? Thank you.
 Linda K. Muthen posted on Tuesday, September 22, 2015 - 12:14 pm
You can use the Wald test of MODEL TEST for this. You label the thresholds of u in each class and use the labels in MODEL TEST. See the user's guide for further information.
 Wen-Hsu Lin posted on Tuesday, September 22, 2015 - 6:01 pm
Thank you Linda
 Wen-Hsu Lin posted on Tuesday, September 22, 2015 - 6:52 pm
Hi, again
After reading the example, I still did not get it. It kept showing a warning. My commands are as follows:

usevariables are bmi1
sk1 sk2 sk3 sk4 sk5 sk6 sk8 sk9;
missing is blank;
categorical are sk1 sk2 sk3 sk4 sk5 sk6 sk8 sk9 bmi1;

weight is ipt1;

iteration = 50000;
model test:
is the last line correct? The example showed model constraint and symbol for manipulating. In my case, there is no line for bmi1 in the %overall%. So how do I ask Mplus to do the test? Thank you.
 Linda K. Muthen posted on Wednesday, September 23, 2015 - 6:33 am
You must specify the parameters, label them, and use the labels in MODEL TEST.


0 = p1 - p2;
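The statement `0 = p1 - p2;` expresses the hypothesis that the two labeled parameters are equal. As a rough illustration of the arithmetic behind a Wald test of a single restriction like this, here is a minimal Python sketch. The function name and inputs are hypothetical, and the variances and covariance are assumed to come from the estimated covariance matrix of the parameter estimates; it is not Mplus's own implementation.

```python
import math

def wald_test_equal(est1, est2, var1, var2, cov12=0.0):
    """Wald chi-square (1 df) and p-value for H0: est1 - est2 = 0."""
    diff = est1 - est2
    var_diff = var1 + var2 - 2.0 * cov12
    wald = diff * diff / var_diff
    # Upper tail of a chi-square with 1 df equals erfc(sqrt(wald / 2)).
    p_value = math.erfc(math.sqrt(wald / 2.0))
    return wald, p_value

# Hypothetical threshold estimates and sampling (co)variances for two classes.
wald, p = wald_test_equal(est1=0.9, est2=0.2, var1=0.04, var2=0.05, cov12=0.01)
print(wald, p)
```

Ignoring the covariance term (leaving it at 0) would give a different test statistic whenever the two estimates are correlated, so it should be supplied when it is available.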
 Wen-Hsu Lin posted on Wednesday, September 23, 2015 - 6:05 pm
This clears it all up. Thanks a lot.
 Angelique posted on Thursday, September 22, 2016 - 1:26 am
Hi, I ran an LCA model. I also examined what predicts class membership. I would like to interpret the odds ratios, but since some of the predictors are categorical variables (e.g., B41A), I am having trouble interpreting the output.


Categorical Latent Variables

 C#1      ON
    SITE       0.863
    A1         1.508
    A2         0.552
    A5B        0.547
    B41A       0.643
    PROSOC     0.498
    AGEG       0.723
    COMMS      0.921

Could you perhaps help?
 Bengt O. Muthen posted on Thursday, September 22, 2016 - 9:33 am
That's interpreted the same way as for multinomial logistic regression with an observed nominal DV. See for instance UG Chapter 14 or our new book, Chapter 5.
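To make that interpretation concrete: an odds ratio of 0.643 for B41A under C#1 ON means that a one-unit increase in B41A multiplies the odds of being in class 1 rather than the reference class by 0.643, holding the other predictors constant. The Python sketch below illustrates this with a hypothetical two-class model and a single predictor; the function name and the intercept are made up, and it assumes the last class is the reference class with its coefficients fixed at zero.

```python
import math

def class_probabilities(intercepts, slopes, x):
    """Class probabilities from multinomial logit coefficients.
    The last class is the reference, with intercept and slope fixed at 0."""
    logits = [a + b * x for a, b in zip(intercepts, slopes)] + [0.0]
    denom = sum(math.exp(v) for v in logits)
    return [math.exp(v) / denom for v in logits]

# Hypothetical two-class example; slope chosen so the odds ratio is 0.643.
slope = math.log(0.643)
p0 = class_probabilities([0.0], [slope], x=0)
p1 = class_probabilities([0.0], [slope], x=1)
odds0 = p0[0] / p0[1]
odds1 = p1[0] / p1[1]
print(odds1 / odds0)  # ratio of odds for a one-unit increase in x
```

For a binary predictor coded 0/1, the one-unit increase is simply the comparison of the two groups.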
 Angelique posted on Friday, September 23, 2016 - 2:07 am
Thank you!
 samah Zakaria Ahmed posted on Sunday, January 22, 2017 - 12:03 pm
I have two latent class variables and 10 items (5 items for each latent variable); all items are binary. How can I write the MODEL command to assign the items to their related latent class variable?
I tried to write:
c1 by u1-u5
c2 by v1-v5
but I got a warning message.
 Naomi Wright posted on Tuesday, August 22, 2017 - 9:49 am

When attempting a mixture model with both categorical and continuous indicator variables, I receive the error below (all caps). I did not specify the model and left the default random start values. It's possible that the model is non-identified, as this is an exploratory analysis, but I'm not sure if I am doing something wrong with the input. Also, when I run separate analyses with only the categorical or only the continuous variables, the syntax runs. Any thoughts would be greatly appreciated!



NUMBER IS -0.116D-18.

Parameter 63, %C#3%: [ INC_CAT$7 ]
 Linda K. Muthen posted on Tuesday, August 22, 2017 - 10:51 am
Please send the output and your license number to
 samah Zakaria Ahmed posted on Monday, January 22, 2018 - 4:46 pm
I want to ask about the MODEL RESULTS when there is a 2-class latent variable and binary observed variables.
What do the Thresholds in each class refer to?
Do they refer directly to the coefficients of the logit function (alpha and beta)?
logit(prob(y=1|z)) = alpha + beta*z
 samah Zakaria Ahmed posted on Monday, January 22, 2018 - 4:55 pm
And what is the reference category (no. 1 or 2)?
 Bengt O. Muthen posted on Monday, January 22, 2018 - 5:05 pm
Answered. Please post the same question only once.
 rgm smeets posted on Tuesday, February 12, 2019 - 10:52 am
Dear Mister Muthen,

I ran an LCA with 9 variables (continuous, binary and nominal). I want to run a regular LCA without anything defined in the MODEL command, and I did not include covariates either. Therefore, I did not specify anything in the MODEL command in the input. Is it necessary to do so?
 Bengt O. Muthen posted on Tuesday, February 12, 2019 - 5:22 pm
 J. Gayle Beck posted on Thursday, October 31, 2019 - 3:11 pm
Dr. Muthen,

I am running a latent class analysis with both dichotomous and continuous variables. It is my understanding that mixed mode data cannot be plotted in the same line graph since we plot probabilities for dichotomous or categorical variables and estimated means for continuous variables.

Is there any way to convert our estimates given for the continuous variables into probabilities that can be plotted with probabilities of the dichotomous variables?

Thank you.
 Bengt O. Muthen posted on Friday, November 01, 2019 - 5:20 pm