Predictors of latent class membership
Mplus Discussion > Latent Variable Mixture Modeling >
 Anonymous posted on Thursday, November 15, 2001 - 12:53 pm
I have a six class model, and one of the classes has the lowest number of members (around 15). When I add X(0,1) as a predictor, this smallest class divides up as 13 0's and 2 1's. This seems to be causing a problem, especially since I have other predictors in the model as well. What do I do in this case? And does Mplus have a conditional maximum likelihood method? Thanks.
 Bengt O. Muthen posted on Thursday, November 15, 2001 - 3:39 pm
Yes, this happens with small classes. You probably have a slope and an intercept for this small class that start going towards extreme values because almost everyone has the same x value in this class. You can fix these coefficients at plus or minus 15 to get estimates for the other parameters.
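Converting logits to probabilities shows why a fixed value of plus or minus 15 can stand in for a slope that is drifting toward infinity. The following is a minimal Python sketch using only the standard library. `inv_logit` is an illustrative helper, not an Mplus function.

```python
import math

def inv_logit(z: float) -> float:
    """Logistic (inverse-logit) function: maps a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

# Probabilities implied by a range of logit values. At +/-15 they are within
# about 3e-7 of 1 and 0, so they are practically indistinguishable from the limit.
for z in (-15.0, -5.0, 0.0, 5.0, 15.0):
    print(f"logit {z:+5.1f} -> probability {inv_logit(z):.7f}")
```

Once an estimate is that extreme, the loglikelihood barely changes whether the slope is 15 or larger. Fixing it therefore lets the remaining parameters and their standard errors be computed.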
 Bob Stephens posted on Thursday, January 10, 2002 - 3:14 pm
I am running an LCA with 8 dichotomous u variables indicating status of utilization of specific mental health services. The best model I came up with has 3 classes. I now want to run a model with a dichotomous covariate (type of service delivery system) predicting the latent categorical variable. I specify the following:

c#1 on siteid;
c#2 on siteid;

[incouns2$1*0 meds2$1*1 cm2$1*-1 specedu2$1*3 gpfcoun2$1*2 resinpt2$1*-2 placmt2$1*-3];

[incouns2$1*3 meds2$1*2 cm2$1*-3 specedu2$1*-1 gpfcoun2$1*1 resinpt2$1*1.5 placmt2$1*0];

[incouns2$1*-3 meds2$1*-2 cm2$1*3 specedu2$1*1 gpfcoun2$1*-1 resinpt2$1*-1.5 placmt2$1*2];

I get a message saying that the model estimation terminated abnormally due to an ill-conditioned Fisher information matrix. And there is a problem with the gamma(c) matrix starting value for the regression of c#2 on siteid. Do you have any advice on how I should proceed? I have tried specifying different starting values for c#2 on siteid, but to no avail. Thanks for any insight you can provide.
 Anonymous posted on Friday, January 11, 2002 - 10:59 pm
Most probably the "siteid" variable is constant (or close to constant) in class 2. That makes the slope in "c#2 on siteid" unidentifiable; however, by fixing that slope to zero (or any other number, for that matter) you should get just as good a model (check the log-likelihood value to make sure). Also, if you request Tech 7 in the OUTPUT command, you can check whether the variance of "siteid" in class 2 is zero.

Another possibility is that the slope in "c#2 on siteid" is very large or very small, i.e., it approaches +/- infinity (and therefore it is unidentifiable). In that case fixing the slope to +/- 15 will produce an identified model that is just as good (check the log-likelihood of the two runs). This second possibility corresponds to the case when all subjects with certain "siteid" are classified in the same class.
 Anonymous posted on Friday, January 18, 2002 - 3:31 am
How is it that this parameter can be fixed to zero or any other number and you can get a good solution for the model?
 Anonymous posted on Saturday, January 19, 2002 - 1:09 am
Thanks for the question. Now I realize that my message from Jan 11 is incorrect. Sorry about the confusion…

When the variance of an X variable is 0 in class C, the slope of U on X in class C is unidentified and can be fixed to zero or any other number, and you can still get a good solution. This cannot occur in the latent class regression (C on X).

An empirical non-identification in the latent class regression usually arises from empty cells in the joint distribution of C and X. Say that C=1,2,3 (with the last class, class 3, as the reference class) and X=0,1. If P(C=2,X=1)=0, then the slope in C#2 on X is minus infinity. If P(C=3,X=1)=0, then the slope in C#2 on X is plus infinity.

The bottom line is that you have an empirical non-identification. Something in the data makes the c#2 on siteid an unidentifiable slope. Take the ending value from the failed run and fix the slope to that value. This way you will be able to get SE for the rest of the parameters.
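The empty-cell argument can be checked directly in a simplified setting where class membership is treated as known (for example, most likely class membership). With a single binary covariate and the last class as reference, the maximum likelihood slope of c#k ON x equals the log odds ratio from the class-by-x table. In the actual latent class model the counts are posterior-weighted rather than known, but roughly the same logic pushes the estimate toward infinity. The counts below are made up for illustration, and `saturated_slopes` is a hypothetical helper.

```python
import math

def safe_log(n: float) -> float:
    """Natural log that returns -inf for a zero count instead of raising an error."""
    return math.log(n) if n > 0 else -math.inf

def saturated_slopes(counts):
    """ML slopes of c#k ON x for a binary x, with the last class as reference.

    counts maps class k -> (number with x=0, number with x=1).
    With one binary covariate the model is saturated, so each slope is the
    log odds ratio log[(n_k1 / n_ref1) / (n_k0 / n_ref0)].
    """
    ref = max(counts)                       # last class is the reference class
    ref0, ref1 = counts[ref]
    slopes = {}
    for k, (n0, n1) in sorted(counts.items()):
        if k == ref:
            continue
        slopes[k] = (safe_log(n1) - safe_log(ref1)) - (safe_log(n0) - safe_log(ref0))
    return slopes

# Hypothetical class-by-x counts, C = 1, 2, 3 and X = 0, 1
no_class2_x1 = {1: (40, 35), 2: (30, 0), 3: (25, 20)}   # P(C=2, X=1) = 0
no_class3_x1 = {1: (40, 35), 2: (30, 10), 3: (25, 0)}   # P(C=3, X=1) = 0
print(saturated_slopes(no_class2_x1))   # slope for class 2 is -inf
print(saturated_slopes(no_class3_x1))   # slopes are +inf
```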

 Jason Bond posted on Monday, April 28, 2003 - 2:27 pm
Hello. I have a latent class problem where I have an observed class variable which has 5 classes. So I have created 5 dummy indicator variables, Y910 - Y914. I also have a continuous predictor variable, BAC. Essentially, what I would like to do is estimate 4 cutpoints, or thresholds (call them T1-T4), on the continuous variable BAC such that:

if BAC < T1, the latent variable takes the value 1; if T1 < BAC < T2, it takes the value 2; and so on up to: if BAC > T4, the latent variable takes the value 5

So I estimated the model:

NAMES = BAC Y910 Y911 Y912 Y913 Y914;
Classes = C(5);
Categorical = Y910 Y911 Y912 Y913 Y914;

TYPE = Mixture;

C#1 on BAC;
C#2 on BAC;
C#3 on BAC;
C#4 on BAC;

[Y910$1*-1.5 Y911$1*0 Y912$1*1 Y913$1*1.5 Y914$1*2];

[Y910$1*0 Y911$1*-1 Y912$1*0 Y913$1*1 Y914$1*1.5];

[Y910$1*1 Y911$1*0 Y912$1*-1 Y913$1*0 Y914$1*.5];

[Y910$1*1 Y911$1*.5 Y912$1*0 Y913$1*-1 Y914$1*-5];

[Y910$1*1.5 Y911$1*.5 Y912$1*-.5 Y913$1*-1 Y914$1*-1.5];

It seems that what one should look at in the output, to accomplish the above goal, is the latent class regression model part giving the intercept and slope of the regressions of Y910 - Y914 on BAC, but I am a little confused about where the estimates of the thresholds T1-T4 are obtained. Any suggestions for my confusion? Also, I seem to be having trouble with the thresholds approaching extreme values. Should I not be using all 5 category indicators Y910 - Y914, or could this just be a starting values problem? Thanks much for any input.

 bmuthen posted on Monday, May 05, 2003 - 10:23 am
I have a difficult time understanding these questions. To move forward, instead of stating the problem in terms of the Mplus analyses you are contemplating, can you please describe conceptually what it is that you want to do? BAC sounds like blood alcohol concentration. What do the BAC thresholds correspond to? What are the y's? How should the relationship between BAC and y's be viewed? Stating these things very briefly might help me understand.
 D.Gross posted on Tuesday, November 02, 2004 - 3:49 am
I am trying to determine how many males and females from one nominal covariate (gender) are in one of my two classes.
Is there any command I can use to do that in Mplus?

Thank you
 bmuthen posted on Tuesday, November 02, 2004 - 11:47 am
Use Mplus graphics, where you can look at histograms of the analysis variables by class. Clicking on a histogram bar gives you the number of people for it.
 Anonymous posted on Tuesday, May 03, 2005 - 7:40 am
I am estimating a latent class growth mixture model with 2 classes and 2 predictors of the latent slope and latent intercept. While I assume the predictors to have a different impact on eta depending on class, I have strong reason to believe they are mutually uncorrelated (throughout all classes). By default, however, Mplus treats the predictors as correlated. The (default) model converges without any problems, however, after adding p1 WITH p2@0; in the %overall% model statement I get the message that "This latent class regression requires numerical integration" - why? Thank you very much!
 Linda K. Muthen posted on Tuesday, May 03, 2005 - 9:55 am
When you mention p1 and p2, they are no longer considered to be independent variables. They are brought into the model and numerical integration is required. Add ALGORITHM=INTEGRATION; to the ANALYSIS command.
 Anonymous posted on Tuesday, May 03, 2005 - 11:39 am
Thank you for your extremely fast reply! Unfortunately, I still do not understand (sorry): all I want to do is fix a parameter to zero (independent of class membership). This should make the model more restrictive (aren't the models even nested?), and I am having difficulty understanding why I have to change the estimation algorithm when the more general model is estimated without problems. Can you point me to any references on this?
Thank you so much - this board is so helpful!!!
 bmuthen posted on Tuesday, May 03, 2005 - 12:09 pm
Does your model specify that the 2 predictors influence not only the 2 growth factors (intercept and slope) but also the latent class variable?
 Anonymous posted on Thursday, June 09, 2005 - 1:20 pm
Hello. I am estimating a latent class growth model and would like to get the predicted values for the outcome variable for each individual. So, for example, if the model places the ith person in class j, could I get a N by T (where N is the number of respondents and T is the number of time points available) dataset where for the ith row, the predicted values are those for class j? Thanks much,

 bmuthen posted on Friday, June 10, 2005 - 6:13 am
You can get a plot of these values using the PLOT command, requesting individual estimated values. When you are in the graphics module you can use the Save graph data menu function to save the individual values used in the plot.
 Tom Hildebrandt posted on Thursday, September 22, 2005 - 8:58 am
I have a conceptual question for LCA with covariates. I am trying to define subgroups of drug users based on the types of drugs they usually use (y1-y14). I would like to evaluate several covariates and then, based on the classification of cases, compare groups on certain external validators (indicators of risk, drug side effects, etc.). As I understand the use of covariates, they should be background variables (age, race/ethnicity, income, etc.). My question is, are covariates in a sense "washing out" the effects of the covariate on latent class membership such that the differences between classes on a given drug use (y1-y14) are independent of the given covariate?

My second question is how to determine or justify the inclusion/exclusion of covariates into the model statistically. For instance, if I include age as a covariate, but find it does not significantly contribute to class membership, am I justified in dropping it from the model?
 bmuthen posted on Friday, September 23, 2005 - 8:11 am
Regarding your first question, the covariates do not wash out the differences in covariate means across classes. If a covariate has a strong effect on class membership then it will have different means in the classes.
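Bayes' rule illustrates this in a simplified two-class setting. Assume, purely for illustration, a binary covariate x with P(x=1) = 0.5 and a class regression P(class 1 | x) = inv_logit(a + b*x). The values of `a` and `b` below are hypothetical. When b = 0, the covariate distribution is the same in both classes. When b is large, the distribution differs sharply between classes, which is why a strong covariate effect shows up as different covariate means across classes.

```python
import math

def inv_logit(z):
    """Logistic function mapping log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def covariate_share_by_class(p_x1, a, b):
    """Return P(x=1 | class 1) and P(x=1 | class 2) for a two-class model
    where P(class 1 | x) = inv_logit(a + b*x) and P(x=1) = p_x1."""
    pc1_x0 = inv_logit(a)
    pc1_x1 = inv_logit(a + b)
    # Joint probabilities P(class, x) from P(x) * P(class | x)
    joint_c1_x1 = p_x1 * pc1_x1
    joint_c1_x0 = (1 - p_x1) * pc1_x0
    joint_c2_x1 = p_x1 * (1 - pc1_x1)
    joint_c2_x0 = (1 - p_x1) * (1 - pc1_x0)
    # Bayes' rule: P(x=1 | class) = P(class, x=1) / P(class)
    share_c1 = joint_c1_x1 / (joint_c1_x1 + joint_c1_x0)
    share_c2 = joint_c2_x1 / (joint_c2_x1 + joint_c2_x0)
    return share_c1, share_c2

no_effect = covariate_share_by_class(0.5, a=0.0, b=0.0)
strong_effect = covariate_share_by_class(0.5, a=-1.0, b=2.0)
print("b = 0:", no_effect)
print("b = 2:", strong_effect)
```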

Strategies for the inclusion/exclusion of covariates would follow the same lines of argument as in conventional regression analysis, and I am not sure there is consensus even there...
 Alex posted on Wednesday, June 13, 2007 - 2:04 pm

In estimating a mixture model with covariate (time invariant, i.e. predictors) would you:
(1) estimate the best fitting model without covariates (i.e. estimate a 1, 2, 3, 4, 5, ... models and decide on the best one) and then include covariates only in the best fitting model; OR
(2) Estimate all models with the covariates before choosing the best one; OR
(3) do something else ?

Thank you
 Linda K. Muthen posted on Thursday, June 14, 2007 - 5:36 am
The strategy I would use is to include covariates only after the number of classes has been determined. If adding the covariates changes the class structure, this might point to the need for direct effects from the covariates to the latent class indicator variables. These direct effects represent measurement non-invariance. If they are needed, then the process may need to be done again taking this into account.
 Alex posted on Thursday, June 14, 2007 - 7:39 am
Thank you very much,

Would you allow these direct effects to vary across classes?
 Linda K. Muthen posted on Thursday, June 14, 2007 - 9:23 am
A model with direct effects that vary across classes may be difficult to estimate depending on the data.
 Jennifer M. Jester posted on Tuesday, June 10, 2008 - 8:54 am
In a growth mixture model, is it possible to determine to what extent the predictors can predict membership in the classes? For instance, I have a 4 trajectory class solution for development of alcohol symptoms across age 15 to 45 and I want to know how well I can predict membership in one of these classes based on predictors at age 15. I'm hoping for something like percent correctly classified.


 Bengt O. Muthen posted on Tuesday, June 10, 2008 - 4:05 pm
Percent correctly classified is used in logistic regression where the categorical dependent variable is observed. Here it is not observed so we don't know what the true status is. I haven't seen a classification approach done in this case.

One approach one can consider is what we do in our Mplus teaching on growth mixture modeling using the reading data example. We estimate the full model with the outcomes for all time points. Then we fix parameters at those values and use the model as a measurement instrument, that is, only estimate the posterior probabilities for each class and individual. Here, you can consider less than the full information. For example, you can tell the program that you have no data on the outcomes by setting those at the missing data flag. Thereby you would use as observed variables only the covariates predicting class membership. You can then crosstabulate most likely class membership when using only covariate information with that using full information. This tells you if you do well or not using only the covariate information in terms of specificity and sensitivity.
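The crosstabulation step described above can be sketched in Python. The sketch assumes that the most likely class for each person has already been saved from both runs (full information and covariates only) in the same person order. Because the parameters are fixed at the same values in both runs, the class labels should line up. The assignments below are invented for illustration, and `agreement_summary` is a hypothetical helper that treats the full-information assignment as the benchmark.

```python
from collections import Counter

def agreement_summary(full, cov_only):
    """Compare most likely class from full information with that from covariates only.

    Returns the crosstab, overall agreement, and per-class sensitivity and
    specificity, treating the full-information assignment as the benchmark.
    """
    table = Counter(zip(full, cov_only))          # crosstab: (full, cov_only) -> count
    classes = sorted(set(full) | set(cov_only))
    total = len(full)
    overall = sum(table[(k, k)] for k in classes) / total
    sensitivity, specificity = {}, {}
    for k in classes:
        in_k = sum(1 for f in full if f == k)
        not_k = total - in_k
        hits = table[(k, k)]                      # in class k under both approaches
        correct_rejections = sum(
            n for (f, c), n in table.items() if f != k and c != k
        )
        sensitivity[k] = hits / in_k if in_k else float("nan")
        specificity[k] = correct_rejections / not_k if not_k else float("nan")
    return table, overall, sensitivity, specificity

# Hypothetical most likely class assignments for ten people
full_info = [1, 1, 1, 2, 2, 2, 2, 3, 3, 3]
covariates_only = [1, 2, 1, 2, 2, 3, 2, 3, 1, 3]
table, overall, sens, spec = agreement_summary(full_info, covariates_only)
print("crosstab:", dict(table))
print("overall agreement:", overall)
print("sensitivity:", sens)
print("specificity:", spec)
```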
 Sarah Dauber posted on Tuesday, September 02, 2008 - 6:11 pm
I am running a 4-class GMM model with covariates. I am trying to determine whether the covariates significantly distinguish among the classes. I know I have to regress the latent class variable (c) on the covariates, but how do I set a particular class to be the reference group? I would like to look at all possible comparisons (ie vary the reference group so all 4 groups are compared with each other). How do I do this?

Thank you,

Sarah Dauber
 Linda K. Muthen posted on Tuesday, September 02, 2008 - 6:15 pm
We give all possible comparisons. You don't have to do anything.
 Sarah Dauber posted on Tuesday, September 02, 2008 - 6:29 pm
The output I got did not give all possible comparisons. Maybe I am specifying something wrong in the input? I have pasted the model part of my input below.
type=mixture missing;
starts=100 10;
i s q | aodpda7@0 aodpda8@1 aodpda9@2 aodpda10@3 aodpda11@4 aodpda12@5
aodpda13@6 aodpda14@7 aodpda15@8 aodpda16@9 aodpda17@10 aodpda18@11;
i s q on newage d4revx bleduc gender;
c#1 on newage d4revx bleduc gender;
c#2 on newage d4revx bleduc gender;
c#3 on newage d4revx bleduc gender;
i on newage d4revx bleduc gender;
i on newage d4revx bleduc gender;
i on newage d4revx bleduc gender;
i on newage d4revx bleduc gender;

tech4 tech11;
 Linda K. Muthen posted on Wednesday, September 03, 2008 - 9:17 am
Please send your input, data, output, and license number to
 Elizabeth Hair posted on Thursday, March 19, 2009 - 12:42 pm
I am using Mplus 5.1 to conduct latent class analyses. The new output is not giving me all of the comparisons for the multinomial regression analyses. Do I need to add something to the code to get this? Thanks!
 Bengt O. Muthen posted on Friday, March 20, 2009 - 11:56 am
Please send input, output, data, and license number to
 Jenee Jackson posted on Tuesday, October 06, 2009 - 5:46 pm
I am using Mplus 5.21 to conduct LCA using longitudinal data. I am new to LCA and I have three conceptual questions. I am interested in exploring group membership based on responses to five categorical items concerning sexual behavior collected at Time 7.

1. I have a number of antecedent variables (e.g., gender, race, early family experience, pubertal timing collected at Time 1 and Time 2) that I believe are causally related to group membership at Time 7. I am planning to include these in the model as covariates. However, I also have a number of concurrent variables (e.g., romantic attachment, attitudes about the opposite sex at Time 7) that I think also relate to group membership. After reading about the auxiliary variable function, I was planning to use these concurrent variables as auxiliary variables (as opposed to covariates). Would this be the correct approach or should I include all variables as covariates?

2. I assume the correct steps would be to first define the number of groups, then run a model including covariates. Should I include aux variables at each step or just the final step?

3. Is Mplus capable of running a multiple group LCA for males and females? How is this distinct from just adding gender as a covariate?

Much thanks for any light that you can shed on these issues!
 Bengt O. Muthen posted on Tuesday, October 06, 2009 - 6:21 pm
1. Your use of aux seems preferable if you think class membership is not determined by the concurrent variables.

2. Yes. It doesn't matter.

3. Yes. With mixture modeling of categorical items you can accomplish the same using either approach.
 Idean Ettekal posted on Monday, September 26, 2011 - 6:15 pm
I am trying to specify a growth mixture model with a latent predictor (as opposed to an observed predictor). I am interested in seeing how this latent predictor is associated with class membership. I specified the covariances between the latent predictor and the growth parameters (intercept, slope and quadratic) to be zero.

I noticed in the output that when I include this latent variable as a predictor, I get class specific intercepts for the observed indicators of the latent variable, and I get class specific variances for the latent predictor and class specific residual variances for the observed indicators of the latent predictor, which are all by default constrained to be equal across classes. However, when I run a similar model with an observed predictor, there are no class specific estimates provided for that predictor (i.e., class specific intercept or variance) shown in the output. Why do I get class specific variance estimates for my latent predictor and class specific residual variances for my observed indicators in this model, but not when I only use an observed predictor? Does this approach seem problematic as compared to just using an observed predictor as far as trying to see if this predictor is associated with class membership?
 Bengt O. Muthen posted on Monday, September 26, 2011 - 8:39 pm
I am confused - please send the two outputs: Observed and latent predictor.
 Jeffrey Duong posted on Wednesday, November 30, 2011 - 8:17 am

I have a general question. I was wondering if it was possible to fit a model that was a combination of the following user manual examples:

-7.17: CFA Mixture Model
-7.19: SEM with a categorical latent variable regressed on a continuous latent variable
-7.20: Structural Equation Mixture Modeling

Let's say I have continuous latent variables f1 and f2 as well as a categorical latent variable, C.

Is it possible to fit a model with the following paths simultaneously:

f1 --> C --> f2 and f1 --> f2

I am trying to find examples of this but I haven't had much luck. I wasn't sure if such a model would be appropriate.
 Bengt O. Muthen posted on Wednesday, November 30, 2011 - 3:57 pm
The f1 -->c model is discussed in our Topic 5 short course - see the handout's ending section Structural Equation Mixture Modeling.

c --> f2 is standard, although you don't write f2 ON c; instead, by default the mean of f2 changes as a function of the c classes.

So you can put it all together in one model.
 Mary H. posted on Friday, February 24, 2012 - 2:16 pm

I am running a latent class analysis with a sample of 204 people. Results show that a 2 class model fits the data the best. However, 198 people are in Class 1 and 6 are in Class 2. Is it still possible/appropriate to examine covariates of membership in these clusters (i.e., if higher income people are more likely to be in one cluster or another)?

Thank you
 Linda K. Muthen posted on Saturday, February 25, 2012 - 7:52 am
It's too small of a class to draw inferences related to that class.
 Mary H. posted on Sunday, February 26, 2012 - 12:30 pm
Thank you for your previous response. I have a follow-up question.

Below is some of the model fit information I received for a 1-class and a 2-class solution. When I examine a 3-class solution, the model fit indices are not any better and I get a notification about a non-positive definite matrix. The classes for the 3-class model are even less meaningful (class 1 = 198 people, class 2 = 1 person, and class 3 = 5 people). Is it correct for me to interpret from all of this that a one-cluster solution is the best?

1 class model

    H0 Value                                -646.770
    H0 Scaling Correction Factor for MLR       2.736

    Information Criteria
    Number of Free Parameters                      6
    Akaike (AIC)                            1305.539
    Bayesian (BIC)                          1325.448
    Sample-Size Adjusted BIC                1306.438
      (n* = (n + 2) / 24)

2 class model

    H0 Value                                -454.893
    H0 Scaling Correction Factor for MLR       2.318

    Information Criteria
    Number of Free Parameters                     10
    Akaike (AIC)                             929.785
    Bayesian (BIC)                           962.966
    Sample-Size Adjusted BIC                 931.283
      (n* = (n + 2) / 24)
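The three information criteria in the fit information above can be recomputed from the loglikelihood (H0 value), the number of free parameters p, and the sample size of 204 given in the post. The formulas are AIC = -2logL + 2p and BIC = -2logL + p ln(n). The sample-size adjusted BIC replaces n by n* = (n + 2)/24, as printed in the output. This is a sketch, and small discrepancies in the last decimal come from the rounded loglikelihoods.

```python
import math

def information_criteria(loglik, n_params, n):
    """Return (AIC, BIC, sample-size adjusted BIC) from a loglikelihood."""
    aic = -2 * loglik + 2 * n_params
    bic = -2 * loglik + n_params * math.log(n)
    n_star = (n + 2) / 24                     # adjusted sample size used by SABIC
    sabic = -2 * loglik + n_params * math.log(n_star)
    return aic, bic, sabic

results = {}
for label, ll, p in [("1 class", -646.770, 6), ("2 class", -454.893, 10)]:
    results[label] = information_criteria(ll, p, n=204)
    aic, bic, sabic = results[label]
    print(f"{label}: AIC={aic:.3f} BIC={bic:.3f} SABIC={sabic:.3f}")
```

All three criteria are clearly lower for the 2-class model, so the information criteria by themselves favor 2 classes, even though one class holds only 6 people.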
 Linda K. Muthen posted on Monday, February 27, 2012 - 10:42 am
I would remove the 6 observations. They may be outliers. Run the analysis without them.
 Mary H. posted on Monday, February 27, 2012 - 4:24 pm
Thank you! I looked at the output file that contains the probability estimates to identify these 6 cases. I removed the cases and re-ran the analyses. This time 5 people were placed into class two (with mean estimates on the indicator similar to previous results). Should I remove these people also and try again? Or does this mean that the 1-class solution is the best?
 Linda K. Muthen posted on Tuesday, February 28, 2012 - 10:23 am
It sounds like you may have removed the wrong observations. I would recheck that.

You can run the analysis with the full sample and ask for LOGLIKELIHOOD in the OUTPUT command. This gives you the loglikelihood contribution for each person. This may help isolate the problem.
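Once the per-person loglikelihood contributions are available, sorting them can help isolate the problem cases. The sketch assumes the contributions have already been read into Python as (id, value) pairs. The ids and values shown are invented, and no particular Mplus file layout is assumed.

```python
def lowest_contributions(contribs, k):
    """Return the k (id, loglikelihood contribution) pairs with the smallest values.

    Very negative contributions mark people the model fits poorly, which makes
    them candidates for the outlying cases discussed above.
    """
    return sorted(contribs, key=lambda pair: pair[1])[:k]

# Hypothetical contributions for six people
contribs = [(101, -2.1), (102, -2.4), (103, -9.8), (104, -2.0), (105, -11.3), (106, -2.6)]
print(lowest_contributions(contribs, 2))

# With independent observations the contributions add up to the total loglikelihood
total = sum(value for _, value in contribs)
print(round(total, 3))
```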
 Bernice Garnett posted on Thursday, July 26, 2012 - 7:34 am

I wanted to follow up on the thread from June 14 2007 about class specific direct effects and the issues with estimating this model dependent on data... what specific data requirements are needed to estimate class specific direct effects?

I have estimated a 4 class model with 5 binary indicators and after some investigation it appears that I may have class specific direct effects but I have been having a hard time with this syntax in Mplus - are there any examples that I can follow?

Thank you!!!
 Linda K. Muthen posted on Thursday, July 26, 2012 - 11:58 am
If direct effects are mentioned in the overall part of the MODEL command, the regression coefficients are held equal across classes as the default. To relax this constraint, mention the direct effects in the class-specific parts of the MODEL command also.
 Bernice Garnett posted on Friday, July 27, 2012 - 9:17 am
Hi Linda,

Thanks for your response. I am trying to specify the class specific direct effects but I must be doing something wrong as I keep getting this error message.

C#1 on gender AS AA HI BI SXON BMI2;
C#2 on gender AS AA HI BI SXON BMI2;
C#3 on gender AS AA HI BI SXON BMI2;

! trying to see if there are class specific effects!
D2 on BI;
D2 on AS;
D4 on HI;

D3 on SXON;

D3 on BMI2;
D2 on SXON;
D4 on SXON;
D4 on SXON;

D4 on BMI2;

Output: Tech10 Tech11 patterns residual CINTERVAL Tech7 svalues;

The following MODEL statements are ignored:
* Statements in Class 1:
* Statements in Class 2:
* Statements in Class 3:
* Statements in Class 4:
One or more MODEL statements were ignored. Note that ON statements must
appear in the OVERALL class before they can be modified in class-specific
models. Some statements are only supported by ALGORITHM=INTEGRATION.
 Linda K. Muthen posted on Friday, July 27, 2012 - 10:33 am
You need to have the ON statements in the overall part of the MODEL command if you mention them in the class-specific part. Add them under %OVERALL%.
 Jonathan Larson posted on Friday, February 01, 2013 - 10:14 am
I'm trying to test whether the interaction between two continuous observed variables predicts class membership in an LCA. As I understand it, the most appropriate way to test predictors and covariates is to use 'r3step' within the auxiliary option. I'm having a problem, however, when I attempt to center the predictors to create the interaction term. I get an error message saying that variables used in the CENTER option need to be used in the MODEL command (as I mentioned, I want to use them in the auxiliary option). I would center the variables in the original data file prior to importing into Mplus, but I am using multiple imputation within Mplus, which I believe precludes this as a viable option. I do not want to alter the class structures by putting the interaction term into the model. Is there something I'm missing, or is there a better way to test my interaction term given the constraints I've mentioned? Thank you for your help!
 Bengt O. Muthen posted on Saturday, February 02, 2013 - 9:08 am
Why couldn't you use the set of multiple imputed data sets and create a new set by creating the interaction variable (first subtracting the mean from each of the two variables)?
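Bengt's suggestion can be sketched in Python, assuming each imputed data set has been read in as a list of rows (dicts). Here each variable is centered at its mean within that imputed data set before the product is formed. Centering at a pooled mean across imputations would be an alternative choice. The variable names `x1`, `x2`, and `x1x2` are placeholders. The augmented data sets would then be written out and presumably read back into Mplus as a set of imputed data sets, for example with TYPE = IMPUTATION in the DATA command.

```python
from statistics import mean

def add_centered_interaction(dataset, a, b, name):
    """Return a copy of one imputed data set (list of dict rows) with
    mean-centered versions of a and b and their product stored under name."""
    mean_a = mean(row[a] for row in dataset)
    mean_b = mean(row[b] for row in dataset)
    out = []
    for row in dataset:
        new = dict(row)                       # leave the original row untouched
        new[a + "_c"] = row[a] - mean_a
        new[b + "_c"] = row[b] - mean_b
        new[name] = new[a + "_c"] * new[b + "_c"]
        out.append(new)
    return out

# Two tiny hypothetical imputed data sets
imputed = [
    [{"x1": 1.0, "x2": 4.0}, {"x1": 3.0, "x2": 6.0}],
    [{"x1": 2.0, "x2": 5.0}, {"x1": 4.0, "x2": 9.0}],
]
augmented = [add_centered_interaction(d, "x1", "x2", "x1x2") for d in imputed]
print(augmented)
```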
 Jonathan Larson posted on Wednesday, February 06, 2013 - 12:29 pm
Thank you for your prompt reply! We decided to center the variables and compute the interaction term in the original data file, as recommended by multiple sources. Unfortunately, now we've run into a new problem. We noticed that even when interaction terms are not included as predictors, centered variables produce vastly different r3step regression coefficients than non-centered variables. For example, look at the difference between the coefficients for age and the Life Events Checklist when they are centered (first) and when they are not centered (second):

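Centered:
C#3 ON        Estimate    S.E.   Est./S.E.   Two-Tailed P-Value
  AGE            0.075   0.064       1.169                0.243
  LECCONT0      -0.101   0.066      -1.518                0.129

Not centered:
C#3 ON        Estimate    S.E.   Est./S.E.   Two-Tailed P-Value
  AGE           -0.698   0.280      -2.488                0.013
  LECCONT0      -0.439   0.122      -3.606                0.000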

Why is this happening?
 Linda K. Muthen posted on Wednesday, February 06, 2013 - 1:59 pm
Please send the non-centered data and your input along with your license number to
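Without interaction terms, centering a covariate should only re-express the intercepts of the multinomial regression. For a covariate with mean m, a + b*x = (a + b*m) + b*(x - m), so the slope b and the implied class probabilities are unchanged. The Python sketch below checks this numerically with hypothetical intercepts, slopes, and mean, using the last class as the reference class. If centered and uncentered runs give very different slopes, something else presumably differs between the runs, such as the class ordering or the variable actually read in. Dividing by a standard deviation, as opposed to only subtracting the mean, would also rescale the slope.

```python
import math

def class_probs(intercepts, slopes, x):
    """Multinomial logit class probabilities for one covariate value x.

    The last class is the reference, so its logit is fixed at 0.
    """
    logits = [a + b * x for a, b in zip(intercepts, slopes)] + [0.0]
    top = max(logits)                                  # subtract max for numerical stability
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical uncentered parameters for classes 1 and 2 (class 3 is the reference)
intercepts = [1.2, -0.5]
slopes = [-0.04, 0.03]
mean_x = 35.0

# Centering x at its mean only shifts each intercept by b * mean_x
centered_intercepts = [a + b * mean_x for a, b in zip(intercepts, slopes)]

for x in (20.0, 35.0, 60.0):
    raw = class_probs(intercepts, slopes, x)
    centered = class_probs(centered_intercepts, slopes, x - mean_x)
    print(x, [round(p, 4) for p in raw], [round(p, 4) for p in centered])
```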
 Pedro Quinteiro posted on Wednesday, April 10, 2013 - 7:59 am

I am trying to estimate the class membership of the individuals in my analysis (who belongs to which class), but I have not been able to do it.

This is the command I am using (what am I missing?):

Variable: Names are ENR CONF;
Usevariables are ENR CONF;
Missing are all (-9999);
Classes = c(4) ;

Type = Mixture;
Starts = 40 2;

samp Stand Tech11;

Type = Plot3 ;

 Linda K. Muthen posted on Wednesday, April 10, 2013 - 1:34 pm
See the CPROBABILITIES option of the SAVEDATA command.
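Most likely class membership is the class with the largest posterior probability. The Python sketch below applies that rule to invented posterior probabilities for a 4-class model like the one above. `modal_class` is an illustrative helper, and no particular layout of the saved file is assumed.

```python
def modal_class(posteriors):
    """Return the 1-based class with the largest posterior probability (first one wins ties)."""
    best = max(range(len(posteriors)), key=lambda k: posteriors[k])
    return best + 1

# Hypothetical posterior probabilities for three people in a 4-class model
rows = [
    [0.05, 0.80, 0.10, 0.05],
    [0.60, 0.20, 0.15, 0.05],
    [0.10, 0.10, 0.10, 0.70],
]
assignments = [modal_class(r) for r in rows]
print(assignments)  # [2, 1, 4]
```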
 kelly kenzik posted on Friday, March 28, 2014 - 8:53 am

I am running an LTA (4 classes over two time points) with dichotomous covariates. In the LTA without covariates, it is apparent that some of the transition probabilities between classes are extremely small (0.001 or 0). I understand I need to impose parameter restrictions to set those transitions equal to 0. However, I am not confident in how I constructed the code for this:

There is almost zero probability of transitioning from class 3 to class 4 or class 3 to class 2. Would I incorporate statements like “ c1#3 on c2#4@0”? This is what I have currently with no restrictions.

c1#1 on x1-x6 ;
c1#2 on x1-x6 ;
c1#3 on x1-x6 ;
c2#1-c2#3 on c1#1-c1#3 x1-x6;



[ rmob$1] (1);
[ rself$1 ] (2);
[ rusual$1 ] (3);
[ rpain$1] (4);
[ ranxiet$1] (5);
……. And so on for MODEL C2

Thank you!
 Linda K. Muthen posted on Sunday, March 30, 2014 - 10:48 am
You don't need to fix these parameters to zero unless you want to do so for theoretical reasons. Just leave them as they are.

See the August 2012 Utrecht course handout and video on the website for further information.
 Angela M. Stover posted on Saturday, October 18, 2014 - 5:21 pm

In GMM, is there a way to specify a minimum percentage for classes? For instance, each class must have at least 5% of respondents.

Thank you!
 Bengt O. Muthen posted on Saturday, October 18, 2014 - 5:54 pm
 Christina Lustenberger posted on Wednesday, January 07, 2015 - 4:50 am
Hi there,
I am running an LPA with 3 variables (Job Satisfaction, Person-Job-Fit, Chance Events). The best solution is the one with 5 classes. Now I would like to analyze whether another variable (Turnover Intentions: toi_t) is associated with each of the class memberships (1-5). I tried it with this 3-step solution:
c ON toi_t;

Now the problem is that the output shows me only regressions for Classes 1-4 on toi_t; the last class is not shown in the output:
Categorical Latent Variables

              Estimate    S.E.   Est./S.E.   Two-Tailed P-Value
C#1 ON
  TOI_T          0.353   0.926       0.381                0.703
C#2 ON
  TOI_T         -6.769   1.264      -5.354                0.000
C#3 ON
  TOI_T         -2.909   1.071      -2.717                0.007
C#4 ON
  TOI_T         -4.798   1.070      -4.483                0.000

Did I use a wrong syntax? What could work better?
Thanks a lot for your help!
 Linda K. Muthen posted on Wednesday, January 07, 2015 - 5:44 am
In multinomial logistic regression, the last class is the reference class.
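Because the last class is the reference, each printed slope compares one class with class 5. The slope for any other pair of classes is the difference of two printed slopes. The Python sketch below derives those point estimates from the TOI_T slopes in the output above. Standard errors for the differences would also need the covariances of the estimates, which this output does not show, so only point estimates and odds ratios are computed. `contrast` is an illustrative helper.

```python
import math

# Slopes of C#k ON TOI_T from the output above; class 5 is the reference (slope 0).
slopes = {1: 0.353, 2: -6.769, 3: -2.909, 4: -4.798, 5: 0.0}

def contrast(slopes, k, ref):
    """Log-odds slope for class k versus class ref, obtained by differencing."""
    return slopes[k] - slopes[ref]

# Every pairwise comparison, i.e. each class in turn used as the reference
for ref in sorted(slopes):
    for k in sorted(slopes):
        if k != ref:
            b = contrast(slopes, k, ref)
            print(f"class {k} vs class {ref}: slope {b:+.3f}, odds ratio {math.exp(b):.4f}")
```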
 Christina Lustenberger posted on Wednesday, January 07, 2015 - 6:05 am
Thanks Linda for your quick answer.
And how can I analyze if my "turnover Intention"-variable is associated with each of the 5 classes? Maybe with a correlation?
 Bengt O. Muthen posted on Wednesday, January 07, 2015 - 5:40 pm
You get that information if you treat turnover intention as a distal outcome using the auxiliary option DCAT (for categorical distals) or BCH (for continuous distals).
 Christina Lustenberger posted on Thursday, January 08, 2015 - 12:06 am
Thanks for your answer, that helps me a bit. But with the BCH option I can in the output only see the means and the S.E. of turnover intention within each class. How can I find out if the turnover-variable correlates with each class? I need the p-values of each correlation.
 Bengt O. Muthen posted on Thursday, January 08, 2015 - 5:07 pm
BCH also gives you a chi-square test; that's all that's needed.
 pamela m diamond posted on Tuesday, February 24, 2015 - 1:52 pm
I have cross-sectional data and have identified 2 latent class variables that make sense conceptually (one with 2 categories and the other with 3). I am interested in including covariates in the model, but I also hypothesize that the first latent class variable (c1) predicts the second (c2). I tried a simple analysis including an ON statement with no covariates and got parameter estimates for the regression weights and means, but when I looked at the latent class patterns, the class configurations had changed quite a bit, and I'm not sure how to interpret the results at this point. I figured it was a good time to make sure I was going about this correctly (before I confuse myself with covariates) and to get a better idea of how to interpret the parameter estimates as well as the class configurations. Here is the model statement I used:

C2 ON C1;
!C2 has binary indicators and 3 classes;
!C1 also has binary indicators and 2 classes ;
!I used analysis type=Mixture;

I've spent some time searching the discussion board and my books & notes and have not found examples to follow. Everything with 2 categorical latents that involve predictive relationships seem to be covered in the longitudinal literature as LTA models but my data are cross sectional. Any help is appreciated.
 Bengt O. Muthen posted on Wednesday, February 25, 2015 - 1:21 pm
The default is that the 2 latent class variables are uncorrelated, so when you add ON you would change the class formations.
 pamela m diamond posted on Wednesday, February 25, 2015 - 1:44 pm
Is there any way to constrain the classes to retain their configuration when adding the regression relationship or do I need to just create a new variable based on membership in the class hypothesized to be the predictor (the latent with 2 classes) and then include that in a separate analysis as a manifest variable predicting the downstream latent class variable (with 3 classes).
 Bengt O. Muthen posted on Wednesday, February 25, 2015 - 1:51 pm
You can do it using the 3-step LTA approach discussed in the papers on our website:

Asparouhov, T. & Muthén, B. (2014). Auxiliary variables in mixture modeling: Three-step approaches using Mplus. Structural Equation Modeling: A Multidisciplinary Journal, 21(3), 329-341. The posted version corrects several typos in the published version. An earlier version of this paper was posted as web note 15. Appendices with Mplus scripts are available on our website.

Nylund-Gibson, K., Grimm, R., Quirk, M., & Furlong, M. (2014). A latent transition mixture model using the three-step specification. Structural Equation Modeling: A Multidisciplinary Journal, 21, 439-454.
 pamela m diamond posted on Wednesday, February 25, 2015 - 1:57 pm
Thanks. I'll check that out.
 Almar Kok posted on Thursday, April 30, 2015 - 7:34 am
For a study on health in older adults I am running a Growth Mixture Model to identify different types of ageing processes. The problem is that respondents were aged 55-85 at baseline, and therefore the variable "baseline age" heavily influences the assignment of individuals into latent classes. As a result, there is a 'young' latent class with high and stable functioning, and other classes are much older (and have worse health...). However, this is not very informative, since I want to assess health trajectories regardless of the age at which individuals entered the study.

It has become clear from this forum and my own experience that regressing C on age, and/or I S on age, does not 'adjust' the classification process for age. While some minor changes seem to occur in the classification, the results only seem to confirm that age is indeed a strong predictor of class membership. However, I want to rule out baseline age as a predictor. So, following up on a question raised earlier in this discussion: is there some way of "washing out" age differences among classes and of estimating types of trajectories regardless of baseline age?

Many thanks in advance!
 Bengt O. Muthen posted on Thursday, April 30, 2015 - 12:13 pm
Two approaches are possible. You can let age be a variable to capture individually-varying ages using AT. A faster approach is to re-arrange your data into a couple of cohorts corresponding to age categories at baseline; see Table 2 of the paper on our web site:

Muthén, B. & Asparouhov, T. (2015). Growth mixture modeling with non-normal distributions. Statistics in Medicine, 34:6, 1041–1058. doi: 10.1002/sim.6388

This approach can be analyzed as either a single group with missing data for certain ages in certain cohorts, or as multiple groups using Knownclass.
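A minimal Python sketch of the data rearrangement described above, assuming equally spaced measurement waves (a hypothetical 3-year interval) and baseline ages grouped into cohorts that start on the same grid as the target age columns. Each wave is re-indexed by the cohort's age at that wave, and ages a cohort was never observed at are left missing (None), which corresponds to the single-group analysis with missing data. The function name and arguments are hypothetical.

```python
def restructure_by_age(records, ages, wave_interval=3):
    """Re-index wave-based repeated measures by age (accelerated design).

    records:       list of (baseline_age, [y_wave1, y_wave2, ...]) tuples.
    ages:          target age columns spaced wave_interval years apart,
                   e.g. [55, 58, 61, 64].
    wave_interval: years between waves (hypothetical 3-year spacing).

    Each person is assigned to the cohort whose start age is the grid point
    at or below their baseline age; ages a cohort never reaches stay None.
    """
    start = ages[0]
    rows = []
    for baseline_age, values in records:
        cohort_start = start + ((baseline_age - start) // wave_interval) * wave_interval
        row = {age: None for age in ages}
        for wave, y in enumerate(values):
            age_at_wave = cohort_start + wave * wave_interval
            if age_at_wave in row:       # ignore ages outside the target grid
                row[age_at_wave] = y
        rows.append(row)
    return rows


# Two hypothetical respondents, aged 55 and 60 at baseline.
data = [(55, [3.1, 3.0, 2.8]), (60, [2.9, 2.7])]
for row in restructure_by_age(data, ages=[55, 58, 61, 64]):
    print(row)
```

Before writing the file for Mplus, the None entries would be replaced by whatever missing-value flag is declared with the MISSING option.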
 Almar Kok posted on Friday, May 01, 2015 - 2:42 am
Many thanks for your prompt reply and reference! I have tried the second approach with a single-group analysis, restructuring the data into variables that hold all observations made when respondents were, e.g., 55 years old, 58 years old, etc. This works quite well. However, in your opinion, is a potential drawback of this approach that it confounds cohort and period effects, since respondents did not all have the same ages at the same point in time?

I'm also interested in the first approach you mention. Could you please explain a little bit more about what you mean by "You can let age be a variable to capture individually-varying ages using AT." E.g. what does AT stand for? I have searched the internet and the Mplus manual but could not find it...

Thank you
 Linda K. Muthen posted on Friday, May 01, 2015 - 11:09 am
The index of the user's guide shows that AT is described on pages 686-687. See also Example 6.12 which uses the AT option.
 Sara Guediri posted on Tuesday, June 23, 2015 - 8:59 am
Hi, is it possible to test latent profile membership as a nominal mediator? That is, an IV influences latent profile membership, and profile membership in turn influences a DV. Both the IV and the DV are observed continuous variables, and the data are cross-sectional. Are there examples that I could adapt to test this?
 Bengt O. Muthen posted on Wednesday, June 24, 2015 - 6:47 pm
This is an advanced case which I discussed in

Muthén, B. (2011). Applications of causally defined direct and indirect effects in mediation analysis using SEM in Mplus. The paper, its technical appendix, the Mplus input appendix, and the Mplus inputs, data, and outputs used in the paper are available on the Mplus website.
 'Alim Beveridge posted on Monday, January 25, 2016 - 4:51 am
Dear Bengt, Linda or Tihomir,

I have a 3-class mixture model and would like to regress the categorical latent variable on a number of predictors of latent class membership. These predictors are of various types: binary, nominal (unordered categorical), and continuous. I want to use R3STEP. How can I tell the R3STEP procedure which predictors are binary, which are nominal, and which are continuous?

Must I create dummies for my nominal variables?

 Linda K. Muthen posted on Monday, January 25, 2016 - 3:01 pm
Nominal predictors should be turned into a set of dummy variables. In regression, covariates can be binary or continuous. In both cases, they are treated as continuous. There is no need to specify their scale.
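A small Python sketch of the dummy coding: one 0/1 variable per category except a reference category, which is represented by all dummies being 0. The function name and the d-prefixed variable names are hypothetical; the resulting dummies would then be listed alongside the binary and continuous predictors, e.g. AUXILIARY = d2 d3 (R3STEP);

```python
def dummy_code(values, reference=None):
    """Turn a nominal variable into 0/1 dummies, one per non-reference category.

    The reference category (by default the smallest code) is represented by
    all dummies being 0. Dummy names are hypothetical: "d" + category code.
    """
    categories = sorted(set(values))
    if reference is None:
        reference = categories[0]
    return {
        f"d{c}": [1 if v == c else 0 for v in values]
        for c in categories
        if c != reference
    }


# A hypothetical nominal predictor with codes 1, 2, 3.
region = [1, 2, 3, 2, 1]
print(dummy_code(region))
```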
 sfhellman posted on Wednesday, February 03, 2016 - 8:43 pm
I ran a two-class LCA with group assignments (0 = control, 1 = intervention) as a predictor of class membership.

The output includes the following:

Categorical Latent Variables

                     Estimate     S.E.   Est./S.E.   Two-Tailed P-Value
C#1        ON
   ASSIGNMENT           0.846    0.374      2.265                0.024

Is it correct to interpret these results as indicating that the log odds of being in class 1 (compared to class 2) are 0.846 higher in the intervention group than in the control group? Any recommendations for information on how to interpret the output for predictors of class membership are appreciated!
 Bengt O. Muthen posted on Thursday, February 04, 2016 - 7:08 pm
Q1. Yes.

Q2. See UG Chapter 14 about multinomial regression and also the Topic 2 handout and video from our short courses on our website.
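As a quick check of the arithmetic behind Q1, the sketch below exponentiates the coefficient from the output above to get an odds ratio and a Wald-based 95% confidence interval. It is plain Python rather than Mplus output, and the 1.96 critical value assumes a normal approximation.

```python
import math


def logit_summary(estimate, se, z_crit=1.96):
    """Odds ratio, Wald z, and Wald-based confidence interval for a logit coefficient."""
    return {
        "odds_ratio": math.exp(estimate),
        "z": estimate / se,
        "ci_low": math.exp(estimate - z_crit * se),
        "ci_high": math.exp(estimate + z_crit * se),
    }


# Estimate and S.E. for C#1 ON ASSIGNMENT from the output above.
summary = logit_summary(0.846, 0.374)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

exp(0.846) is about 2.33, so the odds of being in class 1 rather than class 2 are roughly 2.3 times as high in the intervention group (coded 1) as in the control group (coded 0). The recomputed z differs slightly from the printed 2.265 because the printed estimate and S.E. are rounded.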
 Elizabeth Pasipanodya posted on Thursday, November 02, 2017 - 6:23 pm

I am doing an LCA and find that one variable, HIV status, influences classification and so should be included as a predictor variable in an LCA-C.
I also have a number of other covariates that I am examining as auxiliary predictor variables using R3STEP. However, I am interested in the interaction between HIV status and one of these covariates.
What is the best way to test for interaction effects when one covariate influences class membership but the other does not?

 Bengt O. Muthen posted on Friday, November 03, 2017 - 5:19 pm
I don't know what is best but you could include the interaction variable in your auxiliary statement.
 Elizabeth Pasipanodya posted on Saturday, November 04, 2017 - 12:57 pm
Thank you, Bengt, for your response! I had defined the interaction and then used it as an auxiliary variable with R3STEP, but I wanted to double-check that this would be appropriate.
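A minimal Python sketch of building such a product term before it is added to the auxiliary list, assuming HIV status is coded 0/1 and the other covariate is continuous. Mean-centering is an optional choice shown here, not something R3STEP requires; the function and variable names are hypothetical.

```python
def add_interaction(hiv, covariate, center=True):
    """Return (covariate_used, product) for an hiv-by-covariate interaction.

    hiv:       0/1 codes for HIV status.
    covariate: continuous covariate values.
    If center is True, the covariate is mean-centered first; in a model that
    includes hiv, the centered covariate, and the product, the hiv
    coefficient then refers to a person at the covariate mean.
    """
    if len(hiv) != len(covariate):
        raise ValueError("hiv and covariate must have the same length")
    mean = sum(covariate) / len(covariate) if center else 0.0
    cov_used = [x - mean for x in covariate]
    product = [h * x for h, x in zip(hiv, cov_used)]
    return cov_used, product


# Hypothetical data for four respondents.
cov_c, hiv_x_cov = add_interaction([0, 1, 1, 0], [2.0, 4.0, 6.0, 8.0])
print(cov_c, hiv_x_cov)
```

If centering is used, the centered covariate (not the raw one) should accompany the product term so the main effects keep the interpretation given in the docstring.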


 Noni Gaylord-Harden posted on Thursday, June 07, 2018 - 2:22 pm
Dear Dr. Muthen,

I am conducting an LCA with 4 variables and a three-class model provides the best fit to the data. I would like to determine the influence of covariates on class membership, and I am attempting to use the R3STEP command. I conducted the LCA for the three-class model and saved the cprobs to create a nominal most likely class variable. Using the dataset generated when I saved the cprobs, I ran the LCA with covariates using the following syntax:

CATEGORICAL = S2WITcombineprevalence;
classes = c(3);

I received the following error message:


However, when I check the TECHNICAL 7 output, I cannot find anything unusual in the values for this parameter. Can you please let me know whether I conducted the analysis appropriately and how I should proceed given the error message? Thank you!
 Bengt O. Muthen posted on Thursday, June 07, 2018 - 6:00 pm
Send your full output to Support along with your license number.
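The step described in the question, turning saved class probabilities into a nominal most-likely-class variable, is an argmax over each row of posterior probabilities. The sketch below does that in Python and also averages the posteriors within each assigned class, giving a table of the same kind Mplus prints as the average latent class probabilities for most likely latent class membership. The saved Mplus file typically already contains a most-likely-class column, so this is mainly a check, and R3STEP carries out the class assignment and classification-error correction internally; the function names and data are hypothetical.

```python
def most_likely_class(cprobs):
    """Return the 1-based class with the largest posterior probability per row."""
    return [max(range(len(row)), key=lambda k: row[k]) + 1 for row in cprobs]


def average_posteriors_by_class(cprobs):
    """Average posterior probabilities within each most-likely-class group.

    Row c of the result: mean posterior probabilities of everyone assigned
    to class c (NaN if nobody is assigned to that class).
    """
    assigned = most_likely_class(cprobs)
    n_classes = len(cprobs[0])
    table = []
    for c in range(1, n_classes + 1):
        rows = [row for row, a in zip(cprobs, assigned) if a == c]
        if rows:
            table.append([sum(col) / len(rows) for col in zip(*rows)])
        else:
            table.append([float("nan")] * n_classes)
    return table


# Hypothetical saved posteriors for four respondents, 3 classes.
cprobs = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6], [0.2, 0.5, 0.3], [0.8, 0.1, 0.1]]
print(most_likely_class(cprobs))
print(average_posteriors_by_class(cprobs))
```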
 Daniel Lee posted on Wednesday, January 16, 2019 - 1:58 pm
Hi Dr. Muthen,

I have selected 5 classes from a growth mixture model that spans early to mid emerging adulthood. I would like to see if these classes predict latent profile classes that were estimated during adulthood. Is this possible? If so, would you have any references that provide sample syntax? Thanks so much!

 Bengt O. Muthen posted on Wednesday, January 16, 2019 - 2:14 pm
Just say

clpa on cgmm;

where clpa refers to the LPA latent class variable and cgmm refers to the GMM latent class variable.