Mplus Discussion >> Changes in latent class membership when predictors are included

mike stoolmiller posted on Wednesday, November 29, 2000 - 4:08 pm

We are using Mplus to estimate growth mixture models for repeated assessments of depression. We seem to be getting quite different answers regarding class membership depending on whether or not we include predictors of latent class membership in the model. The estimated class sizes change quite a bit and individuals jump between classes. Is this indicative of some kind of problem or model misspecification or just something to be expected from the additional information included in the model?

bmuthen posted on Wednesday, November 29, 2000 - 5:09 pm

This is an interesting topic and it would be good to accumulate experience now that more researchers get into growth mixture models and other latent class models with covariates predicting class membership.

In an ASB latent class example that we use for training sessions, the analysis is done in steps: using only the u indicators, reducing down to the class-defining u indicators, and adding the c-predicting covariates x. In this example, we found a strong agreement in class definition across the 3 steps, which is an indication that the model is stable and trustworthy. On the other hand, I have one growth mixture example with y's and x's where the classes seemed to change when adding x's as predictors of c. So, my early impression is that this problem may happen in some data.

I can think of 3 reasons. One is that more information is available when adding x's and therefore this solution is what one should trust.

Another is that the model may be misspecified when adding the x's because there may be some omitted direct effects from some x's to some y's/u's (these can be included).

A third explanation is more subtle and has to do with individuals' misfit. There may be examples where for some individuals in the sample the y's/u's "pull" the classes in a different direction than the x's. Note that both y/u and x information contribute to class formation. Consider the example where in a 2-class model a high x value has a positive influence on being in class 2, and being in class 2 gives a high probability for u=1 for most u's. Individuals who have many u=1 outcomes but low x values are not fitting this model well. If the x information dominates the u information then these individuals will be differently classified using only u versus using u and x.

Just some preliminary thoughts.
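
The stepwise comparison described in this post can be sketched as Mplus input. This is a minimal illustration under stated assumptions, not the ASB training example itself: the file name, the indicator names u1-u4, and the covariate x are hypothetical. The first run would leave x out altogether, the second adds the regression of c on x, and the commented-out line adds a direct effect from x to one indicator, which is the omitted-direct-effect check named as the second reason.

```
! Step 1: unconditional model. Leave x off USEVARIABLES
!         and drop the ON statements below.
! Step 2: add x as a predictor of class membership.
DATA:     FILE = lca.dat;          ! hypothetical file name
VARIABLE: NAMES = u1-u4 x;         ! hypothetical variable names
          USEVARIABLES = u1-u4 x;
          CATEGORICAL = u1-u4;     ! binary class indicators
          CLASSES = c(2);
ANALYSIS: TYPE = MIXTURE;
MODEL:
  %OVERALL%
  c ON x;        ! x predicts class membership
  ! u4 ON x;     ! optional direct effect of x on an indicator
```

Comparing class sizes and class-specific item probabilities across the runs shows whether the class definition stays put once x enters the model.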

Girish Mallapragada posted on Saturday, November 20, 2004 - 11:50 pm

Hello Dr. Muthen,

I have estimated a 2 class latent class SEM. I then tried to introduce a continuous covariate into the model.
For this, I fixed the loadings of items on this continuous covariate in my %overall% model based on an earlier CFA.
However, when I specify
"c#1 on MUN"
under the %overall% command
(where MUN is my continuous covariate),

I get the following message:
*** FATAL ERROR
RECIPROCAL INTERACTION PROBLEM.
Am I doing something wrong? How can one incorporate continuous covariates in an LC model?

regards

Linda K. Muthen posted on Sunday, November 21, 2004 - 9:23 am

I would need to see your full output to understand what is happening. Please send it to support@statmodel.com.

anonymous posted on Tuesday, February 14, 2006 - 8:10 am

Hello Bengt and Linda,

I was wondering if any new information has been learned about this classification problem when different predictors/covar/outcomes are included in the model. I'm new with this type of analysis, but I was wondering if one possible solution would be to estimate class membership in an unconditional model and simply fix their class membership in subsequent models. I have an interest in retaining class membership across models for consistency's sake.

Cheers. And Happy Valentine's Day.

bmuthen posted on Tuesday, February 14, 2006 - 5:02 pm

Take a look at the Muthen (2004) chapter in the Kaplan handbook posted on our web site under Recent Papers and you will find this issue discussed. I don't see it as a problem that classification changes when adding covariates - having more information makes the classification better. If there are big changes, I would think the model with covariates is more trustworthy. But feel free to make a counter argument.

anne marie mauricio posted on Friday, November 10, 2006 - 1:46 pm

Hello Dr. Muthen, I am estimating a latent class model with 3 continuous and 3 binary predictors, and I have a couple of questions. (1) At 2 classes, my df for the Chi-Square Test of Model Fit for the Binary and Ordered Categorical (Ordinal) Outcomes is 0. When I estimate a 3-class model, I receive a notice that the "df cannot be computed for this part of the model". Does this mean that my model is underidentified and, thus, misspecified? If yes, can I constrain some parameters to fix this problem? Also, is a just-identified model OK? (2) I am trying to compute the BLRT using Tech 14 with LRTBootstrap = 100 and LRTstarts = 0 0 40 10. I receive a message that the p-value is not reliable because THE BEST LOGLIKELIHOOD VALUE WAS NOT REPLICATED in x out of 100 bootstrap draws. However, with multiple sets of starting values, I get the same parameter estimates and my model seems to replicate, suggesting that I am not reaching a local maximum.

Your assistance is greatly appreciated.

Anne Marie

anne marie mauricio posted on Saturday, November 11, 2006 - 9:56 am

Hi Dr. Muthen, I want to correct my post on 11/10. I meant to say that I am estimating a latent class model with 3 continuous and 3 binary indicators. My questions were: (1) At 2 classes, my df for the Chi-Square Test of Model Fit for the Binary and Ordered Categorical (Ordinal) Outcomes is 0. When I estimate a 3-class model, I receive a notice that the "df cannot be computed for this part of the model". Does this mean that my model is underidentified and, thus, misspecified? If yes, can I constrain some parameters to fix this problem? Also, is a just-identified model OK? (2) I am trying to compute the BLRT using Tech 14 with LRTBootstrap = 100 and LRTstarts = 0 0 40 10. I receive a message that the p-value is not reliable because THE BEST LOGLIKELIHOOD VALUE WAS NOT REPLICATED in x out of 100 bootstrap draws. However, with multiple sets of starting values (e.g., starts = 1000 300), I get the same parameter estimates and my model seems to replicate, suggesting that I am not reaching a local maximum.

Thank you,

Anne Marie

Linda K. Muthen posted on Saturday, November 11, 2006 - 11:23 am

The two chi-square test statistics given are for the categorical latent class indicators in the model. You can ignore them because you also have continuous latent class indicators. So you don't need to worry about identification.

You should adjust the LRTSTARTS option. See TECH14 in the Mplus Version 4.1 User's Guide which is on the website for hints about how to use TECH14.
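
A hedged sketch of the settings under discussion. The option names are the ones used in the posts above (STARTS, LRTBOOTSTRAP, LRTSTARTS, TECH14); the larger numbers in the last two positions of LRTSTARTS are illustrative assumptions, not recommended values, and the TECH14 section of the User's Guide remains the reference for choosing them.

```
ANALYSIS: TYPE = MIXTURE;
          ! random starts for the model fitted to the real data
          STARTS = 1000 300;
          ! number of bootstrap draws for the BLRT
          LRTBOOTSTRAP = 100;
          ! starts used within each bootstrap draw; the last
          ! two numbers are raised from the 40 10 in the post
          LRTSTARTS = 0 0 100 20;
OUTPUT:   TECH14;
```

The point of raising the last two numbers is that the warning concerns replication of the best loglikelihood inside the bootstrap draws, which STARTS alone does not control.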

anne marie mauricio posted on Saturday, November 11, 2006 - 11:53 am

Thank you so much for your quick feedback Linda. Anne Marie

Fernando Terrés de Ercilla posted on Tuesday, January 23, 2007 - 9:14 am

I'm doing a LCA with covariates. Class indicators are 14 ordinal (5 categories) variables, and I have also 15 potential covariates (2 continuous, 13 binary). My sample size is 12257, and it has sample weights. The result is a model with 4 classes, according to the LMR test.
Then if I introduce the covariates, the classes don't change very much, but the LMR test suggests a model with 3 classes. Should I keep 4 or 3 classes?
Thanks in advance, Fernando.

Linda K. Muthen posted on Tuesday, January 23, 2007 - 6:09 pm

I would look at more than the LMR test, for example, loglikelihood, BIC, etc. Also, theory and predictive validity can guide the number of classes. See the following paper that is available on the website:

Muthén, B. (2004). Latent variable analysis: Growth mixture modeling and related techniques for longitudinal data. In D. Kaplan (ed.), Handbook of quantitative methodology for the social sciences (pp. 345-368). Newbury Park, CA: Sage Publications.

Fernando Terrés de Ercilla posted on Thursday, January 25, 2007 - 6:44 am

My fit indicators are as follows:
Model  Classes         LogL   Par         AIC         BIC        ABIC      LMR  Entropy
LCA          1   -166304.14    56   332720.28   333136.07   332958.11        -        -
LCA          2   -149536.60   113   299299.20   300138.21   299779.11   0.0000    0.877
LCA          3   -146832.22   170   294004.45   295266.68   294726.44   0.0000    0.807
LCA          4   -144646.08   227   289746.16   291431.61   290710.23   0.0075    0.803
LCA          5   -143351.53   284   287271.05   289379.72   288477.20   1.0000    0.796
RLCA         4   -142682.02   272   285908.04   287924.61   287060.22   0.2778    0.807
RLCA         3   -144906.28   200   290212.55   291695.32   291059.74   0.0000    0.810

Predictive validity, in terms of expected effect of covariates is Ok for both RLCA models (I don't have distal outcomes). Item profiles are parallel for LCA4, RLCA4, and RLCA3. Sorry for the insistence but I don't see any other indications in that paper.
Thanks in advance, Fernando.

Linda K. Muthen posted on Thursday, January 25, 2007 - 8:41 am

You need to look at the fourth class and see where it comes from. For example, are two of the classes in the four-class results the same as in the three-class results? And does the fourth class come from a division of one class? If so, is the covariate profile different in those two classes? Is there substantive meaning to the fourth class? Basically, statistics can take you only so far. Then you need to use substance and logic.

Fernando Terrés de Ercilla posted on Wednesday, February 21, 2007 - 9:12 am

Linda, thank you very much. After reviewing the theory and rerunning my models, I have reached the conclusion that the problem lies in the covariates that simultaneously affect the classes and the items directly. My questions are:
1) Is it best to include only direct effects when supported by theory, or to include all covariates as direct effects and proceed backwards by deleting the nonsignificant ones?
2) Must I let the coefficients of the direct covariates vary among classes, or would that hide the class formation? I don't have a clear criterion about this issue.
3) Do you know of any reference on this issue (covariate selection and setting)?
Thanking you in advance, Fernando.

Bengt O. Muthen posted on Saturday, February 24, 2007 - 3:36 pm

1) I think you would assume that you have relatively few direct effects because otherwise you have a low degree of measurement invariance across the latent classes. Therefore, I would add direct effects as needed by significance, both statistical and substantive.

2) Direct effects that vary across classes are typically hard to estimate with any precision so I would let them be class-invariant.

3) No, none except writings such as my own 2004 chapter in the Kaplan handbook.

Karl Hallmackenreuther posted on Thursday, June 19, 2008 - 4:56 am

I have a growth mixture model (regarding the variances, only the intercept variance is freely estimated, and it is held class-invariant) with 6 covariates. Estimating unconditional models leads to 4 classes and very poor classification quality. Adding covariates (predicting class membership) to the model suggests 3 classes and leads to substantially improved classification quality.
In addition, class sizes change when adding these covariates. Referring to your first argument on November 29, 2000 and to your reply on February 14, 2006, would it be OK to report the class solutions based on the model with covariates included? They seem more trustworthy.

Linda K. Muthen posted on Thursday, June 19, 2008 - 9:57 am

This could point to the need for direct effects from the covariates to the latent class indicators. See the following paper which is available on the website:

Muthén, B. (2004). Latent variable analysis: Growth mixture modeling and related techniques for longitudinal data. In D. Kaplan (ed.), Handbook of quantitative methodology for the social sciences (pp. 345-368). Newbury Park, CA: Sage Publications.

Karl Hallmackenreuther posted on Friday, June 20, 2008 - 8:14 am

Thank you. As far as I understand, one should include all significant effects of covariates in the class-finding process. So should I also include effects of my covariates on the intercepts (growth parameters) of the classes, even though the intercept variance is held equal across classes? It seems to me that I should include class-invariant effects of my covariates on the intercept, but only the significant ones, or all?

Linda K. Muthen posted on Friday, June 20, 2008 - 9:19 am

All.

Karl Hallmackenreuther posted on Friday, August 08, 2008 - 1:56 am

Meanwhile, I included all possible effects of covariates on growth parameters, held class-invariant, as you said. I asked myself whether these class-invariant coefficients can be interpreted, or whether they are there just for specification purposes. My main interest lies in the effects on class membership, but there are also some interesting effects on the growth parameters, albeit the same for all classes.

Bengt O. Muthen posted on Friday, August 08, 2008 - 8:45 am

You first consider the effect of covariates on class membership and then the further, within-class effect of covariates on the growth factors. The latter effect is interpretable and can be understood e.g. as a high x value making it more likely to be extra high within the class.
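
The two kinds of covariate effect distinguished in this reply can be written side by side. A minimal sketch, assuming four hypothetical repeated measures y1-y4, a single hypothetical covariate x, and linear growth; the number of classes is arbitrary here.

```
VARIABLE: NAMES = y1-y4 x;         ! hypothetical variable names
          CLASSES = c(3);
ANALYSIS: TYPE = MIXTURE;
MODEL:
  %OVERALL%
  ! linear growth over four occasions
  i s | y1@0 y2@1 y3@2 y4@3;
  ! effect of x on class membership
  c ON x;
  ! within-class effect of x on the growth factors;
  ! stated only in %OVERALL%, so it is class-invariant
  i s ON x;
```

Because the `i s ON x` statement appears only in the overall part, its coefficients are held equal across classes, which matches the class-invariant specification discussed in the preceding posts.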

Hao Duong posted on Sunday, September 21, 2008 - 1:15 pm

Dear Dr. Muthen,
I have three questions:

1. My three-class model looks better than two-class model based on fit indexes and practical interpretation. However, when I add covariates into the three-class model, p-value of LO-MENDELL-RUBIN LRT TEST is large (0.867). What should I do? Should I try the two-class model with covariates?

2. In two-level GMM for a continuous outcome, there are three options for building the model:
a. Individual-level categorical latent variable
b. Between-level categorical latent variable
c. And both individual-level and between-level categorical latent variables
Running all three seems complicated. What would you suggest I do?

3. I would like to examine whether treatment has different effects on different classes. Treatment is a between-level variable (school-level). It does not work when I try the regression of the categorical latent variable (individual-level) on the treatment in the between part of the model. I believe that I have to build the model with a between-level categorical latent variable and then, in each class, regress the categorical latent variable (between-level) on the treatment. Is this correct?

I am looking forward to your advice
Thank you
Hao

Bengt O. Muthen posted on Monday, September 22, 2008 - 8:34 am

1. Yes, and also look for direct effects of the covariates onto the outcomes.

2. The simplest approach is a., and it is typically the one that captures most of the variability.

3. On the between level, the between-level treatment variable can influence the random intercept of the within-level latent class variable. That's how the two levels can be connected.
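
Point 3 can be sketched as a two-level growth mixture input in which the within-level class variable c has a random intercept that is regressed on the school-level treatment. This is only an outline under stated assumptions: the variable names (y1-y4, tx, school) are hypothetical, two classes are used for brevity, and estimation settings and variance restrictions that a real run may need are omitted.

```
VARIABLE: NAMES = y1-y4 tx school;   ! hypothetical names
          USEVARIABLES = y1-y4 tx;
          CLASSES = c(2);
          BETWEEN = tx;              ! school-level treatment
          CLUSTER = school;
ANALYSIS: TYPE = TWOLEVEL MIXTURE;
MODEL:
  %WITHIN%
  %OVERALL%
  iw sw | y1@0 y2@1 y3@2 y4@3;
  %BETWEEN%
  %OVERALL%
  ib sb | y1@0 y2@1 y3@2 y4@3;
  ! treatment shifts the random intercept of the
  ! within-level latent class variable
  c#1 ON tx;
```

This corresponds to option a. in the question (an individual-level latent class variable), with the between level entering only through the random intercept of c.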

Hao Duong posted on Monday, September 29, 2008 - 11:18 am

Dr. Muthen,
Thank you for the clarifications.
When I run the model with only two-groups, the LMR is still not significant. Can I just ignore LMR and look at the variability?
Thank you
Hao

Linda K. Muthen posted on Monday, September 29, 2008 - 2:58 pm

You should look at more than one test to decide on the number of classes. See the Nylund paper from the SEM journal which is posted on our website.

Jochebd G Gayles posted on Thursday, February 26, 2009 - 2:48 pm

Hello

I am trying to run a LCA model with covariates. I first ran the model without the covariates and am now adding them in. Some of my covariates are categorical and some are continuous. When I specify the variables as categorical, I get the fatal error message saying "reciprocal interaction problem". Below is my input:

VARIABLE: Names ARE ARCODE grade pub lang relat auton confr attach stressE gpa autobh somat negaff posaff sepos seneg anxmn acadmo;

USEVARIABLES ARE grade pub relat-stressE
somat-acadmo;

CLASSES = C(4);
CATEGORICAL ARE grade relat;
MISSING is all(99);
IDVAR = ARCODE;

ANALYSIS: TYPE = MIXTURE;
STARTS 50 10;
ESTIMATOR=ML;
ALGORITHM=INTEGRATION;

MODEL:
%OVERALL%
C#1-C#3 on grade relat;
C#1-C#3 on pub auton
confr attach stressE;

Do I need to add more model specifications to solve this problem? Or can I not have continuous and categorical covariates? Also I should mention that one of my categorical variables has two levels and one has 7 levels. Is that a problem for Mplus?

Thanks so much

Joche

Linda K. Muthen posted on Thursday, February 26, 2009 - 4:11 pm

The CATEGORICAL option is for dependent variables only. Do not put covariates on this list. Covariates are either binary or continuous and are both treated as continuous as in regular regression. If you have a nominal covariate, you need to create a set of dummy variables.
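
Applied to the input above, the advice amounts to dropping the covariates from the CATEGORICAL list and replacing a nominal covariate by dummy variables. The fragment below is a sketch under assumptions: it supposes, for illustration only, that relat is the seven-level variable and that six hypothetical 0/1 dummies rel2-rel7 were created in the data file beforehand (level 1 as the omitted reference). The dummies would also need to replace relat on the NAMES and USEVARIABLES lists.

```
! Fragment only: the other VARIABLE and ANALYSIS
! statements stay as in the original input.
          CLASSES = C(4);
          ! the CATEGORICAL statement listing grade and
          ! relat is removed; covariates do not belong there
MODEL:
  %OVERALL%
  ! rel2-rel7 are hypothetical dummies for relat
  C ON grade rel2-rel7 pub auton confr attach stressE;
```

A two-level covariate such as a 0/1 variable can enter as it is, since binary covariates are treated like continuous ones in the regression.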

Jochebd G Gayles posted on Thursday, February 26, 2009 - 6:16 pm

Thanks so much. That was very helpful!

Joche

Jon Heron posted on Wednesday, June 24, 2009 - 7:12 am

Hi,

recently, I was very interested to read

Clark, S. & Muthén, B. (2009). Relating latent class analysis results to variables not included in the analysis.

as this issue has been bugging me for some time, and judging by the age of this thread, I'm not the only one!

Since then (i.e. this morning) I read the following:

Yamaguchi, K. (2000). Multinomial logit latent-class regression models: An analysis of the predictors of gender-role attitudes among Japanese women. The American Journal of Sociology, 105(6), 1702-1740.

which describes a model that, in my mind, sits halfway between the 1- and 2-stage approaches. The paper presents a conditional LCA model in which there are no 2-way interactions between the X's and U's and no 3-way interactions between the X's, C's and U's. Perhaps this is only possible within a log-linear approach, but it seems to be exactly what I, and other researchers, are after: a model which reflects the latent nature of C without allowing X to affect the class-specific probabilities.

I have been attempting to implement this with the model shown in my next post.

Jon Heron posted on Wednesday, June 24, 2009 - 7:14 am

Analysis:
TYPE = mixture random;
proc = 2 (starts);
STARTS = 5000 250;
STITERATIONS = 20;
STSCALE=15;
ALGORITHM=INTEGRATION;

MODEL:
%OVERALL%

c on dda_91;
ddaclass | dda_91 xwith c;
wh_1 wh_2 wh_3 wh_4 wh_5 wh_6 wh_7 on ddaclass@0;

%c#1%
wh_1 wh_2 wh_3 wh_4 wh_5 wh_6 wh_7 on dda_91@0;

%c#2%
wh_1 wh_2 wh_3 wh_4 wh_5 wh_6 wh_7 on dda_91@0;

%c#3%
wh_1 wh_2 wh_3 wh_4 wh_5 wh_6 wh_7 on dda_91@0;

%c#4%
wh_1 wh_2 wh_3 wh_4 wh_5 wh_6 wh_7 on dda_91@0;

%c#5%
wh_1 wh_2 wh_3 wh_4 wh_5 wh_6 wh_7 on dda_91@0;

%c#6%
wh_1 wh_2 wh_3 wh_4 wh_5 wh_6 wh_7 on dda_91@0;

but this is not working - gives the same model as one without constraints. Please can you help?

many thanks, Jon

Bengt O. Muthen posted on Wednesday, June 24, 2009 - 9:11 am

XWITH cannot be used for an interaction with a categorical latent variable - your c. Interactions with c should instead be handled by letting a relationship between variables vary across the c classes.
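
The alternative described in this reply, letting a relationship vary across classes instead of using XWITH, can be sketched as follows. The names y and x are hypothetical stand-ins for an outcome and a covariate, and two classes are assumed to keep the example short.

```
MODEL:
  %OVERALL%
  c ON x;
  ! stated here only, the slope is equal across classes
  y ON x;
  %c#1%
  ! repeating the statement in a class-specific part
  ! frees the slope in that class
  y ON x;
  %c#2%
  y ON x;
```

The difference between the class-specific slopes then plays the role of the c-by-x interaction. Conversely, leaving the class-specific statements out keeps the effect of x on y the same in every class.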

I have to look at the Yamaguchi article to know what he is doing.

Jon Heron posted on Wednesday, June 24, 2009 - 9:17 am

Thanks Bengt,

I'd be interested to hear what you think

cheers, Jon

Bengt O. Muthen posted on Thursday, June 25, 2009 - 5:26 pm

The way I read the 2000 Yamaguchi article is that his eqn. (1) specifies in Mplus terms:

usev = r s t a b c;
categorical = r s t;
classes = y(2);

MODEL:
%overall%
y on a b c;

This would give the 2-way terms of (1) for YA, YB, YC, and for RY, SY, TY. So (1) is the same as a standard latent class regression model in Mplus. As such, the "x's" a, b, and c do influence the latent class formation. Which by the way, I think is natural that they do - exactly in line with the case of factor scores in a MIMIC model with continuous latent variables.

Jon Heron posted on Friday, June 26, 2009 - 3:18 am

Hi Bengt,

thanks for taking the time to read that paper

the bit I am stuck on is on p. 1712:

"The substantive meaning of latent classes is determined by the pattern of association of Y with the response variables. If we further allow this pattern of association to depend on covariates, the meaning of contrasts between being in one latent class versus another class changes with covariates, which is clearly undesirable in comparing regression coefficients across variables. Hence, even though a statistical improvement may be obtained by doing so, we should not introduce such three-factor interactions into the multinomial logit latent-class regression models; we make latent classes reflect the most statistically significant latent division of the entire population - even if such a division is not the most significant one among some subgroups of the population."

Is this a subtly different issue?

Cheers, Jon

Bengt O. Muthen posted on Friday, June 26, 2009 - 12:02 pm

I read that to mean that Yamaguchi does not want direct effects of the covariates on the response variables.

Jon Heron posted on Sunday, June 28, 2009 - 1:38 am

Thanks Bengt,

J

Karl Hallmackenreuther posted on Monday, October 26, 2009 - 5:43 am

Is anything known with respect to the performance of BLRT in conditional vs. unconditional growth mixture models? BLRT points to 4 classes (unconditional) or 2 classes (conditional). LMRT seems more consistent and suggests 2 classes regardless of conditional or unconditional.

Linda K. Muthen posted on Monday, October 26, 2009 - 9:16 am

I don't think there have been any thorough studies comparing BLRT and LMRT. I think their behaviors will vary based on the data and the model. No one statistic should be used to determine the number of classes. Several should be examined and substantive theory should definitely play in. See the following paper on the website:

Nylund, K.L., Asparouhov, T., & Muthen, B. (2007). Deciding on the number of classes in latent class analysis and growth mixture modeling. A Monte Carlo simulation study. Structural Equation Modeling, 14, 535-569.

Karen Nylund-Gibson posted on Tuesday, October 27, 2009 - 9:27 am

Katherine Masyn and I have done more studies on the performance of the LMR and BLRT for latent class analysis (LCA) models. In our studies, the BLRT consistently identifies the correct number of classes for unconditional models. However, for any type of conditional model there was not a consistent pattern of which worked better (BLRT or LMR), since both showed some weakness under different misspecifications (with respect to the covariates' paths of influence). One main finding from this study (which will be under review soon) is that you should do class enumeration on the unconditional model using BLRT and BIC and then systematically include covariates. One thing to pay close attention to is that a significant change in the nature of the class formation from the unconditional to conditional models (even for the same number of classes) may signal a misspecification in the conditional model.

Harald Gerber posted on Saturday, November 19, 2011 - 6:36 am

Hello,

I have a 2-class solution that almost does not change (with respect to growth shapes and membership proportions) when I add some theoretically important covariates to the model (effects on C and within-class effects on growth factors). However, with regard to a 3 class solution the additional 3rd class alters in its shape (from a stable trend to a declining trend) when I add covariates and proportions of class membership significantly change in comparison to the 3 class model without covariates. I tried direct effects of covariates on manifest indicators but the problem persisted. Additionally, the 3rd class (9% of the sample) is not distinguished by covariates from the next lower trajectory (effects on C).
In this case, is it plausible to argue that the 3-class model is unstable, not trustworthy, and does not add substantial information in comparison to the 2-class model? I would be especially interested in the "not trustworthy" or "unstable" argument.

Cheers, Harald

Bengt O. Muthen posted on Sunday, November 20, 2011 - 8:35 pm

I would think in terms of whether the 3rd class adds a substantively important class that is different from what you see with 2 classes. To help decide, I would also use BIC both with and without covariates.

I don't think it is a good argument against 3 classes to say that the 3rd class is not distinguished by covariate influence from another class. There is always a chance that some important covariate is left out and there is also a possibility that the 3rd class makes a difference for a distal outcome.

I do think that a model with direct effects included from covariates possibly is more reliable than a model without covariates. If those covariate effects are significant, leaving out covariates - that is, having class indicators be influenced only by the latent class - is a misspecified model because latent class isn't the only predictor.

Michael Green posted on Wednesday, September 26, 2012 - 6:56 am

Hi,

I've been doing an analysis where I've been trying to produce multiple imputations of plausible values for latent class membership as suggested in:

Asparouhov, T. & Muthén, B. (2010). Plausible values for latent variables using Mplus. Technical Report.

However, I found that when I run an imputation model using ESTIMATOR=BAYES I get substantially different proportions assigned to each class compared to the proportions estimated from the ML estimated latent class model, even without including additional indicators of class membership (which I do want to do). I found this was happening even if the response probability parameters for all the latent class indicators were constrained to identical values in both models.

This makes me worry whether or not I can trust the imputations of class membership.

Would this kind of effect be expected? If so, what causes it?

Or am I maybe just doing something wrong?

Note: the latent class indicators in my model are repeated measurements of 3 outcomes from a cohort study, so there is data missing due to drop out for some of the latent class indicators.

Tihomir Asparouhov posted on Thursday, September 27, 2012 - 8:49 am

I am not aware of any such discrepancy. Try generating 1000 plausible values after running a minimum of 10000 iterations. If that does not fix the problem then it will be something specific to your data. Theoretically the result should be the same with any data though so if you can't resolve the issue send all the information you can to support@statmodel.com

Tihomir Asparouhov posted on Wednesday, October 03, 2012 - 10:02 am

The ML run is using logit link and Bayes is using probit. Rerun the ML with probit link and then use the values in the Bayes run.
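
The same numeric threshold implies a different response probability under a logit link than under a probit link, so thresholds copied from a logit ML run into a Bayes run do not reproduce the ML model. A hedged sketch of the suggested two-run procedure follows; the indicator name u1 and the fixed value are made up for illustration, and the two runs are separate input files.

```
! Run 1 (ML): switch from the default logit link to probit
ANALYSIS: TYPE = MIXTURE;
          ESTIMATOR = ML;
          LINK = PROBIT;

! Run 2 (Bayes), separate input file: fix the thresholds
! at the probit estimates from Run 1. Value is made up.
! ANALYSIS: TYPE = MIXTURE;
!           ESTIMATOR = BAYES;
! MODEL:    %c#1%
!           [u1$1@0.25];
```

With both runs on the probit scale, the class proportions from the plausible values can be compared with the ML estimates on equal terms.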

emmanuel bofah posted on Monday, December 09, 2013 - 2:26 pm

Consider a typical example like EXAMPLE 7.12: LCA WITH BINARY LATENT CLASS INDICATORS USING AUTOMATIC STARTING VALUES WITH RANDOM STARTS WITH A COVARIATE AND A DIRECT EFFECT.
By default, what will be the reference class if there are three classes, i.e. c(3)?
I would like to confirm whether Mplus uses the third class as the reference class by default, and whether the reference class can be changed afterwards.

Linda K. Muthen posted on Monday, December 09, 2013 - 5:31 pm

The last class is the reference class.

David Buitenweg posted on Monday, October 13, 2014 - 2:09 am

Dear Dr. Muthen,

I have estimated a 4 class model which contains 11 latent indicators. Next, I want to run a number of analyses, leaving out one of the 11 indicators (resulting in 11 different models with 10 latent indicators).
How would you recommend comparing the 11-indicator model with the 10-indicator models? I have been including the 11th indicator as a covariate, allowing me to compute a Wald test through the MODEL TEST option, but I am not sure if that is the ideal method.
I have also computed the loglikelihood difference test according to the following article, http://www.statmodel.com/chidiff.shtml, resulting in chi-square difference values which seemed unrealistic.

Many thanks,

David

Bengt O. Muthen posted on Monday, October 13, 2014 - 8:29 am

Not sure what you want to know about your 11 models, that is, what you are comparing. (1) Perhaps which indicator is the least informative about the latent classes? Or, (2) which indicator contributes more to misfit?

You cannot work with logL or BIC because your 11 models have different DVs. For (1) you can use the new version 7.3 OUTPUT option Entropy in the 11-indicator run. For (2) you can use TECH10 (assuming categorical indicators).
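
Both suggestions are requests in the OUTPUT command of the single 11-indicator run. A minimal sketch, assuming hypothetical indicator names u1-u11 and categorical indicators; the ENTROPY option requires version 7.3 or later, as stated in the reply.

```
VARIABLE: NAMES = u1-u11;          ! hypothetical indicator names
          CATEGORICAL = u1-u11;
          CLASSES = c(4);
ANALYSIS: TYPE = MIXTURE;
! ENTROPY: variable-specific entropy, for question (1)
! TECH10:  fit information for categorical indicators,
!          for question (2)
OUTPUT:   ENTROPY TECH10;
```

This avoids comparing loglikelihoods or BIC across models with different sets of dependent variables.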

David Buitenweg posted on Tuesday, October 14, 2014 - 12:04 am

Thanks for the quick reply!

Anna Hawrot posted on Monday, November 24, 2014 - 1:27 am

Hi,
I've been reading recently about different procedures for dealing with changes in LCA solution after adding covariates/distal outcomes (DU3STEP, BCH). However, all these methods assume that covariates/distal outcomes are observed variables. Is it possible to benefit from these procedures in models in which covariates and distal outcomes are latent or both latent and observed?

Tihomir Asparouhov posted on Monday, November 24, 2014 - 9:07 am

Yes. Both 3-step and BCH can be used with an arbitrary secondary model. See

section 3 in http://statmodel.com/download/webnotes/webnote15.pdf
section 3 in http://statmodel.com/examples/webnotes/webnote21.pdf

For example, your secondary model can be a measurement model for a latent distal outcome.

David Buitenweg posted on Sunday, February 01, 2015 - 11:50 pm

Hi,
Mplus 7.3 enables the computation of variable-specific entropy Ej for latent indicator Uj. According to the note on "Variable specific entropy contribution, October 17, 2014", univariate entropies are directly comparable among each other. My question is whether there is an absolute threshold X, so that Ej < X means the indicator Uj provides too little information about the latent variable and should therefore be excluded?

Linda K. Muthen posted on Monday, February 02, 2015 - 9:10 am

We don't have enough experience with these to say.

David Buitenweg posted on Thursday, February 05, 2015 - 7:47 am

I see, thanks for the quick reply.

LS posted on Wednesday, July 24, 2019 - 9:50 am

Dear Drs. Muthen,

I have run a conditional 4-class GMM-CI with S@0 due to convergence problems. The initial entropy level for the unconditional model was .675.

Why does my entropy drop to .552 when I add covariates to my model? What does this mean? Is the model still valid?

Also, classification slightly varies.

Yours Sincerely,
LS

Bengt O. Muthen posted on Wednesday, July 24, 2019 - 5:38 pm

The drop in entropy can happen; it is not a problem.

When classification varies you may want to study Web Notes 15 and 21.

LS posted on Sunday, October 13, 2019 - 9:47 am

Dear Drs. Muthen,
Thank you for the previous answer.

I would like to follow up with this new question.

My final optimal unconditional model of health has 4 classes, which I labelled as stable, recovering, slightly improving and improving.

Due to missing data on X, when adding my predictor in the one-step approach procedure, the number of observations drops from 9076 to 5679.

It follows that classification rates change a lot. However, when I have a closer look at the mean intercepts, mean slopes and mean quadratic terms of each new class, they haven't changed that much compared to the unconditional model parameters.

For example, mean intercept of the improving class in the unconditional models was 2.514; now in the conditional model, is 2.334 (and similarly for slope and q term).

Can I still call the new classes as the old ones (stable, recovering, slightly improving and improving classes) based on the similar parameters?

Thank you so much for the clarification.
LS

Bengt O. Muthen posted on Monday, October 14, 2019 - 2:08 pm

The class percentages can change when adding an X variable due to there being direct effects from X to the latent class indicators that are not included in the model.

The answer to your final question is Yes.
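
A minimal sketch of one possible form of such direct effects in a one-step growth mixture model: X predicts class membership and also the growth factors, so that its influence on the outcomes is not forced to go through c alone. The data file name, the variable names (y1-y4, x) and the time scores are hypothetical, since the actual input was not posted; the 4 classes, the quadratic term and S@0 follow the description above.

```mplus
TITLE:    Sketch only: GMM with x predicting class and growth factors
DATA:     FILE = health.dat;        ! hypothetical file name
VARIABLE: NAMES = y1-y4 x;          ! hypothetical variable names
          USEVARIABLES = y1-y4 x;
          CLASSES = c (4);
ANALYSIS: TYPE = MIXTURE;
MODEL:
  %OVERALL%
  i s q | y1@0 y2@1 y3@2 y4@3;      ! assumed equally spaced waves
  s@0;                              ! slope (residual) variance fixed at zero
  s WITH i@0 q@0;                   ! covariances involving s fixed at zero
  c ON x;                           ! x predicts class membership
  i s q ON x;                       ! direct effects of x on the growth factors
```

Comparing the class percentages with and without the `i s q ON x` line is one way to check whether omitted direct effects are behind the shift.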

John Mallett posted on Saturday, October 19, 2019 - 3:16 am

I am new to LCA and so have a very basic question.

I want to run a 3-class LCA (20 binary indicators).

If I include covariates in the USEVARIABLES statement, then class membership is very different to that obtained if I exclude the covariates. The difference in the 3-class composition is 40%, 45% and 15% versus 49.5%, 47.1%, 3.4%.

When I include covariates in the USEVARIABLES statement, I have not included any subsequent statements such as c on x1 x2 x3 and have not specified the covariates using the r3step command.

I am assuming that the covariates are being used in some way to determine or refine class membership probabilities or is there another reason for the changes in class membership ?

Thanks for your consideration
John

Bengt O. Muthen posted on Saturday, October 19, 2019 - 2:47 pm

When you include a covariate X in the model, you should also include c ON X. Otherwise, the model is probably mis-specified. When including c ON X, the class formation may be very different from that obtained without X in the analysis, for reasons described in Web Notes 15 and 21.
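
A minimal sketch of the specification described here, assuming 20 binary indicators named u1-u20 and a data file lca.dat (both hypothetical names); x1-x3 are the covariates mentioned in the question.

```mplus
TITLE:    Sketch only: 3-class LCA with covariates predicting class
DATA:     FILE = lca.dat;             ! hypothetical file name
VARIABLE: NAMES = u1-u20 x1-x3;       ! u1-u20 are hypothetical names
          USEVARIABLES = u1-u20 x1-x3;
          CATEGORICAL = u1-u20;
          CLASSES = c (3);
ANALYSIS: TYPE = MIXTURE;
MODEL:
  %OVERALL%
  c ON x1 x2 x3;    ! multinomial logistic regression of class on covariates
```

Without the `c ON` line, the covariates on the USEVARIABLES list still enter the model, but not as predictors of class membership.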

John Mallett posted on Tuesday, October 22, 2019 - 5:18 am

Thanks for this Bengt

The model did not seem to throw up any warnings in this regard (when I did not explicitly specify the c ON X command).

1. I was thinking that it would be better to use R3STEP instead of c ON X when specifying covariates. Would this be OK?

I also want to add in some concurrent outcome scores for depression and anxiety, as I suspect that the classes will differ on these (continuous) outcomes.

My syntax/modeling question is...

2. Can I use MODEL CONSTRAINT command to look for class differences on these outcomes (depression & anxiety) alongside R3step command for covariates?
I want to set up class difference scores and test for significance.

Is this an appropriate approach?

Thanks again for your consideration.
John

Bengt O. Muthen posted on Tuesday, October 22, 2019 - 4:56 pm

1. Yes.

2. When you have both covariates and distals, you should use the manual approach of Section 3 of Web Note 15 (or Section 3.2 of Web Note 21).
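
For point 1, a minimal sketch of how the automatic three-step approach is requested for covariates only, assuming the same hypothetical indicator names (u1-u20) and data file as a plain LCA run. It does not cover the case in point 2: the manual approach for covariates plus distal outcomes needs the classification logits from a first-step run and is not shown here.

```mplus
TITLE:    Sketch only: 3-class LCA with R3STEP covariates
DATA:     FILE = lca.dat;             ! hypothetical file name
VARIABLE: NAMES = u1-u20 x1-x3;       ! u1-u20 are hypothetical names
          USEVARIABLES = u1-u20;      ! covariates are not on this list
          CATEGORICAL = u1-u20;
          CLASSES = c (3);
          AUXILIARY = x1 (R3STEP) x2 (R3STEP) x3 (R3STEP);
ANALYSIS: TYPE = MIXTURE;
```

Here the classes are formed from the indicators alone, and the covariates are brought in afterwards, so no `c ON` statement appears in a MODEL command.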

Daniel Christensen posted on Sunday, November 24, 2019 - 10:25 pm

Dear Drs Muthen

I'm trying to run a 4-class LCA with a distal binary outcome (O1) in Version 7.31.

Variable:
Names are ID I1-I13 O1-O3 C1-C8 ;
USEVARIABLES ARE ID I1-I13 ;
Categorical are I1-I13 ;
Idvariable is ID;
Auxiliary = O1 (DCAT) ;
Missing = all(-1234) ;
Classes = L (4);

Analysis:
Type = mixture ;

A series of questions:

1) I thought BCH was 'best' for latent class uncertainty. Does DCAT vs. BCH matter in this case?

2) At the moment the ORs for the association between my latent classes and the outcome are estimated relative to Class 4. I'd like to make Class 2 (the biggest group) my reference category. Is there a trick for this?

3) I've identified a bunch of potential confounds for the relationship (C1-C8). I wanted to look at the association between these and latent class membership, and also at whether the relationship between latent class and outcome is affected by covariate control. How do I add these without affecting latent class identification?
These confounds are all binary so I'd like to estimate odds ratios.

4) Can I take into account survey weights/clusters?

Thanks

Daniel

Bengt O. Muthen posted on Monday, November 25, 2019 - 11:43 am

See Table 6 of Web Note 21 for choices of approaches. BCH is for continuous distal outcomes while DCAT should be used for categorical distals.

Confounders can be used as X's in the "manual" approach shown in Appendix E of Web Note 15.

Weights and clusters can be handled by the manual approach.
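
A minimal sketch of what the first step of such a manual approach might look like with weights and clusters, adapted from the input posted above. WT and PSU are hypothetical names for the weight and cluster variables, and both file names are hypothetical; the later steps (which use the saved most likely class and the classification logits from this run) are described in the Web Notes and are not shown.

```mplus
Data:
  File is mydata.dat ;        ! hypothetical file name
Variable:
  Names are ID I1-I13 O1-O3 C1-C8 WT PSU ;   ! WT, PSU are hypothetical
  Usevariables are I1-I13 ;   ! class indicators only
  Categorical are I1-I13 ;
  Idvariable is ID ;
  Auxiliary = O1 C1-C8 ;      ! saved with the output data, not in the model
  Weight = WT ;
  Cluster = PSU ;
  Missing = all(-1234) ;
  Classes = L (4) ;
Analysis:
  Type = complex mixture ;    ! accounts for weights and clustering
Savedata:
  File is step1.dat ;         ! hypothetical file name
  Save = cprob ;              ! posterior probabilities, most likely class
```

Because O1 and C1-C8 are listed without an auxiliary setting, they do not influence class formation in this step.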

Daniel Christensen posted on Tuesday, December 10, 2019 - 5:41 pm

I am still working my way through this, but thanks so much for the prompt response.

Si Wen posted on Tuesday, June 16, 2020 - 8:08 am

Dear Drs. Muthen,

In one of my studies, I have two research questions. One is to identify subtypes of smokers who attend a smoking intervention. The other is to explore whether the smoker subtypes predict time to dropout from the intervention. To answer the first question, I plan to conduct an LCA; to answer the second, I plan to conduct a survival analysis. I have one question:

In my LCA, I also plan to include 5 covariates: age, gender, education, income, and marital status. Since the 5 covariates are assumed to influence the structure of the smoker subtypes, I plan to use the one-step approach for the LCA. Afterwards, I plan to conduct a survival analysis to investigate whether class membership predicts time to dropout from the intervention. Since the 5 covariates are also important predictors of intervention attrition, I plan to control for them again in the survival analysis to test the pure effect of class membership on attrition. But I am not sure whether this model is correct. Since the 5 covariates have already been used in the LCA to identify the smoker subtypes, should I control for them again in the survival analysis? If the answer is no, could you please let me know what problems arise from controlling for the 5 covariates again in the survival analysis? Thanks!

Si

Bengt O. Muthen posted on Tuesday, June 16, 2020 - 5:41 pm

Regarding the survival modeling, you can use survival mixture modeling as in the User's Guide examples 8.16 and 8.17.

Si Wen posted on Wednesday, June 17, 2020 - 7:30 am

Dear Drs. Muthen,

Thanks for your quick response and suggestions. The example is really helpful!

Si