Including covariates (Xs) in an LCA is likely to change the latent class distribution. If so, this implies that the covariates may have effects on the indicators (Us), which then influence class membership classification. My questions are:
1) Does this mean Xs should be specified to predict the latent class variable C, as well as the categorical indicators (i.e., Us)? If Xs are specified only to predict C, not Us, is there any model misspecification problem?
2) When I regressed one indicator, U1, on the Xs, no conditional probabilities were provided in the Mplus output. Is there any option to print conditional probabilities in the Mplus output in this situation?
3) I also tried to regress a categorical indicator U2 (a 3-level ordinal measure) on Xs in the model. I got one set of coefficients in each class. Are these cumulative logistic regression coefficients?
1. If a u ON x coefficient is significant, this means that if the x is not included in the model, the classes will be different.
2. Conditional probabilities are not computed when there are x's because they vary with the value of x.
3. If you get one set for each class, you must have mentioned this ON statement in the class-specific part of the MODEL. If I understand the term cumulative, these are not. They are simply the logistic regression coefficients for each class with no order implied.
J.W. posted on Monday, November 01, 2004 - 2:25 pm
Thank you very much for your prompt answers to my questions.
1) It is very likely that covariates, such as socio-demographics, would be related to the u indicators, which are often outcome measures. If we want to use socio-demographics to predict latent class membership, do we have to specify the relationship between all x’s and all u’s? That would be i) very tedious and difficult for model specification when there are a lot of u indicators and x covariates; ii) there would be no conditional probabilities reported, as you pointed out, because they vary with the values of covariates. iii) If we use socio-demographic covariates to predict class membership without relating them to the u indicators, would the LCA model be misspecified?
2) Sorry, I did not make my question 3 clear in my last message. Let me try again. Regressing a K level categorical u indicator (e.g., 3 categories) on x covariates, I was expecting K-1 (e.g., 2) sets of multinomial logit model coefficients for each class. However, I only got one set of coefficients when I regressed a 3 level categorical u variable on x covariates. I was wondering if the coefficients were from a proportional odds logit model (sometimes called “cumulative logistic regression”) since the u variable is an ordinal measure.
bmuthen posted on Monday, November 01, 2004 - 4:00 pm
1) A significant direct effect from a covariate to a u indicator shows that the u measurement is not invariant with respect to the groups of people represented by the values of the covariates. Such a measurement non-invariance check cannot be done by including all direct effects in addition to the effect of the covariates on the latent class variable because this model is not identified. One can investigate one u at a time, allowing the covariates to influence this u directly, not only via the latent class variable. (i) yes, measurement non-invariance investigations can be tedious, but can be important (seldom done unfortunately). (ii) correct. (iii) yes
2) Yes, you are right, these estimates are from a proportional odds model since polytomous u's are taken to be ordinal when the categorical = option in the Variable command is used. If you want to specify the u's as nominal, use the nominal = option.
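To see why a proportional odds model yields a single set of slopes, here is a minimal Python sketch for a 3-category ordinal u. It assumes the parameterization P(u <= j | x) = 1/(1 + exp(-(tau_j - b*x))), which is consistent with the binary case, so one slope b shifts both cumulative logits. The threshold and slope values are hypothetical, not output from any model in this thread.

```python
import math

def logistic(z):
    """Standard logistic CDF."""
    return 1.0 / (1.0 + math.exp(-z))

def ordinal_probs(thresholds, slope, x):
    """Category probabilities for an ordered u under a proportional odds model.

    Assumes P(u <= j | x) = logistic(tau_j - slope * x) with ascending thresholds,
    so a single slope is shared by all K-1 cumulative logits.
    """
    cumulative = [logistic(tau - slope * x) for tau in thresholds]
    cumulative.append(1.0)  # P(u <= highest category) = 1
    probs = [cumulative[0]]
    for j in range(1, len(cumulative)):
        probs.append(cumulative[j] - cumulative[j - 1])
    return probs

# Hypothetical class-specific estimates for a 3-category u2: two thresholds, one slope
taus = [-0.5, 1.2]
b = 0.8
for x in (0, 1):
    print(x, [round(p, 3) for p in ordinal_probs(taus, b, x)])
```

With a nominal specification, by contrast, each non-reference category would get its own slope.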
J.W. posted on Wednesday, November 03, 2004 - 11:30 am
When LCA is used to assess the pattern of outcome measures, such as diagnosed symptoms or risk behaviors, very often the LCA was conducted without covariates, or the relationships between class membership and covariates were assessed separately after class membership was estimated. This was inappropriate because the LCA model was misspecified. To my understanding, covariates that in theory influence latent class membership should be included in the LCA, and latent class membership and its relationships with the covariates should be estimated simultaneously. What bothers me is that covariates (e.g., socio-demographic characteristics) that influence class membership are also very likely to influence the u outcome indicators. Is there any covariate that influences class membership but not the u outcome indicators? The difficulties are: 1) If multiple or all of the u indicators are significantly related to the covariates, we can't regress all of these measurement non-invariant u indicators on the covariates while also using the covariates to predict class membership because, as you pointed out in your last message, the "model is not identified." 2) Even with just one measurement non-invariant u indicator, it would be difficult to define the latent classes because the conditional probabilities would not be available when covariates are used to predict that indicator. Now I find myself in a dilemma: excluding covariates, the LCA is misspecified; including covariates, I face the difficulties above. Is there any solution? Many thanks in advance.
bmuthen posted on Wednesday, November 03, 2004 - 11:46 am
The u indicators are in fact correlated with the covariates even when the covariates only point to the latent class variable and not directly to the u's. This is because the covariates then have an indirect effect on the u's. So this model has strong correlations between the covariates and the u's (even without direct effects).
2) Conditional probabilities are available even with a direct effect to a u. You can compute the conditional probability for each class and each value of the covariate (Mplus doesn't do it, but it can be done by hand using the estimates).
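A minimal sketch of the hand calculation, assuming the logit parameterization logit P(u = 1 | class, x) = -threshold + slope*x, where the threshold is the class-specific threshold of the indicator and the slope is the u ON x direct effect. The thresholds, slope, and covariate values are hypothetical.

```python
import math

def prob_u1(threshold, slope, x):
    """P(u = 1 | class, x) for a binary indicator with a direct effect of x.

    Assumes logit P(u = 1 | class, x) = -threshold + slope * x,
    using the threshold estimated for the class in question.
    """
    return 1.0 / (1.0 + math.exp(threshold - slope * x))

# Hypothetical estimates: class-specific thresholds for u1 and one slope for u1 ON x
thresholds = {"class 1": -1.0, "class 2": 1.5}
slope = 0.7
for label, tau in thresholds.items():
    for x in (0, 1):
        print(label, "x =", x, round(prob_u1(tau, slope, x), 3))
```

Each class then has one conditional probability per covariate value, rather than a single number.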
Hope this answers your questions.
J.W. posted on Thursday, November 04, 2004 - 11:10 am
I'm doing an LCA using two demographic covariates to predict latent class membership for 8 indicator variables. I'd like to compare models with various numbers of latent classes using AIC, BIC, etc. This seems pretty straightforward for 2 or more latent classes.
But I also want to compare a model with a single latent class, and I can't figure out how to model a single latent class with covariates. If I include my covariates in the USEVARIABLES line without specifying an ON statement for the relation between the covariates and the latent class, does Mplus know that these variables are covariates?
Does it even make sense to include covariates in a single class model? Mplus calculates parameters for them, but I'm not sure it's doing what I think it's doing.
With one class, there is no latent class membership to predict. Everyone is in one class.
bmuthen posted on Monday, April 25, 2005 - 3:38 pm
If you only include the covariates in the USEV list and not in the model, the covariates are treated as variables that are uncorrelated among themselves and with the other observed variables (see the warning you got) - this is not what you want; don't include the covariates in the USEV list when you only have 1 class (unless you want them to influence the outcomes).
anonymous posted on Friday, January 13, 2006 - 12:05 pm
When conducting LCA with covariates, can a covariate have more than 2 categories, so that I then substitute different values for x (e.g., 0, 1, 2) when calculating the probabilities using the logistic regression coefficients? Or do I need to create 3 dummy variables to represent the 3 categories of this covariate?
Covariates can be continuous or a set of dummy variables. You would create two dummy variables to represent three categories.
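A minimal sketch of the two-dummy coding, using the 0, 1, 2 codes from the question with 0 as the (arbitrarily chosen) reference category. Entering the 0/1/2 codes as a single covariate would instead force equal spacing, i.e., the same change in the logit from 0 to 1 as from 1 to 2.

```python
def dummy_code(values, categories, reference):
    """Create k - 1 indicator (0/1) columns for a k-category covariate.

    Each non-reference category gets its own column; cases in the reference
    category score 0 on every column.
    """
    return {
        cat: [1 if v == cat else 0 for v in values]
        for cat in categories
        if cat != reference
    }

# Hypothetical 3-category covariate coded 0, 1, 2, with 0 as the reference category
x = [0, 1, 2, 0, 2]
dummies = dummy_code(x, categories=[0, 1, 2], reference=0)
print(dummies)  # {1: [0, 1, 0, 0, 0], 2: [0, 0, 1, 0, 1]}
```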
anonymous posted on Wednesday, January 18, 2006 - 7:34 am
Hi, can the odds ratios be interpreted for covariates that are not binary, i.e., that have 3 or 4 nominal categories? Also, if I have 2 or 3 covariates, is it possible to look at the odds ratios for each covariate in turn for each class? For example, when including sex and religion, do I say that the odds of being in class 1 are higher for females than for males, and the odds of being in class 1 are higher for Catholics than for Protestants, etc.?
If you have nominal covariates, you need to create a set of dummy variables. Covariates can be continuous or binary, as in regular regression.
You would want to add to "for males" the words "holding other covariates constant".
anonymous posted on Wednesday, January 18, 2006 - 10:36 am
Thanks. I included a set of dummy variables, but got the following message:
ONE OR MORE MULTINOMIAL LOGIT PARAMETERS WERE FIXED TO AVOID SINGULARITY OF THE INFORMATION MATRIX. THE SINGULARITY IS MOST LIKELY BECAUSE THE MODEL IS NOT IDENTIFIED, OR BECAUSE OF EMPTY CELLS IN THE JOINT DISTRIBUTION OF THE CATEGORICAL LATENT VARIABLES AND ANY INDEPENDENT VARIABLES. THE FOLLOWING PARAMETERS WERE FIXED: 119 123 127 131 135 139
I'm not sure how to rectify this. I'm assuming it has something to do with the inclusion of the dummy variables, so I may have done something wrong. For a nominal variable with 3 categories, I created 3 dummy variables to represent membership (0) or non-membership (1) for each of the 3 categories respectively.
For a nominal variable with three categories, you would create two dummy variables just like in regular regression.
anonymous posted on Wednesday, January 18, 2006 - 12:08 pm
Sorry, just to clarify once more: given that I will have one reference category against which I will compare the other 2 categories (the dummy variables), does this mean that I will not be able to calculate the probabilities of class membership for the reference category? Or do I simply regard the slope as 0 for the reference category, so that the logit equals the intercept?
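For the reference category all dummies are 0, so each class logit reduces to its intercept and the class probabilities remain well defined. A minimal sketch, assuming a multinomial logistic regression for c ON x with the last class as the reference class (its intercept and slopes fixed at 0); the 3-class setup and all coefficient values are hypothetical.

```python
import math

def class_probs(intercepts, slopes, x):
    """Class membership probabilities from a multinomial logit (c ON x).

    intercepts: logit intercepts for classes 1..K-1
    slopes:     one list of dummy-variable slopes per class 1..K-1
    x:          dummy values for one covariate pattern
    The last class is the reference class, with intercept and slopes fixed at 0.
    """
    logits = [a + sum(b * xi for b, xi in zip(bs, x)) for a, bs in zip(intercepts, slopes)]
    logits.append(0.0)  # reference class
    denom = sum(math.exp(z) for z in logits)
    return [math.exp(z) / denom for z in logits]

# Hypothetical 3-class model; dummies (d2, d3) for a 3-category covariate, category 1 = reference
intercepts = [0.4, -0.2]
slopes = [[0.9, -0.3], [0.1, 0.6]]
patterns = {"category 1 (reference)": [0, 0], "category 2": [1, 0], "category 3": [0, 1]}
for label, x in patterns.items():
    print(label, [round(p, 3) for p in class_probs(intercepts, slopes, x)])
```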
Think of p u indicators and q x covariates. What identifies relationships between u's and x's can be thought of as logistic regression relationships for u on x. There are pxq such slopes. Even with a binary latent class variable c, we already use up q slopes for c ON x, so there isn't enough information left for pxq more slopes.
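The counting argument can be written out, assuming K latent classes (the post takes the smallest case, K = 2): the c ON x regressions use (K - 1)q slopes, so adding all pq direct effects asks for more u-x slopes than the pq that, under this heuristic, the data can support.

```latex
\underbrace{(K-1)\,q}_{c\ \text{ON}\ x} \;+\; \underbrace{p\,q}_{\text{all } u\ \text{ON}\ x}
\;\ge\; q + p\,q \;>\; p\,q \qquad \text{for } K \ge 2,\ q \ge 1
```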
For my LCA model with 2 dummy variables to reflect a 3-category covariate RACE, I get the warning: "ONE OR MORE MULTINOMIAL LOGIT PARAMETERS WERE FIXED TO AVOID SINGULARITY OF THE INFORMATION MATRIX. SINGULARITY IS MOST LIKELY BECAUSE THE MODEL IS NOT IDENTIFIED, OR BECAUSE OF EMPTY CELLS IN THE JOINT DISTRIBUTION OF CATEGORICAL LATENT VARIABLES AND ANY INDEPENDENT VARIABLES. FOLLOWING PARAMETERS WERE FIXED: 90 93"
Syntax is:
USEVARIABLES are ..... BLK HISP;
Classes = c(6);
Analysis: Type = mixture;
MODEL: %OVERALL%
c ON BLK HISP;
Fixed parameters seem to be BLK in C#4 and HISP in C#5:
GAMMA(C)
         BLK    HISP
C#4       90      91
C#5       92      93
I was wondering if anyone on the discussion forum could help. I am running the following 4 Class LCA model, with a direct effect (Age).......
Variable:
Names are scrser area gor6 nrf2005 education age swimming snooker darts football fishing newoutdo wintersp water tennis badmin squash cycling fitness cricket tabten golf horserid yoga tenpin jog bats rackets rackets2;
Missing are all (-9999);
USEVARIABLES swimming snooker football newoutdo wintersp water cycling fitness golf tenpin jog cricket rackets2 age;
CATEGORICAL swimming snooker football newoutdo wintersp water cycling fitness golf tenpin jog cricket rackets2;
CLASSES = C (4);
ANALYSIS: TYPE = MIXTURE;
MODEL: %OVERALL%
swimming - rackets2 on newage1;
........... My question is: the age variable is categorical (5 categories), so how do I let Mplus know that this variable is categorical? I can't put it in the CATEGORICAL list because that would place it in the latent class model.
I am doing a two-level mixture model with binary indicators, and I would like to know how to get the conditional probabilities for each indicator. For example, in the regular mixture model, TECH1 produces conditional probabilities and thresholds; however, in the two-level model there are only thresholds, no conditional probabilities for the indicators. Is there a way to use the thresholds to compute conditional probabilities in the two-level model?
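The threshold-to-probability conversion itself is simple. A minimal sketch, assuming the logit parameterization P(u = 1) = 1/(1 + exp(tau)) for a binary indicator with no covariates. In a two-level model this is an assumption worth checking: it gives the probability for a cluster whose between-level random intercept is 0 (its mean), not the probability averaged over clusters, which would require integrating over the random effect. The threshold values are hypothetical.

```python
import math

def threshold_to_prob(tau):
    """P(u = 1) implied by a logit threshold tau: 1 / (1 + exp(tau))."""
    return 1.0 / (1.0 + math.exp(tau))

# Hypothetical within-level thresholds for three binary indicators in one class
class1 = {"u1": -1.2, "u2": 0.0, "u3": 2.0}
print({u: round(threshold_to_prob(t), 3) for u, t in class1.items()})
```

A negative threshold corresponds to a probability above 0.5, which is why the signs of the thresholds already convey the class profiles.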
Hello, I have a follow-up to my previous question. If I am unable to compute conditional probabilities by hand and I cannot estimate them in the two-level model, would you suggest estimating them in a single level model? I am using complex data with clustering at 2 levels and the single level mixture model only allows one cluster variable. How concerned should I be if I am unable to estimate the conditional probabilities with both clustering variables in the model?
I would not change my model. I would instead interpret the sign and significance of the latent class indicators and look at the profiles. You will not learn anything more from the probabilities than from the parameter estimates.
nina chien posted on Wednesday, May 07, 2008 - 11:14 am
When I include covariates, all cases with missing data on just ONE of the covariates are dropped from the analysis.
I am using the TYPE = MISSING command.
"Data set contains cases with missing on x-variables. These cases were not included in the analysis. Number of cases with missing on x-variables: 418"
Is this supposed to happen? Thanks very much for your help.
nina chien posted on Wednesday, May 07, 2008 - 3:59 pm
I am doing an LPA with covariates. There is one covariate - poverty status - that I know influences profile membership.
But I also want to use poverty status, later down the line, as a predictor to test for interaction effects with profile membership (profile x poverty status) on some outcome variables.
My options are: 1) Include poverty status as a covariate in the LPA. Then use it later again as a predictor variable when testing the interaction of profile x poverty status. 2) Do not include poverty status as a covariate in the LPA. Use it later as a predictor variable when testing the interaction. 3) Include poverty status as a covariate in the LPA. I cannot use it as a predictor variable later (i.e., I must drop my research question having to do with the interaction entirely).
Which is the correct one (I really hope not 3)? Or another option entirely? Thanks very much for your help.
A model is estimated conditioned on the covariates. As a result no distributional assumptions are made about them. If you don't want the observations with missing values on the covariates deleted from the analysis, you must bring these variables into the model and thereby make distributional assumptions about them. You can do this by mentioning their variances in the MODEL command.
I would include the covariate in the LPA and also when the distal outcome is added. You would regress both the categorical latent variable and the distal outcome on poverty. By allowing the regression to vary across classes, you would capture the interaction you are interested in.
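A minimal sketch of why class-varying slopes capture the interaction, assuming a binary poverty indicator (0/1) and a continuous distal outcome; the class labels and estimates are hypothetical. Within each class the poverty effect is that class's slope, so the class-by-poverty interaction is the difference between the class-specific slopes.

```python
# Hypothetical class-specific estimates for distal ON poverty (allowed to vary across classes)
estimates = {
    "class 1": {"intercept": 10.0, "slope": -1.5},
    "class 2": {"intercept": 12.0, "slope": -4.0},
}

def predicted_mean(cls, poverty):
    """Model-implied mean of the distal outcome in a class at a poverty value (0/1 assumed)."""
    est = estimates[cls]
    return est["intercept"] + est["slope"] * poverty

# The class x poverty interaction is the difference in class-specific poverty effects
poverty_effect = {cls: predicted_mean(cls, 1) - predicted_mean(cls, 0) for cls in estimates}
interaction = poverty_effect["class 2"] - poverty_effect["class 1"]
print(poverty_effect, interaction)
```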
Kaigang Li posted on Tuesday, May 27, 2008 - 8:35 pm
Based on your answer to Nina Chien's question on May 07, could you please clarify how to mention the variances in the MODEL command? Should I compute the variances using another stats package and fix them in the MODEL command using @?
I performed a cross-sectional LCA using early adult factors as the indicators. The optimal number of classes was 4. I now want to see how these 4 classes vary on a specific set of adolescent factors. One such variable is high-school GPA.
It seems that there are two ways to examine such class differences. One is to include "C on gpa" in the %Overall% model statement. A second way is to include "Auxiliary = gpa(e)" in the Variable section of the code. Each approach appears to provide results that are consistent with the other. The relative risks (change in prob of class membership for an increase of 1 in gpa) calculated from the “C on gpa” approach are consistent with the gpa means provided for each class from the “Auxiliary” approach.
My questions are: (1) Do these two approaches provide equivalent results (though in different metrics)? Am I comparing “apples to apples”? (2) If they are not equivalent, in the "C on gpa" method does heterogeneity in gpa contribute to/influence actual class structure/assignment?
Thanks for the quick response. So, say my LCA has 5 "u" variables or indicators. Would a model with the 5 "u" variables in the class specific model statements (i.e., %C#1%, %C#2%, etc.) and with "C on gpa" yield the same class structure and probabilities as a model with the 5 "U" variables AND GPA in the class specific model statements? (With the obvious difference between the two models being that the former also provides changes in membership probability given changes in levels of GPA).
Here are the two versions in syntax form...
%Overall%
c on GPA;
%C#1%
[U1$1*1.435];
[U2$1*.045];
[U3$2*.317];
[U4$1*.664];
[U5$1*1.765];
I posted a couple of days ago (on another thread) about changing reference classes in a 3-class LCA model so that my desired ORs would be reported in the output. Your recommendation was to use the ending values as starting values for the classes in a new analysis. This works very well, thank you.
The challenge I am having now is when I add my three covariates to the model the classes return to the order of when I am not using the ending values as starting values. It is like the covariates nullify the command to switch the order.
Is it possible to reorder the classes with the ending values and to simultaneously add covariates? If so, is there additional syntax I am missing?
The starting values from the model without covariates may not be correct for the model with covariates. There may be direct effects needed between the covariates and the latent class indicators that make the classes change when covariates are added. See the following paper which is available on the website for a discussion of this:
Muthén, B. (2004). Latent variable analysis: Growth mixture modeling and related techniques for longitudinal data. In D. Kaplan (ed.), Handbook of quantitative methodology for the social sciences (pp. 345-368). Newbury Park, CA: Sage Publications.
Dear all, I'm doing an LCA (which works well), but now I want to include two covariates (age and sex).
This is my syntax: ...
VARIABLE: NAMES ARE idnr it1 it2 it3 it4 it5 it6 it7 it8 age sex;
USEVARIABLES ARE it1 it2 it3 it4 it5 it6 it7 it8 age sex;
CLASSES = c (2);
CATEGORICAL = it1 it2 it3 it4 it5 it6 it7 it8 sex;
ANALYSIS: TYPE = MIXTURE;
ALGORITHM = INTEGRATION;
MODEL: %OVERALL%
c ON age sex;
OUTPUT: SAMPSTAT STANDARDIZED MODINDICES;
SAVEDATA: FILE IS prob2class_710mergeAGESEXok.dat;
SAVE IS CPROB;
- Is "c ON age sex;" the right way to use age and sex as covariates? I don't think so, because my fit indices change when I leave "sex" out of the CATEGORICAL list. And I can't put "age" in the CATEGORICAL list because it is not a categorical variable. So my question is: how can I control for age and/or sex?
To follow up on my previous question: is the following correct, given that age and sex are still seen as dependent continuous variables?
Number of dependent variables 10
Number of independent variables 0
Number of continuous latent variables 0
Number of categorical latent variables 1
Dear all, when I control for age and sex (c ON age sex;), I put age and sex in USEVARIABLES. My question is: when I control for only one variable, do I have to put the other one in USEVARIABLES as well? E.g., when I control only for sex (which is of course in USEVARIABLES), do I also have to put age in USEVARIABLES?
I am running an LCA using covariates by regressing class membership on these variables. I would also like to examine class differences on an outcome (i.e., using the auxiliary function), but I also wanted to control for covariates at this step. Is this possible to do? I'm assuming that including the covariates in the creation of the classes is not the same as controlling for them when examining how those classes relate to an outcome. I haven't seen any examples of this and wanted to know if you can help.
The AUXILIARY (e) option is used for screening purposes, not for model estimation. You should include the distal outcome on the USEVARIABLES list. To control for the covariates, regress the distal outcome on the covariates. The effect of the latent classes on the distal outcome is seen in how the intercepts of the distal outcome vary across classes.
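A minimal sketch of that interpretation, assuming a continuous distal outcome regressed on two covariates (age and female, hypothetical names) with slopes held equal across classes and intercepts free across classes; all values are hypothetical. Because the slopes are the same in every class, the covariate-adjusted difference between two classes equals the difference in their intercepts, whatever the covariate values.

```python
# Hypothetical estimates: distal ON age female, with slopes held equal across classes
# and the distal intercept free to vary across classes.
intercepts = {"class 1": 2.0, "class 2": 3.5, "class 3": 1.0}
slopes = {"age": 0.05, "female": -0.4}

def predicted_distal(cls, covariates):
    """Model-implied distal mean for a class at given covariate values."""
    return intercepts[cls] + sum(slopes[name] * value for name, value in covariates.items())

# The class 2 vs. class 1 difference is the same at any covariate values
for person in ({"age": 40, "female": 1}, {"age": 25, "female": 0}):
    diff = predicted_distal("class 2", person) - predicted_distal("class 1", person)
    print(person, round(diff, 6))
```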
Thank you for your help! Would you by any chance know of a good LPA/LCA reference that used a distal outcome and covariates in this manner? I can't seem to find any on your website. I am mostly interested in seeing how the results should be presented in general and interpreted.
Thanks! I have another question I was hoping you could help me with. Is there a way to specify my models so that when I run various analyses (i.e., with different covariates, outcomes, etc.) the classes are extracted in the same order, so that I can more easily make comparisons between different models?
Hi, I am a bit confused. In this thread Linda said that the "auxiliary" function is to be used for exploratory purposes only. However, I understood from Asparouhov & Bengt Muthen (2012) and the related PPT presentation that "auxiliary" is meant to be used for substantive purposes. Could someone please clarify?
I mean: can I simply use the most likely class as a nominal DV, or should I weight it by its probability relative to the probabilities of the other classes? In that case, is there a convenient way to do it in Mplus (other than the R3STEP option)?
You should wait for R3STEP or do it by hand according to the instructions given in the handout where R2STEP is described.
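For doing the 3-step approach by hand, the key quantities are the classification probabilities of the most likely class and their logits. A minimal sketch following our reading of the formulas in Asparouhov & Muthén's 3-step paper (Web Note 15): with N the most likely (modal) class and C the latent class, p[c1][c2] = P(N = c2 | C = c1), and the logits log(p[c1][c2] / p[c1][K]) are the values that would be fixed for the most-likely-class variable in the third step. Check these formulas against the handout before relying on them; the posterior probabilities are hypothetical.

```python
import math

def classification_probs(posteriors):
    """p[c1][c2] = P(modal class N = c2 | latent class C = c1), estimated from posteriors.

    posteriors: one row per person, row[k] = P(C = k | indicators).
    p_{c1,c2} = (1 / N_{c1}) * sum over persons with N_i = c2 of P(C_i = c1 | U_i),
    where N_{c1} is the sum of P(C_i = c1 | U_i) over all persons.
    """
    K = len(posteriors[0])
    modal = [max(range(K), key=lambda k: row[k]) for row in posteriors]
    sizes = [sum(row[c1] for row in posteriors) for c1 in range(K)]
    p = [[0.0] * K for _ in range(K)]
    for row, n_i in zip(posteriors, modal):
        for c1 in range(K):
            p[c1][n_i] += row[c1] / sizes[c1]
    return p

def step3_logits(p):
    """Logits log(p[c1][c2] / p[c1][K-1]), using the last modal class as reference."""
    K = len(p)
    return [[math.log(p[c1][c2] / p[c1][K - 1]) for c2 in range(K)] for c1 in range(K)]

# Hypothetical posterior probabilities for five people in a 2-class model
post = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.1, 0.9], [0.6, 0.4]]
p = classification_probs(post)
print(p)
print(step3_logits(p))
```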
anonymous posted on Tuesday, March 12, 2013 - 9:10 am
I'm running a LPA with three covariates. The covariates appear to not significantly differ across classes, but when I enter the covariates in the model using the 1-step method and look at a plot of the estimated probability as a function of covariate #1, it looks like this differs based on gender such that covariate #1 appears to differ across classes for females but not for males. Is there a way to empirically demonstrate whether two covariates interact in predicting the classes? Would it be appropriate to add an interaction term as a covariate or to replace gender and covariate #1 with the interaction term? Or would it be more appropriate to run this as a multi-group LPA, with males and females in separate groups?
Just create a product term between gender and covariate #1 and include all 3 variables in the prediction of c.
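A minimal sketch of the product-term approach, assuming gender is coded 0/1 and covariate #1 is continuous; the coefficients are hypothetical. With all three terms in c ON, the effect of covariate #1 on a class logit is b_cov1 for gender = 0 and b_cov1 + b_product for gender = 1, so the test of the product coefficient is the empirical test of the interaction.

```python
# Hypothetical c ON gender cov1 product estimates for one class logit (versus the reference class)
b_gender, b_cov1, b_product = 0.2, 0.05, 0.40

def make_product(gender, cov1):
    """Product term gender * cov1, entered as a third covariate (gender coded 0/1)."""
    return [g * c for g, c in zip(gender, cov1)]

def cov1_slope(gender):
    """Effect of covariate #1 on the class logit at a given gender value."""
    return b_cov1 + b_product * gender

print(make_product([0, 1, 1], [2.0, 3.0, -1.0]))  # [0.0, 3.0, -1.0]
print(cov1_slope(0), cov1_slope(1))
```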
ian jantz posted on Thursday, May 09, 2013 - 7:32 am
Hi, I selected a 4-class model based on 10 indicator variables. When I introduced covariates, it seemed as if a substantial number of individuals were reclassified. As such, the class prevalences in the model without covariates differed from those in the model with covariates. Is there a resource that provides some guidance for conducting measurement non-invariance investigations of indicator variables? The first posts in this thread (October 25 through November 4, 2004 from J.W. and professors Muthen) were very helpful. But I have some basic questions about diagnosing measurement non-invariance, what counts as substantial reclassification, the steps to isolate the indicator variables and covariates responsible for the non-invariance, and some potential solutions to the issue. Any guidance is much appreciated.
Have you read the following paper which is available on the website:
Muthén, B. (2004). Latent variable analysis: Growth mixture modeling and related techniques for longitudinal data. In D. Kaplan (ed.), Handbook of quantitative methodology for the social sciences (pp. 345-368). Newbury Park, CA: Sage Publications.
ian jantz posted on Friday, May 10, 2013 - 1:50 pm
Hi Dr. Muthen, I read Muthen (2004) and it was very useful. I guess one basic question I have is what counts as substantial reclassification. When I introduce one combination of covariates, class prevalence changes very little, say, 2 or 3%. However, when I introduce all covariates of interest, the prevalence of one class changes by about 12%. Thanks so much.
This is really a substantive decision. Besides looking at the percent changes in the classes, you should look at the individual changes in posterior probabilities.
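A minimal sketch of that individual-level comparison, assuming you have saved the posterior probabilities (e.g., with SAVE IS CPROB) from the models without and with covariates and have already matched the classes so that class k means the same thing in both runs; the probabilities shown are hypothetical.

```python
def compare_posteriors(post_without, post_with):
    """Per-person comparison of posterior class probabilities from two models.

    Assumes both models' classes have already been matched to the same order.
    Returns (share of people whose most likely class changed,
             per-person largest absolute change in any class probability).
    """
    def modal(row):
        return max(range(len(row)), key=lambda k: row[k])

    changed = sum(modal(a) != modal(b) for a, b in zip(post_without, post_with))
    max_shift = [max(abs(x - y) for x, y in zip(a, b)) for a, b in zip(post_without, post_with)]
    return changed / len(post_without), max_shift

# Hypothetical posteriors for three people in a 2-class model
without = [[0.9, 0.1], [0.55, 0.45], [0.2, 0.8]]
with_cov = [[0.85, 0.15], [0.40, 0.60], [0.2, 0.8]]
share, shifts = compare_posteriors(without, with_cov)
print(share, [round(s, 3) for s in shifts])
```

A small change in overall prevalence can hide larger shifts for particular individuals, which is what the per-person values reveal.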
anonymous posted on Sunday, February 23, 2014 - 8:33 pm
In response to the following user question "ONE OR MORE MULTINOMIAL LOGIT PARAMETERS WERE FIXED TO AVOID SINGULARITY OF THE INFORMATION MATRIX. SINGULARITY IS MOST LIKELY BECAUSE THE MODEL IS NOT IDENTIFIED, OR BECAUSE OF EMPTY CELLS IN THE JOINT DISTRIBUTION OF CATEGORICAL LATENT VARIABLES AND ANY INDEPENDENT VARIABLES. FOLLOWING PARAMETERS WERE FIXED: 90 93 ....
Am I simply getting this warning to alert me that there are no BLK in Class 4 nor BLK/HISP in Class 5? If so,is this a warning I can ignore?"
Bengt responded "yes and yes" on December 14, 2007 (see above).
I have a follow-up question. I am in a similar situation as that user, I have one class with no males (gender was entered as a covariate); however, this makes sense based on the characteristics of the class so I haven't been concerned. Is there a way to still calculate and report inferential statistics for gender, though? Thanks!
I don't think I have received a response to my question yet?
1/ What is the difference between active covariates and inactive covariates? 2/ Is it possible to use active covariates, then estimate posterior membership and assess ORs for the associations between the classes (the dependent variables) and the covariates (X variables), i.e., the same covariates that are active and contribute to defining the classes? Or is that an error, and is it better to use the posterior probabilities from the model without active covariates?
3/ When is it better to use active or inactive covariates?
I have seen active and inactive covariates used in Latent GOLD (LGold). When covariates are inactive, they do not affect the sizes or estimates of the classes because they are not included in the model; however, when they are active, the estimates for the indicators may differ. So I do not know whether the covariates should be used as active or inactive... I have this problem because I think my covariates are associated with the indicators, which could explain why the definition of the classes differs...
Excuse my bad English. Best regards, Emilie
Jon Heron posted on Wednesday, February 26, 2014 - 9:22 am
I've always felt that LG's inactive-covariates approach was the same as Mplus' Pseudo-Class draws (auxiliary with "(r)").
You may not get perfect agreement across the packages though because LG does something to deal with covariate missingness.
Adding to Jon's comments, it sounds to me like the inactive covariates would be best handled by the Mplus 3-step method done by the Auxiliary R3STEP option. But if you suspect direct effects from some covariates to some latent class indicators you would want to take a regular (1-step) approach where all covariates are "active".
anonymous posted on Friday, February 28, 2014 - 12:01 pm
As a follow-up to my February 23rd question above, if a parameter is fixed in say a four class model but was not in the three class model, can Likelihood Ratio Tests such as the Lo-Mendell-Rubin Test (tech 11) still be interpreted?
Thanks for your previous answer. I think the covariates are associated with the indicators and have direct effects on the estimates as well as on the number of classes. However, can I use the classification obtained after including the covariates, rather than posterior membership as in the three-step approach, to assess associations (ORs) between the classes and other variables such as mortality, hospitalisation, or other characteristics?
Or is that an error, so that we have to use only the classification obtained without the covariates?
You can use a model where covariates have both an effect on the latent class variable and some (but not all) of the latent class indicators. If you have strong direct effects, 3-step methods are not suitable - we describe this in our Oct 28 3-step paper on our website.
I have a question about running a regression mixture model, as presented in Example 7.1. In the example, although the correlation between X1 and X2 is included in the figure, an "X1 with X2" statement is not included in the MODEL command. When I included the statement, I didn't get results because the program told me that I had to add ALGORITHM=INTEGRATION;. After adding it to the ANALYSIS command, I could run the model, but the results were different from what I got using the original statement (as shown in the example). 1. Which one should be used if I want to know the relationship between X1 and X2? 2. If I need to add "X1 with X2", does the example's statement not correspond to the presented figure? Thank you so much for your help.
In regression, the model is estimated conditioned on the observed exogenous variables. Their means, variances, and covariances are not model parameters. We show the covariance between x1 and x2 because it is not zero during model estimation. If you want to know its value, ask for SAMPSTAT or TYPE=BASIC.
I have a problem with an LCA of health profiles in patients with a specific disease, with concomitant variables.
1/ In the model without covariates, BIC and BLRT both indicate the 3-class solution. 2/ Including the concomitant variables (age, sex, tumor site), BIC favors 3 classes but BLRT favors 4 classes (which appears more realistic).
Moreover, the class prevalences differ with and without covariates.
I have noted that the concomitant variables are strongly associated with the indicators (all p<0.001).
So what is better, and how should I deal with this problem?
- keep the 4-class solution with covariates?
- keep the 3-class solution with covariates?
- keep the 3-class solution without covariates?
- or change the covariates into indicators (age, sex, and tumor site as indicators)?
Moreover, if I want to carry out the 3-step method to assess associations between the classes and other variables not included in the model, such as healthcare utilization, how should I do it? Should I use the classification obtained with covariates or without covariates?
Thanks very much for the answers you could provide to these questions...
It is not necessarily a problem that the concomitant variables (I call them covariates) are strongly related to the latent class indicators - if the covariates influence the latent class variable, that implies that the two sets of variables are related.
The easy way out to handle covariates is to use the new 3-step approach of R3STEP described in our Web Note 15 (or, DCAT/DCON for distal outcomes). But if you really want to understand the class prevalence differences you mention you want to explore direct effects from covariates to latent class indicators.
Thank you, Prof. Muthen, but I did not understand the last sentence.
If I want to assess ORs between the classes and other variables (different from the covariates, e.g., institutionalisation), should I use the classification obtained with covariates (i.e., age, sex) or the classification obtained without covariates?
How do I establish whether there is a relationship between a covariate and the latent class variable? - With the p-value for the effect of the covariate on the latent class variable? - Because adding the covariates changes the estimates? (But at what point can we assert that?)
Asparouhov & Muthén (2013). Auxiliary Variables in Mixture Modeling: 3-Step Approaches Using Mplus. Accepted for publication in Structural Equation Modeling. An earlier version of this paper is posted as web note 15. Appendices with Mplus scripts are also available.
I have read this article, but (because of my difficulties with English) it is not clear to me how to assess whether the covariates have strong effects or no effects on the latent classes, or how to assess the relationship between the latent classes and the covariates (p-value? something else?). This is important because I have read that, in the case of strong effects, the one-step approach is better than the 3-step approach, as you also said in a previous answer. Furthermore, it matters for my research because without covariates BLRT and AIC3 both indicate 3 classes, whereas with active covariates BLRT and AIC3 both indicate 4 classes... so the interpretation is not the same. (BIC always indicates 3 classes, but Nylund showed that BLRT performed best.)
Thank you very much. Best regards, Emilie (from France)
To check for direct effects from covariates to latent class indicators, you include in your model a regression of one indicator at a time on all the covariates and check for significant effects. You then include all such significant effects when you assess BIC for that number of classes.
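A minimal sketch of the BIC comparison described above, in Python. BIC = -2 logL + p ln(n), where p is the number of free parameters and n is the sample size; smaller is better. The loglikelihoods, parameter counts, and sample size are hypothetical placeholders, not values from any real output; in practice they would be read off the Mplus output for the same number of classes, with and without the retained direct effects.

```python
import math

def bic(loglik: float, n_params: int, n_obs: int) -> float:
    """Bayesian information criterion: -2 * logL + p * ln(n); smaller is better."""
    return -2.0 * loglik + n_params * math.log(n_obs)

# Hypothetical values for one number of classes, with and without direct effects
n = 500
bic_without = bic(loglik=-3050.0, n_params=20, n_obs=n)
bic_with = bic(loglik=-3035.0, n_params=23, n_obs=n)  # 3 extra direct-effect parameters

print(f"BIC without direct effects: {bic_without:.2f}")
print(f"BIC with direct effects:    {bic_with:.2f}")
print("lower BIC:", "with direct effects" if bic_with < bic_without else "without direct effects")
```

The extra direct-effect parameters are penalized by ln(n) each, so a better loglikelihood alone does not guarantee a lower BIC.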
I'm conducting an LCA with 4 indicator variables, but I want to include a binary exogenous variable that has a direct effect on only one indicator variable (with 3 categories). I have tried coding it as the indicator (U) regressed on the exogenous variable (X):
U on X
but the output reports an error saying that a nominal variable may not appear on the right-hand side of an ON statement. (Currently, my binary exogenous/independent variable is declared as a nominal variable.)
My questions are: 1) Why can a nominal variable not appear on the right-hand side of an ON statement? 2) How else can I code the exogenous variable to have an effect on my indicator?
Covariates must be binary or continuous. You need to create a set of dummy variables for your nominal variable, and it should not be put on the NOMINAL list; that list is for dependent variables. In regression, covariates are treated as continuous variables.
I have an additional follow-up question. I've read that CATEGORICAL includes binary and polytomous indicators, but the Mplus User's Guide says that CATEGORICAL includes binary and ordered categorical indicators and NOMINAL includes polytomous indicators. So I'm confused about how polytomous indicators are coded and how binary and polytomous indicators can be coded together to conduct an LCA.
With that said, I'm conducting an LCA with 4 indicators: 2 binary (X1 and X2) and 2 categorical indicators with 3 levels each (X3 and X4). What would the appropriate code be for this?
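A minimal sketch of the dummy coding described above, assuming the data are prepared in Python before being exported for Mplus (the function name `dummy_code` and the `size` values are hypothetical). A covariate with K categories becomes K-1 binary columns, with the omitted reference category coded 0 on all of them; the dummies then go on the right-hand side of ON and are not listed on NOMINAL.

```python
def dummy_code(values, reference):
    """Create K-1 binary (0/1) columns for a nominal covariate.

    The reference category gets 0 on every dummy column.
    """
    categories = sorted(set(values) - {reference})
    return {f"d_{cat}": [1 if v == cat else 0 for v in values] for cat in categories}

size = ["small", "medium", "large", "medium", "small"]  # hypothetical covariate
dummies = dummy_code(size, reference="small")
for name, column in dummies.items():
    print(name, column)
```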
CATEGORICAL is for binary and ordered-categorical or ordered polytomous variables. NOMINAL is for binary or unordered-polytomous variables.
Daniel Lee posted on Friday, April 24, 2015 - 8:46 pm
Hi Dr. Muthen,
I obtained odds ratios in my conditional growth mixture model (looking at between-class effects). However, I could not find significance tests (p-values, confidence intervals) for these odds ratios. Is there a command I should add to the input to obtain these significance tests?
If they are not printed, you have to compute these. See the 2 FAQs on our website:
Odds ratio confidence interval from logOR estimate and SE
Odds ratio interpretation with a nominal DV in multinomial logistic regression
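A minimal sketch of the idea behind the first FAQ, assuming you have the logit (log odds ratio) estimate and its standard error from MODEL RESULTS: build a Wald interval on the log scale and exponentiate its endpoints. The numbers below are hypothetical, and 1.959964 is the standard normal quantile for a 95% interval.

```python
import math

def odds_ratio_ci(log_or: float, se: float, z: float = 1.959964):
    """Odds ratio and Wald confidence interval computed on the log-odds scale."""
    return (math.exp(log_or),
            math.exp(log_or - z * se),
            math.exp(log_or + z * se))

# Hypothetical logit estimate and standard error
ratio, lower, upper = odds_ratio_ci(log_or=0.693, se=0.25)
print(f"OR = {ratio:.3f}, 95% CI [{lower:.3f}, {upper:.3f}]")
```

Because the interval is symmetric on the log scale, it is not symmetric around the odds ratio itself; the odds ratio is the geometric mean of the two endpoints.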
Brian C posted on Friday, January 29, 2016 - 2:27 pm
Hi Drs. Muthen,
In published papers I have read on LCA, the researchers usually identify the best-fitting number of classes at the model-building stage, starting with 1 class and adding classes using only the indicators (without covariates), and comparing the BIC, etc. Then the class prevalences and posterior probabilities are presented and interpreted.
After the best class solution is identified (e.g., 3-class) I would see an analysis of association between the classes and covariates (e.g., logistic/multinomial logistic regression), which I understand can be done via the auxiliary option or in a single-step approach.
But what is unclear to me is that when the researchers present association analysis, they don’t address the fact that the 3 classes are not necessarily the same 3 classes identified in the model-building stage (when no covariates were considered). Can I assume that that is because they used the auxiliary option (but which one, R?) such that the analysis of association between classes and covariates does NOT change the unconditional 3-class model?
Conversely, if the single-step approach is used, does that mean the covariates need to be included at the model-building stage (i.e., from 1 class up)? I haven't seen this done: all the LCA papers I have seen simply start with the indicators, and only after the best solution is identified do they venture into covariates.
That's a long story. Part of it is described in the paper on our website:
Asparouhov, T. & Muthén, B. (2014). Auxiliary variables in mixture modeling: Three-step approaches using Mplus. Structural Equation Modeling: A Multidisciplinary Journal, 21:3, 329-341. The posted version corrects several typos in the published version. An earlier version of this paper was posted as web note 15. Download appendices with Mplus scripts.
You are right that a 3-step method (such as using R3STEP) can overlook matters that change the class formation when including covariates in the model (one-step). This includes cases where there are some direct effects of covariates on some of the LCA indicators - which is a case of measurement noninvariance that would seem to be quite common. Unfortunately, this is often overlooked.
Brian C posted on Sunday, January 31, 2016 - 12:08 pm
I am running a series of mixture models in several steps to build a GMM of social skills over 5 time points, following these recommended steps:
a) testing an unconditional single-class model;
b) specifying an LCGA model and comparing 2-, 3-, and 4-class solutions (comparing BIC, entropy, theoretical considerations, etc.);
c) testing a conditional LCGM with the best-fitting model from step b), adding our hypothesized covariates;
d) specifying a conditional growth mixture model (GMM), entering the covariates;
e) testing whether the covariates have different effects on the growth factors within each class; and
f) finally, testing whether the parameter estimates are replicated using OPTSEED.
Q1: How do I determine whether the model in e) is better or worse than the one in d)? Do I look at significance of the class-specific estimates of the covariates? Or the BIC, entropy etc. ? or all of these combined?
In the modification indices, I get (among many other modification suggestions):
Hi again Drs. Muthen, to follow up on my previous questions:
I have found that a 3-class solution works best according to the various indices. In the output under MODEL RESULTS, I get, for Class 1, that I ON x is -0.96 (p<0.01) and S ON x is 0.08 (p = 0.6).
Q3: Does this mean that for Class 1, the greater the starting-out score, the lower the score on x? And that the greater the score on x, the more upward the slope goes (almost significant)?
Q4: When I look at the plots for my 3 classes, they have clearly different trajectories: one starts high and goes down, one starts in the middle and stays stable, and one starts low and goes up. When I look at the intercepts of the slope factors for the 3 classes, none of them reach significance. For Class 2 this makes sense, but not for the other two. I think the 'problem' is that all the SEs are large.
Q5: Does that indicate that the individuals in each of these classes are too dissimilar in their slopes? And how can I remedy this situation?
I am using an LCA model with 4 nominal indicators and covariates (country and gender). In the output, I want to see whether country or gender significantly predicts latent class membership.
So I looked at the LOGISTIC REGRESSION ODDS RATIO RESULTS:
Categorical Latent Variables
  C#1 ON
    CNT      0.976
    GENDER   0.295
  C#2 ON
    CNT      1.172
    GENDER   0.316
But there seems to be no significance test showing whether the covariates are significant. Should I look at another part of the output?
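A minimal sketch of a Wald test for one of these effects, assuming the corresponding logit coefficient and its standard error are available (the odds ratios above are exponentiated logits, so the logit for GENDER in C#1 is ln(0.295)). The standard error below is a hypothetical placeholder, since none is shown in the post. The test is carried out on the logit scale, where the null value is 0 (an odds ratio of 1).

```python
import math

def wald_test(estimate: float, se: float):
    """z statistic and two-sided p-value for H0: coefficient = 0."""
    z = estimate / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p

log_or_gender = math.log(0.295)  # logit for GENDER in C#1, from the odds ratio above
se_gender = 0.40                 # hypothetical standard error
z, p = wald_test(log_or_gender, se_gender)
print(f"logit = {log_or_gender:.3f}, z = {z:.2f}, p = {p:.4f}")
```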
If I have two covariates in my LCA model, one categorical and the other continuous, do I need to standardize the continuous covariate if I want to calculate the class probabilities by hand or in MODEL CONSTRAINT (e.g., logit = intercept + b1*covariate1 + b2*covariate2; probability = exp(logit)/sum of exp(logit) over classes)?
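A minimal sketch of the multinomial-logit calculation written in the question, assuming the usual Mplus convention that the last class is the reference class, with its intercept and slopes fixed at 0. All parameter and covariate values are hypothetical. The covariate values are plugged in on the same scale that was used when the model was estimated, because the coefficients are per unit of that scale.

```python
import math

def class_probabilities(intercepts, slopes, covariates):
    """Multinomial-logit class probabilities.

    intercepts[k] and slopes[k][j] are the logit parameters for class k;
    the last (reference) class is appended with a logit of 0.
    """
    logits = [a + sum(b * x for b, x in zip(bs, covariates))
              for a, bs in zip(intercepts, slopes)]
    logits.append(0.0)  # reference (last) class
    shift = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(v - shift) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical 3-class model with a binary covariate (gender) and a continuous one (age)
probs = class_probabilities(
    intercepts=[0.5, -0.2],
    slopes=[[-1.2, 0.03], [0.4, -0.01]],
    covariates=[1, 40.0],  # gender = 1, age = 40, on the scale used in estimation
)
print([round(q, 3) for q in probs])
```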
I'd like to investigate whether a covariate predicts latent class prevalence by gender, but not latent class structure.
I included gender as a grouping variable and fixed item response probabilities across gender. Next, I:
1) compared a model where class prevalences were freely estimated across gender to a model where they were fixed across gender. Both models had the covariate predicting class (c on covariate). Does this model comparison accurately test whether the covariate is a better predictor when class prevalences are free vs. fixed across gender?
2) I compared the best-fitting model from step 1 to a model without the covariate to see whether the covariate was a significant predictor of class membership.
Do these comparisons answer my question of whether my covariate predicts class membership by gender?
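A minimal sketch of a loglikelihood difference test for two nested models like those in step 1, assuming both were estimated with MLR, so that each model's loglikelihood comes with a scaling correction factor in the output. It follows the scaled difference formula for MLR loglikelihoods described in the Mplus website's notes on chi-square difference testing, as I understand it; the inputs are hypothetical. It applies only when one model is obtained from the other by holding parameters equal or fixing them.

```python
def scaled_loglik_difference(l0, c0, p0, l1, c1, p1):
    """Scaled likelihood-ratio difference test for nested models estimated with MLR.

    Model 0 is the more restricted (nested) model with p0 free parameters,
    loglikelihood l0 and scaling correction factor c0; model 1 is the less
    restricted model. Returns (test statistic, degrees of freedom).
    """
    if p1 <= p0:
        raise ValueError("model 0 must be nested in model 1 (p0 < p1)")
    cd = (p0 * c0 - p1 * c1) / (p0 - p1)  # difference-test scaling correction
    trd = -2.0 * (l0 - l1) / cd
    return trd, p1 - p0

# Hypothetical output: prevalences held equal (model 0) vs free (model 1) across gender
trd, df = scaled_loglik_difference(l0=-2500.0, c0=1.10, p0=15,
                                   l1=-2490.0, c1=1.12, p1=17)
print(f"scaled difference = {trd:.3f} on {df} df")
```

The statistic is referred to a chi-square distribution with df degrees of freedom; if SciPy is available, `scipy.stats.chi2.sf(trd, df)` gives the p-value.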
Second post: I am not quite sure what you are doing. You say "fixed" but I wonder if you really fix or just hold equal.
The class prevalence is different across gender if you regress your latent class variable (c) on gender (cg). The item probabilities are different across gender if you let them vary across cg classes.
Jin Qu posted on Wednesday, August 10, 2016 - 11:52 am
Dr. Muthen, I am conducting an LPA and have obtained 4 classes. I have now included an interaction term (AxB, where A and B are both continuous variables) as a predictor of class membership. The results show that this interaction term significantly predicts the likelihood of being in class 2 compared with class 1. How can I probe this interaction further? Can I do it in Mplus?
From my understanding, the example in 3.18 describes a model in which m mediates the link between x and y, and z moderates the link between x and m (y ON m; m ON x z xz).
The model in this Addendum is: y on m xz z; m on z xz x;
Is this still the same model as 3.18?
Also, am I understanding correctly that the analysis I want to pursue is a multinomial regression, so that I should save the class membership variable (1, 2, 3, 4) from the LPA and use this new variable as my dependent variable, rather than running the LPA and probing the interaction at the same time?
Your model is not the same as 3.18 but the ideas for exploring interactions are the same. You can probe interactions in a single analysis.
Jin Qu posted on Friday, August 12, 2016 - 4:10 pm
Thanks for your reply. I am able to obtain the plot for the interaction. However, when I added the BOOTSTRAP option to see whether the slopes in the interaction graph are significant, I received an error message saying "BOOTSTRAP is not available for estimators MLM, MLMV, MLF and MLR." Would you mind taking a look at my code? (c is the variable that I obtained from the LPA using CPROBABILITIES; c has 4 classes. I want to use mRS, mSC, and mRSxmSC to predict c.)
VARIABLE: NOMINAL IS c;
DEFINE: CENTER mRS_6 mSC_6 (GRANDMEAN);
        mSCpRS = mRS_6*mSC_6;
MODEL:
    c#1 ON mRS_6 (b1);
    c#1 ON mSC_6 (b2);
    c#1 ON mSCpRS (b3);
    c#2-c#3 ON mRS_6 mSC_6 mSCpRS;
MODEL CONSTRAINT:
    PLOT(lowSC highSC);
    LOOP(mRS_6, -2, 2, 0.5);
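A minimal sketch of what probing this interaction involves, written in Python rather than Mplus syntax. With the labels b1, b2, b3 in the code above, the log-odds of category 1 of c versus the last (reference) category are a1 + b1*mRS + b2*mSC + b3*mRS*mSC, so the simple slope of mRS at a given mSC value is b1 + b3*mSC, with variance Var(b1) + mSC^2 Var(b3) + 2 mSC Cov(b1, b3). The intercept a1, all estimates, (co)variances, and the low/high mSC values (+/- 1 SD here) are hypothetical; the post does not show how lowSC and highSC are defined in MODEL CONSTRAINT, so this is only one plausible setup. The sampling variances and covariance would come from the parameter covariance matrix (e.g., TECH3).

```python
import math

def log_odds_class1(a1, b1, b2, b3, rs, sc):
    """Log-odds of c = 1 vs the reference category, for centered rs, sc and their product."""
    return a1 + b1 * rs + b2 * sc + b3 * rs * sc

def simple_slope(b1, b3, sc):
    """Change in the class-1 log-odds per 1-unit change in rs, holding sc fixed."""
    return b1 + b3 * sc

def simple_slope_se(var_b1, var_b3, cov_b1_b3, sc):
    """Standard error of b1 + b3*sc from the parameter covariance matrix."""
    return math.sqrt(var_b1 + sc ** 2 * var_b3 + 2.0 * sc * cov_b1_b3)

# Hypothetical estimates and sampling (co)variances
a1, b1, b2, b3 = -0.3, 0.4, 0.2, 0.5
var_b1, var_b3, cov_b1_b3 = 0.010, 0.020, -0.002
sd_sc = 0.8  # hypothetical SD of the centered mSC_6

grid = [x / 2 for x in range(-4, 5)]  # rs from -2 to 2 in steps of 0.5, as in the LOOP
for label, sc in (("lowSC", -sd_sc), ("highSC", sd_sc)):
    slope = simple_slope(b1, b3, sc)
    se = simple_slope_se(var_b1, var_b3, cov_b1_b3, sc)
    curve = [round(log_odds_class1(a1, b1, b2, b3, rs, sc), 3) for rs in grid]
    print(f"{label}: slope = {slope:.3f}, SE = {se:.3f}, z = {slope / se:.2f}")
    print("  log-odds over rs grid:", curve)
```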
1. What is the difference between regressing a latent class variable c on an independent variable x1 (as in example 7.1 in the UG) and treating x1 as a latent class predictor (using AUXILIARY and R3STEP)?
2. What is the difference between treating a binary variable g (indicating sex, for instance) as a categorical latent variable with known class (group) membership, using KNOWNCLASS (as in example 7.21 in the UG), and treating it as a latent class predictor (using AUXILIARY and R3STEP)?
3. Must COUNT variables have only positive integer values?
4. What criteria or cutoff should one use to decide whether a COUNT variable should be treated as zero-inflated?
5. What criteria should one use to decide whether a variable should be treated as truncated? (e.g., should a percentage be treated as truncated?)
6. Can the differences in parameter estimates (means, probabilities) across classes be tested to see whether they are statistically significant?
Mplus Discussion is not really the place to learn about basic LCA, but I will give some quick answers:
1. No difference if you have only 1 x. But if you have more x's, the difference is that c ON x does not assume that the x's are uncorrelated, whereas if the x's are used as indicators, LCA assumes that they are uncorrelated within class.
2. None, unless the predictor also has a direct effect on the LCA indicators.
3. Positive or zero.
5. Usually by having a strong floor or ceiling effect, say 25% or more.
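A minimal sketch of the floor/ceiling check in answer 5, assuming the scores are in a plain Python list and treating the observed minimum and maximum as the floor and ceiling (if the scale has known bounds, compare against those instead). The 25% threshold is the rule of thumb given above; the scores are hypothetical.

```python
def floor_ceiling_share(values):
    """Proportion of observations at the observed minimum (floor) and maximum (ceiling)."""
    low, high = min(values), max(values)
    n = len(values)
    return sum(v == low for v in values) / n, sum(v == high for v in values) / n

scores = [0, 0, 0, 1, 2, 0, 3, 5, 0, 4, 0, 2]  # hypothetical item scores
floor_share, ceiling_share = floor_ceiling_share(scores)
print(f"floor: {floor_share:.0%}, ceiling: {ceiling_share:.0%}")
if floor_share >= 0.25 or ceiling_share >= 0.25:
    print("strong floor/ceiling effect: consider a censored specification")
```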
Thank you, Bengt. I have rerun my LCA, recasting 4 variables that had a floor effect of 25% or more as censored from below. This led to lower BIC, AIC, and aBIC values. I also tried declaring all 4 censored variables as inflated. For the 2-class solution this also produces lower indices, but for the 3-class solution they are almost identical. In the 3-class solution, the inflation means of 2 of the variables were set at -15, one is not significant, and one is significant (the pattern is the same in all 3 classes). Can I conclude from this whether it is better to treat all 4 (or some) of these censored variables as inflated in the 3-class solution?
I have an ordinal variable (a measure of organization size) which I wish to include as a control variable in my LCA, as one-way ANOVAs have shown that there are significant differences in most of the indicators across the levels of this variable. 1) Is the correct approach to simply include it as another indicator, or should it be declared as auxiliary? 2) When including it as an indicator, I have tried both the CATEGORICAL and NOMINAL types (in case the effect of size is not monotonic). I planned to use BIC to decide which type to use, but both produced the same indices (BIC, AIC, etc.). Is this always the case? Or does this mean that in this particular case the results are unaffected by which type I use for this variable?
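A small sketch for reading the -15 values, assuming they are the logit intercepts of the inflation part of the censored-inflated variables (this looks like the large negative boundary value at which such logits are typically fixed): converting a logit to a probability shows that -15 corresponds to an inflation probability of essentially zero, i.e., no inflation in that class. The function name `inverse_logit` is just for illustration.

```python
import math

def inverse_logit(x: float) -> float:
    """Convert a logit to a probability without overflowing for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    ez = math.exp(x)
    return ez / (1.0 + ez)

for logit in (-15.0, -2.0, 0.0, 2.0):
    print(f"logit {logit:6.1f} -> probability {inverse_logit(logit):.3g}")
```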
Thanks and happy Thanksgiving!
Guiyun Hou posted on Friday, November 25, 2016 - 3:19 am
Dear Bengt and Linda, I have some problems analysing an LTA. I have 3 time points and 20 items per time point; all of the items are continuous. When I run Mplus, it does not output the BIC and gives the warnings below. Following the warning, I set STARTS to 500 50, but it still does not work. When I reduce the number of items to 16, the AIC and BIC values are printed. Looking forward to your reply.
WARNING: THE BEST LOGLIKELIHOOD VALUE WAS NOT REPLICATED. THE SOLUTION MAY NOT BE TRUSTWORTHY DUE TO LOCAL MAXIMA. INCREASE THE NUMBER OF RANDOM STARTS. STARTING VALUES. PROBLEM INVOLVING PARAMETER 77.
1) I see a control variable as a covariate that influences the latent class variable (and perhaps some indicators directly).
2) CATEGORICAL and NOMINAL typically produce a different number of parameters (unless the outcome is binary), in which case the BICs would not be the same. I can't say what's going on in your case without seeing the full output.
Guiyun Hou posted on Friday, November 25, 2016 - 7:13 pm
Dear Bengt and Linda, my output is as follows. Thanks for your help.
WARNING: WHEN ESTIMATING A MODEL WITH MORE THAN TWO CLASSES, IT MAY BE NECESSARY TO INCREASE THE NUMBER OF RANDOM STARTS USING THE STARTS OPTION TO AVOID LOCAL MAXIMA.
WARNING: THE BEST LOGLIKELIHOOD VALUE WAS NOT REPLICATED. THE SOLUTION MAY NOT BE TRUSTWORTHY DUE TO LOCAL MAXIMA. INCREASE THE NUMBER OF RANDOM STARTS.
THE MODEL ESTIMATION DID NOT TERMINATE NORMALLY DUE TO AN ILL-CONDITIONED FISHER INFORMATION MATRIX. CHANGE YOUR MODEL AND/OR STARTING VALUES.
THE MODEL ESTIMATION DID NOT TERMINATE NORMALLY DUE TO A NON-POSITIVE DEFINITE FISHER INFORMATION MATRIX. THIS MAY BE DUE TO THE STARTING VALUES BUT MAY ALSO BE AN INDICATION OF MODEL NONIDENTIFICATION. THE CONDITION NUMBER IS 0.260D-20.
THE STANDARD ERRORS OF THE MODEL PARAMETER ESTIMATES COULD NOT BE COMPUTED. THIS IS OFTEN DUE TO THE STARTING VALUES BUT MAY ALSO BE AN INDICATION OF MODEL NONIDENTIFICATION. CHANGE YOUR MODEL AND/OR STARTING VALUES. PROBLEM INVOLVING PARAMETER 77.
Thanks for your response. I believe I now understand how to proceed, but I want to double check I have understood correctly based on your responses in this forum and various articles on LCA.
1. Because I suspect the covariates have a direct effect on some indicators, I should use single-step regression (one-step method).
2. However, for class enumeration I should first only use the indicators. Once I have determined the number of classes, I should include covariates via single-step regression. Not only is this common practice, it is the best approach according to Nylund-Gibson & Masyn's (2016) recent MC simulation study.
3. Covariates must be binary or continuous. Whether I want to treat the categorical covariate 'size' as ordinal or nominal, I should create K-1 dummies for it.
4. Because I suspect a direct effect between my covariates and indicators, I should include in my model "a regression of one indicator at a time on all the covariates and check for significant effects" (you wrote this above). The final model should only retain significant effects.
I am trying to implement some of the models shown in Morin & Marsh (2015) with my data. According to the syntax provided with the article, when all indicators are regressed on a covariate, one must explicitly establish conditional independence by constraining all covariances between indicators to 0. I tried to do that but got an error because some of my indicators are censored-inflated: *** ERROR One or more MODEL statements were ignored. These statements may be incorrect.
How can I set the covariances between censored-inflated indicators and other indicators to 0?
Dear Bengt, this is a mixture model (TYPE=MIXTURE), also known as latent profile analysis (LPA), with continuous and censored variables (I set the censored ones as censored-inflated). In Morin & Marsh (2015), four models are shown. In all 4, a direct effect is added from a covariate (control) to all indicators, but not from that covariate to the latent class variable (it is not meant to be a latent class predictor). I'm trying to create the first model with my data. The syntax provided (in the supplement) indicates that because of the direct paths from the covariate to all indicators, one must explicitly establish conditional independence by constraining all covariances between indicators to 0. (By the way, is there a shortcut for doing this, since I have 20 indicators?)
When I tried, I got the error I mentioned for every covariance between a continuous indicator and a censored-inflated variable that I tried to set to 0: *** ERROR One or more MODEL statements were ignored. These statements may be incorrect.
I am wondering what to do. How can I tell Mplus I want it to assume conditional independence?
Morin, A. J. S., & Marsh, H. W. (2015). Disentangling shape from level effects in person-centered analyses: An illustration based on university teachers’ multidimensional profiles of effectiveness. Structural Equation Modeling, 22(1), 39–59.
Please first check if you get "WITH" estimates that show that you violate conditional independence. I don't see why that would occur. If you need to, send output to Support.
WEN Congcong posted on Wednesday, January 11, 2017 - 12:24 am
Hello, I want to illustrate local independence for latent profile models, and I don't know whether there is something wrong with my illustration.
The variance of an observed variable has three possible sources: the variance explained by the factor, the factor variance, or the measurement error. For observed variables to covary, only the factor variance and the measurement error need to be considered, because factors can explain the outcomes but not the reverse. In both traditional FA and LPA, the residuals have zero covariances, as this is a model assumption. In LPA, the model has no factors and can therefore be regarded as having zero factor variance. Hence, with zero factor variance and zero residual covariances, the observed variables cannot covary. If this illustration is correct, I think the residual covariance includes only measurement error. But in the paper investigating population heterogeneity with factor mixture models, the residual variance includes both the factor variance and the measurement error.
A set of variables is correlated if they are all influenced by the same latent class variable. The latent variable does not need to be continuous.
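A minimal sketch of the answer above, using the law of total covariance: Cov(Y1, Y2) = E[Cov(Y1, Y2 | C)] + Cov(E[Y1 | C], E[Y2 | C]). Under local independence the first term is 0 in every class, but the second term is not 0 whenever the class means differ, so the indicators are correlated in the population even though they are uncorrelated within each class. The class proportions and means below are hypothetical.

```python
def marginal_covariance(class_probs, means_y1, means_y2):
    """Covariance of two indicators in a latent profile model with zero within-class covariance.

    Cov(Y1, Y2) = sum_k pi_k * (mu_k1 - m1) * (mu_k2 - m2), where m_j is the overall mean.
    """
    m1 = sum(p * mu for p, mu in zip(class_probs, means_y1))
    m2 = sum(p * mu for p, mu in zip(class_probs, means_y2))
    return sum(p * (a - m1) * (b - m2)
               for p, a, b in zip(class_probs, means_y1, means_y2))

# Hypothetical 2-class profile: class 1 is low on both indicators, class 2 is high on both
cov_diff = marginal_covariance([0.5, 0.5], [0.0, 2.0], [1.0, 3.0])
# Same proportions, but Y1 has the same mean in both classes
cov_same = marginal_covariance([0.5, 0.5], [1.0, 1.0], [0.0, 5.0])
print("class means differ:", cov_diff)
print("Y1 means equal:    ", cov_same)
```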
WEN Congcong posted on Wednesday, January 11, 2017 - 6:36 pm
But in the paper "Performance of Factor Mixture Models as a Function of Model Size, Covariate Effects, and Class-Specific Parameters", it says: "The latent profile model can be represented as a special case of the factor mixture model where residual factor scores have 0 variance. As a result, the covariance matrix of observed variables Y conditional on class membership equals the covariance matrix of the residuals. Because residuals are assumed to have zero covariances (discussed earlier), the conditional covariance matrix of observed variables is diagonal, and observed variables are independent given class."
According to my understanding, because the conditional covariance matrix of the observed variables is diagonal, it only includes the variables' variances (the diagonal values), and the covariances between observed variables are 0 (the off-diagonal values). The correlation coefficient is calculated from the covariance, so the variables are uncorrelated.
I probably misunderstand the sentences, thanks for correcting me!
Covariates are typically not given a distribution form. Just like in regular regression, they are conditioned on, that is, their marginal distribution is not specified. They are treated as continuous variables in the estimation.
Thank you for your continued support and guidance!
I have completed the enumeration process with an unconditional model through LPA (3 indicators/predictors, no covariates) and arrived at a 3 class model in accordance with Nylund-Gibson & Masyn’s (2016) article.
Using R3STEP to examine the multinomial logistic regression of potential covariates (in this case gender, minority status, and grade level), I have found both minority status and grade level to be significant.
To incorporate them into the model and generate the most accurate and interpretable class membership, I have a couple of questions:
1. Do I need to examine them through a 1-step process (regress each indicator/predictor upon the covariates) to determine significant direct effects upon my indicators/predictors?
1b. Why do Nylund-Gibson & Masyn also suggest this same stepwise process for examining direct effects for K-1?
2. Does the influence of one covariate affect the inspection of another covariate during this stepwise one-step process? Should I simply take one covariate, individually regress each indicator variable on it, then take the other covariate through the same process, and retain only the significant regressions (direct effects)?
Your multi-group run tested metric invariance, that is, invariant loadings, or scalar invariance, that is, invariant loading and intercepts. The direct effect tests invariant intercepts while holding loadings invariant. So the tests aren't exactly the same.
I'm running a 5-class LCA using 9 binary indicators. I am also attempting to include several demographic covariates. I understand the simplest way to do this is using R3STEP. However, I found one recent paper applying LCA in which the authors examined the demographic composition of classes using DCAT (essentially treating the covariates as distal outcomes). Is this an appropriate use of DCAT?
Dear professors, I ran an LCA with continuous covariates (predictors) using the manual 3-step approach. I am not sure how to interpret the output. Are the estimates for the categorical latent variables in the Model Results section the beta coefficients for the regression? Similarly, in the Alternative Parameterizations section, are the estimates odds ratios or beta coefficients?
Dear Dr. Muthen, as a follow-up to my previous question, I would like to know what values I should multiply the logistic regression coefficients by to calculate the log odds for each class when I have continuous covariates. The examples in the UG are very helpful, but they are based on binary covariates, so they use values of 1 or 0. I really appreciate your help. Thanks!
The log odds are for a 1-unit change in a continuous X. For a standardized X, this means a 1-standard deviation change in X. If X is on a very large scale (has an SD much larger than 1), you want to consider a larger change in X than 1. If X has smaller scale (has an SD much smaller than 1), you want to consider a smaller change in X than 1. In both cases, multiplying the log odds by the X SD gives you the result for a 1-SD unit change.
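A minimal sketch of the rescaling described above: the logit coefficient refers to a 1-unit change in X, so multiplying it by a chosen change (here the SD of X) gives the log odds for that change, and exponentiating gives the corresponding odds ratio. The coefficient and SD are hypothetical.

```python
import math

def rescale_log_odds(log_odds_per_unit: float, change: float):
    """Log odds and odds ratio for a `change`-unit increase in a continuous covariate."""
    scaled = log_odds_per_unit * change
    return scaled, math.exp(scaled)

b = 0.05      # hypothetical logit coefficient per 1 unit of X
sd_x = 12.0   # hypothetical standard deviation of X
per_sd_logit, per_sd_or = rescale_log_odds(b, sd_x)
print(f"1-SD change: log odds {per_sd_logit:.2f}, OR {per_sd_or:.2f}")
```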