Including covariates Xs in an LCA is likely to change the latent class distribution. If so, this implies that the covariates Xs may have effects on the indicators Us and thereby influence membership classification. My questions are:
1) Does this mean Xs should be specified to predict the latent class variable C, as well as the categorical indicators (i.e., Us)? If Xs are specified only to predict C, not Us, is there any model misspecification problem?
2) I tried to regress one indicator, U1, on Xs, then no conditional probabilities were provided in Mplus output. Is there any option to print out conditional probabilities in Mplus output in this regard?
3) I also tried to regress a categorical indicator U2 (a 3-level ordinal measure) on Xs in the model. I got one set of coefficients in each class. Are these cumulative logistic regression coefficients?
1. If a u ON x coefficient is significant, this means that if the x is not included in the model, the classes will be different.
2. Conditional probabilities are not computed when there are x's because they vary with the value of x.
3. If you get one set for each class, you must have mentioned this ON statement in the class-specific part of the MODEL. If I understand the term cumulative, these are not. They are simply the logistic regression coefficients for each class with no order implied.
J.W. posted on Monday, November 01, 2004 - 2:25 pm
Thank you very much for your prompt answers to my questions.
1) It is very likely that covariates, such as socio-demographics, would be related to the u indicators, which are often outcome measures. If we want to use socio-demographics to predict latent class membership, do we have to specify the relationship between all x’s and all u’s? That would be i) very tedious and difficult for model specification when there are a lot of u indicators and x covariates; ii) there would be no conditional probabilities reported, as you pointed out, because they vary with the values of covariates. iii) If we use socio-demographic covariates to predict class membership without relating them to the u indicators, would the LCA model be misspecified?
2) Sorry, I did not make my question 3 clear in my last message. Let me try again. Regressing a K level categorical u indicator (e.g., 3 categories) on x covariates, I was expecting K-1 (e.g., 2) sets of multinomial logit model coefficients for each class. However, I only got one set of coefficients when I regressed a 3 level categorical u variable on x covariates. I was wondering if the coefficients were from a proportional odds logit model (sometimes called “cumulative logistic regression”) since the u variable is an ordinal measure.
bmuthen posted on Monday, November 01, 2004 - 4:00 pm
1) A significant direct effect from a covariate to a u indicator shows that the u measurement is not invariant with respect to the groups of people represented by the values of the covariates. Such a measurement non-invariance check cannot be done by including all direct effects in addition to the effect of the covariates on the latent class variable because this model is not identified. One can investigate one u at a time, allowing the covariates to influence this u directly, not only via the latent class variable. (i) yes, measurement non-invariance investigations can be tedious, but can be important (seldom done unfortunately). (ii) correct. (iii) yes
2) Yes, you are right, these estimates are from a proportional odds model since polytomous u's are taken to be ordinal when the categorical = option in the Variable command is used. If you want to specify the u's as nominal, use the nominal = option.
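To make the single set of slopes concrete, here is a minimal sketch of how category probabilities follow from a proportional odds model for a 3-category ordinal u. It assumes the threshold parameterization P(u <= k | x) = logistic(tau_k - beta*x); the threshold and slope values are made up for illustration and are not taken from any output in this thread.

```python
import math

def logistic(z):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-z))

def ordinal_category_probs(thresholds, slope, x):
    """Category probabilities under a proportional odds model.

    Assumes P(u <= k | x) = logistic(tau_k - slope * x), with one slope
    shared by all thresholds (that sharing is the proportional odds part).
    """
    cumulative = [logistic(tau - slope * x) for tau in thresholds] + [1.0]
    probs = [cumulative[0]]
    for k in range(1, len(cumulative)):
        probs.append(cumulative[k] - cumulative[k - 1])
    return probs

# Hypothetical estimates for one class: two thresholds, one slope.
example_thresholds = [-0.5, 1.2]
example_slope = 0.8
probs_x0 = ordinal_category_probs(example_thresholds, example_slope, 0.0)
probs_x1 = ordinal_category_probs(example_thresholds, example_slope, 1.0)
```

With K categories there are K-1 thresholds but only one slope per covariate, which is why only one set of coefficients appears for each class.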
J.W. posted on Wednesday, November 03, 2004 - 11:30 am
When LCA is used to assess the pattern of outcome measures, such as diagnosed symptoms or risk behaviors, very often LCA was conducted without covariates or relationships between class membership and covariates was assessed separately after class membership was estimated. This was inappropriate, because the LCA model was misspecified. To my understanding, covariates that influence, in theory, the latent class membership should be included in LCA. Estimation of latent class membership and the relationships between the class membership and covariates should be done simultaneously. What bothers me is that covariates (e.g., socio-demographic characteristics) that influence the class membership would also very likely influence the u outcome indicators. Is there any covariate that does not influence the u outcome indicators, but class membership? The difficulties are: 1) If multiple or all the u indicators are significantly related with the covariates, we can’t regress all these measurement non-invariant u indicators on covariates, when the covariates are also used to predict class membership, as you pointed out in your last message, the “model is not identified.” 2) Even with just one measurement non-invariant u indicator, it would be difficult to define the latent classes because the conditional probabilities would not be available when covariates are used to predict the u indicator. Now, I find myself in a dilemma. Excluding covariates, the LCA is misspecified; including covariates, I have the above difficulties. Any solution? Many thanks in advance.
bmuthen posted on Wednesday, November 03, 2004 - 11:46 am
1) The u indicators are in fact correlated with the covariates even when the covariates only point to the latent class variable and not directly to the u's. This is because the covariates then have an indirect effect on the u's. So this model has strong correlations between the covariates and the u's (even without direct effects).
2) Conditional probabilities are available even with a direct effect to a u. You can compute the conditional probability for each class and each value of the covariate (Mplus doesn't do it, but it can be done by hand using the estimates).
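A minimal sketch of the hand computation for a binary u with a direct effect from x, assuming the logit threshold parameterization P(u = 1 | class, x) = 1 / (1 + exp(tau_c - beta*x)). The thresholds and slope below are hypothetical placeholders; in practice they would be replaced by the estimates from the output, and the slope could also be class-specific.

```python
import math

def prob_endorse(threshold, slope, x):
    """P(u = 1 | class, x) for a binary indicator with a direct effect of x.

    Assumes a logit model written with a threshold:
    P(u = 1) = 1 / (1 + exp(threshold - slope * x)).
    """
    return 1.0 / (1.0 + math.exp(threshold - slope * x))

# Hypothetical class-specific thresholds and a slope held equal across classes.
class_thresholds = {1: -1.0, 2: 1.5}
slope = 0.6

# One conditional probability per class and per covariate value.
cond_prob_table = {
    (c, x): prob_endorse(tau, slope, x)
    for c, tau in class_thresholds.items()
    for x in (0, 1)
}
```

The result is one conditional probability table per covariate value rather than a single table, which is why a single set of probabilities cannot be printed once x is in the model.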
Hope this answers your questions.
J.W. posted on Thursday, November 04, 2004 - 11:10 am
I'm doing an LCA using two demographic covariates to predict latent class membership for 8 indicator variables. I'd like to compare models with various numbers of latent classes using AIC, BIC, etc. This seems pretty straightforward for 2 or more latent classes.
But I also want to compare a model with a single latent class, and can't figure out how to model a single latent class with covariates. If I include my covariates in the USEVARIABLES line, without specifying an ON statement for the relation between covariates and latent class, does Mplus know that these variables are covariates?
Does it even make sense to include covariates in a single class model? Mplus calculates parameters for them, but I'm not sure it's doing what I think it's doing.
With one class, there is no latent class membership to predict. Everyone is in one class.
bmuthen posted on Monday, April 25, 2005 - 3:38 pm
If you only include the covariates in the USEV list and not in the model, the covariates are treated as variables that are uncorrelated among themselves and with the other observed variables (see the warning you got) - this is not what you want; don't include the covariates in the USEV list when you only have 1 class (unless you want them to influence the outcomes).
anonymous posted on Friday, January 13, 2006 - 12:05 pm
when conducting LCA with covariates, can the covariates have more than 2 categories for which i then substitute different values for x when calculating the probabilities using the logistic regression coefficients (e.g. 0,1,2)? or do i need to create 3 dummy variables to represent the 3 categories of this covariate?
Covariates can be continuous or a set of dummy variables. You would create two dummy variables to represent three categories.
anonymous posted on Wednesday, January 18, 2006 - 7:34 am
Hi, can the odds ratios be interpreted for covariates that are not binary, i.e. that have 3 or 4 nominal categories? also, if i have 2 or 3 covariates is it possible to look at the odds ratios for each covariate in turn for each class? for example when including sex and religion, do i say the odds for females of being in class 1 is higher than for males, and the odds of catholics being in class 1 is higher than for protestants, etc.
If you have nominal covariates, you need to create a set of dummy variables. Covariates can be continuous or binary as in regular regression.
You would want to add to "for males" the words "holding other covariates constant".
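As a small illustration of that reading, the sketch below turns logit slopes from a c ON x regression into odds ratios by exponentiating them. The covariate names and coefficient values are hypothetical, and each odds ratio compares a given class with the reference class while the other covariates are held constant.

```python
import math

def odds_ratio(logit_coef):
    """Odds ratio implied by a logit slope for a one-unit change in x."""
    return math.exp(logit_coef)

# Hypothetical slopes for class 1 versus the reference class.
class1_slopes = {"female": 0.405, "catholic": -0.223}

class1_odds_ratios = {name: odds_ratio(b) for name, b in class1_slopes.items()}
```

An odds ratio above 1 for "female" would be read as higher odds of being in class 1 rather than the reference class for females than for males, holding the other covariates constant.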
anonymous posted on Wednesday, January 18, 2006 - 10:36 am
thanks. i included a set of dummy variables, but got the following message:
ONE OR MORE MULTINOMIAL LOGIT PARAMETERS WERE FIXED TO AVOID SINGULARITY OF THE INFORMATION MATRIX. THE SINGULARITY IS MOST LIKELY BECAUSE THE MODEL IS NOT IDENTIFIED, OR BECAUSE OF EMPTY CELLS IN THE JOINT DISTRIBUTION OF THE CATEGORICAL LATENT VARIABLES AND ANY INDEPENDENT VARIABLES. THE FOLLOWING PARAMETERS WERE FIXED: 119 123 127 131 135 139
i'm not sure how to rectify this. i'm assuming this has something to do with the inclusion of the dummy variables, so i may have done something wrong. for a nominal variable with 3 categories i created 3 dummy variables to represent membership (0) or non-membership (1) for each of the 3 categories respectively.
For a nominal variable with three categories, you would create two dummy variables just like in regular regression.
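A minimal sketch of that coding, written in Python only for illustration: a covariate with three categories becomes two 0/1 dummy variables, and the omitted category serves as the reference. The category labels and the `d_` prefix are made up.

```python
def dummy_code(values, levels, reference):
    """Code a nominal variable as len(levels) - 1 dummy variables.

    The reference level gets 0 on every dummy, as in regular regression.
    """
    kept = [lv for lv in levels if lv != reference]
    return [{"d_" + str(lv): int(v == lv) for lv in kept} for v in values]

# Hypothetical three-category covariate with "a" as the reference category.
coded_rows = dummy_code(["a", "b", "c", "b"], ["a", "b", "c"], "a")
```

Using three dummies for three categories makes them perfectly collinear with the intercept, which is consistent with the singularity warning quoted above.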
anonymous posted on Wednesday, January 18, 2006 - 12:08 pm
sorry, just to clarify once more. given then that i will have one reference category against which i will be comparing the other 2 categories (dummy variables), does this mean that i will not be able to calculate the probabilities of class membership for this given reference category? or do i simply regard the slope as 0 for this reference category, where the logit=intercept?
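One way to see how the reference category is handled is the sketch below, which computes class membership probabilities from multinomial logit estimates. It assumes the last class is the reference class with its logit fixed at 0, and that the reference category of the covariate is represented by setting both dummies to 0, so each logit reduces to its intercept. All numbers are hypothetical.

```python
import math

def class_probs(intercepts, slopes, x):
    """Class probabilities from a multinomial logit for c ON x.

    intercepts: one value per non-reference class.
    slopes: one list of slopes per non-reference class (same order as x).
    The last class is the reference class with logit 0.
    """
    logits = [
        a + sum(b * xi for b, xi in zip(bs, x))
        for a, bs in zip(intercepts, slopes)
    ] + [0.0]
    peak = max(logits)  # subtract the maximum for numerical stability
    expo = [math.exp(l - peak) for l in logits]
    total = sum(expo)
    return [e / total for e in expo]

# Hypothetical 3-class model with two dummy covariates.
example_intercepts = [0.2, -0.4]
example_slopes = [[1.0, 2.0], [0.5, -0.5]]
probs_reference = class_probs(example_intercepts, example_slopes, [0, 0])
probs_dummy1 = class_probs(example_intercepts, example_slopes, [1, 0])
```

So probabilities for the reference category can still be computed; they simply use the intercepts alone.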
Think of p u indicators and q x covariates. What identifies relationships between u's and x's can be thought of as logistic regression relationships for u on x. There are pxq such slopes. Even with a binary latent class variable c, we already use up q slopes for c ON x, so there isn't enough information left for pxq more slopes.
For my LCA model with 2 dummy variables to reflect a 3 category covariate RACE, I get the warning: "ONE OR MORE MULTINOMIAL LOGIT PARAMETERS WERE FIXED TO AVOID SINGULARITY OF THE INFORMATION MATRIX. SINGULARITY IS MOST LIKELY BECAUSE THE MODEL IS NOT IDENTIFIED, OR BECAUSE OF EMPTY CELLS IN THE JOINT DISTRIBUTION OF CATEGORICAL LATENT VARIABLES AND ANY INDEPENDENT VARIABLES. FOLLOWING PARAMETERS WERE FIXED: 90 93"
Syntax is: USEVARIABLES are ..... BLK HISP; Classes = c(6); Analysis:Type=mixture ; MODEL: %OVERALL% c ON BLK HISP;
Fixed parameters seem to be BLK in C#4 and HISP in C#5. GAMMA(C) BLK HISP C#4 90 91 C#5 92 93
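A rough way to check for the empty cells the warning mentions is to cross-tabulate a covariate against most likely class membership and list the combinations with no cases. The sketch below assumes the class assignments and covariate values have already been read into two equal-length lists; the data are made up. Most likely class membership only approximates the joint distribution the model works with, so this is a diagnostic aid rather than a definitive test.

```python
from collections import Counter

def empty_cells(class_labels, covariate_values):
    """List (class, covariate value) combinations that have no cases."""
    counts = Counter(zip(class_labels, covariate_values))
    classes = sorted(set(class_labels))
    levels = sorted(set(covariate_values))
    return [(c, v) for c in classes for v in levels if counts[(c, v)] == 0]

# Hypothetical most likely class and a 0/1 dummy covariate for five cases.
example_classes = [1, 1, 2, 2, 3]
example_dummy = [0, 1, 0, 1, 0]
cells_without_cases = empty_cells(example_classes, example_dummy)
```

Here class 3 contains nobody with the dummy equal to 1, the kind of pattern for which a slope cannot be estimated and is fixed instead.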
I was wondering if anyone on the discussion forum could help. I am running the following 4 Class LCA model, with a direct effect (Age).......
Variable: Names are scrser area gor6 nrf2005 education age swimming snooker darts football fishing newoutdo wintersp water tennis badmin squash cycling fitness cricket tabten golf horserid yoga tenpin jog bats rackets rackets2; Missing are all (-9999) ; USEVARIABLES swimming snooker football newoutdo wintersp water cycling fitness golf tenpin jog cricket rackets2 age; CATEGORICAL swimming snooker football newoutdo wintersp water cycling fitness golf tenpin jog cricket rackets2; CLASSES = C (4);
ANALYSIS: TYPE = MIXTURE;
MODEL: %OVERALL% swimming - rackets2 on newage1;
........... My question is, the age variable is categorical (5 categories), and how do I let Mplus know that this variable is categorical? I can't put it into the CATEGORICAL section, as that will place it in the latent class model.
I am doing a two-level mixture model with binary indicators and I would like to know how to get the conditional probabilities for each indicator. For example, in the regular mixture model, TECH1 produces conditional probabilities and thresholds; however, in the two-level model, there are only thresholds, no conditional probabilities for the indicators. Is there a way to use the thresholds to compute conditional probabilities in the two-level model?
Hello, I have a follow-up to my previous question. If I am unable to compute conditional probabilities by hand and I cannot estimate them in the two-level model, would you suggest estimating them in a single level model? I am using complex data with clustering at 2 levels and the single level mixture model only allows one cluster variable. How concerned should I be if I am unable to estimate the conditional probabilities with both clustering variables in the model?
I would not change my model. I would instead interpret the sign and significance of the latent class indicators and look at the profiles. You will not learn anything more from the probabilities than from the parameter estimates.
nina chien posted on Wednesday, May 07, 2008 - 11:14 am
When I include covariates, all cases with missing data on just ONE of the covariates are dropped from the analysis.
I am using the TYPE = MISSING command.
"Data set contains cases with missing on x-variables. These cases were not included in the analysis. Number of cases with missing on x-variables: 418"
Is this supposed to happen? Thanks very much for your help.
nina chien posted on Wednesday, May 07, 2008 - 3:59 pm
I am doing an LPA with covariates. There is one covariate - poverty status - that I know influences profile membership.
But I also want to use poverty status, later down the line, as a predictor to test for interaction effects with profile membership (profile x poverty status) on some outcome variables.
My options are: 1) Include poverty status as a covariate in the LPA. Then use it later again as a predictor variable when testing the interaction of profile x poverty status. 2) Do not include poverty status as a covariate in the LPA. Use it later as a predictor variable when testing the interaction. 3) Include poverty status as a covariate in the LPA. I cannot use it as a predictor variable later (i.e., I must drop my research question having to do with the interaction entirely).
Which is the correct one (I really hope not 3)? Or another option entirely? Thanks very much for your help.
A model is estimated conditioned on the covariates. As a result no distributional assumptions are made about them. If you don't want the observations with missing values on the covariates deleted from the analysis, you must bring these variables into the model and thereby make distributional assumptions about them. You can do this by mentioning their variances in the MODEL command.
I would include the covariate in the LPA and also when the distal outcome is added. You would regress both the categorical latent variable and the distal outcome on poverty. By allowing the regression to vary across classes, you would capture the interaction you are interested in.
Kaigang Li posted on Tuesday, May 27, 2008 - 8:35 pm
Based on your answer to Nina Chien's question on May 07, could you please clarify how to mention the variances in the MODEL command? Should I compute the variances using other stats package and fix the variances in the MODEL command using @?
I performed a cross-sectional LCA using early adult factors as the indicators. The optimal number of classes was 4. I now want to see how these 4 classes vary on a specific set of adolescent factors. One such variable is high-school GPA.
It seems that there are two ways to examine such class differences. One is to include "C on gpa" in the %Overall% model statement. A second way is to include "Auxiliary = gpa(e)" in the Variable section of the code. Each approach appears to provide results that are consistent with the other. The relative risks (change in prob of class membership for an increase of 1 in gpa) calculated from the “C on gpa” approach are consistent with the gpa means provided for each class from the “Auxiliary” approach.
My questions are: (1) Do these two approaches provide equivalent results (though in different metrics)? Am I comparing “apples to apples”? (2) If they are not equivalent, in the "C on gpa" method does heterogeneity in gpa contribute to/influence actual class structure/assignment?
Thanks for the quick response. So, say my LCA has 5 "u" variables or indicators. Would a model with the 5 "u" variables in the class specific model statements (i.e., %C#1%, %C#2%, etc.) and with "C on gpa" yield the same class structure and probabilities as a model with the 5 "U" variables AND GPA in the class specific model statements? (With the obvious difference between the two models being that the former also provides changes in membership probability given changes in levels of GPA).
Here are the two versions in syntax form...
%Overall% c on GPA; %C#1% [U1$1*1.435]; [U2$1*.045]; [U3$2*.317]; [U4$1*.664]; [U5$1*1.765];
I posted a couple of days ago (on another thread) about changing reference classes in a 3 class lca model in order to be able to have my desired ORs reported in the output. Your recommendation was to use the ending values as starting values for the classes in a new analysis. This works very well, thank you.
The challenge I am having now is when I add my three covariates to the model the classes return to the order of when I am not using the ending values as starting values. It is like the covariates nullify the command to switch the order.
Is it possible to reorder the classes with the ending values and to simultaneously add covariates? If so, is there additional syntax I am missing?
The starting values from the model without covariates may not be correct for the model with covariates. There may be direct effects needed between the covariates and the latent class indicators that make the classes change when covariates are added. See the following paper which is available on the website for a discussion of this:
Muthén, B. (2004). Latent variable analysis: Growth mixture modeling and related techniques for longitudinal data. In D. Kaplan (ed.), Handbook of quantitative methodology for the social sciences (pp. 345-368). Newbury Park, CA: Sage Publications.
Dear, I'm doing LCA (which works great), but now I want to include two covariates (age and sex).
This is my syntax: ... VARIABLE: NAMES ARE idnr it1 it2 it3 it4 it5 it6 it7 it8 age sex; USEVARIABLES ARE it1 it2 it3 it4 it5 it6 it7 it8 age sex; CLASSES = c (2); CATEGORICAL = it1 it2 it3 it4 it5 it6 it7 it8 sex; ANALYSIS: TYPE = MIXTURE; ALGORITHM = INTEGRATION; MODEL: %OVERALL% c ON age sex; OUTPUT: SAMPSTAT STANDARDIZED MODINDICES; SAVEDATA: FILE IS prob2class_710mergeAGESEXok.dat; SAVE IS CPROB;
- Is this the right way: c ON age sex; to use age and sex as covariates? I don't think so because my fit indices change when I leave out "sex" in the category "CATEGORICAL". And I can't put "age" in "CATEGORICAL" because this is not a categorical variable. So, my question is how I can control for age and/or sex.
To come back on my previous question, is the following correct because age and sex are still seen as dependent continuous variables: Number of dependent variables 10 Number of independent variables 0 Number of continuous latent variables 0 Number of categorical latent variables 1
Dear, When I control for age and sex (c ON age sex;), I put age and sex in USEVARIABLES. My question is: When I control for only 1 variable, do I have to put the other one also in USEVARIABLES? E.g. when I only control for sex (which is of course in USEVARIABLES), do I also have to put age in USEVARIABLES?
I am running an LCA using covariates by regressing class membership on these variables. I would also like to examine class differences on an outcome (i.e., using the auxiliary function), but I also wanted to control for covariates at this step. Is this possible to do? I'm assuming that including the covariates in the creation of the classes is not the same as controlling for them when examining how those classes relate to an outcome. I haven't seen any examples of this and wanted to know if you can help.
The AUXILIARY (e) option is used for screening purposes, not for model estimation. You should include the distal outcome on the USEVARIABLES list. To control for the covariates, regress the distal outcome on the covariates. The effect of class membership on the distal outcome is seen in how the intercepts of the distal outcome vary across classes.
Thank you for your help! Would you by any chance know of a good LPA/LCA reference that used a distal outcome and covariates in this manner? I can't seem to find any on your website. I am mostly interested in seeing how the results should be presented in general and interpreted.
Thanks! I have another question I was hoping you could help me with. Is there a way to specify my models so that when I run various analyses (i.e., with different covariates, outcomes, etc.) the classes are extracted in the same order, so that I can more easily make comparisons between different models?
Hi, I am a bit confused. In this thread Linda said that the "auxiliary" function is to be used for exploratory purposes only. However, I understood that in Asparouhov & Bengt Muthen (2012) and the related ppt presentation "auxiliary" is meant to be used for substantive purposes. Could someone please clarify?
I mean: can I simply use the most likely class as a nominal DV, or should I weight it by its probability relative to the probabilities of the other classes? In that case, is there a smart way to do it with Mplus (other than the r3step option)?
You should wait for R3STEP or do it by hand according to the instructions given in the handout where R2STEP is described.
anonymous posted on Tuesday, March 12, 2013 - 9:10 am
I'm running a LPA with three covariates. The covariates appear to not significantly differ across classes, but when I enter the covariates in the model using the 1-step method and look at a plot of the estimated probability as a function of covariate #1, it looks like this differs based on gender such that covariate #1 appears to differ across classes for females but not for males. Is there a way to empirically demonstrate whether two covariates interact in predicting the classes? Would it be appropriate to add an interaction term as a covariate or to replace gender and covariate #1 with the interaction term? Or would it be more appropriate to run this as a multi-group LPA, with males and females in separate groups?
Just create a product term between gender and covariate #1 and include all 3 variables in the prediction of c.
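A minimal sketch of what the product term does, assuming gender is coded 0/1 and using hypothetical variable names and coefficients. With all three variables in the prediction of c, the slope of covariate #1 for the group coded 1 is the sum of its own slope and the product-term slope, so the product-term slope is what carries the interaction.

```python
def add_product_term(rows, a, b, name):
    """Return copies of the rows with a new variable equal to rows[a] * rows[b]."""
    return [dict(row, **{name: row[a] * row[b]}) for row in rows]

def covariate_slope(b_cov, b_product, gender):
    """Logit slope of the covariate at a given gender code (0 or 1)."""
    return b_cov + b_product * gender

# Hypothetical data: gender coded 0/1 and a continuous covariate x1.
example_rows = [{"gender": 0, "x1": 2.0}, {"gender": 1, "x1": 3.0}]
rows_with_product = add_product_term(example_rows, "gender", "x1", "gen_x1")

# Hypothetical slopes for one class versus the reference class.
slope_gender0 = covariate_slope(0.10, 0.75, 0)
slope_gender1 = covariate_slope(0.10, 0.75, 1)
```

If the product-term slope is not significant, the plotted difference between the two gender groups may not reflect a real interaction.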
ian jantz posted on Thursday, May 09, 2013 - 7:32 am
Hi, I selected a 4-class model based on 10 indicator variables. When I introduced covariates, it seems as if a substantial number of individuals were reclassified. As such, class prevalences in the model without covariates differed from those in the model with covariates. Is there a resource which provides some guidance for conducting measurement non-invariance investigations of indicator variables? The first posts in this thread (October 25 through November 4, 2004 from J.W. and professors Muthen) were very helpful. But, I have some basic questions about diagnosing measurement non-invariance, what counts as substantial reclassification, the steps to isolate the indicator variables and covariates responsible for the non-invariance, and some potential solutions to addressing the issue. Any guidance is much appreciated.
Have you read the following paper which is available on the website:
Muthén, B. (2004). Latent variable analysis: Growth mixture modeling and related techniques for longitudinal data. In D. Kaplan (ed.), Handbook of quantitative methodology for the social sciences (pp. 345-368). Newbury Park, CA: Sage Publications.
ian jantz posted on Friday, May 10, 2013 - 1:50 pm
Hi Dr. Muthen, I read Muthen (2004) and it was very useful. I guess one basic question I have is what counts as substantial reclassification. When I introduce one combination of covariates, class prevalence changes very little, say, 2 or 3%. However, when I introduce all covariates of interest, prevalence for one class changes by about 12%. Thanks so much.
This is really a substantive decision. Besides looking at the percent changes in the classes, you should look at the individual changes in posterior probabilities.
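A minimal sketch of one way to look at individual changes, assuming the posterior probabilities from the two models have been saved and read in as one row per person, with the classes in the same order in both models (label switching would have to be resolved first). The function names and numbers are hypothetical.

```python
def max_abs_change(post_a, post_b):
    """Largest absolute change in any class probability, per person."""
    return [
        max(abs(pa - pb) for pa, pb in zip(row_a, row_b))
        for row_a, row_b in zip(post_a, post_b)
    ]

def share_reclassified(post_a, post_b):
    """Proportion of people whose most likely class differs between models."""
    def modal(row):
        return max(range(len(row)), key=row.__getitem__)
    changed = sum(modal(a) != modal(b) for a, b in zip(post_a, post_b))
    return changed / len(post_a)

# Hypothetical posteriors for three people, without and with covariates.
without_cov = [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8]]
with_cov = [[0.8, 0.2], [0.4, 0.6], [0.2, 0.8]]
changes = max_abs_change(without_cov, with_cov)
reclassified = share_reclassified(without_cov, with_cov)
```

Class prevalences can stay nearly constant while many individuals move between classes, so per-person summaries like these complement the percent changes.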
anonymous posted on Sunday, February 23, 2014 - 8:33 pm
In response to the following user question "ONE OR MORE MULTINOMIAL LOGIT PARAMETERS WERE FIXED TO AVOID SINGULARITY OF THE INFORMATION MATRIX. SINGULARITY IS MOST LIKELY BECAUSE THE MODEL IS NOT IDENTIFIED, OR BECAUSE OF EMPTY CELLS IN THE JOINT DISTRIBUTION OF CATEGORICAL LATENT VARIABLES AND ANY INDEPENDENT VARIABLES. FOLLOWING PARAMETERS WERE FIXED: 90 93 ....
Am I simply getting this warning to alert me that there are no BLK in Class 4 nor BLK/HISP in Class 5? If so, is this a warning I can ignore?"
Bengt responded "yes and yes" on December 14, 2007 (see above).
I have a follow-up question. I am in a similar situation as that user, I have one class with no males (gender was entered as a covariate); however, this makes sense based on the characteristics of the class so I haven't been concerned. Is there a way to still calculate and report inferential statistics for gender, though? Thanks!
I have not received a response to my question, I think?
1/ What is the difference between active covariates and inactive covariates? 2/ Is it possible to use active covariates, then estimate posterior membership and assess the ORs between the classes (the dependent variable) and the covariates (the X variables, i.e. the same active covariates that contribute to defining the classes)? Or is that an error, and is it better to use the posterior probabilities from the model without active covariates?
3/ When is it better to use active or inactive covariates ?
I have seen active and inactive used in LGold. When covariates are inactive, they do not affect the size or the estimates of the classes, as they are not included in the model; however, when they are active, the estimates for the indicators may differ. So I do not know whether covariates should be used as active or inactive ... I have this problem because I think that my covariates are associated with the indicators, which could explain why the definition of the classes differs...
Excuse me for my bad English. Best regards, Emilie
Jon Heron posted on Wednesday, February 26, 2014 - 9:22 am
I've always felt that LG's inactive-covariates approach was the same as Mplus' Pseudo-Class draws (auxiliary with "(r)").
You may not get perfect agreement across the packages though because LG does something to deal with covariate missingness.
Adding to Jon's comments, it sounds to me like the inactive covariates would be best handled by the Mplus 3-step method done by the Auxiliary R3STEP option. But if you suspect direct effects from some covariates to some latent class indicators you would want to take a regular (1-step) approach where all covariates are "active".
anonymous posted on Friday, February 28, 2014 - 12:01 pm
As a follow-up to my February 23rd question above, if a parameter is fixed in say a four class model but was not in the three class model, can Likelihood Ratio Tests such as the Lo-Mendell-Rubin Test (tech 11) still be interpreted?
Thanks for your previous answer. I think the covariates are associated with the indicators and have direct effects on the estimates as well as on the number of classes. However, can I use the classification obtained after inclusion of the covariates and then use posterior membership (as in the three-step approach) to assess associations (ORs) between the classes and other variables such as mortality, hospitalisation, or other characteristics?
Or is that an error, and do we have to use only the classification obtained without the covariates?
You can use a model where covariates have both an effect on the latent class variable and some (but not all) of the latent class indicators. If you have strong direct effects, 3-step methods are not suitable - we describe this in our Oct 28 3-step paper on our website.