Including covariates in LCA

Mplus Discussion > Latent Variable Mixture Modeling >


J.W. posted on Monday, October 25, 2004 - 1:44 pm

Including covariates (Xs) in an LCA is likely to change the latent class distribution. If so, this implies that the covariates may have effects on the indicators (Us) and thereby influence class membership classification. My questions are:

1) Does this mean Xs should be specified to predict the latent class variable C, as well as the categorical indicators (i.e., Us)? If Xs are specified only to predict C, not Us, is there any model misspecification problem?

2) I tried regressing one indicator, U1, on the Xs, and then no conditional probabilities were provided in the Mplus output. Is there any option to print conditional probabilities in the Mplus output in this case?

3) I also tried to regress a categorical indicator U2 (a 3-level ordinal measure) on Xs in the model. I got one set of coefficients in each class. Are these cumulative logistic regression coefficients?

Thank you very much for your help!

Linda K. Muthen posted on Monday, October 25, 2004 - 2:14 pm

1. If a u ON x coefficient is significant, this means that if the x is not included in the model, the classes will be different.

2. Conditional probabilities are not computed when there are x's because they vary with the value of x.

3. If you get one set for each class, you must have mentioned this ON statement in the class-specific part of the MODEL. If I understand the term cumulative, these are not. They are simply the logistic regression coefficients for each class with no order implied.

J.W. posted on Monday, November 01, 2004 - 2:25 pm

Thank you very much for your prompt answers to my questions.

1) It is very likely that covariates, such as socio-demographics, would be related to the u indicators, which are often outcome measures. If we want to use socio-demographics to predict latent class membership, do we have to specify the relationship between all x’s and all u’s? That would be i) very tedious and difficult for model specification when there are a lot of u indicators and x covariates; ii) there would be no conditional probabilities reported, as you pointed out, because they vary with the values of covariates. iii) If we use socio-demographic covariates to predict class membership without relating them to the u indicators, would the LCA model be misspecified?

2) Sorry, I did not make my question 3 clear in my last message. Let me try again.
Regressing a K level categorical u indicator (e.g., 3 categories) on x covariates, I was expecting K-1 (e.g., 2) sets of multinomial logit model coefficients for each class. However, I only got one set of coefficients when I regressed a 3 level categorical u variable on x covariates. I was wondering if the coefficients were from a proportional odds logit model (sometimes called “cumulative logistic regression”) since the u variable is an ordinal measure.

bmuthen posted on Monday, November 01, 2004 - 4:00 pm

1) A significant direct effect from a covariate to a u indicator shows that the u measurement is not invariant with respect to the groups of people represented by the values of the covariates. Such a measurement non-invariance check cannot be done by including all direct effects in addition to the effect of the covariates on the latent class variable because this model is not identified. One can investigate one u at a time, allowing the covariates to influence this u directly, not only via the latent class variable. (i) yes, measurement non-invariance investigations can be tedious, but can be important (seldom done unfortunately). (ii) correct. (iii) yes

2) Yes, you are right, these estimates are from a proportional odds model since polytomous u's are taken to be ordinal when the categorical = option in the Variable command is used. If you want to specify the u's as nominal, use the nominal = option.
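To make the single set of coefficients concrete, here is a minimal Python sketch of how category probabilities follow from a proportional odds model. It assumes the cumulative logit form P(u <= k | x) = logistic(tau_k - beta * x); the threshold and slope values are hypothetical, not estimates from this thread.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def ordinal_probs(thresholds, slope, x):
    """Category probabilities for an ordinal u with K = len(thresholds) + 1 levels.

    Assumed parameterization: P(u <= k | x) = logistic(tau_k - slope * x),
    with thresholds in increasing order and one slope shared by all cut points.
    """
    cumulative = [logistic(t - slope * x) for t in thresholds] + [1.0]
    probs, previous = [], 0.0
    for c in cumulative:
        probs.append(c - previous)
        previous = c
    return probs

# Hypothetical values for a 3-category indicator in one class
taus = [-0.5, 1.0]   # K - 1 = 2 thresholds
beta = 0.8           # a single slope, hence one set of coefficients
p_x0 = ordinal_probs(taus, beta, 0.0)
p_x1 = ordinal_probs(taus, beta, 1.0)
print([round(p, 3) for p in p_x0])
print([round(p, 3) for p in p_x1])
```

With a positive slope, raising x moves probability toward the higher categories, which is what the one slope per covariate expresses.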

J.W. posted on Wednesday, November 03, 2004 - 11:30 am

When LCA is used to assess the pattern of outcome measures, such as diagnosed symptoms or risk behaviors, very often the LCA is conducted without covariates, or the relationships between class membership and covariates are assessed separately after class membership has been estimated. This is inappropriate, because the LCA model is misspecified.
To my understanding, covariates that influence, in theory, the latent class membership should be included in the LCA. Estimation of latent class membership and of the relationships between class membership and covariates should be done simultaneously. What bothers me is that covariates (e.g., socio-demographic characteristics) that influence class membership would also very likely influence the u outcome indicators. Is there any covariate that influences class membership but not the u outcome indicators? The difficulties are: 1) If multiple or all of the u indicators are significantly related to the covariates, we can’t regress all these measurement non-invariant u indicators on the covariates when the covariates are also used to predict class membership because, as you pointed out in your last message, the “model is not identified.”
2) Even with just one measurement non-invariant u indicator, it would be difficult to define the latent classes because the conditional probabilities would not be available when covariates are used to predict the u indicator.
Now I find myself in a dilemma. Excluding covariates, the LCA is misspecified; including covariates, I have the above difficulties. Any solution? Many thanks in advance.

bmuthen posted on Wednesday, November 03, 2004 - 11:46 am

The u indicators are in fact correlated with the covariates even when the covariates only point to the latent class variable and not directly to the u's. This is because the covariates then have an indirect effect on the u's. So this model has strong correlations between the covariates and the u's (even without direct effects).

2) Conditional probabilities are available even with a direct effect to a u. You can compute the conditional probability for each class and each value of the covariate (Mplus doesn't do it, but it can be done by hand using the estimates).
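The hand calculation mentioned here can be sketched in Python as follows. This is only an illustration under stated assumptions: a binary u with a class-specific threshold, a class-invariant direct effect, and the logit written as -threshold + slope * x. All numbers are hypothetical.

```python
import math

def item_prob(threshold, slope, x):
    """P(u = 1 | class, x), assuming logit = -threshold + slope * x."""
    return 1.0 / (1.0 + math.exp(threshold - slope * x))

# Hypothetical estimates for one indicator u1
thresholds = {1: -1.2, 2: 0.9}   # class-specific thresholds
slope = 0.5                      # direct effect of x on u1

# One conditional probability per class and per covariate value
table = {
    (c, x): item_prob(tau, slope, x)
    for c, tau in thresholds.items()
    for x in (0, 1)
}
for (c, x), p in sorted(table.items()):
    print(f"class {c}, x = {x}: P(u1 = 1) = {p:.3f}")
```

The output is a small table rather than a single number per class, which is the sense in which the probabilities vary with the value of x.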

Hope this answers your questions.

J.W. posted on Thursday, November 04, 2004 - 11:10 am

Thank you so much!

ADC posted on Monday, April 25, 2005 - 2:51 pm

I'm doing an LCA using two demographic covariates to predict latent class membership for 8 indicator variables. I'd like to compare models with various numbers of latent classes using AIC, BIC, etc. This seems pretty straightforward for 2 or more latent classes.

But I also want to compare a model with a single latent class, and can't figure out how to model a single latent class with covariates. If I include my covariates in the USEVARIABLES line, without specifying an ON statement for the relation between covariates and latent class, does Mplus know that these variables are covariates?

Does it even make sense to include covariates in a single class model? Mplus calculates parameters for them, but I'm not sure it's doing what I think it's doing.

Thanks for any help you can provide.

Linda K. Muthen posted on Monday, April 25, 2005 - 2:59 pm

With one class, there is no latent class membership to predict. Everyone is in one class.

bmuthen posted on Monday, April 25, 2005 - 3:38 pm

If you only include the covariates in the USEV list and not in the model, the covariates are treated as variables that are uncorrelated among themselves and with the other observed variables (see the warning you got) - this is not what you want; don't include the covariates in the USEV list when you only have 1 class (unless you want them to influence the outcomes).

ADC posted on Tuesday, April 26, 2005 - 6:45 am

That makes much more sense. Thanks for your help!

anonymous posted on Friday, January 13, 2006 - 12:05 pm

when conducting LCA with covariates, can the covariates have more than 2 categories for which i then substitute different values for x when calculating the probabilities using the logistic regression coefficients (e.g. 0, 1, 2)?
or do i need to create 3 dummy variables to represent the 3 categories of this covariate?

Linda K. Muthen posted on Friday, January 13, 2006 - 12:11 pm

Covariates can be continuous or a set of dummy variables. You would create two dummy variables to represent three categories.

anonymous posted on Wednesday, January 18, 2006 - 7:34 am

HI
can the odds ratios be interpreted for covariates that are not binary, i.e. that have 3 or 4 nominal categories?
also, if i have 2 or 3 covariates is it possible to look at the odds ratios for each covariate in turn for each class? for example when including sex and religion, do i say the odds for females of being in class 1 is higher than for males, and the odds of catholics being in class 1 is higher than for protestants, etc.

Linda K. Muthen posted on Wednesday, January 18, 2006 - 9:30 am

If you have nominal covariates, you need to create a set of dummy variables. Covariates can be continuous or binary as in regular regression.

You would want to add to "for males" the words "holding other covariates constant".

anonymous posted on Wednesday, January 18, 2006 - 10:36 am

thanks. i included a set of dummy variables, but got the following message:

ONE OR MORE MULTINOMIAL LOGIT PARAMETERS WERE FIXED TO AVOID SINGULARITY
OF THE INFORMATION MATRIX. THE SINGULARITY IS MOST LIKELY BECAUSE THE
MODEL IS NOT IDENTIFIED, OR BECAUSE OF EMPTY CELLS IN THE JOINT
DISTRIBUTION OF THE CATEGORICAL LATENT VARIABLES AND ANY INDEPENDENT
VARIABLES. THE FOLLOWING PARAMETERS WERE FIXED:
119 123 127 131 135 139

i'm not sure how to rectify this. i'm assuming this has something to do with the inclusion of the dummy variables, so i may have done something wrong. for a nominal variable with 3 categories i created 3 dummy variables to represent membership (0) or non-membership (1) for each of the 3 categories respectively.

Linda K. Muthen posted on Wednesday, January 18, 2006 - 10:52 am

For a nominal variable with three categories, you would create two dummy variables just like in regular regression.
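A small Python sketch of this coding, for readers preparing the data outside Mplus. The category labels and the choice of reference category are hypothetical; the point is only that K categories yield K - 1 indicators.

```python
def dummy_code(value, categories, reference):
    """Return K - 1 indicators (1 = member, 0 = non-member) for a nominal value.

    The reference category is the one coded as all zeros.
    """
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if value == c else 0 for c in categories if c != reference]

categories = ["A", "B", "C"]   # hypothetical 3-category covariate
for value in categories:
    print(value, dummy_code(value, categories, reference="A"))
```

A third indicator would carry no new information, since it equals 1 minus the sum of the other two; that redundancy is one plausible source of a singularity warning like the one quoted above.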

anonymous posted on Wednesday, January 18, 2006 - 12:08 pm

sorry, just to clarify once more.
given then that i will have one reference category against which i will be comparing the other 2 categories (dummy variables), does this mean that i will not be able to calculate the probabilities of class membership for this given reference category?
or do i simply regard the slope as 0 for this reference category, where the logit=intercept?

Linda K. Muthen posted on Wednesday, January 18, 2006 - 1:04 pm

I think you should read the section in Chapter 13 called Calculating Probabilities From Logistic Regression Coefficients. This should answer your questions.
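As a rough companion to that section, the calculation can be sketched in Python. The sketch assumes a multinomial logit for c ON x with the last class as the reference class (logit fixed at 0); the intercepts and slopes are hypothetical. For the reference category of the covariate both dummies are 0, so each logit reduces to its intercept.

```python
import math

def class_probs(intercepts, slopes, x):
    """Class membership probabilities from multinomial logit coefficients.

    intercepts: one value per non-reference class
    slopes:     one list of slopes (one per covariate) per non-reference class
    The last class is the reference class with logit 0.
    """
    logits = [
        a + sum(b * xi for b, xi in zip(bs, x))
        for a, bs in zip(intercepts, slopes)
    ]
    logits.append(0.0)
    top = max(logits)                        # subtract the max for numerical stability
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical 3-class model with two dummy covariates
intercepts = [0.5, -0.2]
slopes = [[1.0, -0.4], [0.3, 0.7]]

ref_category = class_probs(intercepts, slopes, [0, 0])   # both dummies 0
category_two = class_probs(intercepts, slopes, [1, 0])
print([round(p, 3) for p in ref_category])
print([round(p, 3) for p in category_two])
```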

Cameron McPhee posted on Friday, May 26, 2006 - 9:14 am

This may be a dumb question, but can you explain to me why an LCA model would not be identified if you have direct effects from a covariate to all u indicators and to a latent class variable?

Bengt O. Muthen posted on Friday, May 26, 2006 - 9:54 am

Think of p u indicators and q x covariates. What identifies relationships between u's and x's can be thought of as logistic regression relationships for u on x. There are pxq such slopes. Even with a binary latent class variable c, we already use up q slopes for c ON x, so there isn't enough information left for pxq more slopes.
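The counting argument can be written out as a tiny Python sketch. It is rough bookkeeping that follows the reasoning above, not a formal identification check, and the values of p and q are hypothetical.

```python
def count_slopes(p, q, n_classes):
    """Compare the u-on-x slopes the data can inform with the slopes requested."""
    available = p * q                    # one logistic slope per (u, x) pair
    class_slopes = (n_classes - 1) * q   # c ON x, last class as reference
    direct_slopes = p * q                # every u regressed on every x
    return {"available": available, "requested": class_slopes + direct_slopes}

counts = count_slopes(p=8, q=2, n_classes=2)
print(counts)
```

With 8 indicators, 2 covariates and 2 classes, the full set of direct effects plus c ON x asks for more slopes than there are u-on-x relationships to inform them.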

Theresa Thompson posted on Friday, December 14, 2007 - 3:06 pm

For my LCA model with 2 dummy variables to reflect a 3 category covariate RACE, I get the warning: "ONE OR MORE MULTINOMIAL LOGIT PARAMETERS WERE FIXED TO AVOID SINGULARITY OF THE INFORMATION MATRIX. SINGULARITY IS MOST LIKELY BECAUSE THE MODEL IS NOT IDENTIFIED, OR BECAUSE OF EMPTY CELLS IN THE JOINT DISTRIBUTION OF CATEGORICAL LATENT VARIABLES AND ANY INDEPENDENT VARIABLES. FOLLOWING PARAMETERS WERE FIXED: 90 93"

Syntax is:
USEVARIABLES are ..... BLK HISP;
Classes = c(6);
Analysis:Type=mixture ;
MODEL: %OVERALL%
c ON BLK HISP;

Fixed parameters seem to be BLK in C#4 and HISP in C#5.
GAMMA(C)
         BLK   HISP
C#4       90     91
C#5       92     93

Categorical Latent Variables
C#4 ON
    BLK    -23.313   0.000   999.000   999.000
    HISP     0.914   0.234     3.908     0.000
C#5 ON
    BLK     22.400   0.351    63.901     0.000
    HISP    20.592   0.000   999.000   999.000

LOGISTIC REGRESSION ODDS RATIOS
C#4 ON
    BLK      0.000
    HISP     2.493
C#5 ON
    BLK    *********
    HISP   *********

Am I simply getting this warning to alert me that there are no BLK in Class 4 nor BLK/HISP in Class 5? If so, is this a warning I can ignore?

Bengt O. Muthen posted on Friday, December 14, 2007 - 5:49 pm

Yes. Yes.
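For readers puzzled by the 0.000 and the asterisks in the odds ratio table, a short Python sketch exponentiating the posted logit estimates may help. That the asterisks reflect a value too large for the print field is an assumption here, not something stated in the thread.

```python
import math

# Logit estimates as posted in the output above
estimates = {
    ("C#4", "BLK"): -23.313,
    ("C#4", "HISP"): 0.914,
    ("C#5", "BLK"): 22.400,
    ("C#5", "HISP"): 20.592,
}

# An odds ratio is the exponentiated logit coefficient
odds_ratios = {key: math.exp(value) for key, value in estimates.items()}
for (cls, cov), ratio in odds_ratios.items():
    print(f"{cls} ON {cov}: OR = {ratio:.3e}")
```

The HISP odds ratio for C#4 comes out near 2.49, matching the printed 2.493 up to rounding of the coefficient. The very large negative and positive estimates give odds ratios that are effectively zero or in the billions, the pattern one would expect when a class contains no or nearly all members of a category.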

Paul Widdop posted on Friday, March 14, 2008 - 3:58 am

Good Morning,

I was wondering if anyone on the discussion forum could help. I am running the following 4 Class LCA model, with a direct effect (Age).......

Variable:
Names are scrser area gor6 nrf2005
education age swimming snooker darts football fishing newoutdo wintersp water tennis badmin squash cycling fitness cricket tabten golf horserid yoga tenpin
jog bats rackets rackets2;
Missing are all (-9999) ;
USEVARIABLES swimming snooker football newoutdo wintersp water cycling
fitness golf tenpin jog cricket rackets2 age;
CATEGORICAL swimming snooker football newoutdo wintersp water cycling
fitness golf tenpin jog cricket rackets2;
CLASSES = C (4);

ANALYSIS:
TYPE = MIXTURE;

MODEL:
%OVERALL%
swimming - rackets2 on newage1;

........... My question is: the age variable is categorical (5 categories), so how do I let Mplus know that this variable is categorical? I can't put it into the CATEGORICAL section, as that would place it in the latent class model.

Any help would be great.

Cheers

Paul

Linda K. Muthen posted on Friday, March 14, 2008 - 6:45 am

You don't need to put it on the CATEGORICAL list. This is only for dependent variables. All covariates are treated as continuous in regression analysis.

Vernon Woodley posted on Monday, March 31, 2008 - 10:35 am

Hello,

I am doing a two-level mixture model with binary indicators and I would like to know how to get the conditional probabilities for each indicator. For example, in the regular mixture model, TECH1 produces conditional probabilities and thresholds; however, in the two-level model, there are only thresholds, no conditional probabilities for the indicators. Is there a way to use the thresholds to compute conditional probabilities in the two-level model?

Thanks,
Vernon

Linda K. Muthen posted on Monday, March 31, 2008 - 10:51 am

We don't provide conditional probabilities when numerical integration is involved. You cannot compute the conditional probabilities by hand because they must be computed using numerical integration.

Vernon Woodley posted on Tuesday, April 01, 2008 - 10:23 am

Hello,
I have a follow-up to my previous question.
If I am unable to compute conditional probabilities by hand and I cannot estimate them in the two-level model, would you suggest estimating them in a single level model?
I am using complex data with clustering at 2 levels and the single level mixture model only allows one cluster variable. How concerned should I be if I am unable to estimate the conditional probabilities with both clustering variables in the model?

Linda K. Muthen posted on Tuesday, April 01, 2008 - 5:59 pm

I would not change my model. I would instead interpret the sign and significance of the latent class indicators and look at the profiles. You will not learn anything more from the probabilities than from the parameter estimates.

nina chien posted on Wednesday, May 07, 2008 - 11:14 am

When I include covariates, all cases with missing data on just ONE of the covariates are dropped from the analysis.

I am using the TYPE = MISSING command.

"Data set contains cases with missing on x-variables.
These cases were not included in the analysis.
Number of cases with missing on x-variables: 418"

Is this supposed to happen? Thanks very much for your help.

Nina

nina chien posted on Wednesday, May 07, 2008 - 3:59 pm

I am doing an LPA with covariates. There is one covariate - poverty status - that I know influences profile membership.

But I also want to use poverty status, later down the line, as a predictor to test for interaction effects with profile membership (profile x poverty status) on some outcome variables.

My options are:
1) Include poverty status as a covariate in the LPA. Then use it later again as a predictor variable when testing the interaction of profile x poverty status.
2) Do not include poverty status as a covariate in the LPA. Use it later as a predictor variable when testing the interaction.
3) Include poverty status as a covariate in the LPA. I cannot use it as a predictor variable later (i.e., I must drop my research question having to do with the interaction entirely).

Which is the correct one (I really hope not 3)? Or another option entirely? Thanks very much for your help.

Linda K. Muthen posted on Wednesday, May 07, 2008 - 5:50 pm

A model is estimated conditioned on the covariates. As a result no distributional assumptions are made about them. If you don't want the observations with missing values on the covariates deleted from the analysis, you must bring these variables into the model and thereby make distributional assumptions about them. You can do this by mentioning their variances in the MODEL command.

Linda K. Muthen posted on Wednesday, May 07, 2008 - 5:52 pm

I would include the covariate in the LPA and also when the distal outcome is added. You would regress both the categorical latent variable and the distal outcome on poverty. By allowing the regression to vary across classes, you would capture the interaction you are interested in.

Kaigang Li posted on Tuesday, May 27, 2008 - 8:35 pm

Hello Linda,

Based on your answer to Nina Chien's question on May 07, could you please clarify how to mention the variances in the MODEL command? Should I compute the variances using other stats package and fix the variances in the MODEL command using @?

Thanks,

Kaigang

Linda K. Muthen posted on Wednesday, May 28, 2008 - 7:15 am

If the variable is y1, you refer to the variance as:

y1;

You should not fix the variances.

Justin Jager posted on Wednesday, July 14, 2010 - 5:57 pm

I performed a cross-sectional LCA using early adult factors as the indicators. The optimal number of classes was 4. I now want to see how these 4 classes vary on a specific set of adolescent factors. One such variable is high-school GPA.

It seems that there are two ways to examine such class differences. One is to include "C on gpa" in the %Overall% model statement. A second way is to include "Auxiliary = gpa(e)" in the Variable section of the code. Each approach appears to provide results that are consistent with the other. The relative risks (change in prob of class membership for an increase of 1 in gpa) calculated from the “C on gpa” approach are consistent with the gpa means provided for each class from the “Auxiliary” approach.

My questions are:
(1) Do these two approaches provide equivalent results (though in different metrics)? Am I comparing “apples to apples”?
(2) If they are not equivalent, in the "C on gpa" method does heterogeneity in gpa contribute to/influence actual class structure/assignment?

Linda K. Muthen posted on Thursday, July 15, 2010 - 8:00 am

1. AUXILIARY (e) and (r) are meant to be used as screening tools. Once the covariates are selected, they should be included in the model.
2. Yes.

Justin Jager posted on Thursday, July 15, 2010 - 5:42 pm

Dr. Muthen,

Thanks for the quick response. So, say my LCA has 5 "u" variables or indicators. Would a model with the 5 "u" variables in the class specific model statements (i.e., %C#1%, %C#2%, etc.) and with "C on gpa" yield the same class structure and probabilities as a model with the 5 "U" variables AND GPA in the class specific model statements? (With the obvious difference between the two models being that the former also provides changes in membership probability given changes in levels of GPA).

Here are the two versions in syntax form...

%Overall%
c on GPA;
%C#1%
[U1$1*1.435]; [U2$1*.045]; [U3$2*.317];
[U4$1*.664]; [U5$1*1.765];

%C#2%
[U1$1*1.435]; [U2$1*.045]; [U3$2*.317];
[U4$1*.664]; [U5$1*1.765];

VERSUS...

%Overall%

%C#1%
[U1$1*1.435]; [U2$1*.045]; [U3$2*.317];
[U4$1*.664]; [U5$1*1.765]; [GPA];

%C#2%
[U1$1*1.435]; [U2$1*.045]; [U3$2*.317];
[U4$1*.664]; [U5$1*1.765]; [GPA];

Linda K. Muthen posted on Friday, July 16, 2010 - 9:28 am

I think these are equivalent parametrizations where conditioned on a categorical latent variable the u's and x are independent. I think you will obtain the same loglikelihood for each model.

Jerry Cochran posted on Friday, December 03, 2010 - 7:53 pm

Hi Dr. Muthen,

I posted a couple of days ago (on another thread) about changing reference classes in a 3 class lca model in order to be able to have my desired ORs reported in the output. Your recommendation was to use the ending values as starting values for the classes in a new analysis. This works very well, thank you.

The challenge I am having now is when I add my three covariates to the model the classes return to the order of when I am not using the ending values as starting values. It is like the covariates nullify the command to switch the order.

Is it possible to reorder the classes with the ending values and to simultaneously add covariates? If so, is there additional syntax I am missing?

Thank you for your help.

Linda K. Muthen posted on Sunday, December 05, 2010 - 11:23 am

The starting values from the model without covariates may not be correct for the model with covariates. There may be direct effects needed between the covariates and the latent class indicators that make the classes change when covariates are added. See the following paper which is available on the website for a discussion of this:

Muthén, B. (2004). Latent variable analysis: Growth mixture modeling and related techniques for longitudinal data. In D. Kaplan (ed.), Handbook of quantitative methodology for the social sciences (pp. 345-368). Newbury Park, CA: Sage Publications.

Elien De Caluwé posted on Wednesday, April 20, 2011 - 7:01 am

Dear,
I'm doing LCA (which works great), but now I want to include two covariates (age and sex).

This is my syntax:
...
VARIABLE: NAMES ARE idnr
it1 it2 it3 it4 it5 it6 it7 it8 age sex;
USEVARIABLES ARE it1 it2 it3 it4 it5 it6 it7 it8 age sex;
CLASSES = c (2);
CATEGORICAL = it1 it2 it3 it4 it5 it6 it7 it8 sex;
ANALYSIS: TYPE = MIXTURE;
ALGORITHM = INTEGRATION;
MODEL:
%OVERALL%
c ON age sex;
OUTPUT: SAMPSTAT STANDARDIZED MODINDICES;
SAVEDATA: FILE IS prob2class_710mergeAGESEXok.dat;
SAVE IS CPROB;

- Is this the right way: c ON age sex;
to use age and sex as covariates? I don't think so because my fit indices change when I leave "sex" out of the "CATEGORICAL" list. And I can't put "age" in "CATEGORICAL" because this is not a categorical variable.
So, my question is how I can control for age and/or sex.

Thank you,
Elien

Linda K. Muthen posted on Wednesday, April 20, 2011 - 7:27 am

c ON age sex; is correct. The CATEGORICAL list is for dependent variables only. You should not put age and sex on this list.

Elien De Caluwé posted on Thursday, April 21, 2011 - 2:29 am

Thank you very much for your quick answer!
I suspect that I must use "c ON age" if I only want to control for age. And "c ON sex" if I only want to control for sex?

Thank you,
Elien

Linda K. Muthen posted on Thursday, April 21, 2011 - 5:50 am

Yes.

Elien De Caluwé posted on Thursday, April 21, 2011 - 6:18 am

Thank you very much!!!

Elien De Caluwé posted on Friday, April 22, 2011 - 1:03 am

Dear,

To come back on my previous question, is the following correct because age and sex are still seen as dependent continuous variables:
Number of dependent variables 10
Number of independent variables 0
Number of continuous latent variables 0
Number of categorical latent variables 1

Observed dependent variables

Continuous
AGE SEX

Binary and ordered categorical (ordinal)
IT1 IT2 IT3 IT4 IT5 IT6
IT7 IT8

Categorical latent variables
C

Elien De Caluwé posted on Friday, April 22, 2011 - 1:13 am

The example above was LCA with 1 class. The following is with 2 classes and now age and sex are seen as independent variables. I suspect this is correct?:
Number of groups 1
Number of observations 477

Number of dependent variables 8
Number of independent variables 2
Number of continuous latent variables 0
Number of categorical latent variables 1

Observed dependent variables

Binary and ordered categorical (ordinal)
IT1 IT2 IT3 IT4 IT5 IT6
IT7 IT8

Observed independent variables
AGE SEX

Categorical latent variables
C

Thank you!

Linda K. Muthen posted on Friday, April 22, 2011 - 6:00 am

Please send the two full outputs and your license number to support@statmodel.com. The information provided is not sufficient to answer your question.

Elien De Caluwé posted on Friday, April 29, 2011 - 5:30 am

Dear,
When I control for age and sex (c ON age sex;), I put age and sex in USEVARIABLES. My question is: When I control for only 1 variable, do I have to put the other one also in USEVARIABLES? E.g. when I only control for sex (which is of course in USEVARIABLES), do I also have to put age in USEVARIABLES?

Kind regards,
Elien

Linda K. Muthen posted on Friday, April 29, 2011 - 6:12 am

Only the analysis variables should be included on the USEVARIABLES list.

Elien De Caluwé posted on Friday, April 29, 2011 - 8:01 am

Thank you very much for your answer!

William Arguelles posted on Friday, March 02, 2012 - 9:48 am

Hi,

I am running an LCA using covariates by regressing class membership on these variables. I would also like to examine class differences on an outcome (i.e., using the auxiliary function), but I also wanted to control for covariates at this step. Is this possible to do? I'm assuming that including the covariates in the creation of the classes is not the same as controlling for them when examining how those classes relate to an outcome. I haven't seen any examples of this and wanted to know if you can help.

Thanks!
William

Linda K. Muthen posted on Saturday, March 03, 2012 - 8:47 am

The AUXILIARY (e) option is used for screening purposes, not for model estimation. You should include the distal outcome on the USEVARIABLES list. To control for the covariates, regress the distal outcome on the covariates. The effect of the latent class variable on the distal outcome is seen in the variation of the distal outcome's intercepts across classes.

William Arguelles posted on Thursday, March 08, 2012 - 10:43 am

Thank you for your help! Would you by any chance know of a good LPA/LCA reference that used a distal outcome and covariates in this manner? I can't seem to find any on your website. I am mostly interested in seeing how the results should be presented in general and interpreted.

Linda K. Muthen posted on Friday, March 09, 2012 - 3:47 pm

See the following paper which is available on the website:

Muthén, B., Brown, C.H., Masyn, K., Jo, B., Khoo, S.T., Yang, C.C., Wang, C.P., Kellam, S., Carlin, J., & Liao, J. (2002). General growth mixture modeling for randomized preventive interventions. Biostatistics, 3, 459-475.

See also papers by Hanno Petras.

William Arguelles posted on Wednesday, March 21, 2012 - 9:35 am

Thanks! I have another question I was hoping you could help me with. Is there a way to specify my models so that when I run various analyses (i.e., with different covariates, outcomes, etc.) the classes are extracted in the same order, so that I can more easily make comparisons between different models?

Bengt O. Muthen posted on Wednesday, March 21, 2012 - 9:01 pm

You can give key starting values.

davide morselli posted on Friday, September 21, 2012 - 2:44 am

Hi,
I am a bit confused. In this thread Linda said that the "auxiliary" function is to be used for exploratory purposes only. However, I understood from Asparouhov & Bengt Muthen (2012) and the related ppt presentation that "auxiliary" is meant to be used for substantive purposes. Could someone please clarify?

thank you

Davide Morselli

Linda K. Muthen posted on Friday, September 21, 2012 - 6:10 am

These are the new functions of AUXILIARY that will be in Version 7 not the current functions.

davide morselli posted on Friday, September 21, 2012 - 4:06 pm

Thank you linda.
So I cannot use the (r) specification for hypothesis testing? Even if the entropy is quite high (>.8)?

Davide

Linda K. Muthen posted on Saturday, September 22, 2012 - 3:34 pm

The (r) specification is an ad hoc approach. You should use the new 3-step approach for hypothesis testing.

davide morselli posted on Sunday, September 23, 2012 - 1:55 pm

Thank you again linda
Given that Mplus 7 has not been released yet, how can I do it? The Muthen and Clark paper showed only (e) and (r).

Davide

davide morselli posted on Sunday, September 23, 2012 - 2:01 pm

I mean: can I simply use the most likely class as a nominal DV, or should I weight it by its probability relative to the other classes' probabilities? In that case, is there a smart way to do it with Mplus (other than the r3step option)?

Linda K. Muthen posted on Sunday, September 23, 2012 - 6:08 pm

You should wait for R3STEP or do it by hand according to the instructions given in the handout where R2STEP is described.

anonymous posted on Tuesday, March 12, 2013 - 9:10 am

I'm running a LPA with three covariates. The covariates appear to not significantly differ across classes, but when I enter the covariates in the model using the 1-step method and look at a plot of the estimated probability as a function of covariate #1, it looks like this differs based on gender such that covariate #1 appears to differ across classes for females but not for males. Is there a way to empirically demonstrate whether two covariates interact in predicting the classes? Would it be appropriate to add an interaction term as a covariate or to replace gender and covariate #1 with the interaction term? Or would it be more appropriate to run this as a multi-group LPA, with males and females in separate groups?

Bengt O. Muthen posted on Tuesday, March 12, 2013 - 6:45 pm

Just create a product term between gender and covariate #1 and include all 3 variables in the prediction of c.
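A minimal Python sketch of what the product term does, assuming gender is coded 0/1. The variable names and coefficient values are hypothetical; in the logit for a given class, the slope of covariate #1 becomes b_x1 for gender = 0 and b_x1 + b_product for gender = 1.

```python
def with_product(rows):
    """Add the product of gender and x1 to each record as a third predictor."""
    return [dict(row, gender_x1=row["gender"] * row["x1"]) for row in rows]

def x1_slope(b_x1, b_product, gender):
    """Slope of x1 in the class logit for a given gender value (0 or 1)."""
    return b_x1 + b_product * gender

data = with_product([
    {"gender": 0, "x1": 2.5},
    {"gender": 1, "x1": 2.5},
])
print(data)

# Hypothetical coefficients: weak effect in one group, stronger in the other
print(x1_slope(b_x1=0.1, b_product=0.6, gender=0))
print(x1_slope(b_x1=0.1, b_product=0.6, gender=1))
```

The significance of the product term's coefficient is then the empirical check on whether the two covariates interact in predicting class membership.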

ian jantz posted on Thursday, May 09, 2013 - 7:32 am

Hi,
I selected a 4-class model based on 10 indicator variables. When I introduced covariates, it seems as if a substantial number of individuals were reclassified. As such, class prevalences in the model without covariates differed from those in the model with covariates. Is there a resource which provides some guidance for conducting measurement non-invariance investigations of indicator variables? The first posts in this thread (October 25 through November 4, 2004 from J.W. and professors Muthen) were very helpful. But, I have some basic questions about diagnosing measurement non-invariance, what counts as substantial reclassification, the steps to isolate the indicator variables and covariates responsible for the non-invariance, and some potential solutions to addressing the issue. Any guidance is much appreciated.

Linda K. Muthen posted on Thursday, May 09, 2013 - 11:32 am

Have you read the following paper which is available on the website:

Muthén, B. (2004). Latent variable analysis: Growth mixture modeling and related techniques for longitudinal data. In D. Kaplan (ed.), Handbook of quantitative methodology for the social sciences (pp. 345-368). Newbury Park, CA: Sage Publications.

ian jantz posted on Friday, May 10, 2013 - 1:50 pm

Hi Dr. Muthen,
I read Muthen (2004) and it was very useful. I guess one basic question I have is what counts as substantial reclassification. When I introduce one combination of covariates, class prevalence changes very little, say, 2 or 3%. However, when I introduce all covariates of interest, prevalence for one class changes by about 12%. Thanks so much.

Linda K. Muthen posted on Friday, May 10, 2013 - 5:58 pm

This is really a substantive decision. Besides looking at the percent changes in the classes, you should look at the individual changes in posterior probabilities.

anonymous posted on Sunday, February 23, 2014 - 8:33 pm

In response to the following user question "ONE OR MORE MULTINOMIAL LOGIT PARAMETERS WERE FIXED TO AVOID SINGULARITY OF THE INFORMATION MATRIX. SINGULARITY IS MOST LIKELY BECAUSE THE MODEL IS NOT IDENTIFIED, OR BECAUSE OF EMPTY CELLS IN THE JOINT DISTRIBUTION OF CATEGORICAL LATENT VARIABLES AND ANY INDEPENDENT VARIABLES. FOLLOWING PARAMETERS WERE FIXED: 90 93 ....

Am I simply getting this warning to alert me that there are no BLK in Class 4 nor BLK/HISP in Class 5? If so,is this a warning I can ignore?"

Bengt responded "yes and yes" on December 14, 2007 (see above).

I have a follow-up question. I am in a similar situation as that user, I have one class with no males (gender was entered as a covariate); however, this makes sense based on the characteristics of the class so I haven't been concerned. Is there a way to still calculate and report inferential statistics for gender, though?
Thanks!

Linda K. Muthen posted on Monday, February 24, 2014 - 10:32 am

Please send the output and your license number to support@statmodel.com. It depends which parameters are fixed and why. I would need to see to be sure.

If a covariate has no variability within a class, you cannot estimate the regression coefficient.

emilie.ferrat posted on Tuesday, February 25, 2014 - 6:59 am

good afternoon

I don't think I have received a response to my question.

1/ What is the difference between active covariates and inactive covariates?
2/ Is it possible to use active covariates, then estimate posterior class membership and assess the ORs between the classes (the dependent variable) and the covariates (the same X variables that were active and helped define the classes)? Or is that an error, so that it is better to use the posterior probabilities from the model without active covariates?

3/ When is it better to use active or inactive covariates?

thanks very much
E

Bengt O. Muthen posted on Tuesday, February 25, 2014 - 12:16 pm

I don't know what active and inactive covariates are.

emilie.ferrat posted on Wednesday, February 26, 2014 - 12:05 am

Dear Prof Muthen,

I have seen active and inactive covariates used in Latent GOLD. When covariates are inactive, they do not affect the sizes or the estimates of the classes, because they are not included in the model; when they are active, however, the estimates for the indicators may differ. So I did not know whether covariates should be used as active or inactive...
I have this problem because I think that my covariates are associated with the indicators, which could explain why the definition of the classes differs...

Excuse me for my bad English
best regards
Emilie

Jon Heron posted on Wednesday, February 26, 2014 - 9:22 am

I've always felt that LG's inactive-covariates approach was the same as Mplus' Pseudo-Class draws (auxiliary with "(r)").

You may not get perfect agreement across the packages though because LG does something to deal with covariate missingness.

Bengt O. Muthen posted on Wednesday, February 26, 2014 - 4:13 pm

Adding to Jon's comments, it sounds to me like the inactive covariates would be best handled by the Mplus 3-step method done by the Auxiliary R3STEP option. But if you suspect direct effects from some covariates to some latent class indicators you would want to take a regular (1-step) approach where all covariates are "active".

anonymous posted on Friday, February 28, 2014 - 12:01 pm

As a follow-up to my February 23rd question above, if a parameter is fixed in say a four class model but was not in the three class model, can Likelihood Ratio Tests such as the Lo-Mendell-Rubin Test (tech 11) still be interpreted?

Bengt O. Muthen posted on Friday, February 28, 2014 - 1:57 pm

No.

emilie.ferrat posted on Saturday, March 01, 2014 - 1:14 am

dear Prof Muthen,

thanks for your previous answer.
I think the covariates are associated with the indicators and have direct effects on the estimates as well as on the number of classes. However, can I use the classification obtained after including the covariates and then, using posterior membership (as in the three-step approach), assess associations (ORs) between the classes and other variables such as mortality, hospitalisation, or other characteristics?

Or is that an error, and we have to use only the classification obtained without the covariates?

Bengt O. Muthen posted on Monday, March 03, 2014 - 11:52 am

You can use a model where covariates have both an effect on the latent class variable and some (but not all) of the latent class indicators. If you have strong direct effects, 3-step methods are not suitable - we describe this in our Oct 28 3-step paper on our website.

MJKim posted on Friday, March 07, 2014 - 11:09 am

Dear Drs. Muthen,

I have a question about running a regression mixture model, which is presented as Example 7.1. In the example, although the correlation between X1 and X2 is included in the figure, the "X1 with X2" statement is not included in the MODEL command. When I included the statement, I didn't get results because the program told me that I had to add ALGORITHM=INTEGRATION;. After adding it to the ANALYSIS command, I could run the model, but the results were different from what I got using the original statement (as shown in the example).
1. Which one should be used if I want to know the relationship between X1 and X2?
2. If I need to add "X1 with X2", does the statement not correspond to the presented figure? Thank you so much for your help.

Linda K. Muthen posted on Friday, March 07, 2014 - 3:48 pm

In regression, the model is estimated conditioned on the observed exogenous variables. Their means, variances, and covariances are not model parameters. We show the covariance between x1 and x2 because it is not zero during model estimation. If you want to know its value, ask for SAMPSTAT or TYPE=BASIC.

emilie.ferrat posted on Thursday, March 20, 2014 - 3:32 pm

dear Prof muthen

I have a problem in an LCA of health profiles in patients with a specific disease, with concomitant variables.

1/ in the model without covariates --> BIC and BLRT both point to the 3-class solution
2/ including concomitant variables (age, sex, tumor site) -> BIC says 3 classes is better, but BLRT favors 4 classes (which also appears more realistic)

moreover, the prevalence of the classes is different with and without covariates

I have noted that the concomitant variables were strongly associated with the indicators (all p<0.001)

so what is better, and how should I deal with this problem?

- keep the 4-class solution with covariates?
- keep the 3-class solution with covariates?
- keep the 3-class solution without covariates?
- or turn the covariates into indicators (age, sex and tumor site as indicators)?

Moreover, if I want to use the 3-step method to assess associations between the classes and other variables (not included in the model) such as healthcare utilization, how should I do it? Should I use the classification obtained with covariates or without covariates?

Thanks very much for the answers you could provide to these questions...

best regards
Emilie

Bengt O. Muthen posted on Friday, March 21, 2014 - 8:37 am

I would go with BIC for simplicity.

It is not necessarily a problem that the concomitant variables (I call them covariates) are strongly related to the latent class indicators - if the covariates influence the latent class variable, that implies that the two sets of variables are related.

The easy way out to handle covariates is to use the new 3-step approach of R3STEP described in our Web Note 15 (or, DCAT/DCON for distal outcomes). But if you really want to understand the class prevalence differences you mention you want to explore direct effects from covariates to latent class indicators.

emilie.ferrat posted on Friday, March 21, 2014 - 9:38 am

thank you prof Muthen
but I did not understand the last sentence

if I want to assess ORs between the classes and other variables (different from the covariates, e.g., institutionalisation), do I need to choose the classification obtained with covariates (i.e., age, sex) or the classification obtained without covariates?

thank you

emilie.ferrat posted on Sunday, March 23, 2014 - 11:42 am

Excuse me, Prof Muthen

how can I establish whether there is a relationship between a covariate and the latent class variable?
- with the p-value between the covariate and the LC?
- because adding the covariates changes the estimates? (but from what size of change can we assert it?)

Bengt O. Muthen posted on Sunday, March 23, 2014 - 12:07 pm

The answers are in Web Note 15 on our website:

Asparouhov & Muthén (2013). Auxiliary Variables in Mixture Modeling: 3-Step Approaches Using Mplus. Accepted for publication in Structural Equation Modeling. An earlier version of this paper is posted as web note 15. Appendices with Mplus scripts are available here.

emilie.ferrat posted on Thursday, March 27, 2014 - 12:37 pm

Good evening

I have read this article, but (because of my difficulties with English) it is not clear to me how to assess whether covariates have strong or no effects on the LC, or the relationship between the LC and the covariates (p-value? something else?). This is important because I have read that in the case of strong effects, one-step is a better approach than 3-step, as you also answered in a previous question.
Furthermore, it seems important for my research because without covariates BLRT and AIC3 both point to 3 classes, whereas with active covariates BLRT and AIC3 both point to 4 classes... so the interpretation is not the same. (BIC always points to 3 classes, but Nylund showed that BLRT performed the best)

thank you very much
best regards
Emilie (from France)

Bengt O. Muthen posted on Thursday, March 27, 2014 - 2:48 pm

To check for direct effects from covariates to latent class indicators, you include in your model a regression of one indicator at a time on all the covariates and check for significant effects. You then include all such significant effects when you assess BIC for that number of classes.

CB posted on Monday, October 20, 2014 - 1:34 pm

I'm conducting LCA with 4 indicator variables, but I want to include a binary exogenous variable that has only a direct effect on one indicator variable (with 3 categories). I have tried coding it as the indicator (U) regressed on the exogenous variable (X):

U on X

but the output reports the error that a nominal variable may not appear on the right-hand side of an ON statement. (Currently, I have my binary exogenous/independent variable as a nominal variable).

My questions are:
1) Why can a nominal variable not appear on the right-hand side of an ON statement?
2) How else can I code the exogenous variable to have an effect on my indicator?

Linda K. Muthen posted on Monday, October 20, 2014 - 1:56 pm

Covariates must be binary or continuous. You need to create a set of dummy variables for your nominal variable. It should not be put on the NOMINAL list. This is for dependent variables. In regression, covariates are treated as continuous variables.
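As a rough illustration of the dummy-variable step for a nominal covariate with more than two categories, the following Python sketch builds K-1 indicator columns, leaving one category out as the reference. The function name, the example variable `site`, and the choice of reference category are all hypothetical. A covariate that is already binary and coded 0/1 can be used as is.

```python
def dummy_code(values, reference):
    """Return one 0/1 list per non-reference category, keyed by category."""
    categories = sorted(set(values) - {reference})
    return {cat: [1 if v == cat else 0 for v in values] for cat in categories}

# Hypothetical nominal covariate with categories 1, 2, 3 (1 is the reference).
site = [1, 3, 2, 1, 3]
dummies = dummy_code(site, reference=1)

# Two dummy columns result: one for category 2 and one for category 3.
for category, column in dummies.items():
    print(category, column)
```

Each resulting column would then be written to the data file and named on the right-hand side of the ON statement in place of the original nominal variable.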

CB posted on Monday, October 20, 2014 - 7:34 pm

Thanks for your quick response!

I have an additional follow-up question. I've read that CATEGORICAL includes binary and polytomous indicators, but in the Mplus Users Guide, it says that CATEGORICAL includes binary and ordered categorical indicators and NOMINAL includes polytomous indicators. So I'm confused how polytomous indicators are coded and how both binary and polytomous indicators can be coded to conduct LCA.

With that said, I'm conducting LCA with 4 indicators - 2 binary (X1 and X2) and 2 categorical indicators with 3 levels each (X3 and X4). What would be the appropriate code for this?

Linda K. Muthen posted on Tuesday, October 21, 2014 - 11:03 am

CATEGORICAL is for binary and ordered-categorical or ordered polytomous variables. NOMINAL is for binary or unordered-polytomous variables.

Daniel Lee posted on Friday, April 24, 2015 - 8:46 pm

Hi Dr. Muthen,

I obtained odds ratios in my conditional growth mixture model (looking at between-class effects). However, I could not find significance tests (p-values, confidence intervals) for these odds ratios. Is there a command I should type in the input to obtain these significance tests?

Thank you so much, as always!

Bengt O. Muthen posted on Saturday, April 25, 2015 - 1:07 pm

If they are not printed, you have to compute these. See the 2 FAQs on our website:

Odds ratio confidence interval from logOR estimate and SE

Odds ratio interpretation with a nominal DV in multinomial logistic regression
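For readers who want the arithmetic, here is a minimal Python sketch of the usual Wald-type calculation from a log odds ratio estimate and its standard error. It is a generic textbook computation and may differ in detail from what the FAQs describe. The function names and the example numbers are hypothetical.

```python
import math

def odds_ratio_ci(log_or, se, z=1.96):
    """Return (OR, lower, upper): the CI is built on the log scale, then exponentiated."""
    lower = math.exp(log_or - z * se)
    upper = math.exp(log_or + z * se)
    return math.exp(log_or), lower, upper

def wald_p_value(log_or, se):
    """Two-sided p-value for H0: log OR = 0 under a normal approximation."""
    z = log_or / se
    return math.erfc(abs(z) / math.sqrt(2.0))

# Hypothetical estimate: logit coefficient 0.40 with standard error 0.15.
print(odds_ratio_ci(0.40, 0.15))
print(wald_p_value(0.40, 0.15))
```

The interval is asymmetric around the odds ratio because it is symmetric on the log scale. An interval that excludes 1 corresponds to a coefficient that differs significantly from 0.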

Brian C posted on Friday, January 29, 2016 - 2:27 pm

Hi Drs. Muthen,

In published papers I have read on LCA, usually the researchers identify the "best" fitting class at the model building stage, by starting with 1 class and up with just the indicators (without covariates) and comparing the BIC, etc. Then the class prevalence and posterior probabilities are presented and interpreted.

After the best class solution is identified (e.g., 3-class) I would see an analysis of association between the classes and covariates (e.g., logistic/multinomial logistic regression), which I understand can be done via the auxiliary option or in a single-step approach.

But what is unclear to me is that when the researchers present association analysis, they don’t address the fact that the 3 classes are not necessarily the same 3 classes identified in the model-building stage (when no covariates were considered). Can I assume that that is because they used the auxiliary option (but which one, R?) such that the analysis of association between classes and covariates does NOT change the unconditional 3-class model?

Conversely, if the single-step approach is used, does that mean the covariates need to be included at the model-building stage (i.e., from 1 class up)? I haven’t see this done as all LCA papers I have seen simply start with the indicators, and only after the best solution is identified would they venture into covariates.

Thanks!

Bengt O. Muthen posted on Friday, January 29, 2016 - 6:08 pm

That's a long story. Part of it is described in the paper on our website:

Asparouhov, T. & Muthén, B. (2014). Auxiliary variables in mixture modeling: Three-step approaches using Mplus. Structural Equation Modeling: A Multidisciplinary Journal, 21:3, 329-341. The posted version corrects several typos in the published version. An earlier version of this paper was posted as web note 15. Download appendices with Mplus scripts.

You are right that a 3-step method (such as using R3STEP) can overlook matters that change the class formation when including covariates in the model (one-step). This includes cases where there are some direct effects of covariates on some of the LCA indicators - which is a case of measurement noninvariance that would seem to be quite common. Unfortunately, this is often overlooked.

Brian C posted on Sunday, January 31, 2016 - 12:08 pm

Thanks, Dr. Muthen.

Kristine Amlund Hagen posted on Wednesday, February 03, 2016 - 1:40 am

I am running a series of mixture models in several steps to build a GMM of social skills over 5 time points, with the following recommended steps: a) testing an unconditional single-class model, b) specifying an LCGA model comparing 2-, 3-, and 4-class solutions (comparing BIC, entropy, theoretical considerations, etc.), c) testing a conditional LCGA with the best-fitting model from step b), adding our hypothesized covariates, d) specifying a conditional growth mixture model (GMM), entering the covariates, e) testing whether the covariates have different effects on the growth factors within each class, and f) finally, testing whether parameter estimates are replicated using OPTSEED.

Q1: How do I determine whether the model in e) is better or worse than the one in d)? Do I look at significance of the class-specific estimates of the covariates? Or the BIC, entropy etc. ? or all of these combined?

In the modification indices, I get (among many other modification suggestions):

Class 1
Means/Intercepts/Thresholds

[ INSSRS4 ] 30.381 -4.866 -4.866 -0.351
[ INSSRS5 ] 23.759 -5.291 -5.291 -0.390
[ INSSRS6 ] 112.658 19.717 19.717 1.432

Q2: should I and how do I change my input to allow for these modification suggestions in my model?

Kristine Amlund Hagen posted on Wednesday, February 03, 2016 - 1:43 am

Hi again Drs. Muthen,
to follow up on my previous questions:

I have found that a 3-class solution works best according to the various indices.
In the output under MODEL RESULTS, I get, for Class 1, that
I ON x is -0.96 (p<0.01) and S ON x is 0.08 (p = 0.6).

Q3: Does this mean that the greater the starting-out score for Class 1, the lower the score on x? And that the greater the score on x, the more upward the slope goes (almost sig.)?

Q4: When I look at the plots for my 3 classes, they have clearly different trajectories, one starting high, going down, one starting in the middle and staying stable, and one starting low and going up. When I look at the Intercepts for the slope factors for the 3 classes, none of them reach significance. For class 2, it makes sense, but not for the other two. I think the ‘problem’ is that all the SEs are big.

Q5. Does that indicate that the individuals in each of these classes are too dissimilar in their slopes? And how can I can remedy this situation?

Thank you!

Bengt O. Muthen posted on Thursday, February 04, 2016 - 7:02 pm

Please don't double post.

Q1. Use BIC

Q2.No, we have found that Modindices are not quite reliable for mixture models.

Q3. First part: No, you have the causality reversed.
Second part: Yes.

Q4. This significance is not needed.

Q5. This is not a problem.

Ali posted on Monday, February 15, 2016 - 4:53 am

I am using an LCA model with 4 nominal indicators and covariates (country and gender). In the output, I want to see whether country or gender significantly predicts latent class membership.

So, I took a look
LOGISTIC REGRESSION ODDS RATIO RESULTS
Categorical Latent Variables
C#1 ON
CNT 0.976
GENDER 0.295
C#2 ON
CNT 1.172
GENDER 0.316
But there seems to be no significance test showing whether the covariates are significant. Or should I look at another part of the output?

Linda K. Muthen posted on Monday, February 15, 2016 - 6:39 am

Please send the output and your license number to support@statmodel.com.

Meghan Schreck posted on Saturday, April 16, 2016 - 4:27 am

Hello,

If I have two covariates in my LCA model and one is categorical and the other is continuous, do I need to standardize the continuous covariate if I want to calculate the class probabilities by hand or Model Constraint (e.g., logit= intercept + b1(covariate1) + b2(covariate2); exp(logit)/sum)??

Thank you!

Meghan Schreck posted on Saturday, April 16, 2016 - 4:38 am

I'd like to investigate whether a covariate predicts latent class prevalence by gender, but not latent class structure.

I included gender as a grouping variable and fixed item response probabilities across gender. Next, I:

1) compared a model where class prevalences were freely estimated across gender to a model where they were fixed across gender. Both models had the covariate predicting class (c on covariate). Does this model comparison accurately test whether the covariate is a better predictor when class prevalences are free vs. fixed across gender?

2) I compared the best-fitting model from step 1 to a model without the covariate to see if the covariate was a significant predictor of class membership.

Do these comparisons answer my question if my covariate predicts class membership by gender?

Bengt O. Muthen posted on Sunday, April 17, 2016 - 4:09 pm

First post: No.

Second post: I am not quite sure what you are doing. You say "fixed" but I wonder if you really fix or just hold equal.

The class prevalence is different across gender if you regress your latent class variable (c) on gender (cg). The item probabilities are different across gender if you let them vary across cg classes.
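On the first answer, one way to see why standardizing is not needed for a by-hand calculation is that rescaling a covariate only re-expresses the intercept and slope and leaves the class probabilities unchanged. The Python sketch below checks this for a 3-class model with the last class as reference (logit fixed at 0). All coefficients, the covariate mean, and the standard deviation are hypothetical.

```python
import math

def class_probs(logits):
    """Multinomial probabilities; the reference (last) class has logit 0."""
    full = list(logits) + [0.0]
    exps = [math.exp(v) for v in full]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical estimates for c#1 and c#2 (class 3 is the reference).
INTERCEPTS = [0.4, -0.3]
B_BINARY = [0.5, -0.2]   # slopes for the categorical (0/1) covariate
B_CONT = [0.03, 0.01]    # slopes for the continuous covariate, raw metric
MEAN, SD = 50.0, 10.0    # assumed sample mean and SD of the continuous covariate

def probs_raw(x_bin, x_cont):
    """Use the raw covariate with the raw-metric coefficients."""
    return class_probs([a + b1 * x_bin + b2 * x_cont
                        for a, b1, b2 in zip(INTERCEPTS, B_BINARY, B_CONT)])

def probs_standardized(x_bin, x_cont):
    """Use the z-score with the correspondingly rescaled intercept and slope."""
    z = (x_cont - MEAN) / SD
    return class_probs([(a + b2 * MEAN) + b1 * x_bin + (b2 * SD) * z
                        for a, b1, b2 in zip(INTERCEPTS, B_BINARY, B_CONT)])

print(probs_raw(1, 62.0))
print(probs_standardized(1, 62.0))
```

What matters is consistency: the covariate values plugged into the formula must be in the same metric as the one used when the coefficients were estimated.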

Jin Qu posted on Wednesday, August 10, 2016 - 11:52 am

Dr. Muthen, I am conducting a LPA analysis and I have obtained 4 classes. Now I included an interaction term (AxB, both continuous variables) as a predictor of class membership. The results showed that this interaction term is significant in predicting the likelihood of being in class 2 in comparison to being in class 1. Now I wonder how can I probe this interaction further? Can I do it in Mplus?

Bengt O. Muthen posted on Wednesday, August 10, 2016 - 2:16 pm

Look at our page

http://www.statmodel.com/Mediation.shtml

under the heading

Moderated mediation plot based on User's Guide ex 3.18

This shows that you can use the LOOP and PLOT options to plot the interaction and its confidence intervals. Although this is for linear regression, the same principles apply for a nominal DV.

This plotting is further described in our new book:

http://www.statmodel.com/Mplus_Book.shtml

Jin Qu posted on Wednesday, August 10, 2016 - 7:24 pm

Dr. Muthen,
Thanks for your quick reply. I am a bit confused by the model described in this link: http://www.statmodel.com/Mediation.shtml.

From my understanding, the example in 3.18 describes a model in which m mediates the link between x and y, and z moderates the link between x and m (y on m; m on x z xz).

The model in this Addendum is:
y on m xz z;
m on z xz x;

Is this still the same model as 3.18?

Also, am I understanding correctly that the analysis I want to pursue is a multinomial regression, so I should get the membership variable (1, 2, 3, 4) out of the LPA and use this new variable as my dependent variable, rather than running the LPA and probing the interaction at the same time?

Bengt O. Muthen posted on Thursday, August 11, 2016 - 1:59 pm

Your model is not the same as 3.18 but the ideas for exploring interactions are the same. You can probe interactions in a single analysis.

Jin Qu posted on Friday, August 12, 2016 - 4:10 pm

Dr. Muthen,

Thanks for your reply. I am able to obtain the plot for the interaction. However, when I added the BOOTSTRAP option to see whether the slopes in the interaction graph are significant, I received an error message that says "BOOTSTRAP is not available for estimators MLM, MLMV, MLF and MLR." Would you mind taking a look at my code (c is the variable that I obtained from the LPA using "cprobabilities"; c has 4 classes, and I want to use mRS, mSC and mRSxmSC to predict c)?

nominal is c;
define:
center mRS_6 mSC_6 (grand mean);
mSCpRS = mRS_6*mSC_6;

analysis:
bootstrap=500;

Model:

c#1 on mRS_6 (b1)
mSC_6 (b2)
mSCpRS (b3);

c#2-c#3 on mRS_6 mSC_6 mSCpRS;

MODEL CONSTRAINT:
PLOT(lowSC highSC);
LOOP(mRS_6,-2,2,0.5);

lowSC = (b1+b3*(-2.26))*mRS_6+b2*(-2.26);
highSC = (b1+b3*(2.26))*mRS_6+b2*(2.26);

Plot:
TYPE = PLOT2;

output: CINTERVAL(Bootstrap);

I assume that in this input, I am testing the significance (Confidence intervals) of the interaction to predict class 1 vs. 4?

Linda K. Muthen posted on Saturday, August 13, 2016 - 6:48 am

You should use ML with bootstrap. The other estimators have their own particular standard errors. ML with bootstrap will give ML parameter estimates and bootstrap standard errors.

The reference class is 4. The confidence intervals are for each parameter estimate.
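To make explicit what the two lines in the MODEL CONSTRAINT section above compute, here is a small Python sketch that evaluates the same expressions over the LOOP grid (-2 to 2 in steps of 0.5). The values are on the logit scale for class 1 versus reference class 4 and, as in the posted input, leave out the intercept. The coefficient values b1, b2, b3 are hypothetical placeholders for the estimates labeled in the MODEL command.

```python
def simple_slope_line(b1, b2, b3, moderator, x_values):
    """Logit contribution (b1 + b3*moderator)*x + b2*moderator for each x."""
    return [(b1 + b3 * moderator) * x + b2 * moderator for x in x_values]

# Same grid as LOOP(mRS_6,-2,2,0.5): nine points from -2 to 2.
x_grid = [-2.0 + 0.5 * i for i in range(9)]

# Hypothetical estimates for the labeled coefficients.
b1, b2, b3 = 0.3, -0.1, 0.2

low_sc = simple_slope_line(b1, b2, b3, -2.26, x_grid)   # moderator fixed at -2.26
high_sc = simple_slope_line(b1, b2, b3, 2.26, x_grid)   # moderator fixed at 2.26

for x, lo, hi in zip(x_grid, low_sc, high_sc):
    print(x, round(lo, 3), round(hi, 3))
```

The slope of each line is b1 + b3 times the chosen moderator value, so the two lines differ in slope exactly when b3 is non-zero.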

'Alim Beveridge posted on Friday, November 18, 2016 - 8:46 am

Dear Linda and Bengt,

I have a few basic questions about LCA:

1. What is the difference between regressing a latent class variable c on an independent variable x1 (as in example 7.1 in the UG), and treating x1 as a latent class predictor (using AUXILIARY and R3STEP)?
2. What is the difference between treating a binary variable g (indicating sex for instance) as a categorical latent variable which has known class (group) membership, using KNOWNCLASS (as in example 7.21 in the UG) and treating it as a latent class predictor (using AUXILIARY and R3STEP)?
3. Must COUNT variables have only positive integer values?
4. What criteria or cutoff should one use to decide whether a COUNT variable should be treated as zero-inflated?
5. What criteria should one use to decide whether a variable should be treated as truncated? (e.g., should a percentage be treated as truncated?)
6. Can the differences in parameter estimates (means, probabilities) across classes be tested to see if the difference is statistically significant?

Thanks,
'Alim

Bengt O. Muthen posted on Friday, November 18, 2016 - 2:17 pm

Mplus Discussion is not really the place to learn about basic LCA but I will give some quick answers:

1. No difference if you have only 1 x. But if you have more x's, the difference is that c on x does not assume that the x's are uncorrelated. But as indicators, LCA assumes that the x's are uncorrelated within class.

2. None unless the predictor also has a direct effect on the LCA indicators.

3. Positive or zero.

4. BIC

5. Usually by having a strong floor or ceiling effect, say 25% or more.

6. Yes, for instance using Model Constraint.

'Alim Beveridge posted on Monday, November 21, 2016 - 3:58 am

Thank you, Bengt.
I have rerun my LCA, recasting 4 variables that had a floor effect of 25% or more as censored from below. This has led to a lower BIC, AIC, and aBIC.
I tried also declaring all 4 censored variables as inflated. For the 2-class solution this also produces lower indices, but for the 3-class solution they are almost identical.
In the 3-class solution, 2 means for the inflated variables were set at -15, one is not significant, and one is significant (this is the same within all 3 classes). Can I conclude from this whether it is better to treat all 4 (or some) of these censored variables as inflated in the 3-class solution?

Thanks!

Bengt O. Muthen posted on Monday, November 21, 2016 - 4:50 pm

Hard to say. I would probably not use inflation here given your results.

'Alim Beveridge posted on Thursday, November 24, 2016 - 8:00 pm

Dear Bengt and Linda,

I have an ordinal variable (a measure of organization size) which I wish to include as a control variable in my LCA, as oneway ANOVAs have shown that there are significant differences in most of the indicators across the levels of this variable.
1) Is the correct approach to simply include it as another indicator or should it be declared as auxiliary?
2) When including it as an indicator, I have tried both the CATEGORICAL and NOMINAL types (in case the effect of size is not monotonic). I planned to use BIC to decide which type to use, but both produced the same indices (BIC, AIC, etc.). Is this always the case? Or does this mean that in this particular case the results are unaffected by which type I use for this variable?

Thanks and happy Thanksgiving!

Guiyun Hou posted on Friday, November 25, 2016 - 3:19 am

Dear Bengt and Linda,
I have some problems with an LTA. I have 3 time points and 20 items per time point, and all of the items are continuous. When I run Mplus, it does not output the BIC, and it gives me the warnings shown below. Following the warning, I set STARTS to 500 50, but it still does not work. However, when I reduce the number of items to 16, the AIC and BIC values are reported. Looking forward to your reply.

syntax :
TITLE: LTA Model
DATA:FILE is 99.dat;
VARIABLE:
NAMES ARE a1-a20 b1-b20 c1-c20;
CATEGORICAL = ;
CLASSES = L1(3) L2(3) L3(3);
USEVAR =a1-a20 b1-b20 c1-c20;
ANALYSIS:
TYPE=MIXTURE;
MODEL:
MODEL L1:
%L1#1%
[a1- a20] (1-20);
%L1#2%
[a1- a20] (21-40);
%L1#3%
[a1- a20] (41-60);
MODEL L2:
%L2#1%
[b1- b20] (1-20);
%L2#2%
[b1- b20] (21-40);
%L2#3%
[b1- b20] (41-60);
MODEL L3:
%L3#1%
[c1- c20] (1-20);
%L3#2%
[c1- c20] (21-40);
%L3#3%
[c1- c20] (41-60);
OUTPUT:
TECH1 TECH8;
SAVEDATA:
FILE is 99out.dat;
SAVE = cprob;

WARNING: THE BEST LOGLIKELIHOOD VALUE WAS NOT REPLICATED. THE SOLUTION MAY NOT BE TRUSTWORTHY DUE TO LOCAL MAXIMA. INCREASE THE NUMBER OF RANDOM STARTS.
STARTING VALUES. PROBLEM INVOLVING PARAMETER 77.

Bengt O. Muthen posted on Friday, November 25, 2016 - 10:54 am

Hard to say without seeing the full output - you can send it to Support along with your license number.

I assume you have done an LCA for each time point and had no problems with the 20 items.

Bengt O. Muthen posted on Friday, November 25, 2016 - 11:01 am

Answer to Alim:

1) I see a control variable as a covariate that influences the latent class variable (and perhaps some indicators directly).

2) Categorical and Nominal typically produce different number of parameters (unless it is for a binary outcome) in which case BICs would not be the same. I can't say what's going on in your case without seeing the full output.

Guiyun Hou posted on Friday, November 25, 2016 - 7:13 pm

Dear Bengt and Linda,
my output is as follows, thanks for your help.
WARNING: WHEN ESTIMATING A MODEL WITH MORE THAN TWO CLASSES, IT MAY BE
NECESSARY TO INCREASE THE NUMBER OF RANDOM STARTS USING THE STARTS OPTION
TO AVOID LOCAL MAXIMA.

WARNING: THE BEST LOGLIKELIHOOD VALUE WAS NOT REPLICATED. THE
SOLUTION MAY NOT BE TRUSTWORTHY DUE TO LOCAL MAXIMA. INCREASE THE
NUMBER OF RANDOM STARTS.

THE MODEL ESTIMATION DID NOT TERMINATE NORMALLY DUE TO AN ILL-CONDITIONED
FISHER INFORMATION MATRIX. CHANGE YOUR MODEL AND/OR STARTING VALUES.

THE MODEL ESTIMATION DID NOT TERMINATE NORMALLY DUE TO A NON-POSITIVE
DEFINITE FISHER INFORMATION MATRIX. THIS MAY BE DUE TO THE STARTING VALUES
BUT MAY ALSO BE AN INDICATION OF MODEL NONIDENTIFICATION. THE CONDITION
NUMBER IS 0.260D-20.

THE STANDARD ERRORS OF THE MODEL PARAMETER ESTIMATES COULD NOT BE
COMPUTED. THIS IS OFTEN DUE TO THE STARTING VALUES BUT MAY ALSO BE
AN INDICATION OF MODEL NONIDENTIFICATION. CHANGE YOUR MODEL AND/OR
STARTING VALUES. PROBLEM INVOLVING PARAMETER 77.

Linda K. Muthen posted on Saturday, November 26, 2016 - 6:15 am

Please send the output and your license number to support@statmodel.com.

'Alim Beveridge posted on Saturday, November 26, 2016 - 10:28 am

Dear Bengt,

Thanks for your response. I believe I now understand how to proceed, but I want to double check I have understood correctly based on your responses in this forum and various articles on LCA.

1. Because I suspect the covariates have a direct effect on some indicators, I should use single-step regression (one-step method).

2. However, for class enumeration I should first only use the indicators. Once I have determined the number of classes, I should include covariates via single-step regression. Not only is this common practice, it is the best approach according to Nylund-Gibson & Masyn's (2016) recent MC simulation study.

3. Covariates must be binary or continuous. Whether I want to treat the categorical covariate 'size' as ordinal or nominal, I should create K-1 dummies for it.

4. Because I suspect a direct effect between my covariates and indicators, I should include in my model "a regression of one indicator at a time on all the covariates and check for significant effects" (you wrote this above). The final model should only retain significant effects.

Have I gotten it right?

Thanks for your help.

Bengt O. Muthen posted on Saturday, November 26, 2016 - 5:06 pm

Correct on all 4 - you get an A.

'Alim Beveridge posted on Monday, December 12, 2016 - 9:00 am

Dear Bengt,

I am trying to implement some of the models shown in Morin & Marsh (2015) with my data. According to the syntax provided with the article, when all indicators are regressed on a covariate, one must explicitly establish conditional independence by constraining all covariances between indicators to 0. I try to do that but get an error because some of my indicators are censored-inflated:
*** ERROR
One or more MODEL statements were ignored. These statements may be incorrect.

How can I set the covariances between censored-inflated indicators and other indicators to 0?

Thanks,
'Alim

Bengt O. Muthen posted on Monday, December 12, 2016 - 4:26 pm

I assume you use ML. I am surprised that you find covariances (WITH output) between the indicators (I assume you are talking about a factor model.) ML for censored does not allow WITH statement.

'Alim Beveridge posted on Tuesday, December 13, 2016 - 3:43 am

Dear Bengt,
This is a mixture model (TYPE=MIXTURE) with continuous and censored variables (I set them as censored-inflated), also known as latent profile analysis (LPA). In Morin & Marsh (2015), four models are shown. In all four, a direct effect is added from a covariate (a control) to all indicators, but not from that covariate to the latent class variable (it is not meant to be a latent class predictor).
I'm trying to create the first model with my data. The syntax provided (in the supplement) indicates that, because of the direct paths from the covariate to all indicators, one must explicitly establish conditional independence by constraining all covariances between indicators to 0. (By the way, is there a shortcut for doing this, since I have 20 indicators?)

When I tried I got the error I mentioned for all covariances between a continuous indicator and a censored-inflated var that I tried to set to 0:
*** ERROR
One or more MODEL statements were ignored. These statements may be incorrect.

I am wondering what to do. How can I tell Mplus I want it to assume conditional independence?

Morin, A. J. S., & Marsh, H. W. 2015. Disentangling shape from level effects in person-centered analyses: an illustration based on university teachers’ multidimensional profiles of effectiveness. Structural Equation Modeling, 22(1): 39–59.

Bengt O. Muthen posted on Tuesday, December 13, 2016 - 6:24 pm

Please first check if you get "WITH" estimates that show that you violate conditional independence. I don't see why that would occur. If you need to, send output to Support.

WEN Congcong posted on Wednesday, January 11, 2017 - 12:24 am

Dear professors,

Hello, I want to illustrate local independence for latent profile models, but I don't know whether there is something wrong with my illustration.

The variance of an observed variable has three possible origins: the variance explained by the factor, the factor variance, or the measurement error. For the observed variables to covary, only the factor variance and the measurement error need to be considered, because factors can explain the outcomes but not the reverse. In every situation, whether traditional FA or LPA, the residuals have 0 covariances because that is a model assumption. In LPA, the model has no factors and can therefore be regarded as having 0 factor variance. Hence, with 0 factor variance and 0 residual covariances, the observed variables cannot covary.
If this illustration is correct, I think the residual variance includes only measurement error. But in the paper "Investigating population heterogeneity with factor mixture models," the residual variance includes both the factor variance and the measurement error.

Thank you for shedding light on this!

Bengt O. Muthen posted on Wednesday, January 11, 2017 - 4:35 pm

A set of variables is correlated if they are all influenced by the same latent class variable. The latent variable does not need to be continuous.

WEN Congcong posted on Wednesday, January 11, 2017 - 6:36 pm

But in the paper "Performance of Factor Mixture Models as a Function of Model Size, Covariate Effects, and Class-Specific Parameters," it says: "The latent profile model can be represented as a special case of the factor mixture model where residual factor scores have 0 variance. As a result, the covariance matrix of observed variables Y conditional on class membership equals the covariance matrix of the residuals. Because residuals are assumed to have zero covariances (discussed earlier), the conditional covariance matrix of observed variables is diagonal, and observed variables are independent given class."

According to my understanding, because the conditional covariance matrix of the observed variables is diagonal, the covariance matrix only includes the variables' variances (diagonal values), and the covariances between observed variables are 0 (off-diagonal values). The correlation coefficient is calculated from the covariance, so the variables don't correlate.

I probably misunderstand the sentences, thanks for correcting me!

Bengt O. Muthen posted on Thursday, January 12, 2017 - 1:45 pm

You have to make a distinction between

1) covariance among observed variables

and

2) covariance among observed variables within class (conditional on class membership)

LPA or LCA with a zero variance factor has zero covariance of type 2) but non-zero covariance of type 1).
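
The distinction can be made explicit with the law of total covariance. The notation below is introduced here for illustration and is not taken from the thread: $\pi_c$ is the proportion in class $c$, $\mu_{jc}$ is the mean of indicator $y_j$ in class $c$, and $\mu_j = \sum_c \pi_c \mu_{jc}$ is its overall mean.

```latex
\begin{align*}
\operatorname{Cov}(y_j, y_k)
  &= \underbrace{\sum_{c} \pi_c \,
       \operatorname{Cov}(y_j, y_k \mid C = c)}_{\text{type 2, within class}}
   + \sum_{c} \pi_c \,(\mu_{jc} - \mu_j)(\mu_{kc} - \mu_k) \\
% local independence: the within-class covariance is 0 for j \neq k
  &= \sum_{c} \pi_c \,(\mu_{jc} - \mu_j)(\mu_{kc} - \mu_k)
     \qquad (j \neq k) \\
% special case of two classes
  &= \pi_1 \pi_2 \,(\mu_{j1} - \mu_{j2})(\mu_{k1} - \mu_{k2})
     \qquad \text{(two classes)}
\end{align*}
```

So the type 1) covariance is non-zero whenever the classes differ in the means of both indicators, even though every within-class covariance is zero.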

samah Zakaria Ahmed posted on Tuesday, February 21, 2017 - 3:47 pm

In a latent class model, how can I specify the type of each covariate (ordinal, binary, or continuous) in the commands?

Bengt O. Muthen posted on Wednesday, February 22, 2017 - 12:18 pm

Covariates are typically not given a distributional form. Just like in regular regression, they are conditioned on, that is, their marginal distribution is not specified. They are treated as continuous variables in the estimation.

Morgan DeBusk-Lane posted on Thursday, June 15, 2017 - 1:19 pm

Drs. Muthen,

Thank you for your continued support and guidance!

I have completed the enumeration process with an unconditional model through LPA (3 indicators/predictors, no covariates) and arrived at a 3 class model in accordance with Nylund-Gibson & Masyn’s (2016) article.

Using R3STEP to examine the multinomial logistic regression of potential covariates (in this case gender, minority status, and grade level), I have found both minority status and grade level to be significant.

To incorporate them into the model to generate the most accurate and interpretable class membership, I have a couple questions:

1. Do I need to examine them through a 1-step process (regress each indicator/predictor upon the covariates) to determine significant direct effects upon my indicators/predictors?

1b. Why do Nylund-Gibson & Masyn suggest this same step-wise process to examine direct effects for K-1 also?

2. Does the influence of one covariate affect the inspection of another covariate during this stepwise one-step process? Should I simply take one covariate and individually regress each indicator variable upon it, then take the other through the same process, and retain only the significant regressions (direct effects)?

Thank you for your help!

Bengt O. Muthen posted on Thursday, June 15, 2017 - 6:04 pm

These general analysis strategy questions are suitable for SEMNET (or the authors).

Morgan DeBusk-Lane posted on Thursday, June 15, 2017 - 6:16 pm

My apologies, I'll do that. Perhaps I can ask a more specific question.

Upon individually assessing direct effects, one of my covariates is significant upon each indicator variable.

This same covariate is also measurement invariant through multi-group CFA on the same indicator variables through the scalar step. These findings are counterintuitive.

Perhaps I am inputting the model command incorrectly.

To test for direct effects, I am regressing each indicator variable upon the covariates:

Model:
%OVERALL%
SE_ZMech ON Eth_C;

Whereby SE_ZMech is an indicator variable and Eth_C is a covariate. I would then do this same step for each indicator variable for each covariate.

Is this correct?

Thank you again.

Bengt O. Muthen posted on Friday, June 16, 2017 - 6:02 pm

Your multi-group run tested metric invariance, that is, invariant loadings, or scalar invariance, that is, invariant loadings and intercepts. The direct effect tests invariant intercepts while holding loadings invariant. So the tests aren't exactly the same.

Bengt O. Muthen posted on Friday, June 16, 2017 - 6:02 pm

By the way, you can regress each indicator on all covariates (not just one covariate at a time).
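
As a sketch of that suggestion, the MODEL fragment from the earlier post could list every covariate on the right-hand side. SE_ZMech and Eth_C come from the post; Gender and Grade are hypothetical names for the other two covariates mentioned, and including the c ON statement alongside the direct effects is an assumption rather than something stated in the thread.

```mplus
MODEL:
  %OVERALL%
  c ON Eth_C Gender Grade;         ! covariates predict class membership
  SE_ZMech ON Eth_C Gender Grade;  ! one indicator on all covariates
```

The second ON statement would then be repeated with each of the remaining indicators on the left-hand side.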

Allison Schroeder posted on Thursday, September 21, 2017 - 10:09 am

I'm running a 5-class LCA using 9 binary indicators. I am also attempting to include several demographic covariates. I understand the simplest way to do this is using R3STEP. However, I found one recent paper applying LCA in which the authors examined the demographic composition of classes using DCAT (essentially treating the covariates as distal outcomes). Is this an appropriate use of DCAT?

Thanks so much for your help.

Bengt O. Muthen posted on Thursday, September 21, 2017 - 4:24 pm

I recommend R3STEP instead.
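
For readers unfamiliar with the option, an R3STEP setup for a model like the one described might look roughly as follows. The file name and the variable names (u1-u9, age, female) are hypothetical; only the 5 classes and 9 binary indicators come from the question.

```mplus
TITLE: 5-class LCA with covariates via R3STEP (sketch)
DATA: FILE = lca.dat;              ! hypothetical file name
VARIABLE:
  NAMES = u1-u9 age female;        ! hypothetical variable names
  USEVARIABLES = u1-u9;            ! indicators only
  CATEGORICAL = u1-u9;             ! 9 binary indicators
  CLASSES = c(5);
  AUXILIARY = (R3STEP) age female; ! covariates enter in step 3
ANALYSIS:
  TYPE = MIXTURE;
```

The covariates are listed only under AUXILIARY, so they do not take part in forming the classes; they are used afterwards as predictors of class membership in a multinomial logistic regression.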

Macarena Larrain posted on Friday, July 06, 2018 - 3:06 am

Dear professors,
I ran an LCA with continuous covariates (predictors) using the manual 3-step approach.
I am now unsure how to interpret the output. Are the estimates for the categorical latent variables in the Model Results section the beta coefficients for the regression?
Similarly, in the Alternative Parameterizations section, are the estimates odds ratios or beta coefficients?

Thanks in advance!

Bengt O. Muthen posted on Friday, July 06, 2018 - 5:54 pm

The coefficients are multinomial logistic regression coefficients. See descriptions in our UG, Chapter 14 or in RMA book.

Macarena Larrain posted on Saturday, July 07, 2018 - 1:56 pm

Thank you very much, Dr. Muthen.

Macarena Larrain posted on Monday, July 09, 2018 - 2:50 am

Dear Dr. Muthen,
As a follow-up to my previous question, I would like to know by what factors I should multiply the logistic regression coefficients to calculate the log odds for each class when I have continuous covariates. The examples in the UG are very helpful, but they are based on binary covariates, so they use the factors 1 or 0.
I really appreciate your help. Thanks!

Bengt O. Muthen posted on Monday, July 09, 2018 - 10:48 am

The log odds are for a 1-unit change in a continuous X. For a standardized X, this means a 1-standard deviation change in X. If X is on a very large scale (has an SD much larger than 1), you want to consider a larger change in X than 1. If X is on a smaller scale (has an SD much smaller than 1), you want to consider a smaller change in X than 1. In both cases, multiplying the log odds by the SD of X gives you the result for a 1-SD change.
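
A worked version of this, using symbols introduced here for illustration: $a_k$ and $b_k$ are the intercept and slope for class $k$ relative to the last (reference) class $K$, whose coefficients are fixed at zero, and $s_x$ is the SD of X. The numbers in the last line are made up.

```latex
\begin{align*}
% log odds of class k versus reference class K at covariate value x
\log \frac{P(C = k \mid x)}{P(C = K \mid x)} &= a_k + b_k\, x \\
% change in the log odds when x increases by \Delta
\Delta_{\text{log odds}} &= b_k\, \Delta,
  \qquad \text{odds ratio} = \exp(b_k\, \Delta) \\
% 1-SD change: set \Delta = s_x
\Delta = s_x:\quad
  b_k\, s_x &= 0.04 \times 15 = 0.6,
  \qquad \exp(0.6) \approx 1.82
\end{align*}
```

With a continuous covariate, the multiplier that plays the role of the 1 or 0 used for a binary covariate is simply the chosen value of x (for the log odds themselves) or the chosen change in x (for a difference in log odds).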