Mplus Discussion >> LCA classes as independent variables in regression model

Anonymous posted on Tuesday, February 15, 2005 - 6:57 am

I am trying to fit an LCA model with six unordered, three-category observed variables and three latent classes. I have the following questions:

1. The output gives me estimates for two of the three categories of each observed variable in each class. Are these the logits for the probability of a case in that class giving this response, relative to the reference category (where the reference category is the lowest numerical value of the observed nominal variable)?
2. Is it possible to regress a continuous (or binary) dependent variable on the latent class variable, to address whether class membership is associated with some distal outcome? When I try this I get the following error message:

*** ERROR
The following MODEL statements are ignored:
* Statements in the OVERALL class:
FAVLIB ON C#1
*** ERROR
One or more MODEL statements were ignored. These statements may be
incorrect or are only supported by ALGORITHM=INTEGRATION.

3. Where can I find information about how to use start values to change the reference category in multinomial logistic regression?

thanks,

Patrick Sturgis

Linda K. Muthen posted on Tuesday, February 15, 2005 - 8:34 am

It sounds like you need to add ALGORITHM=INTEGRATION to the ANALYSIS command for that statement.

With observed nominal variables, the reference category is the category with the highest number. I think you would need to renumber the categories using DEFINE if you want to change the reference category. Start values are used to change which latent class is the last (reference) class.
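Because a nominal indicator's logits are all relative to one reference category, changing that category changes the logits but not the probabilities they imply. The Python sketch below illustrates this; the function names and the logit values are hypothetical (not Mplus features or output), and it assumes, as stated above, that the reported logits are relative to the highest-numbered category, whose logit is fixed at 0. Re-expressing the logits against another category is what renumbering the categories with DEFINE amounts to.

```python
import math

def softmax(logits):
    """Convert a full set of category logits into category probabilities."""
    m = max(logits)                                   # shift for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def probs_from_mplus_logits(logits_vs_last):
    """Logits for categories 1..K-1, each relative to the highest-numbered
    category (assumed reference, logit fixed at 0)."""
    return softmax(list(logits_vs_last) + [0.0])

def relative_to(logits_vs_last, new_ref):
    """Re-express the same model with category new_ref (0-based) as the
    reference: subtract that category's logit from every category's logit."""
    full = list(logits_vs_last) + [0.0]
    return [v - full[new_ref] for v in full]

# Hypothetical logits for one 3-category item in one class
original = [0.5, -0.3]                       # categories 1 and 2 vs category 3
recoded = relative_to(original, new_ref=0)   # every category vs category 1
p_before = probs_from_mplus_logits(original)
p_after = softmax(recoded)
# p_before and p_after agree: only the reference category (and the logits) changed
```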

Anonymous posted on Tuesday, February 15, 2005 - 1:05 pm

Thanks Linda

I can't seem to get the algorithm=integration part to work. Below is my output with error messages:

TITLE: LCA OF DEL POLL KNOWLEDGE ITEMS

DATA: FILE IS c:\mplusfiles\LCA\MISINFO.CSV;

VARIABLE: NAMES ARE serialno quiz1 quiz2 quiz3 EURO1W1 EURO2W1
WAGE1W1 WAGE2W1 TAX1W1 TAX2W1 EURO1W2 EURO2W2 WAGE1W2 WAGE2W2
TAX1W2 TAX2W2 IT1 IT2 IT3 IT4
IT5 IT6 ITB1 ITB2 ITB3 ITB4 ITB5 ITB6 VTORY VLAB VLIB FAVTORY
FAVLAB FAVLIB LIBECON ;
USEVARIABLES ARE FAVLIB q1 q2 q3 q4;
CLASSES=C(4);
NOMINAL= q1 q2 q3 q4;

DEFINE: q1=3;
if (euro2w1 eq -9) then q1=2;
if (euro2w1 eq -1) then q1=1;
if (euro2w1 eq 0) then q1=4;
q2=3;
if (WAGE2W1 eq -9) then q2=2;
if (WAGE2W1 eq -1) then q2=1;
if (WAGE2W1 eq 0) then q2=4;
q3=3;
if (TAX1W1 eq -9) then q3=2;
if (TAX1W1 eq -1) then q3=1;
if (TAX1W1 eq 0) then q3=4;
q4=3;
if (TAX2W1 eq -9) then q4=2;
if (TAX2W1 eq -1) then q4=1;
if (TAX2W1 eq 0) then q4=4;

ANALYSIS: TYPE = MIXTURE;
starts=50 50;
algorithm = integration;

MODEL: %OVERALL%

FAVLIB ON C#1;

*** WARNING in Model command
All variables are uncorrelated with all other variables within class.
Check that this is what is intended.
*** ERROR
The following MODEL statements are ignored:
* Statements in the OVERALL class:
FAVLIB ON C#1

Also, when I have a logit of -15 for the probability of a response within a latent class, can I take this to indicate that no one gave this response in this class? Or does it indicate a problem with the model? Thanks again,

Patrick

Linda K. Muthen posted on Tuesday, February 15, 2005 - 2:34 pm

I'm sorry. I didn't really look at your model, just the error message which wasn't accurate in this situation. You cannot regress an observed variable on a categorical latent variable. The means of the observed variable varying across classes gets at that parameter.

Anonymous posted on Wednesday, February 16, 2005 - 12:10 am

Linda

I suppose another way might be to save latent class membership variables and then use these as predictors in a subsequent model? Not as desirable as a single estimation but would get at the question directly. I'm not sure how to obtain the means of different observed variables across classes. If I specify continuous variables in the usevariables command, the output gives me means for these observed variables across classes but it also appears to change the parameters given for the nominal variables. Are the latter still logits for response given class membership? I will send output for the model in question.

Also, you didn't address my question regarding the meaning of logits of -15 and 15 with zero standard error in the LCA. Does this indicate a problem for the model? Thanks,

Patrick

Linda K. Muthen posted on Wednesday, February 16, 2005 - 7:51 am

Saving the latent class membership would not do what you want. In addition, this approach introduces estimation errors and the standard errors will be too small.

The output that you sent does what you want. Think of the regression of y ON c where y is a continuous variable and c is a categorical latent variable. This regression results in the means of y changing over the classes of c. So you don't specify it using an ON statement. You just allow the means to vary over c. Your output has the means varying over c. So you have what you want.
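To see why class-varying means carry the same information as "y ON c", the Python sketch below uses hypothetical data in which class labels are known (in an LCA they are latent) and shows that a dummy-coded regression of y on class returns the reference-class mean as the intercept and the class-mean differences as slopes. It is only an illustration of the equivalence; it is not a substitute for estimating the means within the mixture model, since, as noted above, regressing on saved class memberships gives standard errors that are too small.

```python
import numpy as np

# Hypothetical data with KNOWN class labels 0, 1, 2 (in an LCA, classes are latent)
cls = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])
y = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 9.0, 11.0])

# "The mean of y varies across classes": one mean per class
class_means = np.array([y[cls == k].mean() for k in range(3)])

# Same information as a dummy-coded regression of y on class,
# with the last class (index 2) as the reference
X = np.column_stack([np.ones_like(y),
                     (cls == 0).astype(float),
                     (cls == 1).astype(float)])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

print(class_means)  # class means: 2, 5, 9
print(coef)         # approximately [9, -7, -4]: reference mean, then 2 - 9 and 5 - 9
```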

Values are fixed when they become extreme. In your case, it means that in certain cells for certain classes, there is a very high or very low probability.
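Translating a logit of ±15 into odds and probabilities shows why such values signal an (almost) empty or (almost) certain cell rather than a usable estimate. The short Python sketch below assumes the ±15 values are logits as described in this thread; for a binary item it gives the implied probability directly, and for a nominal item the odds versus the reference category.

```python
import math

def inv_logit(x):
    """Probability implied by a logit (binary case)."""
    return 1.0 / (1.0 + math.exp(-x))

odds_at_15 = math.exp(15.0)   # odds of the response vs the reference category at logit +15
p_hi = inv_logit(15.0)        # binary item with logit +15: probability essentially 1
p_lo = inv_logit(-15.0)       # binary item with logit -15: probability essentially 0
print(f"odds {odds_at_15:,.0f} to 1; p(+15) = {p_hi:.7f}; p(-15) = {p_lo:.1e}")
```

A logit of 15 corresponds to odds of roughly 3.3 million to 1, so the fixed value simply marks a probability at the boundary.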

Anonymous posted on Wednesday, February 16, 2005 - 10:40 am

But the additional observed variable then seems to go into the formation of the latent classes (which I don't want). It also changes the parameter estimates for the categories of the observed nominal variables to be means (or is this just the label?). My interpretation of the parameters before adding in the 'dependent' observed variable was logits representing the probability of each response alternative in each class. Now I'm not sure what they represent. thanks again for your time and advice,

Patrick

Linda K. Muthen posted on Thursday, February 17, 2005 - 8:15 am

When you add another variable, it will change the results. It is not possible for this variable not to contribute to the formation of the classes because it is related to c. The values listed under means for continuous variables are regular means. The values listed under means for nominal variables are logits.

Elmar posted on Wednesday, March 23, 2005 - 9:00 am

Dear Linda Muthen,

I am new to LCA. Following the discussion above, I understand that one cannot regress an observed variable on a categorical latent variable. My question: is it possible with mplus to regress one latent categorical variable onto another latent categorical variable? E.g. for analysing if membership in one latent class predicts membership in another latent class (in another variable)
I think this can't be done using Latent GOLD.
Thanks,
Elmar

bmuthen posted on Wednesday, March 23, 2005 - 10:14 am

Yes, you can regress one latent categorical variable on another latent categorical variable in Mplus. There are examples of that in the version 3 User's Guide, e.g. latent transition analysis.

Note that you can also achieve what amounts to regression of an observed variable on a categorical latent variable. You just don't use ON. Instead, the observed variable is influenced by the categorical latent variable by the observed variable mean (and/or variance) changing across the latent classes.

Anonymous posted on Tuesday, March 29, 2005 - 8:54 am

Dear Linda,

I have a question about interpreting LCA results using a nominal indicator (career, with 5 categories) measured over multiple waves. The lowest category is govt job, followed by business, education, law, and other. I built a two-class LCA model (CLASSES = c(2)).

The LCA outputs are:

Latent Class 1

Means (columns: estimate, S.E., Est./S.E.)
CAR1#1 1.184 0.208 5.698
CAR1#2 0.842 0.238 3.533
CAR1#3 -1.193 0.347 -3.437
CAR1#4 -0.762 0.299 -2.554
CAR2#1 1.176 0.210 5.593
CAR2#2 0.862 0.240 3.583
CAR2#3 -1.274 0.359 -3.552
CAR2#4 -0.763 0.299 -2.550
.......
till the last wave.

Latent Class 2

Means
CAR1#1 1.798 0.218 8.245
CAR1#2 -0.355 0.290 -1.222
CAR1#3 0.484 0.255 1.896
CAR1#4 0.745 0.235 3.170
CAR2#1 1.767 0.216 8.163
......
till the last wave.

Categorical Latent Variables

Means
C#1 -0.413 0.092 -4.491

My questions are:
Are the means logits? How should I interpret it in the context of LCA?

Thank you very much for your time. I will really appreciate your help.

Linda K. Muthen posted on Saturday, April 02, 2005 - 8:44 pm

Your means/intercepts are logits. See the second to last section in Chapter 13. The example with no covariates corresponds to a nominal variable LCA where only the intercepts are used.
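As a worked example, the Python sketch below converts the posted CAR1 logits into category probabilities for each class and turns C#1 into class proportions. It assumes, following the earlier reply in this thread, that each CAR1#k logit is relative to the highest-numbered category ("other", logit 0), and that C#1 is the logit of class 1 versus the last class (class 2). The names are for illustration only.

```python
import math

# Category order as described in the question; 'other' is assumed to be the reference
CATEGORIES = ["govt", "business", "education", "law", "other"]

def nominal_probs(logits):
    """Category probabilities from logits relative to the last category (logit 0)."""
    full = list(logits) + [0.0]
    exps = [math.exp(v) for v in full]
    total = sum(exps)
    return dict(zip(CATEGORIES, (e / total for e in exps)))

# CAR1#1..CAR1#4 estimates copied from the posted output
class1 = nominal_probs([1.184, 0.842, -1.193, -0.762])
class2 = nominal_probs([1.798, -0.355, 0.484, 0.745])

# C#1 = -0.413: logit of membership in class 1 vs class 2
p_class1 = math.exp(-0.413) / (1.0 + math.exp(-0.413))
p_class2 = 1.0 - p_class1

for name in CATEGORIES:
    print(f"{name:10s} class 1: {class1[name]:.3f}  class 2: {class2[name]:.3f}")
print(f"class proportions: {p_class1:.3f}, {p_class2:.3f}")
```

Under these assumptions, class 1 is about 40% of the sample and is distinguished mainly by a much higher probability of business careers at wave 1, while class 2 leans more toward education and law.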

Jinseok Kim posted on Thursday, February 08, 2007 - 11:30 am

I am trying to use seven binary indicators (DLOCA-DNBGRP) to form a four-class latent class variable (c), use c as one of the predictors of a four-category nominal DV (PCARE), and use a series of observed covariates (SEX - HINCOME) that influence both c and PCARE. My questions:
1. How can I incorporate the second step (PCARE regressed on c) into Mplus if I cannot use "PCARE#1 PCARE#2 PCARE#3 on C#1 C#2 C#3 ..."?
2. What should I look for in the output to interpret the results as a multinomial logistic regression?
3. The following syntax ran well. Can you tell me what model I estimated?
Thanks. Here's syntax.

TITLE: LCA regression ;
DATA: FILE IS "choicefactor_pcare.txt";
VARIABLE: NAMES ARE BASMID DLOCA DCOST DRELY DLERN DCHIL DHROP DNBGRP PCARE SEX RESPSEX RESPAGE calcmonthage black hisp asia other npguadian momonly MOMGRADE dimomwork welf3yr HGOVCUR HINCOME;
USEVARIABLES ARE DLOCA - HINCOME;
CLASSES = c (4);
CATEGORICAL = DLOCA - DNBGRP SEX RESPSEX black - momonly dimomwork welf3yr HGOVCUR;
NOMINAL = PCARE;
ANALYSIS:
TYPE = MIXTURE;
ALGORITHM = INTEGRATION;
MODEL:
%OVERALL%
PCARE#1 PCARE#2 PCARE#3 on C#1 C#2 C#3 SEX - HINCOME;
C#1 C#2 C#3 ON SEX - HINCOME;

Linda K. Muthen posted on Friday, February 09, 2007 - 8:41 am

Please send your full output and license number to support@statmodel.com.

Jon Elhai posted on Monday, July 28, 2008 - 2:46 pm

Linda,
I ran a 6-class latent class analysis. I am trying to change the start values so that one particular class I'm interested in serves as the reference category when regressing the latent class variable on covariates. However, after trying several of the classes as class #6 (the reference class), I can't make my particular class the reference category; it shows up as a class other than class #6. Any suggestions?

Linda K. Muthen posted on Monday, July 28, 2008 - 3:00 pm

I would need to see what you are doing to make any suggestions. Please send your files and license number to support@statmodel.com.

Mads Meier Jæger posted on Wednesday, October 01, 2008 - 11:16 pm

Dear Linda

I have a similar problem. I want to regress a binary outcome variable (educational level) on a latent categorical variable with 3 classes (social class) and one continuous latent variable (IQ).

You explain that when the outcome variable is continuous, you don't include the latent class variable in the ON statement because Mplus already models class-specific means of the outcome variable. However, in my setup the outcome variable is binary and I want a logit/probit model. So I don't want to model class-specific means of the outcome variable but rather class-specific log-odds of y=1 vs. y=0 (with one of the three latent classes as the reference category). Is this possible in Mplus, and where in the output do I find the estimated log-odds ratios? My Mplus output has a section called "Categorical latent variable means" with estimates of c#1 and c#2 (I have three classes in my model, so I assume that c#3 is the reference group here). These estimates look like what I want, but I'm not sure. Are they logits? Also, while the estimate sizes look about right, they have the opposite sign of what I would expect. Does Mplus estimate y=0 vs. y=1 by default?

I should say that I also ran my model with a continuous rather than a binary outcome variable (exam results) and this works fine.

Kind regards,

Mads Meier Jæger

Linda K. Muthen posted on Thursday, October 02, 2008 - 10:06 am

For categorical variables, you should be looking at the difference in thresholds over classes.

c#1 and c#2 refer to the class proportions. The parameter estimates are logits.

Mads Meier Jæger posted on Thursday, October 02, 2008 - 11:56 pm

Dear Linda

Thank you. Just to make sure I understand your answer correctly. In my model where the binary dependent variable is ALVL (coded 0/1) and I have 3 latent classes I get:

Thresholds
Class 1: ALVL$1 5.571 (0.404)
Class 2: ALVL$1 6.134 (0.395)
Class 3: ALVL$1 6.218 (0.375)

Suppose I want to use LC3 as the reference group and compare the logits of y=1 vs. y=0 for LC1 and LC2 relative to LC3. I would then calculate:

LC1: (6.218-5.571) = 0.647
LC2: (6.218-6.134) = 0.084
LC3: = 0

Since LC3 is the "lowest" social class, the positive logit of 0.647 for the contrast with LC1 (the "highest" class) means that LC1 has a higher probability of y=1 vs. y=0 relative to LC3. This makes sense. How would I construct SEs for the estimate of 0.647?

Thanks

Linda K. Muthen posted on Friday, October 03, 2008 - 3:03 pm

The numbers you create are log odds ratios. You can exponentiate them and obtain odds ratios which is a good way to explain how the classes relate to the distal outcome. You can do this in MODEL CONSTRAINT and thereby obtain standard errors for the odds ratios or the log odds ratios.
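The Python sketch below reproduces the point estimates in this exchange from the posted ALVL$1 thresholds. It assumes the Mplus logit parameterization for a binary outcome, P(y=1 | class) = 1 / (1 + exp(threshold)), so a threshold is the negative of the log-odds of y=1; this is also why the signs look reversed. The log odds ratio of class k versus class 3 is then threshold(class 3) minus threshold(class k). The sketch gives point estimates only: standard errors need the covariances of the threshold estimates, which is why MODEL CONSTRAINT is the suggested route in Mplus.

```python
import math

# ALVL$1 thresholds copied from the posted output
thresholds = {1: 5.571, 2: 6.134, 3: 6.218}
REF = 3  # class 3 used as the reference class

def p_y1(tau):
    """Assumed logit parameterization: P(y=1) = 1 / (1 + exp(tau))."""
    return 1.0 / (1.0 + math.exp(tau))

# log OR (class k vs REF) = (-tau_k) - (-tau_REF) = tau_REF - tau_k
log_or = {k: thresholds[REF] - tau for k, tau in thresholds.items()}
odds_ratio = {k: math.exp(v) for k, v in log_or.items()}
probs = {k: p_y1(tau) for k, tau in thresholds.items()}

for k in sorted(thresholds):
    print(f"class {k}: P(y=1)={probs[k]:.4f}  logOR={log_or[k]:.3f}  OR={odds_ratio[k]:.3f}")
```

Here the odds of y=1 in class 1 come out at roughly 1.9 times those in class 3, while class 2 is close to class 3.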

yawen posted on Tuesday, March 09, 2010 - 9:48 pm

Dear Dr. Muthen,

I have similar questions, but I still don't know how to obtain standard errors.

My variables are:
usevariables =y1 y2 y3 y4 y5 y6 y7 y8 y9 y10 y11 y12 y13 y14 y15 action;
nominal= y1 y2 y3 y4 y5 y6 y7 y8 y9 y10 y11 y12 y13 y14 y15;
categorical=action;

"action" is a binary variable. I would like to see how the thresholds of action vary across three latent classes. My model is as follows. The question is how I should write model constraints to get the three standard errors I need to compare the three thresholds.

Model:
%overall%
%c#1%
[action$1](p1);
%c#2%
[action$1](p2);
%c#3%
[action$1](p3);

Another related question is, if my entropy is high (.9), is it okay to use class membership as an observed variable to predict action if I don't use action as an indicator to identify latent class?

Thank you in advance!

yawen

Linda K. Muthen posted on Wednesday, March 10, 2010 - 9:57 am

You should use MODEL TEST or difference testing to see if the thresholds vary across classes. It is not necessary to compute standard errors.
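Difference testing compares the model with class-specific thresholds (p1, p2, p3 free) to a model in which the three thresholds are constrained equal, which removes 2 parameters. The Python sketch below computes the likelihood-ratio chi-square for that comparison. The log-likelihood values are hypothetical, and the sketch assumes plain ML estimation; with the MLR estimator (the usual default for mixture models) a scaled difference test is needed instead, and MODEL TEST provides a Wald test directly without refitting.

```python
import math

def lr_difference_test(loglik_free, loglik_constrained, df):
    """Likelihood-ratio chi-square for two nested models estimated with plain ML."""
    if df != 2:
        raise ValueError("the closed-form p-value below assumes df = 2")
    stat = 2.0 * (loglik_free - loglik_constrained)
    p_value = math.exp(-stat / 2.0)  # chi-square survival function when df = 2
    return stat, p_value

# Hypothetical log-likelihoods: thresholds free vs constrained equal across 3 classes
stat, p = lr_difference_test(loglik_free=-1000.0, loglik_constrained=-1004.0, df=2)
print(f"chi-square(2) = {stat:.2f}, p = {p:.4f}")
```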

NR posted on Wednesday, September 01, 2010 - 8:31 pm

Hello,

I'm trying to use class membership from LCA as a dependent variable in the subsequent regression analysis. I read from one of your articles posted here that this may produce incorrect estimates or standard errors, but I have to stick to this two-step approach for several reasons.

My question is, if I create multiple plausible values for latent classes and use them in the subsequent regression, does it help to produce correct estimates and standard errors? Or are there better ways to deal with this problem?

Thank you!!

Bengt O. Muthen posted on Thursday, September 02, 2010 - 9:00 am

There are 2 papers on our web site on this:

Clark, S. & Muthén, B. (2009). Relating latent class analysis results to variables not included in the analysis.

Asparouhov, T. & Muthén, B. (2010). Plausible values for latent variables using Mplus.

The second concludes that to generate the plausible values which will not produce biases you need to include the covariates in the model that generates the PVs - which is what you say you don't want to do. If you don't, you need a high entropy, say > 0.8 (see the first paper).
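The entropy criterion above refers to the relative entropy of the posterior class probabilities, commonly computed as 1 - sum(-p ln p) / (n ln K), which is 1 for perfect classification and 0 when every case is equally likely to be in any class. The Python sketch below implements that formula; it assumes this is the same summary Mplus labels "Entropy", and the posterior matrix in the usage line is hypothetical.

```python
import numpy as np

def relative_entropy(post):
    """Relative entropy for an n x K matrix of posterior class probabilities:
    1 - sum(-p ln p) / (n ln K). 1 = perfect separation, 0 = none."""
    post = np.asarray(post, dtype=float)
    n, k = post.shape
    safe = np.clip(post, 1e-300, 1.0)        # avoid log(0); 0 * log(tiny) contributes 0
    ent = -np.sum(post * np.log(safe))
    return 1.0 - ent / (n * np.log(k))

# Hypothetical posteriors for 3 cases in a 2-class model
example = [[0.95, 0.05], [0.10, 0.90], [0.80, 0.20]]
print(round(relative_entropy(example), 3))
```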