
Anonymous posted on Tuesday, February 15, 2005  12:57 pm



I am trying to fit an LCA model with 6 unordered, 3-category observed variables and 3 latent classes. I have the following questions:

1. The output gives me estimates for two of the three categories of each observed variable for each class. Are these the logits for the probability of cases within each class giving this response, relative to the reference category (where the reference category is the lowest numerical value of the observed nominal variable)?

2. Is it possible to regress a continuous (or binary) dependent variable on the latent class variable to address the question of whether class membership is associated with some distal outcome? When I try this I get the following error message:

*** ERROR
The following MODEL statements are ignored:
* Statements in the OVERALL class:
FAVLIB ON C#1
*** ERROR
One or more MODEL statements were ignored. These statements may be incorrect or are only supported by ALGORITHM=INTEGRATION.

3. Where can I find information about how to use start values to change the reference category in multinomial logistic regression?

Thanks,
Patrick Sturgis


It sounds like you need to add ALGORITHM=INTEGRATION to the ANALYSIS command for that statement. With observed variables, the reference category is the category with the highest number. I think you would need to renumber the categories using DEFINE if you want to change the reference category. Start values are used to change the last class.

Anonymous posted on Tuesday, February 15, 2005  7:05 pm



Thanks Linda. I can't seem to get the algorithm=integration part to work. Below is my output with error messages:

TITLE: LCA OF DEL POLL KNOWLEDGE ITEMS
DATA: FILE IS c:\mplusfiles\LCA\MISINFO.CSV;
VARIABLE:
NAMES ARE serialno quiz1 quiz2 quiz3
EURO1W1 EURO2W1 WAGE1W1 WAGE2W1 TAX1W1 TAX2W1
EURO1W2 EURO2W2 WAGE1W2 WAGE2W2 TAX1W2 TAX2W2
IT1 IT2 IT3 IT4 IT5 IT6
ITB1 ITB2 ITB3 ITB4 ITB5 ITB6
VTORY VLAB VLIB FAVTORY FAVLAB FAVLIB LIBECON ;
USEVARIABLES ARE FAVLIB q1 q2 q3 q4;
CLASSES=C(4);
NOMINAL= q1 q2 q3 q4;
DEFINE:
q1=3;
if (euro2w1 eq 9) then q1=2;
if (euro2w1 eq 1) then q1=1;
if (euro2w1 eq 0) then q1=4;
q2=3;
if (WAGE2W1 eq 9) then q2=2;
if (WAGE2W1 eq 1) then q2=1;
if (WAGE2W1 eq 0) then q2=4;
q3=3;
if (TAX1W1 eq 9) then q3=2;
if (TAX1W1 eq 1) then q3=1;
if (TAX1W1 eq 0) then q3=4;
q4=3;
if (TAX2W1 eq 9) then q4=2;
if (TAX2W1 eq 1) then q4=1;
if (TAX2W1 eq 0) then q4=4;
ANALYSIS: TYPE = MIXTURE;
starts=50 50;
algorithm = integration;
MODEL:
%OVERALL%
FAVLIB ON C#1;

*** WARNING in Model command
All variables are uncorrelated with all other variables within class. Check that this is what is intended.
*** ERROR
The following MODEL statements are ignored:
* Statements in the OVERALL class:
FAVLIB ON C#1

Also, when I have a logit of 15 for the probability of a response within a latent class, can I take this to indicate that no one gave this response in this class? Or does it indicate a problem with the model?

Thanks again,
Patrick


I'm sorry. I didn't really look at your model, just the error message, which wasn't accurate in this situation. You cannot regress an observed variable on a categorical latent variable. Letting the mean of the observed variable vary across classes gets at that relationship.

Anonymous posted on Wednesday, February 16, 2005  6:10 am



Linda,

I suppose another way might be to save the latent class membership variables and then use these as predictors in a subsequent model? Not as desirable as a single estimation, but it would get at the question directly. I'm not sure how to obtain the means of different observed variables across classes. If I specify continuous variables in the usevariables command, the output gives me means for these observed variables across classes, but it also appears to change the parameters given for the nominal variables. Are the latter still logits for response given class membership? I will send output for the model in question.

Also, you didn't address my question regarding the meaning of logits of 15 and -15 with zero standard error in the LCA. Does this indicate a problem for the model?

Thanks,
Patrick


Saving the latent class membership would not do what you want. In addition, this approach introduces estimation errors and the standard errors will be too small.

The output that you sent does what you want. Think of the regression of y ON c, where y is a continuous variable and c is a categorical latent variable. This regression results in the means of y changing over the classes of c. So you don't specify it using an ON statement. You just allow the means to vary over c. Your output has the means varying over c, so you have what you want.

Values are fixed when they become extreme. In your case, it means that in certain cells for certain classes there is a high or low probability.
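To see why a logit of that size is treated as extreme, here is a minimal Python sketch that converts a logit to a probability. It assumes the simplest two-category case (response vs. reference); with more categories the exact probability also depends on the other logits. The function name inv_logit is made up for illustration.

```python
import math

def inv_logit(x: float) -> float:
    """Convert a logit to a probability (two-category case assumed)."""
    return 1.0 / (1.0 + math.exp(-x))

p_high = inv_logit(15.0)    # probability implied by a logit of +15
p_low = inv_logit(-15.0)    # probability implied by a logit of -15

print(f"logit +15 -> {p_high:.7f}")
print(f"logit -15 -> {p_low:.7f}")
```

Under that assumption, the two values sit so close to 1 and 0 that the response is essentially always or never given in that class.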

Anonymous posted on Wednesday, February 16, 2005  4:40 pm



But the additional observed variable then seems to go into the formation of the latent classes (which I don't want). It also changes the parameter estimates for the categories of the observed nominal variables to be means (or is this just the label?). My interpretation of the parameters before adding in the 'dependent' observed variable was logits representing the probability of each response alternative in each class. Now I'm not sure what they represent. thanks again for your time and advice, Patrick 


When you add another variable, it will change the results. It is not possible for this variable not to contribute to the formation of the classes because it is related to c. The values listed under means for continuous variables are regular means. The values listed under means for nominal variables are logits. 

Elmar posted on Wednesday, March 23, 2005  3:00 pm



Dear Linda Muthen,

I am new to LCA. Following the discussion above, I understand that one cannot regress an observed variable on a categorical latent variable. My question: is it possible with Mplus to regress one latent categorical variable on another latent categorical variable, e.g. to analyse whether membership in one latent class predicts membership in another latent class (in another variable)? I think this can't be done using Latent GOLD.

Thanks,
Elmar

bmuthen posted on Wednesday, March 23, 2005  4:14 pm



Yes, you can regress one latent categorical variable on another latent categorical variable in Mplus. There are examples of that in the version 3 User's Guide, e.g. latent transition analysis. Note that you can also achieve what amounts to regression of an observed variable on a categorical latent variable. You just don't use ON. Instead, the observed variable is influenced by the categorical latent variable by the observed variable mean (and/or variance) changing across the latent classes. 

Anonymous posted on Tuesday, March 29, 2005  3:54 pm



Dear Linda,

I have a question about interpreting LCA results using nominal indicators (career, with 5 categories) over multiple waves. The lowest category is govt job, followed by business, education, law and other. I built a two-class LCA model (c(2)). The LCA output is:

Latent Class 1

Means        Estimate     S.E.   Est./S.E.
CAR1#1          1.184    0.208       5.698
CAR1#2          0.842    0.238       3.533
CAR1#3          1.193    0.347       3.437
CAR1#4          0.762    0.299       2.554
CAR2#1          1.176    0.210       5.593
CAR2#2          0.862    0.240       3.583
CAR2#3          1.274    0.359       3.552
CAR2#4          0.763    0.299       2.550
....... till the last wave.

Latent Class 2

Means        Estimate     S.E.   Est./S.E.
CAR1#1          1.798    0.218       8.245
CAR1#2          0.355    0.290       1.222
CAR1#3          0.484    0.255       1.896
CAR1#4          0.745    0.235       3.170
CAR2#1          1.767    0.216       8.163
...... till the last wave.

Categorical Latent Variables

Means        Estimate     S.E.   Est./S.E.
C#1             0.413    0.092       4.491

My questions are: Are the means logits? How should I interpret them in the context of LCA?

Thank you very much for your time. I will really appreciate your help.


Your means/intercepts are logits. See the second to last section in Chapter 13. The example with no covariates corresponds to a nominal variable LCA where only the intercepts are used. 
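As a rough illustration of reading such logits, the Python sketch below turns the posted Class 1 estimates for CAR1 into category probabilities, and C#1 into class proportions. It takes the posted numbers at face value and assumes the last category (and the last class) is the reference with its logit fixed at 0, as stated earlier in the thread. The helper name nominal_probs is made up for illustration.

```python
import math

def nominal_probs(logits):
    """Category probabilities from logits measured against a reference
    category whose logit is fixed at 0 (assumed here to be the last one)."""
    exps = [math.exp(v) for v in logits] + [1.0]  # exp(0) = 1 for the reference
    total = sum(exps)
    return [e / total for e in exps]

# CAR1#1..CAR1#4 for Latent Class 1, as posted
car1_class1 = [1.184, 0.842, 1.193, 0.762]
probs = nominal_probs(car1_class1)

# C#1 = 0.413 is the logit of class 1 relative to class 2
class_props = nominal_probs([0.413])

for label, p in zip(["cat 1", "cat 2", "cat 3", "cat 4", "cat 5 (ref)"], probs):
    print(f"{label}: {p:.3f}")
print("class proportions:", [round(p, 3) for p in class_props])
```

Each probability is the chance of choosing that career category given membership in Class 1, and the five values sum to 1.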

Jinseok Kim posted on Thursday, February 08, 2007  5:30 pm



I am trying to:

1. use seven binary indicators (DLOCA-DNBGRP) for a four-class LC (c);
2. use the LC (c) as one of the predictors of a 4-category nominal DV (PCARE); and
3. use a series of observed covariates (SEX-HINCOME) that influence both the LC (c) and the DV (PCARE).

My questions:

1. How can I incorporate step 2 (PCARE regressed on c) of the model into Mplus if I cannot use "PCARE#1 PCARE#2 PCARE#3 on C#1 C#2 C#3 ..."?
2. What should I look for in the output to interpret it as in multinomial logistic regression?
3. The following syntax ran well. Can you tell me what model I estimated?

Thanks. Here's the syntax.

TITLE: LCA regression ;
DATA: FILE IS "choicefactor_pcare.txt";
VARIABLE:
NAMES ARE BASMID DLOCA DCOST DRELY DLERN DCHIL DHROP DNBGRP PCARE
SEX RESPSEX RESPAGE calcmonthage black hisp asia other npguadian momonly
MOMGRADE dimomwork welf3yr HGOVCUR HINCOME;
USEVARIABLES ARE DLOCA-HINCOME;
CLASSES = c (4);
CATEGORICAL = DLOCA-DNBGRP SEX RESPSEX black-momonly dimomwork welf3yr HGOVCUR;
NOMINAL = PCARE;
ANALYSIS: TYPE = MIXTURE;
ALGORITHM = INTEGRATION;
MODEL:
%OVERALL%
PCARE#1 PCARE#2 PCARE#3 on C#1 C#2 C#3 SEX-HINCOME;
C#1 C#2 C#3 ON SEX-HINCOME;


Please send your full output and license number to support@statmodel.com. 

Jon Elhai posted on Monday, July 28, 2008  8:46 pm



Linda, I ran a 6-class latent class analysis. I am trying to change the start values so that one particular class I'm interested in serves as the reference category for regressing the latent class on covariates. However, after trying to select various classes as class #6 (the reference class), I can't seem to make my particular class the reference category; it shows up as a class other than class #6. Any suggestions?


I would need to see what you are doing to make any suggestions. Please send your files and license number to support@statmodel.com. 


Dear Linda,

I have a similar problem. I want to regress a binary outcome variable (educational level) on a latent categorical variable with 3 classes (social class) and one continuous latent variable (IQ). You explain that when the outcome variable is continuous you don't include the LCA variable in the ON statement because Mplus already models class-specific means of the outcome variable. However, in my setup the outcome variable is binary and I want a logit/probit model. So I don't want to model class-specific means of the outcome variable but rather class-specific log-odds/logits of y=1 vs. y=0 (with one of the three latent classes being the reference category). Is this possible in Mplus, and where in the output do I find the estimated log-odds ratios?

My Mplus output has a section called "Categorical latent variable means" with estimates of c#1 and c#2 (I have three classes in my model, so I assume that c#3 is the reference group here). These estimates look like what I want, but I'm not sure. Are they logits? Also, while the estimate sizes look about right, they have the opposite sign of what I would expect. Does Mplus estimate y=0 vs. y=1 as the default?

I should say that I also ran my model with a continuous rather than a binary outcome variable (exam results) and this works fine.

Kind regards,
Mads Meier Jæger


For categorical variables, you should be looking at the difference in thresholds over classes. c#1 and c#2 refer to the class proportions. The parameter estimates are logits.


Dear Linda,

Thank you. Just to make sure I understand your answer correctly: in my model, where the binary dependent variable is ALVL (coded 0/1) and I have 3 latent classes, I get:

Thresholds
Class 1: ALVL$1 5.571 (0.404)
Class 2: ALVL$1 6.134 (0.395)
Class 3: ALVL$1 6.218 (0.375)

Suppose I want to use LC3 as the reference group and compare the logits of y=1 vs. y=0 for LC1 and LC2 relative to LC3. I would then calculate:

LC1: (6.218 - 5.571) = 0.647
LC2: (6.218 - 6.134) = 0.084
LC3: = 0

Since LC3 is the "lowest" social class, the positive logit of 0.647 for the contrast with LC1 (the "highest" class) means that LC1 has a higher probability of y=1 vs. y=0 relative to LC3. This makes sense. How would I construct SEs for the estimate of 0.647?

Thanks


The numbers you create are log odds ratios. You can exponentiate them and obtain odds ratios which is a good way to explain how the classes relate to the distal outcome. You can do this in MODEL CONSTRAINT and thereby obtain standard errors for the odds ratios or the log odds ratios. 
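The arithmetic behind that reply can be sketched in a few lines of Python, using the thresholds posted above. The sketch assumes the usual logit parameterization in which the logit of y=1 is minus the threshold, so the log odds ratio of a class against the reference is the reference threshold minus that class's threshold. It does not produce standard errors, which need the covariances among the threshold estimates.

```python
import math

# Thresholds for ALVL$1 as posted, by latent class
thresholds = {"LC1": 5.571, "LC2": 6.134, "LC3": 6.218}
reference = "LC3"

# logit(y=1 | class) = -threshold, so
# log OR(class vs. reference) = threshold[reference] - threshold[class]
log_odds_ratios = {c: thresholds[reference] - t for c, t in thresholds.items()}
odds_ratios = {c: math.exp(v) for c, v in log_odds_ratios.items()}

for c in thresholds:
    print(f"{c}: log OR = {log_odds_ratios[c]:.3f}, OR = {odds_ratios[c]:.3f}")
```

An odds ratio above 1 means the class has higher odds of y=1 than the reference class.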

yawen posted on Wednesday, March 10, 2010  3:48 am



Dear Dr. Muthen,

I have similar questions, but I still don't know how to obtain standard errors. My variables are:

usevariables = y1 y2 y3 y4 y5 y6 y7 y8 y9 y10 y11 y12 y13 y14 y15 action;
nominal = y1 y2 y3 y4 y5 y6 y7 y8 y9 y10 y11 y12 y13 y14 y15;
categorical = action;

"action" is a binary variable. I would like to see how the thresholds of action vary across three latent classes. My model is as follows. The question is how I should write model constraints to get the three standard errors I need to compare the three thresholds.

Model:
%overall%
%c#1%
[action$1] (p1);
%c#2%
[action$1] (p2);
%c#3%
[action$1] (p3);

Another related question: if my entropy is high (.9), is it okay to use class membership as an observed variable to predict action if I don't use action as an indicator to identify the latent classes?

Thank you in advance!
yawen


You should use MODEL TEST or difference testing to see if the thresholds vary across classes. It is not necessary to compute standard errors. 
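MODEL TEST reports a Wald test of the stated restrictions. As a hedged sketch of what such a test computes for three thresholds, the Python below builds the Wald chi-square for equality of three estimates from their covariance matrix. The numbers are illustrative only: they are borrowed from the thresholds and standard errors posted earlier in the thread, and the covariances are set to zero purely for the example, whereas a real analysis would use the estimated covariances. The function name wald_equal_three is made up.

```python
import math

def wald_equal_three(est, cov):
    """Wald chi-square (2 df) for H0: est[0] == est[1] == est[2].

    est: three parameter estimates
    cov: 3x3 covariance matrix of those estimates
    """
    # Contrasts against the third parameter
    d1 = est[0] - est[2]
    d2 = est[1] - est[2]
    # Covariance matrix of the two contrasts
    v11 = cov[0][0] + cov[2][2] - 2 * cov[0][2]
    v22 = cov[1][1] + cov[2][2] - 2 * cov[1][2]
    v12 = cov[0][1] - cov[0][2] - cov[1][2] + cov[2][2]
    det = v11 * v22 - v12 * v12
    # Quadratic form d' V^{-1} d using the explicit 2x2 inverse
    w = (v22 * d1 * d1 - 2 * v12 * d1 * d2 + v11 * d2 * d2) / det
    # The chi-square survival function with 2 df is exp(-w / 2)
    p_value = math.exp(-w / 2)
    return w, p_value

# Illustrative estimates and standard errors; zero covariances are an assumption
est = [5.571, 6.134, 6.218]
se = [0.404, 0.395, 0.375]
cov = [[se[i] ** 2 if i == j else 0.0 for j in range(3)] for i in range(3)]

w, p_value = wald_equal_three(est, cov)
print(f"Wald chi-square = {w:.3f}, df = 2, p = {p_value:.3f}")
```

A small p-value would suggest the thresholds differ across classes; no separate standard error for each difference is needed for this overall test.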

NR posted on Thursday, September 02, 2010  2:31 am



Hello, I'm trying to use class membership from LCA as a dependent variable in a subsequent regression analysis. I read in one of your articles posted here that this may produce incorrect estimates or standard errors, but I have to stick to this two-step approach for several reasons. My question is: if I create multiple plausible values for latent classes and use them in the subsequent regression, does it help to produce correct estimates and standard errors? Or are there better ways to deal with this problem? Thank you!!


There are 2 papers on our web site on this:

Clark, S. & Muthén, B. (2009). Relating latent class analysis results to variables not included in the analysis.
Asparouhov, T. & Muthén, B. (2010). Plausible values for latent variables using Mplus.

The second concludes that to generate plausible values that will not produce biases, you need to include the covariates in the model that generates the PVs, which is what you say you don't want to do. If you don't, you need a high entropy, say > 0.8 (see the first paper).
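For readers unsure what the entropy figure measures, here is a minimal Python sketch of the relative entropy commonly reported for mixture models, 1 - sum_i sum_k(-p_ik ln p_ik) / (n ln K), computed from posterior class probabilities. It is assumed, not confirmed by the thread, that this is the statistic meant here. The posterior probabilities are hypothetical and the function name is made up.

```python
import math

def relative_entropy(posteriors):
    """Relative entropy: 1 - sum_i sum_k(-p_ik * ln p_ik) / (n * ln K)."""
    n = len(posteriors)       # number of cases
    k = len(posteriors[0])    # number of latent classes
    total = 0.0
    for row in posteriors:
        for p in row:
            if p > 0.0:       # treat 0 * ln(0) as 0
                total -= p * math.log(p)
    return 1.0 - total / (n * math.log(k))

# Hypothetical posterior class probabilities: four cases, three classes
posteriors = [
    [0.97, 0.02, 0.01],
    [0.05, 0.90, 0.05],
    [0.01, 0.04, 0.95],
    [0.60, 0.30, 0.10],
]
entropy = relative_entropy(posteriors)
print(f"entropy = {entropy:.3f}")
```

Values near 1 mean cases are assigned to classes with little ambiguity; values near 0 mean the posterior probabilities are close to uniform.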
