I am trying to use a 6-class solution in LCA to predict continuous outcomes (e.g., depression, or t2dep) at Time 2 while controlling for various things at Time 1 (Time 1 depression, or t1dep, sex, age, income, education, and marital status).
I found a lot of information about the syntax to predict class membership, but not much about how to use class membership to predict outcomes. When I used the following syntax, I got the ERROR message below it:
%Overall% T2dep on C T1dep rc001re sexrsp rb003red t1age;
The following MODEL statements are ignored:
* Statements in the OVERALL class:
  T2DEP ON C#1
  T2DEP ON C#2
  T2DEP ON C#3
  T2DEP ON C#4
  T2DEP ON C#5
Can you tell me what I am doing wrong? I also tried writing out C#1-C#5 in the model statement, but I got the same error message. What is the appropriate syntax for using class to predict an outcome, controlling for various covariates?
I am looking at Example 8.6, however, and it refers to "GMM with a categorical distal outcome using automatic starting values and random starts" - but my distal outcome is CONTINUOUS (e.g., depression). It is my predictor (class) that is categorical. Am I looking at the wrong example?
It does not matter whether the distal outcome is continuous or categorical or what the model is. The distal outcome is connected with the categorical latent variable. For a continuous distal outcome, the means vary over classes. For a categorical distal outcome, the thresholds vary across classes.
In your first paragraph you mention depression twice: as a DV and as an IV. I assume you mean depression measured at T1 and T2 - otherwise you need to clarify.
I hear this question often - you have a distal outcome that you want to predict from latent class membership but you don't want the distal to influence your classes. It is interesting that this question never seems to come up in SEM when the latent class variable is replaced by a factor - the factor is also determined in part by the distal. My feeling is that the problem should be re-conceptualized - the distal should influence the latent class membership in a first model step. Then, using this model, in a second step a new sample can be considered for which you classify people into the classes not using the distal information, fixing the parameters at the values from the first step.
True, you can do the analysis without the distal in the first step and then do the second step. But that first step would not draw on the strength that the latent classes are informed by their relationship to the distal - if you think the distal has different means/probabilities for different classes, why not use that information in forming the classes? Also, you would not have an estimate of how the classes influence the distal.
Thank you. Yes, there are 2 depression time points, sorry.
I'm new to SEM/Mplus and haven't used factors/latent variables.
However - my motivation to separate the two (distal and class) is my desire to see if the latent class (composed of economic variables) is related to depression. I don't wish to confound the classes by including depression as an indicator. I'd like to see if people 'more' successful economically have a different relation to depression than people less successful economically.
And when you suggest using different samples - I'm unclear if you mean an entirely different set of records (people). I only have my main sample of 632 persons.
If at all possible, could you further explain the steps required for both of your suggestions?
You can do LCA on the economic variables only and then (1) fix all those parameters when adding the distal, or (2) get the most likely class membership (if the entropy is big) and let that be an observed variable that predicts the distal. That's one line of thinking.
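Option (2) can be sketched outside Mplus as follows. This is only an illustration in Python with NumPy, not Mplus syntax: it assumes the posterior class probabilities have already been saved from the LCA run, and the function names (modal_class, relative_entropy, class_dummies) are made up for this sketch. The entropy formula is the usual relative entropy, where values near 1 indicate clear classification. The last class is taken as the reference category when building the dummy variables.

```python
import numpy as np

def modal_class(post):
    """Most likely class (1-based) for each person from posterior probabilities."""
    post = np.asarray(post, dtype=float)
    return post.argmax(axis=1) + 1

def relative_entropy(post):
    """Relative entropy in [0, 1]; 1 means every person is classified with certainty."""
    post = np.asarray(post, dtype=float)
    n, k = post.shape
    # Clip only inside the log so that 0 * log(0) is treated as 0.
    safe = np.clip(post, 1e-12, 1.0)
    uncertainty = -(post * np.log(safe)).sum()
    return 1.0 - uncertainty / (n * np.log(k))

def class_dummies(labels, n_classes):
    """Dummy variables for classes 1..K-1; class K is the reference category."""
    labels = np.asarray(labels)
    columns = [(labels == c).astype(float) for c in range(1, n_classes)]
    return np.column_stack(columns)

# Hypothetical posterior probabilities for three people and three classes.
post = [[0.90, 0.05, 0.05],
        [0.10, 0.80, 0.10],
        [0.20, 0.20, 0.60]]
labels = modal_class(post)
dummies = class_dummies(labels, 3)
```

The dummy columns can then enter an ordinary regression of the distal outcome next to the Time 1 covariates. This treats class membership as known, which is why the entropy should be checked first.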
The other line is saying that you can get the economics latent classes more pertinent to predicting depression if you include the depression distal in the modeling. You can for example split your sample and estimate that full model in the first step and then in the second step use the other half of the sample and use the econ items only to form latent classes with parameters fixed at the values from the first run - and see how that latent class membership relates to depression. We teach about similar approaches in "Topic 6" of our short courses - see videos and handouts on our web site.
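The sample split mentioned here could be prepared before the two runs, for example with a random split into halves. A minimal sketch in Python with NumPy, assuming a sample of 632 persons as stated earlier in this thread; the function name split_halves and the seed are arbitrary choices for illustration.

```python
import numpy as np

def split_halves(n, seed=0):
    """Randomly split row indices 0..n-1 into two disjoint halves."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n)
    half = n // 2
    # First half: estimate the full model (classes plus distal).
    # Second half: form classes from the indicators only, parameters fixed.
    return np.sort(order[:half]), np.sort(order[half:])

first_half, second_half = split_halves(632)
```

Each index set would then be written to its own data file, one per run.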
Reading your very helpful comments above brought a question to my mind. In a standard LCA with ordered categorical outcomes, would one only need to fix the thresholds prior to adding the distal outcomes (and other covariates)?
If at all possible, I'd appreciate some starting steps to accomplish your suggestions above of:
"You can do LCA on the economic variables only and then (1) fix all those parameters when adding the distal, or (2) get the most likely class membership (if the entropy is big) and let that be an observed variable that predicts the distal."
From the follow up emails I understand I need to fix both thresholds and probabilities - and, unfortunately, haven't found an example I understand.
I 'think' the related section of Topic 6 starts on page 142 - but I am unclear how to translate the results of my LCA into the 2nd step with the distal.
It is a little hard to teach this topic in a short Mplus Discussion format - better to come to our short courses or watch their movies (the Topic 5 mixture movie will shortly be available from our recent Berlin teaching). But let me say a few things:
For the simplest approach (2) above, you get the most likely class membership by requesting cprob in the Save option of the Savedata command - see p. 649 of our UG.
To fix parameters, see UG ex 7.5 where instead of * you use @. The ex shows the item parameters - you also have to add the class prob-related parameters, for example
Jinseok Kim posted on Wednesday, October 21, 2009 - 10:46 am
Hi Bengt, I am conducting LCA with a number of binary class indicators and a continuous distal outcome variable. I've got thresholds for the class indicators and means and SEs for the distal outcome variable for each class. My question now is whether there is any way to test whether the estimated means of the distal outcome variable differ across the classes.
Your previous responses have been most helpful. I've since used 'hard coded' class membership in regressions with distal predictors per the suggestion: "(2) get the most likely class membership (if the entropy is big) and let that be an observed variable that predicts the distal."
I have 2 follow up questions:
First - I've learned how to interpret results using classes (2 of 3 classes) as dependent variables. Could class be an independent variable predicting a distal 'outcome'? And how would that be interpreted? Would I use class as a nominal variable - or test an excluded class by including the 2 remaining classes as independent variables?
Second - I can't get my head around the choice between (1) using covariates in class creation - versus (2) creating classes without covariates and including covariates when finding the relations between class membership and distal predictors. I understand this may be an issue specific to the research questions - but any insight would be helpful.
Hi, I have a question similar to those in this thread but a little more elaborate. I want to use latent classes formed from variables X1-X8 as "baseline" covariates in a growth model of Y1-Y4, ie growth of Y over 4 time points.
Specifically: 1) The "baseline" latent classes are formed from the baseline variables X1-X8 only. 2) All individuals are followed up post-baseline with repeated measurements of Y: Y1-Y4. I would like to fit a growth model for Y for each individual allowing the intercepts and slopes to depend on the latent classes formed from (1).
I would like uncertainty in class formation to be taken into account when doing the growth model regression, but I definitely do *not* want the Y's to be used in the formation of the latent classes. In my specific example it makes no sense to have class formation using future measurements (i.e., the classes are "baseline" only).
Can this be modelled in Mplus? If so, where should I look for such examples?
In the worst case I guess I could fix each individual's class and do the second-stage regression as if these were fixed in advance, and then try to concoct a post-hoc misclassification correction of the effects of each class based on the posterior class probabilities for each individual. I haven't seen this done but I suspect it may have been. Any pointers would be appreciated.
This is a complex topic. In one sense the Y's should contribute to the classes for X1-X8 because the Y's are correlated with the X classes so why not use all the information pertaining to the X classes?
One approach that might help is to add a latent class variable for the Y's, so having two latent class variables. And then let those two be related.
Another approach is to impute m data sets for the X class variable (most likely class) using only X1-X8 information. And then do the analyses m times relating X class to the Y's.
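One way to read the imputation idea is sketched below in Python with NumPy. It rests on two assumptions that are not stated in the post: class membership is drawn at random from each person's posterior class probabilities (estimated from X1-X8 only), and the m sets of results are pooled with Rubin's rules, the standard combining rules for multiple imputation. The function names draw_classes and pool_estimates are hypothetical.

```python
import numpy as np

def draw_classes(post, m, seed=0):
    """Draw m sets of class labels (1-based) from posterior class probabilities."""
    rng = np.random.default_rng(seed)
    post = np.asarray(post, dtype=float)
    n, k = post.shape
    cum = post.cumsum(axis=1)
    draws = []
    for _ in range(m):
        u = rng.random((n, 1))
        # Count how many cumulative probabilities lie below u (inverse-CDF draw).
        idx = (u > cum).sum(axis=1)
        # Guard against rounding in the cumulative sum.
        draws.append(np.minimum(idx, k - 1) + 1)
    return draws

def pool_estimates(estimates, variances):
    """Pool m estimates and their squared SEs with Rubin's rules."""
    q = np.asarray(estimates, dtype=float)
    u = np.asarray(variances, dtype=float)
    m = len(q)
    pooled = q.mean()
    within = u.mean()
    between = q.var(ddof=1)
    total = within + (1.0 + 1.0 / m) * between
    return pooled, np.sqrt(total)
```

Each drawn label set would be merged with the Y data and analyzed separately; the pooled standard error then reflects the classification uncertainty through the between-draw variance.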
I think some object to using both X and Y information in forming the X classes because they want to use the X classes for prediction of Y in later stages where only X information is available. If that's the case, I would conjecture that you get a better prediction model when the X classes have been formed based on both X and Y.
I have read the above discussion and have a follow-up question about fixing class probabilities. I chose the option of estimating latent classes in one half of the sample (i.e., social support from friends and parents at one measurement), then using the other half of the sample to fix the class solution and use this latent class variable to predict a distal outcome (a latent growth curve of social anxiety).
you state that: "To fix parameters, see UG ex 7.5 where instead of * you use @. The ex shows the item parameters - you also have to add the class prob-related parameters, for example
What do x and y stand for, exactly? I understand from example 7.5 that I would need to fix the estimated means and variances for each of the latent classes; but from your response I need to additionally (!) fix the class probabilities. Is that what x and y stand for? How do I calculate/get these?
These are the logit values for the intercepts of the categorical latent variable.
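To illustrate how such logit values relate to class probabilities, here is a minimal Python sketch. It assumes the usual multinomial parameterization in which the last class is the reference class with a logit of zero; the probabilities used are made up, and the function names are hypothetical. In practice the values would be copied from the first-step output rather than computed by hand.

```python
import math

def class_logits(probs):
    """Logit intercepts for classes 1..K-1, with the last class as reference."""
    ref = probs[-1]
    return [math.log(p / ref) for p in probs[:-1]]

def class_probs(logits):
    """Inverse transform: class probabilities from the K-1 logit intercepts."""
    expd = [math.exp(v) for v in logits] + [1.0]  # reference class has logit 0
    total = sum(expd)
    return [e / total for e in expd]

# Hypothetical class proportions for a 3-class solution.
logits = class_logits([0.5, 0.3, 0.2])
recovered = class_probs(logits)
```

With three classes there are two such values, one for each non-reference class.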
Hyunzee Jung posted on Tuesday, September 02, 2014 - 8:56 pm
My 4-class latent variable has behaved in a stable manner throughout a series of LCA with covariates. However, when distal outcome variables were introduced, the latent classes changed. I used both DU3STEP (DE3STEP) and DCON. Any insights or inputs to give me some guidance forward will be very much appreciated.
DCON should not change the classes. You may want to send to support with your license number.
Hyunzee Jung posted on Tuesday, September 02, 2014 - 10:30 pm
Thanks so much, Bengt. One additional question is whether it makes sense to split my model into a two-step process. I have covariates (including a predictor treated as IV), a focal latent class variable, and continuous distal outcome variables. I analyzed the model of covariates and the latent class using the one-step approach. A single run took a very long time (over 5-6 hours). Assuming the latent classes are generated in a stable manner in all situations, I wonder if I can do the analyses in separate runs, first from covariates to latent class and second from latent class to distal outcome. This is a broad, general question, but I'd very much appreciate any advice you can give!
You should be able to do it in steps using DCON. But I don't see why the run would take so long; perhaps you want to send output and data to support along with your license number.
Hyunzee Jung posted on Wednesday, September 03, 2014 - 2:06 am
Thank you again, Bengt. I will send my data and output. Apart from that, would it generally make sense for an analyst to perform analyses in two steps for him- or herself as described above?
Hyunzee Jung posted on Wednesday, September 03, 2014 - 6:11 am
To give you more information, my IV is a factor (3 indicators) predicting the latent classes (I also have 4 covariates). Further, I am estimating a moderating effect of gender. I used “intx | male XWITH f1” and regressed the latent class on ‘intx’ along with the other covariates.
Q1. The reason that I asked about separate runs for the front part (IV, intx, and covariates to latent class) and the back part (latent classes to continuous distal outcomes) is that interaction variables for TYPE=MIXTURE are available only if ALGORITHM=INTEGRATION is specified in the ANALYSIS command; however, auxiliary variables with DCON are not available with TYPE=MIXTURE in conjunction with ALGORITHM=INTEGRATION. It seems that the front part of my model and the back part are not compatible with each other. My question is, "does it make sense, or is it a fine choice, to go through two separate runs?"
Q2. I am planning to introduce gender as a moderator in the back part of my model as well. The effects of the latent classes on my distal outcomes are hypothesized to differ across gender. Then interaction variables for TYPE=MIXTURE would come in and ALGORITHM=INTEGRATION would also have to be specified, which can't be done with DCON. What options would I have to estimate an interaction effect in this latter path?
I am posting this because this seems like a more general question than specific to a certain dataset. Thanks so much again!