Anonymous posted on Thursday, November 15, 2001 - 12:53 pm
I have a six class model, and one of the classes has the lowest number of members (around 15). When I add X(0,1) as a predictor, this smallest class divides up as 13 0's and 2 1's. This seems to be causing a problem, especially since I have other predictors in the model as well. What do I do in this case? And does Mplus have a conditional maximum likelihood method? Thanks.
Yes, this happens with small classes. You probably have a slope and an intercept for this small class that start going towards extreme values because almost everyone has the same x value in this class. You can fix these coefficients at plus or minus 15 to get estimates for the other parameters.
I am running an LCA with 8 dichotomous u variables indicating utilization of specific mental health services. The best model I came up with has 3 classes. I now want to run a model with a dichotomous covariate (type of service delivery system) predicting the latent categorical variable. I specify the following:
I get a message saying that the model estimation terminated abnormally due to an ill-conditioned Fisher information matrix. And there is a problem with the gamma(c) matrix starting value for the regression of c#2 on siteid. Do you have any advice on how I should proceed? I have tried specifying different starting values for c#2 on siteid, but to no avail. Thanks for any insight you can provide.
Anonymous posted on Friday, January 11, 2002 - 10:59 pm
Most probably the "siteid" variable is constant (or close to constant) in class 2. That makes the slope in "c#2 on siteid" unidentifiable. However, by fixing that slope to zero (or any other number, for that matter) you should get an equally good model (check the log-likelihood value to make sure). Also, if you request Tech 7 in the output command, you can check whether the variance of "siteid" in class 2 is zero.
Another possibility is that the slope in "c#2 on siteid" is very large or very small, i.e., it approaches +/- infinity (and therefore it is unidentifiable). In that case fixing the slope to +/- 15 will produce an identified model that is just as good (check the log-likelihood of the two runs). This second possibility corresponds to the case when all subjects with certain "siteid" are classified in the same class.
Anonymous posted on Friday, January 18, 2002 - 3:31 am
How is it that this parameter can be fixed to zero or any other number and you can get a good solution for the model?
Anonymous posted on Saturday, January 19, 2002 - 1:09 am
Thanks for the question. Now I realize that my message from Jan 11 is incorrect. Sorry about the confusion…
When the variance of an X variable is 0 in class C, the slope of U on X in class C is unidentified; it can be fixed to zero or any other number and you still get a good solution. This cannot occur in the latent class regression (C on X).
An empirical non-identification in the latent class regression usually arises from empty cells in the joint distribution of C and X. Say that C=1,2,3 and X=0,1, with the last class (C=3) as the reference class. If P(C=2,X=1)=0 then the slope C#2 on X is -infinity. If P(C=3,X=1)=0 then the slope C#2 on X is +infinity.
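The empty-cell argument can be illustrated with a small Python sketch. It is only an illustration under simplifying assumptions: class membership is treated as if it were observed, so the slope is just a difference of log odds computed from cell counts, and the counts themselves are made up. The function names `slope_c2_on_x` and `logistic` are hypothetical helpers, not anything from Mplus.

```python
import math

def slope_c2_on_x(counts):
    """Slope of C#2 on X (class 3 as reference) from joint cell counts.

    counts[(c, x)] is the number of cases with C=c and X=x.
    Assumes class membership is observed, which is a simplification.
    """
    def log_odds(x):
        num, den = counts[(2, x)], counts[(3, x)]
        if num == 0 and den == 0:
            return math.nan          # nothing to compare in this column
        if num == 0:
            return -math.inf         # empty C=2 cell
        if den == 0:
            return math.inf          # empty reference cell
        return math.log(num / den)
    return log_odds(1) - log_odds(0)

def logistic(z):
    """Probability implied by a logit z."""
    return 1.0 / (1.0 + math.exp(-z))

# Made-up counts, for illustration only.
full = {(1, 0): 30, (1, 1): 20, (2, 0): 25, (2, 1): 10, (3, 0): 40, (3, 1): 20}
empty_c2 = dict(full); empty_c2[(2, 1)] = 0
empty_c3 = dict(full); empty_c3[(3, 1)] = 0

print(slope_c2_on_x(full))       # finite slope
print(slope_c2_on_x(empty_c2))   # -inf
print(slope_c2_on_x(empty_c3))   # inf
print(logistic(-15), logistic(15))
```

The last line shows why a logit fixed at plus or minus 15 behaves like an infinite one in practice: the implied probabilities are within about one in a million of 0 and 1.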
The bottom line is that you have an empirical non-identification. Something in the data makes the c#2 on siteid an unidentifiable slope. Take the ending value from the failed run and fix the slope to that value. This way you will be able to get SE for the rest of the parameters.
Jason Bond posted on Monday, April 28, 2003 - 2:27 pm
Hello. I have a latent class problem where I have an observed class variable which has 5 classes. So I have created 5 dummy indicator variables Y910 - Y914. I also have a continuous predictor variable, BAC. Essentially, what I would like to do is estimate 4 cutpoints, or thresholds (call them T1-T4), on the continuous variable BAC such that:
It seems that what one should look at in the output, to accomplish the above goal, is the latent class regression model part giving the intercept and slope of the regressions of Y910 - Y914 on BAC, but I am a little confused about where the estimates of the thresholds T1-T4 are obtained. Any suggestions? Also, I seem to be having trouble with the thresholds approaching extreme values. Should I not be using all 5 category indicators Y910 - Y914, or could this just be a starting values problem? Thanks much for any input.
I have a difficult time understanding these questions. To move forward, instead of stating the problem in terms of the Mplus analyses you are contemplating, can you please describe conceptually what it is that you want to do? BAC sounds like blood alcohol concentration. What do the BAC thresholds correspond to? What are the y's? How should the relationship between BAC and y's be viewed? Stating these things very briefly might help me understand.
D.Gross posted on Tuesday, November 02, 2004 - 3:49 am
I am trying to determine how many males and females from one nominal covariate (gender) are in one of my two classes. Is there any command I can use to do that in Mplus?
bmuthen posted on Tuesday, November 02, 2004 - 11:47 am
Use Mplus graphics, where you can look at histograms of the analysis variables by class. Clicking on a histogram bar gives you the number of people for it.
Anonymous posted on Tuesday, May 03, 2005 - 7:40 am
I am estimating a latent class growth mixture model with 2 classes and 2 predictors of the latent slope and latent intercept. While I assume the predictors to have a different impact on eta depending on class, I have strong reason to believe they are mutually uncorrelated (throughout all classes). By default, however, Mplus treats the predictors as correlated. The (default) model converges without any problems, however, after adding p1 WITH p2@0; in the %overall% model statement I get the message that "This latent class regression requires numerical integration" - why? Thank you very much!
When you mention p1 and p2, they are no longer considered to be independent variables. They are brought into the model and numerical integration is required. Add ALGORITHM=INTEGRATION; to the ANALYSIS command.
Anonymous posted on Tuesday, May 03, 2005 - 11:39 am
Thank you for your extremely fast reply! Unfortunately, I still do not understand (sorry): All I want to do is fix a parameter to zero (independent of class membership). This should make the model more restrictive (are the models not even nested?), and I have difficulty understanding why I have to change the estimation algorithm when the more general model is estimated without problems. Can you point me to any references on this? Thank you so much - this board is so helpful!!!
bmuthen posted on Tuesday, May 03, 2005 - 12:09 pm
Does your model specify that the 2 predictors influence not only the 2 growth factors (intercept and slope) but also the latent class variable?
Anonymous posted on Thursday, June 09, 2005 - 1:20 pm
Hello. I am estimating a latent class growth model and would like to get the predicted values for the outcome variable for each individual. So, for example, if the model places the ith person in class j, could I get an N by T (where N is the number of respondents and T is the number of time points available) dataset where for the ith row, the predicted values are those for class j? Thanks much,
You can get a plot of these values using the PLOT command, requesting individual estimated values. When you are in the graphics module you can use the Save graph data menu function to save the individual values used in the plot.
I have a conceptual question for LCA with covariates. I am trying to define subgroups of drug users based on the types of drugs they usually use (y1-y14). I would like to evaluate several covariates and then, based on the classification of cases, compare groups on certain external validators (indicators of risk, drug side effects, etc.). As I understand the use of covariates, they should be background variables (age, race/ethnicity, income, etc.). My question is, are covariates in a sense "washing out" the effects of the covariate on latent class membership such that the differences between classes on a given drug use (y1-y14) are independent of the given covariate?
My second question is how to determine or justify the inclusion/exclusion of covariates into the model statistically. For instance, if I include age as a covariate, but find it does not significantly contribute to class membership, am I justified in dropping it from the model?
bmuthen posted on Friday, September 23, 2005 - 8:11 am
Regarding your first question, the covariates do not wash out the differences in covariate means across classes. If a covariate has a strong effect on class membership then it will have different means in the classes.
Strategies for the inclusion/exclusion of covariates would follow the same lines of argument as in conventional regression analysis, and I am not sure there is consensus even there...
In estimating a mixture model with covariates (time-invariant predictors), would you: (1) estimate the best-fitting model without covariates (i.e., estimate 1-, 2-, 3-, 4-, 5-, ... class models and decide on the best one) and then include covariates only in the best-fitting model; OR (2) estimate all models with the covariates before choosing the best one; OR (3) do something else?
The strategy I would use is to include covariates only after the number of classes has been determined. If adding the covariates changes the class structure, this might point to the need for direct effects from the covariates to the latent class indicator variables. These direct effects represent measurement non-invariance. If they are needed, then the process may need to be done again taking this into account.
In a growth mixture model, is it possible to determine to what extent the predictors can predict membership in the classes? For instance, I have a 4 trajectory class solution for development of alcohol symptoms across age 15 to 45 and I want to know how well I can predict membership in one of these classes based on predictors at age 15. I'm hoping for something like percent correctly classified.
Percent correctly classified is used in logistic regression where the categorical dependent variable is observed. Here it is not observed so we don't know what the true status is. I haven't seen a classification approach done in this case.
One approach one can consider is what we do in our Mplus teaching on growth mixture modeling using the reading data example. We estimate the full model with the outcomes for all time points. Then we fix parameters at those values and use the model as a measurement instrument, that is, only estimate the posterior probabilities for each class and individual. Here, you can consider less than the full information. For example, you can tell the program that you have no data on the outcomes by setting those at the missing data flag. Thereby you would use as observed variables only the covariates predicting class membership. You can then crosstabulate most likely class membership when using only covariate information with that using full information. This tells you if you do well or not using only the covariate information in terms of specificity and sensitivity.
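The final cross-tabulation step can be sketched in Python. This is only an illustration: the two lists of most likely class memberships are made up, and `sensitivity_specificity` is a hypothetical helper name. It treats the full-information classification as the benchmark and asks how well the covariate-only classification recovers one target class.

```python
def sensitivity_specificity(full_info, covariate_only, target):
    """Sensitivity and specificity of the covariate-only classification
    for one target class, using the full-information classification
    as the benchmark."""
    tp = fn = fp = tn = 0
    for f, c in zip(full_info, covariate_only):
        if f == target and c == target:
            tp += 1
        elif f == target:
            fn += 1
        elif c == target:
            fp += 1
        else:
            tn += 1
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec

# Made-up most likely class memberships for 10 individuals.
full_info      = [1, 1, 1, 1, 2, 2, 2, 2, 2, 2]
covariate_only = [1, 1, 1, 2, 2, 2, 2, 2, 1, 2]

sens, spec = sensitivity_specificity(full_info, covariate_only, target=1)
print(f"sensitivity={sens:.3f}, specificity={spec:.3f}")
```

With more than two classes the same function can be called once per class, each time treating that class as the target and all others as "not target".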
Sarah Dauber posted on Tuesday, September 02, 2008 - 6:11 pm
Hello, I am running a 4-class GMM model with covariates. I am trying to determine whether the covariates significantly distinguish among the classes. I know I have to regress the latent class variable (c) on the covariates, but how do I set a particular class to be the reference group? I would like to look at all possible comparisons (i.e., vary the reference group so all 4 groups are compared with each other). How do I do this?
We give all possible comparisons. You don't have to do anything.
Sarah Dauber posted on Tuesday, September 02, 2008 - 6:29 pm
The output I got did not give all possible comparisons. Maybe I am specifying something wrong in the input? I have pasted the model part of my input below. Thanks, Sarah
analysis: type=mixture missing;
algorithm=integration;
integration=montecarlo;
starts=100 10;
process=2;
Model:
%overall%
i s q | aodpda7@0 aodpda8@1 aodpda9@2 aodpda10@3 aodpda11@4 aodpda12@5
aodpda13@6 aodpda14@7 aodpda15@8 aodpda16@9 aodpda17@10 aodpda18@11;
s-q@0;
i s q on newage d4revx bleduc gender;
c#1 on newage d4revx bleduc gender;
c#2 on newage d4revx bleduc gender;
c#3 on newage d4revx bleduc gender;
%c#1%
i;
i on newage d4revx bleduc gender;
%c#2%
i;
i on newage d4revx bleduc gender;
%c#3%
i;
i on newage d4revx bleduc gender;
%c#4%
i;
i on newage d4revx bleduc gender;
I am using Mplus 5.1 to conduct latent class analyses. The new output is not giving me all of the comparisons for the multinomial regression analyses. Do I need to add something to the code to get this? Thanks!
I am using Mplus 5.21 to conduct LCA using longitudinal data. I am new to LCA and I have three conceptual questions. I am interested in exploring group membership based on responses to five categorical items concerning sexual behavior collected at Time 7.
1. I have a number of antecedent variables (e.g., gender, race, early family experience, pubertal timing collected at Time 1 and Time 2) that I believe are causally related to group membership at Time 7. I am planning to include these in the model as covariates. However, I also have a number of concurrent variables (e.g., romantic attachment, attitudes about the opposite sex at Time 7) that I think also relate to group membership. After reading about the auxiliary variable function, I was planning to use these concurrent variables as auxiliary variables (as opposed to covariates). Would this be the correct approach or should I include all variables as covariates?
2. I assume the correct steps would be to first define the number of groups, then run a model including covariates. Should I include aux variables at each step or just the final step?
3. Is Mplus capable of running a multiple group LCA for males and females? How is this distinct from just adding gender as a covariate?
Much thanks for any light that you can shed on these issues!
I am trying to specify a growth mixture model with a latent predictor (as opposed to an observed predictor). I am interested in seeing how this latent predictor is associated with class membership. I specified the covariances between the latent predictor and the growth parameters (intercept, slope and quadratic) to be zero.
I noticed in the output that when I include this latent variable as a predictor, I get class specific intercepts for the observed indicators of the latent variable, and I get class specific variances for the latent predictor and class specific residual variances for the observed indicators of the latent predictor, which are all by default constrained to be equal across classes. However, when I run a similar model with an observed predictor, there are no class specific estimates provided for that predictor (i.e., class specific intercept or variance) shown in the output. Why do I get class specific variance estimates for my latent predictor and class specific residual variances for my observed indicators in this model, but not when I only use an observed predictor? Does this approach seem problematic as compared to just using an observed predictor as far as trying to see if this predictor is associated with class membership?
The f1 -->c model is discussed in our Topic 5 short course - see the handout's ending section Structural Equation Mixture Modeling.
c-->f2 is standard, although you don't say f2 ON c, but as the default f2 means change as a function of the c classes.
So you can put it all together in one model.
Mary H. posted on Friday, February 24, 2012 - 2:16 pm
I am running a latent class analysis with a sample of 204 people. Results show that a 2 class model fits the data the best. However, 198 people are in Class 1 and 6 are in Class 2. Is it still possible/appropriate to examine covariates of membership in these clusters (i.e., if higher income people are more likely to be in one cluster or another)?
It's too small of a class to draw inferences related to that class.
Mary H. posted on Sunday, February 26, 2012 - 12:30 pm
Thank you for your previous response. I have a follow-up question.
Below is some of the model fit information I received for a 1-class and a 2-class solution. When I examine a 3-class solution, the model fit indices are not any better and I get a notification about a non-positive definite matrix. The classes for the 3-class model are even less meaningful (class 1=198 people, class 2=1 person, and class 3=5 people). Is it correct for me to interpret from all of this that a one cluster solution is the best?
1 class model TESTS OF MODEL FIT
H0 Value -646.770 H0 Scaling Correction Factor 2.736 for MLR
I would remove the 6 observations. They may be outliers. Run the analysis without them.
Mary H. posted on Monday, February 27, 2012 - 4:24 pm
Thank you! I looked at the output file that contains the probability estimates to identify these 6 cases. I removed the cases and re-ran the analyses. This time 5 people were placed into class two (with mean estimates on the indicator similar to previous results). Should I remove these people also and try again? Or does this mean that the 1-class solution is the best?
I wanted to follow up on the thread from June 14 2007 about class specific direct effects and the issues with estimating this model dependent on data... what specific data requirements are needed to estimate class specific direct effects?
I have estimated a 4 class model with 5 binary indicators and after some investigation it appears that I may have class specific direct effects but I have been having a hard time with this syntax in Mplus - are there any examples that I can follow?
If direct effects are mentioned in the overall part of the MODEL command, the regression coefficients are held equal across classes as the default. To relax this constraint, mention the direct effects in the class-specific parts of the MODEL command also.
*** ERROR
The following MODEL statements are ignored:
* Statements in Class 1:
D2 ON BI
D2 ON AS
D4 ON HI
* Statements in Class 2:
D3 ON SXON
* Statements in Class 3:
D3 ON BMI2
D2 ON SXON
D4 ON SXON
D4 ON SXON
* Statements in Class 4:
D4 ON BMI2
D4 ON GENDER
*** ERROR
One or more MODEL statements were ignored. Note that ON statements must appear in the OVERALL class before they can be modified in class-specific models. Some statements are only supported by ALGORITHM=INTEGRATION.
I'm trying to test whether the interaction between two continuous observed variables predicts class membership in an LCA. As I understand it, the most appropriate way to test predictors and covariates is to use 'r3step' within the auxiliary option. I'm having a problem, however, when I attempt to center the predictors to create the interaction term. I get an error message saying that variables used in the CENTER option need to be used in the MODEL command (as I mentioned, I want to use them in the auxiliary option). I would center the variables in the original datafile prior to importing into Mplus, but I am using multiple imputation within Mplus, which I believe precludes this as a viable option. I do not want to alter the class structures by putting the interaction term into the model. Is there something I'm missing, or is there a better way to test my interaction term given the constraints I've mentioned? Thank you for your help!
I am running an LTA (4 classes over two time points) with dichotomous covariates. In the LTA without covariates it is apparent that some of the transition probabilities between classes are extremely small (0.001 or 0). I understand I need to impose parameter restrictions to set those transitions equal to 0. However, I am not confident in how I constructed the code for this:
There is almost zero probability of transitioning from class 3 to class 4 or class 3 to class 2. Would I incorporate statements like “ c1#3 on c2#4@0”? This is what I have currently with no restrictions.
Model:
%Overall%
c1#1 on x1-x6;
c1#2 on x1-x6;
c1#3 on x1-x6;
c2#1-c2#3 on c1#1-c1#3 x1-x6;
[rmob$1] (1);
[rself$1] (2);
[rusual$1] (3);
[rpain$1] (4);
[ranxiet$1] (5);
……. And so on for MODEL C2
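To see why a very small transition probability corresponds to an extreme logit, here is a rough Python sketch. It assumes one row of the transition table is described by multinomial logits with the last class of the later variable as the reference (logit 0). The logit values and the helper name `transition_row` are made up for illustration; this is not Mplus syntax and not the poster's estimates.

```python
import math

def transition_row(logits):
    """Convert the multinomial logits for one row of a transition table
    into probabilities. `logits` has one entry per non-reference
    destination class; the reference (last) class has logit 0."""
    exps = [math.exp(v) for v in logits] + [1.0]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for moving from one class at time 1 to
# classes 1, 2, 3 at time 2 (class 4 is the reference).
# The logit for class 2 is set to -15 to mimic a transition fixed at zero.
row = transition_row([-1.0, -15.0, 2.0])
for k, p in enumerate(row, start=1):
    print(f"to class {k}: {p:.8f}")
```

In this sketch a logit of -15 gives a probability that is zero to seven decimal places, which matches the advice earlier in the thread to fix runaway coefficients at plus or minus 15.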
Hi there, I am running an LPA with 3 variables (Job Satisfaction, Person-Job-Fit, Chance Events). The best solution is the one with 5 classes. Now I would like to analyze whether another variable (Turnover Intentions: toi_t) is associated with membership in each of the classes (1-5). I tried it with this 3-step solution: MODEL: %OVERALL% c ON toi_t;
Now the problem is that the output only shows the regressions on toi_t for classes 1-4; the last one is not in the output:
Categorical Latent Variables
(columns: Estimate, S.E., Est./S.E., Two-Tailed P-Value)
C#1 ON TOI_T 0.353 0.926 0.381 0.703
C#2 ON TOI_T -6.769 1.264 -5.354 0.000
C#3 ON TOI_T -2.909 1.071 -2.717 0.007
C#4 ON TOI_T -4.798 1.070 -4.483 0.000
Did I use a wrong syntax? What could work better? Thanks a lot for your help!
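A note on reading this output: in a multinomial logistic regression of c on a covariate, the last class serves as the reference class, so its coefficient is zero by definition and is not printed. The slope for any other pair of classes is the difference between their printed slopes. The Python sketch below does this arithmetic with the four estimates from the post; the helper name `contrast` is hypothetical. It gives point estimates only, because standard errors for the differences would need the covariance between the estimates, which the printed output above does not show.

```python
# Slopes of c on toi_t from the post; class 5 is the reference class,
# so its slope is 0 by definition.
slopes = {1: 0.353, 2: -6.769, 3: -2.909, 4: -4.798, 5: 0.0}

def contrast(j, k):
    """Slope for class j relative to class k (log odds of j versus k
    per unit of the covariate). Point estimate only."""
    return slopes[j] - slopes[k]

for j in slopes:
    for k in slopes:
        if j < k:
            print(f"class {j} vs class {k}: {contrast(j, k):+.3f}")
```

For example, the slope for class 5 relative to class 1 is simply the printed C#1 slope with its sign reversed.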
Thanks for your answer, that helps me a bit. But with the BCH option I can in the output only see the means and the S.E. of turnover intention within each class. How can I find out if the turnover-variable correlates with each class? I need the p-values of each correlation.
I have cross sectional data and have identified 2 latent classes that make sense conceptually (one with 2 categories and the other with 3). I am interested in including covariates in the model but I also hypothesize that the first latent class variable (c1) predicts the second (c2). I tried a simple analysis including an ON statement with no covariates and got parameter estimates for the regression weights and means, but when I looked at the latent class patterns the class configurations had changed quite a bit and I'm not sure how to interpret the results at this point. I figured it was a good time to make sure I was going about this correctly (before I confuse myself with covariates) and get a better idea of how to interpret the parameter estimates as well as the class configurations. Here is the model statement I used:
MODEL:
%OVERALL%
C2 ON C1;
!C2 has binary indicators and 3 classes;
!C1 also has binary indicators and 2 classes;
!I used analysis type=Mixture;
I've spent some time searching the discussion board and my books & notes and have not found examples to follow. Everything with 2 categorical latents that involve predictive relationships seem to be covered in the longitudinal literature as LTA models but my data are cross sectional. Any help is appreciated.
Is there any way to constrain the classes to retain their configuration when adding the regression relationship or do I need to just create a new variable based on membership in the class hypothesized to be the predictor (the latent with 2 classes) and then include that in a separate analysis as a manifest variable predicting the downstream latent class variable (with 3 classes).
You can do it using the 3-step LTA approach discussed in the papers on our website:
Asparouhov, T. & Muthén, B. (2014). Auxiliary variables in mixture modeling: Three-step approaches using Mplus. Structural Equation Modeling: A Multidisciplinary Journal, 21:3, 329-341. The posted version corrects several typos in the published version. An earlier version of this paper was posted as web note 15. Appendices with Mplus scripts are available here.
Nylund-Gibson, K., Grimm, R., Quirk, M., & Furlong, M. (2014): A latent transition mixture model using the three-step specification. Structural Equation Modeling: A Multidisciplinary Journal, 21, 439-454.
Almar Kok posted on Thursday, April 30, 2015 - 7:34 am
For a study on health in older adults I am running a Growth Mixture Model to identify different types of ageing processes. The problem is that respondents were aged 55-85 at baseline, and therefore the variable "baseline age" heavily influences the assignment of individuals into latent classes. As a result, there is a 'young' latent class with high and stable functioning, and other classes are much older (and have worse health...). However, this is not very informative, since I want to assess health trajectories regardless of the age at which individuals entered the study.
It has become clear from this forum and my own experience that regressing C on age, and/or I S on age, does not 'adjust' the classification process for age. While some minor changes seem to occur in the classification, the results seem only to confirm that age is indeed a strong predictor of class membership. However, in a sense I want to rule out baseline age as a predictor... So, following up on a question previously stated in this discussion, is there some way of "washing out" age differences among classes, and of estimating types of trajectories 'regardless' of baseline age?
Two approaches are possible. You can let age be a variable to capture individually-varying ages using AT. A faster approach is to re-arrange your data into a couple of cohorts corresponding to age categories at baseline; see Table 2 of the paper on our web site:
Muthén, B. & Asparouhov, T. (2015). Growth mixture modeling with non-normal distributions. Statistics in Medicine, 34:6, 1041–1058. doi: 10.1002/sim.6388
This approach can be analyzed as either a single group with missing data for certain ages in certain cohorts, or as multiple groups using Knownclass.
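The single-group version of the second approach amounts to re-indexing each person's outcomes by age instead of by wave, leaving the ages a person was never observed at as missing. A minimal Python sketch of that restructuring follows. The field names (`baseline_age`, `wave_gap`, `y`), the three-year spacing, and the data are all assumptions made for illustration, and `restructure_by_age` is a hypothetical helper.

```python
def restructure_by_age(records, ages):
    """Re-index wave-ordered outcomes by age at measurement.

    records: list of dicts with 'baseline_age', 'wave_gap' (years between
    waves) and 'y' (outcomes in wave order). Ages with no observation
    are left as None, standing in for the missing data flag.
    """
    rows = []
    for r in records:
        row = {age: None for age in ages}
        for wave, value in enumerate(r["y"]):
            age = r["baseline_age"] + wave * r["wave_gap"]
            if age in row:
                row[age] = value
        rows.append(row)
    return rows

# Two made-up respondents from different baseline-age cohorts.
records = [
    {"baseline_age": 55, "wave_gap": 3, "y": [10, 9, 8]},
    {"baseline_age": 58, "wave_gap": 3, "y": [7, 6, 5]},
]
by_age = restructure_by_age(records, ages=[55, 58, 61, 64])
for row in by_age:
    print(row)
```

Each output row has one column per age, so the growth model's time axis becomes age rather than wave, and the cohorts differ only in which columns are missing.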
Almar Kok posted on Friday, May 01, 2015 - 2:42 am
Many thanks for your prompt reply and reference! I have tried the second approach with a single group analysis, restructuring the data to variables that express all observations when respondents were e.g. 55 years old, 58 years old, etc... This works quite well. However, in your opinion, would a potential drawback of this approach be that it mixes up cohort and period effects, since respondents did not all have the same ages at the same point in time?
I'm also interested in the first approach you mention. Could you please explain a little bit more about what you mean by "You can let age be a variable to capture individually-varying ages using AT." E.g. what does AT stand for? I have searched the internet and the Mplus manual but could not find it...
Hi, is it possible to test latent profile membership as a nominal mediator? That is, an IV influences latent profile membership and in turn profile membership influences a DV. Both IV and DV are observed, continuous variables and the data are cross-sectional. Are there examples that I could adapt to test this?
Muthén, B. (2011). Applications of causally defined direct and indirect effects in mediation analysis using SEM in Mplus.
I have a 3-class mixture model and would like to regress the categorical LV on a number of latent class predictors. These predictors are of various types: binary, nominal (unordered categorical) and continuous. I want to use R3STEP. How can I tell the R3STEP procedure which predictors are binary, which nominal and which continuous?
Nominal predictors should be turned into a set of dummy variables. In regression, covariates can be binary or continuous. In both cases, they are treated as continuous. There is no need to specify their scale.
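Turning a nominal predictor into dummy variables is a data-preparation step done before the analysis. As a minimal sketch in Python (the category labels and the helper name `dummy_code` are made up), a nominal variable with k categories becomes k-1 binary variables, with one category left out as the reference:

```python
def dummy_code(values, reference):
    """Return one dict of 0/1 dummies per observation, omitting the
    reference category so that k categories give k-1 dummies."""
    levels = sorted(set(values) - {reference})
    return [{f"d_{lvl}": int(v == lvl) for lvl in levels} for v in values]

# Hypothetical nominal predictor with three categories.
region = ["north", "south", "west", "north"]
for row in dummy_code(region, reference="north"):
    print(row)
```

Observations in the reference category get 0 on every dummy, so each dummy's coefficient is read as a contrast against that category.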
sfhellman posted on Wednesday, February 03, 2016 - 8:43 pm
I ran a two-class LCA with group assignments (0 = control, 1 = intervention) as a predictor of class membership.
The output includes the following:
Categorical Latent Variables
C#1 ON ASSIGNMENT 0.846 0.374 2.265 0.024
Is it correct to interpret these results as indicating that the log odds ratio of being in class 1 (compared to class 2) is 0.846 higher in the treatment group relative to the control group? Any recommendations for information on how to interpret the output for predictors of class membership is appreciated!
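For reference, the arithmetic behind this kind of interpretation can be sketched in Python. The estimate 0.846 is on the log odds scale, so exponentiating it gives an odds ratio. The 95% interval below is a Wald interval from the printed standard error and the normal critical value 1.96, which is an assumption of this sketch, not something shown in the output.

```python
import math

estimate, se = 0.846, 0.374   # C#1 ON ASSIGNMENT, from the output above

odds_ratio = math.exp(estimate)
z = estimate / se
ci_low = math.exp(estimate - 1.96 * se)
ci_high = math.exp(estimate + 1.96 * se)

print(f"odds ratio = {odds_ratio:.2f}")
print(f"z = {z:.2f}")
print(f"approx. 95% CI for the odds ratio = ({ci_low:.2f}, {ci_high:.2f})")
```

So a log odds difference of 0.846 means the odds of being in class 1 rather than class 2 are roughly 2.3 times as large in the intervention group as in the control group. The small gap between the z computed here and the printed 2.265 comes from rounding in the printed estimate and standard error.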
I am doing an LCA and find that one variable, HIV status, influences classification and so should be included as a predictor variable in an LCA-C. I also have a number of other covariates that I am examining as auxiliary predictor variables using R3STEP. However, I am interested in the interaction between HIV status and one of these covariates. What is the best way to test for interaction effects when I have one covariate influencing class membership but not the other?
I am conducting an LCA with 4 variables and a three-class model provides the best fit to the data. I would like to determine the influence of covariates on class membership, and I am attempting to use the R3STEP command. I conducted the LCA for the three-class model and saved the cprobs to create a nominal most likely class variable. Using the dataset generated when I saved the cprobs, I ran the LCA with covariates using the following syntax:
THE STANDARD ERRORS OF THE MODEL PARAMETER ESTIMATES MAY NOT BE TRUSTWORTHY FOR SOME PARAMETERS DUE TO A NON-POSITIVE DEFINITE FIRST-ORDER DERIVATIVE PRODUCT MATRIX. THIS MAY BE DUE TO THE STARTING VALUES BUT MAY ALSO BE AN INDICATION OF MODEL NONIDENTIFICATION. THE CONDITION NUMBER IS -0.614D-17. PROBLEM INVOLVING PARAMETER 4.
However, when I check the Technical 7 output, I cannot find anything unusual with the values for this parameter. Can you please let me know if I conducted the analysis appropriately and how I should proceed given the error message. Thank you!
Send your full output to Support along with your license number.
Daniel Lee posted on Wednesday, January 16, 2019 - 1:58 pm
Hi Dr. Muthen,
I have selected 5 classes from a growth mixture model that spans early to mid emerging adulthood. I would like to see if these classes predict latent profile classes that were estimated during adulthood. Is this possible? If so, would you have any references that provide sample syntax? Thanks so much!