
Anonymous posted on Thursday, November 15, 2001  12:53 pm



I have a six class model, and one of the classes has the lowest number of members (around 15). When I add X(0,1) as a predictor, this smallest class divides up as 13 0's and 2 1's. This seems to be causing a problem, especially since I have other predictors in the model as well. What do I do in this case? And does Mplus have a conditional maximum likelihood method? Thanks. 


Yes, this happens with small classes. You probably have a slope and an intercept for this small class that start going towards extreme values because almost everyone has the same x value in this class. You can fix these coefficients at plus or minus 15 to get estimates for the other parameters. 


I am running an LCA with 8 dichotomous u variables indicating status of utilization of specific mental health services. The best model I came up with has 3 classes. I now want to run a model with a dichotomous covariate (type of service delivery system) predicting the latent categorical variable. I specify the following:
  MODEL:
    %OVERALL%
    c#1 on siteid;
    c#2 on siteid;
    %c#1%
    [incouns2$1*0 meds2$1*1 cm2$1*1 specedu2$1*3 gpfcoun2$1*2 resinpt2$1*2 placmt2$1*3];
    %c#2%
    [incouns2$1*3 meds2$1*2 cm2$1*3 specedu2$1*1 gpfcoun2$1*1 resinpt2$1*1.5 placmt2$1*0];
    %c#3%
    [incouns2$1*3 meds2$1*2 cm2$1*3 specedu2$1*1 gpfcoun2$1*1 resinpt2$1*1.5 placmt2$1*2];
I get a message saying that the model estimation terminated abnormally due to an ill-conditioned Fisher information matrix, and that there is a problem with the gamma(c) matrix starting value for the regression of c#2 on siteid. Do you have any advice on how I should proceed? I have tried specifying different starting values for c#2 on siteid, but to no avail. Thanks for any insight you can provide.

Anonymous posted on Friday, January 11, 2002  10:59 pm



Most probably the "siteid" variable is constant (or close to constant) in class 2. That makes the slope in "c#2 on siteid" unidentifiable; however, by fixing that slope to zero (or any other number, for that matter) you should get just as good a model (check the loglikelihood value to make sure). Also, if you request TECH7 in the OUTPUT command, you can check whether the variance of "siteid" in class 2 is zero. Another possibility is that the slope in "c#2 on siteid" is very large or very small, i.e., it approaches +/- infinity (and is therefore unidentifiable). In that case, fixing the slope to +/- 15 will produce an identified model that is just as good (compare the loglikelihoods of the two runs). This second possibility corresponds to the case when all subjects with a certain "siteid" value are classified in the same class.
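
A minimal sketch of how the TECH7 request and the fixed slope might look in the input, reusing the names from the question (c#2, siteid). The choice of @15 rather than @-15 is an assumption; the sign should follow the direction in which the estimate was drifting in the failed run, and the class-specific threshold statements from the original input are left out here.

```mplus
MODEL:
  %OVERALL%
  c#1 ON siteid;       ! still estimated
  c#2 ON siteid@15;    ! fixed at a large value; use @-15 if the estimate was heading negative
OUTPUT:
  TECH7;               ! class-specific sample statistics, to inspect siteid within each class
```

Comparing the loglikelihood of this run with that of the run where the slope was free shows whether fixing the slope costs anything in fit.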

Anonymous posted on Friday, January 18, 2002  3:31 am



How is it that this parameter can be fixed to zero or any other number and you can get a good solution for the model? 

Anonymous posted on Saturday, January 19, 2002  1:09 am



Thanks for the question. Now I realize that my message from Jan 11 is incorrect. Sorry about the confusion… When the variance of an X variable is 0 in class C then the slope of U on X in class C is unidentified and can be fixed to zero or any other number and you can get a good solution. This can not occur in the latent class regression (C on X). An empirical nonidentification in the latent class regression usually arises from empty cells in the joint distribution of C and X. Say that C=1,2,3 and X=0,1. If P(C=2,X=1)=0 then the slope C#2 on X is – infinity. If P(C=3,X=1)=0 then the slope C#2 on X is + infinity. The bottom line is that you have an empirical nonidentification. Something in the data makes the c#2 on siteid an unidentifiable slope. Take the ending value from the failed run and fix the slope to that value. This way you will be able to get SE for the rest of the parameters. Tihomir 
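
The link between an empty cell and an infinite slope can be written out for the C=1,2,3 and X=0,1 example above. This is a sketch that assumes the multinomial logit parameterization with the last class (C=3) as the reference; the symbols a_2 and b_2 for the intercept and slope of C#2 on X are introduced here only for illustration.

```latex
\log\frac{P(C=2 \mid X=x)}{P(C=3 \mid X=x)} = a_2 + b_2\,x , \qquad x \in \{0, 1\}

b_2 = \log\frac{P(C=2 \mid X=1)}{P(C=3 \mid X=1)}
      - \log\frac{P(C=2 \mid X=0)}{P(C=3 \mid X=0)}

P(C=2, X=1) = 0 \;\Rightarrow\; b_2 \to -\infty , \qquad
P(C=3, X=1) = 0 \;\Rightarrow\; b_2 \to +\infty
```

Both limits assume the odds at X=0 are finite and nonzero, so that only the first term of b_2 diverges.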

Jason Bond posted on Monday, April 28, 2003  2:27 pm



Hello. I have a latent class problem where I have an observed class variable which has 5 classes. So I have created 5 dummy indicator variables, Y910 - Y914. I also have a continuous predictor variable, BAC. Essentially, what I would like to do is estimate cutpoints, or thresholds (call them T1-T4), on the continuous variable BAC such that: if BAC > T4, the latent variable takes the value 5. So I estimated the model:
  VARIABLE:
    NAMES = BAC Y910 Y911 Y912 Y913 Y914;
    Classes = C(5);
    Categorical = Y910 Y911 Y912 Y913 Y914;
  ANALYSIS:
    TYPE = Mixture;
  MODEL:
    %OVERALL%
    C#1 on BAC;
    C#2 on BAC;
    C#3 on BAC;
    C#4 on BAC;
    %C#1%
    [Y910$1*1.5 Y911$1*0 Y912$1*1 Y913$1*1.5 Y914$1*2];
    %C#2%
    [Y910$1*0 Y911$1*1 Y912$1*0 Y913$1*1 Y914$1*1.5];
    %C#3%
    [Y910$1*1 Y911$1*0 Y912$1*1 Y913$1*0 Y914$1*.5];
    %C#4%
    [Y910$1*1 Y911$1*.5 Y912$1*0 Y913$1*1 Y914$1*5];
    %C#5%
    [Y910$1*1.5 Y911$1*.5 Y912$1*.5 Y913$1*1 Y914$1*1.5];
It seems that what one should look at in the output, to accomplish the above goal, is the latent class regression model part giving the intercept and slope of the regressions of Y910 - Y914 on BAC, but I am a little confused about where the estimates of the thresholds T1-T4 are obtained. Any suggestions? Also, I seem to be having trouble with the thresholds approaching extreme values. Should I not be using all 5 category indicators Y910 - Y914, or could this just be a starting values problem? Thanks much for any input. Jason

bmuthen posted on Monday, May 05, 2003  10:23 am



I have a difficult time understanding these questions. To move forward, instead of stating the problem in terms of the Mplus analyses you are contemplating, can you please describe conceptually what it is that you want to do? BAC sounds like blood alcohol concentration. What do the BAC thresholds correspond to? What are the y's? How should the relationship between BAC and y's be viewed? Stating these things very briefly might help me understand. 

D.Gross posted on Tuesday, November 02, 2004  3:49 am



I am trying to determine how many males and females from one nominal covariate (gender) are in one of my two classes. Is there any command I can use to do that in Mplus? Thank you 

bmuthen posted on Tuesday, November 02, 2004  11:47 am



Use Mplus graphics, where you can look at histograms of the analysis variables by class. Clicking on a histogram bar gives you the number of people for it. 

Anonymous posted on Tuesday, May 03, 2005  7:40 am



I am estimating a latent class growth mixture model with 2 classes and 2 predictors of the latent slope and latent intercept. While I assume the predictors to have a different impact on eta depending on class, I have strong reason to believe they are mutually uncorrelated (throughout all classes). By default, however, Mplus treats the predictors as correlated. The (default) model converges without any problems; however, after adding p1 WITH p2@0; in the %overall% model statement I get the message "This latent class regression requires numerical integration". Why? Thank you very much!


When you mention p1 and p2, they are no longer considered to be independent variables. They are brought into the model and numerical integration is required. Add ALGORITHM=INTEGRATION; to the ANALYSIS command. 
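
A minimal sketch of the change described above, using the p1 and p2 names from the question. Only the two relevant pieces are shown; the growth model statements from the original input are assumed to stay as they were.

```mplus
ANALYSIS:
  TYPE = MIXTURE;
  ALGORITHM = INTEGRATION;   ! required once p1 and p2 are brought into the model
MODEL:
  %OVERALL%
  p1 WITH p2@0;              ! covariance between the predictors fixed at zero
```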

Anonymous posted on Tuesday, May 03, 2005  11:39 am



Thank you for your extremely fast reply! Unfortunately, I still do not understand (sorry): all I want to do is fix a parameter to zero (independent of class membership). This should make the model more restrictive (are the models not even nested?), and I am having difficulty understanding why I have to change the estimation algorithm when the more general model is estimated without problems. Can you point me to any references on this? Thank you so much - this board is so helpful!!!

bmuthen posted on Tuesday, May 03, 2005  12:09 pm



Does your model specify that the 2 predictors influence not only the 2 growth factors (intercept and slope) but also the latent class variable? 

Anonymous posted on Thursday, June 09, 2005  1:20 pm



Hello. I am estimating a latent class growth model and would like to get the predicted values for the outcome variable for each individual. So, for example, if the model places the ith person in class j, could I get an N by T dataset (where N is the number of respondents and T is the number of time points available) where, for the ith row, the predicted values are those for class j? Thanks much, Jason

bmuthen posted on Friday, June 10, 2005  6:13 am



You can get a plot of these values using the PLOT command, requesting individual estimated values. When you are in the graphics module you can use the Save graph data menu function to save the individual values used in the plot. 
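
A sketch of a PLOT command that should make the individual-value plots available in the graphics module. The outcome names y1-y4 are hypothetical stand-ins for the repeated measures, and the use of TYPE = PLOT3 is an assumption about which plot type exposes the estimated individual values.

```mplus
PLOT:
  TYPE = PLOT3;          ! requests the full set of plots
  SERIES = y1-y4 (*);    ! hypothetical outcomes; (*) uses consecutive integers on the x-axis
```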


I have a conceptual question for LCA with covariates. I am trying to define subgroups of drug users based on the types of drugs they usually use (y1-y14). I would like to evaluate several covariates and then, based on the classification of cases, compare groups on certain external validators (indicators of risk, drug side effects, etc.). As I understand the use of covariates, they should be background variables (age, race/ethnicity, income, etc.). My first question is, are covariates in a sense "washing out" the effects of the covariate on latent class membership, such that the differences between classes on a given drug use item (y1-y14) are independent of the given covariate? My second question is how to determine or justify the inclusion/exclusion of covariates in the model statistically. For instance, if I include age as a covariate, but find it does not significantly contribute to class membership, am I justified in dropping it from the model?

bmuthen posted on Friday, September 23, 2005  8:11 am



Regarding your first question, the covariates do not wash out the differences in covariate means across classes. If a covariate has a strong effect on class membership then it will have different means in the classes. Strategies for the inclusion/exclusion of covariates would follow the same lines of argument as in conventional regression analysis, and I am not sure there is consensus even there...

Alex posted on Wednesday, June 13, 2007  2:04 pm



Greetings, In estimating a mixture model with covariate (time invariant, i.e. predictors) would you: (1) estimate the best fitting model without covariates (i.e. estimate a 1, 2, 3, 4, 5, ... models and decide on the best one) and then include covariates only in the best fitting model; OR (2) Estimate all models with the covariates before choosing the best one; OR (3) do something else ? Thank you 


The strategy I would use is to include covariates only after the number of classes has been determined. If adding the covariates changes the class structure, this might point to the need for direct effects from the covariates to the latent class indicator variables. These direct effects represent measurement noninvariance. If they are needed, then the process may need to be done again taking this into account. 

Alex posted on Thursday, June 14, 2007  7:39 am



Thank you very much, Would you allow these direct effects to vary across classes ? 


A model with direct effects that vary across classes may be difficult to estimate depending on the data. 


In a growth mixture model, is it possible to determine to what extent the predictors can predict membership in the classes? For instance, I have a 4 trajectory class solution for development of alcohol symptoms across age 15 to 45 and I want to know how well I can predict membership in one of these classes based on predictors at age 15. I'm hoping for something like percent correctly classified. Thanks, Jennie 


Percent correctly classified is used in logistic regression where the categorical dependent variable is observed. Here it is not observed so we don't know what the true status is. I haven't seen a classification approach done in this case. One approach one can consider is what we do in our Mplus teaching on growth mixture modeling using the reading data example. We estimate the full model with the outcomes for all time points. Then we fix parameters at those values and use the model as a measurement instrument, that is, only estimate the posterior probabilities for each class and individual. Here, you can consider less than the full information. For example, you can tell the program that you have no data on the outcomes by setting those at the missing data flag. Thereby you would use as observed variables only the covariates predicting class membership. You can then crosstabulate most likely class membership when using only covariate information with that using full information. This tells you if you do well or not using only the covariate information in terms of specificity and sensitivity. 

Sarah Dauber posted on Tuesday, September 02, 2008  6:11 pm



Hello, I am running a 4-class GMM model with covariates. I am trying to determine whether the covariates significantly distinguish among the classes. I know I have to regress the latent class variable (c) on the covariates, but how do I set a particular class to be the reference group? I would like to look at all possible comparisons (i.e., vary the reference group so all 4 groups are compared with each other). How do I do this? Thank you, Sarah Dauber


We give all possible comparisons. You don't have to do anything. 

Sarah Dauber posted on Tuesday, September 02, 2008  6:29 pm



The output I got did not give all possible comparisons. Maybe I am specifying something wrong in the input? I have pasted the model part of my input below. Thanks, Sarah
  analysis:
    type=mixture missing;
    algorithm=integration;
    integration=montecarlo;
    starts=100 10;
    process=2;
  Model:
    %overall%
    i s q | aodpda7@0 aodpda8@1 aodpda9@2 aodpda10@3 aodpda11@4 aodpda12@5 aodpda13@6 aodpda14@7 aodpda15@8 aodpda16@9 aodpda17@10 aodpda18@11;
    s-q@0;
    i s q on newage d4revx bleduc gender;
    c#1 on newage d4revx bleduc gender;
    c#2 on newage d4revx bleduc gender;
    c#3 on newage d4revx bleduc gender;
    %c#1%
    i;
    i on newage d4revx bleduc gender;
    %c#2%
    i;
    i on newage d4revx bleduc gender;
    %c#3%
    i;
    i on newage d4revx bleduc gender;
    %c#4%
    i;
    i on newage d4revx bleduc gender;
  output: tech4 tech11;


Please send your input, data, output, and license number to support@statmodel.com. 


I am using Mplus 5.1 to conduct latent class analyses. The new output is not giving me all of the comparisons for the multinomial regression analyses. Do I need to add something to the code to get this? Thanks! 


Please send input, output, data, and license number to support@statmodel.com 


I am using Mplus 5.21 to conduct LCA using longitudinal data. I am new to LCA and I have three conceptual questions. I am interested in exploring group membership based on responses to five categorical items concerning sexual behavior collected at Time 7.
1. I have a number of antecedent variables (e.g., gender, race, early family experience, pubertal timing collected at Time 1 and Time 2) that I believe are causally related to group membership at Time 7. I am planning to include these in the model as covariates. However, I also have a number of concurrent variables (e.g., romantic attachment, attitudes about the opposite sex at Time 7) that I think also relate to group membership. After reading about the auxiliary variable function, I was planning to use these concurrent variables as auxiliary variables (as opposed to covariates). Would this be the correct approach or should I include all variables as covariates?
2. I assume the correct steps would be to first define the number of groups, then run a model including covariates. Should I include aux variables at each step or just the final step?
3. Is Mplus capable of running a multiple group LCA for males and females? How is this distinct from just adding gender as a covariate?
Much thanks for any light that you can shed on these issues!


1. Your use of aux seems preferable if you think class membership is not determined by the concurrent variables.
2. Yes. It doesn't matter.
3. Yes. With mixture modeling of categorical items you can accomplish the same using either approach.
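
For the multiple-group route in point 3, a sketch of how a known grouping variable is usually declared in a mixture analysis. The variable name gender, its 0/1 coding, the class label cg, and the choice of 3 latent classes are all hypothetical here, not taken from the question.

```mplus
VARIABLE:
  CLASSES = cg(2) c(3);                      ! cg = known groups, c = latent classes
  KNOWNCLASS = cg (gender = 0 gender = 1);   ! hypothetical 0/1 coding of gender
ANALYSIS:
  TYPE = MIXTURE;
MODEL:
  %OVERALL%
  c ON cg;                                   ! class proportions allowed to differ by group
```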


I am trying to specify a growth mixture model with a latent predictor (as opposed to an observed predictor). I am interested in seeing how this latent predictor is associated with class membership. I specified the covariances between the latent predictor and the growth parameters (intercept, slope and quadratic) to be zero. I noticed in the output that when I include this latent variable as a predictor, I get class specific intercepts for the observed indicators of the latent variable, and I get class specific variances for the latent predictor and class specific residual variances for the observed indicators of the latent predictor, which are all by default constrained to be equal across classes. However, when I run a similar model with an observed predictor, there are no class specific estimates provided for that predictor (i.e., class specific intercept or variance) shown in the output. Why do I get class specific variance estimates for my latent predictor and class specific residual variances for my observed indicators in this model, but not when I only use an observed predictor? Does this approach seem problematic as compared to just using an observed predictor as far as trying to see if this predictor is associated with class membership? 


I am confused - please send the two outputs: observed and latent predictor.


Hello, I have a general question. I was wondering if it was possible to fit a model that was a combination of the following user manual examples:
  7.17: CFA Mixture Model
  7.19: SEM with a categorical latent variable regressed on a continuous latent variable
  7.20: Structural Equation Mixture Modeling
Let's say I have continuous latent variables, f1 and f2, as well as a categorical latent variable, C. Is it possible to fit a model with the following paths simultaneously: f1 -> C -> f2 and f1 -> f2? I am trying to find examples of this but I haven't had much luck. I wasn't sure if such a model would be appropriate.


The f1 -> c model is discussed in our Topic 5 short course - see the handout's ending section, Structural Equation Mixture Modeling. c -> f2 is standard, although you don't say f2 ON c; instead, as the default, the f2 means change as a function of the c classes. So you can put it all together in one model.
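
A rough sketch of how the three paths might be combined in one input. The indicator names y1-y6 and the two-class setup are hypothetical, and ALGORITHM = INTEGRATION is included on the assumption that regressing c on a continuous latent variable requires numerical integration.

```mplus
VARIABLE:
  CLASSES = c(2);              ! hypothetical number of classes
ANALYSIS:
  TYPE = MIXTURE;
  ALGORITHM = INTEGRATION;     ! assumed necessary for c ON f1
MODEL:
  %OVERALL%
  f1 BY y1-y3;                 ! hypothetical indicators of f1
  f2 BY y4-y6;                 ! hypothetical indicators of f2
  f2 ON f1;                    ! f1 -> f2
  c ON f1;                     ! f1 -> c
  ! c -> f2 needs no ON statement: the f2 mean/intercept
  ! varies across the classes of c by default.
```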

Mary H. posted on Friday, February 24, 2012  2:16 pm



Hello, I am running a latent class analysis with a sample of 204 people. Results show that a 2 class model fits the data the best. However, 198 people are in Class 1 and 6 are in Class 2. Is it still possible/appropriate to examine covariates of membership in these clusters (i.e., if higher income people are more likely to be in one cluster or another)? Thank you 


It's too small of a class to draw inferences related to that class. 

Mary H. posted on Sunday, February 26, 2012  12:30 pm



Thank you for your previous response. I have a follow-up question. Below is some of the model fit information I received for a 1-class and a 2-class solution. When I examine a 3-class solution, the model fit indices are not any better and I get a notification about a nonpositive definite matrix. The classes for the 3-class model are even less meaningful (class 1 = 198 people, class 2 = 1 person, and class 3 = 5 people). Is it correct for me to interpret from all of this that a one cluster solution is the best?
1-class model
  TESTS OF MODEL FIT
  Loglikelihood
    H0 Value                            -646.770
    H0 Scaling Correction Factor           2.736
      for MLR
  Information Criteria
    Number of Free Parameters                  6
    Akaike (AIC)                        1305.539
    Bayesian (BIC)                      1325.448
    Sample-Size Adjusted BIC            1306.438
      (n* = (n + 2) / 24)
2-class model
  TESTS OF MODEL FIT
  Loglikelihood
    H0 Value                            -454.893
    H0 Scaling Correction Factor           2.318
      for MLR
  Information Criteria
    Number of Free Parameters                 10
    Akaike (AIC)                         929.785
    Bayesian (BIC)                       962.966
    Sample-Size Adjusted BIC             931.283
      (n* = (n + 2) / 24)
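
As a check on the pasted output, the information criteria can be reproduced by hand. This assumes the H0 loglikelihoods are -646.770 and -454.893 (the negative sign is what makes the reported AIC values come out) and uses n = 204 from the earlier post; p is the number of free parameters.

```latex
\mathrm{AIC} = -2\log L + 2p , \qquad \mathrm{BIC} = -2\log L + p \ln n

\text{1 class: } \mathrm{AIC} = -2(-646.770) + 2(6) = 1305.540 , \quad
\mathrm{BIC} = 1293.540 + 6 \ln 204 \approx 1325.45

\text{2 classes: } \mathrm{AIC} = -2(-454.893) + 2(10) = 929.786 , \quad
\mathrm{BIC} = 909.786 + 10 \ln 204 \approx 962.97
```

These agree with the reported values up to rounding. Both criteria are lower for the 2-class model, so the fit indices on their own favor it; what is in doubt is the very small second class.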


I would remove the 6 observations. They may be outliers. Run the analysis without them. 

Mary H. posted on Monday, February 27, 2012  4:24 pm



Thank you! I looked at the output file that contains the probability estimates to identify these 6 cases. I removed the cases and reran the analyses. This time 5 people were placed into class two (with mean estimates on the indicator similar to previous results). Should I remove these people also and try again? Or does this mean that the 1-class solution is the best?


It sounds like you may have removed the wrong observations. I would recheck that. You can run the analysis with the full sample and ask for LOGLIKELIHOOD in the OUTPUT command. This gives you the loglikelihood contribution for each person. This may help isolate the problem. 


Hello, I wanted to follow up on the thread from June 14 2007 about class specific direct effects and the issues with estimating this model dependent on data... what specific data requirements are needed to estimate class specific direct effects? I have estimated a 4 class model with 5 binary indicators and after some investigation it appears that I may have class specific direct effects but I have been having a hard time with this syntax in Mplus  are there any examples that I can follow? Thank you!!! 


If direct effects are mentioned in the overall part of the MODEL command, the regression coefficients are held equal across classes as the default. To relax this constraint, mention the direct effects in the class-specific parts of the MODEL command also.


Hi Linda, Thanks for your response. I am trying to specify the class-specific direct effects but I must be doing something wrong, as I keep getting this error message.
  Model:
    %Overall%
    C#1 on gender AS AA HI BI SXON BMI2;
    C#2 on gender AS AA HI BI SXON BMI2;
    C#3 on gender AS AA HI BI SXON BMI2;
    ! trying to see if there are class specific effects!
    %C#1%
    D2 on BI;
    D2 on AS;
    D4 on HI;
    %C#2%
    D3 on SXON;
    %C#3%
    D3 on BMI2;
    D2 on SXON;
    D4 on SXON;
    D4 on SXON;
    %C#4%
    D4 on BMI2;
    D4 on GENDER;
  Output: Tech10 Tech11 patterns residual CINTERVAL Tech7 svalues;
The error message:
  *** ERROR
  The following MODEL statements are ignored:
  * Statements in Class 1:
    D2 ON BI
    D2 ON AS
    D4 ON HI
  * Statements in Class 2:
    D3 ON SXON
  * Statements in Class 3:
    D3 ON BMI2
    D2 ON SXON
    D4 ON SXON
    D4 ON SXON
  * Statements in Class 4:
    D4 ON BMI2
    D4 ON GENDER
  *** ERROR
  One or more MODEL statements were ignored. Note that ON statements must appear in the OVERALL class before they can be modified in class-specific models. Some statements are only supported by ALGORITHM=INTEGRATION.


You need to have the ON statements in the overall part of the MODEL command if you mention them in the class-specific part. Add them under %OVERALL%.
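
A sketch of the pattern for a single direct effect, using D2 ON BI from the input above. Only this one effect is shown; the other class-specific statements would follow the same two-step form.

```mplus
MODEL:
  %OVERALL%
  C#1 ON gender AS AA HI BI SXON BMI2;
  C#2 ON gender AS AA HI BI SXON BMI2;
  C#3 ON gender AS AA HI BI SXON BMI2;
  D2 ON BI;      ! direct effect declared in the overall part first
  %C#1%
  D2 ON BI;      ! mentioned again so class 1 gets its own coefficient
```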


I'm trying to test whether the interaction between two continuous observed variables predicts class membership in an LCA. As I understand, the most appropriate way to test predictors and covariates is to use 'r3step' within the auxiliary option. I'm having a problem, however, when I attempt to center the predictors to create the interaction term. I get an error message saying that variables used in the CENTER option need to be used in the MODEL command (as I mentioned, I want to use them in the auxiliary option). I would center the variables in the original data file prior to importing into Mplus, but I am using multiple imputation within Mplus, which I believe precludes this as a viable option. I do not want to alter the class structures by putting the interaction term into the model. Is there something I'm missing, or is there a better way to test my interaction term given the constraints I've mentioned? Thank you for your help!


Why couldn't you use the set of multiple imputed data sets and create a new set by creating the interaction variable (first subtracting the mean from each of the two variables)? 


Thank you for your prompt reply! We decided to center the variables and compute the interaction term in the original data file, as recommended by multiple sources (http://onlinelibrary.wiley.com/doi/10.1002/sim.4067/abstract, http://missingdata.lshtm.ac.uk/talks/RSS_2010_03_30_Spratt.ppt, http://www.ssc.wisc.edu/sscc/pubs/stata_mi_ex.htm#Interactions). Unfortunately, now we've run into a new problem. We noticed that even when interaction terms are not included as predictors, centered variables produce vastly different r3step regression coefficients than noncentered variables. For example, look at the difference between the coefficients for age and the Life Events Checklist when they are centered (first) and when they are not centered (second):
  Centered:
                 Estimate    S.E.   Est./S.E.   P-Value
  C#3 ON
    AGE             0.075   0.064       1.169     0.243
    LECCONT0        0.101   0.066       1.518     0.129
  Not centered:
                 Estimate    S.E.   Est./S.E.   P-Value
  C#3 ON
    AGE             0.698   0.280       2.488     0.013
    LECCONT0        0.439   0.122       3.606     0.000
Why is this happening?


Please send the noncentered data and your input along with your license number to support@statmodel.com. 


Hello! I am trying to estimate the class membership of the individuals in my analysis (who belongs to which class) but I am not able to do it. This is the input I am using (what am I missing?):
  Variable:
    Names are ENR CONF;
    Usevariables are ENR CONF;
    Missing are all (9999);
    Classes = c(4);
  Analysis:
    Type = Mixture;
    Starts = 40 2;
  Output: samp Stand Tech11;
  Plot: Type = Plot3;
Thanks! Pedro


See the CPROBABILITIES option of the SAVEDATA command. 
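
A minimal sketch of the addition to the input above. The file name cprob.dat is a hypothetical choice.

```mplus
SAVEDATA:
  FILE IS cprob.dat;         ! hypothetical output file name
  SAVE = CPROBABILITIES;     ! saves posterior class probabilities and most likely class
```

The saved file should contain the analysis variables followed by one posterior probability per class and the most likely class for each individual.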


Hello, I am running an LTA (4 classes over two time points) with dichotomous covariates. In the LTA without covariates it is apparent that some of the transition probabilities between classes are extremely small (0.001 or 0). I understand I need to impose parameter restrictions to set those transitions equal to 0. However, I am not confident in how I constructed the code for this. There is almost zero probability of transitioning from class 3 to class 4 or from class 3 to class 2. Would I incorporate statements like "c1#3 on c2#4@0"? This is what I have currently, with no restrictions:
  Model:
    %Overall%
    c1#1 on x1-x6;
    c1#2 on x1-x6;
    c1#3 on x1-x6;
    c2#1-c2#3 on c1#1-c1#3 x1-x6;
  MODEL C1:
    %C1#1%
    [rmob$1] (1);
    [rself$1] (2);
    [rusual$1] (3);
    [rpain$1] (4);
    [ranxiet$1] (5);
    …….
And so on for MODEL C2. Thank you!


You don't need to fix these parameters to zero unless you want to do so for theoretical reasons. Just leave them as they are. See the August 2012 Utrecht course handout and video on the website for further information. 
