If you are getting different solutions, then you are hitting local maxima. The best solution is the one with the highest loglikelihood. In mixture modeling it is important, however, to confirm that your solution is not just a local solution by checking that different starting values lead to the same solution.
Anonymous posted on Friday, July 21, 2000 - 6:03 am
What is the best way to choose starting values? I first hypothesized starting values for a simple correlation model with differing means, cov's and variances, and when that failed, I did a cluster analysis which broke the data set into two virtually equal groups. I took the estimates from these two clusters for the means, cov's and variances as the starting values, and ran the model. Even though the two groups were equally balanced, in the mixture model, one ran to 0 in the third step and the model failed again. I have tested this model, and others on various data sets, and continue to have the same problem.
Mixture modeling can be difficult when many or all parameters are allowed to vary across the latent classes. A zero class is often a result of too much model flexibility. A useful analysis strategy is to start with a more restrictive model, such as allowing only the means to vary across the latent classes, perhaps also letting the within-class covariance matrix be diagonal as in latent profile analysis. Once a solution has been found for this model, other parameters can be allowed to vary across the classes, one step at a time checking how the likelihood improves. For example, after the model with free means is successful, covariances can be freed across classes and then variances using starting values from previous runs.
Based on two variables, X1 and X2, a researcher hypothesized three clusters. I wanted to test the three-cluster assumption by estimating a mixture model with 2 observed variables, X1 and X2. I chose the starting values based on a scatterplot of the two variables. I have three questions. First, why is it taking about 10-13 hours to find a solution (N=5000+)? Second, the probabilities of being in certain classes appear to be disproportionate to the number of cases that, based on the scatterplot, should be associated with a certain class. Am I doing something incorrectly? Third, while reading the scatterplot seems to suggest five classes, I cannot get a model with 5 classes to converge. I have attached my input.
TITLE:
DATA: FILE IS "C:\WINDOWS\Desktop\MPLUS\DAT_Files\Mixture.dat";
VARIABLE: NAMES ARE X1 X2;
MISSING ARE ALL (9999);
CLASSES = c(5);
One question is if for each latent class you want (1) uncorrelated x variables or (2) correlated x variables. I think the way you are setting it up gives model type (1), which is the Latent Profile Analysis type of model. Alternatively, you may search for latent classes where the x's are correlated - the Mplus web site example mix8-mix11 and the reference to the Everitt book are useful for such models. It is well known in the statistical literature on finite mixture modeling that model type (2) can give difficulties. Model type (1) is typically easier, at least if you have class-invariant variances as your input specifies.
Convergence can be slowed down for several reasons. First, it is useful to put your variables on a similar scale with variances in the 1-10 range. Second, you have increased the default number of iterations to 4000 in four instances and I would suggest using the defaults except for MITERATIONS, which you need to increase to get convergence.
Once you have a solution, you may want to relax the specification of equal variances across the latent classes - this can have a large effect on the classification of individuals.
I have one point of clarification on the answer above related to the time it took to converge. In mixture modeling, convergence is evaluated differently than in regular SEM modeling. In Mplus Version 1, the number of iterations for the mixture part of the model is not the maximum number but the actual number of iterations. So by choosing 4000 for MITERATIONS, MCITERATIONS, and MUITERATIONS instead of using the defaults, the time to converge is increased dramatically. As mentioned above, use the defaults unless a message is received to do otherwise.
Anonymous posted on Friday, June 15, 2001 - 9:56 am
This question is regarding identification in a mixture model. It appears from the text of the manual that standard errors won't be computed if the model is not identified and an error message will appear. However, I have a mixture model that ran with 42 parameters (14 per class with 3 classes) and only 34 var/covar in the sample data. Is it possible to not be identified and get a solution (albeit inappropriate)?
bmuthen posted on Friday, June 15, 2001 - 10:12 am
Identification in mixture models is not the same as for conventional covariance structure models. For the latter you consider the covariance matrix as the sufficient statistics because you are in a normal-theory framework. For the former, however, you have no sufficient statistics less than the raw data because you do not assume normality. You only assume conditional normality given covariates within each class, which can give rise to very non-normal outcomes. With 34 (36?) var-cov elements it sounds as if you have 8 outcomes, which often could support that many parameters, depending on the model. I would say that the invertibility of the information matrix is a rather trustworthy index of local model identifiability.
Anonymous posted on Friday, June 15, 2001 - 11:51 am
The ultimate model I want to test is a five-class model with 7 measured outcomes (28 var/cov not 34-oops) and 74 parameters [14 per class + 4 from ALPHA(C)] which seems to be unrealistic. It appears you are suggesting that if I get a solution (e.g., the matrix is invertible) then I can trust the results (this is a latent profile model with seven continuous measures).
Fourteen parameters per class does seem to be an overly unrestricted model that may not be identified. If it does converge and the information matrix is invertible, then your model is most likely identified. It sounds like you have a latent profile model with 7 means and 7 variances varying across classes. Typically having variances varying across classes in LPA is difficult. You can start with a more restricted model where the variances are held equal across classes. Then take the solution as starting values for a model with class-varying variances.
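The counts quoted in this exchange can be checked with a few lines of arithmetic. The sketch below is only an illustration: the function names are made up for this example, and the parameter count assumes a latent profile model with zero within-class covariances, which is how the model is described above.

```python
def n_var_cov_elements(p):
    """Number of distinct variances and covariances among p observed variables."""
    return p * (p + 1) // 2


def lpa_free_parameters(p, k, class_varying_variances=True):
    """Free parameters in a latent profile model with p indicators and k classes.

    Counts p means per class, p variances (per class or shared across classes),
    and k - 1 class-probability logits. Within-class covariances are assumed
    to be fixed at zero.
    """
    means = p * k
    variances = p * k if class_varying_variances else p
    class_logits = k - 1
    return means + variances + class_logits


print(n_var_cov_elements(7))   # 28 elements with 7 outcomes
print(n_var_cov_elements(8))   # 36 elements with 8 outcomes
print(lpa_free_parameters(7, 5))                                 # 74
print(lpa_free_parameters(7, 5, class_varying_variances=False))  # 46
```

Under these assumptions, holding the variances equal across classes brings the five-class model from 74 down to 46 free parameters, which is the more restricted starting point suggested above.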
Anonymous posted on Tuesday, June 26, 2001 - 8:36 am
What does it mean (as stated in the output) that I have reached a saddle point?
bmuthen posted on Tuesday, June 26, 2001 - 9:28 am
A saddle point is not a true maximum of the likelihood. Although the first-order derivatives are all zero as they should be for a maximum, not all second-order derivatives are negative as they should be. Saddle points occur for some model-data combinations and reflect the fact that the likelihood is not easy to maximize. Faced with this outcome, new starting values should be given.
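As a toy illustration of the second-order condition (not of how Mplus checks it internally), the sketch below classifies a stationary point of a two-parameter function from the eigenvalues of its Hessian. The function names and the tolerance are assumptions made for this example.

```python
import math


def symmetric_2x2_eigenvalues(a, b, d):
    """Eigenvalues of the symmetric matrix [[a, b], [b, d]], smallest first."""
    mean = (a + d) / 2.0
    radius = math.sqrt(((a - d) / 2.0) ** 2 + b ** 2)
    return mean - radius, mean + radius


def classify_stationary_point(a, b, d, tol=1e-10):
    """Classify a point with zero gradient from its Hessian [[a, b], [b, d]]."""
    low, high = symmetric_2x2_eigenvalues(a, b, d)
    if high < -tol:
        return "maximum"            # all curvature negative
    if low > tol:
        return "minimum"            # all curvature positive
    if low < -tol and high > tol:
        return "saddle point"       # curvature of mixed sign
    return "undetermined (singular Hessian)"


# f(x, y) = -x**2 - y**2 has Hessian [[-2, 0], [0, -2]] at (0, 0).
print(classify_stationary_point(-2.0, 0.0, -2.0))
# f(x, y) = -x**2 + y**2 has Hessian [[-2, 0], [0, 2]] at (0, 0).
print(classify_stationary_point(-2.0, 0.0, 2.0))
```

In both toy functions the gradient is zero at the origin, but only the first has negative curvature in every direction.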
Anonymous posted on Thursday, February 07, 2002 - 6:44 am
I can estimate a one-class, and a three-class, but not a two-class model using the same indicators?
Using the same indicator variables (but different starting values), I have been able to estimate a one-class model and a three-class model. The good news is that the three-class model is vastly better than the one-class on all possible indicators of “betterness”. The bad news is that try as I might, I cannot get a two class model to converge.
Following from your manual, it appears that Mplus is able to execute the first three steps of the EM algorithm, but gets stuck on inverting the Fisher information matrix to create standard errors.
As you likely already know, the output I receive provides the estimates for the two-class model, but not standard errors or fit statistics. The estimates are very much in concurrence with my theoretical expectations, and follow logically as a middle point between the estimates from the one and the three class models.
I still think my three-class model is the best, but without fit statistics for the two-class model it's really an issue of asking my audience to “trust me” – not a comfortable argument to make.
My questions are this: 1. What could be behind this situation where I can fit a one and a three-class model, but not a two-class model? 2. Can you suggest any tricks to unstick a stuck Fisher information matrix (believe me I have tried all sorts of starting values)?
It would probably be best if you send your input and data to firstname.lastname@example.org and I can take a look at it.
Jason Hay posted on Wednesday, February 27, 2002 - 9:44 pm
I am looking at using this program for simulations in finite mixture models. At the moment I am testing the demo version. I am curious as to how you specify the program to analyse data that is a mixture of three gaussians. My data set is in two columns one categorical and the other a value.
bmuthen posted on Thursday, February 28, 2002 - 7:16 am
The User's Guide gives several examples of how to do simulations with mixtures. It sounds like you want to work with a 3-class model, and that you want to consider 2 variables. It sounds like you want one variable that is categorical and the other normal, but with one categorical variable I don't understand how the gaussian aspect comes in. Also, I don't understand what you mean by saying "my data set", since data are generated in the simulations. Perhaps you want to clarify.
Anonymous posted on Wednesday, May 01, 2002 - 12:59 pm
I am fitting a LPA model with four continuous indicators, with both means and variances class specific. I get a reasonable solution, but the modification indices list class specific covariances with 999.000 as the modification index and 0.000 as the expected change indices. What does this indicate?
The 999.000 indicates that the modification index could not be computed.
Anonymous posted on Tuesday, May 07, 2002 - 12:33 pm
I would like to evaluate a mixture model solution using various starting values to determine whether the solution is just a local solution. What should I be looking for across solutions using different starting values: normal termination of model estimation, fit statistics, or more?
You should be looking at TECH8 output. Things to check are the following:
1. The loglikelihood should increase smoothly and reach a stable maximum
2. The absolute and relative change should go to zero - fluctuations may indicate multiple solutions
3. Class counts should remain stable
If convergence is obtained for more than one set of starting values, compare the loglikelihood values and select the solution with the largest loglikelihood.
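The first two checks and the final comparison can be sketched in a few lines once the loglikelihood values have been copied out of the output by hand. Everything below is hypothetical: the function names, the tolerances, and the numbers are made up for illustration, and nothing here reads Mplus output directly.

```python
def check_trace(loglik_trace, tol=1e-6):
    """Return (monotone, converged) for a sequence of loglikelihood values.

    monotone:  no iteration decreased the loglikelihood by more than tol.
    converged: the last absolute change is smaller than tol.
    """
    changes = [b - a for a, b in zip(loglik_trace, loglik_trace[1:])]
    monotone = all(c >= -tol for c in changes)
    converged = bool(changes) and abs(changes[-1]) < tol
    return monotone, converged


def best_solution(final_logliks, tol=1e-3):
    """Pick the start with the largest loglikelihood.

    Also counts how many starts reached that same value (within tol),
    as a rough check that the best solution was replicated.
    """
    best_start = max(final_logliks, key=final_logliks.get)
    best_value = final_logliks[best_start]
    replicated = sum(1 for v in final_logliks.values()
                     if abs(v - best_value) < tol)
    return best_start, best_value, replicated


# Hypothetical iteration history for one set of starting values.
trace = [-1250.0, -1210.5, -1204.2, -1203.9, -1203.9]
print(check_trace(trace))

# Hypothetical final loglikelihoods from three sets of starting values.
finals = {"start_a": -1203.9, "start_b": -1203.9, "start_c": -1215.4}
print(best_solution(finals))
```

A best loglikelihood reached by only one set of starting values would be a reason to try more starts before trusting the solution.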
David Rein posted on Monday, November 25, 2002 - 2:12 pm
Hi, this question regards a difference in my conclusions found when I use my modeling data, and the data I am holding out for testing.
My modeling sample is much bigger than my testing sample (750,000 to 86,000), and the issue occurs with the smallest class. In my modeling sample, both my BIC and entropy statistics, and my interpretation of the groups lead me to conclude that there is a fourth class of patients, but that it is very small (about 0.4% of the total sample) which is equal to a large number of about 3000 people.
In my holdout sample, I have set all my coefficients equal to those found in my modeling data. I see a very large drop in BIC when going from 1 class to 2, then a large drop in BIC going from two classes to three, and then a large INCREASE in BIC when adding a fourth class.
Each model produces the same proportional results. Class four in the testing sample is also 0.4% of the sample, but this only leads to a class four N of around 300.
So, what's the interpretation here? Am I unjustified in moving out beyond three classes? Is the small proportional size of class four responsible for this turn of events? What can I say about my results if I am unable to reproduce them in the holdout sample?
Incidentally, although the holdout sample was randomly drawn, it differs from the modeling sample in some important ways (thanks SAS!). Can this be my culprit?
bmuthen posted on Monday, November 25, 2002 - 3:21 pm
Is this LCA (categorical outcomes)? How many items? Also, I wonder if the 4th class has the same interpretation in your 2 analyses. Some quick thoughts that come to mind before having heard your responses:
Significant differences between the 2 samples can certainly create the discrepancy in BIC.
If the 4th class is connected with many parameters, it may be that 300 individuals do not provide sufficient stability.
BIC is not necessarily always the best method to choose the number of classes.
BIC works a bit differently in different sample sizes because the penalty for many parameters is bigger with bigger samples. This should however work in the direction of having BIC evidence for more classes in the smaller sample (if I am thinking correctly), so that cannot play in here.
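To make the sample-size point concrete: with BIC = -2 logL + p ln(n), an extra class lowers BIC only if it improves the loglikelihood by more than (extra parameters) x ln(n) / 2. The sketch below evaluates that bar for the two sample sizes mentioned in this thread. The number of extra parameters is a hypothetical placeholder, since the thread does not state it.

```python
import math


def required_loglik_gain(extra_params, n_obs):
    """Loglikelihood improvement an added class needs before BIC decreases.

    From BIC = -2 * logL + p * ln(n): BIC drops only when
    2 * (gain in logL) exceeds extra_params * ln(n).
    """
    return extra_params * math.log(n_obs) / 2.0


EXTRA_PARAMS = 9  # hypothetical number of parameters added by one more class

for n_obs in (750_000, 86_000):
    per_parameter = math.log(n_obs)  # BIC penalty per free parameter
    gain = required_loglik_gain(EXTRA_PARAMS, n_obs)
    print(n_obs, round(per_parameter, 2), round(gain, 2))
```

The per-parameter penalty is about 13.53 in the larger sample and about 11.36 in the smaller one, so the bar for adding a class is lower in the smaller sample, which is the direction described above.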
David Rein posted on Tuesday, November 26, 2002 - 7:00 am
Thanks for your quick response.
The model is essentially the same basic cluster analysis as the one used in the example by Everitt and Hand (1981), and yes, it is an LCA in that it does not allow the indicators to be correlated.
The latent class variable is indicated by four continuous measures of the same concept.
Because the coefficients in the holdout sample are restricted as equal to the values found in the modeling sample the interpretation is the same. In each case the fourth class is the most severe - experiences the highest values of the four indicators.
I'm going to take back my earlier statements about there being differences between the fit and the unfit sample as on the four variables used for this model there is no difference. (There were some differences in other variables which led to my earlier comment.)
I can cut and paste the by-class mean and variance values but there's not much there; they show classes in equal proportions in each group, with statistically identical mean values and variances between classes in the fitted and the unfitted sample.
The difference between the two is that the modeling sample allows the classes to be estimated freely, and the testing sample restricts the parameters to the estimates found before.
I see two possible conflicting interpretations. 1. Freeing the estimation in the modeling sample finds a distinct and interpretable class. 2. Freeing the estimation in the modeling sample over-parameterizes the estimation, finding a class that isn't really "there", which allows me to impose a ready made interpretation on it.
Is there any way to test which of these options is closer to the big T truth?
bmuthen posted on Tuesday, November 26, 2002 - 9:06 am
Just to recap, your analysis sample had BIC values that pointed to 4 classes (I assume BIC was at a minimum at 4 classes), while in your holdout sample BIC pointed to 3 classes. So, your concern is about this discrepancy, right?
Let me probe a bit further. When you did your holdout sample analysis, you say that you fixed the parameters at the values of the analysis sample. Do you mean all parameters, including the ones describing the class probabilities? If so, there are no free parameters left and the BIC penalty for parameters is zero, which means that here BIC is a function only of the log likelihood. That doesn't seem right because the log likelihood should improve when going from 3 to 4 classes - unless the 4-class solution is a local optimum. You might want to check that.
Note also that Mplus 2.12 allows a new statistical test of k-1 versus k classes using the Lo, Mendell, Rubin LRT - see Tech11. This can give different results than BIC.
David Rein posted on Tuesday, November 26, 2002 - 10:00 am
On point one - yes, the model sample had declining BIC scores, which pointed to four classes, and an improved entropy statistic for four classes compared to three. I should note though that the improvement in BIC was small, but significant using a Bayes Factor test. However, I then was unable to estimate a model with five classes.
In contrast, the results in the holdout sample are nicer in a way, as the pattern lacks any ambiguity - clearly pointing to a three-class model. It is just problematic, as conceptually a four-class model makes a bit more sense than a three-class model (as the fourth class handles the outliers much better than pooling them with class three).
On the second point - Not quite. The proportions in the holdout sample were estimated freely. Only the means and the variances of the indicator variables were restricted to be equal to the values from the model sample.
I will use the Lo, Mendell, Rubin LRT when I get a chance - but is it possible that the answer is just ambiguity. Given a large enough sample, an outlier distribution can be identified, but in most subsamples, this group is too small to be considered its own group?
bmuthen posted on Tuesday, November 26, 2002 - 5:37 pm
It seems to me that a class consisting of 300 individuals would be well estimated since with 4 variables relatively few parameters specific to this class are involved. This assumes of course that this class is well-defined.
You say that you have a latent class model for continuous outcomes (so an LPA). Are you letting the means or also the variances vary across classes?
Another question is how well one should expect this type of cross-validation to work if the model doesn't really fit the data well. The Mplus residual output with respect to means, variances, and covariances and Tech12 output with respect to skewness and kurtosis might be useful to assess the model fit here.
It is a little strange that the holdout sample would get a BIC that points to fewer classes (3) than the analysis sample (4) because the BIC penalty for many parameters is smaller in the holdout sample analysis (both the sample size and the number of free parameters are smaller). I wonder what the BIC picture would be if you had all parameters free in the holdout sample analysis (I know that this forfeits the purpose, but) - perhaps just a relatively small change in estimates would take place, but the log likelihood might improve a lot and perhaps BIC would then again point to 4 classes.
Yi-fu Chen posted on Tuesday, March 18, 2003 - 6:33 am
I have a quick question. I have 7 binary variables and want to run a latent class model. When I did it in Mplus, the following message was shown:
IN THE OPTIMIZATION, ONE OR MORE LOGIT THRESHOLDS APPROACHED AND WERE SET AT THE EXTREME VALUES. EXTREME VALUES ARE -15.000 AND 15.000.
THE FOLLOWING THRESHOLDS WERE SET AT THESE VALUES:
* THRESHOLD 1 OF CLASS INDICATOR IVA4PC FOR CLASS C#4 AT ITERATION 17
* THRESHOLD 1 OF CLASS INDICATOR IVA6PC FOR CLASS C#4 AT ITERATION 49
* THRESHOLD 1 OF CLASS INDICATOR IVA2PC FOR CLASS C#4 AT ITERATION 67
* THRESHOLD 1 OF CLASS INDICATOR IVA1PC FOR CLASS C#2 AT ITERATION 106
THE MODEL ESTIMATION TERMINATED NORMALLY
I still can get all the results, but I wonder if these messages indicate any problem in my model. Can I trust the result?
No, these messages do not indicate any problem in your model. It can actually aid in interpretation of the classes when an item has probability zero or one in a class.
Anonymous posted on Wednesday, April 09, 2003 - 10:47 am
I apologize for the elementary nature of this question, but I am working on an LCA, trying to model a 3- and 4 class solution. I apparently had appropriate starting values for the 2-class solution, but this is the message I get for the 3-class solution:
THE MODEL ESTIMATION DID NOT TERMINATE NORMALLY DUE TO A NON-POSITIVE DEFINITE FISHER INFORMATION MATRIX. CHANGE YOUR MODEL AND/OR STARTING VALUES.
THE MODEL ESTIMATION HAS REACHED A SADDLE POINT, NOT A PROPER SOLUTION. CHANGE YOUR MODEL AND/OR STARTING VALUES. THE CONDITION NUMBER IS -0.838D+00.
THE STANDARD ERRORS OF THE MODEL PARAMETER ESTIMATES COULD NOT BE COMPUTED. THIS IS OFTEN DUE TO THE STARTING VALUES BUT MAY ALSO BE AN INDICATION OF MODEL NONIDENTIFICATION. CHANGE YOUR MODEL AND/OR STARTING VALUES. PROBLEM INVOLVING PARAMETER 2.
What are some rules when assigning starting values to a 3- and 4-class model? Does the model have to be constrained in this case?
To answer this, I need to know how many latent class indicators you have and what starting values you used. Please send the 2 and 3 class outputs and data to email@example.com.
Anonymous posted on Saturday, July 12, 2003 - 4:21 pm
I'm in the process of developing a SEM which posits a latent categorical variable, LV, as a mediating variable between a set of covariates, X, and an outcome variable, Y.
I proceeded by developing the LC measurement model first and find that I get very similar LogL and L-R Chi-Square values for the ML and MLF estimators, but very different values for the MLR estimator.
My questions are therefore:
Should the variation in LogL for the ML, MLF, and MLR estimators be a matter of concern, and is there a reason (aside from convergence problems) that one would ever *not* want to use the MLR estimator ? Shouldn't the MLR estimator be used for all ML modeling in Mplus ?
In general, do you have a set of a priori considerations you suggest in selecting ML estimators for Mplus ?
Are nested LR-Chi Square tests of fit / restrictions equally valid for all three estimators ?
Finally, when I include LV in the fuller SEM with X and Y, are my results likely to be less sensitive to the choice of estimator than they were when I was modeling LV alone (since more information is included in the full SEM versus the LCMM alone) ?
bmuthen posted on Saturday, July 12, 2003 - 4:51 pm
Since you have a categorical latent variable, you must be doing type=mixture analysis. ML, MLF, and MLR give the same log likelihood values and only affect SEs here. If MLR gives a different log likelihood, something is wrong in the input; please email input, output, and data to firstname.lastname@example.org.
I would recommend the default estimators in all cases since they draw on experiences we have had with simulations. Nested loglikelihood tests are the same for the different mixture estimators because the log likelihood is the same. If in your last question you are asking if including x makes results more sensitive to estimator choice in this mixture modeling example, my answer is I don't know.
Anonymous posted on Friday, September 26, 2003 - 2:00 pm
I encountered a strange situation, where I got "0-value" estimates in my 3-class mixture model (see output below; all observed variables are dichotomized ones). The model converged at the 797th iteration. Any suggestions? Many thanks!
The reason that you get zeroes in the third class is that you give no threshold starting values for the third class. Instead you give starting values for the second class twice. I think you need to change the second %c#2% to %c#3%.
Anonymous posted on Friday, October 29, 2004 - 3:22 pm
I have a 4 class LCA model with 6 indicators. Three of these were originally measured as 4-point ordinal scales (strongly disagree-strongly agree). I've recoded these indicators to be dichotomous variables. My LCA results are different when I specify these dichotomous indicators to be nominal vs. ordinal. Is there a way to determine which is the correct specification (the model fit for each are about the same, but the class profiles are different).
bmuthen posted on Saturday, October 30, 2004 - 10:10 am
Analyzing dichotomous items by latent class analysis using the nominal or the categorical option should give the same result. The differences you are seeing may be due to local optima - using a higher value for the STARTS option should give the same results. If not, please send your inputs, outputs, and data to email@example.com.
Anonymous posted on Thursday, November 18, 2004 - 1:07 pm
I have a generalized growth mixture model with two background variables and 4 time points. I ran the model with STARTS 500, 10. The two-class model BIC is 3301 and the three-class is 3304. The AIC for the two-class is 3244 and three-class is 3238. Sample-size adjusted BIC for the two-class is 3244 and three-class is 3238. I know that is not necessarily a big difference. I prefer the three-class model, but BIC suggests they are essentially the same or the two-class is slightly better. AIC and sample-size adjusted BIC are slightly smaller for the three-class model. Would you have any suggestions about which model is best? I like the three-class solution, but I'm not sure I can justify it for publication. What would you say as a reviewer? The LO-MENDELL-RUBIN ADJUSTED LRT TEST = 30 (p = .09). Thank you in advance
bmuthen posted on Thursday, November 18, 2004 - 1:53 pm
Since BIC etc can't clearly distinguish between the models I would go with the one that is easiest to interpret. Also, if you compare the two models you probably have some similarities so that the 3-class model is merely an elaboration of the 2-class model. Also, adding more covariates and perhaps also a distal outcome might help distinguish between the models.
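For readers who want to see how the three criteria in the question relate to each other, here is a minimal sketch using the usual definitions: AIC = -2 logL + 2p, BIC = -2 logL + p ln(n), and the sample-size adjusted BIC with n replaced by (n + 2) / 24. The inputs below are hypothetical and are not the poster's values, since the thread gives neither the loglikelihoods nor the sample size.

```python
import math


def information_criteria(loglik, n_params, n_obs):
    """AIC, BIC, and sample-size adjusted BIC from a model's loglikelihood."""
    deviance = -2.0 * loglik
    return {
        "AIC": deviance + 2 * n_params,
        "BIC": deviance + n_params * math.log(n_obs),
        # Adjusted BIC replaces n by (n + 2) / 24 in the penalty term.
        "aBIC": deviance + n_params * math.log((n_obs + 2) / 24.0),
    }


# Hypothetical two-class and three-class results on the same 400 cases.
two_class = information_criteria(loglik=-1600.0, n_params=12, n_obs=400)
three_class = information_criteria(loglik=-1590.0, n_params=17, n_obs=400)

for name in ("AIC", "BIC", "aBIC"):
    print(name, round(two_class[name], 1), round(three_class[name], 1))
```

Because the three penalties differ only in the weight per parameter, the criteria can disagree about the number of classes whenever the loglikelihood gain is modest, as in the situation described in the question.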
Anonymous posted on Tuesday, March 01, 2005 - 8:03 pm
Here's a foolish question, but I've tried without success to learn the answer so far: What's the interpretation of the categorical latent variable means part of the printout in a mixture model? E.g.:
Categorical Latent Variables
Means
C#1 1.474 0.136 10.863
C#2 0.224 0.209 1.071
C#3 0.475 0.161 2.942
Thanks for any information on it!
The values are multinomial logits. When they are converted to probabilities, they are the probabilities of class membership. See Calculating probabilities from logistic regression coefficients in Chapter 13. This conversion would be done using Formula 2 without x's. The a's are the multinomial logits.
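As a worked illustration of that conversion, the sketch below turns the three logits from the question into class probabilities. It assumes a four-class model in which the last class is the reference class with its logit fixed at zero. The function name is made up for this example.

```python
import math


def class_probabilities(logits):
    """Convert multinomial logits for classes 1..K-1 into K class probabilities.

    The last class is treated as the reference class with logit 0
    (an assumption of this sketch).
    """
    all_logits = list(logits) + [0.0]
    exps = [math.exp(a) for a in all_logits]
    total = sum(exps)
    return [e / total for e in exps]


probs = class_probabilities([1.474, 0.224, 0.475])
for label, p in zip(("C#1", "C#2", "C#3", "C#4"), probs):
    print(label, round(p, 3))
```

Under that assumption the classes come out at roughly 0.53, 0.15, 0.20, and 0.12, so the first class is the largest and the reference class the smallest.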
I am doing LCA with 8 indicator variables (categorical: 0, 1, 2) and 2 covariates. A 2-class solution seems to best fit the data.
I'd like to know which of my indicator variables were more influential in determining class membership. Ideally, I'd like to be able to create an algorithm (e.g., weights in a general linear model) that will allow me to assign other participants into these same two categories on the basis of the same indicator variables and covariates. I'm not sure what part of the output tells me these weights.
Thanks for any help. Sorry if this is a simplistic question, but I'm a little overwhelmed by the information in the output.
Thank you for your reply. If I understand the thresholds correctly, they represent a value (though I'm not sure what the value is of. Of a latent trait?) that needs to be exceeded for the indicator variable to move to the next category. But both the thresholds and the "Results in Probability Scale" seem to relate to the likelihood of a score on an indicator, given membership in a latent class.
I'd like to go the other way. Given scores on the 8 indicator variables, how would I predict a new individual's membership in a latent class?
I've been looking at the section in Chapter 13 of the User Guide on Calculating Probabilities from Logistic Regression Coefficients. This seems close to what I want to do, but the examples only seem to focus on intercepts and covariates. Is there a way to plug the thresholds into a similar formula?
One approach is to first estimate the model parameters for a certain sample. The second step is then to do an analysis using the same model but holding all parameters fixed at the estimated values from step 1. In this second step your sample may consist of only 1 person (or a set of persons) and that person's data vector is the same as in step 1. This second step produces the "posterior" class probabilities for all classes for the individual (or individuals), which is what you need in order to classify the person into a class (based on the most likely class). The second step is not straightforward to calculate by hand, but is done quickly in Mplus. Hope that is close to what you had in mind.
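For intuition about what the second step computes, here is a minimal sketch of the posterior class probabilities for a latent class model without covariates, obtained from Bayes' rule under conditional independence of the indicators given class. All numbers are hypothetical. The real model in this thread also has two covariates, which would make the class probabilities person-specific and are not handled here, nor is missing data.

```python
def posterior_class_probabilities(class_probs, item_probs, responses):
    """Posterior P(class | responses) for a latent class model, no covariates.

    class_probs[k]      : P(class k)
    item_probs[k][j][c] : P(item j = category c | class k)
    responses[j]        : observed category of item j (0, 1, 2, ...)
    """
    joint = []
    for k, prior in enumerate(class_probs):
        likelihood = 1.0
        for j, category in enumerate(responses):
            likelihood *= item_probs[k][j][category]  # conditional independence
        joint.append(prior * likelihood)
    total = sum(joint)
    return [value / total for value in joint]


# Hypothetical two-class model with three 3-category items.
class_probs = [0.6, 0.4]
item_probs = [
    [[0.7, 0.2, 0.1]] * 3,  # class 1: mostly low categories
    [[0.1, 0.3, 0.6]] * 3,  # class 2: mostly high categories
]

posterior = posterior_class_probabilities(class_probs, item_probs, [2, 2, 1])
most_likely_class = posterior.index(max(posterior)) + 1
print([round(p, 3) for p in posterior], most_likely_class)
```

The item response probabilities used here correspond to the "Results in Probability Scale" part of the output, so the thresholds themselves never need to be plugged in by hand in this version.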
Yes, that's helpful, thanks. I was hoping for a simple formula that could be done by hand or reported in a manuscript, but it seems like that may not be possible.
One question about this method. When I ran a subsample of the original sample, using the fixed parameters as you suggested, I was able to replicate the results (with small deviations) from the original run. However, I got the following errors:
------------------------
*** ERROR in Model command
Unknown threshold value 2 for variable BORD7
*** ERROR in Model command
Unknown threshold value 2 for variable BORD8
*** ERROR in Model command
Unknown threshold value 2 for variable BORD7
*** ERROR in Model command
Unknown threshold value 2 for variable BORD8
*** ERROR
The following MODEL statements are ignored:
* Statements in Class 1:
[ BORD7$2 ]
[ BORD8$2 ]
* Statements in Class 2:
[ BORD7$2 ]
[ BORD8$2 ]
---------------------
I gather that these errors are because I had fixed parameters for 3 levels of BORD7 and BORD8, but the data in my subsample had only 2 levels for these variables. I was able to work around it by including bogus data vectors- is that the best solution?
Thanks once again for your quick replies and your helpful suggestions.
I would simply add the new observations to the original data set. Try that to make sure you get the same results as using bogus data vectors. You probably will.
Anonymous posted on Monday, May 16, 2005 - 6:32 am
Hello! I am attempting a latent profile analysis and wonder if there is a simple way to transform the logits into a scale that reflects the original metric. Otherwise, do you have advice about easing interpretation so that I can determine what my clusters mean? Thanks in advance for your help.
In latent profile analysis, the latent class indicators are usually continuous so logits would not be estimated. Means and variances would be estimated. What is the measurement scale of your latent class indicators?
Anonymous posted on Monday, May 16, 2005 - 7:24 pm
Re: posting Monday, May 16th at 6:32 AM
I suppose I am technically doing a mixture model, as one of my latent classes is composed of continuous indicators, while two are composed of categorical indicators. I am specifically wondering how to translate the logits of the categorical scales so that I can interpret my classes. Any information would be appreciated. Thanks!
Toward the end of Chapter 13, there is a section that shows how to translate logits into probabilities. It uses intercepts. If you change the sign of your thresholds, they become intercepts. See the first example where the covariates values are all zero.
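A minimal sketch of that translation, assuming a logit link, no covariates, and ordered categories coded 0, 1, 2, ...: the cumulative probability of being at or below a category is the logistic function of the corresponding threshold, which is the same as using the sign-reversed threshold as an intercept for the upper categories. The function names and threshold values are made up for this example.

```python
import math


def logistic(x):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def category_probabilities(thresholds):
    """Category probabilities from ordered logit thresholds, no covariates.

    For thresholds t1 < t2 < ..., P(u <= k) = logistic(t_{k+1}), so the
    probability of each category is a difference of cumulative probabilities.
    With a single threshold t this gives P(u = 1) = 1 / (1 + exp(t)).
    """
    cumulative = [logistic(t) for t in thresholds] + [1.0]
    probs = []
    previous = 0.0
    for c in cumulative:
        probs.append(c - previous)
        previous = c
    return probs


# Binary item with a hypothetical threshold of 0: both categories equally likely.
print(category_probabilities([0.0]))

# Binary item whose threshold sits at the extreme value 15.
print(category_probabilities([15.0]))

# Three-category item with hypothetical thresholds -1 and 1.
print([round(p, 3) for p in category_probabilities([-1.0, 1.0])])
```

A threshold of 15 puts essentially all of the probability on category 0, and a threshold of -15 would put it all on category 1.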
Anonymous posted on Thursday, May 19, 2005 - 10:21 am
Re: posting Monday, May 16th at 6:32 AM
Hi, Again. Thanks for your most recent post. I have been advised to interpret the means provided for each class, not the thresholds. Further, thresholds are given for each manifest variable and I am seeking to interpret my latent variable means for each class. (Does that make any sense? Sorry, I am new to this.) I am attempting to interpret the means for each of my latent variables (1 continuous and 2 categorical) for each of my classes proposed. Is there a way to transform the means back into the original metric so that I can understand what my classes mean in terms of my latent variables? Otherwise, how does one interpret the means to understand what their respective class memberships mean? Thanks!
Please send your output and license number to firstname.lastname@example.org. Refer specifically to the values you are trying to interpret.
Anonymous posted on Friday, May 20, 2005 - 2:57 pm
Thank you for your response. I will send my output and license number on Monday. Unfortunately, I did not check this message until I was home for the weekend. Thanks, again. I look forward to getting your response.
Anonymous posted on Monday, May 23, 2005 - 7:49 am
Hi, again. I have sent my output to email@example.com. I look forward to hearing your response. Thanks, again.
Anonymous posted on Wednesday, July 27, 2005 - 8:47 am
I am testing a cross-sectional mixture model in which there are two DVs and 5 IVs. If I wish to control for an extraneous variable (e.g., size of the company or age of company) that is not of any theoretical interest to my study (simply a control variable), do I just add the variable to the regression in the overall model? Or should I also include these variables in the regression of class membership onto the covariates that are of theoretical interest to my study?
bmuthen posted on Wednesday, July 27, 2005 - 6:37 pm
Anonymous posted on Thursday, August 04, 2005 - 12:46 pm
Hi, I am fitting a mixture model with 8 latent class indicators. This is the first time I have done this type of analysis. What technique should I use to choose the starting values? Also, if I want to bootstrap in my analysis I must change MLR to ML; how does this affect my results? Thank you for your help.
bmuthen posted on Monday, August 08, 2005 - 2:21 pm
No starting values are needed - see the version 3 User's Guide.
MLR is used to get non-normality robust SEs. Bootstrapping might be useful for small samples. I am not sure which approach is best - perhaps a study is needed.
Anonymous posted on Tuesday, August 09, 2005 - 5:46 am
Can you tell me exactly where in the version 3 User's Guide it says that no starting values are needed? I understood that if I want a latent variable with 2 or more classes, I must give threshold starting values. My question is how to choose these starting values.
Anonymous posted on Tuesday, August 09, 2005 - 8:20 am
Thank you for your answers. If I want to specify the starting values instead of using random starting values, how can I do this? If I understand correctly, I can begin with random starting values and use the output to specify the starting values for a second run. If not, how can I do this?
In Example 7.5, starting values are given and random starts are used. The starting values are the values following the asterisks (*). An asterisk (*) is used to specify a starting value.
To avoid a local solution, use random starts as is done in this example. You can read about random starts by looking up the STARTS option in the Mplus User's Guide.
Anonymous posted on Sunday, August 28, 2005 - 11:03 am
Hello. How are the standard errors of the parameter estimates obtained in a mixture model? I didn't specify bootstrap in the analysis command. Is it the default method in Mplus? Can I find a more detailed description of bootstrapping with the EM algorithm in any reference paper?
bmuthen posted on Sunday, August 28, 2005 - 12:11 pm
The mixture model uses regular ML-based SEs, where the ML information matrix is estimated by one of three alternatives given in Technical Appendix 8 on the Mplus web site.
ksullivan posted on Wednesday, February 22, 2006 - 10:14 am
I am trying to run a latent class analysis with 6 latent variable indicators. Each indicator has six levels (i.e., 1, 2, 3, 4, 5, 6). I realize that using this many indicators forces the model to estimate a lot of parameters. I can get the model to run with 2 classes but it falls apart with 3 classes. I get the errors where the model did not terminate normally due to an "ILL-CONDITIONED FISHER INFORMATION MATRIX" and "DUE TO A NON-POSITIVE DEFINITE FISHER INFORMATION MATRIX." I also had problems with reaching large thresholds.
I tried increasing the STARTS but I still could not get the 3 class model to run.
I think that I need to set starting values but I am unsure how to set starting values when my indicators have 6 levels. I have the probabilities based on an LCA run with 1 and 2 classes but I am not sure how this translates into setting values for a model with 3 or more classes. Thanks in advance for any help.
Are you using Version 3.13? If not, I would download it and try the analysis again. Also, increase your starts for the two class solution and be sure you are getting a good solution, that is, be sure you replicate the best loglikelihood value at least twice. Having thresholds fixed is not a problem. It helps define the classes. If all else fails, please send your input, data, output,and license number to firstname.lastname@example.org.
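The replication check mentioned above can be sketched as follows. This is an illustration only: the loglikelihood values are made up, the tolerance is an arbitrary assumption, and in practice the values would be read from the final-stage loglikelihood list in the output.

```python
def best_loglikelihood_replicated(loglikelihoods, tolerance=1e-3, min_count=2):
    """Return the best loglikelihood and whether it was reached at least min_count times."""
    best = max(loglikelihoods)
    # Count the final-stage solutions that match the best value within the tolerance.
    hits = sum(1 for ll in loglikelihoods if best - ll <= tolerance)
    return best, hits >= min_count

# Hypothetical final-stage loglikelihoods from different random starts.
final_stage = [-1234.567, -1234.567, -1240.100, -1251.300]
best, replicated = best_loglikelihood_replicated(final_stage)
print(best, replicated)
```

If the best value appears only once, increasing the number of random starts is the usual next step.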
In implementing the Lo-Mendell-Rubin Likelihood ratio test, I have seen that you should have the classes ordered such that the largest class is the last class. Does it then follow that the first class should be the smallest and each subsequent class should be ordered in size?
just requesting a clarification on the above point. I thought that the idea with implementation of the Lo-Mendell-Rubin was to have the smallest class as the first class, with the order of the subsequent classes immaterial, seeing as the first class was being deleted for the comparison with k-1 classes...is this correct?
The last two messages above make me doubtful: is it necessary, for conducting or interpreting the Lo-Mendell-Rubin likelihood ratio test, to have the last class be the largest class? If so, how do I tell Mplus that the last class should be the largest?
Unfortunately, this changed the whole model; it is no longer comparable with the model without reordered classes. What did I do wrong? And what happens with the bootstrapped Lo-Mendell-Rubin likelihood ratio test when the last class is not the largest one?
Hi, I have two questions: 1. if I run my model with 3 classes, i get the error message: WARNING: THE LATENT VARIABLE COVARIANCE MATRIX (PSI) IN CLASS 2 IS NOT POSITIVE DEFINITE. THIS COULD INDICATE A NEGATIVE VARIANCE/RESIDUAL VARIANCE FOR A LATENT VARIABLE, A CORRELATION GREATER OR EQUAL TO ONE BETWEEN TWO LATENT VARIABLES, OR A LINEAR DEPENDENCY AMONG MORE THAN TWO LATENT VARIABLES. CHECK THE TECH4 OUTPUT FOR MORE INFORMATION. PROBLEM INVOLVING VARIABLE ANG. The latent var ANG in class 2 has a very low residual variance of .004. What can i do? Is this a problem of my model? The 2-class solution ended properly.
2. How can I interpret the intercept terms of the models? Are there any papers I should look into?
Please send your input, data, output, and license number to email@example.com. It is likely that you don't need three classes but I would need more information to say for sure. I can then answer your second question when I see what your model is.
In the estimation of mixture models, MOC (from R) provides ICL-BIC (A BIC corrected for entropy) using the formula BIC + 2*entropy.
I have two questions: (1) Do you plan on implementing it in Mplus? (2) How would you calculate it by hand from Mplus output, given that the entropy is not computed in the same way (in MOC we seek to minimize entropy, and in Mplus to maximize it, if I'm right)?
Thanks for sending the information. We have read about ICL-BIC in the McLachlan and Peel book on mixture modeling. It appears to have performed well in the limited simulations that were carried out. We will study it further and see how it performs and include it if it looks promising. One potential objection is that it combines the loglikelihood which is a fit statistic with entropy which is a measure of classification quality and not a measure of fit.
You can compute ICL-BIC by hand using information from Mplus. See formula 6.75 on page 217 of the McLachlan and Peel book. This formula has three terms. The sum of terms 1 and 3 is equal to the Mplus BIC shown in formula 6.50 on page 209. Term 2 is shown in formula 6.58 on page 213, EN (tau). That term is the numerator in formula 171 shown in Technical Appendix 8 on the Mplus website with the exception of a sign change. You can solve for that term using formula 171 and change the sign and then use it to compute ICL BIC.
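A hedged sketch of this hand calculation. It assumes that the reported entropy has the form E = 1 - EN(tau) / (n ln K), with EN(tau) taken as the non-negative quantity -sum p ln p, and that ICL-BIC = BIC + 2 EN(tau) as in the question above. Check both assumptions against formula 171 and formula 6.75 before relying on it. All input values are hypothetical.

```python
import math

def entropy_term(relative_entropy: float, n: int, n_classes: int) -> float:
    """Solve E = 1 - EN / (n * ln K) for EN (assumed form of the reported entropy)."""
    return (1.0 - relative_entropy) * n * math.log(n_classes)

def icl_bic(bic: float, relative_entropy: float, n: int, n_classes: int) -> float:
    """ICL-BIC as BIC plus twice the entropy term."""
    return bic + 2.0 * entropy_term(relative_entropy, n, n_classes)

# Hypothetical values: BIC, entropy, sample size and number of classes from one run.
print(icl_bic(bic=5000.0, relative_entropy=0.8, n=1000, n_classes=2))
```

With perfect classification (entropy of 1) the entropy term vanishes and ICL-BIC equals BIC; poorer classification adds a larger penalty.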
I am trying to run a FMM w/ 4 factors & 6 classes, N is 1388. Factors are measured by 5 or 6 binary variables. I'm using the cluster means from SAS Proc Fastclus as starting values. So far, I have not been able to get it to run using MLR or MLF, even when I reduce the classes to 2. It gives me the following:
*** FATAL ERROR THERE IS NOT ENOUGH MEMORY SPACE TO RUN THE PROGRAM ON THE CURRENT INPUT FILE. THE ANALYSIS REQUIRES 4 DIMENSIONS OF INTEGRATION RESULTING IN A TOTAL OF 0.50625E+05 INTEGRATION POINTS. THIS MAY BE THE CAUSE OF THE MEMORY SHORTAGE. etc.
When I cut the number of subjects down to 100, it does run but says the following:
WARNING: THIS MODEL REQUIRES A LARGE AMOUNT OF MEMORY AND DISK SPACE. IT MAY NEED A SUBSTANTIAL AMOUNT OF TIME TO COMPLETE. REDUCING THE NUMBER OF INTEGRATION POINTS OR USING MONTECARLO INTEGRATION MAY RESOLVE THIS PROBLEM.
(This is still running so I don't know what the output will look like).
It runs when I use only 3 factors with MLF and Cholesky=off (but takes about 20 hours).
Is it possible that a PC with 3 Gigs of RAM is not enough to run this model or is something else wrong? Thanks
First, I would check that a regular (single-class) FA can be estimated. It may not be identified if you have too many free loadings - for example an EFA with 4 factors and 6 variables is not identified.
Second, 4 factors leads to 4 dimensions of integration and with the default number of integration points of 15, you then have 15-to-the-power-of-4 integration points. This then gets multiplied by the number of subjects, leading to the large memory requirement. You can reduce the number of integration points by saying either integration=7 (for example), or integration=montecarlo.
Third, when you allow for several latent classes, you may not need as many dimensions as you otherwise would.
Finally, memory use on your computer can be expanded as mentioned under Systems Requirements on the web site.
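The arithmetic in the second point can be checked with a short sketch. The function name is hypothetical; the counts follow directly from raising the number of points per dimension to the number of dimensions.

```python
def integration_points(points_per_dimension: int, dimensions: int) -> int:
    """Total number of grid points for numerical integration over several dimensions."""
    return points_per_dimension ** dimensions

# Default of 15 points with 4 factors, as in the error message (0.50625E+05).
print(integration_points(15, 4))
# Reducing to 7 points per dimension, or to 3 factors with the default 15 points.
print(integration_points(7, 4))
print(integration_points(15, 3))
```

Going from 15 to 7 points per dimension cuts the grid by a factor of about 21 in four dimensions, which is why the reduction helps so much with memory.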
J.W. posted on Friday, September 21, 2007 - 1:24 pm
Deciding on number of latent classes in GMM:
Sometimes, information criteria and the LMR LR test lead to contradictory results. For example, the BIC of the k-class model is smaller than the BIC of the (k-1)-class model, while the LMR LR test can't reject the (k-1)-class model. In this case, which model (k-class or (k-1)-class) is better?
See the following paper for guidance on determining the number of classes:
Nylund, K.L., Asparouhov, T., & Muthen, B. (2006). Deciding on the number of classes in latent class analysis and growth mixture modeling. A Monte Carlo simulation study. Accepted for publication in Structural Equation Modeling.
About the largest class last for the interpretation of the Lo-Mendell-Rubin LRT (see posts around October 6 2006):
I understand it is desirable to have the largest class last but I am not sure about the order for the other classes.
I ran an LCA with 4 classes with the classes ordered (smallest to largest) and another LCA with only the last class fixed as the largest class (here, the second class turned out to be the smallest). As expected, the p values for the LMR LRT differ. Which value should I report?
When running several models, shouldn't one be consistent throughout and specify the models so that the smallest class is always the first class and the largest class always the last?
The largest class last for the LRT tests is not for purposes of interpretation. It is because the first class is deleted when testing the k versus the k-1 models. The order of the other classes is not an issue.
I could not say which value you should report without seeing the two outputs. It sounds like you fixed the parameters in the last class which is not what is recommended. You can send the outputs and your license number to firstname.lastname@example.org if you want further comments.
I extracted 4 groups in my growth mixture analysis. One group could clearly be labeled as "increasers" based on the plot of estimated means, but the means of the linear and quadratic slopes were not significant (due to high SEs), while the intercept mean is significant. Could such a group still be labeled as increasers? Might the high SEs of the mean estimates be a sign of misfit?
I have four continuous variables from laboratory measures of anxiety response. I plan to run a LPA on the variables to test a 1 vs. 2 group model of anxiety response. Are there any additional mixture models you would recommend for answering this question? Would a Mixture regression analysis, CFA mixture modeling, or a structural equation mixture modeling be appropriate?
Without knowing your research context, I would say that in addition to LPA a factor mixture model might be explored - for an overview of mixture models, see
Muthén, B. (2008). Latent variable hybrids: Overview of old and new models. In Hancock, G. R., & Samuelsen, K. M. (Eds.), Advances in latent variable mixture models, pp. 1-24. Charlotte, NC: Information Age Publishing, Inc.
I have a question in terms of the conditional independence assumption in LCA. When the latent class indicators are combination of binary, censored, and count variables (data in ex7.11), how can I examine/account for the within-class correlations among the LC indicators?
linda beck posted on Wednesday, July 23, 2008 - 8:24 am
Referring to the Kaplan chapter it is important to include covariates in final mixture solutions. Regarding their effects on growth parameters, is it important to hold these effects on intercepts and slopes equal across classes or should one try to have unequal effects? In an unconditional model I've computed before, only equal growth parameter variances converged.... (I ask this question because in an article it was emphasized to hold effects of covariates on growth parameters equal across classes, unfortunately I can't find it...)
I typically hold covariate effects equal across classes - it is a more stable model - unless the context says that one should expect differences such as when looking for differential intervention effects.
linda beck posted on Friday, July 25, 2008 - 6:41 am
That's also my experience, thank you! unequal effects don't converge.
A related question: should one prefer the covariate solution as the final model even though its BIC is slightly higher than that of the unconditional model (by 30 BIC points)? I've found some significant direct effects on class membership, and not considering them would result in a misspecified model, in my opinion. Thank you so far!
I prefer the covariates included in the model when possible. The "analyzing-classifying-analyzing" approach using most likely class membership gives biased estimates and SEs. Note that BIC is not in the same metric with and without covariates, so not comparable. This is because the likelihood scale changes when adding covariates.
linda beck posted on Monday, July 28, 2008 - 6:49 am
I had a deeper look at my model. Since I'm using two-part growth mixture modeling, it is very complex. I have an intervention which should also predict the slope of the u-part, together with other covariates. As you said, especially when you have an intervention, one could postulate unequal effects on the slopes. But this model is very complex.
a.) Is it possible to hold the effects of some covariates on the slopes equal across classes and those of others (like the intervention) not, in order to reduce complexity? I guess not...
b.) I'm quite sure that the intervention and the other covariates have an effect on the slope. Would it be better to postulate equal effects of the intervention on the slopes than to postulate no effects at all and thereby misspecify the model?
I think it is reasonable to hold all covariate effects except the intervention equal across classes. With x1 being the intervention dummy covariate you would say:
%overall%
.....
su on x1 x2;
%c#1%
su on x1;
%c#2%
su on x1;
linda beck posted on Monday, July 28, 2008 - 9:56 am
Thanks a lot for this advice. I will try this. It takes a long time to compute the iu and su variances in a two-part mixture model (2 dimensions of integration). I am thinking about reducing the number of integration points (15 is the default). Which value would be reliable enough, 10? What else can one do to increase the speed?
The first thing to do is to add in the Analysis command:
Process = 4(starts);
if you have 4 processors. Say 2 if you have 2. This distributes the mixture starts over the processors and speed is much improved.
Often it is sufficient to use
integ = 7;
linda beck posted on Tuesday, July 29, 2008 - 2:06 am
OK, thanks... Unfortunately this distribution of mixture starts is not available for the second run of starting values (I think this is needed for LMR). It would be great to have this implemented in the next version of Mplus!
linda beck posted on Tuesday, July 29, 2008 - 3:41 am
Before I do the final step, adding covariates to the two-part mixture model, I have a last question (hopefully). In my unconditional model, only the variances of the intercepts and linear slopes are significant. In the conditional mixture models, should I also allow for effects on the quadratic slopes, even though their variance is not significant (or not estimable) in the unconditional mixture models? Might that be too complex?
BTW: I get the impression that in mixture modeling, effects of covariates on growth parameters are easier to estimate than variances in unconditional models. Do you know why?
On your first post, I am not sure I know what you mean by the distribution of mixture starts not being available for "the second run of starting values".
On your second post, yes you can regress a growth factor on covariates even if it did not have a significant variance in the unconditional run. Using the additional information from the covariate's relationships to the outcomes may uncover variation in the growth factor. It is also ok if the residual variance is zero.
linda beck posted on Thursday, July 31, 2008 - 8:43 am
First post: I'm referring to the TECH8 output shown in the "windows system32" window. When I use LMRT there is a "first" run of starting values (distributed over processors) and then the final stage optimizations (also distributed). After this, Mplus should give me an output, but when using LMRT (TECH11; maybe this is the cause, I don't really know) there is a "second" run of both procedures (starting values and optimization). This run is not distributed over processors. After that (a long time), Mplus gives me an output.
When runs are not distributed across the processors, it may be that there is not enough RAM. We have seen this.
linda beck posted on Friday, August 08, 2008 - 7:13 am
Sorry for bothering you again. As I'm writing up my findings on the two-part mixture model with covariates, I noticed that it is no longer possible to get the estimated percentages of belonging to one category in the binary part (I usually derived these in unconditional models via "univariate distribution fit of classes" in the "residuals" option).
This might be due to predicting the binary growth parameters with my covariates, (for the continuous part this is not a problem, I get the means for plots). The warning message is: "Residuals are not available for models with covariates..."
Is there any other option to get these estimates? I find them very useful to plot. If not, I will take the estimated percentages from the unconditional model as an approximation.
I am new to GMM and would like to use it to model the interaction of a continuous covariate x treatment group on longitudinal alcohol outcomes following an intervention. In the case the interaction is "disordinal," so to speak, I would like to find out the point --what in linear regression would be the "cross-over interaction" -- that specifies at which approximate value of the covariate one intervention becomes more efficacious than the other in decreasing at-risk drinking.
I would then like to test this "cutpoint" by classifying new cases according to it, running a new, additional GMM on the new cases and comparing their classifications.
Is this remotely possible? I have done things like this before (e.g., with Fisher's coefficients from discriminant analysis), and I see you alluded to this kind of application in Muthen & Muthen (2004) on pg 359, but I wasn't sure exactly how to do this in a GMM context.
Thanks so much for any tips you can give me! --Susan
Regarding your first paragraph, in a single-class setting, my thinking is that one would estimate the model and based on the estimated coefficients one can derive the values of the covariate (the cutpoint) that you are interested in. With GMM, you have class-specific intervention effects so this exercise needs to be done for each class.
Regarding your second and third paragraph, it sounds like you are referring to my 2004 Kaplan book chapter where I discuss early classification into a problematic class - that is, based on early outcomes. Perhaps you are thinking of using the covariate and treatment group information only (with the model estimates for its coefficients) to classify individuals - which is certainly possible and has been written about - but I don't see offhand how the cutpoint comes in here.
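The cutpoint derivation described in the first paragraph of the reply above can be sketched as follows. It assumes that, within each class, the treatment effect on the outcome (or on a growth factor) is linear in the covariate, b_treatment + b_interaction * x, so the crossover is the value of x where this effect equals zero. The class labels and coefficient values are hypothetical.

```python
def crossover_point(b_treatment: float, b_interaction: float) -> float:
    """Covariate value at which the treatment effect b_treatment + b_interaction * x is zero."""
    if b_interaction == 0:
        raise ValueError("No crossover: the treatment effect does not depend on the covariate.")
    return -b_treatment / b_interaction

# Hypothetical class-specific estimates: (treatment main effect, treatment-by-covariate interaction).
class_estimates = {"class 1": (-1.2, 0.04), "class 2": (0.5, -0.10)}
for label, (b_t, b_int) in class_estimates.items():
    print(label, crossover_point(b_t, b_int))
```

Because the intervention effects are class-specific in a GMM, each class gets its own crossover value, and a value outside the observed covariate range means there is no crossover in practice for that class.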
Thanks so much! That's exactly the info i was hoping for.
Do you have any examples you have done (and published) using the early classification into a problematic class? I work best from examples, and I was having trouble searching for that in the literature. Thanks again!
Thanks for the hint about the handouts. They were helpful. I would still be interested in any papers/authors you might be able to suggest if you get a chance.
I did have another question: I would like to run a GMM with two dummy-coded known-class variables (i have three treatment groups)...I haven't seen any examples that include two dummy-coded categorical known class variables and just wanted to check whether this is doable.
The one person I know who is working on this is not ready to share the paper. See the following paper on the website:
Boscardin, C., Muthén, B., Francis, D. & Baker, E. (2008). Early identification of reading difficulties using heterogeneous developmental trajectories. Journal of Educational Psychology, 100, 192-208.
Muthén, B., Khoo, S.T., Francis, D. & Kim Boscardin, C. (2003). Analysis of reading skills development from Kindergarten through first grade: An application of growth mixture modeling to sequential processes. Multilevel Modeling: Methodological Advances, Issues, and Applications (in press). S.R. Reise & N. Duan (Eds). Mahwah, NJ: Lawrence Erlbaum Associates, pp.71-89.
With the KNOWNCLASS option, you can specify three groups. It is not necessary to create two dummy variables.
linda beck posted on Monday, September 29, 2008 - 5:41 am
As posted some time ago above, I have a two-part mixture model with an intervention as covariate in the model. I found two groups, and the intervention has some effects on the slopes in one group. All works fine, but the journal wants effect sizes.
a. is there a way to get an effect size for the effect of the treatment on the slope in the continuous and the binary part?
b. I have an idea. I thought of using Cohen's d and getting the means of both trajectories (to get effect sizes for the continuous part), then comparing the means between the dummy-coded treatment groups following Cohen's approach. But this seems to be difficult, since my treatment is part of the whole two-part mixture model (it predicts group membership and slopes). Is there a way to get the means of both trajectories of the continuous part in two-part mixture modeling, broken down by a dummy-coded group which is part of the model? Do you see any alternatives? For the binary part I have no idea how to get effect sizes at all...
a. Effect size has to do with a dependent variable's mean differences across groups divided by the standard deviation. One way is to consider the dependent variable to be the slope but that is not very down to earth. Instead, one would probably want to look at the outcome at some particularly important time point. This information is given by estimated means per class in the RESIDUAL or Tech7 output.
b. You seem to say that your treatment dummy covariate influences not only the slope but also the "group-membership". By group membership I wonder if you mean the latent class variable. If so, do you really think that treatment changes class membership? In our own work we have typically not taken that approach, but have focused on latent class as being pre-existing (before treatment started) with treatment effect being on the slope within latent class.
For the binary part, I don't know that effect size makes sense. Perhaps one can translate the effect into an odds ratio for treatment-control related to binary outcome 0-1.
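A small sketch of the two quantities mentioned in this reply: a standardized mean difference at a chosen time point, and an odds ratio from a logit-scale treatment coefficient. The use of a pooled standard deviation is an assumption (the reply only says "divided by the standard deviation"), and all input values are hypothetical.

```python
import math

def cohens_d(mean_t, mean_c, sd_t, sd_c, n_t, n_c):
    """Mean difference between treatment and control divided by the pooled standard deviation."""
    pooled_var = ((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2) / (n_t + n_c - 2)
    return (mean_t - mean_c) / math.sqrt(pooled_var)

def odds_ratio(logit_coefficient):
    """Odds ratio for treatment vs. control implied by a coefficient on the logit scale."""
    return math.exp(logit_coefficient)

# Hypothetical outcome means at one time point within one latent class.
print(cohens_d(mean_t=10.0, mean_c=12.0, sd_t=4.0, sd_c=4.0, n_t=50, n_c=50))
# Hypothetical logit coefficient for the binary part.
print(odds_ratio(-0.7))
```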
linda beck posted on Tuesday, September 30, 2008 - 8:49 am
a. OK, these are the means of the classes in general. But to compute effect sizes I need the means of the classes broken down by treatment vs. control. How can I get these means in mixture modeling? Or did I misunderstand you? Sorry...
b. I found a significant effect of treatment on group membership and I think this is very reasonable. Otherwise, it would be bad news for prevention science, since one goal of (primary) prevention programs is to divert people from bad developmental pathways concerning substance use, aggression and so on. If these pathways were set in stone, one could only minimize the damage (only effects on the slope in bad trajectories).
a. I think you get the treatment- and class-specific means using the plot part of Mplus via the "adjusted means" option. Otherwise, you can compute them from the parameter estimates.
b. It is certainly a possible modeling approach, but you have to be careful that the class membership doesn't influence the parameters of outcomes before the intervention starts, because in that case it doesn't make sense that the intervention influences class membership (which in turn influences something before the intervention). For instance, if your growth model included the pre-intervention time point and you center so that the intercept represents the systematic part of the outcome at that time point, the intercept growth factor mean should not be allowed to vary across the latent classes.
Another approach is a latent transition model, where you have a latent class variable before the intervention and another one after. The one after can have as "indicators" the growth factors of a process that starts after the intervention.
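As an illustration of point a. above, treatment- and class-specific means can be computed from parameter estimates in the simplest case. This sketch assumes a continuous outcome, a linear growth model, a treatment dummy that affects only the slope, and all other covariates at zero. It does not cover the two-part model, where the calculation is more involved. All names and values are hypothetical.

```python
def implied_mean(time_score, intercept_mean, slope_intercept, slope_on_treatment, treatment):
    """Model-implied outcome mean at one time point for one class and one treatment arm."""
    # The slope mean shifts by the treatment effect when treatment == 1.
    slope = slope_intercept + slope_on_treatment * treatment
    return intercept_mean + slope * time_score

# Hypothetical class-specific estimates: intercept mean, slope intercept, slope-on-treatment effect.
classes = {"class 1": (2.0, 0.5, -0.3), "class 2": (4.0, 1.0, -0.6)}
time_scores = [0, 1, 2, 3]
for label, (i_mean, s_int, s_on_x) in classes.items():
    for arm in (0, 1):
        means = [implied_mean(t, i_mean, s_int, s_on_x, arm) for t in time_scores]
        print(label, "treatment" if arm else "control", means)
```

At time score 0 the two arms share the same mean within a class, which matches a design where the intervention cannot affect the pre-intervention level.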
linda beck posted on Thursday, October 02, 2008 - 1:23 am
b. I modeled invariant intercept means between classes in my two part mixture model. I set: [iu] (1); [iy] (2);
in both class statements. But the model didn't converge, due to a non-positive definite Fisher information matrix, and standard errors could not be computed. The means/intercepts of 'iy' were set equal in the output, but I think 'iu' was the problem; there were only asterisks. Do I have to add a command, maybe concerning the thresholds or something like that?
thanks, linda beck
linda beck posted on Thursday, October 02, 2008 - 7:51 am
Addition: I think the problem with equal 'iu' across both classes has something to do with Mplus needing to constrain at least one 'iu' to zero. Am I right? I have no more ideas.
linda beck posted on Thursday, October 02, 2008 - 8:14 am
Sorry, I think I found the right way to test mean equality in both trajectories: set 'iu' to 0 in both classes, and 'iy' equal, right? :-) Sorry, once again.
For the growth model for the categorical outcome, the comparison is setting the mean of iu to zero at all timepoints versus zero at one timepoint and free at the other timepoints. For the growth model for the continuous outcome iy, the comparison is the means of iy free at all timepoints versus holding the means equal across time.
linda beck posted on Thursday, October 02, 2008 - 10:15 am
linda beck posted on Tuesday, October 21, 2008 - 7:49 am
Back to Bengt's answer from October 01, 2008, 08:19. Unfortunately the 'adjusted means' option is not available when estimating the entire two-part model. How can I compute the treatment- and class-specific means from the parameter estimates, as you said?
Please see the Olsen-Schafer JASA article referred to in the UG. If it looks too complex, a local statistician should be able to help with that.
linda beck posted on Monday, January 12, 2009 - 9:03 am
I did not find anyone who could help me with computing class specific estimated means of two-part mixture separated by treatment (see last posts). Currently I'm thinking of simply splitting the sample by treatment/control and estimating the final "two class" two-part mixture for both these conditions, to get the class specific estimated means for the continuous part separated by treatment condition.
a.) Would that be sufficient to get an idea of class specific means (separated by treatment) of the original model which utilized the entire sample?
linda beck posted on Tuesday, January 13, 2009 - 8:26 am
sorry for bothering you again, but besides the problem above, there is only one problem left, that reviewers wanted to be solved.
I want to control the effects of treatment on the slope for initial level by predicting the slope with the intercept (Muthen & Curran). I'm using two-part growth mixture modeling (randomized prevention trial) and unfortunately there are some effects of the treatment on pre-intervention status (intercept).
My problem: b. I don't want to use the original Muthen and Curran approach (I want to skip the "multiple group" or "known class" part). I only want to predict the slope with treatment within the entire sample, controlling for effects of the intercepts on the slopes. Is the Muthen and Curran approach (predicting the slope with the intercept) the right way to achieve that aim? c. I lose the covariance between the intercepts of the y- and u-parts when I use the intercepts as predictors of the slopes within both parts. That is the only covariance between growth factors of the y- and u-parts that I have estimated in my two-part mixture model (because it was the only significant one). Is that a problem? I thought it was the heart of two-part growth curves to have at least one covariance between the growth factors of both parts...
You first can do the analysis of both groups, then do the analysis of each group with all parameters fixed at the values of the first analysis to get the plots.
linda beck posted on Wednesday, January 14, 2009 - 10:30 am
Thank you for your patience... So I should use the same (co-)variances, thresholds and regression coefficients (derived from the analysis of the entire sample) for the separate analyses of the control and treatment groups? What about the means? Surely they should not be fixed in the treatment and control analyses (because you said "all")?
Good point. I assume your treatment/control dummy influences a growth factor mean, in which case you have to use the mean for the group in question.
linda beck posted on Thursday, January 15, 2009 - 8:52 am
I'm not sure if I fully understood. But what I've done so far (with plausible estimated means as the outcome) is this: I have estimated the model separately for the treatment and control conditions, fixing all parameters (covariances, thresholds, regression coefficients and growth factor means) at the values from the analysis using the entire sample. With one exception: regarding the growth factor means (which are all intercepts, because I have a conditional model), I fixed only the intercepts of iu and iy at the values from the full-sample analysis, because they were not influenced by treatment in that analysis (su and sy are influenced by treatment). In other words, only the slope means were freely estimated in the separate analyses. I'm a bit confused about the thresholds: should I fix them at the values from the full-sample analysis or not when doing the separate analyses for treatment and control? I hope that's the way to go, in principle...
P.S.: Would be really cool to have the adjusted means (by covariates) option for two-part models in the future! :-)
linda beck posted on Friday, January 16, 2009 - 10:34 am
Addition: besides the thresholds (in question), should one also fix the intercept of c ([c#1]) when doing the separate analyses for treatment and control? Sorry, I overlooked [c#1] yesterday... When I fix the thresholds and [c#1], I have only two parameters left to estimate in both separate analyses (su and sy), which are influenced by treatment in the original model using the entire sample. Is that the correct model for what you had in mind?
You need to look at the results from TECH11 while taking weighting and clustering into account. Running TECH11 or TECH14 without taking weighting and clustering into account does not correctly represent your data. A further consideration in choosing the number of classes is the substantive interpretation of the classes.
Hello, I am running a 2-class LCA with 1 binary categorical indicator and 3 nominal indicators. I have two questions: 1. As outlined in the section on TECH14 in the User Guide, I first ran TECH14 with the starts option, then used the OPTSEED option with the seed of the stable solution, then ran with LRTSTARTS = 0 0 40 10. Here, I received a warning: WARNING: THE BEST LOGLIKELIHOOD VALUE WAS NOT REPLICATED IN 3 OUT OF 5 BOOTSTRAP DRAWS. THE P-VALUE MAY NOT BE TRUSTWORTHY DUE TO LOCAL MAXIMA. INCREASE THE NUMBER OF RANDOM STARTS USING THE LRTSTARTS OPTION. I increased the LRTSTARTS to 0 0 80 20 and the model ran without any problems/warnings. I then additionally specified LRTBOOTSTRAP = 100 (as suggested by McLachlan and Peel, 2000), and I again receive the warning printed above. I subsequently increased the LRTSTARTS to as high as 0 0 150 30 and no longer receive the warning. Should I be concerned about having to increase the LRTSTARTS so high?
2. For the Lo-Mendell-Rubin LRT in TECH11, I understand that the last class should be the largest. In the User Guide, you specify that if you are using starting values, they be chosen so that the last class is the largest. If I am using the automatic starting values and I notice that the last class is not the largest, does this mean I have to specify my own starting values?
Hello, I am comparing 2, 3, 4, & 5-class models using LCA. I have 3 nominal and one binary indicator, and I am running these LCA models at various ages (25, 30, 35 years etc.). I have a few questions from my output: 1. In a couple of instances, when I run TECH11 and TECH14, the H0 Loglikelihood from the k-1 class model is not the same as it was in the previous run with one less class. Why does this happen and how can I interpret it?
2. In a few instances, I observe the following two errors: a) ONE OR MORE PARAMETERS WERE FIXED TO AVOID SINGULARITY OF THE INFORMATION MATRIX. THE SINGULARITY IS MOST LIKELY DUE TO THE MODEL IS NOT IDENTIFIED, OR DUE TO A LARGE OR A SMALL PARAMETER ON THE LOGIT SCALE. THE FOLLOWING PARAMETERS WERE FIXED: 62 What does this mean for model interpretation?
b) THE STANDARD ERRORS OF THE MODEL PARAMETER ESTIMATES MAY NOT BE TRUSTWORTHY FOR SOME PARAMETERS DUE TO A NON-POSITIVE DEFINITE FIRST-ORDER PRODUCT MATRIX. THIS MAY BE DUE TO THE STARTING VALUES BUT MAY ALSO BE AN INDICATION OF MODEL NONIDENTIFICATION. THE CONDITION NUMBER IS 0.487D-16. PROBLEM INVOLVING PARAMETER 40. Can you provide any guidance how I can address this issue?
Hi, I have a question about random starts. I have requested TECH1 but I am not sure where I can find the starting values. For example, if I get the output as follows, what are the starting values? Are they 0 0 0 for the variables AC_FS NC_FS CC_FS? Or are they .305 .229 .225?
The starting values are under the heading of STARTING VALUES under Technical 1 in the output. It looks like what you have above are starting values. Nu and Theta are different matrices. Nu gives starting values for means. Theta gives starting values for variances and covariances.
I have been trying to build a conditional model that fits my dataset well. My path diagram describes a relationship between weight at different ages and cancer development. I have two latent variables (slope and intercept) with arrows pointing towards the weight at different ages. I also have arrows from these latent variables towards cancer, as I am trying to investigate the effects of weight changes on cancer. In this case, I have weight and cancer as outcome variables. Weight is my continuous variable, and cancer is my categorical variable (free (0), early onset (1), late onset (2)).
My aim is to know the patterns of weight gain/loss by cancer group (0, 1 and 2). I have tried an LGCM, specifying cancer as categorical, but I could not get the chi-square and model fit indices (e.g., CFI and RMSEA). When I run the model without specifying the categorical variable, I get the fit indices, but 999.000 persists in my S.E. and p-values.
1. Is LGCM ideal in my case?
2. What do I do if my slope and intercept are exogenous with two outcome variables?
Given that the weight profiles do not follow a developmental trend but rather go up and down, I would not use a growth model. I would use the weight variables in a latent class analysis to find patterns of weight gain and loss and use the cancer variable as a distal outcome.
I would suggest viewing the Topic 5 course video and looking at the papers on the website Latent Class Analysis. A good book is:
Hagenaars, J. A., & McCutcheon, A. L. (2002). Applied latent class analysis. Cambridge: Cambridge University Press.
Example 7.12 shows an LCA with a covariate.
Simon Denny posted on Thursday, September 16, 2010 - 9:45 pm
Hello Linda and Bengt
I have fitted a multilevel mixture model with four classes with a categorical outcome. The between-level variables are dummy variables that were constructed for high, medium and low levels of various aspects of school environments.
I am interested in estimating the percentages or rates of my outcome variable within each class at different levels of the school variables, i.e., instead of presenting the odds ratios, I want to present the percentages/rates within the four classes. This is for a lay report where odds ratios are not easily understood. The percentages don't necessarily need to take into account the individual-level covariates, but this would be nice if possible.
Is there any way of converting the odds ratios back to a percentage of my outcome variable? Or is there any way of assigning the students to their latent class so I can then look at different levels of the dummy variables and my outcome variable?
This is possible. I would need to see your full output to understand your exact situation. Please send it and your license number to email@example.com.
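For readers wondering what such a conversion looks like in general, here is a minimal Python sketch of turning logit-scale estimates into percentages. It assumes a binary outcome with a class-specific threshold and a dummy-variable effect expressed as an odds ratio, using the parameterization P(u = 1) = 1 / (1 + exp(threshold - effect)). The function name and all numbers are hypothetical and are not taken from any output in this thread. It also ignores random effects and individual-level covariates, so it is not a substitute for looking at the actual model.

```python
import math

def outcome_probability(threshold, log_odds_shift=0.0):
    """P(u = 1) for a binary outcome on the logit scale.

    Assumes logit = -threshold + shift, i.e.
    P(u = 1) = 1 / (1 + exp(threshold - shift)).
    """
    return 1.0 / (1.0 + math.exp(threshold - log_odds_shift))

# Hypothetical values for one latent class (illustration only).
threshold = 1.0    # class-specific threshold of the outcome
odds_ratio = 2.0   # odds ratio for one school-level dummy variable

p_reference = outcome_probability(threshold)                   # dummy = 0
p_dummy = outcome_probability(threshold, math.log(odds_ratio))  # dummy = 1

print(f"reference: {100 * p_reference:.1f}%, dummy = 1: {100 * p_dummy:.1f}%")
```

The same odds ratio gives different percentage-point changes in different classes, because each class has its own baseline probability.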
Poh Hiong posted on Sunday, February 06, 2011 - 8:52 am
Hi, I have a question regarding solution to my latent class analysis. I realize that there is a difference in the profile of my classes between the model without predictors and the model with the predictors.
To elaborate, in my analysis without any predictors, it shows 3 classes. However, when I add in the predictors, it also shows 3 classes but they take a different profile.
I am not sure if I can still interpret the effects of the predictors on class membership when there are changes in the profile.
Any advice on this situation will be helpful. Thanks.
This issue is discussed in the following paper which is available on the website:
Muthén, B. (2004). Latent variable analysis: Growth mixture modeling and related techniques for longitudinal data. In D. Kaplan (ed.), Handbook of quantitative methodology for the social sciences (pp. 345-368). Newbury Park, CA: Sage Publications.
I had a question about a mixture model, I got the following error after using RML with Monte Carlo and my log likelihoods are over -2000, but it said the model terminated normally. Any thoughts? Thanks!
THE STANDARD ERRORS OF THE MODEL PARAMETER ESTIMATES MAY NOT BE TRUSTWORTHY FOR SOME PARAMETERS DUE TO A NON-POSITIVE DEFINITE FIRST-ORDER DERIVATIVE PRODUCT MATRIX. THIS MAY BE DUE TO THE STARTING VALUES BUT MAY ALSO BE AN INDICATION OF MODEL NONIDENTIFICATION. THE CONDITION NUMBER IS 0.210D-16. PROBLEM INVOLVING PARAMETER 25.
I would like to use a known class procedure to estimate a cross-gender latent profile of six observed variables, allowing the variances to vary across gender.
Here are my questions: 1. As I understand from my research so far, there is no way to obtain a test of the relative fit of k and k-1 classes (a LRT test) when using the known class procedure in mixtures in MPLUS. Am I correct?
2. Beyond looking at the BIC, what are your recommendations for comparing class solutions when using the known class option?
3. I considered running latent profile separately for males and females, but this doesn't give me the same solutions since mean profiles will be different across genders. Is running by-gender models the better choice given limitations of estimating the best solution using the known class procedure?
4. Do you know of any published studies using the known class procedure in MPLUS for a latent profile?
1. That's correct. It is possible in principle to do the BLRT, but it is only implemented for one latent class variable and exploratory mixture analysis (LCA).
2. With categorical items I would use bivariate tests in TECH10 and with continuous outcomes perhaps TECH11, TECH12, TECH13 (see UG).
3. I would recommend considering having gender as a covariate instead of Knownclass because it may make the solution search easier. Although it sounds like you have continuous indicators and therefore might consider class-varying variances, which the covariate approach can't handle, the covariate approach allows gender variation in class probabilities (c on x) and allows gender differences in some indicator means by direct effects of x on y (item bias). Without direct effects, the class profiles are the same for the genders. I would first analyze each gender separately to see if there is hope for getting the same classes.
4. Not off hand, but check under Papers on our web site. Does someone else know?
I ran by gender LPAs and found that a 4-class model was the best fit for both genders. The configuration of profiles is generally similar with 2 notable mean differences on specific indicators within 2 classes. I ran a full model with a gender covariate effect on those indicators within those particular classes, and a gender direct effect on class membership. Here are my questions:
1. A best-fitting 4-class model resembles by-gender profiles. To obtain class probabilities, should I leave in the direct effect of gender on C, although it is non-significant, since gender covariate affects class membership? 2. For planned analyses, it would be helpful to use the class probabilities from the full model. Any advice for good criteria for using the full sample vs by gender class member results, beyond significant chi-squares in cross-tabs of class membership in full vs by-gender models?
I am running a latent profile analysis on 5 factors with each three indicators (3 different informants for each factor, but the same three across factors). The solution looks very much like what I would expect based on theory, but I get the negative psi matrix warning. It relates to the last factor, but if I run the analysis without this factor, I get the same warning but for the factor that is now the last. I do not see a negative (residual) variance or a correlation greater than 1 anywhere. To me, it seems unlikely that there is a linear dependency among more than two variables, but I would like to ask how I can check for this.
The message could also be due to a group of high correlations. See TECH4. I would do the analysis without mixture and explore the factor structure further. I would ask for modification indices to see if there is a need for residual covariances across the five factors.
I would not recommend using imputed data with such a computationally demanding model. I would use MLR on the original data.
Matt Luth posted on Friday, March 16, 2012 - 9:09 am
I am trying to run a model similar to example 7.27. However instead of fixing the first factor loading to be one in both groups to set the scale, I would like to fix the variance in both groups to one. Would this be the correct syntax?
It sounds like you are hitting a local solution. Try more random starts. Also, be sure to check that you have the same number of parameters in both models. Otherwise, send both outputs and your license number to firstname.lastname@example.org.
Stata posted on Tuesday, March 20, 2012 - 12:31 pm
I am running factor mixture model with ordinal and binary variables. I got a bunch of error messages with FMM3. Is there a problem with my Mplus syntax?
*** ERROR in MODEL command Ordered thresholds 2 and 3 for class indicator A4 are not increasing. Check your starting values.
You seem to want the thresholds to be class-varying, but the way you state things not all of them will be class-varying. As an example, for [a1$3-a20$3]; it seems like you should refer also to thresholds 1 and 2 to make them different across classes as well.
Mplus 6.1 and 6.11 provide TECH11 and TECH14 results for factor mixture models with TYPE=IMPUTATION. Why does 6.12 not have that capability (see below)? In that case, should I trust the TECH11 and TECH14 results from Mplus 6.1 and 6.11?
*** WARNING in OUTPUT command TECH11 option is not available with DATA IMPUTATION or TYPE=IMPUTATION in the DATA command. Request for TECH11 is ignored. *** WARNING in OUTPUT command TECH14 option is not available with DATA IMPUTATION or TYPE=IMPUTATION in the DATA command. Request for TECH14 is ignored. 2 WARNING(S) FOUND IN THE INPUT INSTRUCTIONS
This applies to all analyses with TYPE=IMPUTATION.
Bart Simms posted on Friday, March 30, 2012 - 10:32 pm
I am attempting to run an FMM-5 with two classes and two correlated factors. There are 3 count indicators and four ordinal indicators. I also included
STARTS = 120 30; STITERATIONS = 65;
In the output there were the warnings
THE MODEL ESTIMATION HAS REACHED A SADDLE POINT OR A POINT WHERE THE OBSERVED AND THE EXPECTED INFORMATION MATRICES DO NOT MATCH. THE CONDITION NUMBER IS -0.235D-04. THE PROBLEM MAY ALSO BE RESOLVED BY DECREASING THE VALUE OF THE MCONVERGENCE OR LOGCRITERION OPTIONS.
THE STANDARD ERRORS OF THE MODEL PARAMETER ESTIMATES COULD NOT BE COMPUTED. THIS IS OFTEN DUE TO THE STARTING VALUES BUT MAY ALSO BE AN INDICATION OF MODEL NONIDENTIFICATION. CHANGE YOUR MODEL AND/OR STARTING VALUES. PROBLEM INVOLVING PARAMETER 9.
RESULTS ARE PRESENTED FOR THE MLF ESTIMATOR.
Parameter 9 is a loading for a categorical variable in class 1.
I'm just wondering the best thing to do at this point to get it to run. Even with the saddle point, this is better than my other models (including the 1 class version) in terms of information criteria, and it makes good sense substantively. So I'm really hoping to get it to work.
I figured I would ask before I tried more starts, because that run took 45 hours.
It looks like the estimation recovered and you obtained MLR standard errors. If you have standard errors in the results, you can ignore the message. You can also try to decrease the MCONVERGENCE and LOGCRITERION options.
Bart Simms posted on Saturday, March 31, 2012 - 11:44 pm
Yes, there are standard errors. So the MLF results are only the parameter estimates themselves, and not the standard errors?
I also realized that I did something that surely didn't help the estimation in that the loading set to 1 for the second factor was a cross loading, and this was the same indicator set to 1 for the first factor. This resulted in quite a low factor variance and some huge loadings for other indicators.
I guess I will re-run it after correcting this, and also try more starts.
*But am I wasting time by setting STITERATIONS too high at 65? Perhaps the default is better?
With a combination of categorical and continuous variables (7 in total), I think 5 clusters seems to be the best solution, but I get the following three warnings in the process.
I have two big questions:
1) Do I have to standardize all the continuous variables (they have different scales)?
2) Would you please let me know what I am supposed to do? The three warnings are:
WARNING: THE BEST LOGLIKELIHOOD VALUE WAS NOT REPLICATED. THE SOLUTION MAY NOT BE TRUSTWORTHY DUE TO LOCAL MAXIMA. INCREASE THE NUMBER OF RANDOM STARTS.
THE STANDARD ERRORS OF THE MODEL PARAMETER ESTIMATES MAY NOT BE TRUSTWORTHY FOR SOME PARAMETERS DUE TO A NON-POSITIVE DEFINITE FIRST-ORDER DERIVATIVE PRODUCT MATRIX. THIS MAY BE DUE TO THE STARTING VALUES BUT MAY ALSO BE AN INDICATION OF MODEL NONIDENTIFICATION. THE CONDITION NUMBER IS -0.851D-17. PROBLEM INVOLVING PARAMETER 14.
ONE OR MORE PARAMETERS WERE FIXED TO AVOID SINGULARITY OF THE INFORMATION MATRIX. THE SINGULARITY IS MOST LIKELY BECAUSE THE MODEL IS NOT IDENTIFIED, OR BECAUSE OF EMPTY CELLS IN THE JOINT DISTRIBUTION OF THE CATEGORICAL VARIABLES IN THE MODEL. THE FOLLOWING PARAMETERS WERE FIXED: 19 20 21 22 23 24 37
Hi, I want to do path analysis based on three cohorts. I want to find out whether it is justifiable to use the three cohorts all together although the frequency distribution of some variables differs depending on the cohort. I ran the path model for each cohort separately and for all three cohorts together. I compared the results, and the general conclusion is that more parameters become significant when I use all three cohorts together. There are only a few minor differences in the results between the models of the three cohorts. I wonder if this conclusion is sufficient to justify using all three cohorts together. Or should I try a multigroup comparison? I looked at the user’s guide, but could not find an example in the chapter on mixture modeling that resembles what I think I need to do. A simplified version of my current path model is as follows: CATEGORICAL ARE y1 y; USEVARIABLES ARE x1 x2 x3 y1 y2 ; MISSING ARE ALL (999); ANALYSIS: estimator = ml; integration = montecarlo; MODEL: Y2 on x1 x2 x3 y1; y1 on x1 x2 x3; What does the syntax look like when I want to find out whether this model differs depending on the cohort being used? Or is there some other way than mixture modeling to find out that there is statistically no difference according to the cohort and that it is justifiable to use the path model with all three cohorts together? I appreciate your advice.
No, the fact that the OPTSEED run doesn't give the warning is no guarantee.
For us to diagnose this you would have to send input, data, output, and license number to email@example.com.
Adam Myers posted on Monday, July 29, 2013 - 6:12 pm
Hi Bengt and Linda,
I have a question about the different diagnostics for selection of an appropriate latent class solution. Specifically, when I run an LPA using MPlus, the LMR test indicates rejection of a three-class solution (which makes little sense for my data), but when I run a model with four through nine classes, the p-value for the LMR test is well below .05. All of the other indicators (AIC, BIC, BLRT) consistently suggest that each successive class solution is a better fit.
I read in Nylund, Asparouhov, and Muthén (2007) that LMR tends to overestimate the number of classes rather than underestimate, but it seems to be doing the exact opposite with my data.
Thoughts? Should I ignore the LMR results for the 3-class solution?
xiaoyu bi posted on Tuesday, January 07, 2014 - 10:17 am
Dear Dr. Muthen, For the growth mixture model, when I report the counts/proportions of individuals in each class, should I report them based on (a) estimated posterior probabilities, or (b) their most likely latent class membership? I noticed that the graphs generated by Mplus are based on the estimated posterior probabilities. I read a book, and it reports the numbers/proportions based on most likely latent class membership. But if I report the counts and proportions in the text based on most likely latent class membership, the numbers will not match those in the graph. What do most researchers do? Any suggestions? Thank you so much!
You specify that within each class. So, it's possible, although not always easy to get stable solutions. For examples, see the papers on our website under Papers, Growth Mixture Modeling.
Marianne SB posted on Wednesday, September 24, 2014 - 10:40 pm
Thanks for your reply!
Actually, I want to examine effects of some covariates measured on some of the waves, so I cannot regress i s q c on them. Therefore, I am thinking about specifying dep16 on x y z (time-varying covariates).
However, as you suggest, I struggle with unstable solutions and error messages. It might not be possible to examine both classes with different long-term development and different effects of covariates simultaneously with my dataset. I don't think any of the papers under Papers, Growth Mixture Modeling have done this either.
Marianne SB posted on Wednesday, September 24, 2014 - 10:43 pm
Addition: My covariates are measured on some of the waves in the middle, not on baseline.
Anna Hawrot posted on Thursday, November 13, 2014 - 5:19 am
I've seen this term in several articles. For instance, authors extracted up to 6 classes, but the results of the 6-class model were not reported because the model was not well-defined for the data.
Yesterday, after posting my message here, I found the information in UG7, p. 466 that models whose final stage optimizations result in loglikelihood (LL) values very close to the best LL may not be well-defined for the data.
In order to understand it better, I experimented with different LCA models and managed to get such models (parameter values differed not dramatically, but substantially; LL values were very close, e.g., -8126.353, -8126.691, -8126.729). However, I also got ones with almost identical parameter values. To sum up, my results were in line with the information in UG7.
Inferring from my explorations, I would say that a model is "not well-defined" for the data when its parameter estimates are unstable, and thus not trustworthy. Am I right?
I think you are on the right track. Look at our handout for Topic 5, slide 116, cases 3 and 4. The likelihood has 2 local maxima that are very similar in height. This means that using this model the data is not very informative about the value (estimate) of the parameter on the x axis. This is what we mean by the model not being well-defined. This implies that the ML method breaks down - it cannot clearly help us find a best parameter estimate. In case 3 the problem isn't that big since the parameter values are not that far apart for the 2 peaks, but in case 4 it is a serious problem.
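The situation described above can be imitated with a toy example. The Python sketch below is not the figure from the handout: it uses a made-up one-parameter log-likelihood with two peaks of similar height and a simple hill-climbing search from several starting values. The function names, peak locations and heights are all assumptions chosen for illustration.

```python
import math

# Toy log-likelihood: log of two Gaussian-shaped bumps.
# Each peak is (location, log-height); values are invented for illustration.
PEAKS = ((-1.0, 0.0), (2.0, -0.3))
WIDTH = 0.5

def loglik(theta):
    return math.log(sum(math.exp(h - 0.5 * ((theta - m) / WIDTH) ** 2)
                        for m, h in PEAKS))

def climb(theta, step=0.01, max_iter=10_000):
    """Simple hill climbing: move to the better neighbour until none is better."""
    for _ in range(max_iter):
        best = max((theta, theta - step, theta + step), key=loglik)
        if best == theta:
            break
        theta = best
    return theta

starts = [-3.0, -0.5, 0.2, 1.0, 3.0]   # hypothetical starting values
solutions = [climb(s) for s in starts]

for s, est in zip(starts, solutions):
    print(f"start {s:5.1f} -> estimate {est:6.2f}, LL {loglik(est):7.3f}")
```

Different starts end at two different estimates whose log-likelihoods differ only slightly while the estimates themselves are far apart, which is the pattern of the more serious case described above.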
Anna Hawrot posted on Tuesday, November 18, 2014 - 9:08 am
Thank you! It's much clearer now!
Ann Nguyen posted on Wednesday, January 21, 2015 - 12:23 pm
I ran a series of LCAs (from a 1 class solution to a 4 class solution) despite the LMR test indicating a preference for a 1 class solution. A 3 class solution showed the greatest reduction in AIC and BIC. Entropy was highest for the 3 class solution. Moreover, the 3 class solution was most easily interpretable and consistent with theory. Given all of these factors, is it fine to ignore the non-significant LMR test from the 3 class solution and select the 3 class solution as the final solution?
I don't think you need to be tied to what LMR says. A key indicator is where BIC is at its minimum - so if that was at 3 classes you are fine to go with that. BIC doesn't do well at small sample sizes, such as n<200.
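As a small illustration of picking the class solution where BIC is lowest, the Python sketch below uses the standard definition BIC = -2 logL + p ln(n). The loglikelihoods, parameter counts and sample size are hypothetical and are not from the poster's analysis.

```python
import math

def bic(loglik, n_params, n_obs):
    """Bayesian information criterion: -2*logL + p*ln(n). Lower is better."""
    return -2.0 * loglik + n_params * math.log(n_obs)

# Hypothetical results: number of classes -> (loglikelihood, free parameters).
fits = {1: (-4200.0, 8), 2: (-4100.0, 17), 3: (-4050.0, 26), 4: (-4040.0, 35)}
n_obs = 500  # hypothetical sample size

bics = {k: bic(ll, p, n_obs) for k, (ll, p) in fits.items()}
best_k = min(bics, key=bics.get)

for k in sorted(bics):
    print(f"{k} classes: BIC = {bics[k]:.1f}")
print("lowest BIC at", best_k, "classes")
```

In this made-up example the loglikelihood keeps improving with more classes, but the penalty term makes BIC turn upward after three classes.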
Ann Nguyen posted on Thursday, January 22, 2015 - 6:20 am
Hi, I'm running an LCA with four ordinal variables. I estimated a 3-class solution but the proportion of replications of my best LL solution was a bit low (34%) with STARTS = 100 50, and one perturbation failed to converge. I upped the random starts to 500 100 to see if I could get better replication. Again, replication of the best LL solution was 34/100. I then copied the S-values and pasted them into the model command to further help the model along. I got an error message stating:
"One or more pairs of ordered thresholds are not increasing in Class 1. Check your starting values. Problem with the following pairs: ANTI$2 (5.175) and ANTI$3 (5.175) ANTI$4 (15.000) and ANTI$5 (15.000)"
I see that a few of my thresholds were the same (i.e., 5.175), and of course the thresholds set to an extreme value (15) would be the same. How do I remedy this problem when pasting in S-values?
You can simply add a small value to the starting value for the higher threshold.
But getting 34 replications of the best LL seems more than sufficient to me.
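The threshold fix suggested above (adding a small value to the higher threshold) can be sketched in Python as follows. The helper name, the increment of 0.01 and the first threshold value are assumptions for illustration. Only the tied values 5.175 and 15.000 come from the error message.

```python
def separate_tied_thresholds(thresholds, eps=0.01):
    """Return starting values in which each threshold exceeds the previous one.

    Any threshold that is not larger than its predecessor is set to
    predecessor + eps; all other values are left unchanged.
    """
    adjusted = []
    for value in thresholds:
        if adjusted and value <= adjusted[-1]:
            value = adjusted[-1] + eps
        adjusted.append(value)
    return adjusted

# ANTI$1 is hypothetical; ANTI$2..ANTI$5 are the tied values from the message.
start_values = [-1.2, 5.175, 5.175, 15.0, 15.0]
fixed_values = separate_tied_thresholds(start_values)

print([round(v, 3) for v in fixed_values])
```

The adjusted values would then be typed back into the starting values by hand. The size of the increment is a judgment call, since it only needs to make the ordering strict.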
Bill Dudley posted on Thursday, March 31, 2016 - 5:29 am
When I run very simple unconditional cross-sectional mixture models and export data with the cprobs command, the class counts are very close to the class counts and proportions as reported in the output (I see about a 1% difference, which I assume could be due to some differences in rounding).
However, in more complex models there can be considerable discrepancy between the class counts in the output and the CPROB data. I am assuming that in complex models such as conditional models the posterior probabilities reflect the model-based posterior probabilities.
This leads to a question of how to report the class sizes. Although the means for classes are reported, I would like to move class counts into excel for graphing and reporting descriptive stats on demographics etc. Which class count data should I use?
It is like the difference between factors and estimated factor scores for continuous latent variables. The former are assumed to be normal with mean zero and variance psi but the estimated factor score distribution may not look like that - possibly because the model isn't perfect but also because estimated scores don't have the same properties as true scores. In a similar way, the model-estimated class probabilities can be different than the class probs based on estimated posterior probabilities. Both can be of interest.
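To make the two data-based kinds of class counts concrete, here is a minimal Python sketch that computes counts from saved posterior probabilities in two ways: by summing the probabilities per class, and by assigning each person to the most likely class. The probabilities are invented for four hypothetical individuals. The model-estimated counts are a third quantity that is read from the output and is not recomputed here.

```python
def posterior_counts(posteriors):
    """Class counts obtained by summing posterior probabilities over people."""
    n_classes = len(posteriors[0])
    return [sum(row[c] for row in posteriors) for c in range(n_classes)]

def modal_counts(posteriors):
    """Class counts obtained by assigning each person to the most likely class."""
    counts = [0] * len(posteriors[0])
    for row in posteriors:
        counts[row.index(max(row))] += 1
    return counts

# Hypothetical posterior probabilities for four people and two classes.
posteriors = [
    [0.90, 0.10],
    [0.60, 0.40],
    [0.55, 0.45],
    [0.20, 0.80],
]

by_posterior = posterior_counts(posteriors)
by_modal = modal_counts(posteriors)

print("based on posterior probabilities:", [round(x, 2) for x in by_posterior])
print("based on most likely class:", by_modal)
```

The two sets of counts differ whenever classification is uncertain, because people with probabilities near 0.5 count fully toward one class under modal assignment but only partially under the posterior-probability sum.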
Bill Dudley posted on Tuesday, April 05, 2016 - 6:07 am
Thank you for your prompt reply. I am mulling this over - in the meantime, I have a practical question. Which N's should we report? And are these the N's for the class means provided in the model results for each class?