Philip Jones posted on Wednesday, January 03, 2007 - 1:53 pm
I'm trying to fit a GMM on a right-skewed outcome at 4 time points, treating the outcome as censored at the floor. I am fitting cubic growth curves, allowing the intercept variance to be free within class but fixing the variances of the other three growth factors. With the default settings, for a 3-class model I get an ABIC of 39968 and entropy of 0.841. If I free up the intercept variance and the residual variances across classes, the ABIC drops to 39621, suggesting a better fit. The differences I observe in the variance estimates across the 3 classes seem to corroborate this. However, the entropy of the latter model drops substantially to 0.408. How should I reconcile this with the presumed better fit? Such a relatively low entropy suggests to me that it's not as useful a model in terms of accurately classifying subjects. Varying the number of classes doesn't change things much; the entropies of the more general model are much lower than those of the model with variances equal across classes.
And a slightly different question: computation flies (<5 min) if I assume normality but is excruciatingly slow when I code the variables as censored or two-part (can be 45 minutes to 2 hours on my brand new 2.33GHz Core 2 Duo with 2GB memory and PROCESSORS=2), making model tweaking very arduous. I often try INTEGRATION=5 to speed things up but it either doesn't help much or results in non-convergence. Any suggestions?
Entropy is not a measure of model fit. It may simply be that the better-fitting model does not have as clearly separated classes.
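For reference, the entropy value reported is the relative entropy, a summary of how sharply subjects are classified rather than of model fit. With estimated posterior class probabilities $\hat{p}_{ik}$ for $n$ subjects and $K$ classes, it is

```latex
E_K = 1 - \frac{\sum_{i=1}^{n} \sum_{k=1}^{K} \left( -\hat{p}_{ik} \ln \hat{p}_{ik} \right)}{n \ln K}
```

Values near 1 mean most subjects have posterior probabilities concentrated on one class; values near 0 mean the posteriors are diffuse. A model can fit better (lower BIC/ABIC) while its classes overlap more, which lowers this quantity.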
When numerical integration is needed, computational time increases. One to two hours is not particularly long in this case. INTEGRATION=5 may not be enough for adequate precision. I would use 7. If convergence is slower than expected with numerical integration, it may be that the model is too complex.
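As a sketch, the relevant input settings might look like the following. The variable names y1-y4 and the two-processor setting are placeholders taken from the question, not a recommendation:

```
VARIABLE:
  NAMES = id y1-y4;
  USEVARIABLES = y1-y4;
  CENSORED = y1-y4 (b);      ! b = censored from below (floor effect)
  CLASSES = c(3);
ANALYSIS:
  TYPE = MIXTURE;
  ALGORITHM = INTEGRATION;   ! numerical integration is needed with censored outcomes
  INTEGRATION = 7;           ! 7 integration points per dimension, per the advice above
  PROCESSORS = 2;
```

Fewer integration points (e.g., 5) run faster but may not give adequate precision, which is consistent with the non-convergence described in the question.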
rgm smeets posted on Friday, September 07, 2018 - 2:02 am
I am running a GMM on a very large dataset (almost 60,000 patients) with care utilization measured at six time points as the dependent variable. I am doubtful about a number of findings and how to interpret them in light of the features of our dataset. Firstly, the entropy is just above 0.7 and, related to this, I feel there is quite a lot of variation between observed individual values and estimated means. On the other hand, I can imagine that, in comparison to other research on mixture modeling, we have an enormous dataset. Furthermore, it seems that care utilization has considerable variance and instability over time. Could you share your thoughts on this situation?
You can plot the class-specific estimated means and the observed individual curves in the same graph and see if some of the classes seem to have more variability than the other classes - if so, you can free a variance in those classes (such as the variance of the random intercept). That can give a better fit. If your outcomes are strongly skewed - but with no floor or ceiling effects - you may also consider skew-t GMM as in our Statistics in Medicine article.
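One way to request that graph in Mplus is via the PLOT command (a sketch; the variable names y1-y6 are placeholders for the six time points):

```
PLOT:
  TYPE = PLOT3;
  SERIES = y1-y6 (s);   ! plot the outcomes against the time scores of the slope factor
```

After the run, the estimated class-specific mean curves and the observed individual trajectories can then be viewed together in the Mplus graphics module, which makes it easier to spot a class with noticeably more spread around its mean curve.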
rgm smeets posted on Monday, September 10, 2018 - 7:52 am
Thank you for your response, Mr. Muthen. I have two remaining questions. I am currently running a GMM with a free intercept, but the slope and quadratic factor fixed to zero (as I read somewhere that this is the common procedure, and I noticed in my LCGA on the same dataset that the intercept has by far the largest variance). My first question related to this is: should I free and fix the intercept, slope, and quadratic factor in the same way for every model? I read somewhere on this forum that if you free and fix factors unequally, you might be "model trimming". My second question concerns which criteria to prioritize in choosing the best model: what if the entropy still increases and the BIC still decreases, but the Lo-Mendell-Rubin likelihood ratio test is nonsignificant? Is there any criterion I should prioritize in deciding on the best model?
It is useful to be more precise with the wording here. You say "slope and quadratic factor fixed to zero" - perhaps you mean that their variances are fixed to zero.
Go by BIC - to me, BIC is more important than entropy, that is, you first want to decide on the model and then you look at the entropy. You wouldn't decide on a structural equation model based on its R-square for instance.
rgm smeets posted on Tuesday, September 11, 2018 - 8:09 am
Indeed, I meant the variances of the intercept, slope, and quadratic factor. I still feel it is quite arbitrary which factor variances to fix or set free. I started with the following model in a GMM:
Step 1: Use our default of class-invariant variances.
Step 2: Plot the estimated and observed curves for each class. Check if a certain class has substantially more or less individual variation than other classes.
Step 3: Free the i variance in that class.
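The three steps above might be sketched as follows in Mplus syntax. This is an illustration only: the variable names y1-y6, the time scores, and the choice of class 2 as the class with extra variability are all placeholders:

```
MODEL:
  %OVERALL%
  i s q | y1@0 y2@1 y3@2 y4@3 y5@4 y6@5;   ! quadratic growth model
  %c#2%
  i;          ! free the intercept variance in class 2 only;
              ! the other classes keep the class-invariant default
```

Mentioning the variance of i in a class-specific section releases it from the default equality constraint across classes, so only the class flagged in Step 2 gets its own intercept variance.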
rgm smeets posted on Wednesday, September 12, 2018 - 1:53 pm
Thank you for giving me an overview of the steps. As a follow-up question: should I first run the model with class-invariant variances with an increasing number of classes and, if the BIC does not improve, then decide which intercept variance to free? Or is it also possible to free the intercept variance while still increasing the number of classes?