

GMM with non-normal distribution


HanJung Ko posted on Monday, February 16, 2015 - 10:11 pm



Drs. Muthen, my prior GMM model was based on the normal distribution assumption. Two classes fit better than one class, while the three-class model did not converge. Could that be a sample size constraint (n=159)? The recent development of estimating models based on non-normal distributions has led me to redo the analysis. The one-class skew-t BIC is much better than the 2-class normal BIC. However, there is a consistent error message: THE LATENT VARIABLE COVARIANCE MATRIX (PSI) IN CLASS 1 IS NOT POSITIVE DEFINITE. THIS COULD INDICATE A NEGATIVE VARIANCE/RESIDUAL VARIANCE FOR A LATENT VARIABLE, A CORRELATION GREATER OR EQUAL TO ONE BETWEEN TWO LATENT VARIABLES, OR A LINEAR DEPENDENCY AMONG MORE THAN TWO LATENT VARIABLES. CHECK THE TECH4 OUTPUT FOR MORE INFORMATION. PROBLEM INVOLVING VARIABLE S. I checked the output suggested in the message and found the residuals were negative. Does this suggest the model is overfitting the observations? Could you please advise how to solve this problem? Thank you!


With negative residual variances I would not use skew-t, particularly not given your small sample size. It sounds like you might settle for a 2-class normal model. Nonconvergence for 3 classes can be a sign of using a model with too many parameters.

HanJung Ko posted on Tuesday, February 17, 2015 - 3:20 pm



Dr. Muthen, my apologies. Based on your workshop notes, the residual variances for skew-t modeling would not be accurate, so we should refer to the TECH4 output instead. TECH4 gave residuals for covariances rather than residuals for variances, so what I meant was "residuals for covariances". I tried fixing the variances of the latent variables (intercept and slope) to zero, and the model estimation terminated normally. However, the residuals for covariances are still all negative. Does this matter? The BIC for the skew-t model is smaller than for the 2-class normal model, but only a few residuals for covariances in the 2-class normal model are negative, while the majority are positive. That's why I am concerned. Thank you.


We would have to see your full output from your two 2-class runs; send it to support along with your license number. Also say what your observed variable metric is (Likert scale, sum of items, etc.); you need truly continuous variables for skew-t.

HanJung Ko posted on Saturday, February 21, 2015 - 10:10 am



Drs. Muthen, thank you for reviewing my outputs for the one-class skew-t and two-class normal solutions. I did try to fit the skew-t model estimating only the intercept, not the slope. For the one-class solution (BIC = 1345.38), the residuals for covariances are still all negative. Here is the error message: ONE OR MORE PARAMETERS WERE FIXED TO AVOID SINGULARITY OF THE INFORMATION MATRIX. THE SINGULARITY IS MOST LIKELY BECAUSE THE MODEL IS NOT IDENTIFIED, OR BECAUSE OF EMPTY CELLS IN THE JOINT DISTRIBUTION OF THE CATEGORICAL VARIABLES IN THE MODEL. THE FOLLOWING PARAMETERS WERE FIXED: Parameter 3, %C#1%: I. THIS MAY ALSO BE DUE TO RESIDUAL VARIANCES CONVERGING TO 0. THESE RESIDUAL VARIANCES AND CORRESPONDING COVARIANCES ARE FIXED TO 0. I also tried a 2-class solution with the skew-t distribution. The model did not fit well. The 2-class normal model (estimating only the intercept latent variable) terminated normally with a BIC of 1381.47. My concern is which model fits the data better and gives the more robust result. Thank you!
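For readers following the BIC comparison in this post, here is a minimal Python sketch of how BIC is defined and how far apart the two quoted values are. Only the two BIC values and n = 159 come from the thread; the function name `bic` and the variable names are illustrative, and the log-likelihoods and parameter counts of the actual runs are not given here, so none are assumed.

```python
import math


def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Bayesian information criterion; smaller values are preferred."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)


# BIC values quoted in the post above.
bic_skewt_1class = 1345.38
bic_normal_2class = 1381.47

# How much larger the 2-class normal BIC is than the 1-class skew-t BIC.
bic_gap = bic_normal_2class - bic_skewt_1class

# BIC penalty added by each extra free parameter at n = 159 (about 5.07).
penalty_per_param = math.log(159)
```

The gap only ranks the models on fit versus parameter count. It says nothing about whether a solution is admissible, for example when a variance has been fixed at zero.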


Two key considerations against using the skew-t model are: (1) Your variables should not have upper limits, because limits don't really fit well in this family of distributions. (2) A zero variance estimate makes the modeling questionable. In such cases I recommend using normal mixtures instead, even if they have a worse BIC.

HanJung Ko posted on Tuesday, February 24, 2015 - 10:52 pm



Thank you for your feedback. My coauthors and I are still debating whether to choose the normal distribution result over the skew-t results, because the latter provide the most parsimonious explanation of the data. 1) We agree with your first consideration, but by this logic, shouldn't we avoid assuming normality as well? For practical purposes, wouldn't a 5-point Likert-type scale be sufficiently "continuous", as applied by many other researchers? I also reviewed the Mplus web note. It seems that with the upper limits in our data, the distributions are conditional distributions (in Advances in Mplus Version 7.2, page 148)? 2) I certainly agree that the zero variances lead to unintuitive results, but based on Bengt's talk at M3, isn't that a relatively minor price to pay for a parsimonious model? (Especially because the skew-t results replicate exactly how we would have to interpret the two-class results that assume a normal distribution.) 3) Is there a way to test whether the normal 2-class model needs to adjust for skewness? Based on TECH12, the observed skewness ranges from .914 to .0443 while the estimated mixed skewness ranges from 0.514 to 0.317. Thank you. I really appreciate your input.


1) The skew-t distribution needs quite a lot of information from the data and so needs a more continuous outcome than the normal. I agree that the normal works fine for Likert (if no strong floor/ceiling effects). 2)-3) Have you tried skew-normal?

HanJung Ko posted on Wednesday, February 25, 2015 - 2:38 pm



Thank you, Dr. Bengt. I did try skew-normal earlier, but its BIC was not as good as that of the skew-t model. I have now rerun the skew-normal model. To get it to work, I have to fix the intercept variance as well as the covariance between intercept and slope (the same as in the skew-t model). Would this help determine whether normal or skew models fit my data better? To clarify, my main concerns are 1) whether the 2 classes are spurious because the underlying population distribution is not normal; 2) whether the skew-normal and/or skew-t models are overfitting the data. Can TECH12 in the 2-class model output or any other indices help with this determination? Thank you!


What's the percentage at the lowest and highest value of your observed variable (which has range 1-5, I think)?

HanJung Ko posted on Wednesday, February 25, 2015 - 4:20 pm



It's 0-5 for five time points (n=159).


And what's the percentage of observations at 0 and at 5 at those 5 time points? 
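The quantity being asked for here is the share of observations sitting exactly at the scale limits. A minimal Python sketch of that calculation follows, assuming a 0-5 score range and missing values coded as `None`. The function name `floor_ceiling_pct` and the example scores are hypothetical and are not data from this thread.

```python
def floor_ceiling_pct(scores, low=0.0, high=5.0):
    """Return (% of observed scores equal to low, % equal to high).

    Missing values (None) are dropped before computing percentages.
    """
    observed = [s for s in scores if s is not None]
    if not observed:
        raise ValueError("no observed scores")
    n = len(observed)
    at_low = sum(1 for s in observed if s == low)
    at_high = sum(1 for s in observed if s == high)
    return 100.0 * at_low / n, 100.0 * at_high / n


# Hypothetical scores for one time point (not data from the thread).
example = [5.0, 3.86, 4.14, 0.0, 5.0, None, 2.71, 4.0]
low_pct, high_pct = floor_ceiling_pct(example)
```

Running this once per time point gives one pair of percentages per wave, which is the kind of summary the question is after.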

HanJung Ko posted on Wednesday, February 25, 2015 - 6:44 pm



For each time point, there is a sum score (PIL) from seven items for each individual. The means and standard deviations for year 1 to year 5 are: 3.94 (.79), 3.87 (.82), 3.90 (.84), 3.84 (.85), and 3.92 (.80). The corresponding observed skewness for each time point is: .914, .443, .010, .845, and .502 although the estimated mixed skewness (in the 2class normal model) becomes smaller. Hope this helps. Let me know if you need the exact percentages. 
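As a rough illustration of what an observed skewness value summarizes, here is a minimal Python sketch of the moment-based sample skewness. Whether TECH12 uses exactly this formula (for example, with or without a small-sample correction) is an assumption, not something stated in this thread. The function name `sample_skewness` is illustrative.

```python
def sample_skewness(values):
    """Moment-based skewness g1 = m3 / m2**1.5, with no small-sample correction."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    if m2 == 0:
        raise ValueError("skewness is undefined for constant data")
    return m3 / m2 ** 1.5


# A symmetric sample has zero skewness; a sample piled up near the top of
# the scale with a tail toward low values has negative skewness.
symmetric = sample_skewness([1, 2, 3, 4, 5])
left_tailed = sample_skewness([1, 5, 5, 5, 5])
```

The sign gives the direction of the longer tail, so it is worth reporting along with the magnitude.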


I am not getting the information I asked for. Also, you say the range of your outcome is 0-5 and then you say it is the sum of 7 items; how does that come about?

HanJung Ko posted on Thursday, February 26, 2015 - 10:26 am



Dr. Bengt, thank you for clarifying. Sorry, I meant there are 7 items, each of which is on a Likert scale (0-5). We took the mean score (sorry, not the sum) of the seven items, so each individual's PIL score still ranges between 0 and 5 at each time point. For the first time point (PIL_1), 1.23% score below 2, 11.04% below 3, 39.88% below 4, and 93.25% below 5, and 6.75% have a score of 5. For PIL_2, 1.27% score below 2, 13.29% below 3, 51.27% below 4, and 91.77% below 5, and 8.23% have a score of 5. For PIL_3, 0.65% have a score of zero, another 0.65% score above 1 and below 2, 11.69% below 3, 48.70% below 4, and 90.26% below 5, and 9.74% have a score of 5. For PIL_4, 3.95% score below 2, 14.47% below 3, 44.74% below 4, and 96.05% below 5, and 3.95% have a score of 5. For PIL_5, 0.68% score below 2, 14.97% below 3, 46.94% below 4, and 93.20% below 5, and 6.80% have a score of 5. Hope this clarifies. Thank you!
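The figures in this post are cumulative ("below k") percentages. A short Python sketch, using the PIL_1 numbers quoted above, converts them into per-interval percentages so the mass at the ceiling can be compared with the rest of the distribution. The function name `bin_percentages` is illustrative.

```python
def bin_percentages(cumulative_below, at_top):
    """Convert 'percent below k' values into per-interval percentages.

    cumulative_below: increasing percentages below successive cut points.
    at_top: percentage exactly at the top score.
    """
    bins = []
    previous = 0.0
    for pct in cumulative_below:
        bins.append(round(pct - previous, 2))
        previous = pct
    bins.append(at_top)
    return bins


# PIL_1 from the post: % below 2, below 3, below 4, below 5, then % at 5.
# Resulting intervals: [0, 2), [2, 3), [3, 4), [4, 5), and exactly 5.
pil_1 = bin_percentages([1.23, 11.04, 39.88, 93.25], at_top=6.75)
```

For PIL_1 this gives 1.23, 9.81, 28.84, 53.37, and 6.75 percent, so just over half of the sample falls between 4 and 5 and about 7% sits exactly at the upper limit.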


OK, so you don't have substantial floor or ceiling effects, which is good. You should plot the estimated distribution of each variable using your skew-t solution and see how well it matches the observed variable distribution.


