GMM with non-normal distribution
 Han-Jung Ko posted on Monday, February 16, 2015 - 10:11 pm
Drs. Muthen,
My prior GMM model was based on a normal distribution assumption. Two classes fit better than one class, while the three-class model did not converge. Could that be a sample size constraint (n=159)?
The recent development of estimating the model based on non-normal distribution has led me to re-do the analysis. The one-class skewed t distribution BIC is much better than the 2-class normal distribution BIC. However, there is a consistent error message.
THE LATENT VARIABLE COVARIANCE MATRIX (PSI) IN CLASS 1 IS NOT POSITIVE DEFINITE. THIS COULD INDICATE A NEGATIVE VARIANCE/RESIDUAL VARIANCE FOR A LATENT VARIABLE, A CORRELATION GREATER OR EQUAL TO ONE BETWEEN TWO LATENT VARIABLES, OR A LINEAR DEPENDENCY AMONG MORE THAN TWO LATENT VARIABLES. CHECK THE TECH4 OUTPUT FOR MORE INFORMATION. PROBLEM INVOLVING VARIABLE S.
I checked the result suggested in the message and found the residuals were negative. Does it suggest this model is over-fitting the observations? Could you please advise how to proceed to solve this problem? Thank you!
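The three causes named in the warning can be checked by hand for a two-factor growth model. The following is a minimal sketch, assuming a 2x2 latent covariance matrix for the intercept (I) and slope (S); the function name `diagnose_psi` and the numeric estimates are hypothetical and are not taken from any output in this thread.

```python
import math

def diagnose_psi(var_i, var_s, cov_is):
    """List reasons why a 2x2 latent covariance matrix is not positive definite."""
    problems = []
    if var_i <= 0:
        problems.append("non-positive variance for I")
    if var_s <= 0:
        problems.append("non-positive variance for S")
    # A correlation is only defined when both variances are positive.
    if var_i > 0 and var_s > 0:
        corr = cov_is / math.sqrt(var_i * var_s)
        if abs(corr) >= 1:
            problems.append(f"|correlation| >= 1 ({corr:.3f})")
    return problems

# Hypothetical estimates: a slightly negative slope variance.
print(diagnose_psi(0.40, -0.02, 0.01))
# Hypothetical estimates: an admissible matrix, so the list is empty.
print(diagnose_psi(0.40, 0.05, 0.03))
```

With only two latent variables, a linear dependency shows up as a correlation of exactly plus or minus one, so the third cause in the warning is covered by the correlation check.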
 Bengt O. Muthen posted on Tuesday, February 17, 2015 - 8:51 am
With negative residual variances I would not use skew-t, particularly not given your small sample size.

Sounds like you might settle for a 2-class normal. Non-convergence for 3 classes can be a sign of using a model with too many parameters.
 Han-Jung Ko posted on Tuesday, February 17, 2015 - 3:20 pm
Dr. Muthen,
My apologies. Based on your workshop notes, the residual variances reported for skew-t modeling are not accurate and we should refer to the TECH4 output instead. TECH4 gives residuals for covariances rather than residuals for variances, so what I meant was "residuals for covariances". I tried fixing the variances of the latent variables (intercept and slope) to zero, and the model estimation terminated normally. However, the residuals for covariances are still all negative. Does this matter? The BIC for the skew-t model is smaller than for the 2-class normal model, but only a few residuals for covariances in the 2-class normal model are negative while the majority of them are positive.
That's why I am concerned. Thank you.
 Bengt O. Muthen posted on Wednesday, February 18, 2015 - 8:02 am
We would have to see your full output from your two 2-class runs - send to support along with your license number. Also say what your observed variable metric is (Likert scale, sum of items, etc) - you need truly continuous variables for skew-t.
 Han-Jung Ko posted on Saturday, February 21, 2015 - 10:10 am
Drs. Muthen, thank you for reviewing my outputs for the one-class skew-t and two-class normal distribution solutions. I did try to fit the skew-t model estimating only the intercept and not the slope. For the one-class solution (BIC = 1345.38), the residuals for covariances are still all negative. Here is the error message:
ONE OR MORE PARAMETERS WERE FIXED TO AVOID SINGULARITY OF THE INFORMATION MATRIX. THE SINGULARITY IS MOST LIKELY BECAUSE THE MODEL IS NOT IDENTIFIED, OR BECAUSE OF EMPTY CELLS IN THE JOINT DISTRIBUTION OF THE CATEGORICAL VARIABLES IN THE MODEL. THE FOLLOWING PARAMETERS WERE FIXED:
Parameter 3, %C#1%: I
THIS MAY ALSO BE DUE TO RESIDUAL VARIANCES CONVERGING TO 0. THESE RESIDUAL VARIANCES AND CORRESPONDING COVARIANCES ARE FIXED TO 0.
I also tried a 2-class solution with the skew-t distribution. The model did not fit well. The 2-class model with the normal distribution terminated normally (estimating only the intercept latent variable) with a BIC of 1381.47. My concern is which model fits the data better and gives a more robust result. Thank you!
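For reference, the BIC comparison in this post can be reproduced with a few lines. This is a minimal sketch: the two BIC values and n = 159 come from the post, while the helper `bic` and the log-likelihood and parameter count in the last line are hypothetical, since the thread does not report them.

```python
import math

def bic(loglik, n_params, n_obs):
    """BIC = -2 * log-likelihood + (number of free parameters) * ln(sample size)."""
    return -2.0 * loglik + n_params * math.log(n_obs)

n_obs = 159                  # sample size reported in the thread
bic_skewt_1class = 1345.38   # one-class skew-t, as reported
bic_normal_2class = 1381.47  # two-class normal, as reported

# Lower BIC is preferred; the difference favors the one-class skew-t model.
diff = bic_normal_2class - bic_skewt_1class
print(round(diff, 2))  # 36.09

# Each additional free parameter costs ln(159) BIC points.
print(round(math.log(n_obs), 3))

# Hypothetical log-likelihood and parameter count, only to show the formula in use.
print(round(bic(-650.0, 9, n_obs), 2))
```

A BIC difference of this size only ranks the models on fit and parsimony; it says nothing about whether the estimates of the preferred model are admissible.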
 Bengt O. Muthen posted on Saturday, February 21, 2015 - 2:44 pm
Two key reasons not to use the skew-t model are:

- Your variables should not have upper limits because this doesn't really fit well in this family of distributions.

- A zero variance estimate makes the modeling questionable.

In such cases I recommend using normal mixtures instead even if they have worse BIC.
 Han-Jung Ko posted on Tuesday, February 24, 2015 - 10:52 pm
Thank you for your feedback. My co-authors and I are still debating whether to choose the normal distribution result over the skew-t results because the latter provides the most parsimonious explanation of the data.
1) We agree with your first consideration, but by this logic shouldn't we avoid assuming normality as well? For practical purposes, wouldn't a 5-point Likert-type scale be sufficiently "continuous", as assumed by many other researchers? I also reviewed the Mplus webnote. It seems that with the upper limits in our data, the distributions are conditional distributions (in Advances in Mplus Version 7.2, page 148)?
2) I certainly agree that the zero variances lead to unintuitive results, but based on Bengt's talk at M3 is it a relatively minor price to pay for a parsimonious model? (Especially because the skew results replicate exactly how we would have to interpret the two-class results that assume a normal distribution.)
3) Is there a way to test whether the normal distribution 2-class model needs to adjust for its skewness? Based on TECH12, the observed Skewness ranges from -.914 to -.0443 while the estimated mixed skewness ranges from -0.514 to -0.317.
Thank you. I really appreciate your input.
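For readers who want to check an observed skewness value by hand, here is a minimal sketch using the moment-based coefficient (third central moment divided by the 1.5 power of the second). Whether TECH12 uses exactly this estimator is an assumption, and the data below are hypothetical, not the PIL scores from this thread.

```python
def skewness(xs):
    """Moment-based sample skewness: m3 / m2**1.5."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in xs) / n  # third central moment
    return m3 / m2 ** 1.5

# Hypothetical scores piled up near the top of the scale give negative skewness.
left_skewed = [1, 4, 4, 5, 5, 5]
print(skewness(left_skewed) < 0)  # True

# A symmetric set of scores has skewness of (numerically) zero.
symmetric = [1, 2, 3, 4, 5]
print(abs(skewness(symmetric)) < 1e-12)  # True
```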
 Bengt O. Muthen posted on Wednesday, February 25, 2015 - 1:24 pm
1. The skew-t distribution needs quite a lot of information from the data and so needs a more continuous outcome than the normal. I agree that the normal works fine for Likert (if no strong floor/ceiling effects).

2) - 3) Have you tried skew-normal?
 Han-Jung Ko posted on Wednesday, February 25, 2015 - 2:38 pm
Thank you, Dr. Bengt.
I did try skew-normal earlier but its BIC was not as good as that of the skew-t model. I re-ran the skew-normal model. To get it to work, I have to fix the intercept variance as well as the covariance between intercept and slope (the same as in the skew-t model). Would this help determine whether the normal or the skew models fit my data better?
To clarify, my main concern is 1) whether the 2 classes are spurious because the underlying population distribution is not normal; 2) whether the skew-normal and/or skew-t models are overfitting the data.
Can TECH12 in the 2-class model output or any other indices help the determination?
Thank you!
 Bengt O. Muthen posted on Wednesday, February 25, 2015 - 4:08 pm
What's the percentage at the lowest and highest value of your observed variable (which has range 1-5 I think)?
 Han-Jung Ko posted on Wednesday, February 25, 2015 - 4:20 pm
It's 0-5 for five time points (n=159).
 Bengt O. Muthen posted on Wednesday, February 25, 2015 - 6:18 pm
And what's the percentage of observations at 0 and at 5 at those 5 time points?
 Han-Jung Ko posted on Wednesday, February 25, 2015 - 6:44 pm
For each time point, there is a sum score (PIL) from seven items for each individual. The means and standard deviations for year 1 to year 5 are: 3.94 (.79), 3.87 (.82), 3.90 (.84), 3.84 (.85), and 3.92 (.80). The corresponding observed skewness for each time point is: -.914, -.443, -.010, -.845, and -.502 although the estimated mixed skewness (in the 2-class normal model) becomes smaller.
Hope this helps. Let me know if you need the exact percentages.
 Bengt O. Muthen posted on Thursday, February 26, 2015 - 7:34 am
I am not getting the information I asked for. Also, you say the range of your outcome is 0-5 and then you say it is the sum of 7 items - how does that come about?
 Han-Jung Ko posted on Thursday, February 26, 2015 - 10:26 am
Dr. Bengt, thank you for clarifying. Sorry, I meant there are 7 items, each of which is on a Likert scale (0-5). We took the mean score (sorry, not the sum) of the seven items, so each individual's PIL score still ranges from 0 to 5 at each time point.
For the first time point (PIL_1), there are 1.23% below score of 2, 11.04% below 3, 39.88% below 4, and 93.25% below 5, and there is 6.75% with the score of 5.
For PIL_2, there are 1.27% below 2, 13.29% below 3, 51.27% below 4, 91.77% below 5, and 8.23% with the score of 5.
For PIL_3, there are 0.65% with the score of zero, another 0.65% above 1 and below 2, 11.69% below 3, 48.70% below 4, 90.26% below 5, and 9.74% with the score of 5.
For PIL_4, there are 3.95% below 2, 14.47% below 3, 44.74% below 4, 96.05% below 5, and 3.95% with the score of 5.
For PIL_5, 0.68% below 2, 14.97% below 3, 46.94% below 4, 93.20% below 5, and 6.80% with the score of 5.
Hope this clarifies. Thank you!
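The cumulative percentages in this post can be converted into per-interval percentages to see how much mass sits at the ceiling. This is a minimal sketch using the figures quoted above; the helper `to_bins` is hypothetical, and PIL_3 is left out of the cumulative table because its lower categories were reported in a different layout.

```python
# Percent of observations below 2, 3, 4 and 5, as quoted in the post.
cumulative_below = {
    "PIL_1": [1.23, 11.04, 39.88, 93.25],
    "PIL_2": [1.27, 13.29, 51.27, 91.77],
    "PIL_4": [3.95, 14.47, 44.74, 96.05],
    "PIL_5": [0.68, 14.97, 46.94, 93.20],
}
# Percent of observations exactly at the maximum score of 5.
at_ceiling = {"PIL_1": 6.75, "PIL_2": 8.23, "PIL_3": 9.74, "PIL_4": 3.95, "PIL_5": 6.80}

def to_bins(cum):
    """Convert cumulative percentages into per-interval percentages."""
    bins = [cum[0]]
    for lower, upper in zip(cum, cum[1:]):
        bins.append(round(upper - lower, 2))
    return bins

# Intervals: below 2, [2, 3), [3, 4), [4, 5).
print(to_bins(cumulative_below["PIL_1"]))  # [1.23, 9.81, 28.84, 53.37]

# Largest share of observations at the ceiling across the five time points.
print(max(at_ceiling.values()))  # 9.74
```

On these figures, between about 4% and 10% of observations sit at the maximum score at each time point, and almost none sit at the minimum.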
 Bengt O. Muthen posted on Friday, February 27, 2015 - 10:15 am
Ok, so you have negligible floor and ceiling effects, which is good.

You should plot the estimated distribution of each variable using your skew-t solution and see how well it matches the observed variable distribution.
 Hiva Y posted on Sunday, January 07, 2018 - 9:03 am
Hello
I am trying to fit a mixture SEM with a skew-normal distribution. Can I ask what constraints I should use to get an identified model?
 Bengt O. Muthen posted on Sunday, January 07, 2018 - 2:57 pm