Decide linear or quadratic for GMM model
Mplus Discussion > Latent Variable Mixture Modeling
 Qi posted on Tuesday, June 02, 2009 - 2:46 pm
Dr. Muthen,

In a GMM analysis, I am trying to decide whether to use a linear or a quadratic growth model, and I am wondering if there are guidelines for how to approach this. Should I 1) fit a regular growth model (with 1 class) first, decide whether to include the quadratic term (I assume using the chi-square test for nested models), and then fit the GMMs to decide the number of classes; or 2) decide the number of classes in the GMM first, and then decide whether to include quadratic terms? It's hard to know, though, whether each latent class would need the quadratic term. Your guidance would be greatly appreciated!

Qi
 Linda K. Muthen posted on Thursday, June 04, 2009 - 9:47 am
You should include a quadratic growth model in the overall part of the MODEL command. You can then look at the means of the quadratic growth factor in each class and see if it is different from zero.
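A minimal sketch of this advice, assuming a continuous outcome measured at five equally spaced occasions (the names y1-y5 and the two-class setup are hypothetical, and the DATA command is omitted). Because growth factor means vary across classes by default in Mplus mixture models, the quadratic mean [q] is estimated separately in each class, and its Est./S.E. value in each class's section of the output indicates whether that class needs the quadratic term.

```
VARIABLE:
  NAMES = y1-y5;        ! hypothetical outcome names
  CLASSES = c(2);
ANALYSIS:
  TYPE = MIXTURE;
MODEL:
  %OVERALL%
  ! quadratic growth model shared by all classes;
  ! the means of i, s, and q are class-specific by default
  i s q | y1@0 y2@1 y3@2 y4@3 y5@4;
```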
 Dustin Pardini posted on Friday, June 14, 2013 - 1:44 pm
Related to the above discussion, we have estimated an unconditional growth curve model that fits better with a quadratic than with a linear form. As a next step we estimated a 5-class latent class growth model. In this model, the three most substantively interesting classes have non-significant quadratic means. Additionally, the slope means for these 3 classes are also non-significant, although from eyeballing their graphs it would appear there is a significant slope. I'm wondering whether incorrectly including a quadratic term for these 3 classes could be affecting the significance of the slope mean estimate. The two other classes, which have a higher N (e.g., a stable low class), do, however, show a significant quadratic mean.

In this case, do you suggest keeping the same quadratic or linear model for all classes? Or is it possible/correct to specify different models (linear vs. quadratic) for different classes? If so, is there a reference you are aware of for how to go about setting this up? Thanks so much
 Bengt O. Muthen posted on Sunday, June 16, 2013 - 4:09 pm
Without substantive theory to the contrary, it seems like a reasonable approach to use the same functional form for all classes and then report where some parameters are insignificant.

You get a separate form simply by specifying a different growth model within the class in question.
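If substantive theory does call for a class-specific form, a sketch for a latent class growth analysis like the one described above might look as follows. It assumes hypothetical outcomes y1-y5 and growth factor variances fixed at zero, and makes only class 3 linear by fixing its quadratic mean at zero; the other classes keep the quadratic form from %OVERALL%.

```
MODEL:
  %OVERALL%
  i s q | y1@0 y2@1 y3@2 y4@3 y5@4;
  i-q@0;         ! LCGA: growth factor variances fixed at zero
  %c#3%
  [q@0];         ! class 3 only: quadratic mean fixed at zero, so a linear trajectory
```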
 Han-Jung Ko posted on Wednesday, April 02, 2014 - 11:56 am
Dr. Muthen,
I am working on a growth mixture model with a continuous outcome measured at 5 waves. Before proceeding to the class analysis, I wonder how I could determine whether the growth should be modeled as linear or quadratic.

1. The estimated means of slopes in the linear model were not significant, nor were the slopes and quadratic terms in the quadratic model.

2. Would it be legitimate to decide between the linear and quadratic models using chi-square tests? If so, where should I look for the values in the output: the Chi-Square Test of Model Fit or the Chi-Square Test of Model Fit for the Baseline Model?

What does it mean if the Chi-Square Test of Model Fit is not significant?

I read through Examples 6.1 and 6.9, but I am still not sure how to compare the linear and quadratic models.

Thank you,
Han-Jung Ko
 Bengt O. Muthen posted on Wednesday, April 02, 2014 - 5:09 pm
I think it is sufficient to look at the z-scores for the linear and quadratic growth factor means. So it sounds like you have an intercept-only growth model.
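The z-score here is the Est./S.E. column of the Mplus output. Written out for the slope factor mean (the quadratic factor mean is tested the same way), assuming the usual large-sample normal reference distribution:

```latex
z = \frac{\hat{\mu}_s}{\widehat{\mathrm{SE}}(\hat{\mu}_s)}, \qquad
|z| > 1.96 \;\Rightarrow\; \mu_s \neq 0 \text{ at the two-sided } .05 \text{ level}
```

If neither the slope mean nor the quadratic mean passes this test, the estimated mean trajectory does not change significantly over time.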
 Han-Jung Ko posted on Monday, April 07, 2014 - 11:58 am
Thank you, Dr. Muthen. I figured that keeping the linear model is more reasonable.
Another problem I encountered was when I tried to apply the 3-step approach to estimate the model. I was able to estimate the classes in step 1 (2 classes identified) but failed in step 2, where I have a few categorical and continuous variables predicting the probabilities of being in each class.

Analysis:
Type = Mixture;
Starts = 500 80;
LRTSTARTS = 1000 1000 1000 1000;
Stiterations = 50;
processor = 4;
Model:
%OVERALL%
!i s | pl_1@0 pl_2@1 pl_3@2 pl_4@3 pl_5@4;
!i WITH s;
c on sex AA total_t1lgs total_t1neo_c total_t1neo_n total_t1neo_e;
%c#1%
pl_4;
[n#1@1.955];
%c#2%
pl_4;
[n#1@-3.150];

My reference:
http://www.statmodel.com/download/AppendicesOct28.pdf

But I think the difference is that I did not have auxiliary variables in my model, only predictors of class membership in step 3.

Thank you! I really appreciate your help!

Han-Jung Ko
 Bengt O. Muthen posted on Monday, April 07, 2014 - 4:47 pm
I don't know what the problem is that you describe.

Why do you say "pl_4" in the class-specific part of the model?
 Han-Jung Ko posted on Monday, April 07, 2014 - 8:59 pm
Sorry, I did not describe my question well. Based on Asparouhov & Muthen (2013), my step 1 model is:
Usevariables are pl_1 pl_2 pl_3 pl_4 pl_5;
Missing are all(-9999) ;
Classes=c(2);
SAVEDATA:
FILE = gmm2step.dat;
save = cprobabilities;
Analysis:
Type = Mixture;
Model:
%OVERALL%
i s | pl_1@0 pl_2@1 pl_3@2 pl_4@3 pl_5@4;
i WITH s;

The problem was that when I proceeded to estimate whether gender, race, personality traits, etc. predict class membership, Mplus could not read the data saved from step 1.

Model:
%OVERALL%
c on sex AA total_t1lgs total_t1neo_c total_t1neo_n total_t1neo_e;
%c#1%
[n#1@1.955];
%c#2%
[n#1@-3.150];

I guess this is because the data saved from step 1 contain only the estimated PIL1-5, intercept mean and Std., slope mean and Std., p(1), p(2), and n. There are no data on gender, race, personality traits, etc.

I did try adding:
auxiliary= AA sex total_t1lgs total_t1neo_c total_t1neo_n total_t1neo_e;
But the model still did not work. Thank you in advance for your help.
Han-Jung Ko
 Bengt O. Muthen posted on Tuesday, April 08, 2014 - 8:31 am
Send the output from your problematic run and license number to Support.
 Yue Liao posted on Thursday, June 26, 2014 - 12:36 am
Hi Dr. Muthen,

I have a 3-class GMM using a zero-inflated Poisson model. I want to have one class as linear, and the other two as quadratic. How do I specify that?

Here is my model:

%overall%
i s q | smk1@0 smk2@.1 smk3@.2 smk4@.3 smk5@.4;
ii si qi | smk1#1@0 smk2#1@.1 smk3#1@.2 smk4#1@.3 smk5#1@.4;
s-qi@0;


How should I specify class 1 so it will be linear?
 Linda K. Muthen posted on Thursday, June 26, 2014 - 2:22 pm
One comment. You do not have a GMM if the variances of the growth factors are fixed at zero.

To have different growth models in different classes, specify the most complex model in %OVERALL%. In the other classes, fix the means, variances, and covariances of the growth factors you don't want to zero.
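A sketch of that recipe for a GMM with free growth factor variances, assuming continuous outcomes y1-y5 (hypothetical names) and class 1 as the linear class. The quadratic model is stated once in %OVERALL%, and class 1 removes q by fixing its mean, variance, and covariances at zero. Assuming the usual Mplus defaults, statements in a class-specific part apply only to that class, so the other classes keep the quadratic terms from %OVERALL%.

```
MODEL:
  %OVERALL%
  i s q | y1@0 y2@1 y3@2 y4@3 y5@4;   ! most complex form: quadratic
  %c#1%
  [q@0];         ! quadratic mean fixed at zero
  q@0;           ! quadratic variance fixed at zero
  q WITH i@0;    ! covariance of q with the intercept fixed at zero
  q WITH s@0;    ! covariance of q with the slope fixed at zero
```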
 Yue Liao posted on Thursday, June 26, 2014 - 3:45 pm
Hi Linda,

I followed Example 8.5 to set up my model. I thought it was a GMM?

If I don't want class 1 to have the quadratic term for the growth model, do I just put:

%c#1%
q@0;

or

%c#1%
i s;


Also, if I want to have a cubic function, is the following code correct?

%overall%
i s q c | smk1@0 smk2@.1 smk3@.2 smk4@.3 smk5@.4;
ii si qi ci | smk1#1@0 smk2#1@.1 smk3#1@.2 smk4#1@.3 smk5#1@.4;

I want to start with a simple model (i.e., a fixed effect model), so should I put

s-ci@0;


Thanks a lot!
 Linda K. Muthen posted on Thursday, June 26, 2014 - 3:58 pm
I see. One growth factor is free. The example does this because each growth factor with a free variance requires one dimension of integration.

The variances and covariances are already zero. To set the mean to zero, say

%c#1%
[q@0];

Yes, that is correct for a cubic model.
 Yue Liao posted on Friday, June 27, 2014 - 1:28 pm
Thanks Linda!

Also, our measurement points are one year apart, so I assume smk1@0 smk2@.1 smk3@.2 smk4@.3 smk5@.4 would be the same as smk1@0 smk2@1 smk3@2 smk4@3 smk5@4. Correct?

Another question.

ANALYSIS: TYPE=MIXTURE;
STARTS = 5000 50;
STITERATIONS = 100;
ALGORITHM = INTEGRATION;
ESTIMATOR = MLF;

For the above spec, the model still says "THE SOLUTION MAY NOT BE TRUSTWORTHY DUE TO LOCAL MAXIMA. INCREASE THE NUMBER OF RANDOM STARTS." Should I then change the algorithm to MONTECARLO (suggested by the manual since random starts already reach 5000)?

And will increasing STITERATIONS and/or integration number make a difference at this point?
 Bengt O. Muthen posted on Saturday, June 28, 2014 - 6:35 pm
Answers to your 3 questions:

Q1. Yes.

Q2. Not necessarily. Increase the number of random starts using STARTS= if the best loglikelihood value is not replicated several times.

Q3. Only if your problem requires numerical integration and TECH8 shows negative "ABS" changes.
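On Q1: the two codings differ only by a factor of 10 in the time scores, so they describe the same model with rescaled growth factors. Writing t = 0, 1, 2, 3, 4 for the unit coding and t* = t/10 for the .1 coding:

```latex
i + s\,t + q\,t^{2} \;=\; i + (10\,s)\,t^{*} + (100\,q)\,(t^{*})^{2}, \qquad t^{*} = t/10
```

The fit is unchanged, but the slope and quadratic means from the .1 coding are 10 and 100 times those from the unit coding, because a one-unit change in t* spans 10 years.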
 Yue Liao posted on Monday, June 30, 2014 - 1:22 pm
So to confirm: if the problem is only that "the best loglikelihood value is not replicated", I can just keep increasing the STARTS values, for example from 5,000 to 10,000 or more.

Meanwhile, I don't need to increase STITERATIONS unless I see negative "ABS" changes.

What about the second STARTS value (the number of optimizations in the final stage)? At what point will I need to increase it?

Thanks a lot!!
 Bengt O. Muthen posted on Monday, June 30, 2014 - 1:51 pm
I recommend using

Starts = y x;

where x is about 1/4 of y. So you want to increase both in tandem.

I always use the default STITERATIONS, and it has nothing to do with negative ABS changes. Negative ABS changes reflect a lack of numerical precision in numerical integration, in which case you want to increase the number of integration points.
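A hedged sketch of an ANALYSIS command following these recommendations, built on the specification posted earlier in the thread; the particular numbers are illustrative, not values given here. STITERATIONS is left at its default, the second STARTS value is about 1/4 of the first, and INTEGRATION raises the number of integration points, which should matter only if TECH8 shows negative ABS changes.

```
ANALYSIS:
  TYPE = MIXTURE;
  ALGORITHM = INTEGRATION;
  ESTIMATOR = MLF;
  STARTS = 10000 2500;   ! final-stage optimizations about 1/4 of initial-stage starts
  INTEGRATION = 30;      ! illustrative: more integration points per dimension
OUTPUT:
  TECH8;                 ! optimization history, including ABS changes
```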