Fit data to Zero-inflated Poisson
 Ruixue Wang posted on Monday, July 25, 2011 - 8:38 am
Can I fit data to a zero-inflated Poisson (ZIP) or logistic model with two or more classes, where each class is a ZIP model, and get BIC and likelihood ratio test results?
I found code like this:
VARIABLE: NAMES ARE u1-u4;
COUNT = u1-u4 (i);
CLASSES = c (2);
ANALYSIS: TYPE = MIXTURE;
MODEL:
%OVERALL%
i s | u1@0 u2@1 u3@2 u4@3;
ii si | u1#1@0 u2#1@1 u3#1@2 u4#1@3;
OUTPUT: TECH1 TECH8;
But it's a linear model.
 Bengt O. Muthen posted on Monday, July 25, 2011 - 9:18 am
That should work.
 Ruixue Wang posted on Monday, July 25, 2011 - 9:35 am
But if I want to fit the data to a ZIP model rather than a linear model, what should I do?
 Bengt O. Muthen posted on Monday, July 25, 2011 - 9:58 am
The development for the counts is not linear. The DV for counts is the log rate, where rate is the mean of the count variable. Even if the log rate is given a linear growth model, as you have specified, this still (1) takes into account the count nature of the DV and is therefore a generalized linear model, not a linear model, and (2) is not linear in the counts (but linear in the log rates).
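To illustrate the point about log rates, here is a minimal Python sketch (not Mplus code). The intercept 0.5 and slope 0.4 are hypothetical values chosen only for illustration; the time scores match u1@0 ... u4@3 in the input above. The log rate rises by a constant step, while the expected count rises by ever larger steps.

```python
import math

def expected_counts(intercept, slope, time_scores):
    """Expected Poisson counts when the log rate follows a linear growth model."""
    return [math.exp(intercept + slope * t) for t in time_scores]

time_scores = [0, 1, 2, 3]   # same time scores as u1@0 u2@1 u3@2 u4@3
intercept, slope = 0.5, 0.4  # hypothetical growth factor means

log_rates = [intercept + slope * t for t in time_scores]
counts = expected_counts(intercept, slope, time_scores)

# Differences between consecutive occasions
log_rate_steps = [b - a for a, b in zip(log_rates, log_rates[1:])]
count_steps = [b - a for a, b in zip(counts, counts[1:])]

print(log_rate_steps)  # constant steps: linear in the log rate
print(count_steps)     # growing steps: not linear in the counts
```

The same growth factors therefore describe a straight line on the log rate scale and an exponential curve on the count scale.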
 Ruixue Wang posted on Monday, July 25, 2011 - 10:06 am
So the DV for the counts is the log rate.
1. Is there any code to fit data directly to a ZIP model with multiple classes?
2. There are two parameters in the ZIP model. Which one is estimated here?
3. How can I fit data to a logistic curve?
Thank you.
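For reference on question 2, the standard ZIP distribution has two parameters: a Poisson rate and an inflation probability (the chance of a structural zero). As far as I understand the Mplus convention, u1 in the input above refers to the count part (log rate) and u1#1 to the inflation part (on a logit scale). The Python sketch below shows the standard ZIP probabilities; the function name and the values 2.0 and 0.3 are assumptions made for illustration.

```python
import math

def zip_pmf(k, rate, p_inflate):
    """P(count = k) under a zero-inflated Poisson distribution.

    rate      : Poisson mean for the non-inflated part
    p_inflate : probability of a structural zero
    """
    poisson = math.exp(-rate) * rate ** k / math.factorial(k)
    if k == 0:
        # A zero can come from the inflation part or from the Poisson part.
        return p_inflate + (1.0 - p_inflate) * poisson
    return (1.0 - p_inflate) * poisson

rate, p_inflate = 2.0, 0.3  # hypothetical values
probs = [zip_pmf(k, rate, p_inflate) for k in range(61)]
mean = sum(k * p for k, p in enumerate(probs))

print(probs[0])  # probability of observing a zero
print(mean)      # close to (1 - p_inflate) * rate
```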
 Ruixue Wang posted on Wednesday, August 10, 2011 - 7:48 am
1. There is a warning when I run GMM with ZIP: COUNT VARIABLE HAS LARGE VALUES.
The suggestion I found is: You may want to do a log transformation.
But if the data are random ZIP and the DV in GMM with ZIP is already the log rate, is it right to log the counts again?
2. How can I fit data to GMM with a logistic model, such as the logit model in proc traj?
Thank you.
 Bengt O. Muthen posted on Wednesday, August 10, 2011 - 12:31 pm
1. With high counts, the distribution is close to normal, at least after a log transformation. You are right that you don't want to log transform and then use COUNT=. You should check why you have such high counts; perhaps it is due to a coding error.
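As a rough illustration of why high counts look close to normal, the Python sketch below computes the skewness of a plain Poisson distribution (no zero inflation) numerically from its probabilities. The rates 2 and 50 and the truncation point 400 are assumptions chosen for illustration. The skewness equals 1/sqrt(rate), so it shrinks toward zero, the value for a normal distribution, as the rate grows.

```python
import math

def poisson_skewness(rate, max_count=400):
    """Skewness of a Poisson distribution, computed numerically from its pmf."""
    # Work on the log scale so that large counts do not overflow.
    pmf = [math.exp(-rate + k * math.log(rate) - math.lgamma(k + 1))
           for k in range(max_count + 1)]
    mean = sum(k * p for k, p in enumerate(pmf))
    var = sum((k - mean) ** 2 * p for k, p in enumerate(pmf))
    third = sum((k - mean) ** 3 * p for k, p in enumerate(pmf))
    return third / var ** 1.5

low_rate_skew = poisson_skewness(2.0)    # clearly skewed
high_rate_skew = poisson_skewness(50.0)  # much closer to symmetric

print(low_rate_skew, high_rate_skew)
```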

2. With Type=Mixture, logit is the default for categorical outcomes. If you mean count mixture modeling, you will have to clarify.
 Ruixue Wang posted on Thursday, September 08, 2011 - 6:57 am
Can Mplus fit logit data (0s and 1s) with GMM, as it does for ZIP data and as proc traj does? What should I clarify? Thank you.
 Bengt O. Muthen posted on Thursday, September 08, 2011 - 7:22 am
ZIP refers to count data, so not just 0s and 1s. Mplus can do proc traj modeling.
 Ruixue Wang posted on Thursday, September 08, 2011 - 7:57 am
Sorry, maybe I was not clear. I could not find code for logit data (0s and 1s) with mixture models, such as this code for ZIP:
VARIABLE: NAMES ARE u1-u4;
COUNT = u1-u4 (i);
CLASSES = c (2);
ANALYSIS: TYPE = MIXTURE;
MODEL:
%OVERALL%
i s | u1@0 u2@1 u3@2 u4@3;
ii si | u1#1@0 u2#1@1 u3#1@2 u4#1@3;
OUTPUT: TECH1 TECH8;
 Bengt O. Muthen posted on Thursday, September 08, 2011 - 9:33 am
The UG gives examples of growth mixture modeling with binary outcomes. With 0/1 outcomes and logit/probit, there is not an inflation version of the model.
 Chien-Ti Lee posted on Wednesday, September 14, 2011 - 1:37 pm
Hi Drs. Muthen,

I am trying to fit a ZIP model with covariates for smoking trajectories.

model:
i s q | smoke1@0 smoke2@1 smoke3@2 smoke4@3...smoke21@20;
ii si qi | smoke1#1@0 smoke2#1@1 smoke3#1@2 smoke4#1@3...smoke21#1@20;
i-qi@0;
i s q on cov1-cov16;
ii si qi on cov1-cov16;

I then got error messages about a non-positive definite matrix, and I am not sure what causes this problem. Could you point out possible solutions for this model?

Many thanks!
 Bengt O. Muthen posted on Wednesday, September 14, 2011 - 2:15 pm
You should divide your time scores by 10 to make convergence easier - a quadratic model with highest time score of 20 gives a value of 400. Also, do a Type=Basic run with only the 16 covariates to see that you don't have a collinearity problem among them.

If none of this helps, send output to support.
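The effect of rescaling the time scores can be checked with a few lines of Python (a sketch, not Mplus input). It assumes the 21 occasions smoke1-smoke21 with raw time scores 0 to 20, as in the model above.

```python
raw_times = list(range(21))                 # 0, 1, ..., 20 for smoke1-smoke21
scaled_times = [t / 10 for t in raw_times]  # 0.0, 0.1, ..., 2.0

# Time scores used by the quadratic growth factor
raw_quadratic = [t ** 2 for t in raw_times]
scaled_quadratic = [t ** 2 for t in scaled_times]

print(max(raw_quadratic))     # 400
print(max(scaled_quadratic))  # 4.0
```

After dividing by 10, the linear slope refers to change per 10 original time units and the quadratic coefficient to change per 100 squared units, so the estimates are 10 and 100 times their values on the original scale.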
 Chien-Ti Lee posted on Tuesday, September 20, 2011 - 6:16 pm
Hello Drs. Muthen,

Thank you so much for helping. Yes, the problem was the time scores.

Now the model fits well. However, I have further questions about the interpretation of the ZIP growth model with covariates, especially for the binary part.

I ran the unconditional model and asked for the probability of being at zero. I found a trajectory that decreases over time, levels off in the middle, and finally increases again. I had a negative intercept for si and a positive one for qi.

When I added covariates, the intercepts for si and qi had a similar pattern. However, I am not sure how to interpret a negative influence of a covariate on si. And what about a positive coefficient for qi?

Again, thank you so much!
 Bengt O. Muthen posted on Wednesday, September 21, 2011 - 6:24 am
Negative influence of a covariate on a negative si: The larger the covariate value, the more negative the value of si.

Positive influence of a covariate on a positive qi: The larger the covariate value, the larger the positive value of qi.
 Chien-Ti Lee posted on Wednesday, September 21, 2011 - 7:12 am
Thank you so much, Dr. Muthen!

So if a covariate has a positive influence on a negative si, then the larger the covariate value, the smaller in magnitude the negative value of si. In other words, the larger this covariate, the slower the decline of the trajectory, compared with a covariate that has a negative influence. Am I right?

Many Thanks!!!
 Bengt O. Muthen posted on Wednesday, September 21, 2011 - 9:42 pm
Right.
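The interpretation discussed above can be sketched numerically in Python. This is only an illustration under stated assumptions: the inflation part is taken to follow a quadratic growth model on the logit scale, the growth factor variances are zero (as with i-qi@0), and all numbers (0.5, -1.5, 0.6, 0.4) are hypothetical. The function names are made up for this sketch.

```python
import math

def inflation_probability(ii, si, qi, t):
    """Probability of being at zero from a quadratic growth model on the logit scale."""
    logit = ii + si * t + qi * t ** 2
    return 1.0 / (1.0 + math.exp(-logit))

def slope_given_covariate(alpha, beta, x):
    """Value of si for covariate value x: intercept plus regression effect."""
    return alpha + beta * x

times = [t / 10 for t in range(21)]  # time scores 0.0 to 2.0 (divided by 10)

# Negative intercept for si, positive effect of the covariate on si
si_low_x = slope_given_covariate(-1.5, 0.4, 0.0)   # covariate at 0
si_high_x = slope_given_covariate(-1.5, 0.4, 2.0)  # covariate at 2, si nearer zero

traj_low_x = [inflation_probability(0.5, si_low_x, 0.6, t) for t in times]
traj_high_x = [inflation_probability(0.5, si_high_x, 0.6, t) for t in times]

# Early decline in the probability of zero (from time 0.0 to time 0.5)
decline_low_x = traj_low_x[0] - traj_low_x[5]
decline_high_x = traj_high_x[0] - traj_high_x[5]
print(decline_low_x, decline_high_x)
```

With a negative si and a positive qi the probability of zero first falls, flattens, and then rises. A larger covariate value with a positive effect moves si toward zero, so the early decline is slower.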
 Ruixue Wang posted on Wednesday, October 05, 2011 - 4:16 pm
Hello,
In TECH14, the default number of bootstrap draws is 2 to 100, chosen by the program using a sequential method. I don't quite understand this part. Can you tell me how the program decides which number between 2 and 100 to use? Do you think increasing the number of draws, for example to 1000, will greatly improve the performance of the bootstrapped likelihood ratio test?
Thank you.
 Tihomir Asparouhov posted on Wednesday, October 05, 2011 - 9:48 pm
The algorithm is focused on determining whether the P-value is larger or smaller than 5% using a minimal number of replicated draws. It is a statistical algorithm that minimizes error. For example, if after 20 draws the current estimate for the P-value is 80%, we can be sure with very high probability that the P-value is above 5%. If you increase the number of draws to 1000 you will obtain a more precise P-value, but it is very unlikely that this will yield a different result for the hypothesis test of K vs. K-1 classes.
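The general idea of such a sequential rule can be sketched in Python. This is not the Mplus algorithm, whose details are not given here; it is one simple stopping rule chosen for illustration. The function names, the error tolerance of 0.01, and the use of an exact binomial tail are all assumptions. Drawing stops as soon as the running count of bootstrap statistics at least as large as the observed one would be very unlikely if the true P-value were exactly 5%.

```python
import math

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p); returns 0 when k < 0."""
    return sum(math.comb(n, j) * p ** j * (1.0 - p) ** (n - j)
               for j in range(k + 1))

def sequential_pvalue(observed_stat, draw_stat, min_draws=2, max_draws=100,
                      alpha=0.05, error=0.01):
    """Estimate a bootstrap P-value, stopping once alpha is clearly excluded.

    draw_stat is a hypothetical callable returning one bootstrap
    likelihood ratio statistic per call.
    """
    exceed = 0
    for n in range(1, max_draws + 1):
        if draw_stat() >= observed_stat:
            exceed += 1
        if n < min_draws:
            continue
        # How surprising is this count if the true P-value were exactly alpha?
        too_many = 1.0 - binom_cdf(exceed - 1, n, alpha)  # P(X >= exceed)
        too_few = binom_cdf(exceed, n, alpha)             # P(X <= exceed)
        if too_many < error or too_few < error:
            return exceed / n, n
    return exceed / max_draws, max_draws

# Bootstrap statistics always above the observed one: stops after 2 draws.
clear_high = sequential_pvalue(5.0, lambda: 10.0)
# Bootstrap statistics always below the observed one: needs many more draws.
clear_low = sequential_pvalue(5.0, lambda: 0.0)
print(clear_high, clear_low)
```

In a real analysis each call to draw_stat would generate data under the K-1 class model, fit both the K-1 and K class models, and return the likelihood ratio statistic. Under this particular rule a clearly large P-value is settled after very few draws, while a P-value near zero takes many more, which fits the range of 2 to 100 draws.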
 Ruixue Wang posted on Thursday, October 06, 2011 - 6:33 am
Thank you very much for your explanation.