Mplus Discussion >> Zero-Inflated Negative Binomial Regression

Mark A. Greenbaum posted on Saturday, June 27, 2009 - 12:55 pm

Please excuse me if this has been answered already, but I couldn't tell from the threads.

We want to use ZINB because we have a highly non-normal count distribution (of mental health visits) with a very large number of 0's and standard deviation much larger than the mean. We believe that ZINB is the best way to handle this.

Does Mplus version 5.2 cover this analysis?

Thank you very much!

VAguy

Linda K. Muthen posted on Saturday, June 27, 2009 - 1:11 pm

Yes, this model can be estimated in Mplus. It was added in Version 5.1. See the Version 5.1 Language Addendum and the Version 5.1 Examples Addendum which are on the website with the user's guide.

Bengt O. Muthen posted on Saturday, June 27, 2009 - 3:03 pm

See also the 4th (bottom) web talk at

http://www.statmodel.com/webtalks.shtml

which goes through an example in Hilbe's negative binomial book.

Mark A. Greenbaum posted on Sunday, June 28, 2009 - 11:26 pm

Thank you both for your responses! I will review these this week!

~mag

LAS posted on Tuesday, September 28, 2010 - 12:28 pm

Hello. I am currently running LCGA and GMM models using highly skewed data with a large percentage of 0s. I explored using four different models: the Poisson, Zero-inflated Poisson (ZIP), negative binomial (NB), and Zero-inflated Negative Binomial (ZINB). Compared to the NB and ZINB, the Poisson and ZIP performed poorly (based on the BICs), so I eliminated these models from consideration. When I ran the ZINB, Mplus set the logit parameters for all classes to -15 (regardless of how many classes I extracted). As a result, the ZINB seemed to reduce to the NB, with the ZINB and the NB producing identical log likelihoods. Based on these results, can I assume that the inflation parameters are not needed and that the most appropriate model is the NB? Thank you!

Bengt O. Muthen posted on Tuesday, September 28, 2010 - 3:57 pm

Yes. I assume you have multiple classes as well. That sometimes removes the need for zero-inflation, at least for NB which already picks up the preponderance of zeros to some extent - at least better than Poisson.
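
To illustrate why an inflation logit of -15 makes the ZINB collapse to the NB, here is a minimal Python sketch (not Mplus code). It assumes the NB2 parameterization with variance mu + alpha*mu^2; the function names and the values mu = 2.0 and alpha = 1.5 are hypothetical and chosen only for illustration.

```python
import math

def nb_pmf(y, mu, alpha):
    """Negative binomial probability of count y (mean mu, dispersion alpha)."""
    r = 1.0 / alpha
    log_p = (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
             + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu)))
    return math.exp(log_p)

def zinb_pmf(y, mu, alpha, inflation_logit):
    """Zero-inflated NB: a point mass at zero mixed with an NB distribution."""
    pi = 1.0 / (1.0 + math.exp(-inflation_logit))  # probability of a structural zero
    base = (1.0 - pi) * nb_pmf(y, mu, alpha)
    return pi + base if y == 0 else base

# Inflation probability implied by a logit of -15 (hypothetical mu and alpha below)
pi_at_minus_15 = 1.0 / (1.0 + math.exp(15.0))

# Largest difference between the ZINB and NB probabilities over counts 0..49
max_gap = max(abs(zinb_pmf(y, 2.0, 1.5, -15.0) - nb_pmf(y, 2.0, 1.5))
              for y in range(50))

print(pi_at_minus_15, max_gap)
```

With the logit at -15 the structural-zero probability is below one in a million, so the two likelihoods agree to the precision usually reported, which is consistent with the identical log likelihoods described above.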

Alexander Kapeller posted on Tuesday, June 14, 2011 - 11:37 am

Hi,
A question on missing values. Are they also handled via FIML for ZINB, ZIP, or inflated hurdle models in Mplus?

Can I argue in a paper that this is possible because all of the models mentioned are estimated with the ML estimator, which is also the basis for FIML?

Thanks

Bengt O. Muthen posted on Tuesday, June 14, 2011 - 5:55 pm

I don't think of FIML as being in operation when you have only a single dependent variable as you do in those regression models. Dealing with missing on the DV is simply the same as deleting the subject because it has no information on the relationship between the DV and covariates, nor on the DV. Dealing with missing on covariates goes beyond the regression model.

For FIML - that is ML under MAR - to play a role you need more than one DV so that missing on one of them borrows information from the other.

Jonathon Rendina posted on Wednesday, August 28, 2013 - 4:14 pm

I am running a zero-inflated Poisson LCA model with three count outcomes that measure the number of days of prescription drug use (stimulants, pain killers, and sedatives) in the prior 3 months. I have three questions regarding this analysis:

1. If I have a variable that could indicate true structural zeros (i.e., we measured whether or not participants had ever used each class of drugs in their lives), would it be better to include those as class indicators than to run a zero-inflated model?

2. Can you briefly explain the difference between the default of fixing the inflation parameters across classes versus freeing them? I'm having trouble finding a reference that would assist me in the relative interpretation of the two.

3. The model is giving me a message under the model fit section saying "** Large values were truncated at 9." Does this mean that the analysis truncates all values (which range all the way to a value of 90) to 9 for the purpose of analysis, or just for the purpose of computing chi-square statistics? Can you point me to any references on the appropriateness and interpretability of this and how this affects the sample and estimated means for the count variables?

Thanks in advance for your help!

Bengt O. Muthen posted on Wednesday, August 28, 2013 - 6:44 pm

1. You can create a zero class by using it as a separate group via Knownclass, where that group has zero probability of Y>0. But it may just complicate matters - I would stick with last month reports.

2. I would let them be different across classes - a high class for instance may have less inflation than a low class. Maybe the 1989 Roeder et al JASA article talks about this.

3. That refers only to the chi-square testing, not the subsequent analysis.

Sung Joon Jang posted on Tuesday, August 11, 2015 - 8:12 am

I would like to use zero-inflated negative binomial regression, but could find syntax only for ZIP -- COUNT IS u1(i) -- and NB -- COUNT IS u1(nb) in Version 7 Mplus User's Guide. What should I use for ZINB instead of "nb"?

Sung Joon Jang posted on Tuesday, August 11, 2015 - 8:18 am

Never mind. Just found that it was nbi. Sorry about that.

Bengt O. Muthen posted on Tuesday, August 11, 2015 - 1:59 pm

Good.

Amy Hoffmann posted on Thursday, September 22, 2016 - 7:17 am

Hello Drs. Muthen,

I am running analyses using ZINB models to examine gender interactions in a set of variables in the count model. In the zero model, I put three variables that would be theoretically associated with production of excess zeroes.

When I ran the model using the full sample, everything ran fine and there was a significant gender interaction in the count model. I then split the sample by gender and ran similar models in each sample. This model ran fine in the female sample, but is producing the following error in the male sample:

ONE OR MORE PARAMETERS WERE FIXED TO AVOID SINGULARITY OF THE INFORMATION MATRIX. THE SINGULARITY IS MOST LIKELY DUE TO THE MODEL IS NOT IDENTIFIED, OR DUE TO A LARGE OR A SMALL PARAMETER ON THE LOGIT SCALE. THE FOLLOWING PARAMETERS WERE FIXED:
Parameter 5, TOTALPERP#1 ON INP_NUM
Parameter 3, TOTALPERP#1 ON INC_NUM
Parameter 4, TOTALPERP#1 ON AGE
Parameter 1, [ TOTALPERP#1 ]

While it produced normal looking output in the count model, it's not giving me values for standard error or p in the zero model. Everything I've found seems to suggest that there may be a problem with restricted variance in the fixed variables, but I don't think that's the case here, as the distributions are similar for males and females. Any thoughts you have would be greatly appreciated.

Thanks,

Amy

Bengt O. Muthen posted on Thursday, September 22, 2016 - 9:29 am

Perhaps males don't need the zero-inflation. Check using BIC. You can also run without covariates and see if the inflation intercept gets close to -15, indicating that the inflation part is not needed.

Amy Hoffmann posted on Thursday, September 22, 2016 - 1:31 pm

Hello Dr. Muthen,

Thank you for the prompt reply. I ran the model as a regular negative binomial, and the BIC increased significantly from the ZINB model (going from 702.424 to 724.688), indicating the ZINB is a better fit. What do you mean by run without the covariates?

Thanks again for your help.
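
For readers following the BIC comparison, a minimal Python sketch of the arithmetic (not Mplus code). The two BIC values are the ones quoted in the post above; the `bic` helper and its argument names are hypothetical, and it uses the standard definition BIC = -2 logL + k ln(n).

```python
import math

def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion; lower values indicate the preferred model."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)

# BIC values reported in the post above
bic_zinb = 702.424
bic_nb = 724.688

delta = bic_nb - bic_zinb                       # how much worse the NB is
preferred = "ZINB" if bic_zinb < bic_nb else "NB"

print(round(delta, 3), preferred)
```

Because the extra inflation parameters are already penalized through the k ln(n) term, a lower BIC for the ZINB suggests that the inflation part earns its keep in this sample.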

Bengt O. Muthen posted on Thursday, September 22, 2016 - 6:07 pm

So instead of saying

y on x;

you say

y;

If this doesn't help, we need to see your input, output, data - send to Support along with your license number.

anonymous Z posted on Thursday, February 15, 2018 - 7:55 am

Dear Drs. Muthen,

I am fitting a negative Binomial (multilevel with random intercept) model. I have two questions:

1. I assume the default estimation is LAPLACE approximation, am I right?

2. Can Mplus do Residual PL estimation? If yes, what is the syntax for it?

Thanks so much!

Bengt O. Muthen posted on Thursday, February 15, 2018 - 4:56 pm

1. No. Just regular ML using numerical integration.

2. I don't think so because I don't know what Residual PL estimation is.

anonymous Z posted on Friday, February 16, 2018 - 6:44 am

Hi Dr. Muthen,

Thanks so much for your prompt response!

I am trying to replicate the model in SAS, which uses residual PL, i.e., the restricted (residual) pseudo-likelihood algorithm. Sorry for not making that clear in my last message.

Mark Peterman posted on Wednesday, March 21, 2018 - 8:21 am

Hi Dr. Muthen,

When running an NB regression, is there a way that Mplus can account for missing data? I typically use FIML when running a linear or logistic regression.

Thanks.

Bengt O. Muthen posted on Wednesday, March 21, 2018 - 2:26 pm

Do you mean missing on covariates? Missing on a single y isn't covered by FIML.

Mark Peterman posted on Wednesday, March 21, 2018 - 2:35 pm

Yes, missing on covariates.

Bengt O. Muthen posted on Wednesday, March 21, 2018 - 2:43 pm

Just bring them into the model like you do with other types of DVs, e.g. by mentioning their variances. It leads to numerical integration however.

Mark Peterman posted on Wednesday, March 21, 2018 - 2:51 pm

Something like this?

COUNT ARE DSA1O2(nb);

Model:
DSA1O2 on sex ethnic groupid ZneighprobMR1 zestotal mom1xsus2;
sex ethnic groupid ZneighprobMR1 zestotal mom1xsus2;

ANALYSIS:
integration=montecarlo;
starts 200 40;

Also, what do you mean by "numerical integration"?

Bengt O. Muthen posted on Wednesday, March 21, 2018 - 4:45 pm

Right.

The missingness on a covariate makes this variable a partially latent variable. The combination of latent variables and count (or categorical) variables calls for numerical integration over the latent variable to express the probabilities of the count outcome categories. I mention this because computations become slow with many dimensions of integration (check the Summary of Analysis output to see how many dimensions are required).

See also the missing data chapter 10 of our RMA book.
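
As a rough illustration of what "numerical integration over the latent variable" means, here is a minimal Python sketch (not what Mplus does internally). It assumes one missing covariate x that is normally distributed, a Poisson outcome with a log link in place of the NB for brevity, and a simple trapezoid rule; all function names and parameter values are hypothetical.

```python
import math

def poisson_pmf(y, mu):
    """Poisson probability of count y given mean mu."""
    return math.exp(y * math.log(mu) - mu - math.lgamma(y + 1))

def normal_pdf(x, mean, sd):
    """Normal density for the (unobserved) covariate."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def marginal_pmf(y, intercept, slope, x_mean, x_sd, n_points=201, width=8.0):
    """P(y) when x is missing: integrate P(y | x) against the density of x."""
    lo = x_mean - width * x_sd
    step = 2.0 * width * x_sd / (n_points - 1)
    total = 0.0
    for i in range(n_points):
        x = lo + i * step
        weight = step * (0.5 if i in (0, n_points - 1) else 1.0)  # trapezoid rule
        mu = math.exp(intercept + slope * x)                      # log link
        total += weight * poisson_pmf(y, mu) * normal_pdf(x, x_mean, x_sd)
    return total

# Hypothetical parameter values: intercept 0.2, slope 0.5, x ~ N(0, 1)
probs = [marginal_pmf(y, 0.2, 0.5, 0.0, 1.0) for y in range(200)]
marginal_mean = sum(y * p for y, p in enumerate(probs))

print(sum(probs), marginal_mean)
```

Each additional covariate with missing values adds another dimension to this integral, so a grid of 201 points becomes 201 squared points in two dimensions, and so on. That growth is one reason Monte Carlo integration is often chosen when several dimensions are involved.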

Sarah L. Anderson posted on Thursday, June 14, 2018 - 10:12 am

I apologize if these questions have been answered elsewhere on the message board.

I am conducting a set of NB and ZINB regression models and need to calculate STDX values. I was wondering how to do this manually.

Also, my understanding is that it would be typical to report the unstandardized beta (and SE) and the odds ratio for a ZINB model, but would this also apply to an NB model? Or is it more typical to see the unstandardized beta (and SE) and the standardized beta (using STDX) reported for both a ZINB and an NB model? I hope you may be able to provide your opinion about this, or that others may be able to speak to this as well.

Bengt O. Muthen posted on Thursday, June 14, 2018 - 5:06 pm

The statistics literature does not standardize with counts (and seldom at all). See, e.g., the count book by Hilbe that is referred to in the UG. Incident rate ratios using exp(b) are used. In the social and behavioral sciences standardization is more common. For counts one can multiply by the SD of X to standardize a slope b with respect to X. Standardization with respect to a count Y is not meaningful because there is no residual in the count model.

For zero-inflated models a logistic regression part is added. Again, in mainstream statistics, standardization is not typically done for logistic regression. It is possible to standardize with respect to both X and Y if one considers the key DV to be the continuous latent response variable behind the binary observed DV. It has a residual variance even though it is not a free parameter to be estimated.
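
A minimal Python sketch of the two count-model quantities mentioned above, the incident rate ratio exp(b) and the slope standardized with respect to X (b times the SD of X). The slope and the covariate values are hypothetical, and the use of the population SD (n in the denominator) is an assumption here; check which SD your software reports before comparing numbers.

```python
import math
import statistics

def incident_rate_ratio(b):
    """Multiplicative change in the expected count per one-unit increase in X."""
    return math.exp(b)

def stdx_slope(b, x_values):
    """Slope standardized with respect to X only: b times the SD of X."""
    return b * statistics.pstdev(x_values)  # assumption: n, not n - 1, in the denominator

# Hypothetical slope and covariate values, for illustration only
b = 0.10
x = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

irr = incident_rate_ratio(b)        # rate ratio per one-unit change in X
b_stdx = stdx_slope(b, x)           # change in the log rate per one-SD change in X
irr_per_sd = math.exp(b_stdx)       # rate ratio per one-SD change in X

print(irr, b_stdx, irr_per_sd)
```

Exponentiating the X-standardized slope gives a rate ratio per standard deviation of X, which some readers find easier to compare across covariates measured on different scales.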

Sarah L. Anderson posted on Monday, June 18, 2018 - 5:49 pm

Thanks, Bengt. That was very helpful, as were the Hilbe books (2011, 2014) you directed me to.

Jin Liu posted on Monday, April 29, 2019 - 10:58 am

Hello Dr. Muthen,

I am wondering if we can do ZINB using multilevel analysis.
Thanks.

Jin