Zero-inflated negative binomial and c...
Mplus Discussion > Multilevel Data/Complex Sample >
 Brondeel Ruben posted on Tuesday, November 09, 2010 - 8:24 am

I'm using a zero-inflated negative binomial model on a complex dataset (clustering within schools).
If I don't change the starting values, I get a reasonable result. But if I increase the number of starting values, I get a result in which parameters in the zero-model are fixed to avoid singularity.
I was also wondering which technique is used to correct the standard errors for the complex structure.

 Linda K. Muthen posted on Wednesday, November 10, 2010 - 11:09 am
Please send the outputs and your license number to
 Brondeel Ruben posted on Sunday, November 28, 2010 - 11:48 pm

The model below has 2 zero-inflated (Poisson) dependent variables. I would like to include a correlation between these 2 variables.
1. Is this correlation specified correctly?
2. Is the standard MLR the one to use?
3. Should I be surprised that the other coefficients are quite different, given a high correlation between the 2 independent variables?

Thank you very much for providing these techniques. It's a really nice model I couldn't have fitted before using Mplus.

Usevariables are
prop_del viol_del delgroup att_viol
s_contr mk_tot;
Missing are .;
categorical are delgroup;
count is prop_del viol_del(i);
cluster = school;
analysis: type = complex;
starts 100 20;
model: att_viol on MK_tot ;
S_contr on att_viol MK_tot;
delgroup on S_contr att_viol;
prop_del on MK_tot att_viol S_contr delgroup;
prop_del#1 on MK_tot att_viol S_contr delgroup;
viol_del on MK_tot att_viol S_contr delgroup;
viol_del#1 on MK_tot att_viol S_contr delgroup;
f BY prop_del viol_del;
 Linda K. Muthen posted on Monday, November 29, 2010 - 9:35 am
1. Yes.
2. Yes.
3. No, because adding the residual correlation allows the correlation between the two dependent variables to be explained by more than just the covariates influencing them.
 Brondeel Ruben posted on Wednesday, December 01, 2010 - 6:07 am

The model is quite stable and replicable with other start values. However, one of the coefficients is very high: the estimate for prop_del#1 on delgroup is 14.525. The odds ratio is therefore about 2 million, which is very strange to report. It does make sense that this coefficient is high, just not that high, and the rest of the model makes sense.

Can and should I somehow restrict the size of this parameter? Or should I just report it like it is?
 Linda K. Muthen posted on Wednesday, December 01, 2010 - 9:55 am
The high coefficient just means that some event happens with probability one. I would report it as is.
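
As a quick check of the arithmetic in this exchange, the sketch below exponentiates the reported logit coefficient of 14.525 to get the odds ratio. The second function converts a logit to a probability; the baseline logit of 0 used with it is a hypothetical value chosen only for illustration, not a number from the model output.

```python
import math


def logit_to_odds_ratio(coef: float) -> float:
    """Odds ratio implied by a logistic-regression coefficient."""
    return math.exp(coef)


def logit_to_probability(logit: float) -> float:
    """Probability implied by a value on the logit scale."""
    return 1.0 / (1.0 + math.exp(-logit))


coef = 14.525  # estimate for prop_del#1 ON delgroup quoted in the post
odds_ratio = logit_to_odds_ratio(coef)
print(f"odds ratio: {odds_ratio:,.0f}")  # on the order of 2 million

# Hypothetical baseline logit of 0 (an assumption, not from the output):
# adding the coefficient pushes the probability to essentially 1.
baseline_logit = 0.0
prob = logit_to_probability(baseline_logit + coef)
print(f"implied probability: {prob:.7f}")
```

An odds ratio this large is the logit-scale signature of a probability that is effectively one, which is consistent with the reply above.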
 Brondeel Ruben posted on Tuesday, December 14, 2010 - 7:51 am

In the model above, is it possible to estimate indirect effects on the zero-inflated dependent variables?
Can Mplus do this? And on what scale would they be? I assume the effect would be measured by 2 coefficients, one for the zero part and one for the count part?

The direction of the indirect effects (positive or negative) can be read from the path model, and I will report them that way. I was just wondering whether there is some estimate of the size of this effect, with confidence intervals if possible.

Thanks a lot for your support.
 Bengt O. Muthen posted on Tuesday, December 14, 2010 - 8:51 am
The guiding principle for being able to produce indirect effects is that the M ON X and the Y ON M regressions are both linear. This is more general than it sounds. For instance, M can be a latent response variable for a categorical (binary or ordered) observed variable, in which case we call it M*. In probit regression, for example, M* ON X is then a linear regression. For this example, what is required is that Y ON M is also linear. Y can then be a latent continuous response variable for a categorical outcome, a log rate for a count, or a log hazard for a survival variable. But, continuing the example with a categorical observed measure of the mediator M, the key is that Y ON M concerns the latent continuous response variable M*, not the observed categorical measurement. So, for instance, with a binary variable it is not the event itself that predicts Y but the tendency for the event to happen.
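
When both regressions are linear, the indirect effect is usually taken as the product of the two slopes. The sketch below illustrates that idea with a first-order delta-method (Sobel) standard error. All numbers are hypothetical, the function names are invented for this example, and the formula assumes the two slope estimates are uncorrelated; it is not a description of how Mplus computes indirect effects.

```python
import math


def indirect_effect(a: float, b: float) -> float:
    """Product-of-coefficients indirect effect: (M ON X) * (Y ON M)."""
    return a * b


def sobel_se(a: float, se_a: float, b: float, se_b: float) -> float:
    """First-order delta-method SE of a*b, assuming cov(a, b) = 0."""
    return math.sqrt(b**2 * se_a**2 + a**2 * se_b**2)


# Hypothetical estimates, for illustration only.
a, se_a = 0.40, 0.10  # slope of M (or M*) ON X
b, se_b = 0.25, 0.05  # slope of Y ON M, e.g. on the log-rate scale

ab = indirect_effect(a, b)
se = sobel_se(a, se_a, b, se_b)
lower, upper = ab - 1.96 * se, ab + 1.96 * se
print(f"indirect effect = {ab:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

The resulting effect lives on the scale of Y's linear predictor, so for a count outcome it would be read as a change in the log rate.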

In this example, Y is a count and I think you had a categorical mediator. That's a tricky combination which Mplus doesn't yet handle. Count Y requires ML which doesn't yet work with a latent response variable M for Y ON M. Bayes can do that, but can't yet do counts.
 Ksenia posted on Monday, December 01, 2014 - 8:56 am

I am looking for software that can handle a three-level zero-inflated negative binomial model. I would appreciate it if you could tell me whether:

1a) Mplus works with three-level zero-inflated negative binomial models;
1b) Mplus works with three-level negative binomial models;

2) one can graph an interaction in a three-level negative binomial model using Mplus.
Thank you very much.

 Linda K. Muthen posted on Monday, December 01, 2014 - 10:39 am
Mplus does not do three-level for count variables.
 Tracy Witte posted on Tuesday, February 24, 2015 - 8:10 am
I am running a zero-inflated negative binomial regression in Mplus. I have a categorical predictor with 3 levels (i.e., three different diagnoses). Thus, I am modeling the predictors as a set of two dummy variables, with one of the diagnoses as the reference variable. Since this only allows for the comparison of each of the other diagnoses with the reference group, I also ran the regression again with a different diagnosis as the reference variable so that I can get that final pairwise comparison.

For the first regression, I got this warning message, " WARNING: THE MODEL ESTIMATION HAS REACHED A SADDLE POINT OR A POINT WHERE THE

Based on previous posts, from what I understand, it's ok to ignore this message. However, I did not get this warning message for the second regression I ran. Also, for this regression, the beta weights for the pairwise comparison that is also contained in the original regression are quite different. Normally, when I run regressions with dummy variables, in the second version, one of the beta weights is identical to the first and is just a different sign.
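
The sign-flip expectation described above can be checked for the ordinary linear case, where a regression on an intercept plus group dummies reproduces differences in group means. The sketch below uses made-up data and hypothetical group labels (dx_a, dx_b, dx_c); it covers linear regression only and does not show what the zero-inflated model in this post should return.

```python
from statistics import mean


def dummy_coefficients(groups: dict[str, list[float]], reference: str) -> dict[str, float]:
    """OLS dummy-variable slopes for a single categorical predictor.

    With an intercept and one dummy per non-reference group, each slope
    equals that group's mean minus the reference group's mean.
    """
    ref_mean = mean(groups[reference])
    return {
        name: mean(values) - ref_mean
        for name, values in groups.items()
        if name != reference
    }


# Made-up outcome values for three hypothetical diagnoses.
groups = {
    "dx_a": [1.0, 2.0, 3.0],
    "dx_b": [2.0, 4.0, 6.0],
    "dx_c": [5.0, 5.0, 8.0],
}

ref_a = dummy_coefficients(groups, "dx_a")  # contrasts against dx_a
ref_b = dummy_coefficients(groups, "dx_b")  # contrasts against dx_b

# The dx_a vs dx_b contrast appears in both runs with opposite signs.
print(ref_a["dx_b"], ref_b["dx_a"])
```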

I'm wondering if these results are trustworthy, or if I'm perhaps doing something wrong. Any assistance would be helpful!
 Bengt O. Muthen posted on Tuesday, February 24, 2015 - 8:39 am
Please send the relevant inputs, outputs, data, and license number to Support.
 Calvin D. Croy posted on Tuesday, October 11, 2016 - 11:29 am
Could you please help me interpret the R-square output from running the following ZINB model?

count = numdxy (nbi);
Type = complex;
Integration = Montecarlo;
numdxy on
sexf ... e04 ;
numdxy#1 on
sexf ... e04;
!the following line is included so Mplus executes FIML rather than !listwise deletion;
sexf marcoh ed12sup unemply ipovert discnty e04;

Here's the output that I need help with:

Obs Var Estimate S.E.

NUMDXY 0.479 0.233 NUMDXY 1.000 999.000

My questions:
1) Which line for NUMDXY above is for the count data 0 and higher, and which line is for the logistic regression predicting membership in the "must be zero" group?

2) Why is the Rsquare estimate 1.00 and S.E. of 999?

Thanks for your help!
 Calvin D. Croy posted on Tuesday, October 11, 2016 - 11:38 am
Sorry! I accidentally hit "Post" for the above message before I got the formatting right.

The lines under R-square in the above post should appear as:

Obs Var Estimate S.E.

NUMDXY 0.479 0.233
NUMDXY 1.000 999.000
 Bengt O. Muthen posted on Tuesday, October 11, 2016 - 12:09 pm
Count DVs don't have residual variances and therefore standardization wrt such a DV gets R-square = 1. Ignore this.
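
The reply above can be read off the usual variance-decomposition form of R-square: explained variance divided by explained plus residual variance. The sketch below is a minimal illustration with hypothetical variance values and an invented function name, showing why a residual variance of zero forces the ratio to 1.

```python
def r_square(explained_variance: float, residual_variance: float) -> float:
    """R-square as explained variance over total (explained + residual)."""
    total = explained_variance + residual_variance
    if total <= 0:
        raise ValueError("total variance must be positive")
    return explained_variance / total


# Hypothetical values: a DV with a residual variance behaves as usual ...
print(r_square(1.0, 1.0))

# ... while a DV with no residual variance term always yields 1,
# whatever the explained variance is.
print(r_square(2.0, 0.0))
```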
 Calvin D. Croy posted on Wednesday, October 12, 2016 - 4:19 pm
Dr. Muthen,
Thank you for your quick response. May I ask for a little more clarification?

1. As far as I can tell either you didn't answer my first question or I'm not smart enough to understand your reply. So, in general, in Mplus output for the R-squares from ZINB models, does the first line containing the name of the outcome variable report the R-square for the model for all observations with outcome values >= 0, or is the first line in the R-square section for the logistic regression predicting membership in the "must have count of 0" group?

2. Does your reply mean that the reported R-square values are for the standardized coefficients? If not, how is standardization a part of computing the R-square values? Thanks for unpacking your reply.
 Bengt O. Muthen posted on Wednesday, October 12, 2016 - 5:52 pm
1. The first line with the value 0.479 is for the latent binary DV of the zero-inflation.

2. R-square is the same for unstandardized and standardized coefficients and computed in the regular fashion as for linear regression.