

Beginner question on count data (pois... 



I’m trying to analyze panel data models with count variables. My interest is in the behavior of the rate and its relationship with certain covariates. Most of my data are severely overdispersed, and when I try to use them I receive an error message related to the computation of the posterior distribution. Is that due to the overdispersion? If so, is there any way to model overdispersed or negative binomial counts in Mplus?

I have also tried a short example with data that weren’t overdispersed, but I got the same error message, so I don’t really know where my mistake is. For example, if the data are:

id  acc  aflds  lnafl
1   442  5776   8.661467
2   495  6085   8.713582
3   536  5936   8.688790
4   480  6008   8.700848
5   508  6223   8.736008
6   470  6321   8.751633
7   495  6569   8.790117

(where acc = accidents, aflds = exposure, lnafl = ln(aflds))

Then, if I try the simple null model:

acc on lnafl@1

I receive the message: “SERIOUS PROBLEM IN THE OPTIMIZATION WHEN COMPUTING THE POSTERIOR DISTRIBUTION. CHANGE YOUR MODEL AND/OR STARTING VALUES.” (The estimated intercept should be -2.528, i.e. the log of total accidents over total exposure.)

Thanks in advance, Fernando.


I don't believe that the problem is overdispersion. Your counts are very high, which may indicate that they can be treated as continuous. I can't say more without more information. You can send your input, data, output, and license number to support@statmodel.com if you want us to look into this further.
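
For readers who want to check the expected intercept by hand: in an intercept-only Poisson model with the log exposure entered with its coefficient fixed at 1 (an offset), the maximum-likelihood intercept has a closed form, the log of total events over total exposure. The sketch below is plain Python, not Mplus; it uses the seven rows from the question, and the function name and the Pearson dispersion check are illustrative additions, not part of the original posts.

```python
import math

# Example data from the question: accident counts and exposure.
acc = [442, 495, 536, 480, 508, 470, 495]
aflds = [5776, 6085, 5936, 6008, 6223, 6321, 6569]


def null_rate_model(counts, exposure):
    """Intercept-only Poisson model with log(exposure) as an offset.

    Returns the closed-form ML intercept and a Pearson dispersion
    statistic (Pearson chi-square divided by residual degrees of freedom).
    """
    rate = sum(counts) / sum(exposure)      # pooled event rate
    intercept = math.log(rate)              # ML estimate on the log scale
    fitted = [rate * e for e in exposure]   # expected counts under the model
    pearson = sum((y - mu) ** 2 / mu for y, mu in zip(counts, fitted))
    dispersion = pearson / (len(counts) - 1)  # one estimated parameter
    return intercept, dispersion


intercept, dispersion = null_rate_model(acc, aflds)
print(f"intercept = {intercept:.3f}")
print(f"Pearson dispersion = {dispersion:.2f}")
```

A dispersion statistic near 1 is what a Poisson model implies; values well above 1 suggest overdispersion. With only seven observations the statistic is a rough guide at best.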


I was wondering if anyone had thoughts on how to handle this situation. The dependent variable of interest is a proportion score that has many zero values. If these were count data, I would use a count distribution to model them, but that seems inappropriate here. Would it be reasonable to model zero versus nonzero as a binary outcome and, for the nonzero proportion values, to model those as a continuous variable? Thanks! Tom
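
The approach Tom describes is often called a two-part model. As a minimal sketch of the data preparation it implies (not an answer from the Mplus team, and not Mplus syntax), the Python below splits a proportion variable into a binary zero/nonzero indicator and a continuous part that is missing wherever the proportion is zero. The function name and the sample values are hypothetical.

```python
def split_two_part(proportions):
    """Split a semicontinuous variable into the two parts of a two-part model.

    Returns (u, y): u is 1 for a nonzero proportion and 0 otherwise;
    y holds the nonzero value, or None (missing) when the proportion is zero.
    """
    u = [1 if p > 0 else 0 for p in proportions]
    y = [p if p > 0 else None for p in proportions]
    return u, y


# Hypothetical proportion scores with several zeros.
props = [0.0, 0.25, 0.0, 0.6, 0.1]
u, y = split_two_part(props)
print(u)
print(y)
```

The binary part would then be modeled with a logistic or probit regression and the continuous part with a regression on the nonzero cases only. Whether the continuous part needs a transformation depends on the data.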

WenHsu Lin posted on Monday, November 30, 2015 - 1:05 am



Hi, Mplus team:

I have five variables, one of which is a distal outcome that is a count variable. The variables are:

2 endogenous variables: x1 and x2
2 mediating outcome variables, both categorical: z1 and z2
1 count variable: y (a count with 40% zeros)

I specified y in Mplus as count y (i); I then have the following syntax:

z1 on x1 x2;
z2 on x1 x2;
y on x1 x2 z1 z2;
y#1 on x1 x2 z1 z2;

However, the output contained the message "THE STANDARD ERRORS OF THE MODEL PARAMETER ESTIMATES MAY NOT BE TRUSTWORTHY FOR SOME PARAMETERS DUE TO A NONPOSITIVE DEFINITE FIRSTORDER DERIVATIVE PRODUCT MATRIX." The output identified which parameter caused the problem. I reran the model without that parameter in the y#1 part, and the results were fine. What should I do in this situation? Should I just leave that particular variable out? Thank you.


Can't tell without seeing the output. Send it to Support along with your license number. For a count outcome with categorical mediators, you need to understand the paper on our website:

Muthén, B. & Asparouhov, T. (2015). Causal effects in mediation modeling: An introduction with applications to latent variables. Structural Equation Modeling: A Multidisciplinary Journal, 22(1), 12-23. DOI: 10.1080/10705511.2014.935843
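
As background for the `count y (i)` specification in the question above: it requests a zero-inflated Poisson model, in which `y#1` refers to the inflation part (the log odds of belonging to a class that can only produce zeros). The Python sketch below writes out the corresponding probability mass function under that reading. The function name and parameter values are illustrative assumptions, and this is not Mplus code.

```python
import math


def zip_pmf(k, lam, pi):
    """Zero-inflated Poisson probability of observing count k.

    lam: Poisson mean for the count part.
    pi:  probability of the always-zero class (the inflation part).
    """
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    if k == 0:
        # Zeros arise from the always-zero class or from the Poisson part.
        return pi + (1 - pi) * poisson
    return (1 - pi) * poisson


# Illustrative values only: mean 2 in the count part, 30% structural zeros.
for k in range(4):
    print(k, zip_pmf(k, lam=2.0, pi=0.3))
```

Because the same covariates appear in both the count part and the inflation part, a predictor with little variation among the zeros can leave its inflation coefficient poorly determined, which is one possible reason for a standard-error warning tied to a single `y#1` parameter.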


