
eric baumer posted on Wednesday, January 02, 2002  10:05 am



I have a situation in which I really want to estimate a fairly complex structural equation model, yet the key endogenous variable is probably best represented as a count variable with a Poisson distribution (i.e., a count model or negative binomial). Does Mplus handle this type of estimation in an SEM context? Thanks, Eric Baumer


Mplus does not have a facility for representing count data using a Poisson distribution. Mplus would treat this data as ordered polytomous. 


I'm also working with count data, where I have several count items (number of occasions on which the respondent has performed certain behaviors) which I would like to use in a factor model. Unfortunately, I have incomplete data. Previously, I imputed the count data (using NORM; yes, I know, it's already a distortion), categorized the counts into ordered polytomous variables, and ran a categorical data model in MPlus, combining across imputations. I'd like to avoid the multiple imputation step, if possible, not least because I have a lot of individual items and the imputation takes quite a while. Would using the MLR estimator in 2.1 seem like a reasonable approach for these data? Separately, following up Eric's question, is work being done on Poisson links for this kind of model? I have no idea what the complexities would be, but I'm used to y'all pulling rabbits out of hats :). Thanks, Pat

bmuthen posted on Tuesday, June 25, 2002  2:33 pm



Yes, I think MLR or MLM can be seen as a rough approximation. Mplus is developing missing data facilities for more types of outcomes. And also planning for Poisson outcomes. 


Excellent! Thank you. 


Dr. Muthen: I am also working on a relatively complex growth curve model that is based on count data. Do you know of any resources that might save me some time writing simulations by discussing the biases that might result from treating Poisson-distributed data with Mplus, which would consider them polytomous in nature? I assume that this would entail the violation of the assumption that there is a continuous latent dimension underlying the categorical polytomous data. Thank you.

bmuthen posted on Sunday, September 22, 2002  9:41 am



I am not aware of any writings on the analysis of Poisson count outcomes using methods for ordered polytomous outcomes. Personally, I would not worry too much because for the simple purpose of regression I don't think the ordered model makes important violations of the nature of count data, although I may be wrong. That is, the ordered polytomous model would certainly not estimate the same parameters as Poisson, but would probably fit the data well and point to the same important predictors. Perhaps other Mplus Discussion readers have an opinion here. I don't think of the ordered polytomous model as requiring an assumption of a continuous underlying dimension (a "y*" latent response variable), but merely as a model based on proportional odds (see Agresti's book), so that would not be an important consideration to me in this choice.
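To make the proportional-odds idea concrete, here is a minimal Python sketch of a cumulative-logit model for a count that has been cut into ordered categories. The thresholds, the slope, and the sign convention logit P(Y <= j | x) = tau_j - beta*x are illustrative assumptions (packages differ in sign convention). The point is that one slope shifts every cumulative logit by the same amount, so the cumulative odds ratio for a one-unit change in x is exp(beta) at every cut.

```python
import math

def logistic(z: float) -> float:
    """Standard logistic CDF."""
    return 1.0 / (1.0 + math.exp(-z))

def proportional_odds_probs(thresholds, beta, x):
    """Category probabilities under a cumulative-logit (proportional odds) model.

    Assumed convention: logit P(Y <= j | x) = tau_j - beta * x.
    thresholds must be strictly increasing; returns len(thresholds) + 1 probabilities.
    """
    if any(b <= a for a, b in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")
    cumulative = [logistic(tau - beta * x) for tau in thresholds] + [1.0]
    probs, previous = [], 0.0
    for c in cumulative:
        probs.append(c - previous)  # P(Y = j) = P(Y <= j) - P(Y <= j - 1)
        previous = c
    return probs

# Hypothetical thresholds for a 0 / 1-2 / 3+ categorized count.
for x in (0.0, 1.0):
    print(x, [round(p, 3) for p in proportional_odds_probs([0.5, 1.5], 0.8, x)])
```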


Greetings, I am preparing to fit a path analysis model in which there are three sets of observed variables: (a) exogenous variables (either dummy-coded or continuous, approximately normally distributed); (b) mediators that are continuous and approximately normally distributed; and (c) a single outcome that is a count and which appears Poisson distributed (perhaps with zero-inflation). We will be using Mplus 3.11 to analyze these data. Of interest are the indirect effects of the (a) variable set on (c). I notice that Mplus 3.11 does not support computation of indirect and total effects in models involving count outcomes. Can you tell me how to compute these by hand? More generally, I'm also curious how much the calculations would change under different, but similar, scenarios. For instance, how would the computations change if one of the mediating variables was binary or ordered categorical? What about if continuous latent variables are involved at the exogenous, mediating, or outcome stage of the model? Thank you for any tips you can offer on this topic.

bmuthen posted on Friday, October 01, 2004  1:12 pm



With an ultimate count outcome, the indirect effect could pertain to the log rate that is modeled, so that this is the "y*" variable (in my terms) that we are used to when modeling categorical outcomes. The indirect effect would then simply be the usual product of regression coefficients, with SEs computed using the delta method (see Bollen's book). For count outcomes, Mplus uses ML, and in the ML context a mediating variable that is categorical (binary or ordinal) enters the prediction of the count outcome as an observed variable (not y*), i.e., a score that is treated as continuous. The same approach as above applies. And also for continuous latent variables.
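The product-of-coefficients calculation described above can be sketched in Python as follows. Here `a` is the slope of the mediator on x, `b` is the slope of the log rate on the mediator, and the (co)variances would come from the estimated covariance matrix of the parameter estimates; the function name and numbers are hypothetical. The result is on the log-rate scale. Inside Mplus, the same quantity can presumably be obtained by labelling the two slopes and defining their product as a NEW parameter in MODEL CONSTRAINT, which reports a standard error.

```python
import math

def indirect_effect(a, b, var_a, var_b, cov_ab=0.0):
    """Product-of-coefficients indirect effect and first-order delta-method SE.

    a: slope of the mediator on x; b: slope of the log rate on the mediator.
    var_a, var_b, cov_ab: sampling (co)variances of the two estimates.
    The gradient of a*b is (b, a), hence
    Var(ab) ~ b^2 Var(a) + a^2 Var(b) + 2ab Cov(a, b).
    """
    effect = a * b
    variance = b * b * var_a + a * a * var_b + 2.0 * a * b * cov_ab
    return effect, math.sqrt(variance)

# Hypothetical estimates: a = 0.5 (SE 0.1), b = 0.4 (SE 0.1), covariance unknown (0).
est, se = indirect_effect(0.5, 0.4, 0.1 ** 2, 0.1 ** 2)
print(f"indirect effect (log-rate scale) = {est:.3f}, SE = {se:.4f}, z = {est / se:.2f}")
```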

Anonymous posted on Thursday, October 14, 2004  1:04 pm



Hi Linda, Does mplus version 3 handle mediating count variables? thanks a lot. 

bmuthen posted on Thursday, October 14, 2004  1:18 pm



Yes. This is done in the new ML estimation framework. Note that when used in the equation where it is an exogenous variable, as opposed to in the equation where it is endogenous, the count variable is treated as a continuous variable (an observed variable rather than the underlying y*-type log rate).

Anonymous posted on Friday, July 15, 2005  2:16 pm



I am glad to see that Mplus 3 can run zero-inflated Poisson models, using cross-sectional or longitudinal data. I have a couple of questions on this issue: 1) It looks like Mplus only provides log-likelihood values and information criteria for the zero-inflated Poisson model. Is there any way to tell whether a model fits the data? 2) Can I use the log-likelihood values Mplus provides to conduct an LR test comparing the standard Poisson model with the zero-inflated Poisson model? 3) Can I run a zero-inflated negative binomial model in Mplus? 4) Can I run other zero-modified (e.g., zero-deflated and zero-truncated) Poisson models in the current version of Mplus? Thank you very much for your help

bmuthen posted on Friday, July 15, 2005  6:41 pm



1) No fit statistics are offered, so one has to work with nested models. I believe you can request RESIDUAL in the OUTPUT command to check disagreement between observed and estimated. 2) Yes, these models are properly nested, I believe. 3) No, Mplus does not yet have negative binomial modeling. 4) No, that is not implemented yet, unless it is possible to do it via tricks like the one used in Example 7.25 of the Version 3 User's Guide, where the ZIP is done via explicit 2-class modeling.
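For point 2), the likelihood-ratio test only needs the two log-likelihoods and parameter counts from the output. A minimal Python sketch follows; the chi-square tail probability is computed from the series for the regularized incomplete gamma function so no extra libraries are needed, and the log-likelihood values are hypothetical. Two general caveats: if the log-likelihoods come from MLR, the difference should be rescaled with the printed scaling correction factors, and when the restricted model puts a parameter at the edge of its permitted range, the chi-square reference distribution is only approximate.

```python
import math

def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for X ~ chi-square(df).

    Uses the power series for the regularized lower incomplete gamma
    function P(df/2, x/2); adequate for the moderate statistics typical
    of nested-model comparisons (x below a few hundred).
    """
    if df <= 0:
        raise ValueError("df must be positive")
    if x <= 0.0:
        return 1.0
    s, z = df / 2.0, x / 2.0
    term = 1.0 / s  # n = 0 term of sum z**n / (s (s+1) ... (s+n))
    total = term
    n = 1
    while term > total * 1e-16:
        term *= z / (s + n)
        total += term
        n += 1
    lower = math.exp(s * math.log(z) - z - math.lgamma(s)) * total
    return min(1.0, max(0.0, 1.0 - lower))

def lr_test(loglik_restricted, n_par_restricted, loglik_full, n_par_full):
    """Likelihood-ratio test of a restricted model nested in a fuller one."""
    statistic = 2.0 * (loglik_full - loglik_restricted)
    df = n_par_full - n_par_restricted
    return statistic, df, chi2_sf(statistic, df)

# Hypothetical log-likelihoods: Poisson (3 parameters) vs ZIP (4 parameters).
stat, df, p = lr_test(-1523.4, 3, -1498.1, 4)
print(f"LR = {stat:.1f} on {df} df, p = {p:.2e}")
```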

Anonymous posted on Monday, July 18, 2005  9:13 am



Thanks a lot! Hope more zero-modified Poisson models will be integrated in the future version of Mplus. Great service!

bmuthen posted on Monday, July 18, 2005  5:51 pm



Do you have any good writings to recommend with applications of zero-deflated and/or zero-truncated Poisson?


Does Mplus support grouped zero-inflated Poisson indicators, i.e., indicators of the form "1 through 3", "4 through 8", etc.? Thanks


No, Mplus does not support this model. 


Hello, I was studying Example 7.25, which Linda referenced above. Is there a reference (or a discussion of how the output of the two alternative approaches is interpreted)? Did Bengt publish a paper where he used this model on alcohol or substance use data? Thanks


Hi again. I am reading Bengt's posting above on September 22, 2002, where he wrote "I am not aware of any writings on the analysis of Poisson count outcomes using methods for ordered polytomous outcomes." My question is about the other way around: the analysis of discrete (i.e., 6 response options) data representing unequal count intervals (e.g., 0, 1-2, 3-5, 6-9, 10-19, 20-39, 40+) using continuous Poisson models. The choice would appear to be between an ordered polytomous outcome (i.e., 0, 1, 2, 3, 4, 5, 6) or a recoded, somewhat discrete count approximation using interval midpoints (or endpoints). A colleague told me Bengt had written a paper on adolescent alcohol use where a Poisson model was used, although there was some discussion as to whether response options or interval midpoints should have been used. I have been unable to clearly identify this paper, or the subsequent critique. Does this ring any bell? Also, what about the more general question as to the best approach for modeling data obtained from a survey item about number of drinks in the past month, etc.? The zero-inflated Poisson is attractive because many of these high-school students did not drink and reported zero drinks in the past month. But I am worried that the data are not continuous. Our respondents are nested within schools, so with either the discrete or the continuous model, I would want to appropriately handle the nesting. Any advice, references to your own work, pointers to Mplus examples, or other references would be greatly appreciated. Thanks


Answer to your first message of today: No, I have not written about this. One relevant, related reference is the Roeder et al. (1999) JASA article on ZIP mixture modeling.


My 1999 Biometrics paper with Shedden worked with frequency of heavy drinking as the outcome. The outcome was a categorized count outcome such as 0, 1-2 times, 3-5 times, etc. One could argue that this should be handled via some generalized Poisson model; earlier in this thread a "grouped zero-inflated Poisson" model was mentioned, but Mplus does not support that yet. I tend to want to treat such data as ordered categorical. This then takes care of the strong floor effect. Another approach is two-part modeling, which Mplus also supports, and that modeling has the advantage of letting covariates have different impact on the probability of engaging in an activity at all vs. how much. Mplus can handle two-level data in either case.


Thanks Bengt. When you refer to two-part modeling, do you refer to Example 7.25, which uses two classes? Or do you mean a manual split of the data into drinkers and nondrinkers, where I predict frequency of drinking using only the drinkers? Or perhaps I misunderstand and you mean something else?


No, I refer to example 6.16. 
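The data side of two-part modeling can be sketched as follows: each semicontinuous value (e.g., number of drinks) is split into a binary "any use" part and a continuous "amount among users" part that is missing for non-users. This is a minimal Python sketch assuming a cutpoint of 0 and a log transform of the continuous part; to my understanding these mirror the defaults of the Mplus DATA TWOPART command, but check the User's Guide before relying on that. The function name is hypothetical.

```python
import math
from typing import Optional, Tuple

def two_part_split(y: Optional[float], cutpoint: float = 0.0) -> Tuple[Optional[int], Optional[float]]:
    """Split one semicontinuous value into (binary part, continuous part).

    Binary part: 1 if y > cutpoint, else 0.
    Continuous part: log(y) if y > cutpoint, else missing (None).
    A missing y is missing in both parts.
    """
    if cutpoint < 0:
        raise ValueError("cutpoint must be non-negative so that log(y) is defined")
    if y is None:
        return None, None
    if y > cutpoint:
        return 1, math.log(y)
    return 0, None

drinks = [0, 0, 3, 12, None]  # hypothetical responses
print([two_part_split(v) for v in drinks])
```

Covariates can then predict the binary part and the continuous part with separate slopes, which is the advantage described above.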


I have a mediating and outcome variable that are both count variables. The outcome variable is a categorical count variable. The mediating variable is continuous. Below is the syntax that I used to try to run my model, but for some reason the operation cannot be performed. Can you let me know what part of the syntax needs to be adjusted? This is a straightforward regression model with only one LV. Thanks.

    TITLE: 1207 Instr mediation model2
    DATA: FILE IS "C:\1207.dat";
      VARIANCES=CHECK;
    VARIABLE: NAMES ARE id a5 b2 d1 h5 h9 h17 needdepr recgende forsrv lngsocco lngmedia contlang socethn relginf2;
      MISSING are all (999);
      USEVARIABLES ARE a5 b2 d1 h5 h9 h17 needdepr recgende forsrv lngsocco lngmedia contlang socethn relginf2;
      COUNT ARE h5 h9 h17 forsrv;
      CATEGORICAL ARE forsrv;
    ANALYSIS: TYPE=GENERAL;
    MODEL: cultural_integration by lngsocco lngmedia contlang socethn b2;
      h9 on a5 cultural_integration recgende relginf2;
      forsrv on a5 cultural_integration recgende relginf2 h9;
      d1 with a5 cultural_integration recgende relginf2 forsrv needdepr;
      needdepr with a5 cultural_integration recgende relginf2 forsrv;
    OUTPUT: SAMPSTAT RESIDUAL STANDARDIZED;


I don't think you can have WITH statements that include count variables. If this is not the problem, please send your input, data, output, and license number to support@statmodel.com. 


Drs. Muthen & Muthen, On an earlier post, you mentioned that "Mplus is developing missing data facilities for more types of outcomes". (posted on Tuesday, June 25, 2002  2:33 pm). I am developing some models with a count outcome. These models will be analyzed with longitudinal data, so missing data is an issue. Are there capabilities in the latest version of Mplus for handling missing data on count variables? Thanks so much for your time! Bryan 


Yes. Mplus provides maximum likelihood estimation under MCAR (missing completely at random) and MAR (missing at random; Little & Rubin, 2002) for continuous, censored, binary, ordered categorical (ordinal), unordered categorical (nominal), counts, or combinations of these variable types. 


Many Thanks! 


I would like to model indirect effects using a zero inflated count outcome variable. The IV and mediator are both continuous. I receive an error message when I try to model indirect effects. 


Model Constraint with Count Variable: I have a zero-inflated count outcome variable, 2 IVs, and 3 mediators. My dataset has missing data. I am trying to compare the strength of different parameters using bootstrapping to derive confidence intervals. None of the estimators that are available for bootstrapping, however, work with count data or missing data. Can you please advise me of my options?


MODEL INDIRECT is available only for continuous, binary, and ordered categorical dependent variables. Bootstrapping is not available when numerical integration is required. 


I had a brief question. I created a measurement model using both categorical and continuous variables. My final model has correlated errors between the indicators. I would like to create a full model using a count outcome and my measurement model as the exposure. However, I cannot seem to do this in MPlus. Is there any way in the current version of MPlus to include WITH statements with a count outcome? 


You would need to put a factor behind the two indicators as shown in Example 7.16. Note that each residual covariance requires one dimension of numerical integration. 
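Why each extra dimension matters: with fixed-point (rectangular) quadrature, the total number of grid points is the number of points per dimension raised to the number of dimensions. A minimal sketch, assuming 15 points per dimension, which I believe is the Mplus default for standard integration (check the User's Guide for your version); the function name is hypothetical.

```python
def total_integration_points(dimensions: int, points_per_dimension: int = 15) -> int:
    """Grid size for rectangular quadrature over `dimensions` latent dimensions."""
    if dimensions < 0 or points_per_dimension < 1:
        raise ValueError("invalid quadrature settings")
    return points_per_dimension ** dimensions

# One substantive factor plus k residual covariances, each carried by its own factor.
for residual_covariances in range(0, 7):
    d = 1 + residual_covariances
    print(d, total_integration_points(d))
```

For a model like the one in the follow-up post (the factor c plus six residual-covariance factors, i.e., seven dimensions), this gives 15**7 = 170,859,375 grid points, which is why such models become impractical with a full grid. Monte Carlo integration (INTEGRATION = MONTECARLO), to my knowledge, does not grow this way, though it brings its own precision trade-offs.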


Thank you! 


Dr. Muthen, I am having problems with the numerical integration in my model, per your note above. I have 6 correlated error covariances in my original measurement model:

    c by x1 x2 x3 x4 x5 x6;
    f1 by x2 x3;
    . . .
    f6 by x5 x6;
    u1 ON c;

where the x's are my indicators, c is my latent variable of interest, and u1 is my count outcome. What would you recommend I do to facilitate the integration, if anything? Should I change INTEGRATION= in the ANALYSIS command? If so, to what? Thank you!


Not enough information to say  please send your input, output, data, and license number to support@statmodel.com. 

Mary Campa posted on Tuesday, April 21, 2009  9:21 am



Hello. I am trying to estimate a multiple mediator path model with a negative binomial distributed Y, one continuous M, one Poisson distributed M, and a three-level X. The convention in my discipline is to present standardized coefficients; however, Mplus will not provide these with the count mediating variables. Is there any way to get these estimates or a reference I can provide as to why these estimates are not valid? Thank you.


Regression with a count dependent variable does not involve a residual variance parameter. There is also not an underlying continuous DV for which a residual variance can be conceptualized like with logit or probit regression. That's why you don't see standardized count regression coefficients in the literature. You can standardize with respect to the other variables. Note that mediational models are perfectly valid in their raw, unstandardized form. 
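One reading of "standardize with respect to the other variables" is an x-standardization: multiply the count-regression slope by the predictor's standard deviation, which gives the change in the log rate per one-SD change in x; exponentiating that gives a rate ratio per SD. A minimal sketch; the slope and SD are hypothetical, and for a latent predictor the model-estimated SD would be the natural choice.

```python
import math

def x_standardized_slope(b: float, sd_x: float) -> float:
    """Change in the log rate for a one-SD increase in the predictor x.

    Only x is standardized: a count outcome has no residual variance
    (and no underlying continuous y*), so there is no SD of y to divide by.
    """
    return b * sd_x

def rate_ratio_per_sd(b: float, sd_x: float) -> float:
    """Multiplicative change in the expected count per one-SD increase in x."""
    return math.exp(x_standardized_slope(b, sd_x))

# Hypothetical slope of the log rate on x and hypothetical SD of x.
print(x_standardized_slope(0.2, 2.0), rate_ratio_per_sd(0.2, 2.0))
```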

Mary Campa posted on Thursday, April 23, 2009  6:03 am



Thank you for your prompt response. Could you tell me what situation would create a significant raw effect and a very nonsignificant standardized effect (in both STDYX and STDY)? How should I interpret this? 


Such big differences are rare, so there is probably something peculiar about this example. Please send the output and license number to support@statmodel.com. 

Rob Dvorak posted on Thursday, October 22, 2009  11:56 am



Hi Drs. Muthen, I am running a model in which I have two latent variables and their interaction predicting two zero-inflated negative binomial (ZINB) outcomes, with one of the ZINB outcomes predicting the final ZINB outcome as well. Mplus ran the model and terminated normally; however, I just want to be sure that what I am getting is valid. I am wondering because I am only getting a dispersion test for one of the ZINBs (the one that serves as a mediator).


To be able to answer that you need to send your input, output, data, and license number to support@statmodel.com. 


I have an SEM with a latent variable mediating the effect of an intervention on 3 outcome variables (2 count outcomes and 1 binary outcome) and want to be sure I’m calculating and interpreting the effects correctly. For the count outcomes, I used an approach recommended in a previous post and calculated the indirect effects by simply calculating the product of the unstandardized regression coefficient of the intervention on the latent mediator times the unstandardized regression coefficient of the latent mediator on the count outcome variable. Then, I calculated the SE using the delta method; this was also done for the binary outcome. Here are my questions: 1) Did I follow your recommended approach? 2) For the binary outcome, is it appropriate to calculate the % mediation by dividing the indirect effect B by the direct effect (without the mediator) B? For example, x -> m -> y: B = 1.969 and x -> y: B = 0.697; 0.697/1.969 = .354, or 35.4% of the effect of the intervention was mediated by the mediator. 3) Could the approach used in question 2 be used to calculate the % mediation for the count outcomes? 4) Your previous post states that WITH statements can’t be included with count variables, and I’m unclear why. Is there a reference available for not including the correlations? 5) Would it be useful to exponentiate the product terms (i.e., indirect effects) to get a fractional interpretation? Thank you


1. It sounds like you did. Remember that the final dependent variable is the log rate. 2-3. This sounds questionable to me. 4. There is no model-estimated variance for a count variable. See one of the Agresti books on categorical data analysis. 5. I think this would work. When you exponentiate a log rate, it becomes a rate.


Hi, I'm new to path modeling and Mplus (so forgive me if this question is simple!), but I was wondering if there is a way to get fit statistics for path models that have an outcome variable with a negative binomial distribution?


Chi-square and related fit statistics are not available for these models because means, variances, and covariances are not sufficient statistics for model estimation. In these cases, people compare nested models using log-likelihood difference testing and also look at BIC.
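Both quantities can be computed from the log-likelihoods and parameter counts in the output. A minimal Python sketch: BIC is -2 log L + p ln n (smaller is better). For MLR estimates, the plain log-likelihood difference should be rescaled with the scaling correction factors printed next to each log-likelihood; the formula below is the one I understand Mplus documents on its website for this purpose, so verify it there. All numbers are hypothetical.

```python
import math

def bic(loglik: float, n_parameters: int, n_obs: int) -> float:
    """Bayesian information criterion; smaller is better."""
    return -2.0 * loglik + n_parameters * math.log(n_obs)

def scaled_lr_difference(ll0, c0, p0, ll1, c1, p1):
    """Scaled log-likelihood difference test for MLR estimates.

    ll0, c0, p0: log-likelihood, scaling correction factor and number of
    free parameters of the nested (more restricted) model; ll1, c1, p1:
    the same for the comparison model. Returns (TRd, df); refer TRd to a
    chi-square distribution with df degrees of freedom.
    """
    cd = (p0 * c0 - p1 * c1) / (p0 - p1)
    if cd <= 0:
        raise ValueError("non-positive difference-test scaling correction")
    trd = -2.0 * (ll0 - ll1) / cd
    return trd, p1 - p0

# Hypothetical output values for two nested models fitted to n = 500.
print(bic(-1010.0, 5, 500), bic(-1000.0, 7, 500))
print(scaled_lr_difference(-1010.0, 1.15, 5, -1000.0, 1.20, 7))
```

With both scaling correction factors equal to 1 (as under plain ML), the scaled statistic reduces to the ordinary 2 times the log-likelihood difference.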


Thanks for your response; I appreciate it!


Dear Mplus: I have a multi-wave sample of 5000 children involved with the child welfare system. I have measurements of behavior problems (approximately normally distributed) at three times, T2, T3, and T4. I have counts of the number of out-of-home placements experienced by the children at three time intervals: from T1 to T2, from T2 to T3, and from T3 to T4. The counts are highly skewed. For any given interval, about 85% of children experience zero placements, about 7-8% experience one placement only, and about 7-8% experience multiple placements. I am modeling the effects of behavior on placement, placement on behavior, behavior on behavior, and placement on placement. For the regressions on placement counts, I used a zero-inflated negative binomial model. This model worked reasonably well (after appropriate starting values were supplied) and provided sensible, interpretable results. Due to the skewness issue, a journal reviewer has recommended that the placement counts be modeled either as binary or ordered categorical variables. My question is: Are the placement count data so skewed that the zero-inflated negative binomial is not recommended and, thus, a binary or ordered categorical model would be preferred? Just hoping to get an opinion on this. I realize there may not be an “answer.” Thanks in advance. Jim

Rob Dvorak posted on Monday, May 03, 2010  4:39 pm



Hi Drs. Muthen, I have a latent variable interaction predicting a ZINB outcome. There is a significant path from the latent variable interaction to the count portion of the model. I computed the simple slopes à la Aiken & West (i.e., +/- 1 SD on one of the latent variables), and computed the SEs of these slopes using asymptotic covariances. I then exponentiated the simple slope coefficients, making them into incidence rate ratios. Is this an appropriate way to compute simple slopes for nonlinear models? If not, do you know of a reference for computing these? Thanks in advance.


James: I would disagree with the reviewer. A count variable is skewed by definition and the count model is made to handle that. 
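To illustrate why a heavy pile-up at zero is not by itself a problem for a count model, here is a minimal sketch of the zero-inflated negative binomial probability function. It assumes the common NB2 parameterization (variance mu + alpha*mu^2), which I have not confirmed matches Mplus's internal parameterization, and the parameter values are hypothetical, not estimates from the data described above.

```python
import math

def nb_pmf(k: int, mu: float, alpha: float) -> float:
    """Negative binomial P(Y = k) with mean mu and variance mu + alpha * mu**2."""
    r = 1.0 / alpha
    log_p = (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
             + r * math.log(r / (r + mu)) + k * math.log(mu / (r + mu)))
    return math.exp(log_p)

def zinb_pmf(k: int, mu: float, alpha: float, pi_zero: float) -> float:
    """Zero-inflated NB: a structural zero with probability pi_zero,
    otherwise a draw from the NB count distribution."""
    if not 0.0 <= pi_zero < 1.0:
        raise ValueError("pi_zero must be in [0, 1)")
    p = (1.0 - pi_zero) * nb_pmf(k, mu, alpha)
    return p + pi_zero if k == 0 else p

# Hypothetical parameters, not estimates from any data set in this thread.
probs = [zinb_pmf(k, mu=0.5, alpha=1.0, pi_zero=0.5) for k in range(6)]
print([round(p, 3) for p in probs])
```

With these values roughly 83% of the probability sits at zero and the rest trails off across small counts, which is the kind of shape the zero-inflated model is built to describe.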


Rob: I think what you are doing sounds correct. I don't know of a reference. 
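The simple-slope calculation described in the question can be sketched in Python as follows. It assumes a log-rate equation with terms b_x*x + b_z*z + b_xz*x*z, so the slope of the log rate on x at moderator value z is b_x + b_xz*z, with variance Var(b_x) + z^2 Var(b_xz) + 2z Cov(b_x, b_xz). Taking z = +/- 1 SD assumes the moderating factor has mean zero. All names and numbers are hypothetical.

```python
import math

Z95 = 1.959963984540054  # two-sided 95% normal critical value

def simple_slope(b_x, b_xz, z_value, var_b_x, var_b_xz, cov_b):
    """Simple slope of the log rate on x at moderator value z_value.

    Assumed model: log(rate) = ... + b_x * x + b_z * z + b_xz * x * z,
    so d log(rate) / dx = b_x + b_xz * z. cov_b is Cov(b_x, b_xz).
    """
    slope = b_x + b_xz * z_value
    variance = var_b_x + z_value ** 2 * var_b_xz + 2.0 * z_value * cov_b
    se = math.sqrt(variance)
    irr = math.exp(slope)
    # CI formed on the log scale, then exponentiated onto the IRR scale.
    ci = (math.exp(slope - Z95 * se), math.exp(slope + Z95 * se))
    return slope, se, irr, ci

# Hypothetical estimates; moderator SD taken as sqrt of a factor variance of 1.0.
sd_z = math.sqrt(1.0)
for z in (-sd_z, 0.0, sd_z):
    print(z, simple_slope(0.3, 0.2, z, 0.01, 0.04, 0.005))
```

Testing the slope on the log-rate scale and reporting the exponentiated value as an IRR keeps the test and the interval consistent with each other.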


Using the output from a negative binomial model in MPlus, how can I calculate a rate ratio? Is there an option for this using the MPlus code? 


You should exponentiate the coefficient. You can do this in MODEL CONSTRAINT and you will obtain a standard error for it. 
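A minimal sketch of what that computation amounts to, useful for checking the MODEL CONSTRAINT result by hand: the rate ratio is exp(b), and by the delta method its standard error is approximately exp(b)*SE(b). The interval here is built on the log scale and then exponentiated, which keeps it positive; that is a common choice rather than necessarily what Mplus prints. The coefficient and SE are hypothetical.

```python
import math

Z95 = 1.959963984540054  # two-sided 95% normal critical value

def rate_ratio(b: float, se_b: float):
    """Rate ratio exp(b) for a log-rate coefficient b, with delta-method SE.

    d exp(b) / db = exp(b), so SE(exp(b)) is approximately exp(b) * SE(b).
    The CI is built on the log scale and exponentiated.
    """
    rr = math.exp(b)
    se_rr = rr * se_b
    ci = (math.exp(b - Z95 * se_b), math.exp(b + Z95 * se_b))
    return rr, se_rr, ci

print(rate_ratio(0.25, 0.08))  # hypothetical coefficient and SE
```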

Chris posted on Saturday, September 01, 2012  5:35 pm



Is it possible to use a zero-inflated variable as a mediator in an SEM? If yes, is there any special treatment of the variable or in the way to interpret the results? I usually use negative binomial regression when this variable is my DV, as there is too much overdispersion for a Poisson regression. In the SEM my DV is binary. Thank you.


Yes. In the y on m regression it is treated as a continuous variable. For issues related to the computation of indirect effects, see the following paper which is available on the website: Muthén, B. (2011). Applications of causally defined direct and indirect effects in mediation analysis using SEM in Mplus. 

Chris posted on Sunday, September 02, 2012  5:51 pm



Thank you Linda. Much appreciated. 

Tom Booth posted on Thursday, August 01, 2013  12:27 am



Linda/Bengt, I have a CFA model which forms part of a larger SEM which uses both count and categorical variables. I have been asked by a reviewer to provide some information on how well the model fits. As we only have one model, AIC and BIC are not useful. Also, our sample size is moderate to large (700+), so I am not sure how useful the chi-square for the categorical and count portions of the model will be. I was considering the following as options: (1) fix all loadings in the CFA to nominally small values (.01), and compare AIC and BIC from a pseudo-model of no association to the CFA with free loadings; (2) provide the average of the residual matrix. Does this sound reasonable? Tom


You might consider looking at TECH10 and also estimating neighboring models. 

Tom Booth posted on Thursday, August 01, 2013  11:36 am



Thanks Linda. My poor phrasing: by point (2) I was referring to a summary of TECH10. Please excuse my ignorance, but what do you mean by neighbouring models? Tom


A neighboring model would be a model which frees a key parameter that is fixed at zero in the original model. 


I have a path model with three outcomes measured over three time points. Two of the outcomes are continuous and one is a count, and I am using numerical integration. I need to test whether some parameters are significantly different (e.g. stronger) than others. I know that with count outcomes that standardized estimates are not available, and I also read above that bootstrapping is not available when numerical integration is required. Would it be feasible to answer this question by comparing the fit of two models, one where the parameters to be compared are constrained to be equal, and another where they are free? 


You can use chi-square difference testing to test the difference between parameters with the same scale.
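A Wald test of the same equality constraint is an asymptotically equivalent alternative that needs only one run: it uses the two estimates and their sampling covariance, which, as far as I know, Mplus prints with OUTPUT: TECH3. A minimal sketch; the numbers are hypothetical, and the same-scale requirement stated above applies equally here.

```python
import math

def wald_equality_test(est_a, est_b, var_a, var_b, cov_ab):
    """Wald test of H0: a == b for two estimates from the same model.

    var_a, var_b, cov_ab come from the covariance matrix of the estimates.
    Returns the difference, the Wald statistic (1 df) and its p-value.
    """
    diff = est_a - est_b
    var_diff = var_a + var_b - 2.0 * cov_ab
    if var_diff <= 0:
        raise ValueError("variance of the difference must be positive")
    w = diff * diff / var_diff
    p = math.erfc(math.sqrt(w / 2.0))  # upper tail of chi-square(1)
    return diff, w, p

print(wald_equality_test(0.5, 0.3, 0.006, 0.006, 0.001))  # hypothetical values
```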


Thank you very much. So I cannot use chi-square difference testing to test the difference between parameters that do not have the same scale? If not, is there any way to compare these parameters?


No, this is not possible. 

Yvonne LEE posted on Tuesday, April 22, 2014  8:54 am



As a novice to Mplus, I am conducting modeling on 'rape tendency' in a group of offenders and have made enquiries earlier. I have 2 indicators for my rape tendency DV, i.e., the self-reported number of rapes and the official record of rape count. The former indicator scored 0 on 64% of the cases and the latter scored 0 on 79%. Because of the very low frequency of higher counts, I collapsed these 2 indicators into 4 levels: for the self-reported rapes, I collapsed into 0, 1, 2, >=3, and the same for the official rape count. Question 1: Should I treat the 2 indicators as categorical or count data? The estimation terminated normally when treating them as count data, but the error message 'THE RESIDUAL COVARIANCE MATRIX (THETA) IS NOT POSITIVE DEFINITE.' appears if treating them as categorical data. Please advise. Question 2: Should I use two-part modeling? I have tried it and the estimation terminated normally.


You should treat this as a categorical variable, not a count variable. I would not recommend two-part modeling.

Yvonne LEE posted on Wednesday, April 23, 2014  5:16 pm



Thanks for your advice. I have run the analysis treating them as categorical variables, but the error message 'non-positive definite covariance matrices' was shown, involving variable PAbuse. Please advise how to proceed.

    MODEL: f1 by EAbuse PAbuse SAbuse ENeglect CTS_PV;
      f2 by Rape0123 SES0123;
      f2 on f1;

One thing is that both categorical indicators have a U-shaped distribution. Does it matter? A major problem observed in TECH4 is that the correlation between F2 and SES0123 is 1.14. Also, R-square for SES0123 is undefined.


Categorical data methodology can handle variables with both floor and ceiling effects. Please send the output and your license number to support@statmodel.com. 
