

Hi all, could somebody tell me whether Mplus can combine Poisson-distributed count variables (independent as well as dependent) and normally distributed variables in one path model with observed variables? Many thanks in advance, Joerg.


In path analysis, observed outcome variables can be continuous, censored, binary, ordered categorical (ordinal), counts, or combinations of these variable types. In addition, for path analysis with non-mediating outcomes, observed outcome variables can be unordered categorical (nominal). Observed independent variables can be binary or continuous.
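A minimal sketch of such a mixed model, assuming a hypothetical data file `example.dat` with one continuous outcome `y`, one count outcome `u`, and two covariates `x1` and `x2`. All names are placeholders and not taken from this thread:

```
TITLE:    Path model with a continuous and a count outcome (illustrative sketch)
DATA:     FILE = example.dat;     ! hypothetical file name
VARIABLE: NAMES = y u x1 x2;      ! placeholder variable names
          COUNT = u;              ! u is modeled by Poisson regression
ANALYSIS: ESTIMATOR = MLR;
MODEL:    y ON x1 x2;             ! linear regression for the continuous outcome
          u ON y x1;              ! Poisson regression for the count outcome
```

Writing `COUNT = u (i);` instead would request a zero-inflated Poisson model for `u`, with the inflation part referred to as `u#1` in the MODEL command.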


Hi, I have run a path analysis on a longitudinal dataset that combines continuous and count data, with the counts modeled by Poisson regression and zero-inflated Poisson regression. After comparing a few alternative models, I have chosen the best model (the one with the lowest BIC). Now I would like to cross-validate this model on a second dataset, but I have two doubts:
1) How can I evaluate the goodness of the model in the cross-validation when Poisson regression is involved? I guess that I cannot use the traditional tests of model fit...
2) What is the syntax for the cross-validation? Are there specific aspects that I should take into account?
The syntax for my model follows:
VARIABLE: NAMES ARE y1-y6 u1-u3 x1-x2;
USEVARIABLES ARE y1-y6 u1-u2 x1 x2;
COUNT IS u1;
COUNT IS u2 (i);
MODEL:
y1 on x1 x2;
y2 on x1 y1 u1;
y3 on x1 y1 y2 u2;
y4 on x1 x2;
y5 on y4 x2 u1;
y6 on y4 y5 x2 u2;
u1 on x1 x2;
u2 on x1 x2;
u2#1 on x1 x2;
y1 y2 y3 (3);
y4 y5 y6 (4);
y6 with y3@0; y6 with y1@0; y6 with y2@0;
y1 with y3@0; y2 with y1@0; y2 with y3@0;
Thank you very much for your time and support. michela


I'm sure there is a cross-validation literature out there, but I am not familiar with it. With count variables, chi-square and related fit statistics are not available. Nested models can be compared using 2 times the loglikelihood difference, which is distributed as chi-square. As far as cross-validation goes, I would look at the pattern of significance across the two data sets. You could also do a multiple group analysis.
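The nested-model comparison can be written out as follows. The symbols are introduced here for illustration: $\log L_0$ and $\log L_1$ are the loglikelihoods of the more restricted and the less restricted model, and $p_0$ and $p_1$ are their numbers of free parameters.

```latex
\Delta = -2\,\bigl(\log L_0 - \log L_1\bigr) \;\sim\; \chi^2_{\,p_1 - p_0}
```

Note that with the MLR estimator the raw difference is generally not chi-square distributed; the usual approach is to adjust it with the scaling correction factors printed alongside the loglikelihood values in the output.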


Dear Linda, Thank you very much. Best, michela 


We are running an SEM model in which we want to use zero-inflated Poisson regression to predict each of 4 count criterion variables. We have checked each count variable to ensure that it has only integer values. Each time we try to run the analysis, we get an error message saying "There is at least one observation in the data set where a count variable has negative or non-integer values." None of the 4 variables shows anything but 0, 1, 2, 3, 4, or 5. We cannot fix it and need help. Thank you.


It sounds like you have blanks in the data set. This is not allowed with free format data. If you can't figure it out, please send the input, data, output, and your license number to support@statmodel.com. 
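One way to rule out blanks in a free-format file is to write an explicit missing-value code into every empty cell and declare that code in the input. A sketch, assuming a hypothetical file `counts.dat`, placeholder variable names, and -999 as the chosen flag:

```
DATA:     FILE = counts.dat;       ! hypothetical file name
VARIABLE: NAMES = u1-u4 x1 x2;     ! placeholder variable names
          MISSING ARE ALL (-999);  ! -999 is an assumed flag; no cell is left blank
          COUNT = u1-u4 (i);       ! (i) requests zero-inflated Poisson for each count
```

The flag has to be a value that cannot occur as a real observation on any variable in the file.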

sahar shadi posted on Wednesday, August 22, 2012  11:21 pm



Dear all, I am constructing an SEM model with 1 exogenous variable and 1 final endogenous variable. In between there are 5 key mediators (2 latent and 3 observed). My endogenous variable is a count. I used AMOS 18, but I understand that I have to do this analysis with Mplus. Is my analysis completely wrong, or is it just better to do it with Mplus? I am a beginner. Please help me. Thank you very much.


You should not treat a count variable as a continuous variable. This will result in improper results. 

sahar shadi posted on Thursday, August 23, 2012  10:29 pm



Thanks for the answer. I also ran this model in SMARTPLS. Is it wrong in SMARTPLS as well? Thank you in advance.


It would be incorrect if you do not treat the count variable as a count variable. I don't know anything about SMARTPLS so I can't say. 

sahar shadi posted on Friday, August 24, 2012  9:51 am



Thank you very much Linda 


Hello, I am running a model in which I have a count predictor (IV), and continuous mediators and outcomes. Can I use COUNT = IV in Mplus, when the predictor is count (and not the DV)? Would it be appropriate to specify a predictor as a count variable? Thank you so much 


The scale of predictors is not taken into account in regression. Only the scale of dependent variables matters. All predictors are treated as continuous variables. 

Andy Daniel posted on Thursday, June 05, 2014  4:52 am



Two quick questions: Is there any problem with using a count variable as a mediator? If there is no problem, it should also be possible to build a simple Markov chain with several count variables measured at different time points, right?


A count mediator (M) presents a problem as I see it. In the regression M ON X; we can specify M as count and do a Poisson regression. But what do we do with M in the regression Y on M; ? ML estimation in Mplus would treat M as continuous which contradicts the first regression. There is no underlying latent response variable M* for counts so WLSMV and Bayes can't use that approach. I see it as an unresolved research area. 


Hello, I'm trying to perform what I believe is a simple analysis. I have a latent variable predicting a count variable and I want to examine group differences. From what I've seen, I have to use the KNOWNCLASS command, but I can't seem to figure out how to satisfy all of the requirements to get the analysis to run. This is the model I *want* to run:
USEVARIABLES ARE g2choice ind1-ind3;
MISSING IS .;
GROUPING = GROUP (1 = g1 2 = g2 3 = g3);
COUNT = outcome;
MODEL:
FAC by ind1-ind3;
outcome on FAC;
Could you clarify how I should fix my syntax? Also, with the corrected syntax, will I be able to constrain the ON path across groups to test for differences? I imagine that it won't be quite the same as using the MODEL subgroup commands. Thank you!


As an example of using Knownclass, see UG ex 8.8, where cg is the Knownclass group variable. Ignore the c variable. Then add
%Overall%
FAC BY ....
outcome on FAC@0;
%cg#1%
outcome on FAC (p1);
%cg#2%
outcome on FAC (p2);
%cg#3%
outcome on FAC (p3);
Then you can do any tests you want with p1-p3, for instance test that they are the same using Wald testing in Model Test.


Thank you so much! That was very helpful. Also, to clarify for anyone else who may come across this, my previous example had a typo; g2choice should have been 'outcome' in the usevariables command. My model converged and I just want to be sure I am doing this correctly. Could you verify if I used the CLASS and KNOWNCLASS commands correctly?
USEVARIABLES ARE outcome ind1-ind3;
MISSING IS .;
CLASSES = group(3);
KNOWNCLASS = group (group = 1 group = 2 group = 3);
COUNT = outcome;
MODEL:
%Overall%
FAC by ind1-ind3;
outcome on FAC@0;
%group#1%
outcome on FAC(p1);
%group#2%
outcome on FAC(p2);
%group#3%
outcome on FAC(p3);
ANALYSIS: ALGORITHM=INTEGRATION; TYPE = MIXTURE;
I also have a follow-up question. I want to know which group has the strongest association between FAC and OUTCOME. Knowing they are different is obviously the first step. Will that look like... MODEL TEST: p1=p2; and will I conclude they are different if the Wald test is significant? Finally, to test which group has the strongest association, I would typically look at something like RSQUARE. Is there an equivalent for count variables? Thank you again for all your help!


Group should be on the NAMES = list. The latent class name should not be the same as this group variable name. Mplus does not test for the strongest effect, but equality is tested as you mention, although written as 0 = p1 - p2; For counts there is no meaningful R2 of the usual kind. You may want to ask these more syntax-related questions on Support.
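Putting these corrections together, the input might look like the sketch below. The data file name, the order of the NAMES list, and the class label `cg` are assumptions made for illustration; the model statements follow the ones posted above.

```
DATA:     FILE = mydata.dat;                 ! hypothetical file name
VARIABLE: NAMES = group outcome ind1-ind3;   ! assumed column order
          USEVARIABLES = outcome ind1-ind3;
          MISSING = .;
          CLASSES = cg (3);                  ! class label differs from the group variable
          KNOWNCLASS = cg (group = 1 group = 2 group = 3);
          COUNT = outcome;
ANALYSIS: TYPE = MIXTURE;
          ALGORITHM = INTEGRATION;
MODEL:    %OVERALL%
          FAC BY ind1-ind3;
          outcome ON FAC@0;
          %cg#1%
          outcome ON FAC (p1);               ! slope in group 1
          %cg#2%
          outcome ON FAC (p2);               ! slope in group 2
          %cg#3%
          outcome ON FAC (p3);               ! slope in group 3
MODEL TEST:
          0 = p1 - p2;                       ! the two statements are tested jointly
          0 = p1 - p3;
```

With both statements in MODEL TEST, the Wald test is a joint test that all three slopes are equal. Keeping only the first statement tests groups 1 and 2 against each other.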


Hi all, I want to run an SEM with two dependent count variables that are zero-inflated and at least five independent continuous variables, so I use zero-inflated Poisson regression as in example 3.8. If I use more than two independent variables for one dependent count variable, the error message in the output is: “COUNT VARIABLE HAS LARGE VALUES. IT MAY BE MORE APPROPRIATE TO TREAT SUCH VARIABLES AS CONTINUOUS” My questions are: 1.) If possible, how can I transform zero-inflated dependent count variables into continuous ones (with the Poisson assumption)? 2.) Do I need a PC with more power? 3.) In another post about the same error I read that it is possible to use the two-part model. But that was in the case of an LGM, so I’m not sure whether this solution also works in my case. Thank you and best regards, Christoph


Please send the output, the data set, and your license number to support@statmodel.com. 

Sarah Arpin posted on Thursday, February 22, 2018  10:31 am



Hello, I am having the same issue as others have posted, with the following error message when trying to model a DV as count: *** ERROR There is at least one observation in the data set where a count variable has negative or non-integer value. Please check your data and format statement. There are no negative or non-integer values in my DVs, and there are no blanks in the data. Do you have any suggestions for how to proceed? Thank you.


Sounds like there is something off in the input related to data reading. One way to check that your input is correct is to use the Savedata command to see that the analysis variables contain what you expect. 
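A sketch of that check, where the output file name `check.dat` is a placeholder:

```
SAVEDATA: FILE = check.dat;   ! hypothetical file name; writes the analysis variables as Mplus read them
```

The end of the output should then list the order and format of the saved variables, so the saved values can be compared column by column with the original file. If a count variable shows values that belong to a different variable, the NAMES list or the format of the data file is the likely culprit.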

Sarah Arpin posted on Friday, February 23, 2018  4:18 pm



Thank you for your fast response and your suggestion. I checked the input file using the Savedata command and all looks fine. I also opened the file within the Mplus Editor, and deleted the strange character at the beginning of the file, resaved, and still received the same error message. There are no blanks in the file. Do you have any other recommendations? Thank you, Sarah 


Then you need to send your output and data to Support along with your license number. 


Hello, What type of standardization should be used when a count variable is the dependent variable in growth curve analyses? STDY or STDYX? Thanks! Hillary 


You should standardize only with respect to the growth factors, so use STD. 


Is that the unstandardized estimate? 


No, it is standardized with respect to the growth factors. 


Ok, thank you! 


Also, why is the STD estimation most appropriate for count data? 


Count DVs don't have a residual, so standardizing with respect to such a DV doesn't really make sense. You can standardize with respect to the predictors of such a DV, and the growth factors are such predictors. STD standardizes with respect to factors, including growth factors. That's the reasoning.
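For reference, a sketch of a linear growth model for a count outcome with standardized output requested. The file name, the variable names `u1-u4`, and the time scores are assumptions for illustration:

```
DATA:     FILE = growth.dat;            ! hypothetical file name
VARIABLE: NAMES = u1-u4;                ! placeholder names for four repeated counts
          COUNT = u1-u4;
MODEL:    i s | u1@0 u2@1 u3@2 u4@3;    ! intercept and slope growth factors
OUTPUT:   STANDARDIZED;                 ! then read the STD results
```

Of the standardizations that STANDARDIZED can produce, STD is the one discussed above, because it standardizes only with respect to the growth factors `i` and `s`.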


I have an independent variable in my model that is a count variable (v). Everything I have read about count variables in Mplus is about dependent count variables, not independent ones. v's kurtosis and skewness are 3 and 1.4. When I use "COUNT is v(i);", the results and fit indices are different. The question is: when the count variable is not the dependent variable in a model, should I use the statement "COUNT is v(i);", or just run the analysis with MLR and treat v like the other variables? Any comment would be appreciated. Cheers


Count as an IV is problematic. There isn't an underlying continuous latent response variable that you can use for linearly predicting a DV from it. If you treat it as just another continuous predictor, you have to ask if the scores 0, 1, 2,.. are meaningful for producing a linear prediction of a DV. 


Thanks a lot. I totally agree with you and will try to find a way to make the distribution of the count variable more similar to a normal distribution, maybe using transformations. But this count variable is an independent variable in a large model. If I include "COUNT is v(i);", will a zero-inflated Poisson regression be estimated? If yes, this is a problem, because my outcome variable is continuous and I need a normal regression. 1) So one question is: what will happen in Mplus if an independent variable, rather than a dependent variable, is specified as a count variable? 2) Is there a way to tell Mplus that an independent variable is a count variable (so that Mplus takes this into account), but still run a normal regression? Many thanks, Ebi


Q1: Yes. 1. It will be treated as continuous. 2. Only by assuming that the continuous scores 0, 1, 2... are relevant. You can categorize it and treat it as a Categorical variable using either WLSMV or Bayes, in which case an underlying continuous latent X* variable will be the linear predictor.


Thanks a million for your useful answers. All the best, Ebi. 
