Count Data Within Path Models
Mplus Discussion > Structural Equation Modeling >
 Joerg Luedicke posted on Thursday, September 15, 2005 - 1:38 am
hi all,

Could somebody tell me whether Mplus can combine Poisson-distributed count variables (independent as well as dependent) and normally distributed variables in one path model with observed variables? Many thanks in advance, Joerg.
 Linda K. Muthen posted on Thursday, September 15, 2005 - 8:15 am
In path analysis, observed outcome variables can be continuous, censored, binary, ordered categorical (ordinal), counts, or combinations of these variable types. In addition, for path analysis for non-mediating outcomes, observed outcome variables can be unordered categorical (nominal). Observed independent variables can be binary or continuous.
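As a rough illustration of the reply above, the following minimal Mplus input sketches a path model with one continuous and one count outcome. The file name and the variable names (y1, u1, x1, x2) are hypothetical, and MLR is only one possible estimator choice.

```mplus
TITLE:    Sketch: path model mixing a continuous and a count outcome
DATA:     FILE = example.dat;        ! hypothetical file name
VARIABLE: NAMES = y1 u1 x1 x2;       ! hypothetical variable names
          USEVARIABLES = y1 u1 x1 x2;
          COUNT = u1;                ! u1 is modeled as a Poisson count
ANALYSIS: ESTIMATOR = MLR;           ! maximum likelihood with robust SEs
MODEL:    y1 ON x1 x2;               ! linear regression for continuous y1
          u1 ON y1 x1;               ! Poisson (log-link) regression for u1
```

Adding (i) after the variable name on the COUNT list, as in later posts in this thread, would request a zero-inflated Poisson model instead.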
 Michela Addis posted on Monday, May 11, 2009 - 7:03 am
I have run a path analysis on a longitudinal dataset which combines continuous and count data, treated with Poisson regression and zero-inflated Poisson regression. After comparing a few alternative models, I have chosen the best model (the one with the lowest BIC). Now I would like to cross-validate this model on a second dataset, but I have two questions:
1) How can I evaluate the goodness of the model for the cross-validation when Poisson regression is involved? I guess that I cannot use the traditional tests of model fits...
2) What is the syntax for the cross-validation? Are there some specific aspects that I should take into account? The syntax for my model follows:
VARIABLE: NAMES ARE y1-y6 u1-u3 x1-x2;
USEVARIABLES ARE y1-y6 u1-u2 x1 x2;
COUNT IS u2 (i);

MODEL:
y1 on x1 x2;
y2 on x1 y1 u1;
y3 on x1 y1 y2 u2;
y4 on x1 x2;
y5 on y4 x2 u1;
y6 on y4 y5 x2 u2;
u1 on x1 x2;
u2 on x1 x2;
u2#1 on x1 x2;
y1 y2 y3 (3);
y4 y5 y6 (4);
y6 with y3@0;
y6 with y1@0;
y6 with y2@0;
y1 with y3@0;
y2 with y1@0;
y2 with y3@0;

Thank you very much for your time and support.
 Linda K. Muthen posted on Tuesday, May 12, 2009 - 9:02 am
I'm sure there is a cross-validation literature out there. I am not familiar with it.

With count variables, chi-square and related fit statistics are not available. Nested models can be compared using -2 times the loglikelihood difference which is distributed as chi-square.

As for cross-validation, I would look at the pattern of significance across the two data sets. You could also do a multiple group analysis.
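The likelihood-ratio comparison mentioned above can be written out as follows. Here $\ell_0$ and $\ell_1$ are the maximized loglikelihoods of the restricted (nested) model and the less restricted model, and $p_0$ and $p_1$ are their numbers of free parameters; this notation is introduced here and is not from the post. The form below applies to plain ML loglikelihoods. With a robust estimator such as MLR, the difference needs a scaling correction first.

```latex
\Delta = -2\,(\ell_0 - \ell_1), \qquad
\Delta \;\overset{\text{approx.}}{\sim}\; \chi^2_{\,p_1 - p_0}
\quad \text{under the restricted model}
```

With made-up values $\ell_0 = -1520.4$ and $\ell_1 = -1515.1$ and one extra free parameter, $\Delta = 10.6$ on 1 degree of freedom, which exceeds the 5% critical value of 3.84.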
 Michela Addis posted on Sunday, May 24, 2009 - 1:00 am
Dear Linda,
Thank you very much.
Best, michela
 Tamika Zapolski posted on Wednesday, November 02, 2011 - 2:08 pm
We are running an SEM model in which we want to use zero-inflated Poisson regression to predict each of 4 count variable criteria. We have checked each count variable to ensure that each has only integer values. Each time we try to run the analysis, we get an error message saying "There is at least one observation in the data set where a count variable has negative or non-integer values." None of the 4 variables shows anything but 0, 1, 2, 3, 4, or 5. We cannot fix it and need help. Thank you.
 Linda K. Muthen posted on Wednesday, November 02, 2011 - 3:43 pm
It sounds like you have blanks in the data set. This is not allowed with free format data. If you can't figure it out, please send the input, data, output, and your license number to
 sahar shadi posted on Wednesday, August 22, 2012 - 11:21 pm
Dear all, I am now constructing an SEM model with 1 exogenous variable and 1 final endogenous variable. In between there are 5 key mediators (2 latent and 3 observed). My endogenous variable is a count. I used AMOS18, but I know I have to do this analysis with Mplus. Is my analysis completely wrong? Or is it better to do it with Mplus? I am a beginner. Please help me. Thank you very much.
 Linda K. Muthen posted on Thursday, August 23, 2012 - 8:31 am
You should not treat a count variable as a continuous variable. This will result in improper results.
 sahar shadi posted on Thursday, August 23, 2012 - 10:29 pm
Thanks for the answer.

I also ran this model with SMARTPLS.

Is it also wrong in SMARTPLS?

Thank you in advance.
 Linda K. Muthen posted on Friday, August 24, 2012 - 6:28 am
It would be incorrect if you do not treat the count variable as a count variable. I don't know anything about SMARTPLS so I can't say.
 sahar shadi posted on Friday, August 24, 2012 - 9:51 am
Thank you very much Linda
 Camilla Overup posted on Tuesday, August 20, 2013 - 7:51 am

I am running a model in which I have a count predictor (IV), and continuous mediators and outcomes. Can I use COUNT = IV in Mplus when the predictor, and not the DV, is a count? Would it be appropriate to specify a predictor as a count variable?

Thank you so much
 Linda K. Muthen posted on Tuesday, August 20, 2013 - 8:09 am
The scale of predictors is not taken into account in regression. Only the scale of dependent variables matters. All predictors are treated as continuous variables.
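A minimal sketch of what this reply implies for the input, assuming hypothetical variable names iv (the count-valued predictor), m (a continuous mediator), and y (a continuous outcome):

```mplus
VARIABLE: NAMES = iv m y;        ! hypothetical names; iv holds counts 0, 1, 2, ...
          USEVARIABLES = iv m y;
          ! iv is only a predictor, so it is left off the COUNT list
MODEL:    m ON iv;               ! iv enters as an ordinary continuous covariate
          y ON m iv;
```

Because only dependent variables get a distributional model, the COUNT option would matter here only if a count variable appeared on the left-hand side of an ON statement.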
 Andy Daniel posted on Thursday, June 05, 2014 - 4:52 am
Two quick questions: Is there any problem with using a count variable as a mediator? If there is no problem, it should also be possible to build a simple Markov chain with several count variables measured at different time points, right?
 Bengt O. Muthen posted on Thursday, June 05, 2014 - 2:12 pm
A count mediator (M) presents a problem as I see it. In the regression

M on X;

we can specify M as count and do a Poisson regression. But what do we do with M in the regression

Y on M;

? ML estimation in Mplus would treat M as continuous which contradicts the first regression. There is no underlying latent response variable M* for counts so WLSMV and Bayes can't use that approach. I see it as an unresolved research area.
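Written as equations, the tension described above looks roughly like this. The predictor name $X$ and the coefficient symbols are notation assumed here for illustration.

```latex
\begin{aligned}
M \mid X &\sim \text{Poisson}(\lambda), & \log \lambda &= a_0 + a_1 X \\
Y &= b_0 + b_1 M + \varepsilon
\end{aligned}
```

The first line treats M as a count with a log-linear mean. The second line uses the raw count M as a continuous linear predictor, and there is no latent M* that could serve as the predictor instead.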
 Daniel Forster posted on Monday, January 18, 2016 - 2:26 pm

I'm trying to perform what I believe is a simple analysis. I have a latent variable predicting a count variable and I want to examine group differences. From what I've seen, I have to use the KNOWNCLASS command, but I can't seem to figure out how to satisfy all of the requirements to get the analysis to run.

This is the model I *want* to run:

USEVARIABLES ARE g2choice ind1-ind3;
GROUPING = GROUP (1 = g1 2 = g2 3 = g3);
COUNT = outcome;

FAC by ind1-ind3;
outcome on FAC;

Could you clarify how I should fix my syntax? Also, with the corrected syntax, will I be able to constrain the ON path across groups to test for differences? I imagine that it won't be quite the same as using the MODEL subgroup commands.

Thank you!
 Bengt O. Muthen posted on Monday, January 18, 2016 - 2:36 pm
As an example of using Knownclass, see UG ex 8.8, where cg is the Knownclass group variable. Ignore the c variable. Then add

FAC BY ....
outcome on FAC@0;

outcome on FAC (p1);
outcome on FAC (p2);
outcome on FAC (p3);

Then you can do any tests you want with p1-p3, for instance test that they are the same using Wald testing in Model Test.
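Putting the pieces of this reply together, a fuller input might look like the sketch below. It follows the layout of UG ex 8.8 with only the known-class variable cg. The file name, the NAMES list, and the use of ALGORITHM = INTEGRATION are assumptions, and so are the %OVERALL% and %cg#k% class labels, which do not appear in the post as shown here.

```mplus
DATA:     FILE = example.dat;                 ! hypothetical file name
VARIABLE: NAMES = group outcome ind1-ind3;    ! assumed column order
          USEVARIABLES = outcome ind1-ind3;
          COUNT = outcome;
          CLASSES = cg (3);                   ! cg is the latent class name
          KNOWNCLASS = cg (group = 1 group = 2 group = 3);
ANALYSIS: TYPE = MIXTURE;
          ALGORITHM = INTEGRATION;            ! assumed: factor predicting a count
MODEL:    %OVERALL%
          FAC BY ind1-ind3;
          outcome ON FAC@0;
          %cg#1%
          outcome ON FAC (p1);                ! slope in group 1
          %cg#2%
          outcome ON FAC (p2);                ! slope in group 2
          %cg#3%
          outcome ON FAC (p3);                ! slope in group 3
MODEL TEST:
          0 = p1 - p2;                        ! together: joint Wald test that
          0 = p2 - p3;                        ! all three slopes are equal (2 df)
```

A single line such as 0 = p1 - p2; would instead test only that pair of groups.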
 Daniel Forster posted on Monday, January 18, 2016 - 3:25 pm
Thank you so much! That was very helpful. Also, to clarify for anyone else who may come across this, my previous example had a typo; g2choice should have been 'outcome' in the usevariables command.

My model converged and I just want to be sure I am doing this correctly. Could you verify whether I used the CLASSES and KNOWNCLASS commands correctly?

USEVARIABLES ARE outcome ind1-ind3;
CLASSES = group(3);
KNOWNCLASS = group (group = 1 group = 2 group = 3);
COUNT = outcome;

FAC by ind1-ind3;
outcome on FAC@0;
outcome on FAC(p1);
outcome on FAC(p2);
outcome on FAC(p3);


I also have a follow-up question. I want to know which group has the strongest association between FAC and OUTCOME. Knowing they are different is obviously the first step. Will that look like...


and will I conclude they are different if the Wald test is significant?

Finally, to test which group has the strongest association, I would typically look at something like R-SQUARE. Is there an equivalent for count variables?

Thank you again for all your help!
 Bengt O. Muthen posted on Monday, January 18, 2016 - 5:34 pm
Group should be on the NAMES = list. The latent class name should not be the same as this group variable name.

Mplus does not test for strongest effect, but equality is tested like you mention, although as

0 = p1 - p2;

For counts there is no meaningful R2 of the usual kind.

You want to ask these more syntax-related questions on Support.
 Christoph Weiss posted on Wednesday, August 24, 2016 - 1:15 am
Hi all,

I want to run an SEM with two dependent count variables that are zero-inflated and at least five independent continuous variables. So I use zero-inflated Poisson regression as in example 3.8.
If I use more than two independent variables for one dependent count variable, the error message in the output is:


My Questions are:

1.) If possible, how can I transform zero-inflated dependent count variables into continuous variables (under the Poisson assumption)?

2.) Do I need a more powerful PC?

3.) In another post about the same error I read that it’s possible to use the two-part model. But that was in the case of an LGM, so I’m not sure whether this solution also works in my case.

Thank you and best regards,

 Linda K. Muthen posted on Wednesday, August 24, 2016 - 9:00 am
Please send the output, the data set, and your license number to
 Sarah Arpin posted on Thursday, February 22, 2018 - 10:31 am

I am having the same issues as others have posted with the following error message, when trying to model a DV as count:

There is at least one observation in the data set where a count variable
has negative or non-integer value. Please check your data and format statement.

There are no negative or non-integer values in my DVs, and there are no blanks in the data. Do you have any suggestions for how to proceed? Thank you.
 Bengt O. Muthen posted on Thursday, February 22, 2018 - 3:26 pm
Sounds like there is something off in the input related to data reading. One way to check that your input is correct is to use the Savedata command to see that the analysis variables contain what you expect.
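One way to set up the check suggested above is a descriptive run that saves the analysis variables. This is only a sketch: the file names, the variable names, and the -999 missing-value code are hypothetical.

```mplus
DATA:     FILE = mydata.dat;          ! hypothetical input file
VARIABLE: NAMES = id u1 u2 x1 x2;     ! must match the column order of the file
          USEVARIABLES = u1 u2 x1 x2;
          MISSING = ALL (-999);       ! hypothetical code used instead of blanks
          ! COUNT is left off here so the count check does not stop the run
ANALYSIS: TYPE = BASIC;               ! descriptive statistics only
SAVEDATA: FILE = check.dat;           ! analysis variables as read by Mplus
```

If the values in check.dat or the sample statistics do not match the original file, the NAMES list or the file layout may be misaligned with the data, so that a column other than the intended one is being read as the count variable.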
 Sarah Arpin posted on Friday, February 23, 2018 - 4:18 pm
Thank you for your fast response and your suggestion. I checked the input file using the Savedata command and all looks fine. I also opened the file within the Mplus Editor, and deleted the strange character at the beginning of the file, resaved, and still received the same error message. There are no blanks in the file. Do you have any other recommendations?

Thank you,

 Bengt O. Muthen posted on Friday, February 23, 2018 - 4:40 pm
Then you need to send your output and data to Support along with your license number.
 Hillary Gorin posted on Friday, April 05, 2019 - 6:08 pm

What type of standardization should be used when a count variable is the dependent variable in growth curve analyses?


 Bengt O. Muthen posted on Saturday, April 06, 2019 - 1:04 pm
You should standardize only with respect to the growth factors, so use STD.
 Hillary Gorin posted on Saturday, April 06, 2019 - 1:21 pm
Is that the unstandardized estimate?
 Bengt O. Muthen posted on Saturday, April 06, 2019 - 3:39 pm
No, it is standardized with respect to the growth factors.
 Hillary Gorin posted on Saturday, April 06, 2019 - 4:03 pm
Ok, thank you!
 Hillary Gorin posted on Saturday, April 06, 2019 - 4:13 pm
Also, why is the STD estimation most appropriate for count data?
 Bengt O. Muthen posted on Monday, April 08, 2019 - 9:23 am
Count DVs don't have a residual, so standardizing with respect to such a DV doesn't really make sense. You can standardize with respect to predictors of such a DV, and the growth factors are such predictors. STD standardizes with respect to factors, including growth factors. That's the reasoning.
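As a sketch of how this could be requested, the input below fits a linear growth model to four count outcomes and asks for standardized output, from which the STD results would be read. The variable names, time scores, and file name are hypothetical, and ALGORITHM = INTEGRATION is an assumption about what such a model needs.

```mplus
DATA:     FILE = growth.dat;            ! hypothetical file name
VARIABLE: NAMES = u1-u4 x;              ! four repeated counts and a covariate
          COUNT = u1-u4;
ANALYSIS: ALGORITHM = INTEGRATION;      ! assumed for growth factors with counts
MODEL:    i s | u1@0 u2@1 u3@2 u4@3;    ! intercept and slope growth factors
          i s ON x;                     ! covariate predicting the growth factors
OUTPUT:   STANDARDIZED;                 ! standardized solutions; read the STD one
```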
 Ebrahim Hamedi posted on Thursday, July 04, 2019 - 9:05 pm
I have an independent variable in my model that is a count variable (v). Everything I have read about count variables in Mplus is about dependent count variables, not independent ones. v's kurtosis and skewness are 3 and 1.4. When I use "COUNT is v(i);" the results and fit indices are different.
The question is: when the count variable is not the dependent variable in a model, should I use the statement "COUNT is v(i);", or just run the analysis with MLR and treat v like the other variables? Any comment would be appreciated. Cheers
 Bengt O. Muthen posted on Friday, July 05, 2019 - 3:40 pm
Count as an IV is problematic. There isn't an underlying continuous latent response variable that you can use for linearly predicting a DV from it. If you treat it as just another continuous predictor, you have to ask if the scores 0, 1, 2,.. are meaningful for producing a linear prediction of a DV.
 Ebrahim Hamedi posted on Friday, July 05, 2019 - 10:25 pm
Thanks a lot. I totally agree with you and will try to find a way to make the distribution of the count variable closer to a normal distribution, maybe using transformations. But this count variable is an independent variable in a large model.

If I include "COUNT is v(i);", will a zero-inflated Poisson regression be estimated? If yes, this is a problem, because my outcome variable is continuous and I need a normal regression.

1- So one question is: what will happen in Mplus if an independent variable, rather than a dependent variable, is specified as a count variable?

2- Is there a way to tell Mplus that an independent variable is a count variable (so that Mplus takes this into account) but still run a normal regression?

many thanks,
 Bengt O. Muthen posted on Saturday, July 06, 2019 - 3:22 pm
Q1: Yes.

1. It will be treated as continuous.

2. Only by assuming that the continuous scores 0, 1, 2... are relevant.

You can categorize it and treat it as a Categorical variable using either WLSMV or Bayes, in which case an underlying continuous latent X* variable will be the linear predictor.
 Ebrahim Hamedi posted on Saturday, July 06, 2019 - 9:33 pm
Thanks a million for your useful answers. All the best, Ebi.