Count variable correlation matrix
 Tracy Witte posted on Thursday, August 11, 2011 - 9:35 am
I am working on revisions for a manuscript. One of the reviewers requested a correlation matrix for all of the variables I used in my negative binomial regression model. My DV is a count variable (with a NB distribution), as are some of my predictor variables. Other predictor variables are categorical and continuous. Is it possible to obtain correlations between count variables and other types of variables? Or, should I just report correlations for the binary and continuous variables, and say that it's inappropriate to run correlations with count variables?
 Linda K. Muthen posted on Friday, August 12, 2011 - 8:50 am
Correlations are not relevant for count variables. You should report the correlations for the binary and continuous variables.
 Tracy Witte posted on Friday, August 12, 2011 - 9:05 am
Thank you so much! Is there a citation I can use for this, or is this pretty much common knowledge?
 Bengt O. Muthen posted on Friday, August 12, 2011 - 9:17 am
Don't know of a citation - I think it can be called common knowledge. There is no such thing as a polychoric/polyserial correlation with count variables because there is not an underlying continuous response variable formulation for counts (at least not one that is generally accepted). And regular Pearson product-moment correlations are not suitable for variables that have such a limited range (non-negative integer values).
 Moira Haller posted on Sunday, October 02, 2011 - 6:49 pm
For count predictor variables, is it appropriate to present correlations with a square root transformation (or some other transformation) of the count variable in order to approximate the relationship between the count variable and the other variables in the model?

Does your answer to this question change if the predictor count variable is zero-inflated?
 Linda K. Muthen posted on Monday, October 03, 2011 - 12:06 pm
I would not present the correlations. These are not appropriate measures for a variable with a limited range.

You could consider a log(count + 1) transformation.
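As a rough illustration of the log(count + 1) transformation suggested here, this is a minimal Python sketch. Python is an assumption; the thread itself concerns Mplus, and this is not Mplus syntax. The helper name `log_count_plus_one` is hypothetical, and the natural log is assumed because the post does not specify a base.

```python
import math


def log_count_plus_one(counts):
    """Return log(count + 1) for each non-negative integer count.

    Adding 1 before taking the natural log keeps zero counts defined:
    log(0 + 1) = 0, so observations at zero stay at zero.
    """
    result = []
    for c in counts:
        if c < 0:
            raise ValueError("counts must be non-negative")
        # math.log1p(c) computes log(1 + c), accurate even for small c
        result.append(math.log1p(c))
    return result


# Illustrative counts only (hypothetical data)
print(log_count_plus_one([0, 1, 3, 10]))
```

The transformation compresses large counts more than small ones (10 maps to about 2.4), while zeros remain at 0.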
 Moira Haller posted on Monday, October 03, 2011 - 5:03 pm
To clarify, are you saying that it would be appropriate to present correlations for a log transformed count variable, or is it never appropriate to look at correlations with a count variable?

Relatedly, when examining a count variable with an excess of zeros as an exogenous predictor variable, do you think one should use a log transformation? I was also considering using a truncated version of the count variable (0, 1, or 2+).
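A minimal Python sketch of the truncated 0, 1, 2+ coding described in this post. Whether the truncated variable enters the model as a single numeric score or as two dummy indicators (with 0 as the reference category) is a modeling choice the thread does not settle, so both are shown. The names `truncate_count` and `dummy_code` are hypothetical.

```python
def truncate_count(count, top=2):
    """Collapse a count into 0, 1, ..., top, where `top` stands for "top or more"."""
    if count < 0:
        raise ValueError("count must be non-negative")
    return min(count, top)


def dummy_code(category, levels=(1, 2)):
    """Indicator variables for the non-reference levels; 0 is the reference."""
    return [1 if category == level else 0 for level in levels]


# Hypothetical zero-heavy predictor values
counts = [0, 0, 0, 1, 2, 5, 12]
truncated = [truncate_count(c) for c in counts]
print(truncated)                           # [0, 0, 0, 1, 2, 2, 2]
print([dummy_code(t) for t in truncated])  # [[0, 0], [0, 0], [0, 0], [1, 0], [0, 1], [0, 1], [0, 1]]
```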

Thanks in advance for your help! Surprisingly, I can't find very much information about count variables as independent variables, despite lots of information about counts as DVs.
 Bengt O. Muthen posted on Monday, October 03, 2011 - 8:29 pm
Never appropriate, really, given that such variables have a limited range and pile up at zero (censoring).

That could be a good idea, because there are probably diminishing degrees of influence as the counts get high. The question is which transformation (if any) gives the best linear relationship with the DV.
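One way to make the "which transformation gives the best linear relationship" question concrete is to fit a straight line of the DV on each candidate coding of the count predictor and compare residual sums of squares (RSS). The Python sketch below does this with ordinary least squares; the helper names, the candidate list, and the use of OLS are all assumptions for illustration, not a procedure from the thread. With a count DV, as in the original question, comparing the fit of the actual count regression (for example, its log-likelihood) under each coding would be the more appropriate criterion.

```python
import math


def residual_ss(x, y):
    """Residual sum of squares after a least-squares fit of y = a + b*x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    syy = sum((yi - my) ** 2 for yi in y)
    if sxx == 0:
        return syy  # a constant predictor explains nothing
    # Clamp tiny negative values caused by floating-point rounding
    return max(0.0, syy - sxy * sxy / sxx)


# Hypothetical candidate codings of the count predictor, including "none" (raw counts)
CANDIDATES = {
    "none": lambda c: float(c),
    "sqrt(count)": lambda c: math.sqrt(c),
    "log(count + 1)": lambda c: math.log1p(c),
    "truncated 0/1/2+": lambda c: float(min(c, 2)),
}


def rank_transformations(counts, outcome):
    """Return (name, RSS) pairs, best linear fit (smallest RSS) first."""
    scores = [
        (name, residual_ss([f(c) for c in counts], outcome))
        for name, f in CANDIDATES.items()
    ]
    return sorted(scores, key=lambda pair: pair[1])


# Toy data: the outcome is built to be exactly linear in log(count + 1),
# so that coding should come out on top.
counts = [0, 0, 1, 2, 3, 5, 8, 13]
outcome = [1.0 + 2.0 * math.log1p(c) for c in counts]
for name, rss in rank_transformations(counts, outcome):
    print(f"{name:>18}: RSS = {rss:.4f}")
```

Because all candidates use the same DV, a smaller RSS simply means that coding lines up better with the DV on a straight line; it is a screening aid, not a reported correlation.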
 Leon posted on Wednesday, August 28, 2013 - 12:20 am
Apologies if this is not the right thread, but I believe others will also find this thread from Google if they have the same question.

I am using categorical data with WLSMV, but I am also including two continuous variables: age and a personality sum score. Is it normal that these two variables are not in the correlation matrix in the TECH4 output?

Thank you.
 Leon posted on Wednesday, August 28, 2013 - 12:39 am
Apologies for a second post, but similarly, when a continuous variable is included, the STDYX output shows only the betas, with no other information such as standard errors.
 Linda K. Muthen posted on Wednesday, August 28, 2013 - 10:16 am
TECH4 usually contains only latent variable information and RESIDUAL contains observed variable information.

With WLSMV, when the model has covariates, not all standardizations are given, and standard errors of the standardized estimates are not given. This will change in the next version of Mplus.
 Leon posted on Wednesday, August 28, 2013 - 11:15 am
Thanks for the response Linda.

May I ask why it does not currently provide them, or was it just something on the to-do list?

Also, any ideas on when the next version will release? ;)
 Linda K. Muthen posted on Wednesday, August 28, 2013 - 5:57 pm
We didn't get to it until now.

A few months.