Tracy Witte posted on Thursday, August 11, 2011 - 9:35 am
I am working on revisions for a manuscript. One of the reviewers requested a correlation matrix for all of the variables I used in my negative binomial regression model. My DV is a count variable (with a NB distribution), as are some of my predictor variables. Other predictor variables are categorical and continuous. Is it possible to obtain correlations between count variables and other types of variables? Or, should I just report correlations for the binary and continuous variables, and say that it's inappropriate to run correlations with count variables?
I don't know of a citation; I think it can be called common knowledge. There is no such thing as a polychoric/polyserial correlation with count variables, because counts have no underlying continuous response variable formulation (at least not a generally accepted one). Regular Pearson product-moment correlations are also unsuitable for variables with such a limited range (non-negative integer values).
For count predictor variables, is it appropriate to present correlations with a square root transformation (or some other transformation) of the count variable in order to approximate the relationship between the count variable and the other variables in the model?
Does your answer to this question change if the predictor count variable is zero-inflated?
To clarify, are you saying that it would be appropriate to present correlations for a log-transformed count variable, or that it is never appropriate to look at correlations with a count variable?
Relatedly, when a count variable with an excess of zeros is used as an exogenous predictor, do you think one should apply a log transformation? I was also considering a truncated version of the count variable (0, 1, or 2+).
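The codings raised in this post can be sketched in a few lines of Python. This is a minimal illustration, not a recommendation from the thread: the function name `transform_count` and the choice of dummy coding with 0 as the reference category are assumptions. The log version uses log(x + 1) because log(0) is undefined when the variable has zeros.

```python
import math


def transform_count(x: int) -> dict:
    """Candidate codings of one non-negative count for use as a predictor.

    Hypothetical helper for illustration only.
    """
    if x < 0:
        raise ValueError("counts must be non-negative")
    return {
        "raw": float(x),
        "sqrt": math.sqrt(x),
        # log(0) is undefined, so shift by 1 first: log1p(x) == log(x + 1)
        "log1p": math.log1p(x),
        # Truncated 0 / 1 / 2+ version as two dummy variables (0 is the reference)
        "is_1": 1 if x == 1 else 0,
        "is_2plus": 1 if x >= 2 else 0,
    }


if __name__ == "__main__":
    for count in [0, 1, 2, 7]:
        print(count, transform_count(count))
```

Coding the truncated version as two dummies lets the effect of "1" and "2+" differ freely. A single capped score such as min(x, 2) would instead assume equal spacing between the three levels.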
Thanks in advance for your help! Surprisingly, I can't find very much information about count variables as independent variables, despite lots of information about counts as DVs.
Never appropriate, really, given that such variables have a limited range and pile up at zero (censoring).
That could be a good idea, because there are probably diminishing degrees of influence at high counts. The question is which transformation (or none) gives the best linear relationship with the DV.
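One way to act on "which transformation gives the best linear relationship" is to refit the count model once per candidate coding of the predictor and compare log-likelihoods. The sketch below swaps in a Poisson regression with a log link, fit by Newton-Raphson in pure Python, for the negative binomial model discussed in the thread. This keeps it dependency-free, but it is not the NB model itself. The helper names `fit_poisson_loglik` and `compare_transforms`, and the capped score min(c, 2) used for the truncated version, are hypothetical choices. Each candidate has one intercept and one slope, so the log-likelihoods are directly comparable (equivalent to comparing AIC).

```python
import math


def fit_poisson_loglik(x, y, max_iter=100, tol=1e-10):
    """Fit log(mu) = b0 + b1*x by Newton-Raphson; return (b0, b1, loglik).

    Hypothetical helper. Requires at least one non-zero count in y.
    """
    def loglik(c0, c1):
        total = 0.0
        for xi, yi in zip(x, y):
            eta = c0 + c1 * xi
            if eta > 700:  # exp(eta) would overflow; treat as a hopeless fit
                return float("-inf")
            total += yi * eta - math.exp(eta) - math.lgamma(yi + 1)
        return total

    b0, b1 = math.log(sum(y) / len(y)), 0.0  # start at the intercept-only fit
    current = loglik(b0, b1)
    for _ in range(max_iter):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector U and Fisher information I (the negative Hessian here)
        u0 = sum(yi - mi for yi, mi in zip(y, mu))
        u1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        i00 = sum(mu)
        i01 = sum(mi * xi for xi, mi in zip(x, mu))
        i11 = sum(mi * xi * xi for xi, mi in zip(x, mu))
        det = i00 * i11 - i01 * i01
        if det <= 0:
            break  # predictor has no variance; keep the intercept-only fit
        d0 = (i11 * u0 - i01 * u1) / det
        d1 = (i00 * u1 - i01 * u0) / det
        # Step halving: only accept a step that does not lower the log-likelihood
        step = 1.0
        while True:
            c0, c1 = b0 + step * d0, b1 + step * d1
            new = loglik(c0, c1)
            if new >= current or step < 1e-8:
                break
            step /= 2
        if new < current:
            break  # no improving step found; treat as converged
        b0, b1 = c0, c1
        improvement, current = new - current, new
        if improvement < tol:
            break
    return b0, b1, current


def compare_transforms(counts, dv):
    """Log-likelihood of a one-predictor Poisson model for each coding."""
    candidates = {
        "raw": [float(c) for c in counts],
        "sqrt": [math.sqrt(c) for c in counts],
        "log1p": [math.log1p(c) for c in counts],
        "capped_at_2": [float(min(c, 2)) for c in counts],
    }
    return {name: fit_poisson_loglik(xs, dv)[2] for name, xs in candidates.items()}
```

A higher log-likelihood marks the coding whose linear effect on the log of the expected count fits these data better. With real data one would refit the actual negative binomial model (with the other covariates) for each coding and compare fit in the same way.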
Leon posted on Wednesday, August 28, 2013 - 12:20 am
Apologies if this is not the right thread, but I believe others will also find this thread from Google if they have the same question.
I am using categorical data with WLSMV, but I am also including two continuous variables: age and a personality sum score. Is it normal that those two variables do not appear in the correlation matrix in the TECH4 output?
Leon posted on Wednesday, August 28, 2013 - 12:39 am
Apologies for a second post, but similarly, when a continuous variable is included, the STDYX standardized results show only the betas, with no other information such as standard errors.