CFA of count data
 J.D. Haltigan posted on Thursday, February 13, 2014 - 11:54 am

Is it possible to model count data in an IRT framework using a ZIP (zero-inflated Poisson) model? I would like to adopt an IRT/latent trait model for the count data to generate IRT graphics like those generated in the dichotomous case. The only way I have been able to do this so far is to recode the zero-inflated data to presence/absence, and I am still not certain of the implications of doing so.
 Bengt O. Muthen posted on Friday, February 14, 2014 - 10:50 am
Yes. I think we have an example of that in the UG.
 J.D. Haltigan posted on Saturday, February 15, 2014 - 12:44 pm
Thank you. I also consulted short course Topic 2 notes re: IRT using NLSY data as an example. I have two remaining questions.

Is it possible to generate item characteristic curves when data are specified as count?

In a ZIP count model, would not estimates from the binary component approximate those obtained when data are treated as categorical in prototypical IRT models?

Many thanks.
 Linda K. Muthen posted on Sunday, February 16, 2014 - 11:13 am
We don't give ICCs for count variables.

The estimates from the binary part of the ZIP model and from the categorical IRT model should be similar.
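
For readers who want to draw an item characteristic curve by hand for the binary (presence/absence) version of an item, here is a minimal Python sketch. It assumes a logit link and a factor with mean 0 and variance 1, in which case the discrimination equals the loading and the difficulty equals threshold/loading. The loading and threshold values are hypothetical, not estimates from this thread.

```python
import math

def icc(theta, a, b):
    """2PL item characteristic curve: P(u = 1 | theta)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def to_irt(loading, threshold):
    """Convert a logit loading/threshold to (a, b).

    Assumes factor mean 0 and variance 1, so that
    -threshold + loading * theta == a * (theta - b).
    """
    return loading, threshold / loading

# Hypothetical estimates for one binary item.
a, b = to_irt(loading=1.5, threshold=0.75)

# Points on the curve for theta from -3 to 3 in steps of 0.5.
curve = [(t / 2.0, icc(t / 2.0, a, b)) for t in range(-6, 7)]
```

The list `curve` can be passed to any plotting tool. The probability is exactly 0.5 at theta = b, which is why b is read as the item location.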
 J.D. Haltigan posted on Sunday, February 16, 2014 - 11:59 pm
Thank you. I just noticed in the Topic 2 notes for the NLSY example that the estimator used was MLR rather than WLSMV, the default estimator for categorical indicators. Was this decision made to better deal with the non-normality of the data?

The reason I ask is that I get somewhat different item discrimination results depending on the estimator I choose. Incidentally, the items most affected are those that, when treated as count data, have the narrowest range of values.
 Linda K. Muthen posted on Monday, February 17, 2014 - 2:03 pm
Which slide are you looking at?

I am not aware of any estimator choice for a count outcome other than maximum likelihood.
 J.D. Haltigan posted on Monday, February 17, 2014 - 2:10 pm
Slide 98 for NLSY example with categorical indicators.
 Linda K. Muthen posted on Monday, February 17, 2014 - 2:18 pm
We use MLR because we are doing a logistic IRT. WLSMV provides probit regression only.
 J.D. Haltigan posted on Monday, February 17, 2014 - 10:52 pm
Thank you. Estimates from the IRT model where I treat outcomes as binary (0/1) categorical and from the zero-inflated part of the model where I treat data as count are indeed similar (both using MLR of course).

Loading estimate magnitudes for the binary IRT model using MLR are somewhat different from those for the same model when WLSMV is used (as are the associated p-values, most of which are NS using MLR). My best guess is that this is due to how MLR handles non-normality. Are there any other likely reasons why the loading estimates would differ somewhat between the two estimators?
 Linda K. Muthen posted on Tuesday, February 18, 2014 - 6:07 am
WLSMV is probit. MLR is logistic. Missing data handling differs between the two estimators. Normality of the observed variables is not an issue with categorical data methodology.
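
As a rough illustration of the probit/logistic point: the two links describe nearly the same curve on different scales, so a logistic slope is about 1.7 times the matching probit slope. The Python sketch below uses only the standard library. The slope value and the 1.702 scaling constant (the usual approximation) are illustrative assumptions, not output from the models discussed here.

```python
import math

def logistic_cdf(x):
    return 1.0 / (1.0 + math.exp(-x))

def probit_cdf(x):
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

D = 1.702             # conventional logistic-to-probit scaling constant
probit_slope = 0.8    # hypothetical probit loading
logit_slope = D * probit_slope

# Factor values from -4 to 4 in steps of 0.01.
grid = [i / 100.0 for i in range(-400, 401)]

# Largest gap between the curves when the logistic slope is rescaled ...
gap_rescaled = max(
    abs(logistic_cdf(logit_slope * f) - probit_cdf(probit_slope * f)) for f in grid
)
# ... and when the same numeric slope is used under both links.
gap_same_slope = max(
    abs(logistic_cdf(probit_slope * f) - probit_cdf(probit_slope * f)) for f in grid
)
```

With the rescaling, the two curves stay within about 0.01 of each other; without it the gap is much larger. Raw loadings from the two estimators are therefore not directly comparable, quite apart from the missing data difference.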
 J.D. Haltigan posted on Saturday, March 29, 2014 - 11:10 pm
I wanted to follow up on this thread, as I am following Wang (2010) in attempting to understand results from the IRT-ZIP model described above. More specifically, in the binary portion of the IRT-ZIP model, would the parameter estimates (discrimination/location) essentially describe the items' ability to predict the structural zero group (where p(u#=1)), whereas in the count portion they would describe the items' ability to predict the estimated frequency/count given that the subject is in the Poisson process?

Also, would the intercepts in the CFA count model be analogous to the item difficulty/location parameters when data are treated as categorical (rather than count)? I ask because I am attempting to compare results from a 2PLM, where I treated the data as binary so as to generate ICCs, with the ZIP-IRT approach.
 Bengt O. Muthen posted on Sunday, March 30, 2014 - 4:31 pm
In a factor analysis model such as UG ex 5.4, the latent binary u# variables are not related to the factor, but only their means are estimated. So the loadings/discriminations refer to the count part of the variable.

The intercepts for counts play the role of negative thresholds for binary items.
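
The structure described in this reply can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not Mplus output: the inflation part has only an intercept (the mean of u#, on the logit scale) and does not depend on the factor, while the count part has a log rate of intercept + loading * f. All function names and parameter values are hypothetical.

```python
import math

def zip_pmf(k, f, intercept, loading, inflation_logit):
    """P(y = k | f) for a zero-inflated Poisson factor indicator."""
    # Probability of the structural-zero class (u# = 1); no factor involved.
    pi = 1.0 / (1.0 + math.exp(-inflation_logit))
    # Poisson rate for the count part: log(mu) = intercept + loading * f.
    mu = math.exp(intercept + loading * f)
    poisson = math.exp(-mu) * mu ** k / math.factorial(k)
    return (pi if k == 0 else 0.0) + (1.0 - pi) * poisson

def binary_item_prob(f, threshold, loading):
    """P(u = 1 | f) for a binary logit item: logit = -threshold + loading * f."""
    return 1.0 / (1.0 + math.exp(-(-threshold + loading * f)))

def expected_count(f, intercept, loading, inflation_logit):
    """E(y | f) = (1 - pi) * mu under the ZIP model."""
    pi = 1.0 / (1.0 + math.exp(-inflation_logit))
    return (1.0 - pi) * math.exp(intercept + loading * f)
```

The sign convention is the point of the second sentence of the reply: a larger count intercept raises the expected count at a given factor value, whereas a larger binary threshold lowers the endorsement probability, so the intercept behaves like a negative threshold.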
 J.D. Haltigan posted on Sunday, March 30, 2014 - 9:04 pm
Thanks, Dr. Muthen. So if I am understanding correctly, the factor loadings are, in a sense, 'controlling' for the part of the model that is captured by the perfect zero state? Said differently, I am trying to better understand how this ZIP model differs from the same CFA of count data using a plain Poisson process.
 Bengt O. Muthen posted on Monday, March 31, 2014 - 8:22 am
The factor loadings and intercepts are better estimated when ZIP is used due to a need to handle the excess number of zeros.
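
A small numerical sketch of what "excess zeros" means here, in Python. It compares the proportion of zeros implied by a ZIP variable with the proportion implied by a plain Poisson variable that has the same mean. The inflation probability and rate are hypothetical values chosen for illustration.

```python
import math

pi = 0.3   # hypothetical probability of a structural zero
mu = 2.0   # hypothetical Poisson rate for the count part

# ZIP: zeros come from the structural-zero class and from the Poisson part.
zip_mean = (1.0 - pi) * mu
zip_zero_prob = pi + (1.0 - pi) * math.exp(-mu)

# Plain Poisson forced to reproduce the same overall mean.
poisson_zero_prob = math.exp(-zip_mean)

shortfall = zip_zero_prob - poisson_zero_prob
```

Because the plain Poisson model has a single rate, it can match the mean or the number of zeros but not both. When the data contain structural zeros, this mismatch is absorbed by the intercepts and loadings, which is one way to read the remark that they are better estimated under ZIP.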
 Emily Lowthian posted on Tuesday, April 09, 2019 - 5:31 am
Hi there,

I have conducted a CFA (n=9557) with seven count variables, using a negative binomial regression and MLR estimator (default).

I have not used STD (or any derivative), as you mention that count variables cannot be standardized. The chi2 suggests good model fit (X=45494.7, df=77826, p=1.00). The factor loadings look fine (.54-.95); however, the variance for the latent variable is 3.106. It is usually 1.00 when I have done CFA with categorical variables. Do you know why the variance for the latent variable is so large? Is this normal for CFA with count variables?

Many thanks,

Emily Lowthian
 Bengt O. Muthen posted on Tuesday, April 09, 2019 - 5:36 pm
You can use STD, which standardizes with respect to the factors. I would not necessarily worry about these large factor loadings.
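
To illustrate what standardizing with respect to the factor does, here is a minimal Python sketch. It assumes that this kind of standardization rescales each loading by the factor's standard deviation, so that the factor has variance 1. The variance 3.106 is taken from the question above; the raw loadings are hypothetical, with 1.0 standing in for a loading fixed for identification.

```python
import math

factor_variance = 3.106          # reported in the question above
raw_loadings = [1.0, 0.54, 0.95]  # hypothetical raw loadings

# Rescale so the factor has unit variance: loading * sd(factor).
factor_sd = math.sqrt(factor_variance)
std_loadings = [lam * factor_sd for lam in raw_loadings]

# The model is unchanged: loading * f equals std_loading * (f / sd).
f = 1.3  # arbitrary factor score on the original scale
raw_term = raw_loadings[1] * f
std_term = std_loadings[1] * (f / factor_sd)
```

Seen this way, a factor variance of 3.106 with the first loading fixed at 1 and a factor variance of 1.00 with all loadings free are two scalings of the same model, which is why the size of the variance by itself is not a warning sign.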