
Jon Elhai posted on Friday, November 30, 2012  6:13 pm



Linda, do you have sample input syntax for a CFA measurement model in which the observed variables are count variables and the factor loadings are zero-inflated negative binomial regression paths? I could not find zero-inflated syntax examples in the Mplus manual.


Use the regression example ex3.8 as a guide.

Jon Elhai posted on Saturday, December 01, 2012  11:25 am



Bengt. Would I simply substitute "BY" for the "ON" statements in example 3.8? Or is there something else needed to transfer the syntax in 3.8 to a CFA model? 


I think using BY is all you have to do. 

Jon Elhai posted on Tuesday, December 04, 2012  7:59 am



Bengt. You mentioned that taking example 3.8 and substituting BY for the ON statements would give me negative binomial or Poisson (or zero-inflated) regression paths within CFA. But in 3.8, the variable preceding the word "ON" is the count dependent variable, whereas in a "BY" command for CFA the variable preceding the word "BY" would not be the count variable but the latent factor; the variables after the "BY" would be the count variables. So how would I turn 3.8 into a CFA with something like negative binomial factor loadings estimated?


f BY y; is the same as y ON f; So you just specify u1-u10, say, as negative binomial counts: COUNT = u1-u10 (nb); and then say f BY u1-u10;
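
The reply above can be sketched as a complete input file. This is a minimal sketch, not the manual's example: the data file name counts.dat and the indicator names u1 through u10 are hypothetical, and the ANALYSIS settings assume maximum likelihood with numerical integration, which Mplus uses when count indicators load on a continuous factor.

```
TITLE:    CFA with negative binomial count indicators (sketch)
DATA:     FILE = counts.dat;        ! hypothetical file name
VARIABLE: NAMES = u1-u10;           ! hypothetical variable names
          COUNT = u1-u10 (nb);      ! (nb) = negative binomial
ANALYSIS: ESTIMATOR = MLR;
          ALGORITHM = INTEGRATION;  ! integrate over the factor
MODEL:    f BY u1-u10;              ! equivalent to u1-u10 ON f
```

Replacing (nb) with (nbi) should request the zero-inflated negative binomial version, and (i) the zero-inflated Poisson.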

Tom Booth posted on Wednesday, June 12, 2013  4:13 am



Dear Linda/Bengt, I am fitting a CFA model with 5 indicators: 3 are ordered categorical and 2 are count variables with a high proportion of zeros. The model is fit with MLR and numerical integration with a logit link. I have a number of questions about this model: 1) Is it reasonable to fit zero-inflated parameters within CFA? 2) If the answer to (1) is yes, does the inflation parameter get included as a factor indicator? e.g. f by u1 u1#1 u2 u2#1 .... 3) I am also not sure which combination of standardizations I would need to report in this model for the values to be comparable. For example, I assume the inflation parameters would be STDY, as this part is binary. Having run a couple of variants of this model, I tend to receive estimates and p-values in the raw and STD solutions, but estimates of 1.00 and associated 999 for the count variables in the STDYX results. Any assistance would be warmly received. As always, apologies if this has been answered elsewhere, but I have struggled to find it. Thanks, Tom


1. Yes. 2. You would have factors with count indicators and factors with inflation indicators. You would not use them in the same factors. 3. Standardization is not done with count variables. 
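
Point 2 might look like the following in a model like the one described, with three ordered categorical and two zero-inflated count indicators. This is only a sketch under assumptions: the variable names y1-y3, u1, u2 and the factor names f and finf are hypothetical, and the choice of (nbi) rather than (i) is illustrative. With only two inflation indicators, the second factor may need additional constraints to be well identified.

```
VARIABLE: NAMES = y1-y3 u1 u2;      ! hypothetical names
          CATEGORICAL = y1-y3;      ! ordered categorical
          COUNT = u1 (nbi) u2 (nbi);  ! zero-inflated negbin
ANALYSIS: ESTIMATOR = MLR;
          ALGORITHM = INTEGRATION;
          LINK = LOGIT;
MODEL:    f BY y1-y3 u1 u2;         ! count parts on one factor
          finf BY u1#1 u2#1;        ! inflation parts on another
```

Here u1#1 and u2#1 refer to the inflation parts of the count variables, as in the regression example ex3.8.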

Tom Booth posted on Wednesday, June 12, 2013  2:17 pm



Thanks Linda. Can I ask why you would fit different factors for the inflation indicators? Is this because zero-inflation models assume that 2 processes underlie the patterns of responses in the count variables, and thus 2 latent factors are required? Best, Tom


What influences the inflation probabilities may be different from what influences the number of counts among those in the nonzero class. 
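
One common way to write the two processes for a zero-inflated negative binomial indicator is sketched here. The notation is assumed for illustration and does not come from the thread or from Mplus output: \pi_j is the probability of the zero-only class for indicator u_j, \mu_j is the count mean, f and f_{inf} are the two factors, and the dispersion parameter is suppressed.

```latex
\begin{aligned}
P(u_j = 0) &= \pi_j + (1 - \pi_j)\, P_{NB}(0 \mid \mu_j), \\
P(u_j = k) &= (1 - \pi_j)\, P_{NB}(k \mid \mu_j), \qquad k = 1, 2, \dots \\
\operatorname{logit}(\pi_j) &= \nu_j + \lambda_j^{inf} f_{inf}, \\
\log \mu_j &= \alpha_j + \lambda_j f .
\end{aligned}
```

The inflation loading \lambda_j^{inf} and the count loading \lambda_j sit in separate equations, so each part can depend on its own factor.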

Tom Booth posted on Wednesday, June 12, 2013  11:27 pm



Thanks both. In sum: 1) It's fine to model inflation parameters in CFA. 2) Model them on a separate factor to capture different influences. 3) Report unstandardized values. Sorry for what may be a further very simplistic question, but is there a good reference/reading on why counts are not standardized? Thanks


In a model where a residual variance is not an estimated parameter, standardization with respect to y cannot be done. You can standardize with respect to x. I know of no article that addresses this. 

Rob Dvorak posted on Monday, April 21, 2014  8:35 am



Hi there, I'm running some CFA models using behavioral observation data where the variables are counts (the number of X type of utterance), but unlike most count variables, their distribution does not really approach a Poisson or Negative Binomial distribution because it's extremely skewed (e.g., the median may be 2 or 3, but valid cases have counts over 50). In fact, when I run count models in Mplus, it gives a warning that counts exceed 50 and perhaps a continuous model would be better. My sense is that there is no ideal model for data distributed this way (count data that are extremely positively skewed), but I figured I would ask if you have any recommendations. 


Is there a substantive reason for the counts over 50? Do they represent a different subpopulation?
