
Chelsea Jin posted on Thursday, March 01, 2012  9:19 pm



I've recently worked on a project involving correlated errors between a count variable and a continuous variable. For example, I have equations like: y1 ON x1-x5; y2 ON x1-x6; Say y1 is a count variable; it could be negative binomial or zero-inflated. y2 is approximately normally distributed. I want to specify "y1 WITH y2"; however, that statement doesn't apply to a count variable with a continuous one. So I'm looking for a solution. Many thanks! 


In your situation, each residual covariance requires one dimension of numerical integration. You need to specify them using the BY option, for example, f BY y1@1 y2; f@1; [f@0]; where the factor loading for y2 is the residual covariance parameter. 
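A short derivation of why the y2 loading is the covariance, assuming a log link for the count y1, a factor f that is normal with mean 0 and variance 1 (f@1; [f@0];), and f independent of the residual ε₂ and of the x's. The symbols α, γ, β, λ, and ε₂ are introduced here only for illustration:

```latex
\log \mu_{1} = \alpha_1 + \boldsymbol{\gamma}'\mathbf{x} + 1 \cdot f, \qquad
y_2 = \alpha_2 + \boldsymbol{\beta}'\mathbf{x} + \lambda f + \varepsilon_2, \qquad
f \sim N(0, 1)

\operatorname{Cov}(1 \cdot f,\; \lambda f + \varepsilon_2)
  = \lambda \operatorname{Var}(f) + \operatorname{Cov}(f, \varepsilon_2)
  = \lambda \cdot 1 + 0 = \lambda
```

Under these assumptions the same factor also adds Var(f) = 1 to the log-rate of y1 and λ² to the residual variance of y2.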


Oh, thanks so much, Linda! But does it matter whether y1 is negative binomial, Poisson, or zero-inflated? In addition, what if I want to regress y2 on y1, say "y2 ON x1-x6 y1"? Should I take further steps to account for y1 being a count variable? I would appreciate your reply. 


No, it does not matter what type of count model you are estimating. You cannot regress y2 on y1 if you also have a residual covariance between y2 and y1: the two parameters cannot both be identified. 


Oh, this is a different question... I mean there's no residual covariance between y2 and y1 this time. It's just a continuous variable regressed on a count one. The count variable may be the outcome of another regression, like "y1 ON x1-x5; y2 ON x1-x6 y1;", so y1 is a mediator. I think I read some notes saying that in Mplus, if a count variable is a predictor, it is treated as a continuous variable, and even if it's a mediator, it's still treated as continuous. Am I right? Is there any other way to handle a count variable as a mediator? Many thanks. 


When a count variable is a mediator, it is treated as a count variable when it is a dependent variable and as a continuous variable when it is an independent variable. 
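For the mediation model y1 ON x1-x5; y2 ON x1-x6 y1;, the two roles can be written out as a sketch, assuming a Poisson count model with a log link for y1 (the coefficient symbols are illustrative):

```latex
y_1 \mid \mathbf{x} \sim \text{Poisson}(\mu_1), \qquad
\log \mu_1 = \alpha_1 + \sum_{k=1}^{5} \gamma_k x_k

y_2 = \alpha_2 + \sum_{k=1}^{6} \beta_k x_k + b\, y_1 + \varepsilon_2
```

In the first equation y1 is the count outcome; in the second, its observed value enters linearly, just like any continuous predictor.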

Chelsea Jin posted on Sunday, March 04, 2012  12:06 pm



Hi, I have another question about correlated residuals. Now I have three regressions: y1 ON x1-x5; y2 ON x1-x6; y3 ON x1-x5; As before, y1 is a count and y2 is continuous; y3 could be either count or continuous. If I want all three residuals to be mutually correlated, should I say: if y3 is continuous: f BY y1@1 y2 y3; f@1; [f@0]; if y3 is a count: f BY y1@1 y3@1 y2; f@1; [f@0]; or do I need two factors, one for y1 and the other for y3, like: f1 BY y1@1 y2; f1@1; [f1@0]; f2 BY y3@1 y2; f2@1; [f2@0]; f1 WITH f2; I'm not sure which one is correct... Then for the first one, "f BY y1@1 y2 y3", I get factor loadings for y2 and y3, but how can I get the residual correlation between y2 and y3? The same question applies to the second setup. Many thanks. 


For each pair of residuals you need one factor. 
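To see how quickly this grows, here is a small Python sketch, assuming one factor per pair of residuals (as stated above), one integration dimension per factor, and a rectangular product grid with Q points per dimension. The value Q = 15 is only an illustrative choice, not a claim about Mplus defaults, and the function names are hypothetical.

```python
from math import comb


def n_pair_factors(n_outcomes: int) -> int:
    """Number of pairwise residual-covariance factors for n mutually correlated outcomes."""
    return comb(n_outcomes, 2)


def grid_points(q: int, dims: int) -> int:
    """Size of a rectangular quadrature grid with q points in each of `dims` dimensions."""
    return q ** dims


Q = 15  # illustrative number of integration points per dimension (assumption)

for n in (2, 3, 4, 6):
    k = n_pair_factors(n)
    print(f"{n} outcomes -> {k} factors -> {grid_points(Q, k):,} grid points")
```

With three outcomes, as in the question above, three factors are needed, and the product grid already has Q³ points; this is one reason models with many such factors become slow to estimate by ML.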


Hmmm... but how do I correlate the residuals of two count variables, since both factor loadings would be 1...? Thanks~ 


They are not both one: f BY c1@1 c2; f@1; [f@0]; where the factor loading of c2 is the covariance. 


But it's still confusing~ Mplus also estimates the correlations among the factors... How can I know that the correlations among the factors are not the residual correlations...? I would appreciate your reply. 


The factors should be uncorrelated: f1 WITH f2@0; etc. 
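The answer above can be checked with a little algebra: the covariance the factors add among the outcomes is Λ Φ Λ′, where Λ holds the loadings and Φ is the factor covariance matrix. The pure-Python sketch below uses hypothetical loading values for the three-outcome question above, with one factor per pair (f12, f13, f23 are illustrative names). With Φ = I, each off-diagonal element equals exactly one free loading; letting two factors correlate mixes the loadings, so they would no longer be the residual covariances. For a count outcome the factor enters its log-rate, so these covariances involve that latent part rather than the observed counts.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def implied_cov(loadings, phi):
    """Covariance contributed by the factors: Lambda * Phi * Lambda'."""
    return matmul(matmul(loadings, phi), transpose(loadings))


# Hypothetical free loadings (the residual covariances we want to represent).
lam12, lam13, lam23 = 0.3, -0.2, 0.5

# Rows: y1, y2, y3.  Columns: f12, f13, f23.
# f12 BY y1@1 y2;  f13 BY y1@1 y3;  f23 BY y2@1 y3;
LAMBDA = [
    [1.0,   1.0,   0.0],
    [lam12, 0.0,   1.0],
    [0.0,   lam13, lam23],
]

# Factor variances fixed at 1, factors uncorrelated (f12 WITH f13@0; etc.).
PHI_ORTHOGONAL = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

# Same loadings, but letting f12 and f13 correlate at 0.4.
PHI_CORRELATED = [[1.0, 0.4, 0.0], [0.4, 1.0, 0.0], [0.0, 0.0, 1.0]]

orth = implied_cov(LAMBDA, PHI_ORTHOGONAL)
corr = implied_cov(LAMBDA, PHI_CORRELATED)
print("orthogonal:", orth[0][1], orth[0][2], orth[1][2])
print("correlated:", corr[0][1], corr[0][2], corr[1][2])
```

In the correlated case cov(y1, y2) becomes λ12(1 + 0.4) instead of λ12, which is why the factors are fixed to be uncorrelated.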


Hello, This is an interesting problem that may apply to a parallel process model I am running. If one growth process is continuous and the other is specified with a Poisson distribution through the COUNT option, are the covariances between the latent intercepts of the two processes (the same would apply to the slopes) specified correctly by a simple WITH statement, or would I need to use the BY approach described above? Thank you for your advice. Nick 


The WITH statement correlates the latent variables. You can use the BY approach to correlate observed outcomes beyond what the correlation among their latent variables can explain, that is, a residual correlation. 


OK that helped, thank you. 


Hello, I would like to clarify the points above. I am running a multiple-wave autoregressive model with a count variable as a DV. There are five other variables that need to be correlated with the count variable. The advice above indicates that each of the other variables should be specified in a separate two-indicator factor with the count DV (f1 by c@1 y2; f2 by c@1 y3;). With 274 participants, this model does not converge or reaches saddle points. My question is: would a single latent factor with all of the variables as indicators (f by c@1 Y2 Y3 Y4 Y5 Y6) be appropriate for capturing the residual correlations among all of the variables, both with the count variable and with each other? I do not need to see the values of each correlation separately. 


One factor would be an approximation in which the different loadings would have to pick up the different-sized correlations. Note, however, that it isn't clear how an autoregressive model with counts should be defined. Each DV can follow a count regression, but the IV is treated as a regular continuous variable. That becomes a strange mix of regression equations which doesn't seem right. And there is no underlying latent response variable concept for count variables that can resolve it as far as I know (which means there is a chance there might be). One way around this is to treat the count variable as an ordinal variable; although not perfect, it might be a practical way out. In that case you can use WLSMV, where you have no problems with these correlations. Or use Bayes with ordinal, which handles missing data better than WLSMV. Or ML, but then you have the correlation problem. 
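A minimal Python sketch of why one factor is only an approximation, assuming f BY c@1 y2 y3 y4 y5 y6; with f@1 and hypothetical loading values: a single factor contributes cov(i, j) = λᵢλⱼ, so once the covariances with c are set by the loadings, every covariance among y2-y6 is fixed as a product of two of them.

```python
# Hypothetical loadings for  f BY c@1 y2 y3 y4 y5 y6;  with f@1; [f@0];
names = ["c", "y2", "y3", "y4", "y5", "y6"]
loadings = [1.0, 0.3, 0.4, -0.1, 0.2, 0.25]


def one_factor_cov(lams, var_f=1.0):
    """Covariance contributed by one factor: cov(i, j) = lam_i * lam_j * Var(f)."""
    n = len(lams)
    return [[lams[i] * lams[j] * var_f for j in range(n)] for i in range(n)]


sigma = one_factor_cov(loadings)

# Covariances with the count DV equal the free loadings...
for name, row in zip(names[1:], sigma[1:]):
    print(f"cov(c, {name}) = {row[0]:.3f}")

# ...but every other covariance is then forced to be a product of two of them.
print(f"cov(y2, y3) = {sigma[1][2]:.3f}  (forced to equal 0.3 * 0.4)")
```

If, say, the actual residual covariance between y2 and y3 were negative while both covariances with c were positive, no choice of loadings in this single factor could reproduce it.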


Thank you Bengt. Much appreciated. 
