Thessa Wong posted on Wednesday, March 18, 2009 - 8:34 am
Hello, I am doing a simple multiple regression with a count variable as the dependent variable. I have some missingness in the data that can be considered missing by design, which I specified in the DATA command. However, Mplus still deletes all cases that have missing values. Is there any way to keep the cases that are missing some covariates by design in the analysis?
Mplus deletes observations with missing values on covariates. To avoid that, you have to bring the covariates into the model instead of just conditioning on them, as is done in regression. Mplus brings them into the model if you mention their means or variances in the MODEL command. They will then be assumed to be normally distributed.
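As a minimal sketch of what "mentioning their means or variances" can look like in an input file, consider the setup below. The file name, the variable names (y, x1, x2), and the missing-value code (-999) are all hypothetical and only illustrate the idea; this is not the original poster's input. For a negative binomial model, the COUNT line would be written as COUNT = y (nb) instead.

```text
TITLE:    Count regression with covariates brought into the model
DATA:     FILE = example.dat;          ! hypothetical file name
VARIABLE: NAMES = y x1 x2;             ! hypothetical variable names
          USEVARIABLES = y x1 x2;
          COUNT = y;                   ! y is treated as a Poisson count outcome
          MISSING = ALL (-999);        ! hypothetical missing-value code
ANALYSIS: ESTIMATOR = MLR;
          ALGORITHM = INTEGRATION;
          INTEGRATION = MONTECARLO;    ! numerical integration for the missing covariates
MODEL:    y ON x1 x2;                  ! the count regression of interest
          x1 x2;                       ! mentioning the variances brings x1 and x2 into the model
```

Writing [x1 x2]; in place of the last line mentions the means rather than the variances and should have the same effect of turning the covariates into modeled variables.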
Dear Bengt, I have a follow-up to the question above. I am trying to model a count outcome using Poisson/negative binomial regression in Mplus. I have a small amount of missing data on the count outcome and a small amount on the predictors.
You mentioned above that to avoid deleting cases with missing data on the covariates, one can bring them into the model by mentioning their means or variances. I tried this by simply listing the covariates as part of the MODEL statement. Mplus told me to use INTEGRATION=MONTECARLO, which I did. All the cases were then included in the analysis, and the output had an additional section with the covariances among all the predictors.
However, I noticed that in this model (compared to one where Mplus deletes cases with missing count outcomes or covariates), the LL, AIC, and BIC all increased substantially, which I did not expect. The count regression effects and/or SEs also changed slightly, which I might have expected if more data are being used in the analysis.
Can you explain what Mplus is doing when "bringing in the means or variances" of the covariates into the model to make use of all available data?
I appreciate any light you can shed on this situation.
In Mplus terms, you are turning x's into y's, which changes the metric of the loglikelihood. It is like having more outcomes than you had before. You are no longer considering the likelihood for [y | x] but for [y, x].
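To spell out the [y | x] versus [y, x] distinction, here is the standard factorization of a joint likelihood for a single case with complete data. The notation is introduced only for illustration: theta stands for the count regression parameters, and mu and Sigma for the mean vector and covariance matrix of the covariates, which are assumed normal once they are brought into the model.

```latex
\log L[y, x] \;=\; \underbrace{\log f(y \mid x;\, \theta)}_{\text{count regression part}}
\;+\; \underbrace{\log \phi(x;\, \mu, \Sigma)}_{\text{normal density of the covariates}}
```

The second term is absent when the model only conditions on x. This is why the loglikelihood, AIC, and BIC of the two analyses are on different scales and are not directly comparable, quite apart from the difference in the number of cases used.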
Thanks for your reply to my question. I am not familiar with bringing the covariates into the model in this way. I understand your explanation for why the model likelihood would change, but how does doing this affect the interpretation of the substantive count regression coefficients?
Would what you suggested be akin to regressing the count outcome on the substantive predictors, which in turn are regressed on another covariate?
Just trying to wrap my head around an approach I haven't considered before. I much appreciate your insight!
Would you give an example of "mention[ing] their means or variances" as you suggested in your 3.20.09 post? I am unsure whether I have done this correctly.
My situation is similar to the one described above, with a count variable as the dependent variable. Two of my x variables (variable a and variable b) have some missing data, and the dependent count variable also has missing data. I added a line for the two x variables after the model specification to mention their means: [a]; [b];
This added in cases with missing data for variable a, but not for variable b. Variable a is an exogenous variable; for variable b, I am examining indirect effects from another x variable through variable b to variable c (count dependent variable). Could this be the issue? Is there a way to have cases with missing data for variable b included as well?
In the output, I receive this statement: "Data set contains cases with missing on all variables except x-variables. These cases were not included in the analysis."