

Binary data with monotone missing 



Hi, I'd like to run a regression analysis of Y2 on Y1, both of which are binary. Only Y2 may have missing values, and a binary missing-data indicator R2 is introduced. I created the following Mplus input file; it ran successfully and gave a clean result.

    DATA: FILE IS rdata_mnar.dat;
    VARIABLE: NAMES ARE Y1 Y2 DOSE R2;
      USEVARIABLES ARE Y1 Y2 R2;
      CATEGORICAL ARE Y1 Y2 R2;
      MISSING = . ;
    ANALYSIS: TYPE = MISSING;
      ESTIMATOR = ML;
      INTEGRATION = MONTECARLO;
    MODEL: Y2 ON Y1;
      R2 ON Y1 Y2;

My question is somewhat unusual: why doesn't this input run into trouble? We have the following 6 patterns:

    P(Y1=0, Y2=0, R2=1)
    P(Y1=0, Y2=1, R2=1)
    P(Y1=1, Y2=0, R2=1)
    P(Y1=1, Y2=1, R2=1)
    P(Y1=0, R2=0)
    P(Y1=1, R2=0)

Notice that these cell probabilities sum to one, so we have only 5 free cell probabilities. On the other hand, there are 6 parameters in this model (3 thresholds and 3 regression coefficients). So there is one parameter too many to estimate, and Mplus should have trouble with the estimation. Am I wrong? Any comments are welcome. Many thanks.


The independent variable y1 should not be on the CATEGORICAL list. This list is for dependent variables only. The degrees of freedom for the model are computed as follows. You have two dependent variables, y2 and r2, and one independent variable, y1. The parameters in the H1 model are: one correlation, two thresholds, and two covariances between the dependent and independent variables, for a total of 5 parameters. The parameters in the H0 model are: three regression coefficients and two thresholds, for a total of 5 parameters. The model is just-identified.


Dear Dr. Linda Muthen, Thank you very much for your kind explanation. Yes, you are right. I should have treated Y1 as an independent variable. I have now confirmed that the H0 and H1 models both have 5 parameters. I think these models would be estimable if R2 were an ordinary observed variable rather than a missing-data indicator. In our case, when R2=0 we have no observation on Y2. We then have the following 6 patterns, conditional on Y1:

    P(Y2=1, R2=1 | Y1=1)   (a)
    P(Y2=0, R2=1 | Y1=1)   (b)
    P(R2=0 | Y1=1)         (c)
    P(Y2=1, R2=1 | Y1=0)   (d)
    P(Y2=0, R2=1 | Y1=0)   (e)
    P(R2=0 | Y1=0)         (f)

As you can easily check, the first three quantities (a) to (c) sum to one, and the last three quantities (d) to (f) sum to one, so we have only FOUR free probabilities and corresponding sample proportions. And we have FIVE parameters to estimate. I think the model is therefore not identifiable. Is that not true? Thank you very much for your assistance.


Linda's answer is right in general, as seen from the WLSMV estimation perspective. But in this special case, the WLSMV H1 residual correlation between your Y2 and R2 won't be estimable because R2 indicates missingness on Y2. WLSMV should not be able to compute this. So there are indeed only 4 pieces of information for a model with 5 parameters. Now, you use ML, in which case this nonidentification (underidentification) should show up as a singular information matrix. Perhaps the singularity threshold hasn't been reached, but you probably have a very low condition number printed. Try more precision in the computations by saying integration = montecarlo(5000); You may also want to sharpen the mconvergence criterion.
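
The counting argument in this thread can be checked numerically outside Mplus. The following is only a rough Python sketch, not what Mplus computes internally. It assumes a logit link with threshold parameterization (logit P(Y2=1 | Y1) = -tau_y + beta*Y1 and logit P(R2=1 | Y1, Y2) = -tau_r + gamma1*Y1 + gamma2*Y2); the parameter names and the trial values in theta are hypothetical. It builds the six observable cell probabilities (a) to (f), differentiates them with respect to the five parameters, and computes the rank of the resulting Jacobian. Because the cells sum to one within each Y1 group, the rank cannot exceed 4, which is the underidentification that should make the information matrix singular.

```python
import math


def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))


def observed_cells(theta):
    """Six observable cell probabilities (a)-(f), conditional on Y1.

    theta = (tau_y, beta, tau_r, gamma1, gamma2) is an assumed logit
    parameterization, not taken from Mplus output.
    """
    tau_y, beta, tau_r, gamma1, gamma2 = theta
    cells = []
    for y1 in (1, 0):
        p_y2 = expit(-tau_y + beta * y1)              # P(Y2=1 | Y1)
        p_obs1 = expit(-tau_r + gamma1 * y1 + gamma2)  # P(R2=1 | Y1, Y2=1)
        p_obs0 = expit(-tau_r + gamma1 * y1)           # P(R2=1 | Y1, Y2=0)
        cells.append(p_y2 * p_obs1)                    # Y2=1 observed
        cells.append((1 - p_y2) * p_obs0)              # Y2=0 observed
        # Y2 missing: marginalize over the unobserved Y2
        cells.append(p_y2 * (1 - p_obs1) + (1 - p_y2) * (1 - p_obs0))
    return cells


def jacobian(f, theta, h=1e-5):
    """Central-difference Jacobian of f at theta (rows: cells, cols: parameters)."""
    n_out = len(f(theta))
    jac = [[0.0] * len(theta) for _ in range(n_out)]
    for j in range(len(theta)):
        up = list(theta)
        dn = list(theta)
        up[j] += h
        dn[j] -= h
        f_up, f_dn = f(up), f(dn)
        for i in range(n_out):
            jac[i][j] = (f_up[i] - f_dn[i]) / (2 * h)
    return jac


def rank(matrix, tol=1e-6):
    """Numerical rank by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    n_rows, n_cols = len(m), len(m[0])
    r = 0
    for c in range(n_cols):
        if r == n_rows:
            break
        pivot = max(range(r, n_rows), key=lambda i: abs(m[i][c]))
        if abs(m[pivot][c]) < tol:
            continue  # no usable pivot in this column
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, n_rows):
            factor = m[i][c] / m[r][c]
            for k in range(c, n_cols):
                m[i][k] -= factor * m[r][k]
        r += 1
    return r


theta = [0.3, 0.8, -0.5, 0.4, 0.6]  # hypothetical trial values
J = jacobian(observed_cells, theta)
print("parameters:", len(theta))
print("Jacobian rank:", rank(J))
```

A rank below the number of parameters means some direction in parameter space leaves every observable probability unchanged, so the likelihood is flat in that direction no matter how many integration points are used.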


