I'm running a simple regression with a single predictor called X. I want another variable, Z, included in the modeling because the probability that X has a missing value is related to Z. With Z included, the missing data should satisfy the Missing at Random (MAR) assumption.
1. If Z is in my dataset, but not listed in the Model statement and not listed as an auxiliary variable, will its relation with X be considered so that the MAR assumption can be satisfied? Or must Z be included in the Auxiliary command with the m setting?
2. If I list Z on the Auxiliary command, does Mplus include it per Model 2 ("extra DV") or Model 3 ("saturated correlates") of Graham (2003), or in some other way? Is the way it's implemented in Mplus superior to Graham's Models 2 and 3?
3. Considering SEM models in general, is there ever a need to explicitly fit Graham's Models 2 or 3 rather than rely on the Auxiliary command with the m setting to bring the auxiliary variables I want considered into the modeling (so as to meet the MAR assumption)?
Thanks for your reply and the links! The web talk was very clear, and the Monte Carlo simulation in it provided two good demonstrations -- both of the ease of doing simulations in Mplus and of the biases that occur when variables related to missingness are not included as auxiliary variables.
I'm trying to conduct a simulation study related to auxiliary variables. For a simple bivariate regression (replicating some of Collins et al. 2001), I get different results when I use the "auxiliary" command versus when I specify the saturated correlates model manually.
The following is my code using the auxiliary command:
VARIABLE: NAMES = x y z;
          USEVAR = x y;
          MISSING = ALL(-999);
          AUXILIARY = (m) z;
MODEL:    y ON x*0.6;
which returns an average estimate of 0.5389 for the regression weight (note that I obtain an identical estimate when I exclude the auxiliary variable).
The following is my model statement for the saturated correlates version of this model:
MODEL:    eta BY y; y@0;
          xi BY x; x@0;
          eta ON xi*0.6;
          aux BY z; z@0;
          aux WITH eta;
which returns an average estimate of 0.5945 for the regression weight.
Can you see what I'm doing incorrectly? I should get the same results from the two approaches, right?
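For reference, the saturated correlates idea can also be written with observed variables only, without the zero-residual latent variables. The following is a minimal, untested sketch and not a diagnosis of the discrepancy above: the file name example.dat is hypothetical, and ESTIMATOR = ML is an assumption. It lets z covary with the predictor x and with the residual of y while leaving the y ON x structure unchanged.

```
TITLE:    Saturated correlates sketch (observed variables only)
DATA:     FILE = example.dat;        ! hypothetical file name
VARIABLE: NAMES = x y z;
          USEVARIABLES = x y z;      ! z is now part of the model
          MISSING = ALL(-999);
ANALYSIS: ESTIMATOR = ML;            ! assumed: full-information ML
MODEL:    y ON x*0.6;                ! substantive regression
          z WITH x;                  ! auxiliary variable with the predictor
          z WITH y;                  ! auxiliary variable with the residual of y
```

As far as I know, mentioning x in a WITH statement brings it into the likelihood, so cases with x missing should be retained rather than deleted listwise. In a simulation the DATA command would be replaced by the MONTECARLO setup.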
Thanks for your response. I'm not sure I understand, though, how the paper you mentioned relates to my problem. The paper seems to be about issues with auxiliary variables (i.e., distal outcomes) in a latent class framework rather than auxiliary variables related to missing data. Is there a connection that I'm not seeing?
Also, if the difference in the two approaches is expected, do you have a sense of which approach is to be preferred?