

I'm running a simple regression with a single predictor called X. I want another variable, Z, included in the modeling, since the probability that X has a missing value is related to Z. With Z included, the missing data will satisfy the Missing at Random (MAR) assumption.
1. If Z is in my dataset, but not listed in the Model statement and not listed as an auxiliary variable, will its relation with X be considered so that the MAR assumption can be satisfied? Or must Z be included in the Auxiliary command with the m setting?
2. If I list Z on the Auxiliary command, does Mplus include it per Model 2 ("extra DV") or Model 3 ("saturated correlates") of Graham 2003, or in some other way? Is the way it's implemented in Mplus superior to Graham's models 2 and 3?
3. Considering SEM models in general, is there ever a need to explicitly fit Graham's models 2 or 3 rather than rely on the Auxiliary command with the m setting to bring the auxiliary variables I want considered into the modeling (so as to meet the MAR assumption)?
Thanks for your assistance!


See our technical appendix document at http://www.statmodel.com/download/AuxM2.pdf
In other words:
1. z must be included in the aux list with m.
2. Mplus uses the saturated correlates approach.
3. The Mplus approach with aux m is the saturated correlates approach with the needed additional conveniences (see the document).


See also my web talk on this topic at http://www.statmodel.com/webtalks.shtml 


Thanks for your reply and the links! The web talk was very clear, and the Monte Carlo simulation in it provided two good demonstrations: of the ease of doing simulations in Mplus, and of the biases that occur when variables related to missingness are not included as auxiliary variables.
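
For readers without Mplus at hand, the bias mentioned above can be sketched in a few lines of plain Python. This is an illustration under assumed values, not the simulation from the web talk: the slope of 0.6, the residual SD of 0.8, the cutoff of 0.5, and the function names `ols_slope` and `simulate` are all made up for this sketch. X is set to missing whenever Z exceeds the cutoff, so missingness depends only on the observed Z (MAR), and Z is correlated with Y. The sketch shows only the problem, namely that dropping incomplete cases and ignoring Z attenuates the slope. It does not implement the full-information ML estimation that the Auxiliary (m) setting relies on.

```python
import random


def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def simulate(n=20000, beta=0.6, cutoff=0.5, seed=1):
    """Return (full-data slope, complete-case slope, share of cases kept).

    All population values here are assumptions chosen for illustration.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = beta * x + rng.gauss(0.0, 0.8)  # residual SD 0.8, so Var(y) = 1
        z = y + rng.gauss(0.0, 1.0)         # z is correlated with y (and x)
        x_missing = z > cutoff              # missingness on x depends only on z
        rows.append((x, y, x_missing))

    # Slope if no data were missing.
    full = ols_slope([r[0] for r in rows], [r[1] for r in rows])

    # Slope after listwise deletion of cases with missing x, ignoring z.
    kept = [r for r in rows if not r[2]]
    complete_case = ols_slope([r[0] for r in kept], [r[1] for r in kept])
    return full, complete_case, len(kept) / n


full_slope, cc_slope, kept_share = simulate()
print(f"full-data slope:     {full_slope:.3f}")
print(f"complete-case slope: {cc_slope:.3f}")
print(f"share of cases kept: {kept_share:.3f}")
```

In this setup the complete-case slope comes out noticeably below the full-data slope, because selecting cases on Z amounts to selecting on a noisy copy of Y.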


I'm trying to conduct a simulation study related to auxiliary variables. For a simple bivariate regression (replicating some of Collins et al. 2001), I get different results when I use the "auxiliary" command versus when I specify the saturated correlates model manually. The following is my code using the auxiliary command:

    VARIABLE: NAMES = x y z;
    USEVAR = x y;
    MISSING = ALL(999);
    AUXILIARY = (m) z;
    MODEL: y ON x*0.6;

which returns an average estimate of 0.5389 for the regression weight (note, I obtain an identical estimate when I exclude the auxiliary variable). The following is my model statement for the saturated correlates version of this model:

    MODEL: eta BY y; y@0;
    xi BY x; x@0;
    eta ON xi*0.6;
    aux BY z; z@0;
    aux WITH eta;

which returns an average estimate of 0.5945 for the regression weight. Can you see what I'm doing incorrectly? I should get the same results from the two approaches, right?


One additional note on the above post. I realized there's a much more straightforward way to specify the saturated correlates model for the bivariate regression. I ran the following model:

    MODEL: y ON x;
    z WITH x;
    z WITH y;

and obtained the same results as the more complicated version in the previous post (avg. b = 0.5945), but still different results than if I use the "auxiliary" variable command.


These differences are to be expected. This is discussed in the following paper which is on the website: Clark, S. & Muthén, B. (2009). Relating latent class analysis results to variables not included in the analysis. 


Thanks for your response. I'm not sure I understand, though, how the paper you mentioned relates to my problem. The paper seems to be about issues with auxiliary variables (i.e., distal outcomes) in a latent class framework rather than auxiliary variables related to missing data. Is there a connection that I'm not seeing? Also, if the difference in the two approaches is expected, do you have a sense of which approach is to be preferred? 


I didn't see that. Please send the two outputs that you think should agree but don't, along with your license number, to support@statmodel.com.
