I need an opinion on a problem with missing data on an x-variable.
In a model that I am running, an observed (x) variable greatly reduces the sample analyzed (combined with other x-variables, the sample drops by about 50%). I am thinking that a solution would be to make it endogenous by making it latent (maybe as a factor with formative indicators). The problem is this: I have a variable, match, obtained by summing three binary variables (tmatch, scmatch, stmatch). All three are important, but they have a great many missing values. I am thinking of a model like this:
matchl by; t by tmatch; sc by scmatch; st by stmatch; matchl on t sc st;
y1 on matchl; y2 on matchl; (these last two are needed for identification, according to Jarvis, MacKenzie, & Podsakoff, 2003; they can be two extra indicators or, actually, parts of the larger model).
Or I can use define: matchl=t+sc+st; and drop matchl by;
Does this sound like a reasonable solution? Is this what you mean by "Covariate missingness can be modeled if the covariates are brought into the model and distributional assumptions such as normality are made about them." in the Mplus manual, Chapter 1?
I'm interested in modeling the over-time relationship between two binary variables in a variety of models: autoregressive cross-lagged model, parallel process growth model, and an autoregressive latent trajectory model. I will have missing data on each of the repeated measures as well as at least some of the covariates that will be used as control variables. I'm planning to use weighted least squares estimation for these analyses rather than ML. I have come across 3 possible solutions for dealing with missing data for the binary dependent variables and the covariates on the discussion board, and I'm hoping for some guidance on which would be the best given my situation (or an alternative suggestion if I've overlooked something):
1) Multiple imputation for both of the binary dependent variables as well as the covariates.
2) Allow for WLS to estimate missingness as a function of the covariates for the two binary dependent variables, and use multiple imputation for just the covariates.
3) Allow for WLS to estimate missingness as a function of the covariates for the two binary dependent variables, and mention the variances of the covariates in the MODEL command.
2) and 3) would not allow for missingness predicted by the binary outcomes before dropout. Alternative 1) seems most reasonable. You may also want to try a Bayesian approach, which, like ML, gives a full-information analysis.
Thank you for your help! I'll also look into the Bayesian approach.
Rod Bond posted on Tuesday, April 10, 2012 - 5:58 am
I have missing data on covariates and am trying to deal with the problem by bringing the x variable(s) into the model. When I bring some variables into the model, however, I get the following warning "THE STANDARD ERRORS OF THE MODEL PARAMETER ESTIMATES MAY NOT BE TRUSTWORTHY FOR SOME PARAMETERS DUE TO A NON-POSITIVE DEFINITE FIRST-ORDER DERIVATIVE PRODUCT MATRIX." I get this warning even if there is no missing data for the covariate in question and even when I have only one covariate. For example, I run an analysis with Gender as a covariate on which there is no missing data. If I run it without bringing it into the model, it runs fine. If I bring it into the model by referring to its variance in the MODEL command, I get the warning but also identical results. I have tried rescaling Gender so that its variance is similar to that of other variables, but that makes no difference. Any ideas? Thanks
I have some missing values on covariates and brought them into the model by adding them to the MODEL command as mentioned above. But, as some of them are binary, how can I tell Mplus which ones are not normally distributed?
For that you should use multiple imputation. But ignoring the binary aspect may not be a big sin unless you also have categorical DVs in your model.
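As a minimal sketch of the multiple-imputation route suggested here, the input below imputes two covariates and flags the binary one as categorical so that it is not treated as normally distributed. The file name, variable names, missing-value code, and number of data sets are all hypothetical; this illustrates the general setup and is not a recommended specification.

```mplus
TITLE:    Sketch: imputing covariates, one of them binary;
DATA:     FILE = example.dat;            ! hypothetical file name
VARIABLE: NAMES = y x1 x2;               ! hypothetical names; x2 is binary
          MISSING = ALL (-999);          ! hypothetical missing-value code
DATA IMPUTATION:
          IMPUTE = x1 x2 (c);            ! (c) marks x2 as categorical for imputation
          NDATASETS = 20;                ! arbitrary choice for illustration
          SAVE = examimp*.dat;           ! one saved file per imputed data set
ANALYSIS: TYPE = BASIC;
```

The saved data sets can then be analyzed together in a second run by pointing DATA: FILE at the list of imputed data sets and adding TYPE = IMPUTATION;.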
Tracy Witte posted on Wednesday, April 03, 2013 - 10:40 am
I have a couple of questions about bringing predictor variables into the model so that missing data on predictor variables can be handled with FIML:
1) I realize that doing so means that the same assumptions for the rest of the model will now be applied to the predictor variables, which may not be tenable. However, if one is using MLR as the estimator, is it less problematic to include predictor variables in the model, even if they deviate somewhat from a normal distribution?
2) Does including predictor variables in the model change the substantive interpretation of the results? Or, will any differences in parameter estimates primarily be a function of the degree to which the model assumptions are untenable for the predictor variables?
3) This issue is addressed on the following website (http://www.ats.ucla.edu/stat/mplus/faq/fiml_counts.htm). I noticed that they included the predictors in the model by modeling the intercepts, rather than the variance of the predictor variables. Is this because this model included count variables, rather than continuous variables?
4) In general, is it considered better to use multiple imputation if you have missing data for predictor variables? That is, how "experimental" is it considered to be to use this approach with FIML?
1. It may make it less problematic. 2. No. 3. You can mention any parameter of the covariate. It does not matter which one. 4. The two methods are asymptotically equivalent. Imputation may be better for categorical variables. Imputation has fewer testing options.
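To illustrate point 3, here is a minimal sketch of a MODEL command that brings two covariates into the model. The variable names are hypothetical. Either of the two lines below the regression would be enough on its own, since mentioning any parameter of a covariate has the same effect.

```mplus
MODEL:    y ON x1 x2;
          x1 x2;        ! mentioning the variances brings x1 and x2 into the model
          ! [x1 x2];    ! alternative: mention the means instead
```

Once they are in the model, x1 and x2 are subject to the same distributional assumptions as the rest of the model (such as normality under ML), which is the trade-off raised in question 1.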