Missing on x-variable - solution?
Mplus Discussion > Missing Data Modeling >
 Cristian Dogaru posted on Wednesday, August 06, 2008 - 2:59 pm
Dear Linda and Bengt,

I would like your opinion on a problem with missing data on an x-variable.

In a model that I am running, one observed (x) variable greatly reduces the analyzed sample (combined with other x-variables, it cuts the sample by about 50%). I am thinking that a solution would be to make it endogenous by making it latent (perhaps with a formative-indicator factor).
The problem is this: I have a variable, match, obtained by summing three binary variables (tmatch, scmatch, stmatch). All three are important, but they have a great deal of missing data. I am thinking of a model like this:

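matchl by;            ! formative factor: no reflective indicators
t by tmatch;          ! one factor per binary component
sc by scmatch;
st by stmatch;
matchl on t sc st;    ! matchl is formed from t, sc and st

y1 on matchl;         ! two outcomes of matchl, needed for identification
y2 on matchl;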
(These last two are needed for identification, according to Jarvis, MacKenzie, & Podsakoff, 2003; they can be two extra indicators or, actually, parts of the larger model.)

Or I can use
define: matchl = t + sc + st;
and drop "matchl by;".

Does this sound like a reasonable solution? Is this what you mean by "Covariate missingness can be modeled if the covariates are brought into the model and distributional assumptions such as normality are made about them." in Chapter 1 of the Mplus manual?

Thank you in advance.

Cristian Dogaru
 Linda K. Muthen posted on Wednesday, August 06, 2008 - 3:14 pm
I think it is more straightforward to bring it into the model by mentioning the variance of the variable in the MODEL command. For example,
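A minimal sketch of this approach, assuming a hypothetical data file, missing-value code, outcomes y1 and y2, and two other hypothetical covariates x1 and x2, with the summed variable match used as an ordinary observed covariate:

```mplus
DATA:      FILE = mydata.dat;           ! hypothetical file name
VARIABLE:  NAMES = y1 y2 match x1 x2;   ! hypothetical variable names
           MISSING = ALL (-999);        ! assumed missing-value code
ANALYSIS:  ESTIMATOR = ML;
MODEL:     y1 y2 ON match x1 x2;
           match x1 x2;                 ! mentioning the variances brings the
                                        ! covariates into the model, so cases
                                        ! missing on them are no longer dropped
```

Once the covariates are part of the model, their missing values are handled by the same full-information likelihood as the outcomes, under the normality assumption quoted from Chapter 1.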

 Cristian Dogaru posted on Wednesday, August 06, 2008 - 4:35 pm
Thank you!
One of my advisers keeps telling me that I tend to make things too complicated...
 Dana Garbarski posted on Tuesday, February 01, 2011 - 5:46 pm
I'm interested in modeling the over-time relationship between two binary variables in a variety of models: an autoregressive cross-lagged model, a parallel process growth model, and an autoregressive latent trajectory model. I will have missing data on each of the repeated measures as well as on at least some of the covariates that will be used as control variables. I'm planning to use weighted least squares (WLS) estimation for these analyses rather than ML. I have come across three possible solutions on the discussion board for dealing with missing data on the binary dependent variables and the covariates, and I'm hoping for some guidance on which would be best in my situation (or an alternative suggestion if I've overlooked something):
1) Multiple imputation for both the binary dependent variables and the covariates.
2) Let WLS handle missingness on the two binary dependent variables as a function of the covariates, and use multiple imputation for the covariates only.
3) Let WLS handle missingness on the two binary dependent variables as a function of the covariates, and mention the variances of the covariates in the MODEL command.
 Bengt O. Muthen posted on Tuesday, February 01, 2011 - 6:04 pm
2) and 3) would not allow for missingness predicted by the binary outcomes before dropout. Alternative 1) seems most reasonable. You may also want to try a Bayesian approach, which, like ML, gives a full-information analysis.
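A sketch of alternative 1) in Mplus, assuming two hypothetical binary variables measured at three waves (a1-a3, b1-b3), three hypothetical covariates x1-x3, and 20 imputations. The first input only imputes, with (c) marking a variable to be imputed as categorical; the second, separate run analyzes the imputed data sets with WLSMV in a simple cross-lagged model. The name of the list file is an assumption; check the imputation output for the actual name.

```mplus
! Input 1: impute the binary repeated measures and the covariates
DATA:      FILE = mydata.dat;               ! hypothetical file name
VARIABLE:  NAMES = a1-a3 b1-b3 x1-x3;
           MISSING = ALL (-999);            ! assumed missing-value code
DATA IMPUTATION:
           IMPUTE = a1 (c) a2 (c) a3 (c) b1 (c) b2 (c) b3 (c) x1 x2 x3;
           NDATASETS = 20;
           SAVE = imp*.dat;                 ! one file per imputed data set
ANALYSIS:  TYPE = BASIC;                    ! unrestricted imputation model

! Input 2 (a separate run): analyze the imputed data sets
DATA:      FILE = implist.dat;              ! assumed name of the list file
           TYPE = IMPUTATION;
VARIABLE:  NAMES = a1-a3 b1-b3 x1-x3;
           CATEGORICAL = a2 a3 b2 b3;       ! binary dependent variables
ANALYSIS:  ESTIMATOR = WLSMV;
MODEL:     a2 ON a1 b1 x1-x3;               ! autoregressive and cross-lagged paths
           b2 ON a1 b1 x1-x3;
           a3 ON a2 b2 x1-x3;
           b3 ON a2 b2 x1-x3;
```

Because the imputation model includes the binary outcomes, missingness on the covariates can depend on those outcomes, which is the point made above about alternatives 2) and 3).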
 Dana Garbarski posted on Thursday, February 03, 2011 - 5:36 am
Thank you for your help! I'll also look into the Bayesian approach.
 Rod Bond posted on Tuesday, April 10, 2012 - 5:58 am
I have missing data on covariates and am trying to deal with the problem by bringing the x variable(s) into the model. When I bring some variables into the model, however, I get the following warning "THE STANDARD ERRORS OF THE MODEL PARAMETER ESTIMATES MAY NOT BE TRUSTWORTHY FOR SOME PARAMETERS DUE TO A NON-POSITIVE DEFINITE FIRST-ORDER DERIVATIVE PRODUCT MATRIX." I get this warning even if there is no missing data for the covariate in question and even when I have only one covariate. For example, I run an analysis with Gender as a covariate on which there is no missing data. If I run it without bringing it into the model, it runs fine. If I bring it into the model by referring to its variance in the MODEL command, I get the warning but also identical results. I have tried rescaling Gender so that its variance is similar to that of other variables, but that makes no difference. Any ideas? Thanks
 Linda K. Muthen posted on Tuesday, April 10, 2012 - 1:54 pm
This message comes about because the mean and variance of a binary variable are not orthogonal. If you intend to bring the variable into the model, you can ignore the message.
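A short derivation of why this happens (our own working, assuming the binary variable x_i in {0,1} is given the normal likelihood with mean mu and variance sigma^2, and writing \hat p for the sample proportion of ones). The per-observation score vectors turn out to be exactly proportional at the ML estimates:

```latex
\begin{align*}
s_{\mu,i} &= \frac{x_i-\mu}{\sigma^2}, &
s_{\sigma^2,i} &= -\frac{1}{2\sigma^2} + \frac{(x_i-\mu)^2}{2\sigma^4}.
\intertext{At the ML estimates $\hat\mu = \hat p$ and $\hat\sigma^2 = \hat p(1-\hat p)$:}
x_i = 1:\quad s_{\mu,i} &= \frac{1}{\hat p}, &
s_{\sigma^2,i} &= \frac{1-2\hat p}{2\hat p^{2}(1-\hat p)},\\
x_i = 0:\quad s_{\mu,i} &= -\frac{1}{1-\hat p}, &
s_{\sigma^2,i} &= \frac{2\hat p-1}{2\hat p(1-\hat p)^{2}}.
\intertext{So for every observation $s_{\sigma^2,i} = c\,s_{\mu,i}$ with $c = \dfrac{1-2\hat p}{2\hat p(1-\hat p)}$, and}
\sum_i \begin{pmatrix} s_{\mu,i}\\ s_{\sigma^2,i}\end{pmatrix}
       \begin{pmatrix} s_{\mu,i} & s_{\sigma^2,i}\end{pmatrix}
&= \Big(\sum_i s_{\mu,i}^2\Big)\begin{pmatrix} 1 & c\\ c & c^2\end{pmatrix}.
\end{align*}
```

The first-order derivative product matrix therefore has rank 1 and is not positive definite, which is the warning reported. The point estimates are unaffected, consistent with the identical results described in the question.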
 Rod Bond posted on Wednesday, April 11, 2012 - 5:54 am

Thanks for your prompt reply. That's very helpful.
 Katja Haberecht posted on Thursday, November 29, 2012 - 1:24 am
I have some missing values on covariates and brought the covariates into the model by mentioning them in the MODEL command, as described above. But since some of them are binary, how can I tell Mplus which of them are not normally distributed?
 Bengt O. Muthen posted on Thursday, November 29, 2012 - 6:48 am
For that you should use multiple imputation. But ignoring the binary aspect may not be a big sin unless you also have categorical DVs in your model.
 Tracy Witte posted on Wednesday, April 03, 2013 - 10:40 am
I have a couple of questions about bringing predictor variables into the model so that missing data on predictor variables can be handled with FIML:

1) I realize that doing so means that the same assumptions for the rest of the model will now be applied to the predictor variables, which may not be tenable. However, if one is using MLR as the estimator, is it less problematic to include predictor variables in the model, even if they deviate somewhat from a normal distribution?

2) Does including predictor variables in the model change the substantive interpretation of the results? Or, will any differences in parameter estimates primarily be a function of the degree to which the model assumptions are untenable for the predictor variables?

3) This issue is addressed on the following website (http://www.ats.ucla.edu/stat/mplus/faq/fiml_counts.htm). I noticed that they included the predictors in the model by modeling the intercepts, rather than the variance of the predictor variables. Is this because this model included count variables, rather than continuous variables?

4) In general, is it considered better to use multiple imputation if you have missing data for predictor variables? That is, how "experimental" is it considered to be to use this approach with FIML?

Thank you very much for your help!
 Linda K. Muthen posted on Wednesday, April 03, 2013 - 4:58 pm
1. It may make it less problematic.
2. No.
3. You can mention any parameter of the covariate. It does not matter which one.
4. The two methods are asymptotically equivalent. Imputation may be better for categorical variables. Imputation has fewer testing options.
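To illustrate answers 1 and 3, a sketch with a hypothetical outcome y and covariates x1 and x2, using MLR. Either MODEL statement brings x1 and x2 into the model: option A mentions their variances, option B their means (intercepts), as on the UCLA page.

```mplus
ANALYSIS:  ESTIMATOR = MLR;

! Option A: mention the variances of the covariates
MODEL:     y ON x1 x2;
           x1 x2;

! Option B (an alternative run): mention the means instead
MODEL:     y ON x1 x2;
           [x1 x2];
```

Per answer 3, the choice between A and B should not matter, since mentioning any parameter of a covariate is what makes it part of the model.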
 Stine Hoj posted on Wednesday, January 22, 2014 - 5:04 pm
I am wondering if you could help me to understand why bringing a covariate into the model has a pronounced effect on model fit statistics.

I have a linear growth model with a continuous outcome (Y1) measured at 3 time points. The model includes one time-varying covariate (X1) and several time-invariant covariates (Z1-Z7).

i s | Y11@0 Y12@1 Y13@2;
i s ON Z1-Z7;
Y11 on X11;
Y12 on X12;
Y13 on X13;

(RMSEA = 0.05, CFI = 0.968, SRMR = 0.036)

Approximately 20% of the sample have missing values on X1 at one or more time points. However, if I bring X1 into the model to retain these observations, the fit statistics are markedly worse.

i s | Y11@0 Y12@1 Y13@2;
i s ON Z1-Z7;
Y11 on X11;
Y12 on X12;
Y13 on X13;
X11 X12 X13;

(RMSEA = 0.11, CFI = 0.658, SRMR = 0.113)

Any guidance would be appreciated.
 Linda K. Muthen posted on Thursday, January 23, 2014 - 10:29 am
Please send the two outputs and your license number to support@statmodel.com. Include TECH1 in both outputs.
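A sketch of the second model with TECH1 requested. It also brings the time-invariant covariates Z1-Z7 into the model alongside X11-X13 (the advice later in this thread is to mention all covariates, not a subset) and writes the covariances involving X11-X13 out explicitly, so that TECH1 can confirm which covariances are free rather than relying on defaults. Whether this is what drives the drop in fit is an assumption to check against the two outputs; the covariances among Z1-Z7 are not written out here, and TECH1 shows whether they are estimated.

```mplus
MODEL:   i s | Y11@0 Y12@1 Y13@2;
         i s ON Z1-Z7;
         Y11 ON X11;
         Y12 ON X12;
         Y13 ON X13;
         X11 X12 X13 Z1-Z7;        ! bring all covariates into the model
         X11 WITH X12 X13;         ! covariances among the time-varying
         X12 WITH X13;             ! covariates, stated explicitly
         X11 X12 X13 WITH Z1-Z7;   ! X-Z covariances, stated explicitly
OUTPUT:  TECH1;                    ! parameter specification: shows which
                                   ! parameters are free and which are fixed
```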
 Malte Jansen posted on Thursday, February 27, 2014 - 7:28 am
Dear Mplus Team,

It would be great if you could give me some advice on the following situation:

I am trying to regress student achievement on a number of predictors and several interactions between continuous predictors. The predictors include binary variables (e.g. sex), continuous variables with one indicator and continuous variables with several indicators. The students are nested in classes, but all predictors are on the individual student level. The dataset includes some missing values on both the outcome and all three kinds of predictor variables.

I am unsure which modeling approach to use because:

1. I would like to use FIML to handle the missing data, as it's more convenient than multiple imputation. When I use FIML (by estimating the variances of all predictors in the MODEL statement), all manifest predictors are treated as latent variables with one indicator, right? I guess this might be problematic for binary predictors such as sex. Would you recommend using FIML for the binary predictors?

2. If I use FIML only for continuous predictors and do not include their covariances with the binary predictors in the model, a bad model fit results. If I include the covariances, again FIML is used on the binaries. Do I necessarily have to include the covariances?

[part 2 coming up]
 Malte Jansen posted on Thursday, February 27, 2014 - 7:28 am
3. When I include latent interactions, no standardized coefficients are reported. I tried standardizing all variables prior to the analysis and fixing the variances of all latent variables to 1. However, when I do this for the models without interactions, the results differ from the STDYX output, especially the regression coefficients for the binary predictors. Is there any way to obtain "STDY" coefficients (or coefficients that can be interpreted in a similar way) for the binary predictors when interactions are included?

Which modeling approach do you think would be the most suitable? It would be great if you could help me.

Best regards and thank you in advance
 Linda K. Muthen posted on Thursday, February 27, 2014 - 10:46 am
I would use FIML. You must mention the variances of all of the predictors. You cannot mention a subset.
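A sketch of that advice for the setup described above, assuming hypothetical names: achieve for the outcome, sex for the binary predictor, x1 and x2 for single-indicator continuous predictors, m1-m3 for the indicators of a multi-indicator predictor f, and class for the cluster variable. The latent interactions are left out to keep the sketch short.

```mplus
DATA:      FILE = mydata.dat;                    ! hypothetical file name
VARIABLE:  NAMES = achieve sex x1 x2 m1-m3 class;  ! hypothetical names
           USEVARIABLES = achieve sex x1 x2 m1-m3;
           CLUSTER = class;                      ! students nested in classes
           MISSING = ALL (-999);                 ! assumed missing-value code
ANALYSIS:  TYPE = COMPLEX;                       ! cluster-robust standard errors
           ESTIMATOR = MLR;
MODEL:     f BY m1-m3;                           ! multi-indicator predictor
           achieve ON sex x1 x2 f;
           sex x1 x2;                            ! mention ALL observed predictors,
                                                 ! the binary sex included
```

Mentioning sex this way treats it as normally distributed, which is the approximation discussed elsewhere in this thread.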

Please send the outputs and your license number to support@statmodel.com.

In the future, please limit your post to one window.
 Stine Hoj posted on Wednesday, March 26, 2014 - 7:52 pm
Can you point me to any resources that might help me understand how important the assumption of continuous normality is when bringing covariates into the model?

As in the last post, I would like to use FIML for reasons of convenience, but most of my covariates are binary. In fact, these binary covariates have almost no missing data; only the one continuous covariate is missing about 20% of responses. I am aware that I need to bring all of the covariates into the model, not just a subset, but I am wondering how to assess the implications of doing so. Thank you.
 Linda K. Muthen posted on Thursday, March 27, 2014 - 1:58 pm
I don't know of any references on this. In our experience we don't think it has too much of an effect. You could do a Monte Carlo simulation study to investigate this.
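A sketch of such a Monte Carlo study, assuming a single outcome y, one binary covariate x1 with no missing data, and one continuous covariate x2 missing completely at random for about 20% of cases. The population values, sample size, and number of replications are arbitrary choices for illustration. CUTPOINTS dichotomizes x1, and PATMISS/PATPROBS set the missing-data pattern; these option names are recalled from the MONTECARLO command and should be checked against the User's Guide before running.

```mplus
TITLE:      Binary covariate brought into the model (Monte Carlo sketch)
MONTECARLO: NAMES = y x1 x2;
            NOBSERVATIONS = 500;        ! arbitrary sample size
            NREPS = 500;                ! arbitrary number of replications
            SEED = 2014;
            CUTPOINTS = x1 (0);         ! x1 is generated normal, then cut at 0
            PATMISS = x2 (.2);          ! x2 missing for about 20% of cases
            PATPROBS = 1;               ! a single missing-data pattern
MODEL POPULATION:
            [x1-x2@0];
            x1-x2@1;
            y ON x1@.5 x2@.3;
            y@1;
MODEL:      y ON x1*.5 x2*.3;
            x1 x2;                      ! bring both covariates into the model
```

The output summarizes, across replications, the average estimates, their standard errors, and 95% coverage, which is where any distortion from treating the binary x1 as normal would show up. Whether y is generated from x1 before or after the cut determines the true value of the x1 slope, so check this before reading the coverage for that slope.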
 Laurel Wallace posted on Thursday, April 17, 2014 - 6:51 am

I am using Mplus to run a longitudinal cross-lagged model with clustered data. I am using the MLR estimator, and my syntax looks like this:


When I run the analyses, I get warning messages indicating that some cases are excluded due to missing data on x-variables. As you know, I can resolve this issue and include all cases by adding:

However, do you recommend doing so, so that as many cases as possible are included, or do you consider it more appropriate to exclude cases with missing data on x-variables?

Thanks so much for your help.
 Linda K. Muthen posted on Friday, April 18, 2014 - 9:05 am
With continuous covariates, it is asymptotically equivalent to doing multiple imputation. With binary covariates, it is a good approximation. I would not be too concerned.
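A sketch of the addition being discussed, for a two-wave cross-lagged model with clustering, assuming hypothetical variables a1, b1 (wave 1), a2, b2 (wave 2) and a cluster variable school. The wave-1 variables are the x-variables here; mentioning their variances, and stating their covariance explicitly, keeps cases that are missing on them.

```mplus
DATA:      FILE = mydata.dat;             ! hypothetical file name
VARIABLE:  NAMES = a1 b1 a2 b2 school;    ! hypothetical names
           USEVARIABLES = a1 b1 a2 b2;
           CLUSTER = school;
           MISSING = ALL (-999);          ! assumed missing-value code
ANALYSIS:  TYPE = COMPLEX;
           ESTIMATOR = MLR;
MODEL:     a2 ON a1 b1;                   ! autoregressive and cross-lagged paths
           b2 ON a1 b1;
           a2 WITH b2;                    ! wave-2 residual covariance
           a1 b1;                         ! bring the wave-1 x-variables in
           a1 WITH b1;                    ! their covariance, stated explicitly
```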
 sojung park  posted on Sunday, April 20, 2014 - 11:59 pm
I have a path model. In order not to lose observations, I bring the predictors that have missing values into the model.

Earlier model

(RMSEA = 0.05, CFI = 0.99)

Later model:
[X11 X12 X13];

(RMSEA = 0.06, CFI = 0.02)

Clearly, the model fit became substantially worse, so much so that it is not acceptable.

Any guidance would be appreciated!
Also, could you tell me why, in this path model (my DV is continuous and there are no latent variables), Mplus still seems to use listwise deletion?

Thank you so much!
 Linda K. Muthen posted on Monday, April 21, 2014 - 6:26 am
Which estimator are you using? Why do you think Mplus uses listwise deletion? It does not.
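A sketch of two settings that help answer these questions, assuming the path model above with a hypothetical outcome y: stating the estimator explicitly, and requesting the missing-data patterns so the number of observations used can be compared with the number in the data. LISTWISE = ON in the DATA command is the option that requests listwise deletion; without it, Mplus uses the available data, apart from cases missing on x-variables that are not brought into the model.

```mplus
DATA:      FILE = mydata.dat;          ! hypothetical file name
           ! LISTWISE = ON;            ! would request listwise deletion; left off
VARIABLE:  NAMES = y X11 X12 X13;      ! hypothetical names
           MISSING = ALL (-999);       ! assumed missing-value code
ANALYSIS:  ESTIMATOR = ML;             ! state the estimator explicitly
MODEL:     y ON X11 X12 X13;
           [X11 X12 X13];              ! bring the predictors into the model
OUTPUT:    PATTERNS;                   ! report the missing-data patterns
```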