Mplus Discussion >> Missing at random

Alyson Zalta posted on Sunday, March 04, 2007 - 6:10 am

I am using the "TYPE = MISSING H1" option in Mplus. For this option, the default assumes that data are missing at random. In my dataset, measures at the beginning of the survey were more likely to be completed, and higher-order terms (interactions) are missing whenever one of their component variables is missing. Do I need to address these concerns when doing missing data imputation?

Thanks for your assistance!

Bengt O. Muthen posted on Sunday, March 04, 2007 - 9:58 am

Remember that the "missing at random (MAR)" assumption that TYPE = MISSING relies on is not the same as assuming missing completely at random (MCAR). Under MAR, missingness can be quite selective (the terms are misleading). Listwise deletion (not using TYPE = MISSING) is correct only under MCAR. MAR is much more flexible than MCAR. For instance, if attrition happens for subjects with particularly high (or low) values at the first time point, MAR may be approximately true, because the missingness then depends on values that were observed. It is probably the case that missingness is often NMAR (not missing at random) and a function of many unobserved variables, but MAR may still be a reasonable approximation. In short, use TYPE = MISSING. I assume that the data you have are sufficient to identify your interactions.
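To make the listwise-versus-MAR point concrete, here is a minimal Python sketch under assumed conditions. It uses two waves of standard normal scores correlated at 0.6 (hypothetical values) and deletes the wave-2 score whenever the wave-1 score exceeds 0.5. Because dropout depends only on an observed value, the data are MAR. For this simple bivariate normal case with wave 1 always observed, the ML estimate of the wave-2 mean has a closed form: regress wave 2 on wave 1 in the complete cases, then evaluate the regression at the full-sample wave-1 mean. This is not how Mplus computes its estimates. It only yields the same ML answer for this special case.

```python
import random
import statistics


def simulate_two_waves(n=20000, rho=0.6, seed=1):
    """Hypothetical two-wave data: y1, y2 standard normal with correlation rho."""
    rng = random.Random(seed)
    y1, y2 = [], []
    for _ in range(n):
        a = rng.gauss(0.0, 1.0)
        b = rho * a + (1.0 - rho ** 2) ** 0.5 * rng.gauss(0.0, 1.0)
        y1.append(a)
        y2.append(b)
    return y1, y2


def listwise_vs_ml(y1, y2, cutoff=0.5):
    """Delete y2 whenever the observed y1 exceeds cutoff (an MAR mechanism).
    Returns (listwise mean of y2, ML mean of y2)."""
    kept = [(a, b) for a, b in zip(y1, y2) if a <= cutoff]
    m1_cc = statistics.fmean(a for a, _ in kept)
    m2_cc = statistics.fmean(b for _, b in kept)
    # ML for a bivariate normal with y1 complete (factored likelihood):
    # complete-case regression of y2 on y1, shifted to the full-sample y1 mean
    sxy = sum((a - m1_cc) * (b - m2_cc) for a, b in kept)
    sxx = sum((a - m1_cc) ** 2 for a, _ in kept)
    ml = m2_cc + (sxy / sxx) * (statistics.fmean(y1) - m1_cc)
    return m2_cc, ml


listwise, ml = listwise_vs_ml(*simulate_two_waves())
print(f"true mean 0.0 | listwise {listwise:.3f} | ML {ml:.3f}")
```

With these settings the listwise mean should come out near -0.3, while the ML estimate stays close to the true mean of 0.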

Alyson Zalta posted on Monday, March 19, 2007 - 7:05 pm

Thank you for your response. I decided to test whether people with missing data differed significantly on any of our variables from those with complete data. There was one significant difference (which survives a Bonferroni correction for the number of tests) and several trends at p < .10. In general, people with higher psychopathology were less likely to have complete data. Does this violate even the more flexible MAR assumption? Do you still recommend using the MAR approach in this case?

I have one interaction term ("sxc" in the syntax below) and N = 363. I am also running a multiple-group analysis for men (N = 180) vs. women (N = 183). I thought this should be enough data to identify the interaction. Do you agree? Do you have any recommendations for how to do a power analysis to test this?

MODEL:
ib ON smsr strc;
coprc ON smsr;
sxc WITH strc coprc;
strc WITH coprc;
pswq ON smsr strc ib coprc sxc;
rrs ON smsr strc ib coprc sxc;
rrs WITH pswq;
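The completer versus non-completer comparison described in the post above can be sketched as follows. This is a hypothetical illustration, not the analysis from the post. It assumes each variable's scores are stored separately for the two groups, uses a large-sample z test on the mean difference (unequal variances), and flags a difference only if it survives a Bonferroni correction (alpha divided by the number of tests). The variable names and scores are made up.

```python
from statistics import NormalDist, fmean, variance


def compare_groups(complete, incomplete, alpha=0.05):
    """Compare people with complete vs. incomplete data on each variable.

    complete, incomplete: dicts mapping a (hypothetical) variable name to the
    observed scores on that variable in each group.
    Returns the Bonferroni threshold and, per variable, (z, two-sided p, flagged).
    """
    threshold = alpha / len(complete)  # Bonferroni: alpha / number of tests
    results = {}
    for name, a in complete.items():
        b = incomplete[name]
        # unequal-variance standard error of the difference in means
        se = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
        z = (fmean(b) - fmean(a)) / se
        p = 2 * (1 - NormalDist().cdf(abs(z)))  # large-sample normal approximation
        results[name] = (z, p, p < threshold)
    return threshold, results


complete = {"psychopathology": [1, 2, 3, 4, 5] * 20, "age": [1, 2, 3, 4, 5] * 20}
incomplete = {"psychopathology": [4, 5, 6, 7, 8] * 20, "age": [1, 2, 3, 4, 5] * 20}
threshold, results = compare_groups(complete, incomplete)
for name, (z, p, flagged) in results.items():
    print(f"{name}: z = {z:.2f}, p = {p:.4f}, significant after Bonferroni: {flagged}")
```

A difference on an observed variable shows that the missingness is not MCAR. By itself it cannot distinguish MAR from NMAR, because MAR allows missingness to depend on observed values.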

Linda K. Muthen posted on Tuesday, March 20, 2007 - 4:06 pm

If psychopathology has missing data and the missingness is related to the values of psychopathology that are missing, you do not have MAR. If psychopathology has missing data and the missingness is not related to those values (given the observed data), you have MAR.

You can see the following paper where assessing power is discussed:

Muthén, L.K. & Muthén, B.O. (2002). How to use a Monte Carlo study to decide on sample size and determine power. Structural Equation Modeling, 9(4), 599-620.
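The paper carries out such studies with Mplus's MONTECARLO command. As a rough illustration of the same idea outside Mplus, the Python sketch below repeatedly generates data from an assumed population, fits the model, and records how often the interaction coefficient is significant. That proportion estimates power. Everything about the population is hypothetical: standard normal predictors x and z, main effects of 0.3, unit residual variance, and an interaction effect b3. The model is a single-group regression fitted by OLS with a large-sample z test, not the path model in the syntax above.

```python
import random
from statistics import NormalDist


def invert(m):
    """Inverse of a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(m)
    a = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[col])]
    return [row[n:] for row in a]


def interaction_power(n, b3, reps=500, alpha=0.05, seed=2002):
    """Monte Carlo power for the x*z coefficient in a hypothetical population
    y = 0.3*x + 0.3*z + b3*x*z + e, with x, z, e independent N(0, 1)."""
    rng = random.Random(seed)
    crit = NormalDist().inv_cdf(1 - alpha / 2)
    hits = 0
    for _ in range(reps):
        X, y = [], []
        for _ in range(n):
            x, z = rng.gauss(0, 1), rng.gauss(0, 1)
            X.append([1.0, x, z, x * z])
            y.append(0.3 * x + 0.3 * z + b3 * x * z + rng.gauss(0, 1))
        # OLS estimates: beta = (X'X)^-1 X'y
        xtx = [[sum(r[i] * r[j] for r in X) for j in range(4)] for i in range(4)]
        xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(4)]
        inv = invert(xtx)
        beta = [sum(inv[i][j] * xty[j] for j in range(4)) for i in range(4)]
        resid = [yi - sum(b * v for b, v in zip(beta, r)) for r, yi in zip(X, y)]
        s2 = sum(e * e for e in resid) / (n - 4)
        se_b3 = (s2 * inv[3][3]) ** 0.5
        if abs(beta[3] / se_b3) > crit:  # two-sided large-sample z test
            hits += 1
    return hits / reps
```

For example, `interaction_power(180, 0.2)` estimates power for a hypothetical interaction of 0.2 in a group the size of the male subsample. Rerunning it over a grid of effect sizes and sample sizes traces out a power curve.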

Alyson Zalta posted on Tuesday, March 20, 2007 - 6:16 pm

Unfortunately, our missingness is related to the values of psychopathology, so we do not have MAR. Given this limitation, is listwise deletion preferable to TYPE = MISSING?

Thanks again for your help!

Linda K. Muthen posted on Tuesday, March 20, 2007 - 6:35 pm

I think using TYPE=MISSING is preferable even in this situation.
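Under NMAR neither approach is unbiased. However, ML that uses all observed data can still be less biased than listwise deletion when the incomplete variable is correlated with variables that are observed. The minimal Python sketch below shows one hypothetical case: two waves of standard normal scores correlated at 0.6, with the wave-2 score deleted whenever the wave-2 score itself exceeds 0.5. The ML estimate is the closed-form MAR-based answer for a bivariate normal with wave 1 complete. This is an illustration under these assumptions, not a general proof that TYPE = MISSING always does better.

```python
import random
import statistics


def nmar_comparison(n=20000, rho=0.6, cutoff=0.5, seed=1):
    """Hypothetical NMAR example: y2 is deleted whenever y2 itself exceeds cutoff.
    Returns (listwise mean of y2, MAR-based ML mean of y2); the true mean is 0."""
    rng = random.Random(seed)
    y1, y2 = [], []
    for _ in range(n):
        a = rng.gauss(0.0, 1.0)
        y1.append(a)
        y2.append(rho * a + (1.0 - rho ** 2) ** 0.5 * rng.gauss(0.0, 1.0))
    kept = [(a, b) for a, b in zip(y1, y2) if b <= cutoff]  # depends on y2 itself
    m1_cc = statistics.fmean(a for a, _ in kept)
    m2_cc = statistics.fmean(b for _, b in kept)
    # same factored-likelihood ML estimate that is unbiased under MAR
    sxy = sum((a - m1_cc) * (b - m2_cc) for a, b in kept)
    sxx = sum((a - m1_cc) ** 2 for a, _ in kept)
    ml = m2_cc + (sxy / sxx) * (statistics.fmean(y1) - m1_cc)
    return m2_cc, ml


listwise, ml = nmar_comparison()
print(f"true mean 0.0 | listwise {listwise:.3f} | ML {ml:.3f}")
```

Here the listwise mean should fall near -0.5 and the ML mean near -0.4. Both are biased, but ML less so, because the observed wave-1 scores carry part of the information about the deleted wave-2 scores.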

Eulalia Puig posted on Wednesday, November 28, 2007 - 6:31 pm

Hello.
What is the default in v5 for using WLSMV in SEM?
Thanks!

Eulalia Puig posted on Thursday, November 29, 2007 - 6:05 am

What I meant above is: what is the default for dealing with missing values, MAR or listwise deletion?
Thanks.

Linda K. Muthen posted on Thursday, November 29, 2007 - 6:33 am

The default in Version 5 is to estimate the model using all available data and missing data theory. Listwise deletion can be obtained using LISTWISE=ON in the DATA command.

kirby posted on Wednesday, May 21, 2008 - 3:02 am

With
TYPE IS MISSING;
ESTIMATOR IS MLR;

am I right that Mplus uses the expectation-maximization (EM) algorithm to handle missing data?

Bengt O. Muthen posted on Wednesday, May 21, 2008 - 8:57 pm

The EM algorithm is used to obtain ML estimates of the unrestricted H1 model, but for the H0 model other ML algorithms are used (quasi-Newton and Fisher scoring).
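To show what EM does for an unrestricted (H1) model, here is a minimal Python sketch for the simplest case: two variables, with y1 always observed and y2 sometimes missing (None). The E-step replaces the missing parts of the sufficient statistics with their expected values given y1 and the current estimates. The M-step recomputes the mean, variance, and covariance from the completed statistics. This illustrates the algorithm under those assumptions and is not Mplus's implementation. The toy data are hypothetical.

```python
def em_h1(y1, y2, max_iter=500, tol=1e-12):
    """EM estimates of the unrestricted (H1) means and covariance matrix of
    (y1, y2), assuming y1 is always observed and missing y2 values are None."""
    n = len(y1)
    y2_obs = [b for b in y2 if b is not None]
    # y1 is complete, so its ML mean and variance are the sample moments
    m1 = sum(y1) / n
    s11 = sum((a - m1) ** 2 for a in y1) / n
    # start values for the y2 parameters: observed-case moments, zero covariance
    m2 = sum(y2_obs) / len(y2_obs)
    s22 = sum((b - m2) ** 2 for b in y2_obs) / len(y2_obs)
    s12 = 0.0
    for _ in range(max_iter):
        # E-step: expected y2 and y2^2 given y1 under the current estimates
        slope = s12 / s11
        resid_var = s22 - slope * s12
        t2 = t22 = t12 = 0.0
        for a, b in zip(y1, y2):
            if b is None:
                eb = m2 + slope * (a - m1)
                eb2 = eb * eb + resid_var
            else:
                eb, eb2 = b, b * b
            t2 += eb
            t22 += eb2
            t12 += a * eb
        # M-step: ML moments from the completed sufficient statistics
        new_m2 = t2 / n
        new_s22 = t22 / n - new_m2 ** 2
        new_s12 = t12 / n - m1 * new_m2
        change = abs(new_m2 - m2) + abs(new_s22 - s22) + abs(new_s12 - s12)
        m2, s22, s12 = new_m2, new_s22, new_s12
        if change < tol:
            break
    return (m1, m2), ((s11, s12), (s12, s22))


means, cov = em_h1([1, 2, 3, 4, 5, 6], [2, 1, 4, 3, None, None])
print(means, cov)
```

For this toy data the y2 mean converges to 3.1 and the covariance to 1.75, matching the closed-form ML answer for this missing data pattern (complete-case slope 0.6, shifted by the gap between the full-sample and complete-case y1 means).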

kirby posted on Thursday, May 22, 2008 - 1:47 am

Dear Bengt,

Thanks for your answer. Would it be possible to give me some further information about how missing data are handled with MLR? Why are the other ML algorithms (QN, FS) used for the H0 model?

Since I do not know of any papers that address my questions, I would really appreciate your help. Thanks a lot.

Bengt O. Muthen posted on Thursday, May 22, 2008 - 5:28 pm

There are some not-too-technical overview papers on missing data techniques using "FIML" (which is simply ML); I think one is written by Werner Wothke and/or by Schumaker. You can search for those. Essentially, the EM algorithm is well suited to the unrestricted H1 model. EM makes the estimation easy by computing the expected values of the missing data (more precisely, of their sufficient statistics) in each iteration. But that is not necessarily the best algorithm for the H0 computations. There you can focus on estimating the model parameters directly, going over each of the missing data patterns.
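The "go over each of the missing data patterns" idea can be sketched as the observed-data (FIML) log-likelihood for a multivariate normal model. Each case contributes the normal density of only the variables it actually has, using the matching subvector of the mean and submatrix of the covariance matrix. The Python sketch below only evaluates this objective at given values of mu and sigma; an optimizer such as quasi-Newton would then maximize it over the H0 model's parameters. The function names are illustrative, and grouping cases by pattern (which saves computation) is omitted.

```python
import math


def cholesky(a):
    """Lower-triangular L with L L^T = a (a must be positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def mvn_logpdf(x, mu, sigma):
    """Log density of a multivariate normal, computed via the Cholesky factor."""
    L = cholesky(sigma)
    w = []  # forward-solve L w = x - mu; then w.w is the Mahalanobis distance
    for i in range(len(x)):
        w.append((x[i] - mu[i] - sum(L[i][k] * w[k] for k in range(i))) / L[i][i])
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(len(x)))
    return -0.5 * (len(x) * math.log(2 * math.pi) + log_det + sum(v * v for v in w))


def fiml_loglik(data, mu, sigma):
    """Observed-data log-likelihood: each row contributes the density of only
    its observed variables (None marks a missing value)."""
    total = 0.0
    for row in data:
        idx = [j for j, v in enumerate(row) if v is not None]
        if not idx:
            continue  # a row with nothing observed adds nothing
        total += mvn_logpdf(
            [row[j] for j in idx],
            [mu[j] for j in idx],
            [[sigma[i][j] for j in idx] for i in idx],
        )
    return total


mu = [0.0, 0.0]
sigma = [[1.0, 0.5], [0.5, 1.0]]
print(fiml_loglik([[1.0, 1.0], [0.0, None], [None, 2.0]], mu, sigma))
```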

Alexander Kapeller posted on Friday, April 08, 2011 - 3:13 pm

Dear Bengt,
Are there differences between ML and MLR in the algorithm used when there are missing data?

thanks alex

Bengt O. Muthen posted on Friday, April 08, 2011 - 5:21 pm

No.

Alexander Kapeller posted on Saturday, April 09, 2011 - 5:19 pm

Thanks, two more questions.

1) Which sandwich estimator is used to obtain the corrected standard errors?

2) Are the parameter estimates obtained via ordinary ML (as I understand it) or also via a sandwich estimator?

best
Alex

Linda K. Muthen posted on Sunday, April 10, 2011 - 10:12 am

1. See formula 170 in Technical Appendix 8 on the website.

2. Sandwich estimators are used for standard errors not parameter estimates.
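As a generic illustration of what a sandwich estimator does (this does not reproduce formula 170 in the appendix), consider the ML estimate of a single variance under a normal working model with the mean also estimated. The model-based standard error comes from the inverse information, A^-1 / n. The sandwich version is A^-1 B A^-1 / n, where A is the average negative second derivative of the log-likelihood and B is the average squared score. When the data are heavier-tailed than normal, B exceeds A and the sandwich standard error is larger. The data below are hypothetical.

```python
def variance_standard_errors(x):
    """Model-based vs. sandwich SE for the ML variance estimate under a
    normal working model (mean also estimated)."""
    n = len(x)
    mu = sum(x) / n
    v = sum((a - mu) ** 2 for a in x) / n  # ML variance estimate
    # The mean-variance cross term of A averages to zero at the MLE, so the
    # variance entry can be handled on its own.
    A = 1.0 / (2.0 * v * v)  # average negative Hessian for v at the MLE
    # average squared score for v, where score_i = ((x_i - mu)^2 - v) / (2 v^2)
    B = sum(((a - mu) ** 2 - v) ** 2 for a in x) / (4.0 * v ** 4 * n)
    model_se = (1.0 / (A * n)) ** 0.5  # inverse information
    sandwich_se = (B / (A * A * n)) ** 0.5  # A^-1 B A^-1 / n
    return model_se, sandwich_se


data = [0.0] * 8 + [-2.0, 2.0]  # hypothetical heavy-tailed sample
print(variance_standard_errors(data))
```

For this sample (eight zeros plus -2 and 2), the sandwich standard error is about 0.506 versus a model-based 0.358. The point estimate of the variance is the same either way; only its standard error changes.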

Alexander Kapeller posted on Tuesday, April 12, 2011 - 1:54 pm

Thanks Linda,

Do you know whether the sandwich-type estimator used by MLR is the same as the one used for the robust covariance matrix in EQS, or do several estimators exist?

Linda K. Muthen posted on Tuesday, April 12, 2011 - 2:37 pm

There are different algorithms for sandwich estimators. I'm not sure what EQS uses.

Marcel Paulssen posted on Monday, April 30, 2012 - 8:32 am

Dear Linda, dear Bengt,

I have a non-convergence problem with a data set in which the data are missing completely at random by design (each respondent answered a random selection of items). The coverage of the off-diagonal elements of the covariance matrix is between 16% and 20%.

I use TYPE = MISSING in the ANALYSIS command, but even for very simple models (e.g., a CFA with 4 indicators) I get the following message:

THE MISSING DATA EM ALGORITHM FOR THE H1 MODEL
HAS NOT CONVERGED WITH RESPECT TO THE LOGLIKELIHOOD
FUNCTION. THIS COULD BE DUE TO LOW COVARIANCE COVERAGE
OR A NOT SUFFICIENTLY STRICT EM PARAMETER CONVERGENCE
CRITERION. CHECK THE COVARIANCE COVERAGE, OR SHARPEN THE
EM PARAMETER CONVERGENCE CRITERION, OR RERUN WITHOUT H1
TO OBTAIN H0 PARAMETER ESTIMATES AND STANDARD ERRORS.

Is there anything else I can do? I have tried setting the number of iterations to 10000, but nothing changed. All suggestions are warmly appreciated.

Best

Marcel

Linda K. Muthen posted on Monday, April 30, 2012 - 1:54 pm

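Your low coverage is causing the H1 model not to converge. This is because the H1 model estimates all of the diagonal and off-diagonal elements (variances and covariances), and with 16% to 20% coverage there is little information about each covariance. If the H1 model does not converge, you cannot get chi-square, because the chi-square test compares the H0 model against H1. You can add NOCHI to the OUTPUT command.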
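Covariance coverage, as referred to in the message above, is the proportion of cases that observe both variables in a pair; the diagonal is the proportion observing each variable. A minimal Python sketch, assuming rows of values with None marking a missing value, is below. It also includes the expected pair coverage for a planned design in which each respondent answers a random k of p items, which is k(k - 1) / (p(p - 1)). The toy rows and the function names are hypothetical.

```python
def covariance_coverage(data):
    """Proportion of rows observing both variable j and variable k
    (the diagonal is the proportion observing variable j). None = missing."""
    n, p = len(data), len(data[0])
    counts = [[0] * p for _ in range(p)]
    for row in data:
        seen = [v is not None for v in row]
        for j in range(p):
            for k in range(p):
                if seen[j] and seen[k]:
                    counts[j][k] += 1
    return [[c / n for c in row] for row in counts]


def expected_pair_coverage(k, p):
    """Chance that two given items are both among k items drawn at random from p."""
    return k * (k - 1) / (p * (p - 1))


rows = [[1, None, 3], [None, 2, 3], [1, 2, None], [1, 2, 3]]
for line in covariance_coverage(rows):
    print(line)
print(expected_pair_coverage(4, 10))
```

For example, answering 4 of 10 items gives an expected pair coverage of 12/90, about 13%, which is in the range where the H1 covariances are poorly determined.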

Johan Ng posted on Thursday, September 06, 2012 - 4:03 am

Dear Bengt & Linda,

I am trying to run a path analysis using the BAYES estimator. I understand that missing data are handled using FIML by default when MLR and other ML estimators are used. But how are they handled when I use BAYES?

Many thanks in advance!

Bengt O. Muthen posted on Thursday, September 06, 2012 - 7:50 am

Bayes and FIML (= ML) both assume MAR (see Enders, 2010, Applied Missing Data Analysis), so both use all available data.

Hillary Gorin posted on Friday, May 11, 2018 - 12:57 pm

Hello,

When running a four-wave growth model using MLR estimation and a zero-inflated Poisson distribution, I obtained the following results for the chi-square difference tests for MCAR:

Pearson chi-square: 601.499, df = 1090, p > .999 (p = 1.000)
Likelihood ratio chi-square: 674.469, df = 1090, p > .999 (p = 1.000)

Can it be assumed that the data are MCAR?

Thanks,
Hillary

Tihomir Asparouhov posted on Friday, May 11, 2018 - 3:15 pm

That's right. There is no evidence against that assumption. For more on this test, see
https://www.tandfonline.com/doi/pdf/10.1080/01621459.1982.10477795?needAccess=true
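A quick way to see why p rounds to 1.000: under the null hypothesis a chi-square statistic has expected value equal to its degrees of freedom, and both statistics (601.499 and 674.469) are far below df = 1090. The Python sketch below uses the Wilson-Hilferty normal approximation to the chi-square upper tail. That approximation is chosen here only because Python's standard library has no chi-square distribution function; Mplus may compute p differently.

```python
import math
from statistics import NormalDist


def chi2_upper_p(x, df):
    """Approximate P(chi-square with df degrees of freedom >= x)
    using the Wilson-Hilferty cube-root transformation."""
    z = ((x / df) ** (1 / 3) - (1 - 2 / (9 * df))) / math.sqrt(2 / (9 * df))
    return 1 - NormalDist().cdf(z)


for label, stat in [("Pearson", 601.499), ("Likelihood ratio", 674.469)]:
    print(f"{label}: p = {chi2_upper_p(stat, 1090):.3f}")
```

Both lines print p = 1.000. A statistic equal to its df gives a p-value near .5.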

Hillary Gorin posted on Friday, May 11, 2018 - 4:38 pm

Thanks for your response! So I can say that my data are missing completely at random?

Hillary

Tihomir Asparouhov posted on Saturday, May 12, 2018 - 8:30 am

Yes

Hillary Gorin posted on Saturday, May 12, 2018 - 9:45 am

Ok, great! Thanks so much!!

Hillary