Missing data in binary outcomes
 Ringo posted on Monday, October 07, 2002 - 3:03 pm
Dear Prof Muthen

Is there any way to deal with missing data
with binary outcomes in Mplus (I can't find
any description of this in the Version 2
manual)? If not, what would you suggest doing
instead?
Thank you very much for your help.
Ringo
 bmuthen posted on Sunday, October 13, 2002 - 10:46 am
Missing data handling for categorical outcomes is not available in Mplus yet. We recommend using multiple imputation here.
 Tarani posted on Tuesday, March 16, 2004 - 2:00 am
Dear Bengt and Linda,

Will Mplus3 be able to handle missing data for categorical outcomes?

Many thanks,
Tarani
 Linda K. Muthen posted on Tuesday, March 16, 2004 - 8:12 am
Yes.
 Levent Dumenci posted on Thursday, May 06, 2004 - 9:39 am
Dear Bengt and Linda,

I was wondering if Mplus Version 3 allows for the following LCA specification:

DATA:
.
.
CATEGORICAL ...;
PATTERN ...;
CLASSES ...;

ANALYSIS:
TYPE IS MIXTURE MISSING;

Thank you,
Levent
 Linda K. Muthen posted on Thursday, May 06, 2004 - 10:05 am
No, PATTERN is not allowed with MIXTURE. You could create data using the PATTERN option, save it, and then analyze it with MIXTURE MISSING.
 Merril Silverstein posted on Sunday, September 26, 2004 - 11:36 am
Dear Linda,
My colleague Jack McArdle wrote to me that Mplus V3 can handle variables that are censored at the lower limit (0). Is this true? And can one have missing data with this type of DV?
Thanks,
Merril
 Linda K. Muthen posted on Wednesday, September 29, 2004 - 4:17 pm
Yes on both counts.
 Ramin Mojtabai posted on Wednesday, February 08, 2006 - 1:03 pm
Dear Bengt and Linda:
I am planning to use Mplus 3 logistic regression with missing data (I suppose FIML estimator). However, I cannot find any technical references on how Mplus goes about doing this with categorical dependent variables. Any help would be appreciated.

Thanks,
Ramin
 bmuthen posted on Wednesday, February 08, 2006 - 6:38 pm
If you are considering a single categorical dependent variable where some people have missing on this dependent variable, there is no real missing data issue - those that don't have data on the dependent variable are excluded since they don't have information on the regression relationship.
 Ramin Mojtabai posted on Wednesday, February 08, 2006 - 7:10 pm
Dear Bengt:
Thanks for the quick response. The missing data are in the independent variables. I am looking for a technical reference on how Mplus 3 deals with missing data in independent variables in logistic regression. The technical appendices on your website state that "Missing data is allowed for in cases where all y variables are continuous and normally distributed" (p. 25). I understand that this refers to Mplus version 2 and that version 3 allows for missing data when y variables are categorical. I just need a technical reference for how Mplus 3 does this.

Thanks again,
Ramin
 bmuthen posted on Thursday, February 09, 2006 - 6:33 am
Individuals with missing data on independent variables are deleted by default by Mplus when there are categorical dependent variables. Missing data are not allowed on independent variables because the model is estimated conditional on the covariates, and no distributional assumptions are made about the covariates. Such assumptions are necessary for missing data handling under, for example, MAR. Missingness on covariates can be modeled if the covariates are explicitly brought into the model and given distributional assumptions. This is possible in Mplus, although it leads to heavy computations using numerical integration to get the ML estimates.
 Brennan Young posted on Tuesday, July 25, 2006 - 1:56 pm
Dear Bengt and Linda,

I am planning to use Mplus version 4.0 to conduct a survival analysis on adolescents' sexual victimization experiences. In Muthen & Masyn (2005) three patterns of observations are described: 1) an individual experiences the event in time j, 2) an individual is lost to follow-up, and 3) an individual does not experience the event and the study concludes. However, what is to be done when a participant is lost to follow-up for one wave of data collection but then returns to the study for a subsequent wave? Should data for the missing wave somehow be imputed or otherwise estimated? Or how does Mplus handle this situation?

Thanks so much.

Brennan Young
 Bengt O. Muthen posted on Wednesday, July 26, 2006 - 4:39 pm
I am deferring to Masyn on this one - she might answer in a while. To me, Mplus can be instructed to see this occasion as missing, and I guess either assume that the person did not experience the event during that time, or assume that he/she did.
 Katherine E. Masyn posted on Monday, July 31, 2006 - 6:07 pm
Hi, Brennan.

There are a couple of things that can be done in your situation, I think.
Partly it depends on how much you know about what happened during the time
a participant goes missing. If you *know* the participant did not
experience the event during the time he/she was missing from the study, you
can simply code his/her event indicators as "zero", i.e., non-events, for
the waves he/she was missing. If the participant returns and you do not
know whether he/she had the event during the time he/she was missing, you
can censor him/her after the last wave before he/she went missing. This
may be the most reasonable alternative if there are not that many
participants who fall into this category. If a large portion of your sample
falls into this category, there are two more complex alternatives I have
explored but both are still in development. They are described in my next posting...
 Katherine E. Masyn posted on Monday, July 31, 2006 - 6:08 pm
My posting, Part II:

One possibility is to use categorical multiple imputation on the full
sample using all the waves of data *before* you do the special data coding
necessary for survival analysis, e.g., coding event indicators following an
event as missing. The other possibility is to use a latent class variable
at each wave for the "underlying" event status, with the observed event
indicators as perfect measures of the event status at each wave. This
allows missing on event indicators without assuming a non-event during the
period the subject goes missing. These two latter approaches will work best
with recurrent-event or multiple-spell processes.

Hope that helps some.

Best,
Katherine Masyn
kmasyn@ucdavis.edu
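
The two simpler codings from Part I can be sketched with a small hypothetical data layout and a discrete-time survival input. Everything named here (the file name, the event indicators u1-u4, the covariate x, and the missing code 999) is an assumption for illustration, and the model shown is one common single-event specification, not necessarily the one used in the paper cited above.

```
TITLE:    Discrete-time survival with intermittent missingness (sketch)
! Hypothetical rows: u1 u2 u3 u4 x   (999 = missing)
!   0   0   1 999  0.5   event at wave 3; later wave coded missing
!   0   0   0   0  1.2   missed wave 2 but known event-free: coded 0
!   0 999 999 999 -0.3   missed wave 2, status unknown: censored after wave 1
DATA:     FILE = survival.dat;       ! hypothetical file name
VARIABLE: NAMES = u1-u4 x;           ! hypothetical variable names
          CATEGORICAL = u1-u4;       ! binary event indicators
          MISSING = ALL (999);       ! assumed missing-value code
ANALYSIS: ESTIMATOR = MLR;
MODEL:    f BY u1-u4@1;              ! equal loadings: same effect of x at every wave
          f ON x;                    ! covariate effect on the hazard
          f@0;                       ! residual variance of f fixed at zero
```

In the third row the observed values after the participant returns are deliberately discarded, which is why this coding is most reasonable when few participants fall into that category.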
 Anjali Gupta posted on Monday, October 12, 2009 - 1:59 pm
Hello,

I believe there is a technique to 'overcome' missing covariates when performing logistic regressions.

I explicitly specify ML as the estimator for my logistic path models. This appears to lead to record exclusion, unlike my path models with continuous dependent variables. I believe this is expected, correct?

And is there a way to have such records (those with missing covariate values) included in logistic regressions?
 Linda K. Muthen posted on Tuesday, October 13, 2009 - 8:04 am
You can bring the covariates into the model and estimate their means, variances, and covariances. When you do this, you make distributional assumptions about them which may or may not hold. Following is an example of how to do this:

u ON x;
x;
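
A fuller input built around these two lines might look like the following sketch. The file name, the variable names (u, x1, x2), and the missing-value code are hypothetical, and the ANALYSIS settings are an assumption about a typical setup that reflects the numerical integration mentioned earlier in this thread.

```
TITLE:    Logistic regression with missing data on covariates (sketch)
DATA:     FILE = example.dat;        ! hypothetical file name
VARIABLE: NAMES = u x1 x2;           ! hypothetical variable names
          CATEGORICAL = u;           ! binary outcome
          MISSING = ALL (-99);       ! assumed missing-value code
ANALYSIS: ESTIMATOR = ML;
          INTEGRATION = MONTECARLO;  ! numerical integration for the ML estimates
MODEL:    u ON x1 x2;                ! logistic regression of u on the covariates
          x1 x2;                     ! mentioning the variances brings x1, x2 into the model
          x1 WITH x2;                ! covariance between the covariates
```

With this specification x1 and x2 are treated as normally distributed, an assumption that may or may not hold for the data at hand.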
 Emil Coman posted on Tuesday, April 03, 2012 - 7:57 am
This builds on Bengt's posting [February 08, 2006 - 6:38 pm]:
'If you are considering a single categorical dependent variable where some people have missing on this dependent variable, there is no real missing data issue - those that don't have data on the dependent variable are excluded since they don't have information on the regression relationship.'
I am puzzled by a logistic path model with a dichotomous DV; the DV has valid [0 or 1] values for only 275 cases, and is regressed on other IVs (which are also regressed on their prior-time values). The model with WLSMV shows 'Number of observations 390', but again, only 275 DV values are valid; the rest are defined as missing (-99).
If I run the model with USEOBSERVATIONS ARE (DV==0 OR DV==1) I get the right number of observations, 275. Am I messing something up here? Thanks!
 Linda K. Muthen posted on Tuesday, April 03, 2012 - 4:11 pm
Please send the relevant output and your license number to support@statmodel.com.
 Ilana Raskind posted on Thursday, August 04, 2016 - 10:40 am
Hello Drs. Muthen,

I have a question regarding the use of FIML and missing data on my covariates. I am running a one-level logistic regression model with complex survey data. I have brought the x-variables with missing data into my model so that I can retain as many observations as possible.

I have 366 total observations, 201 of which have a valid outcome (these are coded as 0/1 and the remainder of the 366 are coded as missing). When I run the code below 210 observations are used--I can't figure out why the additional 9 observations are being brought in. Do you have any thoughts?

Thank you so much for your time,

Ilana

VARIABLE:
  NAMES = x1 x2 x3 discordsib respondentschooltype_a schoolcluster
          sw_a;
  MISSING = ALL (-1234);
  CATEGORICAL = discordsib;
  STRATIFICATION = respondentschooltype_a;
  CLUSTER = schoolcluster;
  WEIGHT = sw_a;
ANALYSIS:
  TYPE = COMPLEX;
  INTEGRATION = MONTECARLO;
  ESTIMATOR = MLR;
MODEL:
  discordsib ON x1 x2 x3;
  [x1 x2 x3];
 Linda K. Muthen posted on Thursday, August 04, 2016 - 1:56 pm
Please send the output and data set and your license number to support@statmodel.com.
 Ilana Raskind posted on Friday, August 05, 2016 - 1:30 pm
Thank you very much, Linda. I will do so.

One other question for you--I know that when you bring the x-variables with missing data into the model statement you are making assumptions about normality. Is it possible to specify categorical variables?

Thank you again,

Ilana
 Linda K. Muthen posted on Friday, August 05, 2016 - 2:19 pm
No.
 Javed Ashraf posted on Monday, April 23, 2018 - 6:23 am
Hello Drs. Muthen
I would like guidance on missing data handling for CFA and SEM, both with multi-group analysis. My observed variables are mostly binary, for both the exogenous and the endogenous latent variables. The data are missing at random (MAR), with 20-30% missing on all the variables. Please advise whether I can use the MLR estimator in all of my analyses, given that the sample size is sufficiently large (N = 4691), to handle the missing data through FIML.
Regards
Javed
 Bengt O. Muthen posted on Monday, April 23, 2018 - 4:39 pm
Yes, you can use MLR for this.