Ringo posted on Monday, October 07, 2002 - 9:03 pm
Dear Prof Muthen
Is there any way to deal with missing data with binary outcomes in Mplus (I can't find any description of this in the Version 2 manual)? Or what would you suggest doing instead? Thank you very much for your help. Ringo
bmuthen posted on Sunday, October 13, 2002 - 4:46 pm
Missing data for categorical outcomes is not in Mplus yet. We recommend using multiple imputations here.
Tarani posted on Tuesday, March 16, 2004 - 8:00 am
Dear Bengt and Linda,
Will Mplus3 be able to handle missing data for categorical outcomes?
Dear Linda, My colleague Jack McArdle wrote to me that Mplus V3 can handle variables that are censored at the lower limit (0). Is this true? And can one have missing data with this type of DV? Thanks, Merril
Dear Bengt and Linda: I am planning to use Mplus 3 logistic regression with missing data (I suppose FIML estimator). However, I cannot find any technical references on how Mplus goes about doing this with categorical dependent variables. Any help would be appreciated.
bmuthen posted on Thursday, February 09, 2006 - 12:38 am
If you are considering a single categorical dependent variable where some people have missing on this dependent variable, there is no real missing data issue - those that don't have data on the dependent variable are excluded since they don't have information on the regression relationship.
Dear Bengt: Thanks for the quick response. The missing data are in the independent variables. I am looking for a technical reference on how Mplus 3 deals with missing data in independent variables in logistic regression. The technical appendix on your website states that "Missing data is allowed for in cases were all y variables are continuous and normally distributed" (p. 25). I understand that this refers to Mplus version 2 and that version 3 allows for missing data when y variables are categorical. I just need a technical reference for how Mplus 3 does this.
Thanks again, Ramin--
bmuthen posted on Thursday, February 09, 2006 - 12:33 pm
Individuals with missing data on independent variables are deleted by default by Mplus when there are categorical dependent variables. Missing data are not allowed on independent variables because the model is estimated conditional on the covariates, and the covariates carry no distributional assumptions, which are necessary for missing data handling under assumptions such as MAR. Missingness on covariates can be modeled if the covariates are explicitly brought into the model and given distributional assumptions. This is possible in Mplus, although it leads to heavy computations using numerical integration to get the ML estimates.
I am planning to use Mplus version 4.0 to conduct a survival analysis on adolescents' sexual victimization experiences. In Muthen & Masyn (2005) three patterns of observations are described: 1.) when an individual experiences the event in time j, 2.) when an individual is lost to follow-up, and 3.) when an individual does not experience the event and the study concludes. However, what is to be done when a participant is lost to follow-up for one wave of data collection but then returns to the study for a subsequent wave of data collection? Should data for the missing wave somehow be imputed or otherwise estimated? Or how does Mplus handle this situation?
I am deferring to Masyn on this one - she might answer in a while. To me, Mplus can be instructed to see this occasion as missing, and I guess either assume that the person did not experience the event during that time, or assume that he/she did.
There are a couple of things that can be done in your situation, I think. Partly it depends on how much you know about what happened during the time a participant goes missing. If you *know* the participant did not experience the event during the time he/she was missing from the study, you can simply code his/her event indicators as "zero", i.e., non-events, for the waves he/she was missing. If the participant returns and you do not know whether he/she had the event during the time he/she was missing, you can censor him/her after the last wave before he/she went missing. This may be the most reasonable alternative if there are not that many participants who fall into this category. If a large portion of your sample falls into this category, there are two more complex alternatives I have explored, but both are still in development. They are described in my next posting...
One possibility is to use categorical multiple imputation on the full sample using all the waves of data *before* you do the special data coding necessary for survival analysis, e.g., coding event indicators following an event as missing. The other possibility is to use a latent class variable at each wave for the "underlying" event status with the observed event indicators as perfect measures of the event status at each wave. This allows missing on event indicators without assuming a non-event during the period the subject goes missing. These two latter approaches will work best with recurrent event or multiple spell processes.
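To make the two simpler coding options from the previous posting concrete, here is a minimal sketch for a four-wave study. The variable names (u1-u4, x), the file name, and the missing-value code 999 are assumptions made up for illustration, and the MODEL statements follow the usual discrete-time survival setup rather than anything specific to this study.

```
! Event indicators: 0 = at risk, no event; 1 = event; 999 = missing.
! Indicators after the event or after censoring are coded missing.
!
!                                              u1   u2   u3   u4
! Event at wave 3:                              0    0    1  999
! Lost to follow-up after wave 2:               0    0  999  999
! Missed wave 2, known non-event, no event
!   by the end of the study:                    0    0    0    0
! Missed wave 2, status in the gap unknown
!   (censored after wave 1):                    0  999  999  999

TITLE:    Discrete-time survival sketch with coded event indicators
DATA:     FILE = survival.dat;      ! hypothetical file name
VARIABLE: NAMES = u1-u4 x;          ! hypothetical variable names
          CATEGORICAL = u1-u4;
          MISSING = ALL (999);      ! assumed missing-value code
ANALYSIS: ESTIMATOR = MLR;
MODEL:    f BY u1-u4@1;             ! x affects all event indicators through f
          f ON x;
          f@0;
```

Note that the last coding pattern gives up the information from waves 3 and 4 for that participant, which is why it is most attractive when few participants fall into that category.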
I believe there is a technique to 'overcome' missing covariates when performing logistic regressions.
I explicitly state ML as the estimation for my logistic path models. This appears to lead to record exclusion - unlike my path models with continuous dependent variables. I believe this is expected, correct?
And is there a way to have such records (those with missing covariate values) included in logistic regressions?
You can bring the covariates into the model and estimate their means, variances, and covariances. When you do this, you make distributional assumptions about them which may or may not hold. Following is an example of how to do this:
u ON x; x;
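For context, a fuller input built around that MODEL statement might look like the sketch below. The file name, variable names, and missing-value code are assumptions for illustration, and the integration settings are one reasonable choice, not the only one.

```
TITLE:    Logistic regression with missing data on a covariate
DATA:     FILE = example.dat;          ! hypothetical file name
VARIABLE: NAMES = u x;                 ! hypothetical variable names
          CATEGORICAL = u;             ! binary outcome
          MISSING = ALL (-99);         ! assumed missing-value code
ANALYSIS: ESTIMATOR = ML;
          ALGORITHM = INTEGRATION;     ! numerical integration is needed here
          INTEGRATION = MONTECARLO;
MODEL:    u ON x;                      ! logistic regression of u on x
          x;                           ! mentioning the variance of x brings x into the model
```

With x brought into the model it is treated as a normally distributed variable, so cases with missing x can be retained instead of deleted. In older versions, TYPE = MISSING may also have to be requested in the ANALYSIS command.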
Emil Coman posted on Tuesday, April 03, 2012 - 1:57 pm
This builds on Bengt's posting [February 08, 2006 - 6:38 pm: 'If you are considering a single categorical dependent variable where some people have missing on this dependent variable, there is no real missing data issue - those that don't have data on the dependent variable are excluded since they don't have information on the regression relationship.'] I am puzzled by a logistic path model with a dichotomous DV; the DV has valid [0 or 1] values for only 275 cases, and is regressed on other IVs (which are also regressed on their prior time values). The model with WLSMV shows 'Number of observations 390', but again, only 275 DV values are valid; the rest are defined as missing (-99). If I run the model with USEOBSERVATIONS ARE (DV==0 OR DV==1) I get the right number of observations, 275. Am I messing up something here? Thanks!