
Ringo posted on Monday, October 07, 2002  3:03 pm



Dear Prof Muthen, Is there any way to deal with missing data with binary outcomes in Mplus? (I can't find any description of this in the Version 2 manual.) Or what would you suggest doing instead? Thank you very much for your help. Ringo 

bmuthen posted on Sunday, October 13, 2002  10:46 am



Missing-data handling for categorical outcomes is not yet available in Mplus. We recommend using multiple imputation in this case. 

Tarani posted on Tuesday, March 16, 2004  2:00 am



Dear Bengt and Linda, Will Mplus 3 be able to handle missing data for categorical outcomes? Many thanks, Tarani 


Yes. 


Dear Bengt and Linda, I was wondering if Mplus Version 3 allows for the following LCA specification:
DATA: . .
CATEGORICAL ...;
PATTERN ...;
CLASSES ...;
ANALYSIS: TYPE IS MIXTURE MISSING;
Thank you, Levent 


No, PATTERN is not allowed with MIXTURE. You could create data using the PATTERN option, save it, and then analyze it with MIXTURE MISSING. 


Dear Linda, My colleague Jack McCardle wrote to me that Mplus V3 can handle variables that are censored at the lower limit (0). Is this true? And can one have missing data with this type of DV? Thanks, Merril 


Yes on both counts. 


Dear Bengt and Linda: I am planning to use logistic regression in Mplus 3 with missing data (presumably with the FIML estimator). However, I cannot find any technical references on how Mplus does this with categorical dependent variables. Any help would be appreciated. Thanks, Ramin 

bmuthen posted on Wednesday, February 08, 2006  6:38 pm



If you are considering a single categorical dependent variable where some people have missing data on this dependent variable, there is no real missing data issue: those who don't have data on the dependent variable are excluded since they carry no information on the regression relationship. 


Dear Bengt: Thanks for the quick response. The missing data are in the independent variables. I am looking for a technical reference on how Mplus 3 deals with missing data in independent variables in logistic regression. The technical appendices on your website state that "Missing data is allowed for in cases were all y variables are continuous and normally distributed" (p. 25). I understand that this refers to Mplus version 2 and that version 3 allows for missing data when y variables are categorical. I just need a technical reference for how Mplus 3 does this. Thanks again, Ramin 

bmuthen posted on Thursday, February 09, 2006  6:33 am



Individuals with missing data on independent variables are deleted by default in Mplus when there are categorical dependent variables. Missing data are not allowed on independent variables because the model is estimated conditional on the covariates, and no distributional assumptions are made for the covariates; such assumptions are necessary for missing-data handling under MAR. Missingness on covariates can be handled if the covariates are explicitly brought into the model and given distributional assumptions. This is possible in Mplus, although it leads to heavy computations using numerical integration to get the ML estimates. 
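To make "bringing the covariate into the model" concrete, here is a minimal Python sketch of the likelihood it implies for a logistic regression of a binary u on one covariate x. It is a conceptual illustration, not how Mplus is implemented: the normal model for x, the function names, and the 30-point Gauss-Hermite rule are all assumptions chosen for the example. A case missing x has x integrated out numerically, which is where the extra computation comes from.

```python
import math

import numpy as np

# Illustrative assumption, not Mplus internals: a 30-point Gauss-Hermite rule,
# which approximates integrals of the form  f(t) * exp(-t^2) dt.
GH_NODES, GH_WEIGHTS = np.polynomial.hermite.hermgauss(30)


def p_u_given_x(u, x, b0, b1):
    """Logistic regression P(u | x) for binary u in {0, 1}; x may be an array."""
    p1 = 1.0 / (1.0 + np.exp(-(b0 + b1 * x)))
    return p1 if u == 1 else 1.0 - p1


def normal_pdf(x, mu, sigma):
    """Density of the covariate under the assumed model x ~ N(mu, sigma^2)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def case_likelihood(u, x, b0, b1, mu, sigma):
    """Likelihood contribution of one case under the joint model; None = missing."""
    if u is None and x is None:
        return 1.0  # nothing observed: the case drops out of the likelihood
    if u is None:
        return normal_pdf(x, mu, sigma)  # only the covariate's own density
    if x is None:
        # Integrate the missing covariate out: E[P(u | x)] with x ~ N(mu, sigma^2),
        # via the change of variables x = mu + sqrt(2) * sigma * t.
        xs = mu + math.sqrt(2.0) * sigma * GH_NODES
        return float(np.sum(GH_WEIGHTS * p_u_given_x(u, xs, b0, b1)) / math.sqrt(math.pi))
    return float(p_u_given_x(u, x, b0, b1)) * normal_pdf(x, mu, sigma)


def log_likelihood(data, b0, b1, mu, sigma):
    """Sum of log contributions over (u, x) pairs; ML would maximize this."""
    return sum(math.log(case_likelihood(u, x, b0, b1, mu, sigma)) for u, x in data)


# Four hypothetical cases: complete, missing x, missing u, missing both.
example = [(1, 0.5), (0, None), (None, 1.2), (None, None)]
print(log_likelihood(example, b0=-0.3, b1=0.8, mu=0.0, sigma=1.0))
```

A case missing only u contributes just the density of x, which does not involve b0 or b1, so it carries no information on the regression coefficients themselves. Each case missing x adds one numerical integral per likelihood evaluation; with several covariates missing, these become multidimensional integrals.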


Dear Bengt and Linda, I am planning to use Mplus version 4.0 to conduct a survival analysis on adolescents' sexual victimization experiences. In Muthen & Masyn (2005), three patterns of observations are described: 1.) when an individual experiences the event in time j, 2.) when an individual is lost to follow-up, and 3.) when an individual does not experience the event and the study concludes. However, what is to be done when a participant is lost to follow-up for one wave of data collection but then returns to the study for a subsequent wave? Should data for the missing wave somehow be imputed or otherwise estimated? Or how does Mplus handle this situation? Thanks so much. Brennan Young 


I am deferring to Masyn on this one; she might answer in a while. As I see it, Mplus can be instructed to treat this occasion as missing, and one would then have to assume either that the person did not experience the event during that time or that he/she did. 


Hi, Brennan. There are a couple of things that can be done in your situation, I think. Partly it depends on how much you know about what happened during the time a participant goes missing. If you *know* the participant did not experience the event during the time he/she was missing from the study, you can simply code his/her event indicators as "zero", i.e., nonevents, for the waves he/she was missing. If the participant returns and you do not know whether he/she had the event while missing, you can censor him/her after the last wave before he/she went missing. This may be the most reasonable alternative if not many participants fall into this category. If a large portion of your sample falls into this category, there are two more complex alternatives I have explored, but both are still in development. They are described in my next posting... 


My posting, Part II: One possibility is to use categorical multiple imputation on the full sample using all the waves of data *before* you do the special data coding necessary for survival analysis, e.g., coding event indicators following an event as missing. The other possibility is to use a latent class variable at each wave for the "underlying" event status, with the observed event indicators as perfect measures of the event status at each wave. This allows missing data on event indicators without assuming a nonevent during the period the subject goes missing. These latter two approaches will work best with recurrent-event or multiple-spell processes. Hope that helps some. Best, Katherine Masyn kmasyn@ucdavis.edu 
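The simpler coding options from Part I, together with the usual survival-analysis step of coding indicators after an event as missing, can be sketched as a small data-preparation function. This is a hypothetical Python illustration: the function name, the 1/0/None coding, and the `gap_known_nonevent` flag are assumptions, not Mplus features. It censors a person after the last wave before an unexplained gap, even if he/she returns, and fills a gap with nonevents only when the caller knows no event occurred.

```python
from typing import List, Optional


def code_event_indicators(
    waves: List[Optional[int]], gap_known_nonevent: bool = False
) -> List[Optional[int]]:
    """Code per-wave event status for discrete-time survival analysis (sketch).

    waves: 1 = event observed at that wave, 0 = no event, None = wave missed.
    gap_known_nonevent: set True only if you KNOW the person had no event while
        missing; gaps followed by a later observed wave are then coded 0.
    Returns indicators where None means missing (after an event, or censored).
    """
    if gap_known_nonevent:
        observed = [i for i, s in enumerate(waves) if s is not None]
        last_obs = observed[-1] if observed else -1
        # Fill only interior gaps; a trailing gap is genuine loss to follow-up.
        waves = [0 if s is None and i < last_obs else s for i, s in enumerate(waves)]

    coded: List[Optional[int]] = []
    stopped = False  # True once the person has had the event or has been censored
    for status in waves:
        if stopped:
            coded.append(None)  # after an event or censoring: coded missing
        elif status is None:
            # Unknown status during a gap: censor after the last wave before it,
            # even if the person returns later.
            stopped = True
            coded.append(None)
        else:
            coded.append(status)
            stopped = status == 1  # indicators following an event are coded missing
    return coded


print(code_event_indicators([0, None, 0, 1]))
print(code_event_indicators([0, None, 0, 1], gap_known_nonevent=True))
```

For a participant who misses wave 2 and then reports the event at wave 4, the default call censors after wave 1, whereas the known-nonevent option keeps all four waves with the event at wave 4.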


Hello, I believe there is a technique to 'overcome' missing covariates when performing logistic regressions. I explicitly specify ML as the estimator for my logistic path models. This appears to lead to records being excluded, unlike in my path models with continuous dependent variables. I believe this is expected, correct? And is there a way to have such records (those with missing covariate values) included in logistic regressions? 


You can bring the covariates into the model and estimate their means, variances, and covariances. When you do this, you make distributional assumptions about them which may or may not hold. Following is an example of how to do this:

u ON x;
x;

Emil Coman posted on Tuesday, April 03, 2012  7:57 am



This builds on Bengt's posting [February 08, 2006 6:38 pm: 'If you are considering a single categorical dependent variable where some people have missing on this dependent variable ... those that don't have data on [the dependent variable] are excluded since they don't have information on the regression relationship.'] I am puzzled by a logistic path model with a dichotomous DV; the DV has valid (0 or 1) values for only 275 cases and is regressed on other IVs (which are also regressed on their prior-time values). The model with WLSMV shows 'Number of observations 390', but again, only 275 DV values are valid; the rest are coded as missing (99). If I run the model with USEOBSERVATION ARE (DV==0 OR DV==1), I get the right number of observations, 275. Am I doing something wrong here? Thanks! 


Please send the relevant output and your license number to support@statmodel.com. 


Hello Drs. Muthen, I have a question regarding the use of FIML with missing data on my covariates. I am running a one-level logistic regression model with complex survey data. I have brought the x-variables with missing data into my model so that I can retain as many observations as possible. I have 366 total observations, 201 of which have a valid outcome (these are coded 0/1, and the remainder of the 366 are coded as missing). When I run the code below, 210 observations are used; I can't figure out why the additional 9 observations are being brought in. Do you have any thoughts? Thank you so much for your time, Ilana

VARIABLE: NAMES = x1 x2 x3 discordsib respondentschooltype_a schoolcluster sw_a;
MISSING = ALL (1234);
CATEGORICAL = discordsib;
STRATIFICATION = respondentschooltype_a;
CLUSTER = schoolcluster;
WEIGHT = sw_a;
ANALYSIS: TYPE = COMPLEX;
integration = montecarlo;
ESTIMATOR = MLR;
MODEL: discordsib ON x1 x2 x3;
[x1 x2 x3];


Please send the output and data set and your license number to support@statmodel.com. 


Thank you very much, Linda. I will do so. One other question for you: I know that when you bring the x-variables with missing data into the MODEL statement, you are making assumptions about normality. Is it possible to specify them as categorical variables instead? Thank you again, Ilana 


No. 


Hello Drs. Muthen, I would like guidance on handling missing data in CFA and SEM, including multigroup versions of both. Most of my observed variables, for both the exogenous and endogenous latent variables, are binary. The missingness is missing at random (MAR), ranging from 20 to 30% across all the variables. Please advise whether I can use the MLR estimator in all of these analyses, given a sufficiently large sample (N = 4691), to handle the missing data through FIML. Regards, Javed 


Yes, you can use MLR for this. 
