Mplus Discussion >> Nonignorable Missing Data

Nonignorable Missing Data

Moh Yin Chang posted on Saturday, July 21, 2007 - 1:55 pm

Dear Dr. Muthen,

I would like to ask if the current Mplus can fit the Diggle-Kenward selection model and/or shared-parameter model for nonignorable missing data. If yes, is there a code example that you may show me?

Thank you.

Bengt O. Muthen posted on Saturday, July 21, 2007 - 5:15 pm

By Diggle-Kenward, do you mean missingness that is a function of the variable which has missingness? If so, yes. If not, please send me the article describing the model.

V X posted on Wednesday, October 10, 2007 - 3:02 am

Dr. Muthen, I am also interested in learning how to use Mplus to fit the Diggle-Kenward selection model and the shared-parameter model for nonignorable missing data (that is, data missing not at random). Would you provide some Mplus code examples?

Thank you.

Bengt O. Muthen posted on Wednesday, October 10, 2007 - 3:49 pm

I think Diggle-Kenward consider missingness as a function of the (latent) response variables y - what you would have observed if it weren't missing. You could use DATA MISSING to create binary missing data indicators and then regress those "u's" on the "y's" that have missingness by regular ON statements (u ON y). I am not familiar with the term "shared-parameter model".
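
As a rough illustration of what a binary missing data indicator is, here is a minimal Python sketch. It is not Mplus code and does not reproduce what DATA MISSING does internally; the data values and the `missing_indicators` helper are hypothetical, and 999 is used as the missing-value code only because the same code appears in the MISSING = ALL (999) statements later in this thread.

```python
# Sentinel that marks a missing value in this sketch (an assumption).
MISSING = 999

def missing_indicators(rows, missing=MISSING):
    """Return one list of 0/1 flags per row: 1 where the value is missing, else 0."""
    return [[1 if value == missing else 0 for value in row] for row in rows]

# Hypothetical data: three cases measured at three occasions (y1, y2, y3).
y = [
    [4.1, 3.9, 4.4],
    [5.0, 4.7, 999],
    [3.2, 999, 999],
]

# u[i][t] is the indicator "case i is missing at occasion t".
u = missing_indicators(y)
for row in u:
    print(row)
```

In a selection model these indicators become dependent variables whose predictors include the partly unobserved y's themselves.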

Bengt O. Muthen posted on Wednesday, October 10, 2007 - 3:51 pm

P.S. For more non-ignorable missing data modeling, see also the Lecture 17 handout from my UCLA latent variable course at

http://www.gseis.ucla.edu/faculty/muthen/Handouts.htm

V X posted on Friday, October 12, 2007 - 1:00 pm

Dr. Muthen, the Lecture 17 notes for the Educ 231E class helped me a lot in understanding the "outcome based dependent selection model". Thank you.

Currently, I have one question about the path diagram of "growth mixture model with nonignorable missingness as a function of y".

I know a circle represents a latent variable. Would you explain what it means to put a circle around y in the diagram, in the context of a selection model?

Bengt O. Muthen posted on Friday, October 12, 2007 - 3:09 pm

For individuals who have missing data on y, the y variable is a latent variable.

Moh Yin Chang posted on Friday, November 30, 2007 - 8:55 am

Dr Muthen,
You said "for individuals who have missing data on y, the y variable is a latent variable". Do you mean that by creating a latent variable CY like the figure in slide 6 of your Lecture 17, we can fit a nonignorable missingness model with missing Y?

Bengt O. Muthen posted on Friday, November 30, 2007 - 3:29 pm

Yes. But you can also do non-ignorable missingness modeling without cy (which is categorical) using the model of slide 3.

Neither is an easy thing to do.

Bruckers Liesbeth posted on Wednesday, November 25, 2009 - 8:03 am

Dear Dr. Muthen,

I was wondering if the syntax for the models that you discuss in the slides

http://www.gseis.ucla.edu/faculty/muthen/Handouts.htm (Lecture 17)

is available.

Kind Regards,

Liesbeth

Bengt O. Muthen posted on Wednesday, November 25, 2009 - 9:33 am

No, it is not. But a new missing data paper will be posted shortly that discusses alternative models for non-ignorable missing data, and you can then request the Mplus setups for those analyses.

Tim Stump posted on Friday, April 20, 2012 - 3:01 pm

I have a cohort of adolescents with type II diabetes, with hemoglobin a1c collected at baseline (prior to high school graduation) and at 3, 6, 9, and 12 months. We know that our a1c outcome does not satisfy the MAR assumption because we could not get all chart review data from physicians' offices after the adolescents left home. Baseline a1c is not missing, but missingness increases over time. The cohort is relatively small, with 180 subjects, but we would like to explore some of the models outlined in "Growth Modeling With Nonignorable Dropout: Alternative Analyses of the STAR*D Antidepressant Trial". Our goal is simply to model a1c over time and see whether the trajectory differs for a couple of binary covariates and whether missing a1c influences the trajectory. Would you have any suggestions as to which type of NMAR model would work better with our small sample?

Bengt O. Muthen posted on Friday, April 20, 2012 - 8:35 pm

Pattern-mixture modeling is probably the easier to work with.

Martin Blais posted on Thursday, May 29, 2014 - 1:27 pm

Dear Drs. Muthen,

I am looking for a way to model non-ignorable missing data in a LCA model.

Case is I have 3 variables, each representing the age at onset of a 3-stage process. As stages 2 and 3 are only possible if the previous one has been reached, missing data on stage 2 and/or stage 3 are non-ignorable. The missing data structure looks like a monotone missing pattern, except that they are not dropout: the missing data are informative that the next stage was not reached and I want to include this information in the model.

Structure of the database is :
s1 s2 s3
9 12 13
12 14 17
14 15 .
11 16 .
13 . .
17 . .

Do you have any advice on how to implement such a model in Mplus? The closest I found is the Diggle-Kenward selection model (Ex. 11.3), but there are no i, s, q components in what I model...

Many thanks

Bengt O. Muthen posted on Thursday, May 29, 2014 - 3:47 pm

So you have an LCA based on 3 continuous variables. Does s1 predict missingness on s2 and s3, with s1 always observed, so MAR? Regarding non-ignorable, are you saying that the values that would have been observed for s2 and s3 predict their missingness?

Martin Blais posted on Friday, May 30, 2014 - 5:02 am

You are right on the MAR component. I might have erred regarding the non-ignorable.

The continuous variables are ages at three stages: s1 = self-awareness; s2 = self-identification; s3 = disclosure. They follow a sequence that constrains the values so that s3>s2>s1. Data are cross-sectional.

s1 is always observed (inclusion criteria). s2 can be observed or not (if stage s1 is completed and s2 has been reached). s3 can only be observed if s2 is observed AND s3 has been reached.

I want to model trajectories that take into account both the occurrence (yes or no) of the stages and, if the next stage has been reached, the age at which that happened.

Thanks for your help!

Bengt O. Muthen posted on Saturday, May 31, 2014 - 2:16 pm

I think I would need to understand the setting better to help you and that goes beyond Mplus Discussion. You may want to ask on SEMNET. You want to make clear why LCA is of interest to you (why mixtures?) and why you want to model trajectories (trajectories of what?). Two comments: Selection modeling of missing data like Diggle-Kenward can be done without a growth model; survival modeling might be relevant given that you want to model age at which the events happened (perhaps multivariate survival; see the Masyn dissertation on our website).

Lucy Markson posted on Friday, May 22, 2015 - 4:35 am

I have a longitudinal dataset over 5 waves with 1600 cases. I think that the missing data is MNAR because missingness on IVs is related to the DVs in my model. How is it best to model missing data in Mplus when it is MNAR? Am I right in thinking that FIML is not appropriate?

Many thanks

Bengt O. Muthen posted on Friday, May 22, 2015 - 5:49 pm

I don't understand how DV values can influence missing on IVs. Typically IVs influence DVs.

Lucy Markson posted on Saturday, May 23, 2015 - 1:39 am

Yes, sorry that's what I meant

Bengt O. Muthen posted on Saturday, May 23, 2015 - 10:06 am

IVs influencing DV missingness is covered by MAR and does not necessarily call for NMAR modeling.
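
The point can be checked with a small simulation. The sketch below is plain Python, not Mplus, and every number in it (sample size, true slope of 0.5, the 0.5 cutoffs, the seed) is an assumption chosen for illustration. It generates y from x, then deletes y in two ways: based on the observed IV x, and based on the value of y itself. The complete-case regression slope should stay close to the true value in the first case and be pulled toward zero in the second.

```python
import random

def slope(pairs):
    """Ordinary least-squares slope of y on x for a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    return sxy / sxx

def simulate(n=20000, true_slope=0.5, seed=1):
    """Return slopes for the full data, IV-driven missingness, and y-driven missingness."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = true_slope * x + rng.gauss(0.0, 1.0)
        data.append((x, y))
    # Missingness driven by the observed IV: y is missing when x > 0.5.
    iv_driven = [(x, y) for x, y in data if x <= 0.5]
    # Missingness driven by the DV itself: y is missing when y > 0.5.
    dv_driven = [(x, y) for x, y in data if y <= 0.5]
    return slope(data), slope(iv_driven), slope(dv_driven)

full, mar, nmar = simulate()
print(f"full: {full:.2f}  missing by x: {mar:.2f}  missing by y: {nmar:.2f}")
```

This only illustrates the distinction for a single regression; it says nothing about which NMAR model is appropriate for a given data set.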

Lucy Markson posted on Wednesday, May 27, 2015 - 2:07 am

OK, many thanks. Tabachnick & Fidell (2007) state that "MNAR is inferred if the (missing variable analysis) t-tests show that missingness is related to the DV" (p. 63). Would it be possible to ask for clarification, since this seems to contradict what you are saying?

Would it also be possible to ask for clarification on one separate Mplus issue? In order to test measurement invariance for GCM, is it required to have the exact same measure at each wave? E.g., if one has a measure with some different items at each wave (for example, to make the measure age appropriate), am I right in thinking that standardised scores cannot be used and the only option is to use only those items that appear at each wave (wave counterpart items)?

Many thanks again

Bengt O. Muthen posted on Wednesday, May 27, 2015 - 10:06 am

MAR is proper if the missingness can be predicted by observed variables. NMAR is at hand if missingness is predicted by unobserved variables such as the value that would have been observed or other latent variables.

Don't use standardized scores in growth modeling. You can deal with different items at different time points if you take a multiple indicator approach with measurement invariance for items that are in common.

Harmen Zoet posted on Thursday, November 10, 2016 - 12:27 am

Hi,

I want to conduct an LPA with treatment outcome scores as indicators (operationalized as difference scores (T1 - T2) of items on a questionnaire). However, I have missing data which might be non-ignorable. After all, it is plausible that I have more missing data at my last point of measurement (T2) for those who do not respond to treatment.

What is now the best way to deal with my missing data? Should I first of all use multiple imputation for all items, then compute my difference scores, prior to running my LPA? Or is it also possible (and better) to first compute my difference scores, then run my LPA while at the same time using maximum likelihood estimation (or FIML)? Or do I completely miss the point and is it necessary to do this in a different way?

Thanks in advance!

Bengt O. Muthen posted on Thursday, November 10, 2016 - 12:31 pm

Use ML instead of multiple imputation whenever you can to deal with missing data.

I am not sure why you need to work with difference scores - keeping the original scores might be better and uses all available information with ML. The estimated classes will tell you if increases, decreases, or constancy form the classes.
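
A small Python sketch of why difference scores can throw information away: a difference is undefined whenever either of its two scores is missing, so an observed T1 score from someone who is missing at T2 no longer enters the analysis. The scores and the `difference_scores` helper below are hypothetical.

```python
def difference_scores(t1, t2):
    """T1 - T2 per case; None whenever either score is missing."""
    return [None if a is None or b is None else a - b for a, b in zip(t1, t2)]

# Hypothetical pre- and post-treatment scores; None marks a missing value.
t1 = [12, 15, 9, 14, 11]
t2 = [8, None, 7, None, 10]

diffs = difference_scores(t1, t2)

# Number of observed scores available when the original variables are kept.
observed_values = sum(v is not None for v in t1 + t2)
# Number of observed scores that actually feed into a non-missing difference.
values_behind_diffs = 2 * sum(d is not None for d in diffs)

print(diffs)
print(observed_values, values_behind_diffs)
```

Here two observed T1 scores are lost once the analysis is based on differences.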

Harmen Zoet posted on Thursday, November 17, 2016 - 1:12 am

Dear Dr. Muthen,

Thanks for your answer. I'm considering your advice, but can not really figure it out. Would the following be correct:

Variable:
Names are nummer T0B1FR (...) KIPPOST;

USEVARIABLES are T0B1FR-T2D5IN KIPPOST;
CLASSES = c (4);
MISSING is all (999);

Analysis:
Type = mixture;
Estimator = ML;
Starts = 100 10;

Output:
TECH1 TECH10 TECH11 TECH14

Plot:
Type = PLOT3

Savedata:
File is GMM.dat;
save = cprobabilities;

All indicator variables are the items from the questionnaire, both pre- and post-treatment (except for KIPPOST, which is a post-treatment total severity score).
I used specified starting values because automatic starting values led to local maxima. The weird thing, however, is that I cannot view any plot of the estimated probabilities when I run the above.

Otherwise, it is not possible to use ML while computing difference scores in Mplus (before running the LPA), right?

Bengt O. Muthen posted on Thursday, November 17, 2016 - 2:56 pm

You should have a semicolon after PLOT3.

Automatic starting values should be just fine using enough Starts. Check which run has the best loglikelihood.

To save time, never use Starts together with Tech11 or Tech14 - see web note 14.

You can use difference scores if you like.

Yajing Zhu posted on Thursday, February 08, 2018 - 10:47 am

Dear Profs,

I am wondering how to use Mplus to estimate a Diggle-Kenward model where the outcome of interest is categorical. I have consulted ex11.3 but got nowhere. When I want to have a categorical outcome y, I list it in the CATEGORICAL option because it is a DV. However, as the model of dropout is conditional ON the categorical ys, I am not able to have the ys on the RHS. I have tried to create a copy of the categorical ys to include on the RHS, but then list-wise deletion is triggered...

Looking forward to your reply. Thank you!

Bengt O. Muthen posted on Thursday, February 08, 2018 - 4:10 pm

This should be possible using ML. You can send your failed output to Support along with your license number.

Brenna Gomer posted on Friday, February 28, 2020 - 12:40 pm

I'm new to Mplus and not sure if my syntax for pattern-mixture models and selection models is correct. I have 4 time points and a GCM with MNAR missingness on only the 4th time point. Some examples create missing data indicators for several variables, but I'm not sure how to adapt the code for missingness only at time 4. If I create indicators for more than the 4th variable, I get a message saying the indicator is not categorical (it's all 0s). Here's my syntax for the pattern-mixture model:

VARIABLE: NAMES = y0 y1-y3;
USEVARIABLES = y0-y3 d3;
MISSING = ALL (999);

DATA MISSING:
NAMES = y2 y3;
TYPE = DDROPOUT;
BINARY = d3;

MODEL: i s | y0@0 y1@1 y2@2 y3@3;
i ON d3;
s ON d3;

Output:
TECH1;

For the selection model I added CATEGORICAL = d3; to the VARIABLE: section, changed TYPE to SDROPOUT, and added the following lines:

ANALYSIS: ESTIMATOR = ML;
ALGORITHM = INTEGRATION;
INTEGRATION = MONTECARLO;
PROCESSORS = 2;

And changed the last two lines of the model statement to
d3 ON y2 (1)
y3 (2);

I also hope to adapt this code for a linear regression model. Thanks for your help!

Bengt O. Muthen posted on Saturday, February 29, 2020 - 6:10 am

In this case, you should have a missing data indicator for only the 4th time point where there is missing data.

The first model is a bit strange because it says that missing at the 4th time point influences i which is defined at time 1 because of the zero time score at that point. Perhaps you want to instead say

y3 on d3;

For the second model, do you really think that y2 influences d3? That's a MAR situation because y2 doesn't have missingness. Seems like only y3 should influence d3.
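
In equation form, the posted selection model for the single dropout indicator can be sketched as follows. This assumes the logit link for a categorical d3 and writes the model with an intercept for readability; the symbols alpha, beta_1, and beta_2 are notation introduced here, not names from the posted input.

```latex
\operatorname{logit} P(d_3 = 1 \mid y_2, y_3) = \alpha + \beta_1 y_2 + \beta_2 y_3
```

Here beta_1 is the slope for the always-observed y2 and beta_2 is the slope for y3, which is unobserved for those who drop out. With beta_2 = 0, missingness depends only on observed values, which is the MAR situation; beta_2 different from 0 is what makes the model NMAR.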

Brenna Gomer posted on Saturday, February 29, 2020 - 11:01 am

Thanks so much for your help with my syntax! I have one more question. To create a missing data indicator for the 4th time point, is this syntax correct:

DATA MISSING:
NAMES = y2 y3;
TYPE = DDROPOUT;
BINARY = d3;

When I tried using just NAMES = y3; I got a message saying I needed to specify one more variable than indicators for pattern-mixture and selection models. Thanks again for your help!

Bengt O. Muthen posted on Monday, March 02, 2020 - 10:24 am

Yes, the syntax is correct.
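
The exchange above can be made concrete with a Python sketch. The coding used here (one indicator for every occasion after the first, set to 1 at the first missing occasion and 0 otherwise) is an assumed convention for illustration and is not claimed to be the exact coding Mplus applies for DDROPOUT or SDROPOUT; the data and the `dropout_indicators` helper are hypothetical. It shows why the list of variables is one longer than the list of indicators, and why an indicator for an occasion with no missing data is constant.

```python
def dropout_indicators(rows):
    """One indicator per occasion after the first: 1 at the first missing occasion, else 0."""
    out = []
    for row in rows:
        flags = []
        for t in range(1, len(row)):
            first_missing = row[t] is None and row[t - 1] is not None
            flags.append(1 if first_missing else 0)
        out.append(flags)
    return out

# Hypothetical data for y0-y3 with missing values (None) only at the 4th time point.
rows = [
    [1.0, 1.4, 2.1, 2.6],
    [0.8, 1.1, 1.9, None],
    [1.2, 1.5, 2.4, None],
    [0.9, 1.3, 2.0, 2.2],
]

# Four variables give three indicators (for y1, y2, y3).
all_d = dropout_indicators(rows)
# Two variables (y2, y3) give a single indicator for y3.
last_two = dropout_indicators([r[2:] for r in rows])

# An indicator column is constant when it takes only one value across cases.
constant = [len({flags[j] for flags in all_d}) == 1 for j in range(3)]

print(all_d)
print(last_two)
print(constant)
```

Only the indicator for the last occasion varies across cases, which is consistent with keeping a single indicator for the 4th time point.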

Brenna Gomer posted on Monday, March 09, 2020 - 10:37 am

Thanks so much for your previous help! I have one more follow-up question. I seem to be having difficulty doing selection models for a regression context. I get this error message:

*** WARNING
Data set contains cases with missing on x-variables.
These cases were not included in the analysis.
Number of cases with missing on x-variables: 41

And then the missing data indicator has 0 variance so I get an error message for that. Is there some modification I can make to my syntax to allow me to run this model? Thanks!

Bengt O. Muthen posted on Monday, March 09, 2020 - 4:21 pm

We need to see your full output to say - send to Support along with your license number.