Sample selection and missing data
 Claire Noel-Miller posted on Tuesday, January 26, 2010 - 12:41 pm
I am examining middle-aged Americans' trajectories of parental caregiving (4 time points) using a dichotomous outcome (provided care or not). My question has to do with sample selection and the resulting implications for missing data over time. At each time point, the caregiving variable is only relevant for adult children with a surviving parent. I see three options for sample selection: (a) Limit the analysis to adult children who had a surviving parent at all time points. I would have no missing values across time, but a drastically reduced sample size. (b) Select adult children with a surviving parent at T1 and follow them through T4 regardless of whether or not they have a surviving parent at each later time point. I would include a time-varying covariate (TVC) at T2-T4 measuring parental survival, and individuals with no parent would have a 0 value on their outcome variable. Although this increases sample size, I've got people in the later time points for whom the outcome variable is not relevant. (c) At every time point, retain only the adult children with a surviving parent. Although this seems like a good approach, I am concerned that missingness across time will be 'systematic', i.e. related to the parent's likelihood of dying. Any thoughts? Any references for substantive studies that have used LGC to model an outcome that is itself dependent on the presence of some other 'thing' would also be greatly appreciated.
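
To make the three options concrete, here is a minimal Python sketch on made-up data. The respondent IDs, the values, and the function names are all hypothetical and are not taken from the study; the sketch only shows which respondents and which outcome values each option would keep.

```python
# Toy data, invented for illustration: per respondent, four waves of
# parent survival (1 = alive) and caregiving (1 = provided care, None = not applicable).
people = {
    "A": {"alive": [1, 1, 1, 1], "care": [0, 1, 1, 1]},
    "B": {"alive": [1, 1, 0, 0], "care": [1, 1, None, None]},
    "C": {"alive": [1, 0, 0, 0], "care": [0, None, None, None]},
    "D": {"alive": [0, 0, 0, 0], "care": [None, None, None, None]},
}

def option_a(people):
    """(a) Keep only respondents with a surviving parent at all waves."""
    return {pid: p["care"] for pid, p in people.items() if all(p["alive"])}

def option_b(people):
    """(b) Keep respondents with a parent at T1; code care as 0 after the parent's death."""
    return {
        pid: [c if a else 0 for a, c in zip(p["alive"], p["care"])]
        for pid, p in people.items()
        if p["alive"][0]
    }

def option_c(people):
    """(c) At each wave keep only respondents with a surviving parent; otherwise missing."""
    return {
        pid: [c if a else None for a, c in zip(p["alive"], p["care"])]
        for pid, p in people.items()
        if any(p["alive"])
    }

for name, fn in [("a", option_a), ("b", option_b), ("c", option_c)]:
    print(name, fn(people))
```

Under (a) only respondent A survives the selection, under (b) respondents B and C carry zeros that do not reflect a real caregiving decision, and under (c) the missing values line up exactly with parental death, which is the systematic missingness the question is about.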
 Bengt O. Muthen posted on Wednesday, January 27, 2010 - 11:17 am
I agree that approaches (a), (b), and (c) are not good. In situations like this, I think one should bring both the outcome and the "dropout" into the model, where in this case dropout means that the outcome is no longer relevant (is missing) after the parent's death. There is a big stat lit on non-ignorable dropout modeling, but the essence is that you bring dropout time information into the model.

One approach would add to the growth model of the outcome a survival model for dropout. In your case, a discrete-time survival model would be suitable. The Diggle-Kenward model assumes that the survival part is influenced by the outcome and/or the potential outcome that would have been observed had dropout not occurred. The Beunckens model specifies that the growth factors and latent classes influence the survival.
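
As a rough illustration of the data layout a discrete-time survival model works with, the Python sketch below expands each respondent into one record per wave at risk and computes the sample hazard (events divided by number at risk) at each wave. The function names and the toy input are assumptions made for this example; this is not Mplus syntax and it leaves out the growth part of the model entirely.

```python
def person_period(last_wave, event, n_waves=4):
    """Expand one respondent into one (wave, died) record per wave at risk.

    last_wave: last wave (1-based) at which the respondent was still at risk
    event:     True if the parent died at last_wave, False if censored there
    """
    rows = []
    for wave in range(1, min(last_wave, n_waves) + 1):
        rows.append((wave, int(event and wave == last_wave)))
    return rows

def hazards(records, n_waves=4):
    """Sample hazard per wave: number of events divided by number at risk."""
    at_risk = [0] * n_waves
    died = [0] * n_waves
    for wave, d in records:
        at_risk[wave - 1] += 1
        died[wave - 1] += d
    return [died[k] / at_risk[k] if at_risk[k] else None for k in range(n_waves)]

# Hypothetical sample: (last wave at risk, parent died at that wave?)
sample = [(4, False), (2, True), (3, True), (4, True)]
records = [row for last, ev in sample for row in person_period(last, ev)]
print(hazards(records))
```

Each respondent contributes records only up to the wave of the event, so the number at risk shrinks over time. A survival model with covariates would replace these raw proportions with a regression for the hazard at each wave.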

Another approach brings dropout dummies into the model as covariates. There you have a choice of pattern-mixture modeling or Roy latent class modeling.
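
A small Python sketch of what "dropout dummies" could look like in the data: each respondent's missingness pattern is summarized by the last wave with an observed outcome, and that is turned into indicator variables. The coding used here (one dummy per dropout wave, completers as the reference group) and the function names are assumptions for illustration, and other codings are possible.

```python
def last_observed_wave(outcomes):
    """Return the 1-based index of the last non-missing outcome, or 0 if none."""
    last = 0
    for wave, y in enumerate(outcomes, start=1):
        if y is not None:
            last = wave
    return last

def dropout_dummies(outcomes, n_waves=4):
    """One dummy per dropout pattern; completers are the reference (all zeros)."""
    last = last_observed_wave(outcomes)
    return [int(last == wave) for wave in range(1, n_waves)]

print(dropout_dummies([1, 1, None, None]))  # last observed at wave 2
print(dropout_dummies([0, 1, 1, 1]))        # completer
```

In a pattern-mixture analysis these dummies would then be allowed to shift the growth parameters, so that respondents with different dropout times can follow different trajectories.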

I have an overview paper on how to do all this and several more models in Mplus, which I will release in a few weeks.
 Claire Noel-Miller posted on Wednesday, January 27, 2010 - 1:31 pm
A couple of follow-up questions, if I may. (1) Do I understand correctly that the Diggle-Kenward and the Beunckens models would assume that dropout status is a function of caregiving and caregiving slope/intercept respectively? If this is indeed the case, these don't appear to be the appropriate tools, as dropout status in my case is primarily determined by the parent's survival risk (there is some attrition). (2) This then leaves the pattern-mixture modeling approach (both the simple and fancier version). Am I correct in thinking that I would need to select the sample as in (c) above and then allow the growth parameters to differ by pattern of missingness? (3) Finally, where will the overview paper appear? Thanks!
 Bengt O. Muthen posted on Thursday, January 28, 2010 - 9:58 am
(1) Yes. And, if care giving cannot be said to influence dropout, then MAR doesn't hold.

(2) Yes, you could use pattern-mixture in the way you state.

(3) First on our web site; it is not yet submitted.

You say that dropout is primarily determined by parent's survival risk. If you have measured predictors of survival risk you may want to include them in the model. This could make MAR more plausible and not necessitate pattern-mixture. The predictors could be included in two ways. First, they can be used as covariates if they are also substantively relevant predictors of care giving. Second, if they are not substantively relevant for care giving, they can be included as "correlates", that is, auxiliary variables that simply enhance the MAR plausibility. The latter approach using Mplus is described in the Web Talk on our web site "Missing data correlates using ML" at
 Claire Noel-Miller posted on Thursday, January 28, 2010 - 1:58 pm
Thanks! This is very helpful.
 Nicholas Bishop posted on Wednesday, February 10, 2010 - 2:49 pm
Hello Bengt. I also look forward to your release of the pattern mixture modeling paper. When do you expect that the paper will be posted? Thanks.
 Bengt O. Muthen posted on Wednesday, February 10, 2010 - 4:47 pm
It is now ready to be submitted. It will be posted by next week.
 Claire Noel-Miller posted on Tuesday, April 06, 2010 - 2:04 pm
Hello Bengt.
I am fitting an LGC model to binary outcome data (4 data points on time transfer yes/no) using WLSMV. My data have some attrition and some loss to follow-up due to death, as respondents are relatively old.
1-- Am I correct in thinking that Mplus deletes any observation with a missing value on an x variable?
2-- I would like to retain my entire sample for analysis. How, practically, would you recommend imputing these missing values? In particular, I have time-varying covariates: if a person dies at T2, for instance, how would I think about imputing his/her T3 and T4 time-varying covariates?
 Bengt O. Muthen posted on Tuesday, April 06, 2010 - 5:54 pm
1. Yes. And WLS does not handle the attrition as well as ML. That is, WLS is not able to draw on MAR, i.e. borrow information from outcomes at other time points, as ML does.
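
The idea of "borrowing information from outcomes at other time points" can be illustrated with a deliberately simplified Python simulation. It uses two continuous outcomes rather than four binary ones, and missingness at time 2 depends only on the observed time 1 value, so MAR holds by construction. All numbers and function names are made up. The complete-case mean below is only a stand-in for an analysis that ignores the time 1 information and is not a reproduction of what WLSMV does in Mplus. For this bivariate normal setup with monotone missingness, the ML estimate of the time 2 mean can be obtained by regressing y2 on y1 among completers and evaluating the fit at the full-sample mean of y1.

```python
import random

def simulate(n=20000, seed=0):
    """Two correlated outcomes; y2 is missing whenever the observed y1 exceeds 0.5 (MAR)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        y1 = rng.gauss(0.0, 1.0)
        y2 = 0.8 * y1 + rng.gauss(0.0, 0.6)  # true mean of y2 is 0
        data.append((y1, y2 if y1 <= 0.5 else None))
    return data

def complete_case_mean(data):
    """Mean of y2 among respondents with y2 observed; ignores y1 entirely."""
    ys = [y2 for _, y2 in data if y2 is not None]
    return sum(ys) / len(ys)

def ml_mean(data):
    """ML estimate of E[y2] under MAR: completer regression of y2 on y1,
    evaluated at the mean of y1 over the full sample."""
    comp = [(y1, y2) for y1, y2 in data if y2 is not None]
    m1 = sum(y1 for y1, _ in comp) / len(comp)
    m2 = sum(y2 for _, y2 in comp) / len(comp)
    sxy = sum((y1 - m1) * (y2 - m2) for y1, y2 in comp)
    sxx = sum((y1 - m1) ** 2 for y1, _ in comp)
    slope = sxy / sxx
    mean_y1_all = sum(y1 for y1, _ in data) / len(data)
    return m2 + slope * (mean_y1_all - m1)

data = simulate()
print("complete-case mean of y2:", round(complete_case_mean(data), 3))
print("ML (MAR) mean of y2:     ", round(ml_mean(data), 3))
```

The complete-case mean is pulled well below the true value of 0 because respondents with high y1 are systematically missing at time 2, while the ML estimate uses their observed y1 to correct for it.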

2. Attrition due to death is a bit complicated. But isn't the imputation question the same for the outcomes as for the time-varying covariates? ML under MAR would implicitly impute outcome values after death. So a multiple-imputation program could treat these two types of variables the same. More complex/refined approaches would delve into the question of whether attrition due to death is MAR, or whether it is better considered as non-ignorable missing (NMAR). See my new missing data paper on that subject which was recently posted.

If you wait for Mplus Version 6 you can do the multiple imputations there.
 Claire Noel-Miller posted on Tuesday, April 06, 2010 - 7:13 pm
Thanks for your answer.
1- Just to clarify, does this mean that under ML, Mplus does not delete observations with missing values on x?
2- The reason I opted for WLS rather than ML was to allow correlation between the outcome variable residuals. The model that was fitted on the subset of observations with complete Xs fit the data well, and the residual correlations were non-significant. Would you then recommend moving to ML and assuming no correlation between residuals, thereby taking advantage of ML's greater capacity to deal with missingness in the X variables?
Many thanks!
 Bengt O. Muthen posted on Wednesday, April 07, 2010 - 10:03 am
1 ML does delete observations with missing on x (because the model is conditional on the x's). You can either bring the x's into the model and thereby handle their missingness by ML under MAR, or you can do multiple imputation for the x's and then do ML.

2 With strong attrition I think it is more important to handle that missingness well - using ML - than allowing correlated residuals. An alternative is to do multiple imputation for the outcomes and then do WLS with correlated errors - this is a possibility in the forthcoming Mplus Version 6.
 Claire Noel-Miller posted on Wednesday, April 07, 2010 - 2:45 pm
Thank you for your answer. I have a couple of questions on a different topic that I was hoping you might be able to assist me with. I am estimating an LGC model with 4 binary outcomes (WLSMV and Delta parameterization). I get the following warning: the residual covariance matrix (theta) is not positive definite. [...] Problem involving variable mhlpr4. This is the outcome for the last time point.
1- Which output can be requested in Mplus to follow the leads in the warning and try to detect the problem?
2- Are there any solutions to this problem? I read in a different post that the residuals could be constrained across time. When I tried this, I received a message indicating this was not possible with the Delta parameterization.
Thanks for your help!
 Bengt O. Muthen posted on Wednesday, April 07, 2010 - 6:16 pm
You can try using the Theta parameterization and fix the residual variances at one at all time points instead of the default of fixing only the first.

The message could imply that the growth model is not correctly specified (e.g. wrt growth shape, or influence of covariates).