Side effects of including missing data

Mplus Discussion > Missing Data Modeling >

Jon Heron posted on Wednesday, May 24, 2006 - 12:20 am

Hi

I am running a series of LCA/LCGA models on 5 binary variables which measure the absence/presence of a symptom at 5 time points.

One of our plans is to relate the resulting classes to an outcome measured at some later date. A clinician could then potentially infer latent class membership, and hence the risk of the outcome, from the patient's history of presence/absence of this symptom.

Compared with a complete-case analysis, including the cases with missing data together with missing-data predictors (so far just gender) changes the resulting latent classes. I am fine with this, as we now have a larger population and the additional cases are likely to be different.

What I am less happy about is the discovery that one now needs to know gender as well as the repeated measures in order to assign each child to the modal class.

For those children with a complete set of repeated measures, this should be enough for the clinician to infer risk of outcome. The idea of him/her having to ask additional questions, particularly those relating to poverty/social-standing doesn't sit well with me.
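One way to see why gender enters the assignment, assuming gender is included as a covariate that predicts class membership (the post does not show the exact specification): the posterior class probabilities then combine a gender-dependent class probability with the item response probabilities. The symbols below are introduced here for illustration only: $u_1,\dots,u_5$ are the binary symptom indicators, $x$ is gender, $K$ is the number of classes, and $\alpha_k$, $\gamma_k$ are multinomial logit parameters with $\alpha_K = \gamma_K = 0$ for the reference class.

```latex
P(c = k \mid u_1,\dots,u_5, x)
  = \frac{P(c = k \mid x)\,\prod_{j=1}^{5} P(u_j \mid c = k)}
         {\sum_{m=1}^{K} P(c = m \mid x)\,\prod_{j=1}^{5} P(u_j \mid c = m)},
\qquad
P(c = k \mid x)
  = \frac{\exp(\alpha_k + \gamma_k x)}{\sum_{m=1}^{K} \exp(\alpha_m + \gamma_m x)}
```

Under this assumed setup, two children with the same complete response pattern but different gender can have different posterior probabilities, and so possibly a different modal class, whenever some $\gamma_k \neq 0$.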

Any comments would be gratefully received.

cheers

Jon

Bengt O. Muthen posted on Wednesday, May 24, 2006 - 8:02 am

I can understand that you want classes to be determined by symptom outcomes rather than gender. However, the fact that including gender as a covariate changes the latent class results is important to take a closer look at. Imagine that the correct model specification is one where gender influences not only the latent class variable but also one or more items directly. In that case, if you omit gender, your class formation (the percentages in each class) will change because the model is misspecified. This is because the model says that you have gender measurement invariance and you don't. So I would first check if measurement invariance holds with respect to gender by using LCA with gender as a covariate and including one direct effect of gender on an item at a time to see if some such direct effects are significant.
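The check described above might be set up along these lines. This is a minimal sketch rather than the poster's actual input: the file name, the variable names u1-u5 and gender, the missing-value code, and the choice of two classes are all assumptions made for illustration.

```
TITLE:    LCA with gender as covariate and one direct effect
DATA:     FILE IS symptoms.dat;     ! hypothetical file name
VARIABLE: NAMES ARE u1-u5 gender;   ! hypothetical variable names
          CATEGORICAL = u1-u5;      ! five binary symptom indicators
          CLASSES = c (2);          ! number of classes is assumed
          MISSING ARE ALL (-9);     ! assumed missing-value code
ANALYSIS: TYPE = MIXTURE MISSING;   ! MISSING as in the inputs of
                                    ! this thread
MODEL:
          %OVERALL%
          c ON gender;              ! gender predicts class
          u1 ON gender;             ! direct effect on one item;
                                    ! rerun with u2, ..., u5 in turn
```

A significant direct effect such as u1 ON gender would suggest that measurement invariance with respect to gender does not hold for that item.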

Jon Heron posted on Wednesday, May 24, 2006 - 11:29 pm

Thanks, I'll investigate

Jon

Jason Bond posted on Friday, August 18, 2006 - 9:48 am

Hi Bengt and Linda,

I'm running an LCGM with (obviously) a time-varying outcome being predicted at each time point by a time-varying covariate along with the latent growth parameters. From the output, what I've noticed is that if a person does not show up at any wave, the person's entire set of observations is removed from the analysis, which is the same thing that happens if they have item non-response for the covariate. So any missingness on a covariate removes the entire individual from the analysis, while this is not so for missingness only on the response variable at a given time. In the 'stacked' or 'long' data format (e.g., in HLM), one might just leave out the row corresponding to the person-wave that is missing (or where a person-wave has a missing covariate). Is there a similar way to deal with this issue in Mplus? Can I just use the 'long' format and leave out these person-wave points? Thanks much,

Jason

Jason Bond posted on Friday, August 18, 2006 - 2:34 pm

I've also now tried including the variances of these time-varying covariate predictors in the %OVERALL% part of the model, which (from reading other posts in the discussion group and the user's manual) I understand treats them as dependent variables, so the listwise deletion of an entire person (all person-waves in wide format) should not occur when using TYPE = MISSING. Will doing this include these cases not only in the %OVERALL% model (the reference class) but in the other classes as well (I didn't include these variances in each of the %C#X% statements)? It is in the context of a quadratic ZIP model, so I'm having to do Monte Carlo integration over 10 dimensions and it is taking a while to converge.
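The specification described here might look roughly as follows. This is only a sketch: the thread does not show this input, and x1-x5 are hypothetical names standing in for the five wide-format time-varying covariates.

```
MODEL:
    %OVERALL%
    ! Mentioning the variances brings x1-x5 into the model as
    ! dependent variables, so a missing value on one of them
    ! no longer removes the whole person from the analysis.
    x1-x5;
```

The price is a distributional assumption for the covariates (typically normality) and, as the post notes, additional dimensions of integration.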

If one could use the WIDETOLONG command to essentially stack the time-varying variables on top of each other, then I imagine one would need to modify the model commands accordingly. I've looked through the version 4 user's manual but can't seem to find how such commands might be modified. Any additional light you could shed on this would be appreciated. Thanks,

Jason

Bengt O. Muthen posted on Friday, August 18, 2006 - 5:08 pm

You want to avoid working with 10 dimensions of integration, so the long approach sounds like the way to go. I think WIDETOLONG gives you missing data for the covariate at the time point - so not deleting that row for the person in the long form.

Jason Bond posted on Friday, August 18, 2006 - 5:30 pm

I thought it might be. Is this a problem that you've seen a lot of people ask about? Seems like it might be. Obviously, model commands such as:

i s q | aacapt1@0 aacapt2@1 aacapt3@3 aacapt4@5 aacapt5@7;
ii si qi | aacapt1#1@0 aacapt2#1@1 aacapt3#1@3 aacapt4#1@5 aacapt5#1@7;

don't make sense anymore (where aacapt1-aacapt5 are the wide versions of the outcome variables measured at 5 time points) because there would only be a single outcome variable. How would one fix the time paths? Using the:
Repetition = time;
command, where time is the variable indicating measurement time points? Is there a place you can point me to where there is example syntax implementing growth models for long formats? Thanks much again,

Jason

Bengt O. Muthen posted on Friday, August 18, 2006 - 6:10 pm

Look at ex 9.16.
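For orientation, a long-format growth input in the spirit of that example has roughly the following shape. This is a minimal sketch for a continuous outcome without mixtures, not a copy of the User's Guide input: the file name and all variable names (y1-y4, a1-a4, person, time) are hypothetical.

```
DATA:     FILE IS growth_wide.dat;  ! hypothetical file name
DATA WIDETOLONG:
          WIDE = y1-y4 | a1-a4;     ! outcome and time-varying
                                    ! covariate in wide format
          LONG = y | a;             ! their long-format names
          IDVARIABLE = person;      ! person id, used as cluster
          REPETITION = time;        ! measurement-occasion variable
VARIABLE: NAMES ARE y1-y4 a1-a4;
          USEVARIABLES = y a person time;
          CLUSTER = person;
          WITHIN = time a;
ANALYSIS: TYPE = TWOLEVEL RANDOM;
MODEL:
          %WITHIN%
          s | y ON time;            ! random slope = growth rate
          y ON a;                   ! time-varying covariate effect
          %BETWEEN%
          y WITH s;                 ! intercept-slope covariance
```

In this layout time is an ordinary within-level variable, so the growth shape is carried by the regression of the outcome on time rather than by fixed time scores as in the wide-format growth statements.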

Jason Bond posted on Monday, August 21, 2006 - 11:29 am

Thanks much. Ex 9.16 was quite helpful. Unfortunately, as always happens, I'm trying to do a combination of all of the different types of analyses. What I'd like to estimate is an LCGM for a ZIP outcome predicted by, in addition to the random intercept, slope, and quadratic terms, a time-varying covariate in the WIDETOLONG framework. Example 9.16 is helpful for the WIDETOLONG analysis of a continuous outcome in a non-mixture framework. Example 9.17 is useful for a multilevel analysis of a ZIP outcome, but using the wide format instead of long. Examples 10.4 and 10.5 are useful for an LCGM for continuous and varying types of outcomes in the wide format. I attach in the following post an attempt at some Mplus code to implement the LCGM for a ZIP outcome in a long-format dataset. Any comments/suggestions you might have would be greatly appreciated. Thanks so much again,

Jason

Jason Bond posted on Monday, August 21, 2006 - 11:30 am

DATA:
FILE IS "I:\MyFiles\Trajectories\AA-Tx-Careers\ZIP\AA-TX-traj.dat";

Data Widetolong:
Wide = aacapt1 aacapt2 aacapt3 aacapt4 aacapt5 |
rqtrtmt1 qtrtmnt2 qtrtmnt3 qtrtmnt4 qtrtmnt5 |
time1 time2 time3 time4 time5;

Long = aacap | qtrtmnt | time;
IDvariable = id;
Repetition = time;

VARIABLE:
NAMES = id
aacapt1 aacapt2 aacapt3 aacapt4 aacapt5
rqtrtmt1 qtrtmnt2 qtrtmnt3 qtrtmnt4 qtrtmnt5
time1 time2 time3 time4 time5;

USEVARIABLES ARE id time aacap qtrtmnt;
Count = aacapt1-aacapt5 (i);
!(SHOULD THIS BE INSTEAD STATED AS:
!COUNT = AACAP (I); ?)
Cluster = id;
Within = time qtrtmnt;
Classes = C(2);
MISSING ARE ALL (-9);
IDvariable = id;

ANALYSIS:
TYPE = Twolevel Mixture Missing;
STARTS = 10 5;
ESTIMATOR = MLF;
!(I IMAGINE THIS WILL REQUIRE INTEGRATION)

MODEL:
%WITHIN%
%OVERALL%
s q | aacap on time;
si qi | aacap on time;

%c#1%
aacapt on qtrtmt;

%BETWEEN%
%OVERALL%
s q | aacap on time;
si qi | aacap on time;

%c#1%
aacapt on qtrtmt;

OUTPUT: TECH1 TECH8;

Linda K. Muthen posted on Monday, August 21, 2006 - 3:55 pm

Looking at an input to determine if you are specifying a model in a particular way is difficult. My suggestion is to run the input and see if you get what you expect in the model results. If you don't, then I would send the input, data, output, license number and a description of which parameters you are not obtaining to support@statmodel.com.

Bengt O. Muthen posted on Monday, August 21, 2006 - 3:58 pm

A quick look at this says that you should try it with

COUNT = AACAP (I);

Also, in the MODEL paragraph, you want to have the regression

%c#1%
aacapt on qtrtmt;

in the %Overall% part also.

Mplus will tell you that you need integration.

If you get stuck on this, the quickest solution is to send all materials to support@statmodel.com.

One more important thing: note that the mixture option refers to a latent class variable that varies across units on the within level; it is not a between-level variable. When you do the growth as twolevel, you have persons on the between level. Typically persons, not time points, have mixtures. See the twolevel mixture examples.