Anonymous posted on Friday, August 12, 2005 - 6:06 am
I have a question about how best to set up my dataset to handle its pattern of missing data. The data come from a longitudinal follow-up evaluation of an early childhood intervention program. Children ranged in age from prenatal to 15 months at baseline. Family-level data were collected based on enrollment date and reflect varying child ages at each time point. So, while I have longitudinal data (up to 4 observations), these data points reflect different periods of the children's lives across the first three years (for some kids they represent the first 2 years, for others years 3-4). My questions are:
1) Is this considered missing by design? My understanding (from reading the manual) is that in order to account for this, and to distinguish between the missing by design vs. other missing (e.g., survey not administered, item skipped/refused) I include both a MISSING ARE(x) and a PATTERN IS statement in the variable command.
2) If so, then (a) I need to first create child-age specific study variables (e.g., income-age 3 months, income-age 4 months, etc.) and distinguish between those who should have data at that age but don't (e.g., skipped item on survey; -99) and those who are missing because they are not of that age-range (e.g., -88). And, (b) in the PATTERN IS option I need to specify all potential combinations for ages across the four time points (e.g., 1= inc-3, inc-9, inc-18, inc-27)?
3) Once the nature of the missing data is correctly specified in the variable command, is it the case that I would set up a growth curve model with month-to-month observations (as if for any given child I had observations across the first five years), even though in reality any given child contributes a maximum of 4 points?
4) Can you recommend some articles/readings on this topic?
Thank you in advance for your response.
LMuthen posted on Friday, August 12, 2005 - 3:26 pm
I think you want to rearrange your data as shown in Chapter 13. See the section under Missing Data Analysis called Rearrangement of the Multiple Cohort Data. The data are rearranged so that age, rather than measurement occasion, is the time variable. Any missing data that arise because of this rearrangement are missing by design. Other missing data may not be, and note in Chapter 13 that there is a listwise deletion as part of this procedure. You would use the MISSING, COHORT, TMEASURES, and TNAMES options to do this. Alternatively, you could use a program like SAS to rearrange the data.
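To illustrate the idea of the rearrangement outside of Mplus, here is a minimal Python sketch. It is not the procedure from Chapter 13 and does not reproduce its listwise deletion step; the function name, the 6-month age bins, the 48-month upper limit, the -99 flag, and the example values are all assumptions made for illustration. Each child starts with up to four (age in months, value) observations, and the output has one column per age bin. Cells for ages at which a child was never measured stay empty, which corresponds to missing by design.

```python
# Hypothetical flag for a value that should have been observed but was not
# (e.g., a skipped survey item). The -99 code is an assumption.
ITEM_MISSING = -99


def rearrange_by_age(children, bin_width=6, max_age=48):
    """Turn occasion-based records into age-based rows.

    children: dict mapping child id -> list of (age_in_months, value) pairs,
              one pair per measurement occasion.
    Returns a dict mapping child id -> list with one slot per age bin.
    A slot is None when the child was not measured in that age bin
    (missing by design); item-level missing flags are carried over unchanged.
    """
    n_bins = max_age // bin_width
    wide = {}
    for child_id, observations in children.items():
        row = [None] * n_bins
        for age_months, value in observations:
            b = age_months // bin_width
            if 0 <= b < n_bins:
                # If two occasions fall in the same bin, the later one wins.
                row[b] = value
        wide[child_id] = row
    return wide


# Made-up example: child A is observed early in life, child B later.
children = {
    "A": [(3, 20.0), (9, 22.0), (18, 25.0), (27, ITEM_MISSING)],
    "B": [(26, 30.0), (32, 31.0), (41, 33.0), (47, 35.0)],
}
wide = rearrange_by_age(children)
for child_id, row in wide.items():
    print(child_id, row)
```

In this sketch, child A fills the early age bins and child B the later ones, so the two rows overlap only in the 24-29 month bin. The bin width is a modeling choice: narrower bins keep age more precise but leave more empty cells per column.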
When you do your growth model, you use the age variables in the analysis. In doing this, you also make the assumption that each age cohort comes from the same population. So, for example, you assume that three-year-olds born in 1999 are the same as three-year-olds born in 1998.
Following is one paper that used this approach:
Muthén, B. & Muthén, L. (2000). The development of heavy drinking and alcohol-related problems from ages 18 to 37 in a U.S. national sample. Journal of Studies on Alcohol, 61, 290-300.