Matt Cooper posted on Thursday, August 11, 2011 - 2:09 am
I'm after a nudge in the right direction with a problem I have, and will try to be concise.
I have longitudinal data on ~200 subjects. The subjects are children coming into a clinic roughly every 3 months from diagnosis until they reach 18. The data run from 1992 to the present. There is a primary numeric measure on these children, taken at each visit. It ranges from 4 to 14, with the majority between 6 and 10. It is known that this variable goes up with age. What is apparent from plotting the data in spaghetti form is that there are groups that start high and stay high, start low and stay low, start low and creep up, etc.
I'm keen to try to put people into groupings like this using the data. Some caveats. Someone who was diagnosed at 10 in 1992 and followed until 2000 might go from 8 up to 9.5 over those 8 years. This person may well be the same as someone diagnosed at 10 in 2000 and followed until 2008 who goes from 7 to 8.5, because over the last 20 years the average level across the clinic has declined. I can restrict the data to those diagnosed between certain years to try to remove this artifact in preliminary analysis.
I've looked at things like kml in R, but that has the problem of imputing incomplete data (it has to in order to run), which is inappropriate here given some are diagnosed at 1 and others at 15.
Any help as to which Mplus package might be appropriate would be appreciated.
Sounds like you are interested in growth mixture modeling. Given the period trend, restricting the data to diagnoses during certain years might help as you say. Perhaps you also want to add a multiple-cohort twist to it, where the cohorts are ages (or age groupings) of diagnosis - unless age of diagnosis doesn't change the growth parameters.
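As a rough illustration of the two data-preparation steps mentioned above (restricting to a diagnosis-year window, and grouping subjects into age-of-diagnosis cohorts), here is a minimal Python sketch. It is not Mplus input. The `Visit` record, its `year` field, the cut points, and the choice to treat a subject's earliest visit as their diagnosis are all assumptions made for the example.

```python
from bisect import bisect_right
from collections import namedtuple

# Hypothetical long-format record: one row per clinic visit.
Visit = namedtuple("Visit", "id age year glych")

def first_visits(rows):
    """Map each subject id to their earliest visit (assumed here to be diagnosis)."""
    first = {}
    for r in rows:
        if r.id not in first or r.age < first[r.id].age:
            first[r.id] = r
    return first

def restrict_by_diagnosis_year(rows, start, end):
    """Keep every visit of subjects whose first visit fell in [start, end]."""
    first = first_visits(rows)
    keep = {i for i, r in first.items() if start <= r.year <= end}
    return [r for r in rows if r.id in keep]

def diagnosis_cohort(rows, cuts=(5, 10)):
    """Label each subject 0, 1, 2, ... by age band at first visit.

    With cuts=(5, 10): 0 is under 5, 1 is 5 to under 10, 2 is 10 and over.
    """
    return {i: bisect_right(cuts, r.age) for i, r in first_visits(rows).items()}

# Made-up rows, loosely following the examples in the first post.
rows = [
    Visit(1, 10.0, 1992, 8.0), Visit(1, 10.3, 1992, 8.2), Visit(1, 17.9, 2000, 9.5),
    Visit(2, 10.1, 2000, 7.0), Visit(2, 18.0, 2008, 8.5),
    Visit(3, 1.5, 2001, 6.4), Visit(3, 1.8, 2001, 6.6),
]
recent = restrict_by_diagnosis_year(rows, 2000, 2005)   # subjects 2 and 3 only
cohorts = diagnosis_cohort(rows)                        # subject id -> cohort label
```

The cohort label could then be written out as an extra column and used as a grouping variable or covariate, depending on how the model is set up.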
Matt Cooper posted on Friday, August 12, 2011 - 12:09 am
Okay. I've got the Chapter 6 and 8 guide to work through. Will post back if I get stuck when looking to implement the model.
Thanks for your help.
Matt Cooper posted on Wednesday, August 17, 2011 - 12:35 am
So I've had a read and am not sure of a couple of things about getting my data into a form Mplus can use. I'm looking at something like:
TITLE: this is a test of a GMM
DATA: FILE = "C:\analysis\for_mplus_test.dat";
VARIABLE: NAMES = id glych age yearcat obs; USEVARIABLES ARE glych age id; CLASSES = c(3);
ANALYSIS: TYPE = MIXTURE; STARTS = 20 2;
MODEL: %OVERALL% i s | glych@0; i s ON age; c ON age;
OUTPUT: TECH1 TECH8;
My data is in a long form, where each subject has anywhere from 5 to 60 rows, so I need to tell the model that id is the grouping variable. In the examples in the chapters, are y1 y2 y3 the outcomes at the time points? Just getting confused here as my subjects have all come in at different ages so can't lock them into uniform endpoints.
Will continue to play.
Matt Cooper posted on Wednesday, August 17, 2011 - 1:36 am
In addition, I have hunted around and haven't been able to find an application of Mplus to a problem like mine, where I might have 5 visits for someone between ages 7 and 9 and then 40 for someone else between ages 5 and 17, and I am trying to find out whether people lie on similar trajectories. If you can think of a similar application and point me in its direction, that would be great.
Example 9.16 shows how to set up a growth model when data are in long format. Example 6.12 shows a growth model with individually-varying times of observations.
Matt Cooper posted on Thursday, August 18, 2011 - 4:03 am
Thanks Linda. I really appreciate the prompt replies you both give.
I have spent some hours now looking over the examples and trying to think how my data fits in.
Given I want to see latent class groupings for the trajectories, I am confident that TYPE = MIXTURE is what I'm after.
On wide vs long: my data are in long format, and I can't convert them to wide given the details explained above. I have age to 2 dp, so everyone has a unique time record for their visits (91 days apart, then 115, then 84, etc.), and almost every column would have 1 record in it if I did that.
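One possible way round the sparse-column problem described above is to index the wide columns by visit number instead of by age, carrying the age at each visit in a parallel set of columns. This resembles the layout of Example 6.12, which has one outcome column and one time-score column per occasion. The Python sketch below is only an illustration under stated assumptions: the `(id, age, glych)` row layout, the `-999` missing-value flag, and the cap on visits are all hypothetical.

```python
from collections import defaultdict

def long_to_wide(rows, max_visits, missing=-999):
    """Turn (id, age, glych) visit rows into one wide row per subject.

    Columns are indexed by visit number, not by age, so each output row is
    [id, y1..yK, a1..aK] with K = max_visits; unused slots hold `missing`.
    """
    visits = defaultdict(list)
    for subject, age, glych in rows:
        visits[subject].append((age, glych))
    wide = []
    for subject in sorted(visits):
        # Order visits by age; anything beyond max_visits is dropped.
        ordered = sorted(visits[subject])[:max_visits]
        pad = [missing] * (max_visits - len(ordered))
        ys = [g for _, g in ordered] + pad
        ages = [a for a, _ in ordered] + pad
        wide.append([subject] + ys + ages)
    return wide

# Made-up example: subject 1 has three visits, subject 2 has two.
rows = [
    (1, 7.02, 8.1), (1, 7.27, 8.4), (1, 7.58, 8.0),
    (2, 5.10, 6.9), (2, 5.35, 7.2),
]
for row in long_to_wide(rows, max_visits=3):
    print(*row)   # one space-separated line per subject
```

With up to 60 visits per subject this gives a wide file with many mostly-missing columns, and how Mplus treats a missing age alongside a missing outcome should be checked against the user's guide before relying on this layout.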
This is where I am at now and the error I get. If I can apply the mixture model to data in long format I think I'll be set.
VARIABLE: NAMES = id glych age yearcat obs; USEVARIABLES ARE glych age id; IDVARIABLE = id; CLASSES = c(2);
ANALYSIS: TYPE = MIXTURE; STARTS = 20 2;
MODEL: %OVERALL% i s | glych; i s ON age; c ON age;
OUTPUT: TECH1 TECH8;
*** ERROR in MODEL command
The number of fixed time scores is not sufficient for model identification in the following growth process: I S