Keri Jowers posted on Tuesday, April 20, 2010 - 7:19 pm
I'm getting ready to do an LTA with 2 time points using a multiply imputed dataset based on 10 smaller datasets. I've done LCA, but not LTA, and this is my first time working with MI data.
My understanding is that the steps should be as follows: (1) Fit time 1 LCA using the MI data (with Type=Imputation). (2) Fit time 2 LCA using the MI data. (3) Using the thresholds for the best fitting model for the time 1 LCA as start values, run the LCA on one of the smaller component datasets. Do the same for time 2 with the same smaller dataset. (4) Perform the LTA using the smaller dataset used for the LCAs.
Is that right? If so, when I do step 3, I get different class proportions from the MI and smaller datasets, meaning that the conditional probabilities for individuals are different in the 2 models. How do I resolve that?
Thanks so much for your help!
Keri Jowers posted on Tuesday, April 20, 2010 - 7:26 pm
A clarification on my question: The class proportions are different at the same time point when I use the MI and smaller dataset.
What do you mean by a multiply imputed dataset based on 10 smaller datasets?
Keri Jowers posted on Wednesday, April 21, 2010 - 9:24 am
Excellent question. This is data I inherited, so my explanation may be a little fuzzy, but it looks like it's actually 10 MI datasets that have been stacked (using Stata). So, the guidance I got was to run the LCA on the stacked data to get start values, then to run the LCA on one of the MI datasets to get cprobs. I suppose I'm unclear as to why, when I do that, I get different class proportions. And then, when I start the LTA, do I use the stacked data or one of the MI datasets? Thanks so much for your patience -- I'm in unfamiliar territory.
It seems like that would work. It makes me suspicious of the data that it does not. You could divide the data set into the ten data sets and do multiple imputation using TYPE=IMPUTATION. You would not get cprobs though. You could run separately for each data set and average the cprobs although I'm not sure if that is correct for imputation.
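For the suggestion of dividing the stacked file into its ten component data sets, here is a minimal Python sketch. It assumes the stacked data have been exported with a column that holds the imputation number; the column name passed as `imp_col`, the `imp` file prefix, and the `skip` argument are hypothetical and not from the original post. It writes one headerless, space-delimited file per imputation plus a list file naming them one per line, which is the kind of file TYPE=IMPUTATION is pointed at. Some stacked formats also carry the original unimputed data as an extra block (often numbered 0), so `skip` lets such a block be left out.

```python
from collections import defaultdict
from pathlib import Path


def split_stacked(rows, imp_col, out_dir, prefix="imp", skip=()):
    """Write one headerless data file per imputation plus a list file.

    rows    : iterable of dicts, e.g. from csv.DictReader on the stacked export
    imp_col : name of the column holding the imputation number (hypothetical)
    skip    : imputation numbers to leave out, e.g. ("0",) for unimputed data
    Returns the path of the list file.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Group the stacked rows by imputation number.
    groups = defaultdict(list)
    for row in rows:
        if str(row[imp_col]) not in {str(s) for s in skip}:
            groups[str(row[imp_col])].append(row)

    names = []
    for m in sorted(groups, key=int):
        # Drop the imputation index; keep the other columns in their order.
        fields = [k for k in groups[m][0] if k != imp_col]
        name = f"{prefix}{int(m)}.dat"
        with open(out_dir / name, "w") as fh:
            for row in groups[m]:
                fh.write(" ".join(str(row[k]) for k in fields) + "\n")
        names.append(name)

    # The list file names each component data set, one per line.
    list_path = out_dir / f"{prefix}list.dat"
    list_path.write_text("\n".join(names) + "\n")
    return list_path
```

Comparing the number of rows in each output file is also a quick check on whether the stack really contains ten equally sized imputed data sets.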
Keri Jowers posted on Wednesday, April 21, 2010 - 10:41 am
Thanks for your thoughts! It's encouraging that you're suspicious, too. I'll see what I can find out from the source of my data. Are the cprobs something I'll need for the LTA?
Keri Jowers posted on Wednesday, April 21, 2010 - 10:42 am
Sorry -- one more question. Is there any reason you can think of that I couldn't just use the stacked data for all of the analyses rather than trying to duplicate class proportions in one of the MI datasets?
A continuation of the issues above. I decided to increase the number of starts on the stacked data to make sure that I was avoiding local maxima. The 4-class model continues to be the best fit. The problem is that when I use the resulting thresholds to set the parameters in any one of the MI datasets, I get: "ONE OR MORE MULTINOMIAL LOGIT PARAMETERS WERE FIXED TO AVOID SINGULARITY OF THE INFORMATION MATRIX. THE SINGULARITY IS MOST LIKELY BECAUSE THE MODEL IS NOT IDENTIFIED, OR BECAUSE OF EMPTY CELLS IN THE JOINT DISTRIBUTION OF THE CATEGORICAL LATENT VARIABLES AND ANY INDEPENDENT VARIABLES. THE FOLLOWING PARAMETERS WERE FIXED: 2 3", with 2 and 3 being alphas. What's confusing to me is that I'm not including any covariates (the model includes only 6 latent class indicators), and a model with covariates is the only other situation in which I've seen this warning.
Even though the class proportions reported in the results from the un-stacked MI dataset show 2 empty classes, the output still provides the proportion of individuals in each of the 4 user-defined classes that endorse "0" or "1" responses for each of the 6 latent class indicators. These proportions replicate over the 10 MI datasets. I'm wondering if these are usable proportions for tables/graphs, and I'm wondering if I need to somehow modify my model to make it identifiable, even though it's identified in the stacked data.
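To make the link between the logit parameters and the reported proportions concrete, here is a small Python sketch. It assumes the usual logit parameterization for an LCA with binary indicators and no covariates: a threshold tau gives P(u = 1 | class) = 1 / (1 + exp(tau)), and the class proportions are a softmax of the alphas with the last class as the reference (alpha fixed at 0). The function names and the numeric values in the usage note are illustrative only, not taken from the output discussed above.

```python
import math


def endorse_prob(threshold):
    """P(u = 1 | class) for a binary indicator with a logit threshold."""
    return 1.0 / (1.0 + math.exp(threshold))


def class_proportions(alphas):
    """Class proportions from K-1 multinomial logit means.

    The last class is the reference class, with its logit fixed at 0.
    """
    logits = list(alphas) + [0.0]
    top = max(logits)  # subtract the maximum for numerical stability
    weights = [math.exp(a - top) for a in logits]
    total = sum(weights)
    return [w / total for w in weights]
```

For example, `class_proportions([0.5, -15.0, -15.0])` puts essentially all of the mass in the first and last classes, which is the pattern of a large negative alpha going with an (almost) empty class. The item probabilities from `endorse_prob` depend only on the thresholds, so a class can be empty and still show item probabilities if its thresholds were supplied or held fixed.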