Missing values, MI, and growth mixtur...
Mplus Discussion > Missing Data Modeling >
 Shige Song posted on Saturday, March 10, 2007 - 12:08 am
Dear Linda and Bengt,

Here is something that has been bothering me for a while: I want to fit a growth mixture model to identify latent subgroups. Some of the covariates I chose to predict latent class membership have a significant amount of missing values. From what I have read, I can 1) give the covariates distributional assumptions and make them part of the model, or 2) do a multiple imputation.

In the case of a growth mixture model, is multiple imputation still a viable option? I mean, will the latent class membership parameters be combined in the same way as other parameters using Rubin's rules? Does Mplus automatically use these combined class membership parameters to classify individuals into latent classes?

In short, what is the optimal method to handle missing covariate values in a growth mixture model?
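
For reference, the pooling that Rubin's rules refer to can be written as follows. This is the standard textbook form for a single parameter estimated on each of m imputed data sets; the symbols are notation introduced here, and the thread does not state how Mplus applies it to each mixture parameter.

```latex
% \hat{Q}_j : estimate of the parameter from imputed data set j
% U_j       : squared standard error of \hat{Q}_j
\begin{align}
\bar{Q} &= \frac{1}{m}\sum_{j=1}^{m}\hat{Q}_j
  && \text{(pooled estimate)} \\
\bar{U} &= \frac{1}{m}\sum_{j=1}^{m}U_j
  && \text{(within-imputation variance)} \\
B &= \frac{1}{m-1}\sum_{j=1}^{m}\bigl(\hat{Q}_j-\bar{Q}\bigr)^{2}
  && \text{(between-imputation variance)} \\
T &= \bar{U} + \Bigl(1+\frac{1}{m}\Bigr)B
  && \text{(total variance of } \bar{Q}\text{)}
\end{align}
```

Averaging a class-specific parameter this way is only meaningful if a given class label refers to the same class in every imputed data set.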


 Linda K. Muthen posted on Saturday, March 10, 2007 - 7:44 am
We think that multiple imputation is a viable option for growth mixture modeling. One caveat: when you analyze the imputed data sets, give good starting values so that you do not run into label switching.
 Shige Song posted on Saturday, March 10, 2007 - 4:37 pm
Hi Linda,


Do you have examples showing how to do growth mixture modeling using imputed data sets?

 Linda K. Muthen posted on Saturday, March 10, 2007 - 5:05 pm
No. You just use the TYPE=IMPUTATION option of the DATA command, with a file that lists the imputed data sets instead of a single data set. Nothing else differs from any other growth mixture model, apart from using starting values to avoid label switching.
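
A minimal sketch of what such an input might look like, assuming a two-class linear growth mixture model. The file name, variable names, and starting values are hypothetical placeholders, not values from this thread.

```
TITLE:    GMM on multiply imputed data (sketch);
DATA:     FILE = implist.dat;   ! hypothetical text file that lists
                                ! one imputed data file name per line
          TYPE = IMPUTATION;
VARIABLE: NAMES = y1-y4 x1 x2;  ! hypothetical variable names
          CLASSES = c(2);
ANALYSIS: TYPE = MIXTURE;
MODEL:
  %OVERALL%
  i s | y1@0 y2@1 y3@2 y4@3;    ! linear growth, fixed time scores
  c ON x1 x2;                   ! covariates predict class membership
  %c#1%
  [i*1.0 s*0.2];                ! hypothetical class 1 starting values
  %c#2%
  [i*3.0 s*-0.1];               ! hypothetical class 2 starting values
```

The class-specific starting values are what keep class 1 and class 2 in the same order across the imputed data sets.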
 Myong Hwa Lee posted on Saturday, April 17, 2010 - 6:31 am
Dear Linda and Bengt,

I’m running growth mixture models with 4 binary covariates and one binary distal outcome. The growth outcome variables are ordinal (3 categories). All variables have some missing cases (1% - 30%). I created 5 imputed datasets using ICE in Stata and then used “type=imputation” in Mplus. The outputs looked good.

But the output printed neither the distal outcome results in probability scale for each class nor the latent class odds ratio results. I want to know how the classes are related to the distal outcome. How can I get these results? (When I don’t use the “type=imputation” option, these outputs are always printed.)

I appreciate your help!
 Ashley Hum posted on Tuesday, May 27, 2014 - 4:22 pm
Hello, I'm doing a GMM with individually varying times of observation and using 100 imputed datasets. I see from posts that I should use starting values (SVs) to avoid label switching. 1) Does this still apply?

2) If I obtain SVs by using 1 dataset and running a 1-group GMM with the svalues command, how do I use these values (pasted below) to specify SVs for later models? How do I identify SVs for different classes or do I set SVs for the general model?

i WITH s1*-1.22193;
i WITH s2*0.08484;
i WITH q2*-0.08108;
s1 WITH s2*-1.01651;
s1 WITH q2*0.16719;
s2 WITH q2*-1.52019;
[ ash98m@0 ];
[ ash99m@0 ];
[ ash02m@0 ];
[ ash03m@0 ];
[ ash04m@0 ];
[ i*1.17682 ];
[ s1*0.03729 ];
[ s2*0.13607 ];
[ q2*-0.06221 ];

Thank you, Ashley
 Bengt O. Muthen posted on Tuesday, May 27, 2014 - 6:42 pm
1) Yes.

2) You get SVALUES from runs for each number of classes. The SVALUES contain values for both the overall and class-specific parts of the model.
 Ashley Hum posted on Tuesday, May 27, 2014 - 7:04 pm
Thank you for your quick response.

1) Just to clarify, I should run a 2-class GMM on one of the imputed datasets, get the svalues, and then use these values for the analysis across the 100 datasets?
And then continue this procedure for more classes, i.e., run a 3-class model on 1 dataset and then use the svalues from that to run across all datasets?

2) Sorry for the likely simple question, but do I copy all values from the svalues output into the next analysis input or just a portion? Besides copying these, is anything else required to use these values as starting values?
 Bengt O. Muthen posted on Wednesday, May 28, 2014 - 2:13 pm
1) Yes.

2) Copy all values. I think that's it, but the program will tell you. Use Starts=0.
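
A sketch of the first step of this workflow, assuming a two-class model fitted to a single imputed data set. The file name, variable names, and number of random starts are hypothetical.

```
TITLE:    Step 1: one imputed data set, request SVALUES (sketch);
DATA:     FILE = imp1.dat;      ! hypothetical: one imputed data set
VARIABLE: NAMES = y1-y4;        ! hypothetical variable names
          CLASSES = c(2);
ANALYSIS: TYPE = MIXTURE;
          STARTS = 200 50;      ! random starts are still used here
MODEL:
  %OVERALL%
  i s | y1@0 y2@1 y3@2 y4@3;
OUTPUT:   SVALUES;              ! prints the final estimates written
                                ! as MODEL statements with * values
```

For the second step, the statements printed by SVALUES replace the MODEL command in a new input whose DATA command uses TYPE = IMPUTATION and whose ANALYSIS command uses STARTS = 0. The same two steps are repeated for each number of classes.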
 Ashley Hum posted on Saturday, June 07, 2014 - 1:49 pm
Hello again,
Thanks for your help with my previous question. As mentioned in my previous question, I'm doing a GMM with individually varying times of observation and using 100 imputed datasets.

I tried to use Aux (R) to use covariates to predict class membership, but I received an error message that indicated that aux (r) was not allowed with type=imputation. This is likely a simple solution, but what is the appropriate syntax for using covariates to predict class membership?

Thank you,
 Linda K. Muthen posted on Saturday, June 07, 2014 - 5:32 pm
You would get these from the RESIDUAL option of the OUTPUT command. I am not sure if it is available for multiple imputation. Try it to find out.
 Ashley Hum posted on Sunday, June 08, 2014 - 12:11 pm
Thank you for your response. I tried using the RESIDUAL option; however, I got an error message stating that this option is not available for TYPE=RANDOM.

Do you have any other suggestions for using covariates to predict class membership for a model with type=random mixture and with imputed data?

Thank you again,
 Bengt O. Muthen posted on Sunday, June 08, 2014 - 4:21 pm
See Appendix 1 of our 3-step paper on our website, showing how Auxiliary R3STEP can be done manually:

Asparouhov & Muthén (2013). Auxiliary Variables in Mixture Modeling: 3-Step Approaches Using Mplus. Accepted for publication in Structural Equation Modeling. An earlier version of this paper is posted as web note 15. Appendices with Mplus scripts are available here.
 Emma Davies posted on Monday, February 13, 2017 - 5:02 am

I'm trying to fit a single growth curve to my dataset, using multiple imputation (I have 50 imputed datasets). For now I am keeping the model basic as I am learning this technique. I have 6 timepoints, at baseline (0), 6, 12, 36, 52 and 86 weeks, so I specify in the MODEL command:

i s | abc0@0 abc6@1 abc12@2 abc36@6 abc52@8.7 abc86@14.3;

I have read above that it is advised to run the model on one imputed dataset and use the final estimates from this analysis as the start values for the model containing the 50 sets. As such, I have specified in the Analysis command STARTS=0 to turn off random starts, and have read in the manual that to assign a starting value to a parameter you use *. For example, abc36*0.496.

However, I also need the abc36 parameter (for example) to be fixed at 6 (with @) to tell the model it is 6 time units on from baseline. So my question is: how do I do both of these in the model?

Best Wishes,
 Bengt O. Muthen posted on Monday, February 13, 2017 - 5:55 pm
You only give starting values for free parameters. Your abc36 time score parameter is fixed, not free.
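
To illustrate the distinction, here is a sketch of a MODEL command for the growth model described above. The time scores stay fixed with @, and only free parameters carry * starting values. All starting values shown are hypothetical placeholders, not estimates from any run.

```
MODEL:
  ! time scores: fixed with @, so they take no starting values
  i s | abc0@0 abc6@1 abc12@2 abc36@6 abc52@8.7 abc86@14.3;

  ! free parameters: * gives a starting value (hypothetical numbers)
  [i*10 s*0.5];                 ! growth factor means
  i*2 s*0.1;                    ! growth factor variances
  i WITH s*0;                   ! growth factor covariance
  abc0*1 abc6*1 abc12*1;        ! residual variances
  abc36*1 abc52*1 abc86*1;
```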
 Emma Davies posted on Tuesday, February 14, 2017 - 2:32 am
Thank you Bengt.

A follow-up question: In reality, not everyone's assessments fell at exactly the planned times (e.g., the first follow-up at exactly 6 weeks). I have a variable "time" which denotes the exact number of days since baseline on which each assessment actually fell. I notice you can use TSCORES in the VARIABLE command to incorporate this variation, and thus the MODEL command would look like:

i s | abc0 abc6 abc12 abc36 abc52 abc86 AT time0 time6 time12 time36 time52 time86;

1) Is this correct?
2) Is this the same as fixing the parameter, except that I am fixing it to the variable "time"? (If so, would your answer to my previous question about starting values also apply here?) I am still using imputed data.

Thank you for all your help and advice.

 Bengt O. Muthen posted on Wednesday, February 15, 2017 - 11:14 am
1) See User's Guide example 6.12.

2) Yes, my answer applies.
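
A sketch of how the pieces fit together for individually varying times of observation, following the pattern of that User's Guide example. The file name is hypothetical, and the variable names are taken from the question above.

```
TITLE:    Growth model with individual time scores (sketch);
DATA:     FILE = implist.dat;   ! hypothetical list of imputed files
          TYPE = IMPUTATION;
VARIABLE: NAMES = abc0 abc6 abc12 abc36 abc52 abc86
                  time0 time6 time12 time36 time52 time86;
          TSCORES = time0 time6 time12 time36 time52 time86;
ANALYSIS: TYPE = RANDOM;        ! required when AT is used
MODEL:
  ! time scores are read from the TSCORES variables, not estimated
  i s | abc0 abc6 abc12 abc36 abc52 abc86 AT
        time0 time6 time12 time36 time52 time86;
```

Because the time scores come from data rather than being free parameters, they take no starting values.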
 AT Jothees posted on Friday, April 21, 2017 - 8:36 am
Hi,
I have longitudinal data gathered in 6 waves. The variables with missing values are both continuous and ordered-categorical. In three waves, some variables (blood samples) were skipped by design to reduce the data collection cost. In other waves, new variables were added at follow-up, so I have marked them as missing in the initial waves.
I am trying to run a latent growth curve model (specifically, a second-order latent growth curve). I have the following questions before I run the analysis. I am a new user of Mplus, so your advice would be very helpful.

1. What syntax should I use for multiple imputation for my purpose? Can you please point me to a specific reference syntax in the Mplus User's Guide? I get confused when I read chapter 11.

2. I understand that I have to run the multiple imputation syntax first and then run the input for the second-order growth curve with the TYPE = IMPUTATION option. Is this correct? Or do I have to specify the model and run the imputation simultaneously?

3. Do I have to specify only the variables with incomplete or partially missing data in the USER VAR command, or all the variables? This may be a silly question, but I still have not understood how it works.
 Bengt O. Muthen posted on Friday, April 21, 2017 - 5:47 pm
Your missing data scoring is the way to go, but make things easier by not using Multiple Imputation. You don't need it in this case - just use ML under MAR (which some people call FIML) and you will be taking the optimal approach of using all available data.
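
A minimal sketch of that approach, shown for a plain first-order linear growth model rather than the second-order model in the question. The file name, variable names, and missing-value code are hypothetical.

```
TITLE:    Growth model with ML under MAR (sketch);
DATA:     FILE = waves.dat;     ! hypothetical single data file
VARIABLE: NAMES = y1-y6;        ! hypothetical: one outcome, 6 waves
          MISSING = ALL (-999); ! assumed missing-value code
ANALYSIS: ESTIMATOR = ML;
MODEL:
  i s | y1@0 y2@1 y3@2 y4@3 y5@4 y6@5;
```

Cases with some waves missing still contribute their observed values to the likelihood, so no imputation step is needed.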
 AT Jothees posted on Saturday, April 22, 2017 - 5:25 am
Dear Muthen,

Thank you for the quick response.

I have a follow-up question. Two of the variables with missing values in my model are categorical and four others are continuous. Can I still use ML under the MAR assumption?

Thanks in advance,

 Bengt O. Muthen posted on Saturday, April 22, 2017 - 5:33 pm