
Andy Ross posted on Monday, December 01, 2008  2:56 am



Dear Prof Muthen, when comparing means across latent classes using the auxiliary option (e), how does Mplus treat missingness on the variables specified in the auxiliary option? Many thanks, Andy


It skips the individuals who have missing on that variable. 
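
To illustrate what "skipping" means here, the following is a minimal Python sketch of available-case class means. It is not Mplus's algorithm: AUXILIARY (e) works from posterior-probability-based multiple imputations of class membership, whereas this sketch assumes each person has a single hard class label. The function name `class_means` and the data are hypothetical.

```python
import math

def class_means(classes, values):
    """Mean of `values` per class, skipping people whose value is missing (NaN)."""
    sums, counts = {}, {}
    for c, v in zip(classes, values):
        if v is None or math.isnan(v):
            continue  # this person contributes nothing for this variable
        sums[c] = sums.get(c, 0.0) + v
        counts[c] = counts.get(c, 0) + 1
    return {c: sums[c] / counts[c] for c in sums}, counts

# Hypothetical data: hard class labels and one auxiliary variable with gaps.
classes = [1, 1, 1, 2, 2, 2]
y = [2.0, 4.0, float("nan"), 10.0, float("nan"), 20.0]
means, n_used = class_means(classes, y)
```

Each class mean is based only on the people observed on that variable, so `n_used` can be smaller than the class size and can differ from one auxiliary variable to the next.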


Greetings, I have some questions regarding the inclusion of covariates (predictors/outcomes) in mixture models.
1) Is it possible to compare latent classes on categorical "outcomes" (binary? polytomous?) with the Auxiliary (e) option, or is it reserved solely for continuous outcomes? If it is possible, what kind of analysis will it do? Will it simply compare the "mean" of the binary outcome across groups as if it were continuous?
2) What does the Auxiliary (e) option do with missing values? For instance, if 50% of the sample have missing values on the outcome, will it still be possible to estimate the profiles with all (100%) subjects and then compare these profiles only on participants with full data?
3) In the same manner, Mplus does not "tolerate" missing values on the covariates. Let's suppose we wish to predict latent class membership (with a multinomial logistic regression) on the basis of predictors we measured on only 50% of the sample (in fact, we combined two different samples in which the same mixture indicators, but not the same covariates, were measured, to maximise the n in the estimation of the latent profiles). How would you do this?
Thank you very much


Just realized that my second question appeared similar to the previous post on this list. Sorry for the confusion. Let me rephrase it: 2) When there are missing data on the variables compared with the Auxiliary (e) option, is it still possible to estimate the profiles with all (100%) subjects and then compare these profiles only on participants with full data?


1) Aux only does means, treating variables as continuous.
2) For missing on the outcomes, yes. Missing on the aux variable uses only those present.
3) You can bring the covariates into the model by mentioning their means or variances. This then will be missing data modeling including the covariates, assuming normality for them.
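
On point 1, treating a 0/1 outcome as continuous means that its class mean is simply the proportion of 1s in that class. A small Python sketch with made-up data (the class labels and values are hypothetical):

```python
from statistics import mean

# Hypothetical 0/1 outcome for the members of two classes.
outcome_by_class = {
    "class 1": [1, 0, 0, 0],
    "class 2": [1, 1, 1, 0],
}

# The "mean" of a binary variable is the proportion of 1s.
proportions = {c: mean(v) for c, v in outcome_by_class.items()}
```

So a comparison of means on a binary variable amounts to a comparison of proportions across classes; it says nothing comparable for a polytomous outcome with unordered categories.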


Thank you Dr. Muthén, it clarifies part of the question, but I'll be more specific. We plan on combining two samples (n = 200 and n = 300) that include the same mixture indicators, to maximise the available sample size. In one sample, we also have predictors (n = 200). In the other, we also have outcomes (n = 300). So, in the combined sample, 40% to 60% of the data on the predictors or outcomes are missing. We do not want to do missing data imputation.
(1) If we estimate the profiles with the full sample, will the auxiliary (e) function be able to compare the profiles on the outcomes using only the 300 available observations BUT on the basis of the model estimated with the full 500 subjects (or will the fact that 40% of the auxiliary values are missing result in the deletion of those cases from the estimation of the main model)?
(2) How can we do a similar thing with the predictors? We would like to estimate the profiles on the n = 500 sample and then predict profile membership (taking all model information into account) with a multinomial logistic regression on only the n = 200 sample... The only possibility I see would be to start with the full sample to estimate the latent profile model, and then to use only the n = 200 subsample with user-defined start values from the previous model (turning off the random starts option) while including the predictors.
Thanks again!


1) Aux(e) does not influence the number of observations used in the model; only variables on the usev list do.
2) If you do the n=500 analysis with only the outcomes (no covariates), this solution may be different than when including covariates (due to potential direct effects from some covariates to some outcomes). You can do the full model (with covariates) using n=200, then do an analysis of the n=300 with only the outcomes, fixing parameters at the values of the n=200 analysis, and get the n=300's posterior probabilities. Both of these 2 approaches have their shortcomings. The 3rd approach would be to analyze all 500 together with covariates brought into the model as I mentioned. Then you would implicitly "impute", which makes assumptions, but you are making assumptions in all 3 approaches. I would do them all and check the sensitivity.
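
The step of fixing parameters at the n=200 values and obtaining posterior probabilities for the n=300 can be sketched in Python as below. This is an illustration of the idea only, under simplifying assumptions: a latent profile model with normal indicators, local independence within class, variances equal across classes, no covariates, and no missing indicators. Mplus computes these probabilities itself; the function names and all parameter values here are hypothetical.

```python
import math

def normal_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def posterior_probs(y, class_probs, class_means, variances):
    """Posterior class probabilities for one person, with all model
    parameters held fixed at previously estimated values."""
    joint = []
    for pi_k, means_k in zip(class_probs, class_means):
        like = pi_k  # prior class probability
        for y_j, m_j, v_j in zip(y, means_k, variances):
            like *= normal_pdf(y_j, m_j, v_j)  # local independence
        joint.append(like)
    total = sum(joint)
    return [j / total for j in joint]

# Hypothetical estimates carried over from the first (n=200) analysis.
class_probs = [0.5, 0.5]
class_means = [[0.0, 0.0], [2.0, 2.0]]
variances = [1.0, 1.0]

# Two hypothetical people from the second (n=300) sample.
probs_mid = posterior_probs([1.0, 1.0], class_probs, class_means, variances)
probs_low = posterior_probs([0.0, 0.0], class_probs, class_means, variances)
```

A person halfway between the two profiles is split evenly between the classes, while a person sitting on the first profile is assigned to class 1 with high probability. Nothing about the second sample feeds back into the parameters, which is one of the shortcomings of this approach.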


Thank you very much. Would a multiple group analysis, in which the invariance of the latent profile model could be directly tested across groups (2 subsamples), allow for the inclusion of the covariates in only one of the groups?


That would be a 2-group analysis with different numbers of observed variables. This can be done but requires a little trickery.


Greetings, Now you have me interested. What would this trickery be? The only way I can see would be, if the model with outcomes only is invariant across groups, to reestimate the model in only one group, adding covariates (the previously demonstrated invariance would support this). But from your answer I understand there might be another way? Thank you very much in advance!


Actually, this boils down to the same thing as I mentioned earlier: bringing the covariates into the model and scoring them as missing for part of the subjects.
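
As a data-preparation illustration of "scoring them as missing for part of the subjects", here is a minimal Python sketch that stacks two samples into one data set and flags every variable a sample did not measure. The variable names, the values, and the -999 flag are hypothetical; whichever flag is used would still have to be declared as the missing-value code in the analysis.

```python
MISSING = -999  # hypothetical missing-value flag

def stack_samples(sample_a, sample_b, all_vars):
    """Combine two samples into one data set, flagging unmeasured variables as missing."""
    return [[row.get(v, MISSING) for v in all_vars] for row in sample_a + sample_b]

all_vars = ["u1", "u2", "x1", "dist1"]
sample_a = [{"u1": 1.2, "u2": 0.4, "x1": 3.0}]     # indicators plus a predictor
sample_b = [{"u1": 0.7, "u2": 1.9, "dist1": 5.5}]  # indicators plus an outcome
combined = stack_samples(sample_a, sample_b, all_vars)
```

Every row then has the same columns, so the two samples can be analyzed together as a single group with missing data.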


Hello, Dr Muthen, I have some questions regarding distal outcome variables. I'm trying to test differences in the responses on distal outcome variables: some are continuous and some are categorical.
1) As far as I understand, variables in auxiliary are all treated as continuous variables. Is there any alternative way to test differences in responses on categorical distal outcome variables, such as a chi-square test of a contingency table?
2) For a continuous outcome, I tried the (e), (du3step), and (de3step) options to experiment. With (e), the estimated means/SDs and test results were what I expected. However, (du3step) failed and just displayed 999.000 everywhere. The (de3step) result for one variable was hugely different from the (e) result. I think the (e) result is more appropriate, but I wonder if you have any advice on why there are differences and what I could trust.
Thank you very much for your help!


1) The DCAT option is for categorical distal variables; see the V7 UG Addendum and also web note 15. 2) The DCAT option is better than the DU and E options in your situation. See web note 15. 


Thank you very much for your response, Dr. Muthen. I tried the DCAT and DCON options of the auxiliary statement, and I have further questions to ask.
1. Results from the DCON option are still different from the results of the E option. The significance of some pairwise differences was not consistent between the two options. Which one should I trust, or are there conditions that make either option more valid?
2. Is there any method that estimates latent class variables so as to maximize the difference in a single distal outcome or a combination of multiple distal outcomes? Is it a feasible idea, and, if it is, is there any method to do that?
Thank you very much for your help!


1. Use DCON. The reasons are discussed in Web Note 15. 2. I don't know of any such method.
