Hi, I am rather new to Mplus, having worked mostly with SPSS and Stata before, so my initial questions are purely theoretical. I will start with the data I want to analyse and what I want to obtain; hopefully I can reach my goal with Mplus.
I have a large dataset (about 15 000 patients) with various variables measured at baseline, and these patients were followed until death.
What I want to do: I have 3 continuous variables of interest (they are correlated, as they belong to the same biological pathway) - 2 of them normally distributed and one positively skewed. Based on these 3 variables, I want to create a number of risk groups, to identify which phenotypes are associated with an increased risk of death.
Initially, I divided the 3 variables into quantiles, made dummy variables and used the LCA or the CART (Classification and Regression Trees) plugins from Stata. But with these approaches, which use dummy variables rather than the continuous form of the 3 variables, there is a loss of information and the final results are somewhat strange.
1. Is Mplus able to perform this kind of analysis - using the 3 continuous variables to categorise the patients into groups based on their risk for the outcome?
2. In case the answer to the first question is "yes" (and from what I have read it is), what would be the best modelling approach: running this analysis without any confounders, saving the phenotype risk groups and then using them in a Cox analysis, or fitting a single model (and I assume this is the correct approach) that includes all the confounders, clusters etc.?
3. A question about missing values - I saw that Mplus can deal with this issue (unlike the LCA or CART analysis from Stata). In the final dataset I don't have missing data for the 3 variables of interest or for the outcome (I have deleted all the patients with missing values for these variables), but I do have missing values for the confounders. Is this approach correct - deleting all patients with missing values for the 3 variables of interest? Can Mplus deal with missing values for the confounders?
Thank you very much.
P.S: I am not an epidemiologist. I am a physician with some statistical background and I don't know/understand all the theory behind these analyses.
1-2. A key question is whether your 3 variables by themselves are likely to capture substantively meaningful groups, or merely reflect segments of non-normal distributions. This is also related to the Mplus capability for mixture modeling that doesn't assume within-class normality but can instead use a skew-t distribution. So the question is: are there substantive reasons for, e.g., bi-modality in a single variable, or for specific combinations such as one variable being high and the other two low?
Or do the groups show up only in relation to time to death?
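As a rough illustration of the skew-t mixture idea mentioned above, an Mplus input could look something like the sketch below. This is only a sketch under assumptions: the file name, the variable names y1-y3, the choice of 2 classes and the number of random starts are all hypothetical, and the number of classes would have to be decided by comparing models.

```
TITLE:    Sketch: mixture of 3 continuous variables with
          skew-t within-class distributions
DATA:     FILE = patients.dat;    ! hypothetical file name
VARIABLE: NAMES = y1-y3;          ! hypothetical names for the 3 variables
          CLASSES = c(2);         ! 2 classes is an assumption, not a recommendation
ANALYSIS: TYPE = MIXTURE;
          DISTRIBUTION = SKEWT;   ! skew-t instead of within-class normality
          STARTS = 200 50;        ! assumed number of random starts
MODEL:    %OVERALL%
          y1-y3;                  ! variances of the 3 variables
SAVEDATA: FILE = classes.dat;     ! hypothetical output file
          SAVE = CPROBABILITIES;  ! posterior class probabilities per patient
```

Comparing this run with one that omits the DISTRIBUTION option (within-class normality) is one way to see whether the classes merely pick up the skewness of the third variable.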
We have a paper posted on our website about mixture modeling in the Cox survival context which you might find interesting (there are also User's Guide examples of this kind); see Papers, Survival analysis:
Muthén, B., Asparouhov, T., Boye, M., Hackshaw, M. & Naegeli, A. (2009). Applications of continuous-time survival in latent variable models for the analysis of oncology randomized clinical trial data using Mplus. Technical Report.
3. I would not delete people with missing data (unless they are missing on all variables) and would instead let "ML under MAR" take care of it. Confounders are covariates, so if we want their missing data handled the same way we have to bring them into the model by mentioning them in the MODEL command, e.g. by mentioning their variances: x1-x5;
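A minimal sketch of what mentioning the covariate variances looks like in an input file is given below. The file name, the missing-value code -999 and the placeholder outcome y are assumptions for illustration only; x1-x5 stand for the confounders as in the text above.

```
TITLE:    Sketch: covariates brought into the model so that
          ML under MAR also covers their missing values
DATA:     FILE = patients.dat;    ! hypothetical file name
VARIABLE: NAMES = y x1-x5;        ! y is a placeholder outcome
          MISSING = ALL (-999);   ! assumed missing-value code
MODEL:    y ON x1-x5;             ! regression on the confounders
          x1-x5;                  ! mentioning the variances brings x1-x5
                                  ! into the model
```

Note that once the covariates are part of the model, distributional assumptions are made about them as well, and with a survival or categorical outcome Mplus may additionally ask for numerical integration.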