Hi, I have a question about your missing data modeling analysis. If I run an LGM with missing data, am I able to test for MAR? In other words, how can I test whether my data are missing at random (MAR) versus missing not at random? Would failure to reject the null hypothesis mean my data are at minimum MAR? Is your missing data analysis option similar to what I read in Duncan and Duncan (1995), "Modeling the process of development via latent variable growth curve methodology"?
Thanks so much! You and your husband are very good to the Mplus users!
wendy posted on Wednesday, July 19, 2006 - 6:55 pm
Hi, Dr. Muthen. I am currently conducting research on growth modeling with one covariate predicting both the latent intercept and slope; that is, the exogenous variable Z predicts I and S. I fit two models: a full growth model and an equivalent model with a special pattern of missing data. I would like to compare the parameter estimates, bias, and power of those two models. One unusual result is that the power to detect S regressed on Z in the full growth model is lower than in the model with missing data, and the direct effect of Z on S is nonsignificant. I would have supposed that direct effects are less powerful in the model with missing data. Is that right? Thank you.
When you say "pattern of missing data" I get the association of the "pattern-mixture" approach to non-ignorable missing data analysis. That is, using dummy covariates to represent missing data patterns so that model parameter values can vary for different missing data patterns. If that is the correct association, I would think that the differences in results arise because MAR does not hold, so that the non-ignorable missing data approach is needed. That is not a matter of power alone but also of bias.
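The dummy-covariate idea in the reply above can be sketched outside Mplus. The following is only an illustration in Python, not Mplus syntax: the function name pattern_dummies, the use of None for a missing value, and the toy data are all assumptions made for the example.

```python
def pattern_dummies(rows):
    """Build 0/1 dummy covariates for missing data patterns.

    rows: list of per-person outcome lists, with None marking a missing value.
    Returns (patterns, dummies). Each pattern is a tuple of 0/1 flags
    (1 = observed). The first pattern is the reference category, so each
    person gets one dummy per remaining pattern.
    """
    observed = [tuple(int(v is not None) for v in row) for row in rows]
    # Sort in reverse so the most complete pattern comes first (reference).
    patterns = sorted(set(observed), reverse=True)
    dummies = [[int(o == p) for p in patterns[1:]] for o in observed]
    return patterns, dummies


# Hypothetical data: four people, three time points.
data = [
    [1.0, 2.0, 3.0],    # complete
    [1.5, 2.5, None],   # dropped out after wave 2
    [0.5, None, None],  # dropped out after wave 1
    [2.0, 3.0, None],   # dropped out after wave 2
]
patterns, dummies = pattern_dummies(data)
```

The resulting dummies could then be used as covariates of the growth factors, which is what lets parameter values differ across missing data patterns.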
I am modeling student examinee effort scores across seven time points (sample size = 1434). I have missing data (MAR) and am using full information maximum likelihood. My univariate skew and kurtosis are normal; however, my Mardia's is large, suggesting multivariate non-normality (but could be due to large sample size). The MLM estimator appears to require listwise deletion (which makes sense, given that FIML assumes multivariate normality, correct?). In this instance, do I ignore the Mardia's and use MLR?
We are running an LGM on unbalanced panel data spanning the years 1984 to 2009. Since we are interested in trajectories across the whole life course, we rearranged the data so that there are age-specific indicators for the dependent var, from ages 18 to 65 (wide format). The problem is that, obviously, no respondent is observed across the whole life course (we only have 12% valid measurements on average); hence, Mplus gives an error that the minimum covariance coverage was not fulfilled (even after setting COVERAGE=.01), and the model does not converge. Does that mean that it is not possible to run the LGM in wide data format so we have to switch to a multilevel framework?
You might consider the multiple group multiple cohort approach shown in Example 6.18.
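To see why the age-based wide format runs into the coverage error, it can help to compute covariance coverage by hand, taking it to be the proportion of cases with both variables of a pair observed. This is a minimal sketch in Python, not Mplus output; the function name, the None coding for missing values, and the three-cohort toy data are assumptions for illustration.

```python
def covariance_coverage(rows):
    """Proportion of cases with both variables observed, for every pair.

    rows: list of per-person lists (one entry per age), None = missing.
    Returns a square matrix; the diagonal holds the proportion of cases
    observed on each single variable.
    """
    n = len(rows)
    t = len(rows[0])
    coverage = [[0.0] * t for _ in range(t)]
    for j in range(t):
        for k in range(t):
            both = sum(1 for r in rows
                       if r[j] is not None and r[k] is not None)
            coverage[j][k] = both / n
    return coverage


# Hypothetical cohorts, each observed at only two adjacent ages.
data = [
    [1.0, 2.0, None, None],   # youngest cohort: ages 1 and 2
    [None, 2.0, 3.0, None],   # middle cohort: ages 2 and 3
    [None, None, 3.0, 4.0],   # oldest cohort: ages 3 and 4
]
cov = covariance_coverage(data)
```

Here the first and last ages are never observed on the same person, so their coverage is exactly zero, and no minimum coverage setting above zero can be satisfied for that pair.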
Carolin posted on Wednesday, June 29, 2011 - 5:00 am
Dear Mr. and Mrs. Muthen,
I have some questions concerning FIML with GMM. After reading many papers I conclude that it is better to use FIML instead of LD (I hope I am right?!).
First question: in GMM the default is FIML, right?
Second: Is it ok to use FIML even if I have non-normal data (I read that FIML has the assumption of multivariate normality)? I actually thought that non-normality is often assumed in GMM (because of unobserved distinct classes).
For mixture models, the assumption of normality is not for the data. The assumption is that there is normality within classes.
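The distinction between normality within classes and normality of the data can be checked with a little arithmetic. The sketch below, in Python, computes the skewness of a mixture of normal components from its moments; the function name and the class proportions, means, and standard deviations are made-up values for illustration.

```python
def mixture_skewness(weights, means, sds):
    """Skewness of a finite mixture of normal components.

    Uses the central moments of the mixture:
      variance     = sum_k w_k * (sd_k^2 + d_k^2)
      third moment = sum_k w_k * (d_k^3 + 3 * d_k * sd_k^2)
    where d_k is the distance of component mean k from the overall mean.
    """
    overall_mean = sum(w * m for w, m in zip(weights, means))
    variance = 0.0
    third = 0.0
    for w, m, s in zip(weights, means, sds):
        d = m - overall_mean
        variance += w * (s ** 2 + d ** 2)
        third += w * (d ** 3 + 3 * d * s ** 2)
    return third / variance ** 1.5


# Hypothetical two-class example: each class is normal, the mixture is not.
skew_two_classes = mixture_skewness([0.8, 0.2], [0.0, 4.0], [1.0, 1.0])
# A single normal class has zero skewness.
skew_one_class = mixture_skewness([1.0], [0.0], [1.0])
```

With these values the observed data are clearly skewed even though every class is exactly normal, which is why non-normal observed data are not by themselves a violation of the mixture model's assumption.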
Carolin posted on Thursday, June 30, 2011 - 12:26 am
Thanks a lot for your answers! Two related questions:
1) In which cases would you recommend not to use FIML?
2) If there should be normality within classes, it could nevertheless be that there is non-normality in the data. If this is the case, can I still use FIML (despite FIML's assumption of multivariate normality)? Do you have any experience or suggestions?
Carolin posted on Thursday, June 30, 2011 - 12:40 am
... concerning my second question: or did you mean that the assumption of normality of FIML is related to the classes instead of to the original data set in GMM?
I'm trying to estimate a growth model using FIML for the missing data; however, I get a warning message saying that cases with missing data are excluded (see below). When I add covariates to the model, more cases are excluded. Is it possible to specify the model in such a way that all cases are included? What can be the reason that FIML excludes cases in a growth model (with a dependent metric variable) like this? (A search in the FAQ only turned up an example of a count model, but this is another kind of model.)
*** WARNING
Data set contains cases with missing on all variables. These cases were not included in the analysis.
Number of cases with missing on all variables: 1
*** WARNING
Data set contains cases with missing on x-variables. These cases were not included in the analysis.
Number of cases with missing on x-variables: 285
Missing data theory applies to dependent variables only because the model is estimated conditioned on the covariates. If you don't want to lose cases missing on covariates, you can include the variances of the covariates in the MODEL command. They will then be treated as dependent variables and distributional assumptions will be made about them. Cases with missing on all variables cannot be included because they have nothing to contribute.
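Before changing the model, it can be useful to tally in the raw data which cases fall into each of the two categories named in the warnings. This is a rough Python sketch, not a reproduction of how Mplus counts: the function name, the variable names, the None coding, and the choice to count a case missing on everything only in the first category are assumptions.

```python
def count_exclusions(cases, x_names):
    """Count cases missing on all variables and cases missing on covariates.

    cases: list of dicts mapping variable name -> value (None = missing).
    x_names: names of the covariates (x-variables).
    A case missing on every variable is counted only in the first tally.
    """
    all_missing = 0
    x_missing = 0
    for case in cases:
        if all(v is None for v in case.values()):
            all_missing += 1
        elif any(case[x] is None for x in x_names):
            x_missing += 1
    return all_missing, x_missing


# Hypothetical data: two outcomes (y1, y2) and one covariate (x1).
cases = [
    {"y1": 1.0, "y2": 2.0, "x1": 0.5},     # kept
    {"y1": 1.0, "y2": None, "x1": 0.3},    # kept: missing only on an outcome
    {"y1": 1.2, "y2": 2.1, "x1": None},    # dropped: missing on covariate
    {"y1": None, "y2": None, "x1": None},  # dropped: missing on everything
]
n_all_missing, n_x_missing = count_exclusions(cases, ["x1"])
```

The second case illustrates the point of the reply: missingness on an outcome does not remove a case, while missingness on a covariate does unless the covariate is brought into the model.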
Hello, I've run a growth mixture model on longitudinal data and determined the number of classes. I then tried to add auxiliary variables to test for mean differences/proportions across classes, using DCON and DCAT in version 7.11. The auxiliary variables had missing data, and the output contained the following message: WARNING: LISTWISE DELETION IS APPLIED TO THE AUXILIARY VARIABLE IN THE ANALYSIS. TO AVOID LISTWISE DELETION, DATA IMPUTATION CAN BE USED FOR THE AUXILIARY VARIABLES FOLLOWED BY ANALYSIS WITH TYPE=IMPUTATION.
So I imputed the data for the auxiliary variables and re-ran the model using TYPE=IMPUTATION. Now the output still includes the mean and S.E. for each class for each variable, but it no longer includes the test for mean differences, i.e. the chi-square and p-values.
Is there a way I can get this information with imputed data for auxiliary variables?
Further to my previous question, if it's not possible to get the DCON and DCAT tests with imputed variables, is there a way to see how many people are left in each class after listwise deletion? The output tells me the total number of observed and the total number deleted, but doesn't give me a breakdown by class. Many thanks.
The chi2 is much harder but in principle can be done the same way.
For the second question ... people don't really change classes. They are in the same class as in the model without the aux variable ... but if you want to confirm that, tech8 contains such information (the end of tech8 is the last iteration from the DCON analysis with the class allocation).
I'm planning to use LGM to model how mother's age at child birth affects her offspring sex. I know that I can accommodate the differing ages at child birth between mothers by using the AT option, but I'm wondering whether missing data will be a problem here because mothers also differed in how many offspring they had during their lifetime (e.g. 1-15). In other words, does it matter if I assign a missing value for offspring sex for those offspring that were never born? Can such models be fit in Mplus?
Sorry for being unclear! Is there a command for trying to find the amount of data used for a latent variable? Or does a latent variable take into account all data. For instance, if a model has 1149 observations, is the latent variable considering all 1149 observations?
I'd like to do LGM on data from 5 daily measures of employees. However, certain employees do not work every day, simply because of their working schedule.
For example, some employees work 5 days a week and some work 3. So the first group has 5 chances to show growth during a week (and their growth score "2" occurs on day 3, Wednesday) while the latter group has 3 chances to show growth and their growth score "2" occurs later on in the week, and this is already the end of their growth. In other words, the latter group will never reach a growth score of "4".
Basically I would need a growth model that models each time wave to be +1 higher than the previous non-missing wave. Can that happen in Mplus?
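The time coding described in this question can be written down per person before any model is fit. The following Python sketch builds time scores that increase by 1 at each non-missing wave; the function name and the True/False coding of work days are assumptions, and whether such individually varying time scores suit this design is a separate modeling question.

```python
def individual_time_scores(worked):
    """Time scores that go up by 1 at each wave a person actually worked.

    worked: list of booleans, one per calendar day (True = worked that day).
    Returns a list with the time score on worked days and None otherwise.
    """
    scores = []
    next_score = 0
    for is_working_day in worked:
        if is_working_day:
            scores.append(next_score)
            next_score += 1
        else:
            scores.append(None)
    return scores


# Hypothetical schedules, Monday to Friday.
five_day = individual_time_scores([True, True, True, True, True])
three_day = individual_time_scores([True, False, True, False, True])
```

For the five-day employee, the score 2 falls on Wednesday; for the three-day employee it falls on Friday and is the last score, matching the description above.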
Yaqiong Wang posted on Tuesday, September 04, 2018 - 8:44 pm
This may be a basic question: my data have 4 time points. I wonder for those cases that are completely missing for one or two time points, should I exclude them from the analysis? Or keep them and use Mplus settings to address missing data?
Dear Dr. Muthen, I am currently performing a four-wave LGM for a second-order model with categorical indicators (of the 5-point Likert scale type). I use a cluster variable and the TYPE=COMPLEX command. Even though there is a lot of missing data, the model terminates normally with the following fit indices: RMSEA=0.026, CFI=.968 and TLI=.967. My first question is: Do you think I need to use a method for imputing data, or will the result be totally different if I impute data? How do I impute data with categorical data and a cluster variable? Should I use MLR (but without the cluster variable)?
Another thing: When I am performing CFA for each wave, I am also thinking of saving Fscores for the latent variables in the model for subsequent LGM. However, in the new dat-file there are only Fscores for the observed variables. How do I save the Fscores for the latent factors built up by the observed variables?