Missing data

Daniel Rodriguez posted on Thursday, April 03, 2003 - 5:33 am

Hi,
I have a question about your missing data modeling analysis. If I run an LGM with missing data, am I able to test for MAR? In other words, how can I test whether my data are missing at random versus missing not at random? Would failure to reject the null hypothesis mean my data are at minimum MAR? Is your missing data analysis option similar to what I read in Duncan and Duncan (1995), "Modeling the process of development via latent variable growth curve methodology"?

Linda K. Muthen posted on Thursday, April 03, 2003 - 6:48 am

There is no statistical test for whether missingness is MAR. See the book by Little and Rubin for a discussion of the missing data theory that is used in Mplus.

Daniel Rodriguez posted on Thursday, April 03, 2003 - 7:28 am

Thanks. Also, are you planning on designing routines that would allow modeling with missing data with categorical observed variables?

Linda K. Muthen posted on Thursday, April 03, 2003 - 8:49 am

Yes, this will be available in Version 3.

Daniel Rodriguez posted on Thursday, April 03, 2003 - 9:00 am

Thanks so much! You and your husband are very good to the Mplus users!

wendy posted on Wednesday, July 19, 2006 - 6:55 pm

Hi, Dr. Muthen
I am currently doing research on growth modeling with one covariate predicting both the latent intercept and slope; that is, the exogenous variable Z predicts I and S. I fit two models: a full growth model and an equivalent model with a special pattern of missing data. I would like to compare the parameter estimates, bias, and power of the two models. One unusual finding is that the power to detect S regressed on Z is lower in the full growth model than in the model with missing data, and the direct effect of Z on S is not significant. I would expect direct effects to have less power in the model with missing data; is that right? Thank you.

Bengt O. Muthen posted on Thursday, July 20, 2006 - 11:21 am

When you say "pattern of missing data" I am reminded of the "pattern-mixture" approach to non-ignorable missing data analysis, that is, using dummy covariates to represent missing data patterns so that model parameter values can vary across missing data patterns. If that is the correct association, I would think that the differences in results arise because MAR does not hold and the non-ignorable missing data approach is needed. That doesn't seem to be a matter of power alone but also of bias.

S. Jeanne Horst posted on Saturday, February 20, 2010 - 2:09 pm

I am modeling student examinee effort scores across seven time points (sample size = 1434). I have missing data (MAR) and am using full information maximum likelihood. My univariate skew and kurtosis are normal; however, my Mardia's is large, suggesting multivariate non-normality (but could be due to large sample size). The MLM estimator appears to require listwise deletion (which makes sense, given that FIML assumes multivariate normality, correct?). In this instance, do I ignore the Mardia's and use MLR?

Thank you for any advice or recommendations.

Sincerely,
Jeanne

Linda K. Muthen posted on Saturday, February 20, 2010 - 2:26 pm

MLR is robust to non-normality so I don't see that there is a problem.
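
Note: a minimal sketch of how such a run might be set up. The file name, variable names, missing-value code, and equally spaced time scores are assumptions for illustration and are not taken from the post above. With ESTIMATOR = MLR, cases with partially missing outcomes are kept and the standard errors are robust to non-normality.

```mplus
! Hypothetical names: effort.dat, id, eff1-eff7, missing code -999
TITLE:    Linear growth model for seven waves, MLR with missing data
DATA:     FILE = effort.dat;
VARIABLE: NAMES = id eff1-eff7;
          USEVARIABLES = eff1-eff7;
          MISSING = ALL (-999);
ANALYSIS: ESTIMATOR = MLR;
          ! Time scores 0-6 assume equally spaced occasions
MODEL:    i s | eff1@0 eff2@1 eff3@2 eff4@3 eff5@4 eff6@5 eff7@6;
```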

Oliver Arranz-Becker posted on Thursday, June 23, 2011 - 12:40 am

We are running an LGM on unbalanced panel data spanning the years 1984 to 2009. Since we are interested in trajectories across the whole life course, we rearranged the data so that there are age-specific indicators for the dependent var, from ages 18 to 65 (wide format).
The problem is that, obviously, no respondent is observed across the whole life course (we only have 12% valid measurements on average); hence, Mplus gives an error that the minimum covariance coverage was not fulfilled (even after setting COVERAGE=.01), and the model does not converge.
Does that mean that it is not possible to run the LGM in wide data format so we have to switch to a multilevel framework?

Thanks in advance,
Oliver

Linda K. Muthen posted on Thursday, June 23, 2011 - 11:36 am

You might consider the multiple group multiple cohort approach shown in Example 6.18.

Carolin posted on Wednesday, June 29, 2011 - 5:00 am

Dear Mr. and Mrs. Muthen,

I have some questions concerning FIML with GMM. After reading many papers I conclude that it is better to use FIML instead of LD (I hope I am right?!).

First question: in GMM the default is FIML, right?

Second: Is it ok to use FIML even if I have non-normal data (I read that FIML assumes multivariate normality)? I actually thought that non-normality is often assumed in GMM (because of unobserved distinct classes).

Thanks a lot!
Carolin

Linda K. Muthen posted on Wednesday, June 29, 2011 - 10:20 am

In most cases you are better off using FIML.

Yes, it is the default.

For mixture models, the assumption of normality is not for the data. The assumption is that there is normality within classes.

Carolin posted on Thursday, June 30, 2011 - 12:26 am

Thanks a lot for your answers! Two related questions:

1) In which cases would you recommend not to use FIML?

2) If there should be normality within classes, there could still be non-normality in the data. If that is the case, can I still use FIML (despite FIML's assumption of multivariate normality)? Do you have any experience or suggestions?

Thanks!

Carolin posted on Thursday, June 30, 2011 - 12:40 am

... concerning my second question: or did you mean that the assumption of normality of FIML is related to the classes instead of to the original data set in GMM?

Linda K. Muthen posted on Thursday, June 30, 2011 - 10:27 am

I would always use FIML because most of the time it is the best and it is not possible to know the infrequent times when it is not the best. The odds are in your favor.

The assumption of normality does not apply to the data overall.

Carolin posted on Friday, July 01, 2011 - 12:42 am

Thanks a lot, you are really helpful!

What do you mean by your last statement? I'm not sure! Do you mean I can relax the assumption in GMM?

Linda K. Muthen posted on Friday, July 01, 2011 - 8:22 am

There is no normality assumption made about the observed data. The model assumes normality within classes which implies overall non-normality.
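
Note: in symbols, the point can be sketched as follows. The notation (K classes, class probabilities \pi_k, class-specific means \mu_k and covariance matrices \Sigma_k, multivariate normal density \phi) is generic and introduced here for illustration.

```latex
f(y) = \sum_{k=1}^{K} \pi_k \, \phi(y;\, \mu_k, \Sigma_k),
\qquad \pi_k \ge 0, \quad \sum_{k=1}^{K} \pi_k = 1 .
```

Each component is normal, but the weighted sum is in general not normal when the class means or covariances differ; it can be skewed or multimodal.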

Marianne Schuepbach posted on Thursday, August 09, 2012 - 8:03 am

Dear Mr and Mrs Muthen

I'm trying to estimate a growth model using FIML for the missing data; however, I get a warning message saying that cases with missing data are excluded (see below). When I add covariates to the model, more cases are excluded. Is it possible to specify the model in such a way that all cases are included? What can be the reason that FIML excludes cases in a growth model (with a dependent metric variable) like this? (A search in the FAQ only turned up an example of a count model, but this is another kind of model.)

*** WARNING
Data set contains cases with missing on all variables.
These cases were not included in the analysis.
Number of cases with missing on all variables: 1
*** WARNING
Data set contains cases with missing on x-variables.
These cases were not included in the analysis.
Number of cases with missing on x-variables: 285

Linda K. Muthen posted on Thursday, August 09, 2012 - 2:07 pm

Missing data theory applies to dependent variables only because the model is estimated conditioned on the covariates. If you don't want to lose cases missing on covariates, you can include the variances of the covariates in the MODEL command. They will then be treated as dependent variables and distributional assumptions will be made about them. Cases with missing on all variables cannot be included because they have nothing to contribute.
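
Note: a minimal sketch of what "including the variances of the covariates" could look like. The names y1-y4, x1, x2 and the missing-value code are hypothetical.

```mplus
! Hypothetical names: y1-y4 are the repeated outcomes, x1 and x2 the covariates
VARIABLE: NAMES = y1-y4 x1 x2;
          MISSING = ALL (-999);
MODEL:    i s | y1@0 y2@1 y3@2 y4@3;
          i s ON x1 x2;
          ! Mentioning the variances brings x1 and x2 into the model,
          ! so cases with missing covariate values are retained
          x1 x2;
```

The trade-off is the one stated above: x1 and x2 are then treated as dependent variables, so distributional assumptions are made about them.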

Rachel Ellis posted on Sunday, August 25, 2013 - 7:15 pm

Hello, I've run a growth mixture model on longitudinal data and determined the number of classes. I then tried to add auxiliary variables to test for mean differences/proportions across classes, using DCON and DCAT in version 7.11. The auxiliary variables had missing data, and the output contained the following message:
WARNING: LISTWISE DELETION IS APPLIED TO THE AUXILIARY VARIABLE IN THE ANALYSIS. TO AVOID LISTWISE DELETION, DATA IMPUTATION CAN BE USED FOR THE AUXILIARY VARIABLES FOLLOWED BY ANALYSIS WITH TYPE=IMPUTATION.

So I imputed the data for the auxiliary variables and re-ran the model using TYPE=IMPUTATION. Now the output still includes the mean and S.E. for each class for each variable, but it no longer includes the test for mean differences, i.e. the chi-square and p-values.

Is there a way I can get this information with imputed data for auxiliary variables?

Rachel Ellis posted on Sunday, August 25, 2013 - 8:34 pm

Further to my previous question, if it's not possible to get the DCON and DCAT tests with imputed variables, is there a way to see how many people are left in each class after listwise deletion? The output tells me the total number of observed and the total number deleted, but doesn't give me a breakdown by class.
Many thanks.

Tihomir Asparouhov posted on Monday, August 26, 2013 - 1:47 pm

Rachel

The current version of Mplus doesn't compute these.

The mean differences you can obtain by hand if you run all the imputed data sets one at a time and combine the results as in
http://sites.stat.psu.edu/~jls/mifaq.html#howto

The chi2 is much harder but in principle can be done the same way.

For the second question ... people don't really change classes. They are in the same class as the model without the aux variable ... but if you want to confirm that, tech8 contains such information (end of tech8 is the last iteration from the DCON analysis with the class allocation).

Tihomir
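
Note: the standard combining rules for multiply imputed data (Rubin's rules) can be sketched as follows. The notation is introduced here: m is the number of imputed data sets, \hat{Q}_j is the estimate of interest (for example a class-specific mean or a mean difference) from data set j, and U_j is its squared standard error.

```latex
\bar{Q} = \frac{1}{m}\sum_{j=1}^{m} \hat{Q}_j, \qquad
\bar{U} = \frac{1}{m}\sum_{j=1}^{m} U_j, \qquad
B = \frac{1}{m-1}\sum_{j=1}^{m} \bigl(\hat{Q}_j - \bar{Q}\bigr)^2,
\\
T = \bar{U} + \Bigl(1 + \frac{1}{m}\Bigr) B, \qquad
\mathrm{SE}(\bar{Q}) = \sqrt{T}.
```

Here \bar{Q} is the pooled estimate, \bar{U} the within-imputation variance, B the between-imputation variance, and T the total variance.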

Samuli Helle posted on Monday, October 07, 2013 - 5:25 am

Hello,

I'm planning to use LGM to model how mother's age at child birth affects her offspring sex. I know that I can accommodate the differing ages at child birth between mothers by using AT option, but I'm wondering whether missing data will be a problem here because mothers also differed in how many offspring they had during lifetime (e.g. 1-15)? In other words, does it matter assigning a missing value for offspring sex for those offspring that were never born? Can such models be fit in Mplus?

Thanks a lot in advance!

Best,
Samuli

Linda K. Muthen posted on Monday, October 07, 2013 - 11:21 am

I would put the data in long format to deal with the missing values. Then cluster size would vary depending on the number of children. See Example 9.16.

Leo Young posted on Wednesday, June 17, 2015 - 4:27 am

Hi, Linda.
I am running LGM in Mplus 7.2

Model: i s | s1@0 s2@1 s3@2;
i s ON gender;
s1 ON income1;
s2 ON income2;
s3 ON income3;

But my income covariates have many missing values because some cases died before later waves, so many cases are dropped from the analysis.

You mentioned in a previous answer that "If you don't want to lose cases missing on covariates, you can include the variances of the covariates in the MODEL command."

My question is: how do I write this in the MODEL command?

Thank you very much.

Bengt O. Muthen posted on Wednesday, June 17, 2015 - 6:10 pm

Don't use the missing data flag for income. Because your outcome is missing when income is missing, there is no effect of the income on the outcome for the drop-out occasions.

If you really want to include the covariates, you simply mention their means or variances.
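
Note: applied to the model posted above, "mentioning their means or variances" could be sketched like this. It illustrates the second option only; the first recommendation is still not to flag income as missing.

```mplus
MODEL:    i s | s1@0 s2@1 s3@2;
          i s ON gender;
          s1 ON income1;
          s2 ON income2;
          s3 ON income3;
          ! Mentioning the variances treats the incomes as dependent variables
          income1 income2 income3;
          ! Alternatively, mention the means instead:
          ! [income1 income2 income3];
```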

Hillary Gorin posted on Wednesday, August 31, 2016 - 10:48 am

Hi Dr. Muthen,

Is there a way to find the amount of data present for latent variables and for predictors in a growth curve model?

Thanks!
Hillary

Linda K. Muthen posted on Wednesday, August 31, 2016 - 11:20 am

For observed variables, use the PATTERNS option of the OUTPUT command.
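
Note: a minimal sketch of the relevant input lines. The missing-value code is hypothetical; PATTERNS requests a summary of the missing data patterns.

```mplus
VARIABLE: MISSING = ALL (-999);   ! hypothetical missing-value code
OUTPUT:   PATTERNS;               ! summary of missing data patterns
```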

Hillary Gorin posted on Wednesday, August 31, 2016 - 12:35 pm

Thank you for your response Dr. Muthen! Any suggestions for latent variables?

Hillary

Linda K. Muthen posted on Wednesday, August 31, 2016 - 2:16 pm

I don't know what you mean by that.

Hillary Gorin posted on Thursday, September 01, 2016 - 9:17 am

Sorry for being unclear! Is there a command for finding the amount of data used for a latent variable? Or does a latent variable take all data into account? For instance, if a model has 1149 observations, is the latent variable considering all 1149 observations?

Linda K. Muthen posted on Thursday, September 01, 2016 - 9:21 am

The default in Mplus is to use all available information to estimate the model.

Paraskevas Petrou posted on Thursday, March 01, 2018 - 12:30 am

Dear Mplus users,

I'd like to do LGM on data from 5 daily measures of employees. However, certain employees do not work every day, simply because of their working schedule.

For example, some employees work 5 days a week and some work 3.
So the first group has 5 chances to show growth during a week (and their growth score "2" occurs on day 3, Wednesday) while the latter group has 3 chances to show growth and their growth score "2" occurs later on in the week, and this is already the end of their growth. In other words, the latter group will never reach a growth score of "4".

Basically I would need a growth model that models each time wave to be +1 higher than the previous non-missing wave.
Can that happen in Mplus?

Thank you!
Paris

Bengt O. Muthen posted on Thursday, March 01, 2018 - 6:30 pm

Try using a 2-level growth model where you represent the individually-varying times of observation with a level1 time variable. The UG has an example of such a 2-level growth model.
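
Note: a minimal sketch of such a 2-level growth model, assuming the data are in long format with one record per employee per working day. The variable names id, time, and y are hypothetical, and time counts each employee's own working days (0, 1, 2, ...).

```mplus
! Long format: one record per employee per working day (hypothetical names)
VARIABLE: NAMES = id time y;
          WITHIN = time;
          CLUSTER = id;
ANALYSIS: TYPE = TWOLEVEL RANDOM;
MODEL:    %WITHIN%
          s | y ON time;      ! random slope for the level-1 time variable
          %BETWEEN%
          y WITH s;           ! random intercept (y) covaries with the slope
```

Because days not worked simply have no record, employees with three and five working days can be analyzed together without assigning missing values.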

Nour Azhari posted on Sunday, March 11, 2018 - 6:10 pm

Hi,

Regarding missing data: does using a non-ignorable missing data model instead of the default (MAR) require more power to estimate the model, or does that not change?

Thanks

Bengt O. Muthen posted on Monday, March 12, 2018 - 3:40 pm

It may in that such models are more complex.

Yaqiong Wang posted on Tuesday, September 04, 2018 - 8:44 pm

This may be a basic question: my data have 4 time points. I wonder for those cases that are completely missing for one or two time points, should I exclude them from the analysis? Or keep them and use Mplus settings to address missing data?

Linda K. Muthen posted on Thursday, September 06, 2018 - 1:49 pm

You should keep people that have some non-missing data. Only people missing at all four time points should be eliminated.

Daniel Olsson posted on Wednesday, December 18, 2019 - 12:22 am

Dear Dr. Muthen
I am currently fitting a four-wave LGM for a second-order model with categorical indicators (5-point Likert-type scales). I use a cluster variable and TYPE=COMPLEX. Even though there is a lot of missing data, the model terminates normally with the following fit indices:
RMSEA=0.026, CFI=.968 and TLI=.967. My first question is:
Do you think I need to impute the data, or would the results be totally different if I impute?
How do I impute categorical data with a cluster variable? Should I use MLR (but without the cluster variable)?

Another thing: When I am performing CFA for each wave, I am also thinking of saving Fscores for the latent variables in the model for subsequent LGM. However, in the new dat-file there are only Fscores for the observed variables. How do I save the Fscores for the latent factors built up by the observed variables?

Bengt O. Muthen posted on Wednesday, December 18, 2019 - 1:31 pm

Multiple imputation usually gives results similar to ML, so I would not bother with it unless ML is computationally demanding. Chapter 11 of the UG shows multilevel imputation with categorical variables.

Fscores are given only for latent variables. You can send your output and resulting file to support along with your license number.