Monte Carlo simulations with missing ...
 Jenny B posted on Friday, July 30, 2004 - 2:44 am
I am trying to use a Monte Carlo simulation to estimate the power, given the amount of missing data that I have, in a particular study.

So far I have used saved parameter estimates (from my sample) as true population values. Using MODEL MISSING, the output shows the missing data pattern from the first replication only. Is it possible to get the missing data patterns from every replication saved, so that I can check that the proportions of missing data generated are similar to those in the sample?

I have tried to use PATMISS and PATPROBS to specify the missing data patterns and proportions, but I ran into difficulty specifying a pattern (and its relative proportion of the population) that is complete, i.e. has no missing data. Is this possible?

Thanks very much.
 Linda K. Muthen posted on Friday, July 30, 2004 - 8:08 am
The Monte Carlo output prints only the missing data patterns from the first replication. If you want it for each replication, you would need to save the data sets and analyze each one separately. It is usually not necessary to do this as the first replication is usually enough to show that you have gotten the patterns correct. You can increase the sample size for one replication to 100,000 to see if the missing value patterns are what you expect.
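
For readers who do save the generated data sets, the per-replication check can be scripted outside Mplus. The following is a minimal sketch, not an Mplus feature: it assumes each saved data set is free-format, whitespace-delimited text and that missing values are written with a numeric flag (999 here is an assumption; use whatever flag your saved files actually contain). The function name and the tiny data set are made up for illustration.

```python
from collections import Counter

MISSING_FLAG = 999.0  # assumption: the flag used for missing values in the saved files


def pattern_proportions(text, missing_flag=MISSING_FLAG):
    """Return {pattern: proportion} for one whitespace-delimited data set.

    A pattern is a string with one character per column:
    'x' for an observed value, '.' for a missing value.
    """
    counts = Counter()
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        pattern = "".join(
            "." if float(field) == missing_flag else "x" for field in fields
        )
        counts[pattern] += 1
    total = sum(counts.values())
    return {pattern: n / total for pattern, n in counts.items()}


# A tiny made-up "replication" with three variables and four cases.
rep1 = """\
0.52 1.10 999
-0.31 999 999
1.20 0.75 1.90
0.05 0.40 0.88
"""

props = pattern_proportions(rep1)
for pattern, share in sorted(props.items()):
    print(pattern, share)
```

In practice one would loop over the saved replication files, pass each file's contents to `pattern_proportions`, and compare the resulting proportions with those in the real sample.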

Regarding PATMISS and PATPROBS, you need to send the output along with a description of what you want the missing data to look like.
 Anonymous posted on Sunday, January 09, 2005 - 10:24 am
With Mplus, model parameters estimated from real data can be saved and used as the true population parameter values for a Monte Carlo simulation. For a Monte Carlo study on latent growth modeling with missing values and covariates, can Mplus save the missing data patterns that exist in the real data and then automatically use these patterns when generating data in the Monte Carlo simulation?
If we have to specify the missing data patterns in the MODEL MISSING command, it would be very difficult to match them to the patterns in the real data. Any comment or suggestion will be highly appreciated.
 bmuthen posted on Sunday, January 09, 2005 - 12:55 pm
It is currently not possible to save the missing data patterns for use in MC. I would try to mimic the major (most frequent) patterns. Missing data is hard to simulate in a realistic way because the cause of missingness (predicted by x's, earlier y's, missing y's, latent continuous variables, latent categorical variables) is not known. So experimentation with likely scenarios is needed.
 Mike Cheung posted on Sunday, May 15, 2005 - 8:11 pm
Dear Muthen,

I have two questions about Monte Carlo simulations with missing data:

1) I used "Montecarlo" to generate missing data and wanted to analyze the data with "listwise deletion". Therefore, I used "ANALYSIS: TYPE=BASIC;". However, I got the following error message:

*** ERROR in Montecarlo command
TYPE = MISSING must be specified when using MONTECARLO to generate missing data.

How can I generate missing data without using the function of "missing" in the analysis?

2) In Monte Carlo simulations with TECH9 in OUTPUT, the program prints the error message for the replications. I would like to see if these problematic replications are included in calculating the summary statistics and parameter estimates or not.

If yes, how can I exclude these problematic replications in the simulations?

If no, the number of "convergent" and "good" replications is likely smaller than the number of replications stated in the design. Is it possible to have a fixed number, say 1000, of "convergent" and "good" replications?

Thanks in advance!
 Linda K. Muthen posted on Monday, May 16, 2005 - 8:27 am
It sounds like you want to generate data sets that contain missing data but then analyze them using listwise deletion. You cannot do this in one step. You will have to generate the data in the first step and analyze them using external Monte Carlo. See Example 11.6 and Chapter 18 of the Mplus User's Guide. To generate data with missingness, you need to include MISSING in the TYPE option.

The results are for the number of replications that were completed. This number is printed in the output. It is not possible to specify the number of completed replications. You would have to keep increasing the number of requested replications until you get the number of completed replications that you want.
 Anonymous posted on Sunday, August 21, 2005 - 7:37 am
Dear Muthen,

I would like to analyze many external "mean and covariance matrices" by MONTECARLO. I used something like below:

DATA: FILE IS covlist.dat;

I got an error message saying that the input covariance matrix could not be inverted. However, it worked perfectly okay if I analyzed the covariance matrices one by one.

It seems to me that it's not possible to use "TYPE =MONTECARLO" and "TYPE IS MEANS COVARIANCE" at the same time. Am I correct? Thanks!
 Linda K. Muthen posted on Monday, August 22, 2005 - 8:27 am
The external Monte Carlo facility of Mplus requires raw data. We will add a better error message.
 Tor Neilands posted on Thursday, March 23, 2006 - 11:22 am
Hi, Linda.

In your excellent and very helpful 2002 SEM Journal paper on Monte Carlo simulation of statistical power and effect size, on page 605 you provide an example of simulating a missing data process with missing at random data for a linear growth model with four indicators.

You note that when the dichotomous covariate has a value of zero, the first indicator has 12% missing data, the second indicator has 18% missing, the third has 27%, and the fourth indicator has 50% missing. When the covariate is 1.00 in value, the respective amounts of missing data are 12% again for the first indicator, and 38%, 50%, and 73% for the remaining indicators.

The syntax for this model is shown on the top of page 616 of the appendix of the article. The missing data section is stipulated as follows:

[y1@-2 y2@-1.5 y3@-1 y4@0];
y2-y4 ON x@1;

I'm curious to learn how the values of -2, -1.5, -1, and zero map onto the proportions of missingness that you report on page 605. In other words, how does one work backwards from an expected amount of missing data per wave of measurement to obtain the proper regression weights to use in the MODEL MISSING section of the syntax?

Regards and thanks,

 Linda K. Muthen posted on Thursday, March 23, 2006 - 4:06 pm
The values are logits. To turn them into probabilities, use the formula

p = 1 / (1 + exp (-logit))

For y1, the logit of -2 results in a probability of 0.12.

For y2-y4, the logit is based on the covariate also. The logit for y2 is

logit = -1.5 + bx;

For x=1,

logit = -1.5 + 1*1 = -.5

The probability for a logit of -.5 is .38.
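
The same conversion can be applied to every indicator at once. The sketch below is plain Python, not Mplus syntax; it takes the intercepts and the fixed slope of 1 from the MODEL MISSING lines quoted in the question, and it assumes y1 has no regression on x because only y2-y4 appear in the ON statement. The function and dictionary names are illustrative.

```python
import math


def logit_to_prob(logit):
    """Inverse logit: p = 1 / (1 + exp(-logit))."""
    return 1.0 / (1.0 + math.exp(-logit))


# Intercepts from [y1@-2 y2@-1.5 y3@-1 y4@0];
intercepts = {"y1": -2.0, "y2": -1.5, "y3": -1.0, "y4": 0.0}
# Slopes from y2-y4 ON x@1; (y1 is assumed to have no slope on x)
slope_on_x = {"y1": 0.0, "y2": 1.0, "y3": 1.0, "y4": 1.0}


def missing_probs(x):
    """Probability of missingness for each indicator at covariate value x."""
    return {
        y: logit_to_prob(intercepts[y] + slope_on_x[y] * x) for y in intercepts
    }


p0 = missing_probs(0)
p1 = missing_probs(1)
for y in intercepts:
    print(f"{y}: x=0 -> {p0[y]:.2f}, x=1 -> {p1[y]:.2f}")
```

Going the other way, a target probability p corresponds to a logit of ln(p / (1 - p)).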
 Tor Neilands posted on Saturday, March 25, 2006 - 9:10 pm
Thank you, Linda.

I have a follow-up question. I am simulating power for a latent growth curve model with four observed indicators per time point and missing data at times 2-4. There are four time points, so there are 16 Y measures.

I want to create "wave missing" data that captures a typical attrition scenario where once a subject drops out of the study she no longer contributes data to that wave or subsequent waves. That is, I want Mplus to generate MAR missing data for 10% of the sample at time 2, 15% at time 3, and 20% at time 4. The catch is that the missing data generation should yield four patterns:

- complete data for all 16 measures
- missing data for all time 2, 3, & 4 measures (10% of the sample)
- missing data for all time 3 & 4 measures (5% of the sample)
- missing data for all time 4 measures (5% of the sample)

So, 20% of the total sample will have some form of missing data.
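
The link between these pattern shares and the per-wave percentages can be checked with a few lines of arithmetic. This is only a sketch of the bookkeeping in plain Python, using the shares listed above; the names are illustrative.

```python
# Share of the sample that first goes missing at each wave (from the list above).
dropout_share = {2: 0.10, 3: 0.05, 4: 0.05}

# Everyone else provides complete data.
complete_share = 1.0 - sum(dropout_share.values())


def missing_by_wave(dropout_share, waves=(1, 2, 3, 4)):
    """Under monotone dropout, a subject is missing at wave t
    if she dropped out at wave t or at any earlier wave."""
    return {
        t: sum(share for first, share in dropout_share.items() if first <= t)
        for t in waves
    }


marginal = missing_by_wave(dropout_share)
for wave, share in marginal.items():
    print(f"time {wave}: {share:.2f} missing")
print(f"complete cases: {complete_share:.2f}")
```

The cumulative shares come out to 10%, 15%, and 20% at times 2, 3, and 4, with 80% of the sample complete.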

This is my current MODEL MISSING syntax.

[y21@-2.198 y31@-1.735 y41@-1.387
y22@-2.198 y32@-1.735 y42@-1.387
y23@-2.198 y33@-1.735 y43@-1.387
y24@-2.198 y34@-1.735 y44@-1.387];

This yields measure-specific missingness that is correct within wave, but many missing data patterns are generated. Many of these are not realistic for my situation (e.g., one measure within a wave will be missing, but others will have data; our interviewer-based protocols will result in virtually complete data for any participant that is not lost to attrition).
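
A short calculation shows both where the fixed values come from and why so many patterns appear. The sketch below is plain Python and rests on one assumption: that with intercept-only logits, each measure's missingness is drawn independently of the others. Under that assumption, whole-wave patterns are rare and complete cases are far fewer than the 80% wanted.

```python
import math


def prob_to_logit(p):
    """Log-odds of a probability: ln(p / (1 - p))."""
    return math.log(p / (1.0 - p))


# Target proportion missing at each wave, and the number of measures per wave.
wave_missing = {2: 0.10, 3: 0.15, 4: 0.20}
measures_per_wave = 4

# These reproduce the fixed values -2.198, -1.735, and -1.387 (up to rounding).
logits = {t: prob_to_logit(p) for t, p in wave_missing.items()}

# Assumption: every measure is deleted independently with its own probability.
p_all_complete = 1.0
for p in wave_missing.values():
    p_all_complete *= (1.0 - p) ** measures_per_wave

# Chance that all four time 2 measures are missing together.
p_whole_wave2_missing = wave_missing[2] ** measures_per_wave

for t, value in logits.items():
    print(f"time {t}: logit = {value:.3f}")
print(f"P(all 12 measures observed) = {p_all_complete:.3f}")
print(f"P(all time 2 measures missing) = {p_whole_wave2_missing:.4f}")
```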

Many thanks in advance for your suggestions,

 Bengt O. Muthen posted on Sunday, March 26, 2006 - 3:46 pm
Would the Mplus options PATMISS and PATPROBS do what you want?
 Tor Neilands posted on Sunday, March 26, 2006 - 5:40 pm
Thanks, Bengt.

Is it possible to use PATMISS and PATPROBS in conjunction with MODEL MISSING to generate MAR missing data patterns?


 Linda K. Muthen posted on Sunday, March 26, 2006 - 5:54 pm
No, you would need to use one or the other.
 Tor Neilands posted on Sunday, March 26, 2006 - 10:03 pm
Thank you, Linda.

You and Bengt are both amazing to respond in this forum over the weekend. Much appreciation to you both for your prompt and helpful replies.

I will continue to experiment with MODEL MISSING.

In your SEM Journal paper, you converted portions of the Mplus Monte Carlo output into Cohen-metric effect sizes. Could you describe what portions of the Mplus output you used to obtain the Cohen effect sizes you reported for the two growth model scenarios (slope regression weight = .2; slope regression weight = .1)? I'm unclear on how you were able to use the Mplus results to obtain the Cohen metric effect sizes.

Thank you,

 Bengt O. Muthen posted on Monday, March 27, 2006 - 6:35 am
I don't know what you mean by converting portions of the Mplus output to Cohen effect sizes. Perhaps you refer to page 604 in the SEM article where we say that we get different effect sizes for different regression coefficients for s regressed on the covariate. The effect size computations are described on pp. 604-605. One can also compute it wrt observed outcomes.
 Tor Neilands posted on Monday, March 27, 2006 - 11:49 am
Thank you, Bengt.

You're right. I was referring to the discussion on pp. 604-605.

Apologies for being dense, but I am still not seeing how one obtains the value of .63 as the estimate of the effect size from the input slope coefficient of .20.


 Bengt O. Muthen posted on Monday, March 27, 2006 - 3:14 pm
Just divide the reg coeff of the growth slope factor regressed on the binary x by the SD (the sqrt of the variance) of the growth slope factor. This is a Cohen-like quantity in that the reg coeff is the mean difference between the 2 x groups in the slope growth factor.
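
As a worked illustration of that division, here is a minimal sketch in plain Python. The slope factor variance of 0.10 is an assumption made only for this example; it is not stated in this thread, and it was picked because it yields a value close to the .63 mentioned above. Substitute the variance of the growth slope factor from your own model.

```python
import math


def slope_effect_size(coef, slope_variance):
    """Regression coefficient of the slope factor on binary x,
    divided by the SD of the slope factor."""
    return coef / math.sqrt(slope_variance)


slope_variance = 0.10  # hypothetical value, for illustration only
effect_size = slope_effect_size(0.20, slope_variance)
print(f"effect size = {effect_size:.2f}")
```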
 Tor Neilands posted on Sunday, April 16, 2006 - 7:20 pm
Hi, Linda and Bengt.

I would like to simulate NMAR data using Mplus. Can you point me to any example MODEL MONTECARLO syntax that illustrates how to generate NMAR data using Mplus?

With best wishes and many thanks,

 Bengt O. Muthen posted on Monday, April 17, 2006 - 8:30 am
We don't have a specific example available, but you can start from the User's Guide ex 11.2 and modify the Model Missing part. For example, you can have missing on y4 as a function of the value of y4 that would have been observed by saying in Model Missing:

y4 on y4*c;

where c is a value such as 0.5.
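
To see what this kind of mechanism does to the observed data, it can be mimicked outside Mplus. The sketch below is plain Python, not Mplus output: it assumes y4 is standard normal and that the logit of missingness is an intercept plus c times y4. The intercept of -1 and the sample size are hypothetical choices for the illustration.

```python
import math
import random


def simulate_nmar(n, intercept, c, seed=1):
    """Generate y4 values and delete each one with a probability
    that depends on its own (would-be observed) value."""
    rng = random.Random(seed)
    full, observed = [], []
    for _ in range(n):
        y4 = rng.gauss(0.0, 1.0)
        p_miss = 1.0 / (1.0 + math.exp(-(intercept + c * y4)))
        full.append(y4)
        if rng.random() >= p_miss:  # value is kept
            observed.append(y4)
    return full, observed


full, observed = simulate_nmar(20000, intercept=-1.0, c=0.5)
mean_full = sum(full) / len(full)
mean_observed = sum(observed) / len(observed)
prop_missing = 1.0 - len(observed) / len(full)

print(f"proportion missing: {prop_missing:.3f}")
print(f"mean of all y4 values: {mean_full:.3f}")
print(f"mean of observed y4 values: {mean_observed:.3f}")
```

With a positive c, larger values of y4 are more likely to be missing, so the mean of the observed values falls below the mean of the full data.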
 Shu Xu posted on Friday, January 26, 2007 - 12:16 am
Dear Linda and Bengt,

I am wondering what mechanism is used to generate the data sets for a two-part semicontinuous growth model for a continuous outcome (Ex 11.9). Is there a technical report on the data generating method for the two-part growth model?

 Linda K. Muthen posted on Friday, January 26, 2007 - 6:46 am
See the DATATWOPART command. The method is described and a reference is given.
 Patricienn Kaponson Moreno posted on Thursday, June 28, 2007 - 8:22 am
Dr. Muthen,

How do I generate differing missing data patterns in Mplus? I want to create several patterns of missing data to see whether they affect the robustness of the imputation method.
 Linda K. Muthen posted on Thursday, June 28, 2007 - 8:37 am
The options of the MONTECARLO command are described in Chapter 18 of the Mplus User's Guide. Examples 11.1 and 11.2 show how to generate missing data.
 Scott R. Colwell posted on Tuesday, October 27, 2009 - 2:32 pm
Suppose you want only one variable, say X1, out of a set of variables (x1-x12) to have missing values, and you want only one pattern in which 10% of X1 values are missing across 25% of the cases (the other 75% of cases have no missing data). Is this the correct way to specify it?

PATMISS = X1(.10) | X1(.00);
PATPROBS = .25 | .75;
 Linda K. Muthen posted on Tuesday, October 27, 2009 - 4:04 pm
Try it. If it does not work, contact support.
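
Before running it, the overall share of missing X1 values implied by such a specification can be worked out by hand. The sketch below is plain Python and assumes one particular reading of the two options: the number in parentheses is the probability that the variable is missing for cases in that pattern, and PATPROBS gives the proportion of cases assigned to each pattern. Check that reading against the User's Guide before relying on it.

```python
# One dictionary per pattern: variable -> probability of being missing.
patterns = [{"x1": 0.10}, {"x1": 0.00}]
# Proportion of cases assigned to each pattern.
pattern_probs = [0.25, 0.75]


def marginal_missing(patterns, pattern_probs, variable):
    """Overall probability that `variable` is missing, averaged over patterns."""
    return sum(
        weight * pattern.get(variable, 0.0)
        for pattern, weight in zip(patterns, pattern_probs)
    )


overall_x1 = marginal_missing(patterns, pattern_probs, "x1")
print(f"overall proportion of missing x1 values: {overall_x1:.3f}")
```

Under that reading, only 2.5% of all X1 values would be missing, which is a quick way to confirm whether the specification matches the intended design.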