I am trying to use a Monte Carlo simulation to estimate the power, given the amount of missing data that I have, in a particular study.
So far I have used saved parameter estimates (from my sample) as true population values. Using MODEL MISSING, the output shows the missing data pattern from the first replication only. Is it possible to get the missing data patterns from every replication saved, so that I can check that the proportions of missing data generated are similar to those in the sample?
I have tried to use PATMISS and PATPROBS to specify the missing data patterns/proportions but ran into difficulty specifying a pattern (and its relative proportion of the population) that is complete, i.e. has no missing data. Is this possible?
The Monte Carlo output prints only the missing data patterns from the first replication. If you want it for each replication, you would need to save the data sets and analyze each one separately. It is usually not necessary to do this as the first replication is usually enough to show that you have gotten the patterns correct. You can increase the sample size for one replication to 100,000 to see if the missing value patterns are what you expect.
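If the replications are saved as separate data files, the per-replication check described above could be done outside Mplus. The following is only a minimal sketch in Python, not an Mplus feature: it assumes each saved file is whitespace-delimited and that missing entries carry a numeric flag (the value 999 used here is an assumption; check your own saved files). The helper names `pattern_proportions` and `read_replication` and the example rows are hypothetical.

```python
from collections import Counter

MISSING_FLAG = 999.0  # assumption: value that flags a missing entry in the saved files


def pattern_proportions(rows, missing_flag=MISSING_FLAG):
    """Return {pattern: proportion}; a pattern is a string such as 'OOMM'
    (O = observed, M = missing) with one character per variable."""
    counts = Counter(
        "".join("M" if value == missing_flag else "O" for value in row)
        for row in rows
    )
    total = sum(counts.values())
    return {pattern: n / total for pattern, n in counts.items()}


def read_replication(path):
    """Read one whitespace-delimited saved data set into a list of float rows."""
    with open(path) as handle:
        return [[float(tok) for tok in line.split()] for line in handle if line.strip()]


# Tiny in-memory stand-in for one replication (hypothetical values).
example_rows = [
    [1.2, 0.4, 0.9, 1.1],
    [0.3, 0.8, 999.0, 999.0],
    [0.7, 999.0, 999.0, 999.0],
    [1.5, 1.0, 0.2, 0.6],
]
example_props = pattern_proportions(example_rows)
for pattern, share in sorted(example_props.items()):
    print(pattern, round(share, 2))
```

Calling `pattern_proportions(read_replication(path))` for each saved file and comparing the result with the pattern proportions in the real sample would give the check asked about in the question.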
Regarding PATMISS and PATPROBS, you need to send the output to email@example.com along with a description of what you want the missing data to look like.
Anonymous posted on Sunday, January 09, 2005 - 10:24 am
With Mplus, model parameters estimated from real data can be saved and used as the true population parameter values for Monte Carlo simulation. For a Monte Carlo study on latent growth modeling with missing values and covariates, can Mplus save the missing patterns existing in the real data, then automatically use these missing patterns in data generation in the Monte Carlo simulation? If we have to specify the missing patterns in the MODEL MISSING command, it would be very difficult to match the specified missing patterns with the missing patterns in the real data. Any comment or suggestion will be highly appreciated.
bmuthen posted on Sunday, January 09, 2005 - 12:55 pm
It is currently not possible to save the missing data patterns for use in MC. I would try to mimic the major (most frequent) patterns. Missing data is hard to simulate in a realistic way because the cause of missingness (predicted by x's, earlier y's, missing y's, latent continuous variables, latent categorical variables) is not known. So experimentation with likely scenarios is needed.
I have two questions about Monte Carlo simulations with missing data:
1) I used "Montecarlo" to generate missing data and wanted to analyze the data with "listwise deletion". Therefore, I used "ANALYSIS: TYPE=BASIC;" However, I got the following error message:
*** ERROR in Montecarlo command TYPE = MISSING must be specified when using MONTECARLO to generate missing data.
How can I generate missing data without using the function of "missing" in the analysis?
2) In Monte Carlo simulations with TECH9 in OUTPUT, the program prints the error message for the replications. I would like to see if these problematic replications are included in calculating the summary statistics and parameter estimates or not.
If yes, how can I exclude these problematic replications in the simulations?
If no, the number of "convergent" and "good" replications is likely smaller than the number of replications stated in the design. Is it possible to have a fixed number, say 1000, of "convergent" and "good" replications?
It sounds like you want to generate data sets that contain missing data but then analyze them using listwise deletion. You cannot do this in one step. You will have to generate the data in the first step and analyze them using external Monte Carlo. See Example 11.6 and Chapter 18 of the Mplus User's Guide. To generate data with missingness, you need to include MISSING in the TYPE option.
The results are for the number of replications that were completed. This number is printed in the output. It is not possible to specify the number of completed replications. You would have to keep increasing the number of requested replications until you get the number of completed replications that you want.
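As a rough first guess for how many replications to request, one could scale the target by the completion rate seen in a pilot run. This is a sketch only: the function name `replications_to_request` and the 95% completion rate are hypothetical, and the actual number of completed replications still has to be read from the output and the request increased if it falls short.

```python
import math


def replications_to_request(target_completed, completion_rate):
    """Estimate how many replications to request so that roughly
    `target_completed` of them complete, given an observed completion rate."""
    if not 0 < completion_rate <= 1:
        raise ValueError("completion_rate must be in (0, 1]")
    return math.ceil(target_completed / completion_rate)


# Hypothetical pilot: 95% of requested replications completed.
requested = replications_to_request(1000, 0.95)
print(requested)
```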
Anonymous posted on Sunday, August 21, 2005 - 7:37 am
I would like to analyze many external "mean and covariance matrices" by MONTECARLO. I used something like below:
DATA:
  FILE IS covlist.dat;
  TYPE = MONTECARLO;
  TYPE IS MEANS COVARIANCE;
  NOBSERVATIONS = 100;
I got an error message saying that the input covariance matrix could not be inverted. However, it worked perfectly okay if I analyzed the covariance matrices one by one.
It seems to me that it's not possible to use "TYPE = MONTECARLO" and "TYPE IS MEANS COVARIANCE" at the same time. Am I correct? Thanks!
In your excellent and very helpful 2002 SEM Journal paper on Monte Carlo simulation of statistical power and effect size, on page 605 you provide an example of simulating a missing data process with missing at random data for a linear growth model with four indicators.
You note that when the dichotomous covariate has a value of zero, the first indicator has 12% missing data, the second indicator has 18% missing, the third has 27%, and the fourth indicator has 50% missing. When the covariate is 1.00 in value, the respective amounts of missing data are 12% again for the first indicator, 38%, 50%, and 78% for the remaining indicators.
The syntax for this model is shown on the top of page 616 of the appendix of the article. The missing data section is stipulated as follows:
I'm curious to learn how the values of -2, -1.5, -1, and zero map onto the proportions of missingness that you report on page 605. In other words, how does one work backwards from an expected amount of missing data per wave of measurement to obtain the proper regression weights to use in the MODEL MISSING section of the syntax?
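One way to read these values, assuming MODEL MISSING specifies a logistic regression for the probability that an outcome is missing, is as intercepts on the logit scale. Under that assumption the inverse logit of each intercept gives the proportion missing when the covariate is zero, and the logit of a target proportion gives the intercept to use. The Python sketch below only illustrates that arithmetic; the function names are hypothetical and nothing here is Mplus syntax.

```python
import math


def missing_probability(intercept, slope=0.0, x=0.0):
    """Inverse logit: probability of missingness for a given intercept,
    covariate slope, and covariate value."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * x)))


def intercept_for(proportion):
    """Logit: intercept that yields the target proportion missing at x = 0."""
    return math.log(proportion / (1.0 - proportion))


def slope_for(proportion_at_x0, proportion_at_x1):
    """Slope on a 0/1 covariate implied by the two target proportions."""
    return intercept_for(proportion_at_x1) - intercept_for(proportion_at_x0)


intercepts = [-2.0, -1.5, -1.0, 0.0]
proportions = [round(missing_probability(a), 2) for a in intercepts]
print(proportions)  # [0.12, 0.18, 0.27, 0.5]
```

Working backwards, `intercept_for(0.12)` is close to -2, and `slope_for` applied to the two proportions reported for an indicator (covariate 0 versus covariate 1) gives the logit-scale regression weight that would reproduce them under this assumption.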
I have a follow-up question. I am simulating power for a latent growth curve model with four observed indicators per time point and missing data at times 2-4. There are four time points, so there are 16 Y measures.
I want to create "wave missing" data that captures a typical attrition scenario where once a subject drops out of the study she no longer contributes data to that wave or subsequent waves. That is, I want Mplus to generate MAR missing data for 10% of the sample at time 2, 15% at time 3, and 20% at time 4. The catch is that the missing data generation should yield four patterns:
- complete data for all 16 measures
- missing data for all time 2, 3, & 4 measures (10% of the sample)
- missing data for all time 3 & 4 measures (5% of the sample)
- missing data for all time 4 measures (5% of the sample)
So, 20% of the total sample will have some form of missing data.
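The relationship between the four patterns and the per-wave percentages can be checked with a few lines of arithmetic. The Python sketch below is illustrative only (the names are hypothetical and it is not Mplus input): it builds the missingness mask for each dropout pattern over 4 waves of 4 measures and accumulates the pattern percentages into the percent missing at each wave.

```python
MEASURES_PER_WAVE = 4
N_WAVES = 4

# Percent of the sample by first missing wave; None means the subject never drops out.
first_missing_wave_pct = {None: 80, 2: 10, 3: 5, 4: 5}


def missing_mask(first_missing_wave):
    """Return 16 booleans (True = missing) for a monotone dropout pattern."""
    mask = []
    for wave in range(1, N_WAVES + 1):
        gone = first_missing_wave is not None and wave >= first_missing_wave
        mask.extend([gone] * MEASURES_PER_WAVE)
    return mask


def percent_missing_by_wave(pcts):
    """Accumulate pattern percentages into the percent missing at each wave."""
    return {
        wave: sum(p for start, p in pcts.items() if start is not None and start <= wave)
        for wave in range(1, N_WAVES + 1)
    }


by_wave = percent_missing_by_wave(first_missing_wave_pct)
print(by_wave)  # {1: 0, 2: 10, 3: 15, 4: 20}
```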
This is my current MODEL MISSING syntax. MODEL MISSING:
This yields measure-specific missingness that is correct within wave, but many missing data patterns are generated. Many of these are not realistic for my situation (e.g., one measure within a wave will be missing, but others will have data; our interviewer-based protocols will result in virtually complete data for any participant that is not lost to attrition).
You and Bengt are both amazing to respond in this forum over the weekend. Much appreciation to you both for your prompt and helpful replies.
I will continue to experiment with MODEL MISSING.
In your SEM Journal paper, you converted portions of the Mplus Monte Carlo output into Cohen-metric effect sizes. Could you describe what portions of the Mplus output you used to obtain the Cohen effect sizes you reported for the two growth model scenarios (slope regression weight = .2; slope regression weight = .1)? I'm unclear on how you were able to use the Mplus results to obtain the Cohen metric effect sizes.
I don't know what you mean by converting portions of the Mplus output to Cohen effect sizes. Perhaps you refer to page 604 in the SEM article where we say that we get different effect sizes for different regression coefficients for s regressed on the covariate. The effect size computations are described on pp. 604-605. One can also compute it wrt observed outcomes.
Just divide the reg coeff of the growth slope factor regressed on the binary x by the SD (the sqrt of the variance) of the growth slope factor. This is a Cohen-like quantity in that the reg coeff is the mean difference between the 2 x groups in the slope growth factor.
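The division described above is simple enough to sketch in a few lines of Python. The regression coefficients 0.2 and 0.1 are the two scenarios mentioned in the question; the slope factor variance of 0.25 is a made-up value used only for illustration, and the function name is hypothetical.

```python
import math


def cohen_like_effect_size(slope_on_x, slope_factor_variance):
    """Regression coefficient of the slope factor on a binary x, divided by
    the SD (square root of the variance) of the slope factor."""
    if slope_factor_variance <= 0:
        raise ValueError("variance must be positive")
    return slope_on_x / math.sqrt(slope_factor_variance)


hypothetical_variance = 0.25  # assumption for illustration, not from the article
effect_sizes = {b: cohen_like_effect_size(b, hypothetical_variance) for b in (0.2, 0.1)}
for coefficient, effect in effect_sizes.items():
    print(coefficient, round(effect, 3))
```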
We don't have a specific example available, but you can start from the User's Guide ex 11.2 and modify the Model Missing part. For example, you can have missing on y4 as a function of the value of y4 that would have been observed by saying in Model Missing:
y4 on y4*c;
where c is a value such as 0.5.
Shu Xu posted on Friday, January 26, 2007 - 12:16 am
Dear Linda and Bengt,
I am wondering what mechanism is used to generate the datasets for a two-part semicontinuous growth model for a continuous outcome (Ex 11.9). Is there a technical report on the data generating method for the two-part growth model?
If you only want 1 variable say X1 of a set of variables (x1-x12) to have missing values and you only want 1 pattern where 10% of X1 are missing across 25% of the cases (the other 75% are not missing any), is this the correct way to specify this:
Yes, that is MAR because the missing data probability depends on y1 which doesn't have missing data.
Joao Garcez posted on Wednesday, November 15, 2017 - 4:59 am
Hello Linda and Bengt Muthen,
I have the following question that, hopefully, you'll be able to help with:
I have a dataset with N = 160 and 60% missing data (MCAR) on some of the variables, and I wanted to do a Monte Carlo for power analysis, but I am a bit confused as to whether I should mention this pattern of missing data or not, because I intend to run my analyses using multiple imputation. So even if the full data (N = 160) is used in the analysis due to MI or FIML, do I still need to mention PATMISS and PATPROBS when doing Monte Carlo for power analysis?