
Jenny B posted on Friday, July 30, 2004  2:44 am



I am trying to use a Monte Carlo simulation to estimate the power, given the amount of missing data that I have, in a particular study. So far I have used saved parameter estimates (from my sample) as true population values. Using MODEL MISSING, the output shows the missing data pattern from the first replication only. Is it possible to get the missing data patterns from every replication saved, so that I can check that the proportions of missing data generated are similar to those in the sample? I have tried to use PATMISS and PATPROBS to specify the missing data patterns and proportions, but ran into difficulty specifying a pattern (and its relative proportion of the population) that is complete, i.e. has no missing data. Is this possible? Thanks very much.


The Monte Carlo output prints only the missing data patterns from the first replication. If you want it for each replication, you would need to save the data sets and analyze each one separately. It is usually not necessary to do this as the first replication is usually enough to show that you have gotten the patterns correct. You can increase the sample size for one replication to 100,000 to see if the missing value patterns are what you expect. Regarding PATMISS and PATPROBS, you need to send the output to support@statmodel.com along with a description of what you want the missing data to look like. 
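For readers who do save the generated data sets, the pattern check described above can be done outside Mplus. The following is a minimal Python sketch, not an Mplus feature. It assumes each saved replication is a whitespace-delimited numeric file and that missing entries are written either as a flag value (999 here) or as "*". Both conventions are assumptions, so check what your saved files actually contain.

```python
from collections import Counter

MISSING_FLAG = 999.0  # assumption: flag value used for missing entries in the saved files


def parse_value(token, missing_flag=MISSING_FLAG):
    """Return None for a missing entry, otherwise the value as a float."""
    if token == "*":  # assumption: some saved files mark missing values with "*"
        return None
    value = float(token)
    return None if value == missing_flag else value


def read_rows(path):
    """Read one whitespace-delimited data file into a list of parsed rows."""
    with open(path) as handle:
        return [[parse_value(tok) for tok in line.split()]
                for line in handle if line.strip()]


def pattern_proportions(rows):
    """Map each missing-data pattern (tuple of 0/1, 1 = missing) to its share of cases."""
    counts = Counter(tuple(int(v is None) for v in row) for row in rows)
    total = sum(counts.values())
    return {pattern: n / total for pattern, n in counts.items()}


# Tiny in-memory stand-in for one saved replication (4 cases, 3 variables).
demo_lines = ["1.2 0.4 2.2", "0.7 999 999", "1.9 0.1 999", "0.3 0.8 1.1"]
demo_rows = [[parse_value(tok) for tok in line.split()] for line in demo_lines]
for pattern, share in sorted(pattern_proportions(demo_rows).items()):
    print(pattern, share)
```

To check every replication, call read_rows on each saved file (the file names depend on how the data sets were saved) and compare the resulting proportions with those in the real sample.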

Anonymous posted on Sunday, January 09, 2005  10:24 am



With Mplus, model parameters estimated from real data can be saved and used as the true population parameter values for a Monte Carlo simulation. For a Monte Carlo study of latent growth modeling with missing values and covariates, can Mplus save the missing data patterns that exist in the real data and then automatically use these patterns when generating data in the simulation? If we have to specify the missing data patterns in the MODEL MISSING command, it would be very difficult to match the specified patterns with those in the real data. Any comment or suggestion will be highly appreciated.

bmuthen posted on Sunday, January 09, 2005  12:55 pm



It is currently not possible to save the missing data patterns for use in MC. I would try to mimic the major (most frequent) patterns. Missing data is hard to simulate in a realistic way because the cause of missingness (predicted by x's, earlier y's, missing y's, latent continuous variables, latent categorical variables) is not known. So experimentation with likely scenarios is needed.


Dear Muthen, I have two questions about Monte Carlo simulations with missing data:

1) I used MONTECARLO to generate missing data and wanted to analyze the data with listwise deletion. Therefore, I used "ANALYSIS: TYPE=BASIC;". However, I got the following error message: *** ERROR in Montecarlo command TYPE = MISSING must be specified when using MONTECARLO to generate missing data. How can I generate missing data without using the MISSING option in the analysis?

2) In Monte Carlo simulations with TECH9 in OUTPUT, the program prints the error messages for the replications. I would like to know whether these problematic replications are included in calculating the summary statistics and parameter estimates. If yes, how can I exclude them from the simulation? If no, the number of "convergent" and "good" replications is likely smaller than the number of replications stated in the design. Is it possible to have a fixed number, say 1000, of "convergent" and "good" replications?

Thanks in advance!


It sounds like you want to generate data sets that contain missing data but then analyze them using listwise deletion. You cannot do this in one step. You will have to generate the data in the first step and analyze them using external Monte Carlo. See Example 11.6 and Chapter 18 of the Mplus User's Guide. To generate data with missingness, you need to include MISSING in the TYPE option. The results are for the number of replications that were completed. This number is printed in the output. It is not possible to specify the number of completed replications. You would have to keep increasing the number of requested replications until you get the number of completed replications that you want. 

Anonymous posted on Sunday, August 21, 2005  7:37 am



Dear Muthen, I would like to analyze many external "mean and covariance matrices" by MONTECARLO. I used something like below: DATA: FILE IS covlist.dat; TYPE = MONTECARLO; TYPE IS MEANS COVARIANCE; NOBSERVATIONS = 100; I got an error message saying that the input covariance matrix could not be inverted. However, it worked perfectly okay if I analyzed the covariance matrices one by one. It seems to me that it's not possible to use "TYPE =MONTECARLO" and "TYPE IS MEANS COVARIANCE" at the same time. Am I correct? Thanks! 


The external Monte Carlo facility of Mplus requires raw data. We will add a better error message. 


Hi, Linda. In your excellent and very helpful 2002 SEM Journal paper on Monte Carlo simulation of statistical power and effect size, on page 605 you provide an example of simulating a missing data process with missing at random data for a linear growth model with four indicators. You note that when the dichotomous covariate has a value of zero, the first indicator has 12% missing data, the second indicator has 18% missing, the third has 27%, and the fourth indicator has 50% missing. When the covariate is 1.00 in value, the respective amounts of missing data are 12% again for the first indicator, and 38%, 50%, and 73% for the remaining indicators. The syntax for this model is shown on the top of page 616 of the appendix of the article. The missing data section is stipulated as follows:

MODEL MISSING:
%OVERALL%
[y1@-2 y2@-1.5 y3@-1 y4@0];
y2-y4 ON x@1;

I'm curious to learn how the values of -2, -1.5, -1, and zero map onto the proportions of missingness that you report on page 605. In other words, how does one work backwards from an expected amount of missing data per wave of measurement to obtain the proper regression weights to use in the MODEL MISSING section of the syntax? Regards and thanks, Tor


The values are logits. To turn them into probabilities, use the formula

p = 1 / (1 + exp(-logit))

For y1, the logit of -2 results in the probability of 0.12. For y2-y4, the logit is based on the covariate also. The logit for y2 is

logit = -1.5 + b*x

For x = 1, logit = -1.5 + 1*1 = -.5. The probability for a logit of -.5 is .38.
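The conversion can be checked in both directions with a few lines of Python. This is a sketch for illustration only: it assumes the intercepts in the cited syntax are the logits -2, -1.5, -1, and 0, with a slope of 1 on x for y2-y4, and the helper names are my own.

```python
import math


def inv_logit(logit):
    """Logit -> probability: p = 1 / (1 + exp(-logit))."""
    return 1.0 / (1.0 + math.exp(-logit))


def logit(p):
    """Probability -> logit: ln(p / (1 - p)). Use this to work backwards."""
    return math.log(p / (1.0 - p))


# Assumed MODEL MISSING values: intercepts (logits at x = 0) and slopes on x.
INTERCEPTS = {"y1": -2.0, "y2": -1.5, "y3": -1.0, "y4": 0.0}
SLOPES_ON_X = {"y1": 0.0, "y2": 1.0, "y3": 1.0, "y4": 1.0}


def missing_probs(x):
    """Probability of missingness on each indicator for covariate value x."""
    return {name: inv_logit(a + SLOPES_ON_X[name] * x)
            for name, a in INTERCEPTS.items()}


for x in (0, 1):
    print("x =", x, {name: round(p, 2) for name, p in missing_probs(x).items()})

# Working backwards: the logit that gives a desired missing proportion.
for target in (0.10, 0.15, 0.20):
    print(target, round(logit(target), 3))
```

Working backwards therefore only needs ln(p / (1 - p)) for the intercept. When a covariate is involved, the intercept gives the proportion at x = 0, and the slope shifts the logit for other values of x.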


Thank you, Linda. I have a follow-up question. I am simulating power for a latent growth curve model with four observed indicators per time point and missing data at times 2-4. There are four time points, so there are 16 Y measures. I want to create "wave missing" data that captures a typical attrition scenario where once a subject drops out of the study she no longer contributes data to that wave or subsequent waves. That is, I want Mplus to generate MAR missing data for 10% of the sample at time 2, 15% at time 3, and 20% at time 4. The catch is that the missing data generation should yield four patterns:

- complete data for all 16 measures
- missing data for all time 2, 3, & 4 measures (10% of the sample)
- missing data for all time 3 & 4 measures (5% of the sample)
- missing data for all time 4 measures (5% of the sample)

So, 20% of the total sample will have some form of missing data. This is my current MODEL MISSING syntax:

MODEL MISSING:
%OVERALL%
[y21@-2.198 y31@-1.735 y41@-1.387
y22@-2.198 y32@-1.735 y42@-1.387
y23@-2.198 y33@-1.735 y43@-1.387
y24@-2.198 y34@-1.735 y44@-1.387];

This yields measure-specific missingness that is correct within wave, but many missing data patterns are generated. Many of these are not realistic for my situation (e.g., one measure within a wave will be missing, but others will have data; our interviewer-based protocols will result in virtually complete data for any participant that is not lost to attrition). Many thanks in advance for your suggestions, Tor


Would the Mplus options PATMISS and PATPROBS do what you want?


Thanks, Bengt. Is it possible to use PATMISS and PATPROBS in conjunction with MODEL MISSING to generate MAR missing data patterns? Thanks, Tor


No, you would need to use one or the other. 
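To see what the four-pattern attrition scenario implies, here is a small Python sketch that draws one dropout pattern per case with fixed pattern probabilities (80% complete, 10% lost from time 2, 5% from time 3, 5% from time 4). It is an illustration outside Mplus, and the names are hypothetical. Note that this sketch draws patterns independently of the data values, so by itself it does not produce covariate-dependent (MAR) missingness.

```python
import random

WAVES = 4
MEASURES_PER_WAVE = 4

# (first wave lost, share of sample); None marks a completer.
DROPOUT = [(None, 0.80), (2, 0.10), (3, 0.05), (4, 0.05)]


def missing_flags(first_lost):
    """Return 16 flags (1 = missing): every measure from the dropout wave onward is missing."""
    flags = []
    for wave in range(1, WAVES + 1):
        lost = first_lost is not None and wave >= first_lost
        flags.extend([int(lost)] * MEASURES_PER_WAVE)
    return tuple(flags)


def draw_patterns(n, rng):
    """Draw one monotone dropout pattern per case according to the DROPOUT shares."""
    starts = [start for start, _ in DROPOUT]
    weights = [share for _, share in DROPOUT]
    return [missing_flags(start) for start in rng.choices(starts, weights=weights, k=n)]


def wave_missing_rate(patterns, wave):
    """Share of cases missing at the given wave (measures within a wave share one flag)."""
    column = (wave - 1) * MEASURES_PER_WAVE
    return sum(pattern[column] for pattern in patterns) / len(patterns)


rng = random.Random(2024)
patterns = draw_patterns(10_000, rng)
print("distinct patterns:", len(set(patterns)))
for wave in range(2, WAVES + 1):
    print("time", wave, "missing:", round(wave_missing_rate(patterns, wave), 3))
```

The per-wave rates accumulate to roughly 10%, 15%, and 20%, while only four patterns ever occur, which is the property that independent per-variable logits cannot guarantee.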


Thank you, Linda. You and Bengt are both amazing to respond in this forum over the weekend. Much appreciation to you both for your prompt and helpful replies. I will continue to experiment with MODEL MISSING. In your SEM Journal paper, you converted portions of the Mplus Monte Carlo output into Cohen-metric effect sizes. Could you describe what portions of the Mplus output you used to obtain the Cohen effect sizes you reported for the two growth model scenarios (slope regression weight = .2; slope regression weight = .1)? I'm unclear on how you were able to use the Mplus results to obtain the Cohen-metric effect sizes. Thank you, Tor


I don't know what you mean by converting portions of the Mplus output to Cohen effect sizes. Perhaps you refer to page 604 in the SEM article where we say that we get different effect sizes for different regression coefficients for s regressed on the covariate. The effect size computations are described on pp. 604-605. One can also compute it with respect to observed outcomes.


Thank you, Bengt. You're right. I was referring to the discussion on pp. 604-605. Apologies for being dense, but I am still not seeing how one obtains the value of .63 as the estimate of the effect size from the input slope coefficient of .20. Thanks, Tor


Just divide the regression coefficient of the growth slope factor regressed on the binary x by the SD (the square root of the variance) of the growth slope factor. This is a Cohen-like quantity in that the regression coefficient is the mean difference between the two x groups in the slope growth factor.
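As a worked illustration of that division, the Python sketch below computes the effect size for both slope coefficients mentioned in the thread. The slope factor variance of 0.10 is an assumption: it is not stated in this thread and is used here only because it reproduces the reported value of .63 for a coefficient of .20.

```python
import math


def slope_effect_size(reg_coef, slope_factor_variance):
    """Cohen-like effect size: regression coefficient divided by the slope factor SD."""
    return reg_coef / math.sqrt(slope_factor_variance)


ASSUMED_SLOPE_VARIANCE = 0.10  # assumption, not given in the thread

for coef in (0.2, 0.1):
    print(coef, round(slope_effect_size(coef, ASSUMED_SLOPE_VARIANCE), 2))
```

With a different slope factor variance, the same coefficient maps to a different effect size, so the variance used should be the one implied by your own population model.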


Hi, Linda and Bengt. I would like to simulate NMAR data using Mplus. Can you point me to any example MODEL MONTECARLO syntax that illustrates how to generate NMAR data using Mplus? With best wishes and many thanks, Tor 


We don't have a specific example available, but you can start from the User's Guide ex 11.2 and modify the Model Missing part. For example, you can have missing on y4 as a function of the value of y4 that would have been observed by saying in Model Missing: y4 on y4*c; where c is a value such as 0.5. 
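The idea of letting missingness on y4 depend on the value of y4 itself can be illustrated with a short Python simulation. This is a sketch of the mechanism, not of Mplus internals: y4 is drawn as standard normal, c = 0.5 is taken from the reply above, and the intercept of -1 is a hypothetical choice.

```python
import math
import random


def inv_logit(z):
    """Logit -> probability."""
    return 1.0 / (1.0 + math.exp(-z))


def generate_nmar(n, intercept, c, rng):
    """Return (true_value, observed_value) pairs; observed_value is None when missing.

    The probability of missingness depends on the value that would have been
    observed, which is what makes the mechanism NMAR.
    """
    data = []
    for _ in range(n):
        y4 = rng.gauss(0.0, 1.0)
        p_missing = inv_logit(intercept + c * y4)
        observed = None if rng.random() < p_missing else y4
        data.append((y4, observed))
    return data


rng = random.Random(12345)
data = generate_nmar(20_000, intercept=-1.0, c=0.5, rng=rng)  # intercept is hypothetical

true_mean = sum(y for y, _ in data) / len(data)
observed = [obs for _, obs in data if obs is not None]
observed_mean = sum(observed) / len(observed)
missing_share = 1 - len(observed) / len(data)

print("share missing:", round(missing_share, 3))
print("mean of all generated y4:", round(true_mean, 3))
print("mean of observed y4:", round(observed_mean, 3))
```

Because high values of y4 are more likely to be missing when c is positive, the mean of the observed values falls below the mean of all generated values, which is the bias that NMAR simulations are usually meant to study.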

Shu Xu posted on Friday, January 26, 2007  12:16 am



Dear Linda and Bengt, I am wondering what mechanism is used to generate the data sets for a two-part semicontinuous growth model for a continuous outcome (Ex 11.9). Is there a technical report on the data-generating method for the two-part growth model? Violxu


See the DATA TWOPART command. The method is described and a reference is given.


Dr. Muthen, how do I generate differing missing data patterns in Mplus? I want to create several patterns of missing data to see whether they affect the robustness of the imputation method.


The options of the MONTECARLO command are described in Chapter 18 of the Mplus User's Guide. Examples 11.1 and 11.2 show how to generate missing data. 


If you only want 1 variable, say X1, of a set of variables (x1-x12) to have missing values and you only want 1 pattern where 10% of X1 are missing across 25% of the cases (the other 75% are not missing any), is this the correct way to specify this: PATMISS = X1(.10) | X1(.00); PATPROBS = .25 | .75;


Try it. If it does not work, contact support. 


Dear Muthen, I have x1-x3, y1-y3, and u1-u10 variables. I would like to generate a MAR missing data pattern on y1-y3 and MCAR on u1-u10. Can I use just PATMISS and PATPROBS? Otherwise, do I have to use MODEL MISSING syntax? It seems to me that it's not possible to use MISSING in MONTECARLO and PATMISS at the same time.


I recommend using Model Missing. 

Fred posted on Wednesday, August 02, 2017  2:19 am



Dr. Muthen, I've tried to model missing data under a MAR mechanism with the MODEL MISSING command. The model is a relatively simple CFA with five indicators (5-category scales each). The missing values should only occur on y3-y5 and depend on y1 (for y3 and y4) and on y2 (for y5). I tried to set up my equations with the formulas provided by you, but I cannot figure out why it does not work. The following is my MODEL MISSING command: [y3@3.94]; [y4@3.94]; [y5@3.94]; y3 ON y1@1; y4 ON y1@1; y5 ON y2@1; With that I get (more or less) 10% missing for each of my indicators. Now, is there a way to determine the correct equations so I get the desired missing rates? Thanks for the answer.

Fred posted on Thursday, August 03, 2017  12:25 am



Update to post above: with the following MODEL MISSING command I get the desired 5% missing for each of the three variables: [y3@5.38]; [y4@5.38]; [y5@5.38]; y3 ON y1@1; y4 ON y1@1; y5 ON y2@1; Now the question: can I be sure that this is a MAR mechanism? Thank you for the answer, Fred


Yes, that is MAR because the missing data probability depends on y1 which doesn't have missing data. 
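On the earlier question of how to find the intercept that gives a desired missing rate when the missingness logit depends on a predictor: the marginal rate is the average of the logistic probability over the predictor's distribution, so the intercept can be found numerically. The Python sketch below uses bisection. The category scores 1-5 and their proportions for y1 are hypothetical, and how Mplus scores a categorical predictor in MODEL MISSING is not stated in this thread, so treat the numbers as an illustration of the method only.

```python
import math


def inv_logit(z):
    """Logit -> probability."""
    return 1.0 / (1.0 + math.exp(-z))


def marginal_missing_rate(intercept, slope, predictor_dist):
    """Average P(missing) over a discrete predictor distribution {value: probability}."""
    return sum(prob * inv_logit(intercept + slope * value)
               for value, prob in predictor_dist.items())


def solve_intercept(target, slope, predictor_dist, lo=-30.0, hi=30.0, tol=1e-10):
    """Find the intercept giving the target marginal rate.

    The marginal rate increases monotonically with the intercept, so bisection works.
    """
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if marginal_missing_rate(mid, slope, predictor_dist) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Hypothetical distribution of the fully observed predictor y1 (scores 1-5).
Y1_DIST = {1: 0.10, 2: 0.20, 3: 0.40, 4: 0.20, 5: 0.10}

for target in (0.05, 0.10):
    a = solve_intercept(target, slope=1.0, predictor_dist=Y1_DIST)
    print("target", target, "intercept", round(a, 3),
          "check", round(marginal_missing_rate(a, 1.0, Y1_DIST), 4))
```

The same approach works for a continuous predictor if the sum is replaced by an average over simulated predictor values. In practice, a single large replication in Mplus remains a simple way to confirm the resulting rates.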

Joao Garcez posted on Wednesday, November 15, 2017  4:59 am



Hello Linda and Bengt Muthen, I have the following question that, hopefully, you'll be able to help with: I have a dataset with N = 160 and 60% missing data (MCAR) on some of the variables, and I want to do a Monte Carlo power analysis. I am a bit confused as to whether I should specify this pattern of missing data, because I intend to run my analyses using multiple imputation. So even if the full data (N = 160) is used in the analysis due to MI or FIML, do I still need to specify PATMISS and PATPROBS when doing a Monte Carlo power analysis? Thank you for whatever help you can provide, Joao


Yes, you should specify the pattern of missingness in your simulation. 

Joao Garcez posted on Sunday, November 19, 2017  4:30 am



Dear Dr. Muthen, Thank you for your help and prompt answer. Best, Joao.
