
Anonymous posted on Wednesday, February 28, 2001  7:42 pm



I have two questions. First, in Monte Carlo output, what is the meaning of "95% Cover"? Second, if I select 1000 replications, the output only reports the average and std. dev. Can I get the 1000 estimates?


95% coverage is the proportion of replications for which the 95% confidence interval contains the true parameter value. If you are using mixture Monte Carlo, you can save the results from each replication. If you are doing regular Monte Carlo, you cannot. If you generate data outside of Mplus, you can save all of the results. A Monte Carlo utility is available on the website for this purpose.
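To make the coverage definition concrete, here is a minimal Python sketch run outside Mplus. It is not how Mplus computes anything internally. It estimates a simple mean in each replication and counts how often the 95% interval contains the true value. The function name `coverage_95` and all numeric settings are assumptions chosen for illustration.

```python
import random
import statistics

def coverage_95(true_mean, sd, n, reps, seed=1):
    """Proportion of replications whose 95% CI contains true_mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(true_mean, sd) for _ in range(n)]
        est = statistics.fmean(sample)
        se = statistics.stdev(sample) / n ** 0.5
        # Normal-theory interval: estimate +/- 1.96 standard errors
        if est - 1.96 * se <= true_mean <= est + 1.96 * se:
            hits += 1
    return hits / reps

cover = coverage_95(true_mean=0.5, sd=1.0, n=200, reps=1000)
print(round(cover, 3))
```

A value close to 0.95 indicates that the standard errors are well calibrated for that parameter.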

Anonymous posted on Sunday, March 04, 2001  11:13 pm



My simulations were made for testing general SEM (continuous data) and WLSMV (ordinal data) models. Can I get the simulated dataset for each replication? In an Mplus Monte Carlo input file, Mplus only needs the true mean and covariance matrices. How can it (Mplus) know what the true parameter values are for a specific parameter structure of a SEM model?


No, you can save data from only the first replication in Mplus MONTECARLO. If, however, you generate data outside of Mplus, we provide a Monte Carlo utility on the Mplus website to aid in running the Monte Carlo simulation and saving results for further analysis. For the estimator coverage statistics to be correct, Mplus picks up the true population values of the parameters from the starting values of each parameter in the MODEL command. 


I am trying to follow the procedures listed in Muthen & Muthen, 2002, concerning Monte Carlo for power estimation. In that paper (p. 603), you describe 3 steps for generating nonnormal data: 1) generate data for 2 classes; 2) run analysis w/ 1 replication; 3) solve resid variances & use these as pop values for data generation. Can you clarify how many files are needed to accomplish this? For example, do I create data using 1 input file (using save command), analyze that saved data using a second input file (with replication = 1), and then use results of 2nd input to establish correct population values (for the aggregate sample) that are used as start values in the 'MODEL' portion of the first input file? If this is correct, I'm assuming that only the last of these 3 files was listed in the appendix of that paper (and is avail for download on your website)? Thanks for any clarification you might provide.

bmuthen posted on Friday, January 07, 2005  5:53 pm



The 3 steps listed are used as initial steps based on which the Monte Carlo input file (for many replications) shown in the paper is done. The 3 initial steps can be done via a single input file that looks much like the one in the paper, but uses the large sample mentioned and only 1 replication. You save the data and check the skewness and kurtosis, etc. 


Hi, I am running a Monte Carlo analysis on a SEM. I am interested in testing the effects of construct reliability, Rsquare and sample size on the structural coefficients (2 parameters). I have finished my runs with the various test conditions and stored my results. However, to run my ANOVAs I need only the values of the two parameters I am interested in from my Monte Carlo results file. Is there a way I can specify the results command so that only these two parameters are stored? Girish http://www.personal.psu.edu/users/g/z/gzm108/


No, you can't do this. 


Appreciate the clarification. 


Hi, I am running Monte Carlo analysis on a SEM. I am interested in testing the effects of construct reliability, Rsquared and sample size on the structural coefficients (2 parameters). I am fixing Rsquared of my regression equations by fixing the residual variance of the dependent variable. i.e., If my dependent construct say is f1, is it alright to specify f1@0.3 in my model population, to fix the rsquared of the equation to 0.7? Thanks in advance. 


In MODEL POPULATION, by specifying f1@0.3 or f1*0.3, data will be generated with a residual variance of 0.3. So this is correct for an Rsquare of 0.7. 


I'm trying to run a Monte Carlo study to generate data for use in replicating my dissertation model which was underpowered. I've run into a glitch in that my model has a grouping variable (Clinic Yes/No) and I haven't been able to figure out the syntax for generating this variable under the Monte Carlo study. Any help you can give me is greatly appreciated! Kim 


If you mean a grouping variable as in multiple group analysis, see the Monte Carlo counterparts of Examples 5.15-5.18. If you mean a grouping variable as in a covariate, see Example 11.1.


I'll try that  thanks so much! 


I now have another issue. I have saved the parameter estimates from a previous analysis to use as the population values in the MC simulation. However, I'm getting an error message that there's insufficient data in the POPULATION file. What would cause this error? Thanks! 


It may be that you changed something in the ANALYSIS command between Step 1 and Step 2 of the analysis. If this doesn't help, send all relevant files and your license number to support@statmodel.com. 


My apologies if this is a stupid question... This afternoon I was working with an Mplus MC input and found the following issue quite puzzling. I always thought that the MODEL MONTECARLO statement is the one used to specify the true population values for a model, while the MODEL option is used to specify the model to be estimated in the 500 or so MC replications (as described in the Muthén & Muthén 2002 SEM paper). However, what I found today was that Mplus actually seems to interpret the values provided in the MODEL statement as population values (when I provided different values following an asterisk (*) which I thought would be taken only as starting values under the MODEL statement, the population values in the output changed accordingly). On the other hand, population values in the output did not change when I changed the values following an asterisk in the MODEL MONTECARLO command. Am I missing something? 


The use of the values in MODEL MONTECARLO and MODEL is described in the first Monte Carlo example in Chapter 11. MODEL POPULATION provides the values for data generation. MODEL provides the values that are used as both the true population values for computing coverage and as starting values. The reason for this is so the models for data generation and analysis can be different.


Hi, I am conducting a path analysis with count variables and receive an error message stating that I must use Monte Carlo integration in the analysis. Why is this a default in the program? See below for syntax and error message. Thanks!

Mplus VERSION 3.12
MUTHEN & MUTHEN
05/16/2007 1:37 PM

MISSING are all (999);
USEVARIABLES ARE a5 h9 h17 recgende lngaccul relginf2 d1 needdepr forsrv instlng emolng;
COUNT ARE h9 h17 FORSRV;
ANALYSIS: TYPE=GENERAL MISSING;
MODEL: forsrv on a5 recgende relginf2 lngaccul h9 h17 d1 needdepr emolng instlng;
OUTPUT: SAMPSTAT RESIDUAL STANDARDIZED;

INPUT READING TERMINATED NORMALLY

41107 mplus

*** FATAL ERROR
THIS MODEL CAN BE DONE ONLY WITH MONTECARLO INTEGRATION.


Monte Carlo integration is required when the number of dimensions of integration varies across individuals due to missing data. I recommend upgrading from Version 3.12 to the most recent version of Mplus. There have been many changes and improvements since Version 3.12.


Thank you for the information. Is this due to the count nature of my mediator and outcome variables? Is the FIML default turned off, so to speak, with use of Monte Carlo simulation in addressing the missing data? Thanks for the information, as I am a novice Mplus user.


In this situation, Monte Carlo is a type of numerical integration, not a simulation. There is a brief description of numerical integration in Chapter 13 of the user's guide, which is on the website.


Hello, I am new to Mplus, and I am using the demo version in an attempt to determine the sample size and power that would be necessary for my dissertation research. My hypothesized model posits mediated moderation, and there are 6 continuous variables involved: 4 predictors (2 pairs of interacting variables) and two outcomes (one is a proposed mediator, the other the outcome). Which would be the best Monte Carlo simulation study to run using the demo version?


Use the Monte Carlo counterpart of Example 3.11. 


I'm running a Monte Carlo study to examine sample size requirements. I have one continuous variable that is an interaction between X1 and X2. The define command does not seem to be available for monte carlo. (ie: if I were to create Z1 = X1*X2) Does it matter in the calculation of sample size if the variable Z1 is not defined as a function of X1*X2? If it doesn't then I assume I would just add Z1 to the names are command. 


In this situation, you would need to generate the data outside of Mplus and use external Monte Carlo in Mplus to analyze the data. 

Jon Elhai posted on Monday, March 10, 2008  3:46 pm



Drs. Muthen, In a Monte Carlo analysis, where I'm trying to estimate observed power after having conducted a confirmatory factor analysis... I'm wondering: 1) When freeing parameters with an asterisk, I assume the numbers I insert after an asterisk are the parameter estimates I obtained in my CFA? 2) If so, is this the standardized estimate, such as from the STDYX column in my CFA output? 3) If most of my Monte Carlo output’s estimates in the % Signif Coeff column are 1.000, could I have done something wrong, or is this possible? 


The values placed after an asterisk or @ symbol in the MODEL POPULATION command are used as population values for data generation. The values placed behind an asterisk in the MODEL command are used to compute coverage and as starting values. It sounds like you do not have values in the MODEL command. See Example 11.1.


Hello, I would like to generate data for a simple SEM with two latent variables. However, how can I specify a nonlinear, e. g. concave and monotonically increasing, relationship between the variables? 


Mplus cannot specify nonlinearity of this type. It can do x squared or x1 times x2. 


Hi, I am trying to run a monte carlo simulation to determine the power needed for inclusion in our grant proposal. My model is a path analysis with 3 x's, a mediator (y1), and a categorical outcome (u1). I have read through the manual on monte carlo simulations and am using example 3.14 as a guide as well, but I have a couple of questions. 1) Am I correct in assuming that the path estimates to u1 are in logits (if I am using ML)? 2) If I am using ESTIMATOR = ML, do I have to estimate a residual variance for u1? Isn't it zero? 3) Is it possible to simulate patterns of missingness when using the ML estimator? Thank you. 


1. Yes, the default for ML is logistic regression. 2. With logistic regression, the residual variance is not estimated. It is fixed to pi squared divided by 3. 3. Yes. See Examples 11.1 and 11.2. 


Thank you for your help. I have now gotten the Monte Carlo simulation to run with missing patterns, although I had to add INTEGRATION = MONTECARLO to get it to run. When looking at examples for this language I saw that those examples (e.g., 3.17) used ESTIMATOR = MLR. However, my model also runs with ESTIMATOR = ML and I can't see what the difference is. My model is a path analysis with 3 x's, a mediator (y1), and a categorical outcome (u1). Is one of these estimators better than the other for my simulation? Thank you.


The parameter estimates are the same for ML and MLR. The standard errors and fit statistics are different. ML has conventional standard errors and fit statistics. MLR has standard errors and fit statistics that are robust to nonnormality. Our default in most cases is MLR. 


Thank you for the clarification. Since my outcome variable is categorical and doesn't have a 50/50 split, MLR sounds like the better option for me. 


In the results of my monte carlo simulation using MLR, there is a log likelihood estimated across the replications for H0, but not one for H1. How do I assess model fit? Since I am doing this mainly for a power analysis of the paths I am not sure what the comparison model would be  where all of the paths are estimated at zero? Thanks for your help. 


With a combination of a continuous mediator and a categorical distal outcome plus ML estimation, there is no model fit assessment. This is because there is no relevant H1 model with an unrestricted covariance matrix. Such H1 models are relevant only for continuous outcomes. 


Hi. I am trying to generate binary data. For the further use of the generated data in IRT programs, I want to format the data without decimals and spaces. Monte carlo command does not have format subcommand. Is there any way to format the data? 


No, there is no way to change the format of the data saved from a Monte Carlo study. 
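Since the format cannot be changed inside Mplus, one workaround is to post-process the saved file with a short script. The Python sketch below assumes the saved data are whitespace-delimited numbers with one case per line. Both that layout and the two sample lines are illustrative assumptions, not taken from an actual Mplus file.

```python
def compact_binary_rows(lines):
    """Turn whitespace-delimited 0/1 values into digit strings with no spaces or decimals."""
    rows = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        rows.append("".join(str(int(float(f))) for f in fields))
    return rows

# Hypothetical lines in the style of a saved replication file
sample = [
    "   1.000000   0.000000   1.000000",
    "   0.000000   0.000000   1.000000",
]
compact = compact_binary_rows(sample)
print(compact)  # ['101', '001']
```

To process a real file, pass the open file object in place of `sample` and write each returned string to a new file.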

sorya posted on Wednesday, January 07, 2009  9:42 am



Dear Prof. Muthen, just a very short question: Conducting a Montecarlo study does it make any difference whether an asterisk or the @ symbol is used in the "Model Population" command? Thanks! 


No. But it does make a difference in the MODEL command. 


Dear Dr. Muthen, I am new to Mplus. I'm trying to run Monte Carlo now. After running, I saved the results in the text file. However, I don't know the order of results. The only guiding message I can see in the output file is "Parameter estimates (saved in order shown in Technical 1 output)". Can you tell me where I can get 'Technical 1 output', Please? How about technical 5 output? 


Hi Tae, Did you ask for the outputs? You can retrieve them by specifying the following: OUTPUT: TECH1 TECH5; For more information on this, see Chapter 17 in the User's Guide. Sincerely, Amir 


Hi, Thank you, Amir, for your answer. I have another question. I ran simulations with Mplus v. 3.2. Recently, I reran the same simulations (I mean the syntax was identical) in Mplus v. 5. I expected the same result. However, the degrees of freedom were different in the two versions. Is there any reason for having two different dfs in the different versions of Mplus?


Check that you have the same estimator in both runs and that the Tech1 outputs agree, and if not go with the one you want. 


Hi, can the multiple processors function be used with monte carlo analysis to speed up the process? 


Yes. 


May I ask a question for clarification regarding the Monte Carlo procedure. If I want to specify what the true values are in the population but don't want to save the data, I use the command MODEL MONTECARLO. If I want to specify what the true values are in the population as I did above, but this time I want to save the data, then I use the MODEL POPULATION command. Is that correct? 


MODEL POPULATION and MODEL MONTECARLO are the same and can be used interchangeably. The values given in them are used for data generation. 


OK, How would I code the multiple processors option to speed up the Monte Carlo analysis? Would I merely write processors=4 within the analysis option, or do I have to add something else? 


It would be faster if you put: PROCESSORS = 4 (STARTS); 


Thanks, I'll give it a shot. 


I have a question regarding the assessment of power with a Monte Carlo simulation. Suppose in the Model Population command I fix a path of F2 on F1@.60 and do the same in the Model command. The % Sig Coeff is the estimate of power and even with a large sample size it is showing at 0. Is this because the parameter is fixed in the Model command? 


Yes. You can't estimate power for a fixed parameter because it doesn't obtain an estimate/SE ratio that the power estimate is based on. 
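The estimate/SE logic behind "% Sig Coeff" can be sketched in Python for a simple regression slope. This illustrates the idea only and does not reproduce Mplus internals. The function name `percent_sig` and the settings (slope 0.3, n = 100, 500 replications) are assumptions.

```python
import random

def percent_sig(beta, n, reps, seed=1):
    """Share of replications in which |estimate / SE| exceeds 1.96."""
    rng = random.Random(seed)
    sig = 0
    for _ in range(reps):
        x = [rng.gauss(0, 1) for _ in range(n)]
        y = [beta * xi + rng.gauss(0, 1) for xi in x]
        mx, my = sum(x) / n, sum(y) / n
        sxx = sum((xi - mx) ** 2 for xi in x)
        sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
        b = sxy / sxx                       # least-squares slope
        a = my - b * mx                     # intercept
        rss = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
        se = (rss / (n - 2) / sxx) ** 0.5   # standard error of the slope
        if abs(b / se) > 1.96:
            sig += 1
    return sig / reps

power_true = percent_sig(beta=0.3, n=100, reps=500)   # power for a nonzero slope
power_null = percent_sig(beta=0.0, n=100, reps=500)   # Type I error rate
print(power_true, power_null)
```

A parameter fixed with @ in the MODEL command is never estimated, so it has no standard error and no such ratio can be formed.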


Suppose when specifying a CFA model in the Model Population command you have 3 indicators (x1-x3) of Factor 1. If you set the residual variances of all three to .51, for example X2@.51, shouldn't it automatically set the loadings Lambda11, Lambda12, and Lambda13 at a standardized value of .70, given that 1 - (.70^2) = .51? Currently it seems to require you to provide the loading. Thank you.


No population parameter value is set automatically. The value zero is used for any parameter for which you do not give a population value.


To follow up on the question regarding the Model Population command in a Monte Carlo simulation, I have 2 questions: (1) If I want to set a latent factor with 3 indicators to have an average Rsquare of 60%, should I set the population parameter values for the unstandardized residual variance of all three items at .40 (e.g., x1@.40 x2@.40 and x3@.40) in Model Population? (2) If that is correct, do I have to hand calculate what the unstandardized factor loading for each observed variable (item) would be, or can I set the variance for the factor at 1 (F1@1) and then set the factor loading to start somewhere, say F1 by x1-x3*.75, assuming that the program will calculate the proper loading based on the residual variance being set at .40 and the factor variance being set at 1? At the end of the day, I want to be able to simulate data with different factors that have differing Rsquare values. Thank you


Rsquare = (lambda squared * factor variance) / ((lambda squared * factor variance) + residual variance)

If the factor variance and factor loadings are one, then

Rsquare = 1 / (1 + residual variance)

Solving for the residual variance:

Residual variance = (1 - Rsquare) / Rsquare

For an Rsquare of .6, the residual variance is (1 - .6)/.6 = .667.
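The same relationship can be checked numerically, including for loadings and factor variances other than one. This is a small Python sketch. The function names are hypothetical, and the general formula simply rearranges the Rsquare definition for a single indicator.

```python
def indicator_r_square(loading, factor_variance, residual_variance):
    """Rsquare of an indicator: explained variance over total variance."""
    explained = loading ** 2 * factor_variance
    return explained / (explained + residual_variance)

def residual_variance_for(r_square, loading=1.0, factor_variance=1.0):
    """Residual variance that yields the requested Rsquare."""
    explained = loading ** 2 * factor_variance
    return explained * (1.0 - r_square) / r_square

theta = residual_variance_for(0.6)
print(round(theta, 3))  # 0.667
print(round(indicator_r_square(1.0, 1.0, theta), 3))  # 0.6
```

With a loading of .75 and factor variance 1, for example, `residual_variance_for(0.6, loading=0.75)` gives the residual variance to enter in MODEL POPULATION for that indicator.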


Am I reading this correctly from manual pages 346-347 regarding external Monte Carlo? If data is generated using Mplus and saved as replist.dat, then used with Type = Montecarlo, the population values in the output come from the Model command (in the external Monte Carlo) and the average values come from the replist.dat. Is that correct? In this sense one could misspecify the model in the Model command and compare the results of population to average.


The population values used for coverage are taken from the MODEL command for both internal and external Monte Carlo. The saved data sets are analyzed and the values are averaged across the data sets. 


I am running a simulation to look at the bias due to attenuation when averaging items. So I have 2 factors with 5 indicators each. In the population I have modelled them to correlate at .40. I average them using the Mean command in Define. Then when I run the external Monte Carlo (which I need because I am using the Define command) I put the original value (.40) in the Model command: F1 with F2@.40. In tech9 the average values will be those using the means of F1 and F2, comparing them against the population values. Are there any problems in reading the output that way when using the Define command? It seems right to me.


Not sure I understand the intent here. Here are a couple of observations which may be of use: If the indicators are correlated 0.40, their averages won't correlate 0.40. I don't know why F1 with F2 is fixed at .40 in the MODEL command. Typically, the MODEL command would use *. You mention average values using the means of F1 and F2 comparing them against pop values. Why would that be of interest? I thought you were getting at the attenuated correlation between means of indicators as compared to factor correlation.
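The attenuation at issue can be worked out directly for a simplified case. The Python sketch below assumes equal loadings, equal and uncorrelated residual variances, and factor variances of one. The loading of .8 and residual variance of .36 are illustrative values only, not the poster's.

```python
def attenuated_correlation(factor_corr, loading, resid_var, n_items):
    """Correlation between two item averages, each built from n_items indicators."""
    true_part = loading ** 2                       # variance due to the factor
    composite_var = true_part + resid_var / n_items
    # Covariance of the two averages is loading^2 * factor_corr
    return factor_corr * true_part / composite_var

r_ov = attenuated_correlation(factor_corr=0.40, loading=0.8, resid_var=0.36, n_items=5)
print(round(r_ov, 3))
```

Under these assumptions the averages correlate at roughly .36 rather than .40, so .36 rather than .40 would be the reference value for the OV1 WITH OV2 estimate.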


Sorry I wasn't very clear. What I have done is create two factors each with 5 indicators. In the population I have modelled them to correlate at .40 with varying sample sizes and 1000 reps. I want to examine the attenuation due to measurement error when you average the items in the factor to create one observed variable (call it OV) for each. In the external Monte Carlo, I use the define command to create the averages so factor 1 becomes OV1 = Mean(x1-x5) and factor 2 becomes OV2 = Mean(x6-x10). I put the original values in the Model command: OV1 with OV2*.40. In tech9 I am assuming that the average values in the columns will be those using the means of OV1 and OV2, comparing them against the population values. Are there any problems in reading the output that way when using the Define command? It seems right to me. On the same topic, but different scenario, in the model command, is there any way of specifying F1@(1-.65)*Var(x) without specifying what the value Var(x) is? This would be needed if you are correcting for attenuation and are running external Monte Carlo with Type = Montecarlo. Thank you,


The average values in the output give you the average over the replications of the estimate for whatever parameter is printed, for example the average estimate of OV1 WITH OV2, and also the mean parameters of OV1 and OV2, which it sounds like you refer to. The answer to your Var(x) question is no. Perhaps this can instead be done via Model Constraint, perhaps using the Constraint = option in the VARIABLE command.


That works great. I've never used this function before, but it's great. One question. In example 5.20 and the discussion of it, it states that you specify "standardization" in the output command for the Rsquare values. But when I use the model constraint command it states that: "STANDARDIZED (STD, STDY, STDYX) options are not available when specific constraints are used in MODEL CONSTRAINT." One other question. Does using the model constraint command with the New function change the analysis model at all?


When standardized output with Rsquare is not provided, you can compute it yourself in Model Constraint. NEW parameters do not change the analysis model.


Is there a way to save covariance matrices and/or correlation matrices for each replication (in a separate file) in a Monte Carlo run. For example, instead of generating the raw data for each replication it generates the matrix for each replication.  Thanks, 


No. 


Hi, I have a quick question concerning the PROCESSORS command with Monte Carlo studies. I'm currently running a Monte Carlo analysis on different computers using the TYPE=MONTECARLO subcommand under DATA:. I also used the PROCESSORS command in the ANALYSIS section. Now my problem is: while running these analyses on the 32-bit version it actually uses multiple processors. However, when I run the same inputs on the 64-bit version and I check the task manager I can see that the procedures simply skip from processor to processor and the CPU usage always hovers around 25% (on a computer with 4 processors). Am I missing something? I tried it with and without the (STARTS) subcommand but the result is always the same. Thanks in advance for your help!


We have not had this experience. Can you send the input and data along with your license number to support@statmodel.com? We can run it on our 64-bit computer and see if we have the same experience.


Hello Drs. Muthen, I am using Mplus to generate and analyze data for a Monte Carlo study, but I would also like to import said datasets into SAS for additional analyses. Given that I am using NREPS = 10000, I would like to automate this process using a macro. Unfortunately, I have been unable to figure out a way to import the dat files produced by Mplus using a PROC IMPORT statement. I realize that this is primarily a SAS-related issue; however, I was curious if there was any way to modify the format or type of output datafile produced by the Monte Carlo option in Mplus. Thanks


There is no option to change the format of datasets saved via the REPSAVE and SAVE options of the MONTECARLO command. 


Hello, I want to specify R2 for a latent dependent variable in a mc simulation to 0.3. therefore I fix the residual variance to 0.7. correct? By doing this, do I also affect the reliability of the measurement construct of the DV? Can I specify residual variance and variance of the latent dependent variable separately? Thanks Alex 


If the variance of the variable is one, then a residual variance of .7 reflects an Rsquare of .3. You specify a variance or residual variance depending on the parameter that is estimated. For example, in a conditional model, a residual variance is specified. In an unconditional model, a variance is specified. 


Hello Linda, thanks for clarifying. Then I have a follow-up question. My model:

dv on iv;
dv by v1-v3@0.8;
dv@0.7; !Rsquare = 0.3
v1-v3@0.4;

Is the indicator reliability then (0.8**2 * 0.3)/(0.8**2 * 0.3 + 0.4) = 0.32? Thanks in advance, Alex


You need a few more values in MODEL POPULATION:

iv*1;
dv on iv*.55;
dv by v1-v3@0.8;
dv@0.7; !Rsquare = 0.3
v1-v3@0.4;

This results in:

var(dv) = (.55**2)*1 + .7 = 1
var(v) = (.8**2)*1 + .4 = 1.04
reliability(v) = .64/1.04 = .62
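The arithmetic in this reply can be reproduced in a few lines of Python. The variable names are illustrative. The sketch carries the unrounded var(dv) forward, so the intermediate values differ slightly from the rounded ones above.

```python
iv_var = 1.0      # iv*1
b = 0.55          # dv on iv*.55
dv_resid = 0.7    # dv@0.7
loading = 0.8     # dv by v1-v3@0.8
ind_resid = 0.4   # v1-v3@0.4

dv_var = b ** 2 * iv_var + dv_resid          # total variance of dv
dv_r_square = b ** 2 * iv_var / dv_var       # share of dv variance explained by iv
ind_var = loading ** 2 * dv_var + ind_resid  # total variance of each indicator
reliability = loading ** 2 * dv_var / ind_var

print(round(dv_var, 4), round(dv_r_square, 2), round(ind_var, 2), round(reliability, 2))
```

The rounded results agree with the reply: var(dv) is about 1, Rsquare about .30, var(v) about 1.04, and reliability about .62.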


Thanks again. I understand the condition that the variance of the dependent variable has to be 1 in order to specify Rsquare = 1 - residual variance. My path coefficient (dv on iv*.55;) will probably not be 0.55, so I will get a variance(dv) <> 1. How can I then specify Rsquare? Best, Alex


Then you have to figure total variance and variance explained and compute Rsquare from that. 


Hi Linda, so e.g. in the case of 2 IVs I use the formula R2 = (b1**2 * var(IV1) + b2**2 * var(IV2) + 2*b1*b2*cov(IV1,IV2)) / (b1**2 * var(IV1) + b2**2 * var(IV2) + 2*b1*b2*cov(IV1,IV2) + residual(DV)). Is that correct? And to calculate this from a normal Mplus output, do I take the unstandardized values? Then I have another question concerning the input parameters I specify in the Monte Carlo command. Are the parameter values standardized or unstandardized? I think they are unstandardized. Is that correct? Thanks a lot, Alex


Yes. Use the unstandardized values. You should specify population parameter values using unstandardized estimates. 
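For planning population values, the two-predictor formula from the question can be wrapped in a small Python helper. The function name and the example coefficients are hypothetical. All inputs are unstandardized, as advised above.

```python
def r_square_two_predictors(b1, b2, var1, var2, cov12, resid):
    """Rsquare of a DV regressed on two correlated predictors."""
    explained = b1 ** 2 * var1 + b2 ** 2 * var2 + 2 * b1 * b2 * cov12
    return explained / (explained + resid)

r2 = r_square_two_predictors(b1=0.3, b2=0.4, var1=1.0, var2=1.0, cov12=0.2, resid=0.5)
print(round(r2, 3))
```

Changing `resid` while holding the slopes fixed is one way to find the residual variance that produces a target Rsquare.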


hi Linda, I am now conducting the MC Simulations. Is there a way to have an output with the Rsquare results. And if not, is there a special reason they cannot be calculated or something else? Best Alex 


Rsquare is not available for Monte Carlo. There is no particular reason. 


Hi Linda, I conducted the MC analysis and now I am a bit confused about the output. in the parameter specification and the population values the path coefficients are reported under the headline beta. 1)Are those standardized or unstandardized values? 2)Or are they both, depending what I specify as variances in the population and model part of the input. thanks Alex 


We do not standardize them. They depend on the population parameter values that you give. 


Linda, thanks a lot Alex 


Hello, I'm wondering whether Mplus has a way of exporting information on nonpositive definite theta matrices in the save data command. I have successfully saved out the results for each replication but I would like to also identify which ones are not positive definite. Thanks, Ginger 


You can see that if you add TECH9 to the OUTPUT command. 


Thanks Linda; I'm wondering, though, if this information can be included in the data file produced in the save data command, (for example, a binary term indicating whether the theta matrix for each replication was nonpositive definite). 


There is no way to request this if we don't save it with the other results and I don't think we do. 

ywang posted on Thursday, September 16, 2010  12:44 pm



Hello, To calculate the required sample size for grant proposals. Mplus needs specific information such as the mean and variance of slope and intercept of growth in the Monte Carlo study, but where can we get the information from the literature? The related paper usually has only part of the information, but not all information. If we do not have complete information, what can we do? Thanks a lot in advance! 


You need plausible population parameter values for all parameters in the model. If they are not provided by theory, you can estimate a model using your data and use these values as population parameter values.


Hi, This might be a stupid question (sorry for that) but I can't seem to find the answer (maybe due to a cold I am stuck with). When generating data under "Monte Carlo Population", what is the impact on the generated data of using @ (fix) or * (start) when providing population model values? Thank you very much!


There is no difference between @ and * when they are used in MODEL POPULATION. Two places where this is documented are under MODEL POPULATION and in the first example in the Monte Carlo examples chapter.


Hi, I would like to generate a series of data sets using the Monte Carlo facility (let's say 500) and then analyse them via the EXTERNAL facility. Why? Because I want to run different types of models on the same data sets (and to save the time required to generate the data each time). For instance, I want to generate a set of data that fits a CFA model with cross loadings and see how to best recover the population parameters (ESEM, CFA, PVs, etc.). Is there a way to still specify the population parameters somewhere to get the coverage and other Monte Carlo indices? Thanks


The coverage values are taken from the MODEL command for both internal and external Monte Carlo. 


Hi, I would like to generate some simulated data sets to compare conventional regression analysis with SEM. When creating the factor structure underlying simulated data, in the past I have typically just set the variance of the factors equal to 1. As I am interested in comparing unstandardized coefficients in this particular set of simulations, I need to set the metric of the factors another way (otherwise my factors are standardized and so the regression coefficient relating them is also standardized and I am interested in unstandardized coefficients for this particular project). Am I correct that all I need to do as an alternative is set one of the factor loadings for each factor equal to one as in the syntax below?

Model Population:
f1 BY y1@1 y2-y4*.707;
f2 BY y5@1 y6-y8*.707;
f1 on f2*.5;
y1-y8*1;

Or are there any other constraints I need to add to the model? Many thanks!


That's it. 


thanks Bengt! And I have what is probably a very basic followup question. Using the above specification, how do I know what the variances of the factors are? 


Ask for TECH4 in the OUTPUT command. 


many thanks Bengt for the speedy and helpful response (as always)! 


I must be doing something wrong but can't figure out what. Unless I specify the variance of the factors in addition to the factor loadings in my model population statement, I run into a problem such that only a relatively small subset of the replications that I request are actually completed. As long as I also specify the variance of the factors, then the number of replications I request is the number completed. 


You have to give population values for the factor variances even if you don't fix them, so e.g. f1-f2*1;

Erika Wolf posted on Tuesday, December 07, 2010  2:23 pm



Can you provide more detail on using the Monte Carlo approach to generate nonnormal data for power analyses? I have read your 2002 paper on this but I am not clear on how to determine the appropriate population values to achieve the desired level of nonnormality in the generated data. In a message on this string dated 1/7/05, you state that the initial 3 steps in this process can be completed in 1 input file. Can you provide an example of that script? Thanks. 


The approach we used in the paper was using mixture modeling with trial and error. I think we describe this in the paper and the inputs are part of the paper and/or on the website. 

Erika Wolf posted on Monday, December 13, 2010  2:22 pm



I think there is only a script available for the last step (the actual power analysis) in the paper. I'm trying to determine how you arrived at the values that you specified in the script. As I understand, you started with data generation for a 2 class model with .8 factor loadings, indicator error of .36 and factor intercorrelation of .25 for both classes. (a) Is this correct? The paper describes 3 steps before getting to the script provided in the paper. Step 1: Generate data for 10,000 cases with 2 latent classes (which will eventually be analyzed as 1). When you generate nonnormal data by having 2 latent classes, (b) are the only differences that you specify between the 2 classes that the factor 2 mean is set to 15 and the factor 2 variance is set to 5 in Class 1? Are all other parameters (factor loadings, indicator error, factor correlation) the same in the overall and class-specific models? I generated data for 10,000 cases with 2 classes (12% and 88%), each with .8 loadings, .36 indicator error, .25 factor correlation, and class 1 mean and variance of 15 and 5, respectively. But when I pulled the saved data into SPSS, the skewness and kurtosis values for the factor 2 indicators differed from what you report in the 2002 paper. I'm trying to replicate what you did before I move on to running the power analysis I need to run for my paper. Thanks!


The two inputs for generating nonnormal data are shown on the examples page under the headings: CFA model with nonnormal continuous factor indicators without missing data. CFA model with nonnormal continuous factor indicators with missing data. The trick is to generate using two classes and analyze using one class. All of the steps are contained in the inputs. If you run those inputs, you should get the same results as in the paper. 

Yo In'nami posted on Friday, December 17, 2010  6:01 am



Dear Muthen and Muthen, Having perused Muthen and Muthen (2002) and relevant materials using this method (the Mplus user's guide; Thoemmes et al., 2010), I am planning to conduct a post hoc power analysis on a variety of models found in the field of language education. Since both sources illustrate many examples of input commands, I will use these examples as a starting guide. Is it correct to assume that power of any model can be calculated as long as the model can be programmed into Mplus? Or, in other words, is there any model whose power cannot be calculated using Mplus? Yo
Muthen, L. K., & Muthen, B. O. (2002). How to use a Monte Carlo study to decide on sample size and determine power. Structural Equation Modeling, 9, 599-620.
Muthen, L. K., & Muthen, B. O. (2007). Chapter 11: Monte Carlo simulation studies. Mplus user's manual 5th edition.
Thoemmes, F., MacKinnon, D. P., & Reiser, M. R. (2010). Power analysis for complex mediational designs using Monte Carlo methods. Structural Equation Modeling, 17, 510-534. 


You can use this approach on any model that can be specified using the MONTECARLO command. Note that the power is for one parameter of the model not the entire model. 

Yo In'nami posted on Friday, December 17, 2010  9:13 am



Linda, Thank you very much for a very useful comment! I have been considering this question for the past several months. Now I can go ahead with my work. Yo 

Yo In'nami posted on Wednesday, April 13, 2011  7:19 am



Muthen and Muthen (2002) explain that the Monte Carlo output labeled "% Sig Coeff" refers to power: it shows, for each parameter, the proportion of replications for which the null hypothesis that the parameter is equal to zero is rejected at the .05 level. I have conducted several post hoc power analyses on published models by specifying the MODEL POPULATION and the MODEL commands to be identical. If parameter estimates in published models are reported to be statistically significant and these estimates are specified in both the MODEL POPULATION and the MODEL commands, will the "% Sig Coeff" of these parameter estimates always be over .80? If so, conducting a post hoc (not a priori) power analysis just seems to be establishing/reconfirming what is already apparent: that there was sufficient power to detect statistical significance of parameters. In other words, is it correct to say that there is no need to conduct "post hoc" power analyses if parameters of interest have already been reported to be statistically significant? Yo
Muthen, L. K., & Muthen, B. O. (2002). How to use a Monte Carlo study to decide on sample size and determine power. Structural Equation Modeling, 9, 599-620.
Muthen, L. K., & Muthen, B. O. (2007). Chapter 11: Monte Carlo simulation studies. Mplus user's manual 5th edition. 


I would think this type of power analysis is usually done in the planning of a study to determine the necessary sample size or perhaps after to see if nonsignificance is due to lack of power. However, even if you find significance, you don't know with what power you find it. Perhaps your power was .3. You may have been lucky to find significance in your sample but may not be so lucky in another sample of the same size. 
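The point about "being lucky" can be illustrated with a small sketch. It is not Mplus code: it assumes the estimate in each replication is normally distributed around the true value with a known standard error, and simply counts how often a two-sided test at the .05 level rejects, which is the idea behind the "% Sig Coeff" column. The function name and the numbers 0.15, 0.30 and 0.10 are made up for illustration.

```python
import random

def proportion_significant(true_value, std_error, nreps=10000, seed=1):
    """Share of simulated replications in which |estimate / SE| > 1.96.

    Assumption: estimates are drawn from N(true_value, std_error) and the
    standard error is treated as known in every replication.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(nreps):
        estimate = rng.gauss(true_value, std_error)
        if abs(estimate / std_error) > 1.96:
            hits += 1
    return hits / nreps

# A true effect only 1.5 standard errors from zero is detected in
# roughly a third of the samples; one that is 3 standard errors away
# is detected in most of them.
print(proportion_significant(0.15, 0.10))
print(proportion_significant(0.30, 0.10))
```

A single significant result is compatible with either scenario, which is why finding significance does not tell you what the power was.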


Hi, I am running three sets of Monte Carlo simulation studies for my dissertation: one for my measurement model, one for a follow-up structural portion and another one for a parallel growth curve process. I could make the CFA run but I do not see the commands for a parallel growth curve process in the Mplus guide. Any advice on how I can do this will be appreciated. Thanks. 


See UG ex 6.13. 


Hi, we're currently preparing a rather large simulation study and are desperately looking for ways to speed the process up. We had the idea that, while we want to have chi-square, SRMR, and RMSEA, we're not really interested in CFI/TLI. Is there any way to switch off the baseline model estimation while keeping the H1 model estimation? Thank you so much in advance! Martin 


There is no way to do this. If you send your input and license number to support@statmodel.com, we will see if we can make other suggestions to speed things up. 

Yo In'nami posted on Saturday, May 21, 2011  2:33 am



I am using a Monte Carlo power analysis and received an error message that the model is unidentified although it is identified with other SEM programs. The degrees of freedom are 17 in Mplus but 19 in other programs. I have been unsuccessful rectifying the syntax. I am grateful for your generous help!
MONTECARLO:
NAMES ARE X1-X8;
NOBSERVATIONS = 259; ! SAMPLE SIZE OF INTEREST
NREPS = 10000;
SEED = 53567;
MODEL POPULATION:
IAVLE BY X1*.89 X2*.69 X3*.75;
SRC BY X4*.81 X5*.83 X6*.71 X7*.79 X8*.43;
SRC ON IAVLE*.67;
X1@1 X4@1;
X1*.21; X2*.52; X3*.44; X4*.34; X5*.31; X6*.50; X7*.38; X8*.82;
SRC*.55;
MODEL: (Note. Exactly the same as the Model Population above and thus omitted to shorten the message);
ANALYSIS: ESTIMATOR = ML;
OUTPUT: TECH9; 


Please send the full output including TECH1 and your license number to support@statmodel.com. 


You have freed all factor loadings but forgot to fix the factor variances to one. 

mpduser1 posted on Thursday, June 23, 2011  4:12 pm



I've been running some Monte Carlo power analyses in Mplus 6.11 and I'm wondering if the results make sense. My input file is:
MONTECARLO:
NAMES ARE Y T;
NOBSERVATIONS = 280;
NREPS = 1000;
SEED = 4533;
GENERATE = Y (n 2);
NOMINAL ARE Y;
CUTPOINTS = T(0);
MODEL POPULATION:
T*.50 ;
[T*.25] ;
Y#1 on T*.3;
Y#2 on T*.3;
[Y#1*.21];
[Y#2*1.41];
ANALYSIS:
TYPE = GENERAL;
ESTIMATOR = ML;
MODEL:
Y#1 on T;
Y#2 on T;
<results> 


The input looks fine. 

mpduser1 posted on Friday, June 24, 2011  2:04 pm



Okay, I guess I was thrown by the fact that Mplus doesn't report the "population" parameter values, such that the "% Sig. Coeff" column provides the parameter-specific power estimates, unless MODEL COVERAGE is specified. So, to get the estimated power figures directly, I have to specify my population model twice, once in MODEL POPULATION and once again in MODEL COVERAGE. Is this correct? 


The population parameter values for coverage should be in the MODEL command not MODEL POPULATION. See Example 12.1. 

Anonymous posted on Wednesday, June 29, 2011  11:04 am



BOOTSTRAP is not allowed with MONTECARLO. Is there any way to obtain the bootstrap standard errors and confidence interval for parameter estimates in a monte carlo study by using Mplus? 


Not unless you analyze each data set separately after data generation. 


Hi, I just ran a fairly simple path model and tried to save the estimates to use for a subsequent MC simulation. I am getting the following error: "Saving of ending values for the ESTIMATES option is not available for models with covariates. Request for ESTIMATES is ignored." Thanks, Leslie 


Please send the full output and your license number to support@statmodel.com. 

Jak posted on Monday, October 10, 2011  8:18 am



Hello, In a montecarlo analysis, 1 of the 500 replications was not completed. I saved the results to file, which has the results from the 499 completed replications. For each replication, I want to compare the fit of this model with the fit of another model, for which I have results for all 500 replications. How do I find out which replication is missing in the first file? Thanks in advance, Suzanne 


Ask for TECH9 in the OUTPUT command and you will see which replication had the problem. 

elementary posted on Friday, October 28, 2011  2:48 am



Hello, I want to run a power analysis using the Monte Carlo approach for a planned survey using a two-stage sampling procedure. As my hypotheses are on level 1 only, I do not plan multilevel analyses, but rather want to run a "normal" SEM, adjusting standard errors with cluster/type = complex. I tried to model this using the Mplus montecarlo option as described in the CFA textbook of Brown (2006, p. 420ff). (To get started, at the moment I ignore the mean structure and the multiple group design, which have to be added at a later stage.) When assuming that data are not nested, everything worked. However, I did not manage to model the clustering as well. Type = complex seems not to be available within montecarlo. And using NCSIZES and CSIZES is obviously available only with type=twolevel, which does not seem to be applicable in this case. Thus, I thought of finally correcting the sample size I get under the assumption of a random sample, using a correction formula for effective sample size (cf. the multilevel textbook of Hox, 2002, p. 5). Is this reasonable and are there alternatives to that? Thanks! 


You need to generate the data in one step using TWOLEVEL and analyze it using external Monte Carlo in a second step. This is described in Example 12.6. 
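For reference, the effective-sample-size correction mentioned in the question is usually written as n_eff = n / (1 + (m - 1) * ICC), where m is the common cluster size and ICC is the intraclass correlation. The sketch below only illustrates that arithmetic; the function name and the example numbers are hypothetical, and it is a rough approximation rather than a replacement for the two-step approach in the reply.

```python
def effective_sample_size(n_total, cluster_size, icc):
    """Approximate effective sample size under equal cluster sizes.

    design effect = 1 + (cluster_size - 1) * icc
    """
    design_effect = 1.0 + (cluster_size - 1.0) * icc
    return n_total / design_effect

# Hypothetical survey: 1000 respondents in clusters of 20, ICC = 0.05.
print(round(effective_sample_size(1000, 20, 0.05), 2))
```

The formula assumes equal cluster sizes and a single ICC for all variables, which is one reason generating clustered data and analyzing it in a second step is the more direct check.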

Eric Teman posted on Tuesday, January 24, 2012  1:12 pm



When running a Monte Carlo simulation, is there a way to monitor or output convergence failure information? 


Ask for TECH9 in the OUTPUT command. 

elementary posted on Wednesday, January 25, 2012  6:20 am



Thanks for your quick reply to our posting from October 28 regarding (external) montecarlo! We first tried to understand the examples (which took us some time). For this purpose, we reran example 9.12 and subsequently tried to identify the numbers in the output that have been used as input for example 12.6. Unfortunately, we often were not able to identify them. For example, in the unstandardized output for example 9.12, the residual variance for sw was 0.473 and for sb it was 0.214, while in example 12.6 input, it was entered as 0.2 for sw and 0.1 for sb . These differences generalize to other parameters. Do you have any suggestions on that? That might help us to locate possible further errors we made in subsequent steps of our analyses. Thanks! 


Please send an output that shows your question along with your license number to support@statmodel.com. Specify exactly where in the output the numbers you refer to can be found. 

Lois Downey posted on Thursday, February 09, 2012  9:26 pm



Chapter 12 of the User's Manual includes a discussion of Monte Carlo output related to the chi-square test of model fit (pp. 362-3). In the example, the critical value of chi-square at the 0.05 level was exceeded in 0.058 of the replications. The discussion indicates that 0.058 is close to the expected value of 0.05, thus indicating that the chi-square distribution is well approximated. I'm not clear on how much greater than .05 the value can be, and still be acceptable. What would you use as an upper limit before concluding that the chi-square distribution is NOT well approximated? Thanks. 


I would say less than .10 but it is really your choice as to how precise you want to be. 
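One way to make "how precise you want to be" concrete is to look at the sampling error of the observed rejection rate itself. The sketch below assumes independent replications and uses the normal approximation to the binomial; the number of replications (500) is a hypothetical value, not one taken from the User's Guide example.

```python
import math

def rejection_rate_interval(nominal=0.05, nreps=500, z=1.96):
    """Range in which the observed rejection rate should fall about 95%
    of the time if the true rejection rate equals the nominal level."""
    se = math.sqrt(nominal * (1.0 - nominal) / nreps)
    return nominal - z * se, nominal + z * se

lower, upper = rejection_rate_interval(nominal=0.05, nreps=500)
print(round(lower, 3), round(upper, 3))
print(lower < 0.058 < upper)  # is 0.058 within simulation noise?
print(lower < 0.10 < upper)   # is 0.10 within simulation noise?
```

With more replications the interval narrows, so the same observed value can be acceptable in a small simulation and a sign of poor approximation in a large one.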


Dear Mplus, We are studying DIF under small sample conditions to determine the degree of bias in parameter and SE values. We are using a MIMIC model for DIF detection. In the simulation we have varied the sample size (400 to 1000) and expected the bias values to decrease as sample size increased, which was not the case. Consequently, we are concerned that we may have misspecified our simulation model. We have split the population as being either high=2 or low=1 motivation (r_att_1). The factor Improvement (Improv) has 6 items, each of which has ordered categorical responses (6 options, with 5 thresholds). We appreciate your comments as to whether the parameter values (esp. the residual variances) have been specified correctly.
MODEL POPULATION:
[r_att_1 @1.5]; ! mean of the IV
r_att_1 @.25 ; ! variance of the IV
Improv BY i1@1 i2-i6*.6;
Improv ON r_att_1*.8;
i1 ON r_att_1*.24; ! i6 not included
i2 ON r_att_1*.24;
i3 ON r_att_1*.24;
i4 ON r_att_1*.24;
i5 ON r_att_1*.24;
i1-i6 *.76;
Improv*.6;
The threshold values for all items are:
! thresholds
[i1$1*.5]; [i1$2*0]; [i1$3*.5]; [i1$4*1]; [i1$5*1.5]; 


If you use the WLSMV estimator you have to make sure that the population values of the variances of the DVs conditional on the IVs are 1 in order to get the parameters in the metric that WLSMV uses. Your DVs are the i variables and your IV is r_att. If this doesn't help, send complete output to Support. 


Hi Bengt. We checked the documentation on how to calculate the variance for the i variables and found this: residual variance = 1 - R2. Since each i variable has 2 predictors (r_att_1 and Improv), we have squared the beta values and then summed them before subtracting from 1 to get the amount for the error variance. In the example provided, we have beta paths of .6 (Improv) and .24 (r_att_1), so the R2 = .36 and .0576 respectively. When summed we get R2 = .4176. Subtracting from 1 gives .5824, which would lead to i1-i6*.58; is that right? Thanks for clarifying. 


That would only be correct if Improv and r_att_1 are uncorrelated, but r_att_1 influences Improv. Instead, express the i variables in terms of r_att_1, so Improv is no longer in the picture. That is, using the fact that i is a function of Improv which is a function of r_att_1 and a residual. 


Bengt thanks. I agree that my calculation depends on the assumption of independence which is not the case in our model. However, your explanation still leaves me uncertain as to how to calculate an appropriate value for the i variable residual. Should we adjust the Improv beta weight (.6) by the beta weight from r_att_1 to Improv (.8) before determining residual variance? This is what I understand your comment means: (square(.6*.8))=.23 as the new effect of Improv and add it to .0576 the effect of r_att_1. This would give us a residual of .7124. This seems awfully high to me as a residual but if it's right then we go with it. Please advise if I've understood correctly. 


You have in general notation
(1) y = lam*f + gam*x + e
(2) f = beta*x + d
You insert (2) into (1) and get the "reduced-form" expression
y = lam*beta*x + gam*x + lam*d + e = (lam*beta + gam)*x + lam*d + e
from which you get the variance of y easily in the usual way because all 3 terms are uncorrelated (x, d, and e are uncorrelated). 


I should add that the variance you want to be one is the conditional variance V(y | x) = V(lam*d + e) = lam*lam*V(d) + V(e). 


Hi Bengt, Please check that we have understood this correctly from your instructions. The formula for e is e = 1 - (lam*beta + gamma)*x - (lam*d), where
lam = regression from f to i var = .60
beta = regression from x to Improv = .80
gamma = regression from x to i var = .24
x = variance of x = .25
d = residual of f = .36
Thus, e = 1 - (.6*.8 + .24)*.25 - (.60*.36) = .604. Please advise if we got this right. Thanks 


It is not e that you want, but its variance V(e): V(e) = V(y | x) - lam*lam*V(d) = 1 - lam*lam*V(d) = 1 - 0.60*0.60*0.6 = 0.784. 
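The arithmetic in the reply above can be checked with a few lines. This is only a numerical restatement of the reduced-form expressions from this exchange, using the values posted earlier (lam = .6, beta = .8, gam = .24, V(x) = .25, V(d) = .6); the function and variable names are made up for illustration.

```python
def residual_variance_for_unit_conditional(lam, var_d, target=1.0):
    """V(e) such that V(y | x) = lam^2 * V(d) + V(e) equals the target."""
    return target - lam ** 2 * var_d

lam, beta, gam = 0.60, 0.80, 0.24
var_x, var_d = 0.25, 0.60

var_e = residual_variance_for_unit_conditional(lam, var_d)
conditional_var = lam ** 2 * var_d + var_e            # V(y | x)
total_var = (lam * beta + gam) ** 2 * var_x + conditional_var  # V(y)

print(round(var_e, 4))            # residual variance to put in MODEL POPULATION
print(round(conditional_var, 4))  # should be 1 for the WLSMV metric
print(round(total_var, 4))        # unconditional variance of y
```

Note that the unconditional variance of y is larger than one; it is the variance conditional on x that is set to one.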


Dear Dr. Muthen, I have run a complex two-wave model (df=800, N = 595) with a second-order factor. The ratio of the sample size to estimated parameters is about 5. This is not consistent with the rules of thumb given by Kline (1998). Would it be correct to use a Monte Carlo analysis, such as example 12.7, to prove that the estimates are "correct"? Thanks, Christoph Weber 


Yes, you can use a Monte Carlo study to do this. 

burak aydin posted on Wednesday, September 19, 2012  5:10 pm



Hello, I generate the data externally. I use monte carlo facility to analyze them. I save both results and the output. I can see which iterations did not run, but I would like to automate this process. I would like to see an iteration indicator on the results file or a more efficient way to see failed iterations rather than visual check (ctrl+f "did not"). Is this possible with Mplus? or do you know any other way? Thank you very much. 


The results file includes a replication number. If you ask for TECH9, you will see error messages related to each replication. 

burak aydin posted on Thursday, September 20, 2012  12:59 pm



Hello Dr. Linda, We use version 6.11; the results file does not include a rep number. This is what it saves in our case:
RESULTS SAVING INFORMATION
Order of data
Parameter estimates (saved in order shown in Technical 1 output)
Standard errors (saved in order shown in Technical 1 output)
Chi-square : Value
Chi-square : Degrees of Freedom
Chi-square : P-Value
CFI
TLI
H0 Loglikelihood
H0 Scaling Correction Factor for MLR
H1 Loglikelihood
H1 Scaling Correction Factor for MLR
Number of Free Parameters
Akaike (AIC)
Bayesian (BIC)
Sample-Size Adjusted BIC
RMSEA : Estimate
SRMR : Between level
SRMR : Within level
Condition Number
And yes, I ask for tech9 and find which iteration did not terminate normally. I do it by ctrl+f "did not". What I need to accomplish is having j rows in the result file if I run j iterations. If iteration number i did not run, I want to insert 99s in row i. Thank you 


Try Version 6.12 which is the latest version. I am 99% sure there is a replication number saved. 
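Once a replication number is available for each saved record, the padding described in the question can be done outside Mplus. The sketch below assumes the results have already been read into a list of rows whose first entry is the replication number; that layout, the function name and the fill value are assumptions for illustration, not the actual format of the Mplus results file.

```python
def pad_results(rows, n_reps, n_cols, fill=99.0):
    """Return one row per replication 1..n_reps.

    rows: list of lists, first element is the replication number
          (assumed layout). Missing replications get a row of fill values.
    """
    by_rep = {int(row[0]): list(row) for row in rows}
    padded = []
    for rep in range(1, n_reps + 1):
        if rep in by_rep:
            padded.append(by_rep[rep])
        else:
            padded.append([rep] + [fill] * (n_cols - 1))
    return padded

# Hypothetical results: replication 2 of 4 did not converge.
completed = [[1, 0.41, 0.10], [3, 0.38, 0.11], [4, 0.44, 0.09]]
for row in pad_results(completed, n_reps=4, n_cols=3):
    print(row)
```

Keeping the replication number in the padded rows makes it straightforward to line the file up against results from another model.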


Dear Dr. Muthen, I am using the Monte Carlo facility in Mplus just to generate data but I am not fitting any model to the data. However, I get the following warning for all replications when I have a sample size of 100: THE STANDARD ERRORS OF THE MODEL PARAMETER ESTIMATES MAY NOT BE TRUSTWORTHY FOR SOME PARAMETERS DUE TO A NON-POSITIVE DEFINITE FIRST-ORDER DERIVATIVE PRODUCT MATRIX. Since I am not fitting any model to the data I am not sure how to interpret the warning. I wonder if it is a default warning when the number of parameters is larger than the sample size. Thank you. 


Please send the output and your license number to support@statmodel.com. 

Ray Cheung posted on Tuesday, January 08, 2013  11:16 pm



Hi, I generate 20 indicators using montecarlo but I want to use only the first 18 for subsequent analysis. Is there a way for me to ask Mplus to ignore the last 2 indicators? Thanks! 


You can use external Monte Carlo where you generate the data in the first step and analyze it in the second step. See Example 12.6. 

Ray Cheung posted on Thursday, January 10, 2013  12:00 am



Thank you very much. In addition, I understand that TYPE=MONTECARLO and SAVEDATA: SAVE=FScore together. I would like to ask if Mplus can save factor scores in each dataset analyzed. Thank you. 


I don't think you can do this. Try adding it to the input to be sure. 


Is it possible to obtain the average covariance matrix or correlation matrix for all replications as opposed to just the first replication? Thanks, 


No, this is not an option at this time. 

Ray Cheung posted on Thursday, January 10, 2013  5:43 pm



Hi Linda, When I use TYPE=montecarlo and request tech9, there are condition codes in some of the replications. Is there a way to ask Mplus to report the average results ignoring those replications with an error code? Thank you 


The averages are over all replications that converge. There is no way to change this. 


Hello Drs. Muthen, I am interested in saving the means and standard deviations for the raw data in each replication of a multigroup Monte Carlo analysis with 10 variables. I would like to be able to produce a matrix of this information for use in an external program. Is this currently possible at this time? Thank you. 


This is not possible. You can save the data from each replication but not the means and standard deviations. 

Geumju LEE posted on Sunday, March 03, 2013  9:02 pm



Hello. I'm trying to simulate an unconstrained latent interaction model by Monte Carlo. First I generated the 1000 data sets of y1-y3, x1-x3, z1-z6. The population values of the main effects of X and Z are both set to 0.4, the population correlation between X and Z is set to 0.3, and the population interaction effect is set to 0.2. Then I tried to analyze the 1000 data sets I generated. This is part of the output.
MODEL RESULTS
                      ESTIMATES              S. E.
         Population   Average   Std. Dev.   Average
X BY
  X1        1.000     1.0166     0.0685     0.0668
  X2        1.000     0.8779     0.0616     0.0612
  X3        1.000     0.8275     0.0594     0.0591
. . . . . . .
Y ON
  X         0.000     0.4077     0.1011     0.0951
  Z         0.000     0.4077     0.0973     0.0940
  XZ        0.000     0.2005     0.0955     0.0895
X WITH
  Z         0.000     0.2960     0.0734     0.0711
I don't understand what the column labeled 'POPULATION' means. I didn't set either 1.000 or 0.000. How did I get these values? Please explain the meaning of these. Thanks in advance. 


The population values are taken from the MODEL command. They are used to compute coverage. 
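To see why the values in the Population column matter, here is a small sketch of how a coverage figure is computed: the share of replications whose 95% confidence interval contains the population value. It is a plain illustration of the definition, not Mplus internals, and the estimates and standard errors are made-up numbers.

```python
def coverage(estimates, std_errors, population_value, z=1.96):
    """Proportion of replications whose interval estimate +/- z*SE
    contains the population value."""
    covered = 0
    for est, se in zip(estimates, std_errors):
        if est - z * se <= population_value <= est + z * se:
            covered += 1
    return covered / len(estimates)

# Hypothetical replications of a path whose generating value is 0.4.
estimates = [0.41, 0.35, 0.62, 0.40]
std_errors = [0.10, 0.10, 0.10, 0.10]

print(coverage(estimates, std_errors, population_value=0.4))
print(coverage(estimates, std_errors, population_value=0.0))
```

If the values in the MODEL command are left at defaults such as 0 or 1 rather than the generating values, the reported coverage is computed against those defaults and will look poor even when estimation is fine.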

Geumju LEE posted on Wednesday, March 06, 2013  6:04 am



Thank you for your reply. However it couldn't answer my question. So I'm writing my question again with more detailed information. Actually, the average values are similar with the values that I set in DATA GENERATION stage. (I conducted monte carlo simulation in separate stages of 'data generation' and 'analysis of unconstrained approach'.) I set from x1 to x3 as 1.02, 0.88, and 0.83 respectively when I generate data, and I got the AVERAGE values of the ANALYSIS output. The other values of indicators that I set in data generation are also similar with the average values. Although I've set the POPULATION values neither 1s nor 0s in Model command, I've got 1s and 0s in unconstrained ANALYSIS output. I don't understand what 'POPULATION' means and how I got these. By any chance, aren't these values just for filling the blanks? Again, Thank you so much. 


Please send the output and your license number to support@statmodel.com. 

John Plake posted on Monday, July 29, 2013  9:39 am



I am running a Monte Carlo simulation for sample size and power (Muthen & Muthen, 2002) on a hypothesized four-factor CFA model with 12 indicators. When I run it according to the description in the literature, it works flawlessly. Parameter and SE bias along with power indicate that N > 44 should work. However, when trying to extend that simulation to include a single second-order factor in place of the first-order correlations, the output gets wonky. Standard error bias is off the charts, even with N = 2,000. I'm sure I'm misspecifying something, but I can't find any sample syntax for a second-order CFA in a Monte Carlo simulation. The specific problem I'm having is in the theta matrix. Here is the syntax I'm using...
MODEL POPULATION:
CPERF by CP_P1-CP_P3*0.8;
PARTNER BY PA_P1-PA_P3*0.8;
TPERF BY TP_P1-TP_P3*0.8;
TWORK BY TW_P1-TW_P3*0.8;
MJP BY CPERF-TWORK*.6;
MJP@1;
CP_P1-TW_P3*.36;
CPERF-TWORK*.64;
MODEL:
CPERF by CP_P1-CP_P3*0.8;
PARTNER BY PA_P1-PA_P3*0.8;
TPERF BY TP_P1-TP_P3*0.8;
TWORK BY TW_P1-TW_P3*0.8;
MJP BY CPERF-TWORK*.6;
MJP@1;
CP_P1-TW_P3*.36;
CPERF-TWORK*.64; 


It looks like all of the first-order factors have all factor loadings free. In this case, you must fix the factor variances to one. 

John Plake posted on Monday, July 29, 2013  1:52 pm



Thanks, Linda! Somehow I forgot that you can't estimate both loadings and variances at the same time. [facepalm] 

Wang Shan posted on Monday, October 28, 2013  8:59 pm



Hi Linda, I am doing a CFA Monte Carlo simulation study, and I have a question. As we know, in the model population part, the values such as factor loadings are fixed to true values. But I'm not sure whether, in the model part, I should give true values as the starting values for analysis or just use ordinary values such as 1, 1 and so on. If we use true values as the starting values, will it influence the estimated results? Here is my syntax; I'm not sure which one to choose.
(1) Use ordinary values as starting values
MODEL POPULATION:
......
Trait1 BY y1@0.397 y2@0.559;
......
y1@.58 y2@.497;
......
MODEL:
Trait1 BY y1*1 y2*1;
......
y1@1; y2@1;
......
(2) Use true values
MODEL POPULATION:
......
Trait1 BY y1@0.397 y2@0.559;
......
y1@.58 y2@.497 ......;
MODEL:
Trait1 BY y1*0.397 y2*0.559;
......
y1@.58 y2@.497 ......; 


In the MODEL command you should give the population values. The values given in the MODEL command are the values that are used for coverage. See Example 12.1 where the MODEL POPULATION and MODEL command are described. 

Wang Shan posted on Tuesday, October 29, 2013  8:40 pm



I have checked the user's guide. And find that I misunderstood the MODEL command before. Thank you so much for your reply! 


Hello, I am running a Monte Carlo simulation for a growth model with 10 time points (y1-y10) and a single dichotomous predictor (tx). I am attempting to use the MODEL MISSING command to generate a steady increase in missing data that ends in roughly 20% missing data by the final time point. However, in the section of the output that lists the summary of missing data for the first replication, a few of the missing data patterns appear to show close to 90% or 100% missing across all time points. Is this right, or have I completely misunderstood how to go about coding for missingness? Thanks for your help!
MODEL MISSING:
[y1-y10@15];
y1 on tx@0;
y2 on tx*10.405;
y3 on tx*11.108;
y4 on tx*11.523;
y5 on tx*11.821;
y6 on tx*12.248;
y7 on tx*12.5576;
y8 on tx*13.0075;
y9 on tx*13.3417;
y10 on tx*13.613; 


Perhaps your tx slopes are too high. And are you sure you used CUTPOINTS on tx? Try it out for one replication with a huge sample like 10,000. 
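MODEL MISSING specifies a logistic regression for the probability of a value being missing, so the implied missingness rates can be checked by hand before running the simulation. The sketch below assumes the intercept is -15 on the logit scale (an assumption: with a positive intercept the probability would be essentially 1 everywhere) and uses the y2 and y10 slopes from the post; the function name is made up for illustration.

```python
import math

def missing_probability(intercept, slope, tx):
    """Probability of missingness implied by a logistic regression."""
    logit = intercept + slope * tx
    return 1.0 / (1.0 + math.exp(-logit))

intercept = -15.0  # assumed sign; see the note above

# tx coded 0/1 (dichotomized via CUTPOINTS)
print(round(missing_probability(intercept, 10.405, 1), 3))  # y2, tx = 1
print(round(missing_probability(intercept, 13.613, 1), 3))  # y10, tx = 1
print(missing_probability(intercept, 13.613, 0))            # y10, tx = 0

# If tx is left continuous, values above 1 push the probability toward 1.
print(round(missing_probability(intercept, 13.613, 2.0), 3))
```

Under these assumptions the slopes give the intended rates only when tx takes the value 1, which is consistent with the suggestion to check whether CUTPOINTS was applied to tx.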

Tracy Zhao posted on Thursday, January 23, 2014  3:04 pm



Hi, I am trying to learn how to do Monte Carlo studies using Mplus. I read the user guide, version 7, and have a question. In Chapter 12, page 418, you have the example:
MODEL POPULATION:
[x1-x2@0];
x1-x2@1;
f BY y1@1 y2-y4*1;
f*.5;
y1-y4*.5;
f ON x1*1 x2*.3;
MODEL:
f BY y1@1 y2-y4*1;
f*.5;
y1-y4*.5;
f ON x1*1 x2*.3;
I wanna know why "f BY y1@1 y2-y4*1;" appears in both the MODEL POPULATION and MODEL commands. What would be the difference if I write it as "f BY y1-y4*1;" in both commands? Thanks! 


Either one factor loading or the factor variance must be fixed at one to set the metric of the factor. 

Tracy Zhao posted on Thursday, January 23, 2014  6:23 pm



Oh, so two questions follow: 1. If I write it as "f BY y1*1;", it is possible that 1 is just a starting value that can change to any other number in the modeling process, right? But if I write it as "@1" then it will be fixed to 1 instead of other value? Is my understanding correct? 2. Why do I need starting value in MODEL POPULATION? Am I not just specifying the model parameters? Under what circumstance would I want the model parameters I specified to change to other numbers? Perhaps you can recommend some readings if this is too hard to explain here. Thanks again! 


In MODEL POPULATION there is no difference between @ and *. In MODEL there is. * designates a starting value for a free parameter. @ fixes a parameter to the value that follows. MODEL POPULATION gives the population parameter values for data generation. See Example 12.1. All of the commands and options are explained. See also the MONTECARLO command in the user's guide. 

Tracy Zhao posted on Friday, January 24, 2014  10:22 am



I see. Thanks a lot Dr. Muthen! 

Tracy Zhao posted on Tuesday, January 28, 2014  9:11 am



Hi, I don't know where this question should go, so I am just going to ask it here: if I am batch running Mplus for a large simulation study, can I still run Mplus for a different project (using the editor and run from the editor)? Thanks! 


I would not recommend this. 

Jan posted on Sunday, June 29, 2014  12:26 pm



In a simple two-level model with WLSMV estimation I obtain the message: *** WARNING in SAVEDATA command Saving of ending values for the ESTIMATES option is not available for models with covariates. Request for ESTIMATES is ignored. How to solve this problem? 

Jan posted on Sunday, June 29, 2014  12:28 pm



I would like to save these estimates for a Monte Carlo study. When I enter values 'manually' to the Monte Carlo input, I obtain the message: *** FATAL ERROR A POPULATION VARIANCE FOR A COVARIATE IS ZERO. However, the two-level regression model works fine. I would appreciate any suggestion. 


Use the SVALUES option of the OUTPUT command. You will receive input with the final estimates as starting values. A model is estimated conditioned on the covariates. Their means, variances, and covariances are not model parameters. However, you need to specify them in MODEL POPULATION to generate data. Do a TYPE=TWOLEVEL BASIC with no MODEL command to find these values. 

Jan posted on Monday, June 30, 2014  6:45 am



Thank you very much Linda! 

Jan posted on Monday, June 30, 2014  7:21 am



Linda, I entered variances/means in MODEL POPULATION in accordance with your suggestion, but how do I enter the covariances if 2 variables belong to the within level and one to the between level? I get this error message:
*** ERROR in MODEL POPULATION command
Between-level variables cannot be used on the within level. Between-level variable used: g
*** ERROR in MODEL POPULATION command
Between-level variables cannot be used on the within level. Between-level variable used: g
*** ERROR
The following MODEL POPULATION statements are ignored:
* Statements in the WITHIN level:
STIMTYPL WITH g
ORD WITH g 


Put the covariance between the 2 within variables in the within part of the model. The between part of the model will have only a mean and variance for the between-level covariate. 

Jan posted on Monday, June 30, 2014  9:51 am



Perfect, thanks very much. 

mpduser1 posted on Wednesday, August 13, 2014  3:16 pm



I am trying to do a power analysis for a negative binomial regression model in Mplus. I am using the following syntax:
MONTECARLO:
NAMES ARE y x;
NOBSERVATIONS = 275;
NREPS = 10000;
CUTPOINTS = x(0);
COUNT = y (nb);
MODEL POPULATION:
[y@1]; y@2;
[x@0]; x@1;
y on x@.35;
MODEL:
y on x*.35;
The error message I get is:
*** ERROR in MONTECARLO command
A COUNT(nb) variable in the analysis must be generated as a negative binomial variable. Variable cannot be analyzed as COUNT(nb): Y
Specifying the mean and dispersion of y in the MODEL POPULATION command does not work either. Is this type of power analysis not possible in Mplus? 


Your COUNT statement specifies how to analyze Y. You need to specify how to generate Y, adding GENERATE = u1(nb); This is like in the second part of UG ex 3.8. Note that all the UG examples have Monte Carlo versions which are posted on our website under Mplus User's Guide Examples. 

mpduser1 posted on Wednesday, August 13, 2014  7:03 pm



That's very helpful. Thank you. 

mpduser1 posted on Thursday, August 14, 2014  9:11 am



Professor Muthen, Via the error message I noted above, am I correct in understanding that I cannot introduce a model misspecification where I specify a negative binomial for a POPULATION model, but a normally distributed Y for the analytic model? 


You can do that using Mplus Monte Carlo in two steps: First generate the data by one model; then analyze the data by another (called "external Monte Carlo"). See the User's Guide chapter 12 for examples of 2-step Monte Carlo approaches. 

Yueqi Yan posted on Monday, August 25, 2014  5:35 pm



Hi Dr. Muthen, I am doing a Monte Carlo simulation on the efficiency of planned missing data designs. In the attached syntax, there are 84 missing data patterns, and I got an error message as follows. *** ERROR in MONTECARLO command The number of sets of PATMISS variables does not match the number of patterns in PATPROBS. I tried to reduce the number of patterns, and found that when there are 64 patterns or fewer there seems to be no problem, but when there are more than 64, the error message comes back. I was wondering if Mplus has a limit on the number of missing data patterns. Or is there something wrong with my code? In my study I also have a design that needs 2*64 patterns. 


Are you using Version 7.2? I think the limit has been increased in that version. 


Hi Dr. Muthen, I would like your help simulating data for multiple-group nonlinear SEMs with the following model.
NAMES = Y1-Y10;
ANALYSIS: TYPE = RANDOM; ALGORITHM = INTEGRATION;
ANALYSIS: ESTIMATOR = ML;
MODEL:
x1 BY Y1 Y2 Y3 Y4;
x2 BY Y5 Y6;
x3 BY Y7 Y8;
x4 BY Y9 Y10;
x4 on x1 x2 x3;
x1 with x2;
x2 with x3;
Xx1x2 | x1 XWITH x2;
Xx1x1 | x1 XWITH x1;
Xx2x2 | x2 XWITH x2;
x4 ON Xx1x2;
x4 ON Xx1x1;
x4 ON Xx2x2;
where all values of lambda are 0.8, all values of gamma are 0.6, and all values of mu for y1-y10 are 0. I hope you can help me with this because I am new to Mplus. I appreciate your help and your time. 


All examples come with a Monte Carlo counterpart. See mcex5.13.inp as a starting point. 


Hi Dr. Muthen, I am trying to write and run the code below: montecarlo: names = y1-y10; generate y1-y10(1); categorical = y1-y10; ngroups = 2; nobs = 200 200; nreps = 100; SEED = 53487; save = mplus.dat; analysis: type = random; model population: g1 x1 by y1@1 y2-y4*0.8; x2 by y5@1 y6*0.8; x3 by y7@1 y8*0.8; x4 by y9@1 y10*0.8; x4 ON x1*0.6; x4 ON x2*0.6; x4 ON x3*0.6; y1-y10*0.0; X x1x2 | x1 XWITH x2; X x1x1 | x1 XWITH x1; X x2x2 | x2 XWITH x2; x4 ON Xx1x2*0.6; x4 ON Xx1x1*0.6; x4 ON Xx2x2*0.6; model population-g2: model: g2 x1 by y1@1 y2-y4*0.8; x2 by y5@1 y6*0.8; x3 by y7@1 y8*0.8; x4 by y9@1 y10*0.8; x4 ON x1*0.6; x4 ON x2*0.6; x4 ON x3*0.6; y1-y10*0.0; X x1x2 | x1 XWITH x2; X x1x1 | x1 XWITH x1; X x2x2 | x2 XWITH x2; x4 ON Xx1x2*0.6; x4 ON Xx1x1*0.6; x4 ON Xx2x2*0.6; output: tech9; Many thanks in advance. 


You didn't say what the problem was. 


Hi Dr. Muthen, I changed the previous Mplus code to this code to run a Monte Carlo simulation. Is that correct? When I ran this code I got this error: *** ERROR in ANALYSIS command ALGORITHM=INTEGRATION is not available for multiple group analysis. Try using the KNOWNCLASS option for TYPE=MIXTURE. I appreciate your help and your time. Many thanks. 


Your model has categorical outcomes and continuous factors and XWITH. This means that ML needs to be used with numerical integration. Multiple-group analysis in this case is done with Type=Mixture and Knownclass; see the examples in the User's Guide for how to do this. 


Thank you so much for your help. Actually, I want to simulate data without applying any estimator. After adding type = mixture and knownclass I still get an error: *** ERROR in MONTECARLO command Unknown option: KNOWNCLASS **************************************** title: this is an example of a multiple group nonlinear SEM with categorical variables montecarlo: names = y1-y10 group; generate y1-y10(1); categorical = y1-y10; ngroups = 2; nobs = 200 200; nreps = 100; SEED = 53487; save = mplus.dat; CLASSES = c(2); KNOWNCLASS = c(group = 12); Many thanks again. 


The Mplus Version 7.1 Language Addendum says: The NGROUPS option of the MONTECARLO command has been extended for use with TYPE=MIXTURE. It is used to specify the number of classes to be used for data generation and in the analysis. The program automatically assigns the label %g#1% to the first class, %g#2% to the second class, etc. These labels are used in the MODEL POPULATION and MODEL commands. So you can say for example: MONTECARLO: NAMES = y1-y10; ngroups = 40; NOBSERVATIONS = 40(500); NREPS = 1; ANALYSIS: TYPE = MIXTURE; ESTIMATOR = ml; MODEL POPULATION: %OVERALL% f1 BY y1-y10*1; [y1-y10*0]; [f1*0]; f1*1; y1-y10*.5; %g#1% f1 BY y1*.7 y2-y10*1 (lam1_1-lam1_10); [y1*.5 y2-y10*0] (nu1_1-nu1_10); [f1*0]; f1*1; etc. 


Thank you for your help, but my program still does not work. Can you help me correct it? 


Please send your output and license number to support@statmodel.com. 

Mika S. posted on Tuesday, November 25, 2014  6:23 am



Hi! I was asked by a reviewer to conduct a post hoc power analysis for bivariate cross-lagged models because some of the cross-lagged coefficients were of rather small magnitude but significant (e.g., STDYX cross-lagged betas were around .08 and p was below .05). My sample size was, in my opinion, in a normal range for such studies (N = 700). My first question is whether a post hoc power analysis really makes sense in this case. My initial thought was that adding 95% CIs to the betas would be a more suitable way "to better understand the small but significant effects" (the core request of the reviewer). Anyway, a first look at post hoc power analysis (EX 12.7) revealed some problems for me and my specific data set. My initial bivariate cross-lagged model was based on complex/cluster analysis (data are clustered in schools). I do not want to conduct multilevel analysis because school effects are not the main aim of my study. My second question thus is: Is there any way to conduct the two-step EX 12.7 with the original complex/cluster analysis (since Monte Carlo, at a first try, does not accept "complex/cluster" in step 2), and how should I do this? Third question: Is there any way to use a subset of variables in EX 12.7 for analyses, or do I have to "cut" the data sets for these analyses in SPSS? "Usev" does not seem to work. Many thanks! 


You would have to generate the data using a Type=twolevel model in step 1 to get the complex survey features, then analyze in step 2 using Type=Complex. So that doesn't sound desirable, since step 1 would then go beyond your initial model (which is not two-level). If the Type=Complex SEs in your real-data run aren't that different from a run where you ignore Complex, you are in a better position. I don't know that the simulation adds much. I guess it can tell you: If I generate data with my sample size assuming my analysis model and parameter values are correct, do I get these low SEs/these p-values? If you don't, that's telling you that your model has some misspecification that causes small SEs. 


Can I do an internal Monte Carlo for a simple path model where the population model specifies an exogenous variable with a nonnormal distribution? Specifically, I am thinking about a power study that includes age as an exogenous variable, with a fairly uniform distribution of 5-year-old (60 mo.) to 7-year-old (84 mo.) kids. Or one where the exogenous variable is a proportion and the distribution is slightly U-shaped? I can come up with a variance, but I do not want a normal distribution in either case. 


I don't know offhand how you would generate a uniform distribution. A U-shape can be obtained using a mixture of two normals with means sufficiently apart. Nonnormals can also be generated using the new skew-t techniques discussed in the paper on our website: Asparouhov, T. & Muthén, B. (2014). Structural equation models and mixture models with continuous nonnormal skewed distributions. Web Note 19. Version 2. Forthcoming in Structural Equation Modeling. But I wouldn't think your power results really are very strongly dependent on such deviations from a normal covariate. 
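
For readers who generate data outside Mplus and analyze it with external Monte Carlo, the two covariate shapes discussed here can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the mixture means (0.15 and 0.85), the standard deviation (0.08), and the clipping to [0, 1] are hypothetical choices, not values from the posts or from Mplus.

```python
import random
import statistics

def uniform_age(n, low=60.0, high=84.0, seed=1):
    """Draw n ages in months, uniform between low and high."""
    rng = random.Random(seed)
    return [rng.uniform(low, high) for _ in range(n)]

def u_shaped_proportion(n, means=(0.15, 0.85), sd=0.08, seed=2):
    """Draw n values from an equal-weight mixture of two normals.

    Values are clipped to [0, 1] so they remain valid proportions
    (the clipping is an assumption of this sketch).
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        mu = means[0] if rng.random() < 0.5 else means[1]
        draws.append(min(1.0, max(0.0, rng.gauss(mu, sd))))
    return draws

ages = uniform_age(5000)
props = u_shaped_proportion(5000)

# Share of proportions falling in the middle of the scale; a U-shape has few.
middle_share = sum(0.4 <= p <= 0.6 for p in props) / len(props)

print("age mean and variance:", statistics.mean(ages), statistics.variance(ages))
print("share of proportions in [0.4, 0.6]:", middle_share)
```

A uniform variable on 60 to 84 has variance (84 - 60)^2 / 12 = 48, which is the variance one would plug in if a normal covariate were assumed instead.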


Thanks. I'll look at #19, and also think about just planning power with normal covariate. 


I am going to assume normally distributed covariates as you suggested earlier. It keeps things much simpler. I am now wondering about the power for testing a hypothesis of no effect. I am using a path model with 3 tests regressed on a single covariate. The tests are correlated because they all measure the same capability. The covariate captures the predominance of language #1 or language #2 in bilingual households. Two of the tests are expected to be slightly biased by the match between the test's language and the dominant language in the household. I have chosen effect sizes of interest and used them in my data generation model, and can easily determine power from the rightmost column in the Monte Carlo output. However, I used a near-zero effect for the third test because it is supposed to be immune to household language dominance. So I really want to know the power to conclude that this coefficient is zero. In an ordinary path model I could use MODEL TEST, but I am not sure how to do this in the Monte Carlo model. I considered using MODEL CONSTRAINT to make two new parameters that express the difference of the 3rd test from each of the others, but this is almost the same as the power testing I was already doing. If I pick a deviation from zero that is of clinical interest, how can I estimate the power to detect that the regression of test #3 on the covariate is less than that deviation? Is there any relevant example? 


If the true coefficient is zero, then the power column gives you the Type I error rate. Model Test is not available with Monte Carlo. You may want to ask this general Monte Carlo question on SEMNET. 
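
The first point can be checked with a small simulation outside Mplus. The sketch below is not Mplus output; it assumes a simple regression with one normal covariate, a Wald z-test at the 1.96 cutoff, and hypothetical settings (n = 200, 2000 replications, a nonzero slope of 0.2 for comparison). The rejection proportion plays the role of the power column: with a true slope of zero it estimates the Type I error rate.

```python
import math
import random

def rejection_rate(true_slope, n=200, reps=2000, seed=7, crit=1.96):
    """Share of replications in which H0: slope = 0 is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        y = [true_slope * xi + rng.gauss(0.0, 1.0) for xi in x]
        mx, my = sum(x) / n, sum(y) / n
        sxx = sum((xi - mx) ** 2 for xi in x)
        sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
        slope = sxy / sxx
        intercept = my - slope * mx
        # Residual variance and the usual OLS standard error of the slope.
        sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
        se = math.sqrt(sse / (n - 2) / sxx)
        if abs(slope / se) > crit:
            rejections += 1
    return rejections / reps

type1 = rejection_rate(0.0)   # true coefficient is zero
power = rejection_rate(0.2)   # true coefficient is nonzero

print("rejection rate when the true slope is 0:", type1)
print("rejection rate when the true slope is 0.2:", power)
```

With a true slope of zero the rejection rate should land near the nominal 0.05, which is why that column cannot be read as power for a hypothesis of no effect.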


I have run a power analysis for a single parameter with the Monte Carlo simulation and also by estimating the noncentrality parameter as described at http://statmodel.com/power. I'm getting a higher power estimate (.98 vs. .85) in the Monte Carlo analysis and I'm just wondering if this is to be expected, or if I have done something wrong? 


Please disregard my previous post; I answered my own question (I did something wrong)! 


Is there a way to save the true factor score values for generated cases using Mplus' monte carlo utility? 


Factor scores cannot be saved with Monte Carlo. 


Dear Dr. Muthen, In a two-level Monte Carlo study with categorical outcomes (WLSM estimator), how can I get the WRMR fit index in the output? When I generate and analyze the data in the same input file, the output only provides Chi-Square and RMSEA. However, if the data are first generated and then analyzed using the generated data sets, the output gives Chi-Square, RMSEA, CFI, TLI, SRMRW and SRMRB, but does not give WRMR. My first question: How can I get WRMR in two-level SEM using WLSM? Second: Is there any way to see all fit indices in the output file when using just one input file to simulate and analyze all replications? Thank you. 


1. WRMR has not been sufficiently studied for two-level. 2. These are not available due to an output glitch. 

QianLi Xue posted on Thursday, May 21, 2015  9:46 am



On page 803 of User's Guide, there is this sentence about the SAVE command within the Monte Carlo Statement: "The variables are not always saved in the order that they appear in the NAMES statement." If there are no clear rules on how the variables simulated in the Monte Carlo are stored, how to tell which is which in the output dataset? 


This information is given at the end of the output where you generate and save the data sets. 


Hi Dr Muthén, I would like to know if there is some package on Mplus on how to generate nonnormal variables from an implied variance covariance matrix of a SEM model. Thank you in advance! Regards 

Hasina posted on Monday, August 24, 2015  7:01 am



Following the previous message on Monte Carlo Simulation "how to generate nonnormal variables from an implied variance covariance matrix of a SEM model. ", I would like to know if there is a package on Mplus for categorical variables. Many thanks!! 


Mplus generates nonnormal data using either mixtures of normals or using Distribution = t, skewnormal, skewt. There is not a way to generate according to a covariance matrix. Instead, the model's relationships between the variables generate the data. Mplus also generates data on categorical variables. 


Hi, a colleague of mine has analyzed a longitudinal (2 time point) dataset (Long Format) with missing data using generalized linear model (ML estimator). A reviewer is asking to compute the observed power associated with the interaction between Time and Conditions (3 groups). I thought of running a monte carlo simulation. Would you suggest an example with long format dataset and repeated measures? Thank you 


See the Monte Carlo counterpart to Example 9.16. You can find this on the website. It is also downloaded when Mplus is installed. 

Tor Neilands posted on Thursday, February 25, 2016  8:09 am



I would like to simulate data to perform a power analysis for a multiple linear regression model with 7 x variables included as main effects. I will also need to simulate the interactions of the first 3 x variables with the 7th to obtain the power for investigating moderation (10 predictors total in the model). In 2007, someone asked above about a similar situation and Linda noted the data would need to be generated outside of Mplus. However, I'm wondering if one could now use some of the newer features as demonstrated in, e.g., Ex 3.18, where data with interaction are generated via TYPE = RANDOM? If so, can anyone point me to a worked example and also an explanation of how using TYPE = RANDOM to generate the data yields an interaction variable? I am trying to understand not only how to do the simulation, but also how it works. Thanks.  Tor Neilands 


Such simulations can use a random slope approach. I will send you a pdf that describes this. 
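
The pdf mentioned here is not reproduced in the thread. As a rough guess at the idea, assuming the random slope approach means letting the slope of y on x be a variable s that is itself regressed on the moderator z, substituting s = g0 + g1*z into y = b0 + b2*z + s*x + e gives y = b0 + b2*z + g0*x + g1*x*z + e, so g1 is the interaction coefficient. The Python sketch below checks that algebra numerically; all names and parameter values (b0, b2, g0, g1) are hypothetical.

```python
import random

def generate(n=1000, b0=1.0, b2=0.4, g0=0.5, g1=0.3, seed=11):
    """Generate y two ways: via a z-dependent slope and via a product term."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        z = rng.gauss(0.0, 1.0)
        e = rng.gauss(0.0, 1.0)
        s = g0 + g1 * z                               # slope of y on x depends on z
        y_slope = b0 + b2 * z + s * x + e             # random slope formulation
        y_product = b0 + b2 * z + g0 * x + g1 * x * z + e  # explicit interaction
        rows.append((x, z, y_slope, y_product))
    return rows

rows = generate()
max_gap = max(abs(ys - yp) for _, _, ys, yp in rows)
print("largest difference between the two formulations:", max_gap)
```

The two formulations agree up to floating-point rounding, which is why no observed product variable needs to be created when the data are generated this way.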


Hi: I have a question about using Monte Carlo studies to determine sample sizes for ESEMs: Do the criteria recommended in the Muthén & Muthén 2002 article (pp. 605-606, e.g., biases < 10%) apply to the EFA portion of ESEMs? I ask because I have run a set of Monte Carlo studies to determine the sample size for an ESEM that has an EFA component in which 60 items are allowed to freely load onto 6 factors. For the EFA portion of the model, I have not been able to find sample sizes that produce biases within the recommended ranges. I have had success with the CFA and structural portions of the model. Many thanks. 


Yes I think so. EFA is not always easy to handle in Monte Carlo simulations. Check out our FAQ: Lambda is not compatible with the notion of simplicity of the rotation criterion 


Ah! That explains why some pattern coefficient biases were getting worse with increasing observation numbers. Many thanks. 

sfhellman posted on Friday, August 26, 2016  1:29 pm



I am trying to simulate a cross-lagged model in order to determine sample size needed and power where I expect group to moderate the cross-lagged relations. How can I get a chi-square difference test between a model where the paths are constrained to be equal across groups versus a model in which the bidirectional paths are allowed to vary across groups? Sample syntax: MODEL MONTECARLO: [CINT1@7 CINT2@8 CINT3@9]; [PDEP1@10 PDEP2@12 PDEP3@14]; CINT1@1 CINT2@.4 CINT3@.4; PDEP1@1 PDEP2@.4 PDEP3@.4; CINT3 on CINT2*.5 PDEP2*.2; PDEP3 on PDEP2*.5 CINT2*.2; CINT2 on CINT1*.5 PDEP1*.2; PDEP2 on PDEP1*.5 CINT1*.2; CINT3 WITH PDEP3*.3; CINT2 WITH PDEP2*.3; CINT1 WITH PDEP1*.3; MODEL: [same as above] MODEL G2: CINT3 on CINT2*.5 PDEP2*.01; PDEP3 on PDEP2*.5 CINT2*.01; CINT2 on CINT1*.5 PDEP1*.01; PDEP2 on PDEP1*.5 CINT1*.01; CINT3 WITH PDEP3*.01; CINT2 WITH PDEP2*.01; CINT1 WITH PDEP1*.01; OUTPUT: TECH9; 


You can get the power for each constraint by using Model Constraint to express each group difference using parameter labels given in the Model command. 


Is it possible to generate data using the MODEL POPULATION command and then estimate two models instead of just one? I want to use the first model as a reference model and then test the second (nested) model's fit against the first model using a chi-square test. I found that you can do this by generating and saving multiple data sets, estimating the models sequentially, and then testing the decrease in fit, but it would be nice if this could be done within the Monte Carlo functionality, as that would be much faster. 


This cannot be done in one step. 


Hi Linda Thank you for your quick response. 

Eunsoo Lee posted on Sunday, November 13, 2016  9:26 pm



Hi Linda, I'm trying to simulate a 3-level growth model using "TYPE = threelevel random". These are the equations:

Level 1: Y_tij = π_0ij + π_1ij*TIME + e_tij
Level 2: π_0ij = b_00j + b_01j*X_ij + r_0ij
         π_1ij = b_10j + b_11j*X_ij + r_1ij
Level 3: b_00j = r_000 + r_001*Z_j + u_00j
         b_10j = r_100 + r_101*Z_j + u_10j

And I want to generate data in "long format":

school student time x z
1 1 0 2.4 3.6
1 1 1 2.4 3.6
1 1 2 2.4 3.6
1 2 0 2.1 3.6
1 2 1 2.1 3.6
1 2 2 2.1 3.6
...
2 1 0 2.8 4.0
2 1 1 2.8 4.0

However, I cannot find the Monte Carlo counterpart to Example 9.16. Could you recommend any paper that explains generating long-format time data? Thank you! Eunsoo 


Just use the s | approach for the slope on time, and let s vary on both the second and third levels. 
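
To make the long-format layout concrete, here is a minimal Python sketch of the data-generating structure in the equations above, written outside Mplus. It is an illustration, not the Mplus s | specification: the function name, all parameter values, and the independent normal residuals are assumptions, and the slopes on X (b_01j, b_11j) are held constant across schools because the post gives no level-3 equation for them.

```python
import random

def simulate_long(n_schools=20, n_students=15, n_times=3, seed=3):
    """Return long-format rows: (school, student, time, x, z, y)."""
    rng = random.Random(seed)
    b01, b11 = 0.4, 0.1  # slopes on X, constant across schools (assumption)
    rows = []
    for school in range(1, n_schools + 1):
        z = rng.gauss(0.0, 1.0)                      # school-level covariate Z_j
        b00 = 2.0 + 0.5 * z + rng.gauss(0.0, 0.5)    # r_000 + r_001*Z_j + u_00j
        b10 = 1.0 + 0.2 * z + rng.gauss(0.0, 0.2)    # r_100 + r_101*Z_j + u_10j
        for student in range(1, n_students + 1):
            x = rng.gauss(0.0, 1.0)                  # student-level covariate X_ij
            pi0 = b00 + b01 * x + rng.gauss(0.0, 0.7)   # intercept, level 2
            pi1 = b10 + b11 * x + rng.gauss(0.0, 0.3)   # slope on time, level 2
            for time in range(n_times):
                y = pi0 + pi1 * time + rng.gauss(0.0, 1.0)  # level 1
                rows.append((school, student, time, x, z, y))
    return rows

rows = simulate_long()
print("school student time x z y")
for row in rows[:6]:
    print("%d %d %d %.2f %.2f %.2f" % row)
```

Each student contributes one row per time point, with x repeated within student and z repeated within school, matching the layout requested in the question.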

Eunsoo Lee posted on Tuesday, November 15, 2016  9:52 am



Thank you for your response! 
