95% coverage is the proportion of replications for which the 95% confidence interval contains the true parameter value.
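As an illustration of that definition, here is a minimal Python sketch that computes coverage from per-replication estimates and standard errors. The function name, the normal-theory interval (estimate plus or minus 1.96 standard errors), and the five example values are assumptions made for this illustration; they are not taken from Mplus output.

```python
def coverage(estimates, std_errors, true_value, z=1.96):
    """Proportion of replications whose interval est +/- z*se contains true_value."""
    hits = 0
    for est, se in zip(estimates, std_errors):
        lower, upper = est - z * se, est + z * se
        if lower <= true_value <= upper:
            hits += 1
    return hits / len(estimates)


# Hypothetical results from five replications of a single parameter.
estimates = [0.48, 0.55, 0.61, 0.39, 0.52]
std_errors = [0.05, 0.05, 0.05, 0.05, 0.05]

# Three of the five intervals contain 0.50, so the coverage is 0.6.
print(coverage(estimates, std_errors, true_value=0.50))
```

The result depends on which value is treated as "true", which is why the population value supplied for the coverage calculation matters.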
If you are using mixture Monte Carlo, you can save the results from each replication. If you are doing regular Monte Carlo, you cannot. If you generate data outside of Mplus, you can save all of the results. A Monte Carlo utility is available on the website for this purpose.
Anonymous posted on Sunday, March 04, 2001 - 11:13 pm
My simulations were made for testing general SEM (continuous data) and WLSMV (ordinal data) models. Can I get the simulated dataset for each replication ?
In an Mplus Monte Carlo input file, Mplus only needs the true mean and covariance matrices. How can Mplus know the true parameter values of a specific parameter structure of an SEM model?
No, you can save data from only the first replication in Mplus MONTECARLO. If, however, you generate data outside of Mplus, we provide a Monte Carlo utility on the Mplus website to aid in running the Monte Carlo simulation and saving results for further analysis.
For the estimator coverage statistics to be correct, Mplus picks up the true population values of the parameters from the starting values of each parameter in the MODEL command.
I am trying to follow the procedures listed in Muthen & Muthen, 2002, concerning monte carlo for power estimation.
In that paper (p. 603), you describe 3 steps for generating non-normal data: 1 - generate data for 2 classes; 2 - run the analysis with 1 replication; 3 - solve for the residual variances and use these as population values for data generation.
Can you clarify how many files are needed to accomplish this?
For example, do I create data using 1 input file (using save command), analyze that saved data using a second input file (with replication = 1), and then use results of 2nd input to establish correct population values (for the aggregate sample) that are used as start values in the 'MODEL' portion of the first input file?
If this is correct, I'm assuming that only the last of these 3 files was listed in the appendix of that paper (and is avail for download on your website)?
Thanks for any clarification you might provide.
bmuthen posted on Friday, January 07, 2005 - 5:53 pm
The 3 steps listed are used as initial steps based on which the Monte Carlo input file (for many replications) shown in the paper is done.
The 3 initial steps can be done via a single input file that looks much like the one in the paper, but uses the large sample mentioned and only 1 replication. You save the data and check the skewness and kurtosis, etc.
I am interested in testing the effects of construct reliability, R-squared and sample size on the structural coefficients (2 parameters).
I have finished my runs with the various test conditions and stored my results. However, to run my ANOVAs I need only the values of the two parameters I am interested in from my Monte Carlo results file.
Is there a way I can specify the results command so that only these two parameters are stored?
I am running a Monte Carlo analysis on an SEM. I am interested in testing the effects of construct reliability, R-squared and sample size on the structural coefficients (2 parameters).
I am fixing R-squared of my regression equations by fixing the residual variance of the dependent variable. i.e., If my dependent construct say is f1, is it alright to specify email@example.com in my model population, to fix the r-squared of the equation to 0.7?
I'm trying to run a Monte Carlo study to generate data for use in replicating my dissertation model which was under-powered. I've run into a glitch in that my model has a grouping variable (Clinic Yes/No) and I haven't been able to figure out the syntax for generating this variable under the Monte Carlo study. Any help you can give me is greatly appreciated! Kim
I now have another issue. I have saved the parameter estimates from a previous analysis to use as the population values in the MC simulation. However, I'm getting an error message that there's insufficient data in the POPULATION file. What would cause this error? Thanks!
This afternoon I was working with an Mplus MC input and found the following issue quite puzzling. I always thought that the MODEL MONTECARLO statement is the one used to specify the true population values for a model, while the MODEL option is used to specify the model to be estimated in the 500 or so MC replications (as described in the Muthén & Muthén 2002 SEM paper). However, what I found today was that Mplus actually seems to interpret the values provided in the MODEL statement as population values (when I provided different values following an asterisk (*) which I thought would be taken only as starting values under the MODEL statement, the population values in the output changed accordingly). On the other hand, population values in the output did not change when I changed the values following an asterisk in the MODEL MONTECARLO command. Am I missing something?
The use of the values in MODEL MONTECARLO and MODEL is described in the first Monte Carlo example in Chapter 11. MODEL POPULATION provides the values for data generation. MODEL provides the values that are used as both the true population values for computing coverage and as starting values. The reason for this is so the models for data generation and analysis can be different.
Hi - I am conducting a path analysis with count variables and receive an error message stating that I must use Monte Carlo integration in the analysis. Why is this a default in the program? See below for syntax and error message.
Mplus VERSION 3.12 MUTHEN & MUTHEN 05/16/2007 1:37 PM
MISSING are all (999); USEVARIABLES ARE a5 h9 h17 recgende lngaccul relginf2 d1 needdepr forsrv instlng emolng;
Monte Carlo integration is required when the number of dimensions of integration varies across individuals due to missing data. I recommend upgrading from Version 3.12 to the most recent version of Mplus. There have been many changes and improvements since Version 3.12.
Thank you for the information. Is this due to the count nature of my mediator and outcome variables? Is the FIML default turned off, so to speak, when Monte Carlo integration is used to address the missing data?
Thanks for the information, as I am a novice Mplus user.
Hello, I am new to Mplus, and I am using the demo version to try to determine the sample size and power that would be necessary for my dissertation research. My hypothesized model posits mediated moderation, and there are 6 continuous variables involved: 4 predictors (2 pairs of interacting variables) and two outcomes (one is a proposed mediator, the other the outcome).
Which would be the best monte carlo simulation study to run using the demo version?
I'm running a Monte Carlo study to examine sample size requirements. I have one continuous variable that is an interaction between X1 and X2. The define command does not seem to be available for monte carlo. (ie: if I were to create Z1 = X1*X2)
Does it matter in the calculation of sample size if the variable Z1 is not defined as a function of X1*X2? If it doesn't then I assume I would just add Z1 to the names are command.
In this situation, you would need to generate the data outside of Mplus and use external Monte Carlo in Mplus to analyze the data.
Jon Elhai posted on Monday, March 10, 2008 - 3:46 pm
Drs. Muthen, In a Monte Carlo analysis, where I'm trying to estimate observed power after having conducted a confirmatory factor analysis... I'm wondering: 1) When freeing parameters with an asterisk, I assume the numbers I insert after an asterisk are the parameter estimates I obtained in my CFA? 2) If so, is this the standardized estimate, such as from the STDYX column in my CFA output? 3) If most of my Monte Carlo output’s estimates in the % Signif Coeff column are 1.000, could I have done something wrong, or is this possible?
The values placed after an asterisk or @ symbol in the MODEL POPULATION command are used as population values for data generation. The values placed after an asterisk in the MODEL command are used to compute coverage and as starting values. It sounds like you do not have values in the MODEL command. See Example 11.1.
Hi, I am trying to run a monte carlo simulation to determine the power needed for inclusion in our grant proposal. My model is a path analysis with 3 x's, a mediator (y1), and a categorical outcome (u1). I have read through the manual on monte carlo simulations and am using example 3.14 as a guide as well, but I have a couple of questions.
1) Am I correct in assuming that the path estimates to u1 are in logits (if I am using ML)?
2) If I am using ESTIMATOR = ML, do I have to estimate a residual variance for u1? Isn't it zero?
3) Is it possible to simulate patterns of missingness when using the ML estimator?
Thank you for your help. I have now gotten the Monte Carlo simulation to run with missing patterns, although I had to add INTEGRATION = MONTECARLO to get it to run. When looking at examples for this language I saw that those examples (e.g., 3.17) used ESTIMATOR = MLR. However, my model also runs with ESTIMATOR = ML and I can't see what the difference is. My model is a path analysis with 3 x's, a mediator (y1), and a categorical outcome (u1). Is one of these estimators better than the other for my simulation?
The parameter estimates are the same for ML and MLR. The standard errors and fit statistics are different. ML has conventional standard errors and fit statistics. MLR has standard errors and fit statistics that are robust to non-normality. Our default in most cases is MLR.
In the results of my monte carlo simulation using MLR, there is a log likelihood estimated across the replications for H0, but not one for H1. How do I assess model fit? Since I am doing this mainly for a power analysis of the paths I am not sure what the comparison model would be - where all of the paths are estimated at zero?
With a combination of a continuous mediator and a categorical distal outcome plus ML estimation, there is no model fit assessment. This is because there is no relevant H1 model with an unrestricted covariance matrix. Such H1 models are relevant only for continuous outcomes.
Hi. I am trying to generate binary data. For the further use of the generated data in IRT programs, I want to format the data without decimals and spaces. Monte carlo command does not have format subcommand. Is there any way to format the data?
I am new to Mplus. I'm trying to run Monte Carlo now. After running, I saved the results in the text file. However, I don't know the order of results. The only guiding message I can see in the output file is "Parameter estimates (saved in order shown in Technical 1 output)".
Can you tell me where I can get 'Technical 1 output', Please? How about technical 5 output?
I have a question regarding the assessment of power with a Monte Carlo simulation.
Suppose in the Model Population command I fix a path of F2 on F1@.60 and do the same in the Model command. The % Sig Coeff is the estimate of power and even with a large sample size it is showing at 0. Is this because the parameter is fixed in the Model command?
Suppose when specifying a CFA model in the Model Population Command you have 3 indicators (x1-x3) of Factor 1. If you set the residual variances of all three to .51, for example X2@.51 shouldn't it automatically set the loading of Lambda11, Lambda12, and Lambda 13 at a standardized value of .70? Given that 1 - (.70^2) = .51? Currently it seems to require you to provide the loading.
To follow-up on the question regarding the Model Population command in a Monte Carlo simulation, I have 2 questions:
(1) If I want to set a latent factor with 3 indicators to have an average R-square of 60%, should I set the population parameter values for the unstandardized residual variance of all three items at .40 (e.g., x1@.40, x2@.40 and x3@.40) in Model Population?
(2) If that is correct, do I have to hand calculate what the unstandardized factor loading for each observed variable (item) would be? Or can I set the variance for the factor at 1 (F1@1) and then set the factor loading to start somewhere, say F1 by x1-x3*.75, assuming that the program will calculate the proper loading based on the residual variance being set at .40 and the factor variance being set at 1?
At the end of the day, I want to be able to simulate data with different factors that have differing r-square values.
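The relationship between loading, residual variance, and R-square that these questions turn on can be worked out by hand. The Python sketch below assumes a single factor with variance psi, a loading lambda, and a residual variance theta, so that R-square = lambda^2 * psi / (lambda^2 * psi + theta). The function names are made up for this illustration, and the sketch only shows the arithmetic; it makes no claim about what Mplus derives automatically.

```python
import math


def r_square_from_loading(loading, resid_var, factor_var=1.0):
    """R-square of an indicator: explained variance over total variance."""
    explained = loading ** 2 * factor_var
    return explained / (explained + resid_var)


def loading_for_r_square(r_square, resid_var, factor_var=1.0):
    """Loading that gives the target R-square for a given residual variance."""
    return math.sqrt(r_square * resid_var / ((1.0 - r_square) * factor_var))


# Residual variance .40 and factor variance 1: an R-square of .60 needs a
# loading of sqrt(.60), about .775, rather than .75.
lam = loading_for_r_square(0.60, 0.40)
print(round(lam, 4))

# Loading .70 with residual variance .51 gives a total variance of 1.0 and
# an R-square of .49, so here the loading is also the standardized loading.
print(round(r_square_from_loading(0.70, 0.51), 4))
```

Different target R-square values for different factors can be obtained by calling loading_for_r_square once per indicator and supplying the result as the population loading.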
Am I reading this correctly from the manual, pages 346-347, regarding external Monte Carlo?
If data is generated using Mplus and saved as replist.dat, then used in the Type = Monte Carlo, the population values in the output come from the Model command (in the external Monte Carlo) and the average values come from the replist.dat. Is that correct? In this sense one could misspecify the model in the Model command and compare the results of population to average.
I am running a simulation to look at the bias due to attenuation when averaging items. So I have 2 factors with 5 indicators each. In the population I have modelled them to correlate at .40. I average them using the Mean command in Define. Then when I run the external Monte Carlo (which I need because I am using the Define command) I put the original value (.40) in the Model command: F1 with F2@.40. In tech9 the average values will be those using the means of F1 and F2 comparing them against the population values. Are there any problems in reading the output that way when using the Define command? It seems right to me.
Not sure I understand the intent here. Here are a couple of observations which may be of use:
If the indicators are correlated 0.40, their averages won't correlate 0.40. I don't know why F1 with F2 is fixed at .40 in the MODEL command. Typically, the MODEL command would use *. You mention average values using the means of F1 and F2 comparing them against pop values - why would that be of interest? I thought you were getting at the attenuated correlation between means of indicators as compared to factor correlation.
Sorry I wasn't very clear. What I have done is create two factors each with 5 indicators. In the population I have modelled them to correlate at .40 with varying sample sizes and 1000 reps. I want to examine the attenuation due to measurement error when you average the items in the factor to create one observed variable (call it OV) for each.
In the external Monte Carlo, I use the define command to create the averages so factor 1 becomes OV1 = Mean(x1-x5) and factor 2 becomes OV2 = Mean(x6-x10). I put the original values in the Model command OV1 with OV2*.40
In tech9 I am assuming that the average values in the columns will be those using the means of OV1 and OV2 comparing them against the population values. Are there any problems in reading the output that way when using the Define command? It seems right to me.
On the same topic, but a different scenario: in the Model command, is there any way of specifying F1@(1-.65)*Var(x) without specifying what the value Var(x) is? This would be needed if you are correcting for attenuation and are running external Monte Carlo with Type = Monte Carlo.
The average values in the output give you the average over the replications of the estimate for whatever parameter is printed. So for example the average estimate of OV1 WITH OV2. And also for the mean parameters of OV1 and OV2 which it sounds like you refer to.
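For the attenuation question, the correlation that the averaged scores OV1 and OV2 should show in the population can be computed directly and compared with the average estimate in the output. The Python sketch below assumes uncorrelated residuals and no cross-loadings. The loadings of .8 and residual variances of .36 are hypothetical values chosen for illustration, since the post does not give them; only the factor correlation of .40 and the 5 indicators per factor come from the thread.

```python
import math


def composite_correlation(loadings1, resid1, loadings2, resid2, factor_corr,
                          factor_var1=1.0, factor_var2=1.0):
    """Population correlation between the means of two indicator sets."""
    k1, k2 = len(loadings1), len(loadings2)
    s1, s2 = sum(loadings1), sum(loadings2)
    # Variance of each mean score: factor part plus averaged residual part.
    var1 = (s1 ** 2 * factor_var1 + sum(resid1)) / k1 ** 2
    var2 = (s2 ** 2 * factor_var2 + sum(resid2)) / k2 ** 2
    # Covariance of the two mean scores comes only from the factor covariance.
    factor_cov = factor_corr * math.sqrt(factor_var1 * factor_var2)
    cov = s1 * s2 * factor_cov / (k1 * k2)
    return cov / math.sqrt(var1 * var2)


# Hypothetical measurement model: 5 indicators per factor, loadings .8,
# residual variances .36, factor correlation .40.
loadings = [0.8] * 5
residuals = [0.36] * 5
attenuated = composite_correlation(loadings, residuals, loadings, residuals, 0.40)
print(round(attenuated, 4))
```

Under these assumed values the correlation between the averages is below .40, so a population value of .40 would not be the right reference value for OV1 WITH OV2.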
The answer to your V(x) question is no. Perhaps this can instead be done via Model Constraint. Perhaps using the Constraint = option in the VARIABLE command.
That works great. I've never used this function before, but it's great.
One question. In example 5.20 and the discussion of it, it states that you specify "standardization" in the output command for the R-sqr values. But when I use the model constraint command it states that:
"STANDARDIZED (STD, STDY, STDYX) options are not available when specific constraints are used in MODEL CONSTRAINT."
One other question. Does using the model constraint command with the New function change the analysis model at all?
Is there a way to save covariance matrices and/or correlation matrices for each replication (in a separate file) in a Monte Carlo run. For example, instead of generating the raw data for each replication it generates the matrix for each replication. - Thanks,
Hi, I have a quick question concerning the PROCESSORS command with Monte-Carlo-Studies. I'm currently running a Monte-Carlo analysis on different computers using the TYPE=MONTECARLO subcommand under DATA:. I also used the PROCESSORS command in the ANALYSIS section. Now my problem is: while running these analyses on the 32-bit version it actually uses multiple processors. However, when I run the same inputs on the 64-bit version and I check the task-manager I can see that the procedures simply skip from processor to processor and the CPU-usage always hovers around 25% (on a computer with 4 processors).
Am I missing something? I tried it with and without the (STARTS) subcommand but the result is always the same.
Hello Drs. Muthen, I am using Mplus to generate and analyze data for a Monte Carlo study, but I would also like to import said datasets into SAS for additional analyses. Given that I am using NREPS = 10000, I would like to automate this process using a macro. Unfortunately, I have been unable to figure out a way to import the dat files produced by Mplus using a PROC IMPORT statement. I realize that this is primarily a SAS-related issue; however, I was curious whether there is any way to modify the format or type of output data file produced by the Monte Carlo option in Mplus. Thanks
If the variance of the variable is one, then a residual variance of .7 reflects an R-square of .3.
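The arithmetic behind that statement is R-square = 1 - (residual variance / total variance). A small Python sketch, with function names invented for this illustration:

```python
def r_square(resid_var, total_var=1.0):
    """R-square implied by a residual variance and a total variance."""
    return 1.0 - resid_var / total_var


def resid_var_for(r_square_target, total_var=1.0):
    """Residual variance needed to reach a target R-square."""
    return (1.0 - r_square_target) * total_var


# With a total variance of 1, a residual variance of .7 gives R-square .3,
# and an R-square of .7 requires a residual variance of .3.
print(round(r_square(0.7), 4))
print(round(resid_var_for(0.7), 4))
```

If the total variance of the dependent variable is not 1, the same functions apply with total_var set to that variance.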
You specify a variance or residual variance depending on the parameter that is estimated. For example, in a conditional model, a residual variance is specified. In an unconditional model, a variance is specified.
Is that correct? And to calculate this from a normal Mplus output, do I take the unstandardized values?
Then I have another question about the input parameters I specify in the Monte Carlo command. Are the parameter values standardized or unstandardized? I think they are unstandardized. Is that correct?
I conducted the MC analysis and now I am a bit confused about the output. In the parameter specification and the population values, the path coefficients are reported under the heading beta. 1) Are those standardized or unstandardized values? 2) Or are they both, depending on what I specify as variances in the population and model parts of the input?
Hello, I'm wondering whether Mplus has a way of exporting information on non-positive definite theta matrices in the save data command. I have successfully saved out the results for each replication but I would like to also identify which ones are not positive definite. Thanks, Ginger
Thanks Linda; I'm wondering, though, if this information can be included in the data file produced in the save data command, (for example, a binary term indicating whether the theta matrix for each replication was non-positive definite).
There is no way to request this if we don't save it with the other results and I don't think we do.
ywang posted on Thursday, September 16, 2010 - 12:44 pm
To calculate the required sample size for grant proposals, Mplus needs specific information such as the mean and variance of the slope and intercept of growth in the Monte Carlo study, but where can we get this information from the literature? The related papers usually have only part of the information, not all of it. If we do not have complete information, what can we do?
You need plausible population parameter values for all parameters in the model. If they are not provided by theory, you can estimate a model using your data and use these values as population parameter values.
Hi, This might be a stupid question (sorry for that) but I can't seem to find the answer (maybe due to a cold I am stuck with). When generating data under "Monte Carlo Population", what is the impact on the generated data of using @ (fix) or * (start) when providing population model values? Thank you very much!
Hi, I would like to generate a series of data sets using the Monte Carlo facility (let's say 500) and then analyse them via the EXTERNAL facility. Why? Because I want to run different types of models on the same data sets (and to save the time required to generate the data each time). For instance, I want to generate a set of data that fits a CFA model with cross-loadings and see how best to recover the population parameters (ESEM, CFA, PVs, etc.). Is there a way to still specify the population parameters somewhere to get the coverage and other Monte Carlo indices? Thanks
Hi, I would like to generate some simulated data sets to compare conventional regression analysis with SEM. When creating the factor structure underlying simulated data, in the past I have typically just set the variance of the factors equal to 1. As I am interested in comparing unstandardized coefficients in this particular set of simulations, I need to set the metric of the factors another way (otherwise my factors are standardized and so the regression coefficient relating them is also standardized and I am interested in unstandardized coefficients for this particular project). Am I correct that all I need to do as an alternative is set one of the factor loadings for each factor equal to one as in the syntax below?
Model Population:
f1 BY y1@1 y2-y4*.707;
f2 BY y5@1 y6-y8*.707;
f1 on f2*.5;
y1-y8*1;
Or are there any other constraints I need to add to the model? Many thanks!
I must be doing something wrong but can't figure out what. Unless I specify the variance of the factors in addition to the factor loadings in my model population statement, I run into a problem such that only a relatively small subset of the replications that I request are actually completed. As long as I also specify the variance of the factors, then the number of replications I request is the number completed.
You have to give population values for the factor variances even if you don't fix them, so e.g.
Erika Wolf posted on Tuesday, December 07, 2010 - 2:23 pm
Can you provide more detail on using the Monte Carlo approach to generate non-normal data for power analyses? I have read your 2002 paper on this but I am not clear on how to determine the appropriate population values to achieve the desired level of non-normality in the generated data. In a message on this string dated 1/7/05, you state that the initial 3 steps in this process can be completed in 1 input file. Can you provide an example of that script? Thanks.
The approach we used in the paper was using mixture modeling with trial and error. I think we describe this in the paper and the inputs are part of the paper and/or on the website.
Erika Wolf posted on Monday, December 13, 2010 - 2:22 pm
I think there is only script available for the last step (the actual power analysis) in the paper. I'm trying to determine how you arrived at the values that you specified in the script.
As I understand, you started with data generation for a 2 class model with .8 factor loadings, indicator error of .36 and factor intercorrelation of .25 for both classes. (a) Is this correct?
The paper describes 3 steps before getting to the script provided in the paper. Step 1: Generate data for 10,000 cases with 2 latent classes (which will eventually be analyzed as 1). When you generate non-normal data by having 2 latent classes, (b) are the only differences that you specify between the 2 classes that the factor 2 mean is set to 15 and the factor 2 variance is set to 5 in Class 1? Are all other parameters (factor loadings, indicator error, factor correlation) the same in the overall and class-specific models?
I generated data for 10,000 cases with 2 classes (12% and 88%), each with .8 loadings, .36 indicator error, .25 factor correlation, and class 1 mean and variance of 15 and 5, respectively. But when I pulled the saved data into SPSS, the skewness and kurtosis values for the factor 2 indicators differed from what you report in the 2002 paper. I'm trying to replicate what you did before I move on to running the power analysis I need to run for my paper. Thanks!
The two inputs for generating non-normal data are shown on the examples page under the headings:
CFA model with non-normal continuous factor indicators without missing data.
CFA model with non-normal continuous factor indicators with missing data.
The trick is to generate using two classes and analyze using one class. All of the steps are contained in the inputs. If you run those inputs, you should get the same results as in the paper.
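To see why generating from two classes and analyzing as one produces non-normal indicators, the skewness and kurtosis of a two-class normal mixture can be computed analytically. The Python sketch below uses the values described in the post above (class proportions .12 and .88, loadings .8, residual variances .36, factor mean 15 and variance 5 in class 1). The factor mean of 0 and variance of 1 in class 2 are assumptions made for this illustration, so the results need not match those reported in the paper.

```python
def mixture_moments(weights, means, variances):
    """Mean, variance, skewness and excess kurtosis of a normal mixture."""
    mean = sum(w * m for w, m in zip(weights, means))
    m2 = m3 = m4 = 0.0
    for w, m, v in zip(weights, means, variances):
        d = m - mean  # distance of the class mean from the overall mean
        m2 += w * (d ** 2 + v)
        m3 += w * (d ** 3 + 3 * d * v)
        m4 += w * (d ** 4 + 6 * d ** 2 * v + 3 * v ** 2)
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    return mean, m2, skewness, excess_kurtosis


loading, resid_var = 0.8, 0.36
# Class-specific factor means and variances; the class-2 values are assumed.
factor_means = [15.0, 0.0]
factor_vars = [5.0, 1.0]
weights = [0.12, 0.88]

# Indicator mean and variance within each class: y = loading * f + e.
ind_means = [loading * m for m in factor_means]
ind_vars = [loading ** 2 * v + resid_var for v in factor_vars]

mean, var, skew, kurt = mixture_moments(weights, ind_means, ind_vars)
print(mean, var, skew, kurt)
```

Changing the class proportions or the class-1 factor mean and variance and rerunning is a quick way to carry out the trial-and-error search for a target skewness and kurtosis before generating any data.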
Yo In'nami posted on Friday, December 17, 2010 - 6:01 am
Dear Muthen and Muthen,
Having perused Muthen and Muthen (2002) and relevant materials using this method (the Mplus user's guide; Thoemmes et al., 2010), I am planning to conduct a post-hoc power analysis on a variety of models found in the field of language education. Since both sources illustrate many examples of input commands, I will use these examples as a starting guide. Is it correct to assume that the power of any model can be calculated as long as the model can be programmed in Mplus? Or, in other words, is there any model whose power cannot be calculated using Mplus?
Muthen, L. K., & Muthen, B. O. (2002). How to use a Monte Carlo study to decide on sample size and determine power. Structural Equation Modeling, 9, 599-620.
Muthen, L. K., & Muthen, B. O. (2007). Chapter 11: Monte Carlo simulation studies. Mplus user's guide, 5th edition.
Thoemmes, F., MacKinnon, D. P., & Reiser, M. R. (2010). Power analysis for complex mediational designs using Monte Carlo methods. Structural Equation Modeling, 17, 510-534.
You can use this approach on any model that can be specified using the MONTECARLO command. Note that the power is for one parameter of the model not the entire model.
Yo In'nami posted on Friday, December 17, 2010 - 9:13 am
Thank you very much for a very useful comment! I have been considering this question for the past several months. Now I can go ahead with my work.
Yo In'nami posted on Wednesday, April 13, 2011 - 7:19 am
Muthen and Muthen (2002) explain that the Monte Carlo output column labeled "% Sig Coeff" refers to power: for each parameter, it shows the proportion of replications for which the null hypothesis that the parameter is equal to zero is rejected at the .05 level. I have conducted several post-hoc power analyses on published models by specifying the MODEL POPULATION and the MODEL commands to be identical. If parameter estimates in published models are reported to be statistically significant and these estimates are specified in both the MODEL POPULATION and the MODEL commands, will the "% Sig Coeff" of these parameter estimates always be over .80? If so, conducting a post-hoc (not a priori) power analysis just seems to be establishing/reconfirming what is already apparent--that there was sufficient power to detect statistical significance of parameters. In other words, is it correct to say that there is no need to conduct "post-hoc" power analyses if parameters of interest have already been reported to be statistically significant?
Muthen, L. K., & Muthen, B. O. (2002). How to use a monte carlo study to decide on sample size and determine power. Structural Equation Modeling, 9, 599-620.
Muthen, L. K., & Muthen, B. O. (2007). Chapter 11: Monte Carlo simulation studies. Mplus user's guide, 5th edition.
I would think this type of power analysis is usually done in the planning of a study to determine the necessary sample size or perhaps after to see if non-significance is due to lack of power. However, even if you find significance, you don't know with what power you find it. Perhaps your power was .3. You may have been lucky to find significance in your sample but may not be so lucky in another sample of the same size.
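The point that a significant result says little about power can be made concrete by computing the proportion of replications in which a two-sided z test rejects the hypothesis that the parameter is zero, which is how the thread describes the "% Sig Coeff" column. The Python sketch below uses invented estimates and standard errors from five replications; the function name and the 1.96 critical value are assumptions for this illustration.

```python
def proportion_significant(estimates, std_errors, critical_z=1.96):
    """Share of replications in which |estimate / se| exceeds critical_z."""
    rejections = sum(
        1 for est, se in zip(estimates, std_errors)
        if abs(est / se) > critical_z
    )
    return rejections / len(estimates)


# Hypothetical replications of one parameter. Three of the five z ratios
# (3.0, 1.0, 2.5, 0.5, 2.2) exceed 1.96, so the estimated power is 0.6.
estimates = [0.30, 0.10, 0.25, 0.05, 0.22]
std_errors = [0.10, 0.10, 0.10, 0.10, 0.10]
print(proportion_significant(estimates, std_errors))
```

A single study corresponds to one of these replications: it may fall among the significant ones even when the proportion across replications is low.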
I am running three sets of Monte Carlo simulation studies for my dissertation: one for my measurement model, one for a follow-up structural portion, and another one for a parallel growth curve process. I could make the CFA run, but I do not see the commands for a parallel growth curve process in the Mplus guide. Any advice on how I can do this will be appreciated. Thanks.
We're currently preparing a rather large simulation study and are desperately looking for ways to speed the process up. We had the idea that, while we want to have Chi-Square, SRMR, and RMSEA, we're not really interested in CFI/TLI. Is there any way to switch off the baseline model estimation while keeping the H1 model estimation?
There is no way to do this. If you send your input and license number to firstname.lastname@example.org, we will see if we can make other suggestions to speed things up.
Yo In'nami posted on Saturday, May 21, 2011 - 2:33 am
I am using a Monte Carlo power analysis and received an error message that the model is unidentified although it is identified with other SEM programs. The degrees of freedom are 17 in Mplus but 19 in other programs. I have been unsuccessful rectifying the syntax. I am grateful for your generous help!
MONTECARLO:
NAMES ARE X1-X8;
NOBSERVATIONS = 259; ! SAMPLE SIZE OF INTEREST
NREPS = 10000;
SEED = 53567;
MODEL POPULATION:
IAVLE BY X1*.89 X2*.69 X3*-.75;
SRC BY X4*.81 X5*.83 X6*.71 X7*.79 X8*.43;
SRC ON IAVLE*.67;
X1@1X4@1;
X1*.21; X2*.52; X3*.44; X4*.34; X5*.31; X6*.50; X7*.38; X8*.82;
SRC*.55;
MODEL: (Note. Exactly the same as the Model Population above and thus omitted to shorten the message);
ANALYSIS: ESTIMATOR = ML;
OUTPUT: TECH9;
mpduser1 posted on Friday, June 24, 2011 - 2:04 pm
Okay, I guess I was thrown by the fact that Mplus doesn't report by the "population" parameter values, such that the "% Sig. Coeff" column provides the parameter specific power estimates, unless MODEL COVERAGE is specified.
So, to get the estimated power figures directly, I have to specify my population model twice, once in MODEL POPULATION and once again in MODEL COVERAGE. Is this correct?
Hi, I just ran a fairly simple path model and tried to save the estimates to use for a subsequent MC simulation. I am getting the following error: "Saving of ending values for the ESTIMATES option is not available for models with covariates. Request for ESTIMATES is ignored."
Ask for TECH9 in the OUTPUT command and you will see which replication had the problem.
elementary posted on Friday, October 28, 2011 - 2:48 am
Hello, I want to run a power analysis using the Monte-Carlo Approach for a planned survey using a two-stage sampling procedure.
As my hypotheses are on level-1 only, I do not plan multilevel-analyses, but rather want to run a "normal" SEM, adjusting standard errors with cluster/type = complex.
I tried to model this using the Mplus montecarlo option as described in the CFA-textbook of Brown (2006, p. 420ff)(To get started, at the moment I ignore the mean structure and the multiple group design, which have to be added in a later stage). When assuming that data are not nested, everything worked.
However, I did not manage modeling the clustering as well. Type = complex seems not to be available within montecarlo. And using NCSIZES and CSIZES is obviously available only with type=twolevel, which does not seem to be applicable in this case.
Thus, I thought of finally correcting the sample size I get under the assumption of a random sample, using a correction formula for effective sample size, (cf. the Multilevel-textbook of Hox, 2002, p.5). Is this reasonable and are there alternatives to that?
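The correction described here is commonly written as a design effect of 1 + (m - 1) * ICC, where m is the (average) cluster size and ICC is the intraclass correlation. The Python sketch below assumes this is the formula the post refers to; the cluster size of 20, the ICC of .05, and the sample sizes are hypothetical values for illustration.

```python
def design_effect(cluster_size, icc):
    """Design effect for equal-sized clusters: 1 + (m - 1) * ICC."""
    return 1.0 + (cluster_size - 1) * icc


def effective_sample_size(n_total, cluster_size, icc):
    """Sample size of a simple random sample with the same precision."""
    return n_total / design_effect(cluster_size, icc)


def required_total_n(n_srs, cluster_size, icc):
    """Clustered sample size needed to match a simple random sample of n_srs."""
    return n_srs * design_effect(cluster_size, icc)


# Hypothetical design: clusters of 20 with an ICC of .05.
print(design_effect(20, 0.05))
print(effective_sample_size(1000, 20, 0.05))
print(required_total_n(400, 20, 0.05))
```

Here required_total_n inflates the sample size found under the random-sampling assumption, which is the direction of correction the post describes. The ICC differs by variable, so it is safer to use the largest plausible value.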
elementary posted on Wednesday, January 25, 2012 - 6:20 am
Thanks for your quick reply to our posting from October 28 regarding (external) montecarlo! We first tried to understand the examples (which took us some time). For this purpose, we reran example 9.12 and subsequently tried to identify the numbers in the output that have been used as input for example 12.6. Unfortunately, we often were not able to identify them. For example, in the unstandardized output for example 9.12, the residual variance for sw was 0.473 and for sb it was 0.214, while in example 12.6 input, it was entered as 0.2 for sw and 0.1 for sb . These differences generalize to other parameters. Do you have any suggestions on that? That might help us to locate possible further errors we made in subsequent steps of our analyses. Thanks!
Please send an output that shows your question along with your license number to email@example.com. Specify exactly where in the output the numbers you refer to can be found.
Lois Downey posted on Thursday, February 09, 2012 - 9:26 pm
Chapter 12 of the User's Manual includes a discussion of Monte Carlo output related to the chi-square test of model fit (pp. 362-3). In the example, the critical value of chi-square at the 0.05 level was exceeded in 0.058 of the replications. The discussion indicates that 0.058 is close to the expected value of 0.05, thus indicating that the chi-square distribution is well approximated. I'm not clear on how much greater than .05 the value can be, and still be acceptable. What would you use as an upper limit before concluding that the chi-square distribution is NOT well approximated?
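One way to think about this question is through the Monte Carlo sampling error of the observed rejection proportion, which depends on the number of replications. The sketch below uses the normal approximation to the binomial; the replication counts (500 and 10000) are hypothetical, since the post does not say how many replications the example used, and the function name is made up for illustration.

```python
import math


def rejection_rate_interval(nominal: float, n_reps: int, z: float = 1.96):
    """Approximate 95% range for the observed rejection proportion
    when the true rejection rate equals the nominal level."""
    se = math.sqrt(nominal * (1.0 - nominal) / n_reps)
    return nominal - z * se, nominal + z * se


if __name__ == "__main__":
    for n_reps in (500, 10000):  # hypothetical numbers of replications
        low, high = rejection_rate_interval(0.05, n_reps)
        print(f"nreps={n_reps}: expect roughly {low:.4f} to {high:.4f}")
```

Under these assumptions an observed 0.058 sits inside the range for 500 replications but outside it for 10000, so any upper limit would have to be judged relative to the number of replications.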
Dear MPlus
We are studying DIF under small sample conditions to determine the degree of bias in parameter and se values. We are using a MIMIC model for DIF detection. In the simulation we have varied the sample size (400 to 1000) and expected the bias values to decrease as sample size increased which was not the case. Consequently, we are concerned that we may have misspecified our simulation model. We have split the population as being either high=2 or low=1 motivation (r_att_1). The factor Improvement (Improv) has 6 items each of which has ordered categorical responses (6 options, with 5 thresholds). We appreciate your comments as to whether the parameter values (esp. the residual variances) have been specified correctly.
MODEL POPULATION:
[r_att_1 @1.5]; ! mean of the IV
r_att_1 @.25 ; ! variance of the IV
Improv BY i1@1 i2-i6*.6;
Improv ON r_att_1*.8;
i1 ON r_att_1*.24; ! i6 not included
i2 ON r_att_1*.24;
i3 ON r_att_1*.24;
i4 ON r_att_1*.24;
i5 ON r_att_1*.24;
i1-i6 *.76;
Improv*.6;
The threshold values for all items are:
! thresholds
[i1$1*-.5]; [i1$2*0]; [i1$3*.5]; [i1$4*1]; [i1$5*1.5];
If you use the WLSMV estimator you have to make sure that the population values of the variances of the DVs conditional on the IVs are 1 in order to get the parameters in the metric that WLSMV uses. Your DVs are the i variables and your IV is r_att.
If this doesn't help, send complete output to Support.
Hi Bengt. We checked the documentation as to how to calculate the variance for the i variables and found this: residual variance = 1-R2. Since each i variable has 2 predictors (r_att_1 and Improv), we squared the beta values and then summed them before subtracting from 1 to get the amount for the error variance. In the example provided, we have beta paths of .6 (Improv) and .24 (r_att_1), so the R2 = .36 and .0576 respectively. When summed we get R2 = .4176. Subtracting from 1 gives .5824, which would lead to i1-i6 *.58; is that right? Thanks for clarifying.
That would only be correct if Improv and r_att_1 are uncorrelated, but r_att_1 influences Improv. Instead, express the i variables in terms of r_att_1, so Improv is no longer in the picture. That is, using the fact that i is a function of Improv which is a function of r_att_1 and a residual.
Bengt thanks. I agree that my calculation depends on the assumption of independence which is not the case in our model. However, your explanation still leaves me uncertain as to how to calculate an appropriate value for the i variable residual. Should we adjust the Improv beta weight (.6) by the beta weight from r_att_1 to Improv (.8) before determining residual variance? This is what I understand your comment means: (square(.6*.8))=.23 as the new effect of Improv and add it to .0576 the effect of r_att_1. This would give us a residual of .7124.
This seems awfully high to me as a residual but if it's right then we go with it. Please advise if I've understood correctly.
Hi Bengt
Please check that we have understood this correctly from your instructions. The formula for e is
e = 1-(lam*beta+gamma)*x-(lam*d)
lam = regression from f to I var = .60
beta = regression from x to Improv = .80
gamma = regression from x to I var = .24
x = variance of x = .25
d = residual of f = .36
Thus, e = 1-(.6*.8+.24)*.25-(.60*.36) = .604
Please advise if we got this right. Thanks
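The algebra behind the earlier reply can be written out as a small calculation. This is a sketch under stated assumptions, not the official answer to this post: it takes the reply's requirement that the variance of each i variable conditional on r_att_1 equals 1, writes i = (lam*beta + gamma)*x + lam*d + e, and notes that conditional on x only lam*d + e varies. It also assumes that Improv*.6 in MODEL POPULATION is the residual variance of Improv. The function names are hypothetical.

```python
def residual_variance_given_x(loading: float, factor_resid_var: float) -> float:
    """Residual variance of an item so that Var(item | x) = 1.

    Conditional on x, Var(item | x) = loading**2 * Var(d) + Var(e),
    where d is the residual of the factor regressed on x.
    """
    return 1.0 - loading ** 2 * factor_resid_var


def total_variance(loading, beta, gamma, var_x, factor_resid_var, resid_var):
    """Unconditional item variance implied by the same model (for reference)."""
    total_effect_of_x = loading * beta + gamma
    return (total_effect_of_x ** 2 * var_x
            + loading ** 2 * factor_resid_var
            + resid_var)


if __name__ == "__main__":
    psi = 0.6  # assumed: residual variance of Improv from MODEL POPULATION
    theta_free = residual_variance_given_x(0.6, psi)   # items with loading .6
    theta_first = residual_variance_given_x(1.0, psi)  # i1, loading fixed at 1
    print(round(theta_free, 4), round(theta_first, 4))
    print(round(total_variance(0.6, 0.8, 0.24, 0.25, psi, theta_free), 4))
```

Other values for the factor residual variance, such as the .36 mentioned in the post, can be passed in the same way. The direct effect gamma and the variance of x drop out of the conditional variance, which is why they do not appear in the first function.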
Dear Dr. Muthen, I have run a complex two-wave model (df=800, N = 595) with a second-order factor. The ratio of the sample size to estimated parameters is about 5. This is not consistent with rules of thumb as given by Kline (1998). Would it be correct to use a Monte Carlo analysis, such as example 12.7, to prove that the estimates are "correct"? Thanks, Christoph Weber
burak aydin posted on Wednesday, September 19, 2012 - 5:10 pm
Hello, I generate the data externally. I use monte carlo facility to analyze them. I save both results and the output. I can see which iterations did not run, but I would like to automate this process. I would like to see an iteration indicator on the results file or a more efficient way to see failed iterations rather than visual check (ctrl+f "did not"). Is this possible with Mplus? or do you know any other way? Thank you very much.
The results file includes a replication number. If you ask for TECH9, you will see error messages related to each replication.
burak aydin posted on Thursday, September 20, 2012 - 12:59 pm
Hello Dr. Linda, We use version 6.11, the results file does not include a rep number. These are what it saves in our case
RESULTS SAVING INFORMATION
Order of data
Parameter estimates (saved in order shown in Technical 1 output)
Standard errors (saved in order shown in Technical 1 output)
Chi-square : Value
Chi-square : Degrees of Freedom
Chi-square : P-Value
CFI
TLI
H0 Loglikelihood
H0 Scaling Correction Factor for MLR
H1 Loglikelihood
H1 Scaling Correction Factor for MLR
Number of Free Parameters
Akaike (AIC)
Bayesian (BIC)
Sample-Size Adjusted BIC
RMSEA : Estimate
SRMR : Between level
SRMR : Within level
Condition Number
And yes, I ask for TECH9 and find which iterations did not terminate normally. I do it by ctrl+f "did not". What I need to accomplish is having j rows in the results file if I run j iterations. If iteration number i did not run, I want to insert -99s in row i. Thank you
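The padding step described in this post can be done outside Mplus once the failed replications are known. The sketch below is one possible approach under two assumptions that are not confirmed by the thread: that the saved results contain one row per completed replication in replication order, and that the numbers of the failed replications have already been collected (for example by reading the TECH9 messages). The function name and arguments are hypothetical.

```python
from typing import List, Set

MISSING = -99.0  # placeholder value requested in the post


def pad_results(completed_rows: List[List[float]],
                failed_reps: Set[int],
                n_reps: int,
                n_cols: int) -> List[List[float]]:
    """Return n_reps rows, inserting a row of -99s for every failed replication.

    completed_rows: results for the replications that ran, in order.
    failed_reps: 1-based replication numbers that did not terminate normally.
    """
    if len(completed_rows) != n_reps - len(failed_reps):
        raise ValueError("row count does not match n_reps minus failed replications")
    remaining = iter(completed_rows)
    padded = []
    for rep in range(1, n_reps + 1):
        if rep in failed_reps:
            padded.append([MISSING] * n_cols)
        else:
            padded.append(next(remaining))
    return padded


if __name__ == "__main__":
    rows = [[0.51, 0.08], [0.47, 0.09]]  # made-up results for replications 1 and 3
    for row in pad_results(rows, failed_reps={2}, n_reps=3, n_cols=2):
        print(row)
```

Raising an error on a row-count mismatch is a deliberate choice here: it catches the case where the list of failed replications is incomplete before any rows are misaligned.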
Dear Dr. Muthen, I am using the Monte Carlo facility in MPLUS just to generate data but I am not fitting any model to the data. However, I get the following warning for all replications when I have a sample size of 100:
THE STANDARD ERRORS OF THE MODEL PARAMETER ESTIMATES MAY NOT BE TRUSTWORTHY FOR SOME PARAMETERS DUE TO A NON-POSITIVE DEFINITE FIRST-ORDER DERIVATIVE PRODUCT MATRIX.
Since I am not fitting any model to the data I am not sure how to interpret the warning. I wonder if it is a default warning when the number of parameters is larger than the sample size.
Ray Cheung posted on Thursday, January 10, 2013 - 5:43 pm
When I use TYPE=montecarlo and request tech9, there are condition codes in some of the replications. Is there a way to ask Mplus to report the average results ignoring the replications with error codes? Thank you
I am interested in saving the means and standard deviations for the raw data in each replication of a multi-group Monte Carlo analysis with 10 variables. I would like to be able to produce a matrix of this information for use in an external program. Is this currently possible? Thank you.
This is not possible. You can save the data from each replication but not the means and standard deviations.
Geumju LEE posted on Sunday, March 03, 2013 - 9:02 pm
Hello. I'm trying to simulate an unconstrained latent interaction model by Monte Carlo. First I generated 1000 data sets of y1-y3, x1-x3, z1-z6. The population values of the main effects of X and Z are both set to 0.4, the population correlation between X and Z is set to 0.3, and the population interaction effect is set to 0.2. Then I tried to analyze the 1000 data sets I generated. This is part of the output.
                 ESTIMATES                  S. E.
        Population   Average   Std. Dev.   Average
X BY
  X1      1.000      1.0166     0.0685     0.0668
  X2      1.000      0.8779     0.0616     0.0612
  X3      1.000      0.8275     0.0594     0.0591
. . . . . . .
Y ON
  X       0.000      0.4077     0.1011     0.0951
  Z       0.000      0.4077     0.0973     0.0940
  XZ      0.000      0.2005     0.0955     0.0895
X WITH
  Z       0.000      0.2960     0.0734     0.0711
I don't understand what the column labeled 'POPULATION' means. I didn't set either 1.000 or 0.000. How did I get these values? Please explain the meaning of these. Thanks in advance.
The population values are taken from the MODEL command. They are used to compute coverage.
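To make the role of the population column concrete, the following sketch computes 95% coverage as defined at the top of this thread: the proportion of replications whose 95% confidence interval contains the population value. The estimates and standard errors are made-up numbers, and the function is a hypothetical illustration rather than the exact Mplus computation.

```python
def coverage(estimates, std_errors, population_value, z=1.96):
    """Proportion of replications whose interval est +/- z*se
    contains the population value."""
    hits = sum(
        1
        for est, se in zip(estimates, std_errors)
        if est - z * se <= population_value <= est + z * se
    )
    return hits / len(estimates)


if __name__ == "__main__":
    # Made-up replication results scattered around a true value of 0.4.
    estimates = [0.35, 0.42, 0.48, 0.39]
    std_errors = [0.10, 0.10, 0.10, 0.10]
    print(coverage(estimates, std_errors, population_value=0.4))  # 1.0
    print(coverage(estimates, std_errors, population_value=0.0))  # 0.0
```

The second call shows why the values given in the MODEL command matter: if the population column holds 0.000 while the data were generated with 0.4, the reported coverage is measured against the wrong target.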
Geumju LEE posted on Wednesday, March 06, 2013 - 6:04 am
Thank you for your reply. However it couldn't answer my question. So I'm writing my question again with more detailed information.
Actually, the average values are similar with the values that I set in DATA GENERATION stage. (I conducted monte carlo simulation in separate stages of 'data generation' and 'analysis of unconstrained approach'.) I set from x1 to x3 as 1.02, 0.88, and 0.83 respectively when I generate data, and I got the AVERAGE values of the ANALYSIS output. The other values of indicators that I set in data generation are also similar with the average values.
Although I've set the POPULATION values neither 1s nor 0s in Model command, I've got 1s and 0s in unconstrained ANALYSIS output. I don't understand what 'POPULATION' means and how I got these. By any chance, aren't these values just for filling the blanks?
John Plake posted on Monday, July 29, 2013 - 9:39 am
I am running a monte carlo simulation for sample size and power (Muthen & Muthen, 2002) on a hypothesized four-factor CFA model with 12 indicators. When I run it according to the description in the literature, it works flawlessly. Parameter and SE bias along with power indicate that N > 44 should work.
However, when trying to extend that simulation to include a single second-order factor in place of the first-order correlations, the output gets wonky. Standard error bias is off the charts, even with N = 2,000. I'm sure I'm mis-specifying something, but I can't find any sample syntax for a second-order CFA in a montecarlo simulation.
The specific problem I'm having is in the theta matrix. Here is the syntax I'm using...
MODEL POPULATION:
CPERF by CP_P1-CP_P3*0.8;
PARTNER BY PA_P1-PA_P3*0.8;
TPERF BY TP_P1-TP_P3*0.8;
TWORK BY TW_P1-TW_P3*0.8;
MJP BY CPERF-TWORK*.6;
MJP@1;
CP_P1-TW_P3*.36;
CPERF-TWORK*.64;
MODEL:
CPERF by CP_P1-CP_P3*0.8;
PARTNER BY PA_P1-PA_P3*0.8;
TPERF BY TP_P1-TP_P3*0.8;
TWORK BY TW_P1-TW_P3*0.8;
MJP BY CPERF-TWORK*.6;
MJP@1;
CP_P1-TW_P3*.36;
CPERF-TWORK*.64;
It looks like all of the first-order factors have all factor loadings free. In this case, you must fix the factor variances to one.
John Plake posted on Monday, July 29, 2013 - 1:52 pm
Thanks, Linda! Somehow I forgot that you can't estimate both loadings and variances at the same time. [facepalm]
Wang Shan posted on Monday, October 28, 2013 - 8:59 pm
Hi Linda, I am doing a CFA Monte Carlo simulation study, and I have a question. As we know, in the MODEL POPULATION part, values such as factor loadings are fixed to the true values. But I'm not sure whether, in the MODEL part, I should give the true values as starting values for the analysis or just use ordinary values such as 1, -1 and so on. If we use the true values as starting values, will it influence the estimated results? Here is my syntax; I'm not sure which one to choose.
In the MODEL command you should give the population values. The values given in the MODEL command are the values that are used for coverage. See Example 12.1 where the MODEL POPULATION and MODEL command are described.
Wang Shan posted on Tuesday, October 29, 2013 - 8:40 pm
I have checked the user's guide and found that I had misunderstood the MODEL command.
I am running a Monte Carlo simulation for a growth model with 10 time points (y1-y10) and a single dichotomous predictor (tx). I am attempting to use the MODEL MISSING command to generate a steady increase in missing data that ends in roughly 20% missing data by the final time point. However, in the section of the output that lists the summary of missing data for the first replication, a few of the missing data patterns appear to show close to 90% or 100% missing across all time points. Is this right, or have I completely misunderstood how to go about coding for missingness? Thanks for your help!
MODEL MISSING:
[y1-y10@-15];
y1 on tx@0;
y2 on tx*10.405;
y3 on tx*11.108;
y4 on tx*11.523;
y5 on tx*11.821;
y6 on tx*12.248;
y7 on tx*12.5576;
y8 on tx*13.0075;
y9 on tx*13.3417;
y10 on tx*13.613;
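Since MODEL MISSING describes missingness through logistic regressions, the intercepts and slopes above can be converted to probabilities to check what they imply. The sketch below does that conversion, assuming tx is coded 0/1; the helper name is hypothetical and the values for tx outside 0/1 are included only to show how sensitive these logits are to the coding of tx.

```python
import math


def missing_probability(intercept: float, slope: float, tx: float) -> float:
    """Probability of a missing value from a logistic regression on tx."""
    logit = intercept + slope * tx
    return 1.0 / (1.0 + math.exp(-logit))


INTERCEPT = -15.0
SLOPES = {
    "y1": 0.0, "y2": 10.405, "y3": 11.108, "y4": 11.523, "y5": 11.821,
    "y6": 12.248, "y7": 12.5576, "y8": 13.0075, "y9": 13.3417, "y10": 13.613,
}

if __name__ == "__main__":
    for name, slope in SLOPES.items():
        p0 = missing_probability(INTERCEPT, slope, tx=0)
        p1 = missing_probability(INTERCEPT, slope, tx=1)
        print(f"{name}: P(missing | tx=0) = {p0:.6f}, P(missing | tx=1) = {p1:.3f}")
    # A value of tx above 1 pushes the logit far above zero.
    print(missing_probability(INTERCEPT, SLOPES["y10"], tx=2))
```

With tx = 1 the probabilities rise from about 1% at y2 to about 20% at y10, while with tx = 0 they stay near zero, so the overall missing rate depends on the distribution of tx in the generated data.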
Either one factor loading or the factor variance must be fixed at one to set the metric of the factor.
Tracy Zhao posted on Thursday, January 23, 2014 - 6:23 pm
Oh, so two questions follow:
1. If I write it as "f BY y1*1;", it is possible that 1 is just a starting value that can change to any other number in the modeling process, right? But if I write it as "@1" then it will be fixed to 1 instead of other value? Is my understanding correct?
2. Why do I need starting value in MODEL POPULATION? Am I not just specifying the model parameters? Under what circumstance would I want the model parameters I specified to change to other numbers? Perhaps you can recommend some readings if this is too hard to explain here.
In MODEL POPULATION there is no difference between @ and *. In MODEL there is. * designates a starting value for a free parameter. @ fixes a parameter to the value that follows.
MODEL POPULATION gives the population parameter values for data generation. See Example 12.1. All of the commands and options are explained. See also the MONTECARLO command in the user's guide.
Tracy Zhao posted on Friday, January 24, 2014 - 10:22 am
I see. Thanks a lot Dr. Muthen!
Tracy Zhao posted on Tuesday, January 28, 2014 - 9:11 am
Hi, I don't know where this question should go, so I am just going to ask it here: if I am batch running Mplus for a large simulation study, can I still run Mplus for a different project (using the editor and run from the editor)? Thanks!
Use the SVALUES option of the OUTPUT command. You will receive input with the final estimates as starting values.
A model is estimated conditioned on the covariates. Their means, variances, and covariances are not model parameters. However, you need to specify them in MODEL POPULATION to generate data. Do a TYPE=TWOLEVEL BASIC with no MODEL command to find these values.
Linda, I entered variances/means to the Model Population accordingly with your suggestion, but how to enter the covariances if 2 variables belong to the within level and one to the between level?
I get this error message:
*** ERROR in MODEL POPULATION command
Between-level variables cannot be used on the within level. Between-level variable used: g
*** ERROR in MODEL POPULATION command
Between-level variables cannot be used on the within level. Between-level variable used: g
*** ERROR
The following MODEL POPULATION statements are ignored:
* Statements in the WITHIN level:
STIMTYPL WITH g
ORD WITH g
Your COUNT statement specifies how to analyze Y. You need to specify how to generate Y, adding
GENERATE = u1(nb);
This is like in the second part of UG ex 3.8. Note that all the UG examples have Monte Carlo versions which are posted on our website under Mplus User's Guide Examples.
mpduser1 posted on Wednesday, August 13, 2014 - 7:03 pm
That's very helpful. Thank you.
mpduser1 posted on Thursday, August 14, 2014 - 9:11 am
Via the error message I noted above, am I correct in understanding that I cannot introduce a model misspecification where I specify a negative binomial for a POPULATION model, but a normally-distributed Y for the analytic model?
You can do that using Mplus Monte Carlo in two steps: First generate the data by one model; then analyze the data by another (called "external Monte Carlo"). See the User's Guide chapter 12 for examples of 2-step Monte Carlo approaches.
Yueqi Yan posted on Monday, August 25, 2014 - 5:35 pm
Hi Dr. Muthen,
I am doing a Monte Carlo simulation on the efficiency of planned missing data designs. In the attached syntax, there are 84 missing data patterns, and I got an error message as follows.
*** ERROR in MONTECARLO command
The number of sets of PATMISS variables does not match the number of patterns in PATPROBS.
I tried to reduce the number of patterns, and found that when there are 64 patterns or fewer there seems to be no problem, but when there are more than 64, the error message comes back.
I was wondering if Mplus has a limit on the number of missing data patterns. Or is there something wrong with my code? In my study I also have a design that needs 2*64 patterns.
Hi Dr. Muthen, I would like your help simulating data for a multiple-group nonlinear SEM with the following model.
NAMES = Y1-Y10;
ANALYSIS: TYPE = RANDOM; ALGORITHM = INTEGRATION;
ANALYSIS: ESTIMATOR = ML;
MODEL:
x1 BY Y1 Y2 Y3 Y4;
x2 BY Y5 Y6;
x3 BY Y7 Y8;
x4 BY Y9 Y10;
x4 on x1 x2 x3;
x1 with x2;
x2 with x3;
Xx1x2 | x1 XWITH x2;
Xx1x1 | x1 XWITH x1;
Xx2x2 | x2 XWITH x2;
x4 ON Xx1x2;
x4 ON Xx1x1;
x4 ON Xx2x2;
Here all values of lambda are 0.8, all values of gamma are 0.6, and all values of mu for y1-y10 are 0.
I hope you can help me with this because I am new to Mplus. I appreciate your help and your time.
Hi Dr. Muthen, I am trying to write the code below:
montecarlo:
names = y1-y10;
generate y1-y10(1);
categorical = y1-y10;
ngroups = 2;
nobs = 200 200;
nreps = 100;
SEED = 53487;
save = mplus.dat;
analysis:
type = random;
model population:
g1
Your model has categorical outcomes and continuous factors and XWITH. This means that ML needs to be used with numerical integration. Multiple-group analysis in this case is done with Type=Mixture and Knownclass - see examples in the User's Guide for how to do this.
Thank you so much for your help. Actually, I want to get simulated data without applying any estimator. After adding type = mixture and knownclass I still get an error:
*** ERROR in MONTECARLO command
Unknown option: KNOWNCLASS
title: this is an example of a multiple group nonlinear SEM with categorical variables
The NGROUPS option of the MONTECARLO command has been extended for use with TYPE=MIXTURE. It is used to specify the number of classes to be used for data generation and in the analysis. The program automatically assigns the label %g#1% to the first class, %g#2% to the second class, etc. These labels are used in the MODEL POPULATION and MODEL commands.
Mika S. posted on Tuesday, November 25, 2014 - 6:23 am
Hi! I was asked by a reviewer to conduct a post-hoc power analysis for bivariate cross-lagged models because some of the cross-lagged coefficients were of rather small magnitude but significant (e.g., STDXY cross-lagged betas were around .08 and p was below .05). My sample size was, in my opinion, in a quite normal range for such studies (N = 700). My first question is whether a post-hoc power analysis really makes sense in this case. My initial thought was that adding 95% CIs to the betas would be a more suitable way "to better understand the small but significant effects" (core request of the reviewer)?
Anyway, a first look at post-hoc power analysis (EX 12.7) revealed some problems for me and my specific data set. My initial bivariate cross-lagged model was based on complex/cluster analysis (data are clustered in schools). I do not want to conduct multilevel analysis because school effects are not the main aim of my study. My second question thus is: Is there any way to conduct the two-step EX 12.7 with the original complex/cluster analysis (since Monte Carlo, at a first try, does not accept "complex/cluster" in step 2), and how should I do this?
Third question: Is there any chance to use a subset of variables in EX 12.7 for analyses or do I have to "cut" the datasets for these analyses in SPSS? "Usev" does not seem to work. Many thanks!
You would have to generate the data using a Type=two-level model in step 1 to get the complex survey features, then analyze in step 2 using Type=Complex. So that doesn't sound desirable since step 1 would then go beyond your initial model (which is not twolevel). If the Type=Complex SEs in your real-data run aren't that different from a run where you ignore Complex you are in a better position.
I don't know that the simulation adds much. I guess it can tell you: If I generate data with my sample size assuming my analysis model and parameter values are correct, do I get these low SEs/these p-values? If you don't, that's telling you that your model has some misspecification that causes small SEs.
Can I do an internal montecarlo for a simple path model where the population model specifies an exogenous variable with a non-normal distribution?
Specifically, I am thinking about a power study that included age as an exogenous variable and there is a fairly uniform distribution of 5 (60 mo.) to 7 (84 mo) year old kids. Or, where the exogenous variable is a proportion and the distribution is slightly U-shaped?
I can come up with a variance, but I do not want a normal distribution in either case.
I don't know off-hand how you would generate a uniform distribution. A U-shape can be obtained using a mixture of two normals with means sufficiently apart. Non-normals can also be generated using the new skew-t techniques discussed in the paper on our website:
Asparouhov, T. & Muthén B. (2014). Structural equation models and mixture models with continuous non-normal skewed distributions. Web Note 19. Version 2. Forthcoming in Structural Equation Modeling.
But, I wouldn't think your power results really are very strongly dependent on such deviations from a normal covariate.
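The mixture-of-two-normals suggestion in this reply can be checked with a quick density calculation. The sketch below is only an illustration: the component means (0.2 and 0.8), the common standard deviation (0.12) and the equal weights are hypothetical choices for a proportion-like covariate, and the result is a bimodal shape that is low in the middle rather than a strictly bounded U. The variance of a uniform variable on an interval is included for reference, since the post mentions a roughly uniform age range of 60 to 84 months.

```python
import math


def normal_pdf(x: float, mean: float, sd: float) -> float:
    """Density of a normal distribution."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def mixture_pdf(x: float, mean1: float, mean2: float, sd: float,
                weight1: float = 0.5) -> float:
    """Density of a two-component normal mixture with a common sd."""
    return (weight1 * normal_pdf(x, mean1, sd)
            + (1.0 - weight1) * normal_pdf(x, mean2, sd))


def uniform_variance(lower: float, upper: float) -> float:
    """Variance of a uniform distribution on [lower, upper]."""
    return (upper - lower) ** 2 / 12.0


if __name__ == "__main__":
    for x in (0.2, 0.5, 0.8):
        print(x, round(mixture_pdf(x, 0.2, 0.8, 0.12), 3))
    print(uniform_variance(60, 84))  # 48.0
```

The density at the midpoint is far lower than near either component mean, which is the dip the reply describes; how far apart the means need to be depends on the standard deviation chosen.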
I am going to assume normally distributed co-variates as you suggested earlier. It keeps things much simpler.
I am now wondering about the power for testing a hypothesis of no effect.
I am using a path model with 3 tests regressed on a single covariate. The tests are correlated because they all measure the same capability. The covariate captures the predominance of language#1 or language#2 in bilingual households.
Two of the tests are expected to be slightly biased by the match between the test's language and the dominant language in the household. I have chosen effect sizes of interest and used them in my data generation model, and can easily determine power from the rightmost column in the monte carlo output.
However I used a near zero effect for the third test because it is supposed to be immune to household language dominance.
So I really want to know the power that this coefficient is zero.
In an ordinary path model I could use MODEL TEST, but I am not sure how to do this in the monte carlo model. I considered using MODEL CONSTRAINT to make two new parameters that expressed the difference of the 3rd test with each of the others - but this is almost the same as the power testing I was already doing.
If I pick a deviation from zero that is of clinical interest, how can I estimate the power to detect that the regression of test#3 on the covariate is less than that deviation? Is there any relevant example?
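For reference, the power read from the rightmost column of the Monte Carlo output is the proportion of replications in which the coefficient is significantly different from zero at the .05 level. The sketch below reproduces that idea from per-replication estimates and standard errors; the numbers are made up and the function name is hypothetical.

```python
def empirical_power(estimates, std_errors, z_crit=1.96):
    """Proportion of replications with |estimate / se| above the critical value."""
    significant = sum(
        1 for est, se in zip(estimates, std_errors) if abs(est / se) > z_crit
    )
    return significant / len(estimates)


if __name__ == "__main__":
    # Made-up results from four replications: z = 2.5, 1.0, 3.0, 2.2
    estimates = [0.25, 0.10, 0.30, 0.22]
    std_errors = [0.10, 0.10, 0.10, 0.10]
    print(empirical_power(estimates, std_errors))  # 0.75
```

When the population value of a coefficient is zero, this same proportion estimates the Type I error rate rather than power, which is why it cannot by itself answer the question about showing that an effect is absent.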
I have run a power analysis for a single parameter with the Monte Carlo simulation and also by estimating the noncentrality parameter as described at http://statmodel.com/power. I'm getting a higher power estimate (.98 vs. .85) in the Monte Carlo analysis and I'm just wondering if this is to be expected, or if I have done something wrong?
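The noncentrality-based calculation mentioned in this post can be sketched for the single-parameter case. With 1 degree of freedom, a noncentral chi-square variable is the square of a normal variable with mean sqrt(ncp), so the power follows from the normal distribution alone. This is a generic illustration with a hypothetical function name, not the procedure from the website, and the noncentrality value used below is made up.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def power_from_ncp_1df(ncp: float, z_crit: float = 1.959964) -> float:
    """Power of a 1-df chi-square test at the .05 level for a given
    noncentrality parameter: P(|Z + sqrt(ncp)| > z_crit)."""
    delta = math.sqrt(ncp)
    return (1.0 - normal_cdf(z_crit - delta)) + normal_cdf(-z_crit - delta)


if __name__ == "__main__":
    print(round(power_from_ncp_1df(0.0), 3))    # 0.05
    print(round(power_from_ncp_1df(7.849), 3))  # 0.8
```

Comparing this number with the Monte Carlo proportion for the same parameter shows how far the two approaches differ for a given model and sample size.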
In a two-level Monte Carlo study with categorical outcomes (WLSM estimator), how can I get WRMR fit indices in the output? When I generate and analyze the data in the same input file, the output only provides Chi-Square and RMSEA. However, if the data are first generated and then analyzed using the generated data sets, the output gives Chi-Square, RMSEA, CFI, TLI, SRMR-W and SRMR-B, but does not give WRMR.
My first question: How can I get WRMR in two-level SEM using WLSM?
Second: Is there any way to see all fit indices in the output file when using just one input file to simulate and analyze all replications?
1. WRMR has not been sufficiently studied for two-level.
2. These are not available due to an output glitch.
QianLi Xue posted on Thursday, May 21, 2015 - 9:46 am
On page 803 of User's Guide, there is this sentence about the SAVE command within the Monte Carlo Statement: "The variables are not always saved in the order that they appear in the NAMES statement." If there are no clear rules on how the variables simulated in the Monte Carlo are stored, how to tell which is which in the output dataset?