Monte Carlo Simulation
Mplus Discussion > Categorical Data Modeling >
 Anonymous posted on Thursday, February 03, 2000 - 12:16 pm
Studies based on small samples are very common in the social and behavioral sciences, and one seldom knows the power behind their findings. Would it be possible to incorporate Monte Carlo procedures in a study with a small sample? Has anyone done so in "content" studies rather than methodological studies? Can you give me some guidelines, especially on using Mplus?
 Bengt O. Muthen posted on Wednesday, February 09, 2000 - 3:34 pm
I think it is a very good idea to include a small Monte Carlo study also in "content" studies. This would give great insights into the quality of the results. Often, parameter estimates are good even at small samples but the quality of the standard errors, parameter coverage, the power of detecting effects, and the quality of overall tests of fit may be in question. When the parameter estimates are likely to be dependable they can be taken as rough population values for a Monte Carlo simulation. The population mean vector and covariance matrix can be computed for any model by fixing each parameter at its population value and requesting RESIDUAL (see estimated mean vector and covariance matrix). I have not, however, seen Monte Carlo approaches taken in content studies, but it is possible that this idea has been used. In my 1997 Psychological Methods article with Curran, we did something akin to this to study power in our real-data application (see also the MacCallum article referred to in that article). In that article, power was computed both via Monte Carlo simulation and using population values (Satorra-Saris approach).

Mplus allows Monte Carlo simulations in an automated fashion (data are generated, analyzed, and result summaries presented by Mplus) for several analysis types. See Chapter 29 of the User's Guide. Exceptions include twolevel and mixture analysis and for such cases, Monte Carlo simulated data can be generated outside Mplus as my research group often does. It would be good if articles including Monte Carlo were published to show the usefulness of the approach.
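
As a rough illustration of what such a Monte Carlo study summarizes (power and coverage across replications), here is a minimal Python sketch run outside Mplus. It is not Mplus's implementation: the model is a plain one-predictor regression, the slope values and sample sizes are made-up population values, and significance is judged with a normal approximation (|z| > 1.96). The function name simulate_power is hypothetical.

```python
import math
import random

def simulate_power(n, slope, reps=500, seed=1):
    """Estimate power and 95% coverage for a regression slope by simulation.

    Assumed population model: y = slope * x + e, with x and e standard normal.
    """
    rng = random.Random(seed)
    significant = 0
    covered = 0
    for _ in range(reps):
        # Generate one replication from the assumed population values.
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        y = [slope * xi + rng.gauss(0.0, 1.0) for xi in x]
        # Ordinary least squares estimate of the slope and its standard error.
        mx = sum(x) / n
        my = sum(y) / n
        sxx = sum((xi - mx) ** 2 for xi in x)
        sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
        b = sxy / sxx
        a = my - b * mx
        sse = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
        se = math.sqrt(sse / (n - 2) / sxx)
        # Power: share of replications in which the slope is significant.
        if abs(b / se) > 1.96:
            significant += 1
        # Coverage: share of replications whose interval contains the true slope.
        if b - 1.96 * se <= slope <= b + 1.96 * se:
            covered += 1
    return significant / reps, covered / reps

power_small, cover_small = simulate_power(n=30, slope=0.3)
power_large, cover_large = simulate_power(n=200, slope=0.3)
print(power_small, cover_small, power_large, cover_large)
```

Comparing the two calls shows how power grows with sample size while coverage should stay near its nominal level.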
 Anonymous posted on Tuesday, October 03, 2000 - 3:15 pm
I tried the Monte Carlo examples in Chapter 29 of the User's Guide, but I get the error message: Insufficient data in "monte.dat". Why? How can I fix it?
 Linda K. Muthen posted on Tuesday, October 03, 2000 - 4:02 pm
You must add a line to the data for the means of the dependent variables. This was left out in the first printing of the User's Guide. See the description of the FILE statement for Monte Carlo in Chapter 12.
 Subert Wu posted on Friday, October 06, 2000 - 2:52 pm
I am using Mplus for a Monte Carlo simulation study. I want to generate data for 1000 replications, but the SAVE command only allows me to save the first one. Please tell me how to save the data for the remaining 999 replications.
 Linda K. Muthen posted on Friday, October 06, 2000 - 6:12 pm
You can save data from only the first replication. There is no way to save the other replications.
 allison tracy posted on Wednesday, August 28, 2002 - 3:47 pm
I have previously conducted a 1-factor model with about 20 dichotomous indicators. I am finding that the item difficulty values (threshold/loading) are spotty in the lower portion of the factor score continuum. In order to create viable factor scores outside the context of Mplus, I have composited the items into 4 continuous-level indicators by averaging over sets of dichotomous items with widely varying item difficulty levels. This way, I can use the factor score coefficient matrix to estimate the factor scores in a straightforward way without iterative procedures. This model fit the data very well, but the factor scores constructed from the factor score coefficient matrix are very susceptible to the underidentification in the lower end of the factor - the distribution is very skewed.

I plan to collect data on the original indicators as well as a number of new dichotomous indicators that I hope will adequately measure the same factor and fill in the needed item difficulty levels. I am thinking I need to construct the "testlets" as before and set the measurement model parameters of these testlets equal to the results I obtained in my current dataset, then estimate the new indicators' measurement parameters freely.

This is a very long-winded way of saying that I am trying the Monte Carlo feature of Mplus for the first time to determine the sample size needed to obtain stable and unbiased measurement parameter estimates for new dichotomous items with a variety of factor loading and threshold values, holding the rest of the factor model to the values originally obtained from the continuous indicator model. How do I generate a set of data of this sort, where population parameters drive the generation of continuous and dichotomous data? Must I try to construct a reasonable variance/covariance matrix (out of thin air?) of the 4 continuous and 4 hypothetical dichotomous indicators in order to generate the data, or can I do something akin to the approach in the Monte Carlo examples in the most recent addendum? My concern is that the mixture modeling approach used in the addendum example will not allow me to use the CUTPOINT and CATEGORICAL options I need. As always, I am very grateful for your help and amazed at the attention you give to us Mplus groupies.
 bmuthen posted on Thursday, August 29, 2002 - 9:49 am
Yes, to do Monte Carlo with categorical outcomes and a continuous latent variable, the current Mplus requires you to go through the older Monte Carlo track (not the mixture track) and therefore construct a population covariance matrix from which the data are drawn. But not constructed out of thin air. The covariance matrix is for the y* variables, the continuous outcomes before categorization. The covariance matrix elements are obtained from the parameter values you hypothesize for the loadings, the factor (co-)variances, and the residual variances. You can get this matrix in a run where you pretend you have continuous outcomes, inputting say an identity covariance matrix as your "sample matrix", and fixing all parameters at the values you desire. The RESIDUAL output then gives you the estimated covariance matrix.

Note that this older Monte Carlo track does not give you the same output as the mixture Monte Carlo track. You don't get power information. But you do get chi-square information.
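
For readers who want to check the y* covariance matrix by hand, the usual factor-model expectation is Sigma = Lambda * Psi * Lambda' + Theta (loadings, factor covariances, residual variances). The Python sketch below computes it outside Mplus; the function name implied_covariance and the numbers (four indicators, loadings of .7, residual variances of .51, factor variance 1) are illustrative assumptions, not values from the post above.

```python
def implied_covariance(loadings, factor_cov, residual_var):
    """Model-implied covariance matrix: Lambda * Psi * Lambda' + Theta.

    loadings     : p x m list of lists (Lambda)
    factor_cov   : m x m list of lists (Psi)
    residual_var : length-p list (diagonal of Theta, uncorrelated residuals)
    """
    p = len(loadings)
    m = len(factor_cov)
    sigma = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(p):
            common = 0.0
            for a in range(m):
                for b in range(m):
                    common += loadings[i][a] * factor_cov[a][b] * loadings[j][b]
            # Residual variance only contributes on the diagonal.
            sigma[i][j] = common + (residual_var[i] if i == j else 0.0)
    return sigma

# Hypothetical population values: one factor, four indicators.
loadings = [[0.7], [0.7], [0.7], [0.7]]
factor_cov = [[1.0]]
residual_var = [0.51, 0.51, 0.51, 0.51]

sigma = implied_covariance(loadings, factor_cov, residual_var)
for row in sigma:
    print([round(v, 2) for v in row])
```

With these values every y* has unit variance and every pair of indicators has covariance .49, so the covariance matrix is also the correlation matrix.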
 Michael Conley posted on Thursday, February 13, 2003 - 6:03 am
Am I correct in thinking that Mplus Monte Carlo can be useful in assessing if a particular pattern of partial measurement equivalence permits reasonable estimation of parameters for two group MACS analysis? That is, if I find a particular pattern of partial measurement equivalence, I could make that pattern the population values in Monte Carlo. Then I could run Mplus specifying a model that has that partial invariance pattern across the groups. Then I could see in the output the coverage and the reasonableness of the estimated standard errors. This seems to me at first look to be useful, but maybe it's not really informative? Perhaps it will always look good no matter how weak the partial invariance?
 bmuthen posted on Thursday, February 13, 2003 - 8:08 am
I think you are right in expecting the Monte Carlo results to come out looking good even with only partial invariance. This is probably true for factor means for instance - because you have good information on the means even with only partial invariance. The deterioration of the statistical qualities happens much later - as more and more items are non-invariant - than the deterioration of the plausibility that you measure the same construct. The only parameters vulnerable to non-invariance are those that are not invariant since they are only estimated from one group. But even here, a large enough sample will give good results.
 Michael Conley posted on Thursday, February 13, 2003 - 9:17 am
Thanks for your very helpful response. I would like to follow up with an example of my situation. I am working with, say, accommodated math test scores where the accommodation is reading of questions due to low English reading skills. The content specialists/cognitive psychologists endorse that the items, even though read, still engage math ability. My preliminary results indicate well over half of the items are invariant with standard administration math items in a two group run. Given this and the endorsement of content specialists, would it be reasonable to say that there is a strong argument that the same construct is being measured? If that is the case, would the Monte Carlo analysis then give me support that the statistical qualities are there to estimate the non invariant parameters for the accommodated students?

To complicate it further, to be true to the real world scenario, I should probably run the standard administration students as a single group and determine the estimated item parameters (as test scoring would really be done). Then in the two group run wire those in as fixed values for the nonaccommodated folks and then determine which items are noninvariant with those values in the accommodated group. Is that correct? Thanks so much. I am hopeful this type of analysis will be useful in the study of these important testing issues.
 bmuthen posted on Thursday, February 13, 2003 - 9:33 am
Yes to your question in the first paragraph. Regarding the second paragraph, you certainly want to do a separate analysis of each group. But to test the (non-) invariance I think a better way is to analyze the two groups jointly, either with accommodation status as a covariate or in a 2-group run - you can then study/test (non-) invariance of each item or sets of items.
 Tao Xin posted on Friday, August 29, 2003 - 10:22 am
I want to use Mplus to do a simulation study. The population model is a CFA model that includes 9 binary indicators and three continuous latent variables. I saw similar Mplus code written by Linda and Bengt in the paper named "sample size and power". I tried to modify that code to match my research situation, but it didn't work after I added the commands related to categorical variables (cutpoints and categorical). I am wondering if Mplus can generate the binary data directly for simulation purposes, or should I generate the binary data using other software first?
Thanks in advance,
 Linda K. Muthen posted on Friday, August 29, 2003 - 10:38 am
Yes, Mplus can generate such data but not using the approach that was given in the paper. If you are generating such data to get information on power for categorical outcomes, the power information is not printed. The current version of Mplus has two approaches to Monte Carlo simulation. Version 3 will have only one and you will be able to easily do what you want. See pages 141-142 for a brief description of the current Monte Carlo facilities in Mplus. See Example 29.1A for an example of how to generate categorical outcomes in the current version of Mplus.
 Tao Xin posted on Friday, August 29, 2003 - 1:14 pm
Hi Linda,

Thank you very much for your quick response. I saw Example 29.1A, but I still have some questions about that example. In my study, I suppose that each of the three latent variables can be measured by three binary indicators, and that the residuals of the indicators are correlated in some way. So it is easy for me to propose the population parameters in this situation, but hard to construct a correlation matrix among the indicators. Example 29.1A requires a correlation matrix among the observed variables. I am wondering if there is a way to derive the correlation matrix from the population parameters.
Thanks in advance again,

 bmuthen posted on Friday, August 29, 2003 - 6:56 pm
If you don't want to compute the population covariance/corr matrix elements by usual expectation rules, you can use Mplus to generate the population covariance matrix and then simply get the correlation matrix in the usual way by dividing by the standard deviations. To get the population covariance matrix, do an ML run assuming continuous outcomes, where the input is a covariance matrix - for simplicity the identity matrix:

1
0 1
0 0 1

In this run you fix all the parameters at the values you decide. So there is no free parameter. Ask for RESIDUAL - this will give you the "estimated" covariance matrix which is the population cov matrix.
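
The last step, turning a covariance matrix into a correlation matrix by dividing by the standard deviations, can be sketched in a few lines of Python outside Mplus. The function name cov_to_corr and the 2 x 2 matrix are illustrative assumptions only.

```python
import math

def cov_to_corr(cov):
    """Divide each covariance by the product of the two standard deviations."""
    p = len(cov)
    sd = [math.sqrt(cov[i][i]) for i in range(p)]
    return [[cov[i][j] / (sd[i] * sd[j]) for j in range(p)] for i in range(p)]

# Hypothetical population covariance matrix (variances 4 and 1, covariance 1.2).
cov = [[4.0, 1.2],
       [1.2, 1.0]]

corr = cov_to_corr(cov)
print(corr)  # the off-diagonal element is 1.2 / (2 * 1) = 0.6
```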
 Tao Xin posted on Saturday, August 30, 2003 - 9:55 am
I tried to practice Monte Carlo analysis using Mplus. I typed in the Mplus code and data file listed in Example 29.1A, but it didn't work. The output showed an error message:

*** ERROR in Montecarlo command
(Err#: 59)
Invalid data in file C:\Mplus\monte\monte.dat

I re-checked my Mplus code and data file many times, and I didn't find any difference between what I typed and what is listed in the Mplus manual. Would you please tell me if there is a bug in the Mplus software? How should I fix it? Thank you very much.
 Linda K. Muthen posted on Saturday, August 30, 2003 - 10:10 am
You need to send the files to for me to see what is wrong.
 Tao Xin posted on Monday, September 08, 2003 - 10:44 am
I used Mplus to simulate CFA models with categorical observed variables. In the Mplus code I fixed all parameters at the values used to generate the covariance matrix. The estimated parameters were very close to the population values. The only thing that confuses me is that the average standard errors and the 95% coverage values are all zero. I am wondering if this is acceptable, and what kind of model fit indices could be used to assess the effect of sample size in this situation?

Thanks very much in advance.
 Linda K. Muthen posted on Tuesday, September 09, 2003 - 9:59 am
The problem is that you have the following statement:

f1 BY y1-y4*.8;

It should be f1 BY y1 y2-y4*.8;

With your statement, the y1 factor loading is a free parameter and the model is not identified so you get no standard errors. You need to also change your other BY statements in the same way.
 Tao Xin posted on Thursday, October 23, 2003 - 9:16 am
Dear Linda,

As you know, I used Mplus to simulate CFA models with categorical observed variables. It seems that Mplus doesn't provide model fit indices for this type of simulation, such as chi-square, CFI, NFI, and RMSEA. Would you please tell me how to get these indices? Thank you very much.

 Linda K. Muthen posted on Thursday, October 23, 2003 - 9:27 am
You should get a table for chi-square. The other fit indices will be available for Monte Carlo in Version 3.
 Helen Dennis posted on Monday, November 17, 2003 - 1:34 pm
Per bmuthen's message of 2/9/2000 3:34 pm, "twolevel and mixture analysis . . . in such cases, monte carlo simulated data can be generated outside mplus"
Can you suggest a reference that would indicate how to get started on this? I want to generate multilevel data using parameters of an actual multi-level data set that I have.
 Linda K. Muthen posted on Monday, November 17, 2003 - 2:22 pm
In the current version of Mplus, you can generate multilevel data inside of Mplus. See the Addendum to the Mplus User's Guide at under Product Support.
 John Painter posted on Thursday, June 17, 2004 - 1:52 pm

I am using Mplus v3 and would like to use the Monte Carlo command to generate a data file containing categorical variables with 5 categories. I am using the program MCEX5.2.INP as a starting point. Following the example on p 477 I change "generate = u1-u6(1);" to "generate = u1-u6(4);" which produces data with values of 0 or 4. What am I missing?
 bmuthen posted on Thursday, June 17, 2004 - 3:31 pm
You are on the right track, but you have probably not given population threshold values in Model Population (Model montecarlo). You need to give values for each of the 4 thresholds: u$1, u$2, u$3, u$4. The example you mention draws on the default of a zero threshold and there is only one in that example so it works without mentioning them.
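
To see why four thresholds are needed for five categories, here is a small Python sketch, run outside Mplus, that converts target category proportions into thresholds on a standard normal u* and then categorizes a u* value. It assumes a probit-type model in which u* has unit variance; the function names and the equal 20% proportions are illustrative assumptions. The resulting values are the kind of numbers one would supply as threshold population values.

```python
from bisect import bisect_right
from statistics import NormalDist

def thresholds_from_proportions(proportions):
    """Return the k-1 thresholds that cut a standard normal into k categories."""
    cumulative = 0.0
    cuts = []
    for p in proportions[:-1]:
        cumulative += p
        # Threshold = normal quantile of the cumulative proportion.
        cuts.append(NormalDist().inv_cdf(cumulative))
    return cuts

def categorize(u_star, cuts):
    """Category 0..k-1: the number of thresholds lying at or below u_star."""
    return bisect_right(cuts, u_star)

# Hypothetical target: five equally likely categories.
cuts = thresholds_from_proportions([0.2, 0.2, 0.2, 0.2, 0.2])
print([round(c, 4) for c in cuts])
print(categorize(-2.0, cuts), categorize(0.0, cuts), categorize(2.0, cuts))
```

If only one threshold is effectively in play, only two distinct values can appear in the generated data, which is consistent with the 0-or-4 outcome described in the question.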
 John Painter posted on Friday, August 20, 2004 - 10:26 am
I am using Mplus v3 and would like to use the Monte Carlo command to generate a data file containing categorical variables with 5 categories. I am using the program MCEX5.2.INP as a starting point. Following the example on p 477 I change "generate = u1-u6(1);" to "generate = u1-u6(4);" which produces data with values of 0 or 4.
RESPONSE FROM JUNE 17: You are on the right track, but you have probably not given population threshold values in Model Population (Model montecarlo). You need to give values for each of the 4 thresholds: u$1, u$2, u$3, u$4. The example you mention draws on the default of a zero threshold and there is only one in that example so it works without mentioning them.
How can I implement the solution described above in the syntax provided below?

names = u1-u4;
generate = u1-u4(2);
categorical = u1-u4;
nobs = 500;
nreps = 1;
SAVE = C:\MyDocuments\1research\factor\f3\F1CAT100L6_*.DAT ;

model population:
f1 by u1-u4*.7;
u1-u4*.51 ;
! V(u*) = 1 so that the parameter metric matches
! that of the Delta parameterization

model:
f1 by u1-u4*.7;

 bmuthen posted on Friday, August 20, 2004 - 1:21 pm
In Model Population you should include statements such as

[u1$1*.... ];

where * should be followed by a value that you choose. See the User's Guide for more information about threshold parameters.
 Istvan posted on Tuesday, February 01, 2005 - 9:57 am
Dear Linda & Bengt,

I would like to carry out a simulation study using Mplus. My data are generated with R. The problem is that I would like to save the output (estimates, fit indices, etc.) for each data set and, as far as I understand, this cannot be done in Mplus using the MONTECARLO command (as it gives only averaged values, not separate ones for each data set). Is my interpretation correct? Is there a way to do it anyway?

Thank you very much in advance.

All the best,
 Linda K. Muthen posted on Tuesday, February 01, 2005 - 7:07 pm
You can save the results for each replication of a Monte Carlo study in Version 3. You can also save each data set. See the MONTECARLO command in the Mplus User's Guide.
 Fridtjof Nussbeck posted on Tuesday, May 24, 2005 - 7:43 am
Hi Linda and Bengt,

I am wondering how Mplus can compare the expected and observed chi-square p-values in a simulation study using the WLSMV option with only one table.

As I learned from the Muthén, duToit, and Spisic paper the df may change from replication to replication depending partly on sample properties.

How can I compare different chi-square values and their p-values to their theoretical counterparts in one analysis if more than one theoretical distribution is involved?
 bmuthen posted on Wednesday, May 25, 2005 - 11:01 am
There are 4 columns in the chi-square output (see also the Mplus User's Guide discussing these 4 columns). The second column is the observed proportions column, which for WLSMV is based on the p values for the replications (proportion of p values above a certain value is recorded). The 3rd column is the expected percentile, which for WLSMV is based on the across-replication average percentile. Hope that helps.
 bmuthen posted on Wednesday, May 25, 2005 - 6:44 pm
Actually, after some more investigation, the 3rd column is based on the expected percentiles of the chi-square when using the average df across the replications.
  Anonymous posted on Tuesday, September 27, 2005 - 5:26 pm
I have N=124 with three different treatment groups and am trying to estimate power for a path model (x=treatment group status, y1, y2, and y3). Since there are not many studies out there on the subject we're trying to study, we don't really know the effect sizes.
Is it okay to assume moderate effect sizes for the paths (.2) and do the MC runs like below? I'm especially not sure about the error variances for the variables.
Thank you very much!

[x1@0]; x1@1;
[x2@0]; x2@1;
y1 ON x1 *.2 x2 *.2;
y2 ON x1 *.2 x2 *.2;
Y3 on x1 *.2;
Y3 on x2 *.2;
y3 on y1 *.32;
y3 on y2 *.32;
[y1*0 y2*0 y3*0];
 Linda K. Muthen posted on Tuesday, September 27, 2005 - 5:28 pm
You might want to consider using your data to generate the population values for data generation. See Example 11.7 in the Mplus User's Guide.
 bmuthen posted on Tuesday, September 27, 2005 - 6:14 pm
Regarding the residual variances, it looks like you are getting an R-square greater than 50%, which may be high depending on the application area. If so, you might want to reduce the residual variances.
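
One way to check an R-square like this by hand is sketched below in Python, outside Mplus. The path coefficients (.2 and .32) come from the input shown above; everything else is an assumption made for illustration, because the residual variances are not shown in the post: x1 and x2 are taken as uncorrelated with unit variance, the residuals of y1 and y2 as uncorrelated, and all residual variances default to 1 in the hypothetical function r_square_y3.

```python
def r_square_y3(b_x=0.2, b_y=0.32, res_y1=1.0, res_y2=1.0, res_y3=1.0):
    """R-square of y3 implied by the path model under the stated assumptions."""
    # Total effect of each x on y3: direct path plus the paths via y1 and y2.
    total_x = b_x + b_y * b_x + b_y * b_x
    # Explained variance: both x variables plus the y1 and y2 residual parts.
    explained = 2 * total_x ** 2 + b_y ** 2 * (res_y1 + res_y2)
    return explained / (explained + res_y3)

print(round(r_square_y3(), 4))             # residual variance of y3 assumed to be 1
print(round(r_square_y3(res_y3=0.3), 4))   # a smaller assumed residual variance
```

The sketch shows how strongly the implied R-square depends on the residual variance chosen for y3, which is why that value deserves attention when setting up the population model.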
 Blaze Aylmer posted on Friday, October 28, 2005 - 9:23 am
I'm trying to generate a sample of Likert data with five points. These data will then be used as input to a Markov model.
The output is below.
What does the error mean? How do I get the data?

names = u1-u6;
generate = u1-u6(2);
categorical = u1-u6;
nobs = 1000;
nreps = 1;

*** WARNING in Model command
All variables are uncorrelated with all other variables in the model.
Check that this is what is intended.
*** ERROR in Model Population command
No MODEL statements for MODEL POPULATION. True values must be specified.
 Linda K. Muthen posted on Friday, October 28, 2005 - 10:39 am
That is not a full Monte Carlo input. Most of the examples in the user's guide come with a Monte Carlo example also. Find the example in the user's guide closest to the model you want to estimate and use the Monte Carlo counterpart input as a start. Also, see Chapter 11 and the MONTECARLO command in the user's guide.
 Scott R. Colwell posted on Wednesday, June 14, 2006 - 11:36 am
The examples in the user's guide all specify the ANALYSIS and MODEL commands as well as the MODEL POPULATION command.

If I just want to create the data so that I can analyze it in a number of different ways in Mplus, can I just stop after MODEL POPULATION is specified?

For example, if I find a published paper that specifies a particular LV SEM model and I want to create a dataset based on the published parameters so that I can look at different ways of specifying the model, can I stop at MODEL POPULATION, then use the saved data set as I normally would with any dataset?

Also, if I want to create clustered data using TYPE = TWOLEVEL, does specifying the MODEL command differently from the MODEL POPULATION command change the data that are generated?
 Linda K. Muthen posted on Wednesday, June 14, 2006 - 2:04 pm
Example 11.6 shows how to save data for a subsequent external Monte Carlo. You don't need the MODEL command if you are only saving the data. You do need the ANALYSIS command. Nothing in the MODEL command affects data generation.
 Scott R. Colwell posted on Friday, June 16, 2006 - 6:00 am
I'm a little fuzzy on what you mean by an external Monte Carlo. Is it termed external because it is outside of the original data generation simulation?

If I create a data set using say for example:

NREPS = 1000;


F1 BY Y1-Y5*.60;
F2 BY Y6-Y10*.60;
F1 WITH F2@.25;




F1 BY Y1-Y3;
F2 BY Y4-Y7;
F3 BY Y8-Y10;
 Linda K. Muthen posted on Friday, June 16, 2006 - 6:35 am
Yes, you can do this.
 Scott R. Colwell posted on Tuesday, July 11, 2006 - 8:33 am
I am looking for references (i.e., books or journal articles) on:

(a) assessing model misspecification using Monte Carlo simulations

(b) specifying the model to match exactly (or closely) the parameters of an existing published study.

Do you know of any?


 Bengt O. Muthen posted on Friday, July 14, 2006 - 4:58 pm
I am afraid that none comes to mind.
 Tor Neilands posted on Saturday, July 29, 2006 - 9:20 am
Hi, Bengt and Linda.

I am planning to build a Monte Carlo program to examine the power associated with testing direct and indirect effects in a structural equation model containing both continuous and ordered categorical indicators as well as an interaction between two latent factors. The structure of the model is quite similar to that depicted in Example 5.13 of the user's guide, except that indicators y10-y12 would be binary rather than continuous.

I have at my disposal scale alphas from prior literature that I can use as reliability inputs to the Monte Carlo program for indicators y1-y9. I also have one-way frequency tables available for the binary indicators y10-y12 (I'm guessing I'd need their bivariate/crosstabular information to be able to fully specify the Monte Carlo model, however).

On page 601 of the Muthen and Muthen SEM Journal (2002) article on sample size planning via Monte Carlo simulation, you compute the factor loadings and residual variance values based on expected reliability values (or vice versa) using formula (1), which expresses the reliability as variance explained divided by [variance explained + residual variance].

To provide a concrete example, if I knew that the previously published alpha value of indicator y1 was .70, I'd set the factor variance to 1.00, the F1-y1 loading to sqrt(.70) = .49 and the residual variance of y1 to .30. Is this a correct understanding of your recommended procedure for continuous indicators?

I have two other questions. My first question is whether it is OK for me to use this same method to set the factor loading and residual variance values for my continuous indicators in my Monte Carlo program given that some of the other indicators will be categorical?

My second question is even more basic, but pragmatic. What is the syntax I would need to change in the Monte Carlo version of example 5.13 to alter indicators y10-y12 from continuous to binary or continuous to ordered categorical with three or more levels? Perhaps you have another user's guide example or Web note example you'd recommend that I look at to locate the relevant syntax?

With best wishes and many thanks,

Tor Neilands
 Bengt O. Muthen posted on Sunday, July 30, 2006 - 5:59 pm
Regarding your concrete example, the variance of the indicator is 0.49+0.30=0.79 so the reliability is 0.49/0.79=0.62, right?

Regarding using this formula for categorical outcomes, that is probably less well motivated. You would have had to obtain your reliability by such a factor model. On the other hand, working off reliability as estimated by alpha, is rather approximate as it is (see the lit on alpha in an SEM framework), so maybe this is ok as a rough approximation.

Regarding the syntax, have a look at the Monte Carlo version of User's Guide example 5.2, which is on the Mplus CD.
 Tor Neilands posted on Sunday, July 30, 2006 - 11:12 pm
Thanks, Bengt.

Your comment on the example showed me that my calculation was wrong: I'd written that sqrt(.7) = .49. Actually, the square of .7 is .49. The square root is instead ~= .837, so .837*.837 + .30 ~= 1.00, which is what I'd intended. The loading would therefore be set to .837 with the residual variance equal to .30 to yield an approximate unit variance of the continuous indicator. I hope I got it correct this time.
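
The corrected arithmetic follows from the reliability formula discussed above, reliability = explained variance / (explained variance + residual variance). A minimal Python sketch of both directions, run outside Mplus, is given here; the function names are hypothetical, and the factor variance and total indicator variance are both assumed to be 1.

```python
import math

def loading_from_reliability(reliability, factor_var=1.0, total_var=1.0):
    """Loading and residual variance that yield the requested reliability."""
    loading = math.sqrt(reliability * total_var / factor_var)
    residual = (1.0 - reliability) * total_var
    return loading, residual

def reliability_from_loading(loading, residual, factor_var=1.0):
    """Reliability = explained variance / (explained + residual variance)."""
    explained = loading ** 2 * factor_var
    return explained / (explained + residual)

loading, residual = loading_from_reliability(0.70)
print(round(loading, 3), round(residual, 2))
print(round(reliability_from_loading(loading, residual), 2))
```

Running the check in both directions is a quick guard against confusing the square with the square root when setting population values.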

Thanks also for pointing me to example 5.2 and for your comments regarding the usefulness (or lack thereof) in using alpha for continuous and categorical indicators for Monte Carlo simulations, especially w/respect to the categorical indicators. I've read the Raykov and Hancock articles on reliability estimation within the SEM framework vs. alpha. As well, your comments in the Mplus Discussion forum to a previous question of mine regarding computing optimal reliability for categorical y variables vs. underlying latent y* variables have been helpful as well.

The purpose of this particular simulation is to estimate the minimum detectable effect size for structural direct and indirect effects given a specific, known N (567). The investigator is writing a grant proposal to analyze secondary data, so the N of the parent data set is known. As well, she knows the previously published alphas for the continuous scale scores that will appear in her model.

Unfortunately, she does not have access to the data itself, so we must make educated guesses regarding correlations among the categorical indicators in the model. In your work, when you contemplate establishing values for categorical indicators, what criteria (aside from substantive area knowledge) do you use to set the values of categorical indicators' factor loadings and residual variances? Are there typical ranges you select for factor loadings and residual variances? Do those criteria shift depending on whether you're performing simulations with WLSMV vs. ML estimators?

Regards and thanks,

 Bengt O. Muthen posted on Monday, July 31, 2006 - 6:44 pm
With categorical indicators and working in the probit metric of WLSMV, I find that a binary item with relatively high reliability has around lambda=0.7 when the factor variance = 1. That's then 50% reliability in the "u* metric" (underlying continuous response variable). With a single binary item, I don't think one should expect higher reliability than that. I just looked at the classic LSAT6 and 7 results (Bock's classic example) and a more common loading there is around 0.4 - the highest was 0.7. In logit metric you multiply the loadings by about 1.8.
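
The two rules of thumb in this reply can be written out as a short Python sketch, run outside Mplus. It assumes the u* variable is scaled to unit variance, so that reliability in the u* metric is the squared loading times the factor variance, and it uses the approximate probit-to-logit factor of 1.8 quoted above. The function names are hypothetical.

```python
def ustar_reliability(loading, factor_var=1.0):
    """Reliability in the u* metric, assuming Var(u*) = 1."""
    return loading ** 2 * factor_var

def probit_to_logit(loading, factor=1.8):
    """Approximate logit-metric loading from a probit-metric loading."""
    return loading * factor

for lam in (0.7, 0.4):
    print(lam, round(ustar_reliability(lam), 2), round(probit_to_logit(lam), 2))
```

A loading of .7 gives a u* reliability of .49, roughly the 50% mentioned above, while the more common loading of .4 gives only .16.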
 Ilona posted on Saturday, February 17, 2007 - 6:35 am
Hi Drs. Muthen,

I am attempting to do a Monte Carlo simulation by first generating the continuous data model, and then generating the corresponding categorical data model (in order to have categorical y values as well as the underlying y* values).

In trying to generate both data sets, I am running separate Monte Carlo programs, but using the same seed and basically the same model, except that residual variances are specified in the continuous model but not in the categorical model (these residual variances are 1-(loading)^2, so that the loadings are standardized).

So, I expected that the item response data generated in the continuous case would simply be categorized using the defined thresholds in my categorical model... but that does not appear to be the case.

Is there some way to do this (other than categorizing the continuous data in some other software)? Is rounding error in my standardized loadings/error variances causing the differences?

Thank you,
 Linda K. Muthen posted on Saturday, February 17, 2007 - 7:26 am
Please send your inputs, outputs, and license number to
 Stephan Golla posted on Monday, April 30, 2007 - 3:00 am
In an older post (from 2003), Linda wrote: "See pages 141-142 for a brief description of the current Monte Carlo facilities in Mplus. See Example 29.1A for an example of how to generate categorical outcomes in the current version of Mplus."
How can I get Example 29.1A? I have not found it on the website, the CD, etc.
Thank you very much,
 Linda K. Muthen posted on Monday, April 30, 2007 - 6:48 am
This is a reference to an example in an old version of the user's guide. The closest thing we now have to that example is the Monte Carlo counterpart of Example 5.2. You can find this on the Mplus CD or the website.
 Stephan Golla posted on Tuesday, May 01, 2007 - 2:50 am
OK, thank you for your help.
 Erika Wolf posted on Wednesday, May 09, 2007 - 8:22 am
I'm running Mplus v. 3.11 and I am running a Monte Carlo simulation study for the purposes of power analysis. I am generating 10,000 datasets for an SEM model that includes 2 latent variable interaction terms. Mplus has been running for over a day and my task manager says that it is still running and using 50% of the CPU. Is this really possible? Should I let it run or restart the program? I recognize that the interaction terms and the 10,000 datasets are a lot for the program to handle, but when I ran a similar analysis in the past (without the interaction terms), it never took this long. Thanks for your help.
 Linda K. Muthen posted on Wednesday, May 09, 2007 - 8:38 am
Adding latent variable interactions requires numerical integration so this could definitely make the estimation more complex. You are also using an old version of Mplus. You would need to send your input and license number to but I doubt that your upgrade and support contract is current if you are using Version 3.11.
 Erika Wolf posted on Wednesday, May 09, 2007 - 9:57 am
Thanks for your fast reply. And yes, unfortunately our support contract is not current. I'll let the program continue to run.
 Janke C. ten Holt posted on Wednesday, April 23, 2008 - 6:17 am

I am running an *external* Monte Carlo (MC) analysis (data sets were generated by an external program). I use Mplus to analyze the data sets and I would like to save the analysis results for each dataset in separate files.
Can this be done?

I am aware of the 'results' option in the 'montecarlo' command, but I do not think this option should be used in an *external* MC analysis.

In an older post (from 2005), in reply to a similar question, Linda stated that it can be done, referring to the User's guide. I have not found a suitable example there, unfortunately.
Could you perhaps shed some more light on this issue?

Thank you in advance,
 Linda K. Muthen posted on Wednesday, April 23, 2008 - 10:14 am
See the RESULTS option of the SAVEDATA command. The results are saved in one file.
 Janke C. ten Holt posted on Thursday, April 24, 2008 - 3:33 am
That works. Thank you for clearing that up for me!

 Janke C. ten Holt posted on Friday, July 18, 2008 - 5:27 am

I would like to run a Monte Carlo simulation with a misspecified model. The data are generated as categorical and analyzed as continuous with a linear factor model.
Here is my code:

montecarlo: names = v1-v5;
generate = v1-v5(4 p);
nobs = 200;
nreps = 100;
repsave = 1;
save = mc1_generatedData.dat;
seed = 4539;

model population:
f by v1-v2@0.8 v3-v4@0.5 v5@0.3;
[v1$1-v5$1@-1.2816 v1$2-v5$2@-0.3853 v1$3-v5$3@0.3853 v1$4-v5$4@1.2816];

f by v1-v5*;

When I run this in Mplus I get an error:
However, when I add the two lines:
categorical = v1-v5;
it does run. When I subsequently analyze the saved generated dataset in Mplus with a linear factor analysis, treating the variables as continuous, I do not get an error.
How can this be?

Thanks in advance for any help,
 Linda K. Muthen posted on Friday, July 18, 2008 - 4:39 pm
Please send the relevant files and your license number to
 Bill Dudley posted on Thursday, August 28, 2008 - 11:39 am
I need to estimate the power of a mediation model in which the effect of x on u is mediated by y, similar to Example 3.17.

MISSING IS y (999);
MODEL: y ON x;
u ON y x;

However, this .inp file does not include a MODEL INDIRECT command. I assumed that I could estimate the mediation using:

MODEL: y ON x;
u ON y x;

u IND y x;

But I get an error indicating that MODEL INDIRECT is not available with ALGORITHM=INTEGRATION.

If I eliminate the ANALYSIS command entirely, the program runs, and I see that ESTIMATOR = WLSMV is used.

If I then use the Monte Carlo counterpart without an ANALYSIS command but include MODEL INDIRECT, I encounter a fatal error that the population covariance matrix is not positive definite. My assumption is that I have not modeled the indirect effect in MODEL POPULATION and/or that I am making an error by excluding the ANALYSIS command.

1) In the modified 3.17 in which I have eliminated the ANALYSIS command, I wonder if the WLSMV estimates are appropriate or if I should model the data otherwise.

2) How should I specify the population parameters in the MODEL POPULATION section to reflect the indirect effect? (I am hoping that this will eliminate the NPD error.)

 Bengt O. Muthen posted on Friday, August 29, 2008 - 8:42 am
It looks like you have missing data on the mediator y. If that is not the case, things are more straightforward, but let's discuss as if you have missing on y.

In this case I think the ML estimator is better than WLSMV because ML can handle data that are missing at random (MAR). With ML you then need Monte Carlo integration, and with Monte Carlo integration you don't get MODEL INDIRECT results. You can, however, always create your own indirect effect as a*b using MODEL CONSTRAINT and defining a "New" parameter ind, where

ind = a*b;

where a and b are parameter labels in the Model paragraph. Here, a is u on y and b is y on x.

Regarding (2) - which you won't encounter by my approach - this happens with the WLSMV estimator when you don't give a residual variance in the population statement. See Monte Carlo input for such modeling in examples that mirror those of the UG examples (either on your Mplus CD, or on our web site).
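
As a rough illustration of the a*b idea above, here is a minimal Python sketch (not Mplus code) of the product and a first-order delta-method standard error. The numeric values and the function name are hypothetical, and the covariance between the two estimates is assumed to be zero unless supplied.

```python
import math

def indirect_effect(a, b, var_a, var_b, cov_ab=0.0):
    """Return (estimate, standard error) of the product a*b.

    Uses the first-order delta method:
    Var(a*b) ~ b^2 Var(a) + a^2 Var(b) + 2ab Cov(a, b).
    """
    estimate = a * b
    variance = (b ** 2) * var_a + (a ** 2) * var_b + 2.0 * a * b * cov_ab
    return estimate, math.sqrt(variance)

# Hypothetical values, not taken from the thread:
# a = slope of u on y, b = slope of y on x
est, se = indirect_effect(a=0.5, b=0.4, var_a=0.01, var_b=0.0225)
print(round(est, 3), round(se, 3))
```

In a Monte Carlo run, the share of replications in which the ratio est/se exceeds the critical value would then serve as the power estimate for the indirect effect.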
 Bill Dudley posted on Friday, August 29, 2008 - 10:09 am
Thanks Bengt
I will give this a try.
I greatly enjoyed the workshop in Charm City.
 Yew Kwan Tong posted on Thursday, September 04, 2008 - 6:59 pm
Dear Linda or Bengt,

I am using the Monte Carlo function to simulate CFA models with categorical variables. It seems that the output does not give results for CFI, which is my statistic of interest. In response to a similar question much earlier, you had said that the CFI stats in Monte Carlo would be available from Mplus version 3. I am using Mplus 4... is there a specific command I need to insert to call forth results on CFI?
Thanks very much in advance, Yew Kwan
 Linda K. Muthen posted on Friday, September 05, 2008 - 8:08 am
We have not added CFI and TLI because they require the baseline model be estimated for each replication. You can save the Monte Carlo data sets and then run each one separately to obtain CFI.
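
To show why the baseline model is needed for each replication, here is a small Python sketch of the usual textbook CFI and TLI formulas. The chi-square values are made up for illustration, and the function names are not part of Mplus.

```python
def cfi(chi2_model, df_model, chi2_base, df_base):
    """Comparative fit index from model and baseline chi-squares."""
    d_model = max(chi2_model - df_model, 0.0)
    d_base = max(chi2_base - df_base, 0.0)
    denom = max(d_model, d_base)
    if denom == 0.0:
        return 1.0
    return 1.0 - d_model / denom

def tli(chi2_model, df_model, chi2_base, df_base):
    """Tucker-Lewis index from model and baseline chi-squares."""
    ratio_base = chi2_base / df_base
    ratio_model = chi2_model / df_model
    return (ratio_base - ratio_model) / (ratio_base - 1.0)

# Hypothetical chi-squares for one saved replication
cfi_value = cfi(30.0, 20, 520.0, 28)
tli_value = tli(30.0, 20, 520.0, 28)
print(round(cfi_value, 4), round(tli_value, 4))
```

Both indices depend on the baseline chi-square, so a per-replication value cannot be formed from the fitted model's chi-square alone.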
 Ben Spycher posted on Tuesday, February 17, 2009 - 7:01 am
Dear Linda or Bengt,

For a simulation study I generated replications externally and am fitting various models to these data in Mplus using "type is montecarlo" in the DATA command. Some of my models do not converge for all replications. However, from my saved results I cannot find out which ones did converge. I know the RESULTS option in the MONTECARLO command does this, but can I use it if I am not generating the data in Mplus?
Thank you in advance
Ben Spycher
 Linda K. Muthen posted on Tuesday, February 17, 2009 - 7:34 am
If you ask for TECH9 in the OUTPUT command, you will see which replications had problems. With external Monte Carlo, there is no way to tie a particular data set to a replication number as in internal Monte Carlo.
 Ben Spycher posted on Tuesday, February 17, 2009 - 9:09 am
Thanks for this help, it works. I will just have to write them out manually, but that's no big deal.
 Kathryn Degnan posted on Thursday, January 28, 2010 - 2:29 pm
I am trying to run a Monte Carlo simulation to test the power I have to test a mediational model in my known sample size. It would be very helpful to be able to report the power of the indirect or total effects. I tried adding MODEL INDIRECT to the input and got the error saying that is not available.
Does anyone know if you can use model indirect in a monte carlo simulation? I see that there is no MC example for the indirect example in chapter 3 of the UG. Is there an example somewhere else for the syntax?
 Linda K. Muthen posted on Thursday, January 28, 2010 - 5:36 pm
MODEL INDIRECT can be used in a Monte Carlo simulation. I can't remember when it was added but it is available now.
 Chenshu Zhang posted on Tuesday, September 28, 2010 - 11:21 am
When doing a Monte Carlo simulation study, how do I define a new variable based on the existing random variables? For example, x and y are existing variables, and I want to have a new variable z = x*y.
 Linda K. Muthen posted on Tuesday, September 28, 2010 - 12:52 pm
The DEFINE command is not part of the MONTECARLO command. You would need to generate the data outside of Mplus if you want this feature.
 ywang posted on Tuesday, September 28, 2010 - 1:15 pm
Dear Drs. Muthen:

For the corresponding Montecarlo simulation study for example 6.4, there is the following statement in the input file. Can you detail how you get the scale factors based on the output of example 6.4? Thanks!

{u11@1 u12*.913 u13*.745 u14*.598};
! this sets the scale factors at the inverted SDs for the u* variables, so that the estimates are in the metric of the Delta parametrizations
 ywang posted on Tuesday, September 28, 2010 - 2:06 pm
Dear Drs. Muthen:

This is the follow-up for the Monte Carlo simulation. For Mplus Example 6.4, the ESTIMATED COVARIANCE MATRIX FOR PARAMETER ESTIMATES shows that the variance for u12 is 0.022, for u13 is 0.037, and for u14 is 0.024. Taking u12 as an example, the scale factor for u12 should be 1/sqrt(0.022) = 6.74. Why is it 0.913, as specified in the Monte Carlo counterpart of Example 6.4?

Thanks a lot for your patience!
 Bengt O. Muthen posted on Tuesday, September 28, 2010 - 3:48 pm
The scale factors are not based on the output for ex6.4, but are based on the Model Population values from mcex6.4.

You first figure out the u* population variance at each time point. For example, for the second time point you have

u*_2 = i + 1*s + epsilon_2,

V(u*_2) = V(i) + V(s) + 2cov(i,s) + V(epsilon_2).

Model population gives

V(i) = 0.5
V(s) = 0.1
Cov(i,s)= 0
V(epsilon_2) = 0.6.

So, V(u*_2) = 1.2 and therefore the scale factor is 0.913. This is then given as a starting value in the Model statement so you get the correct population value for coverage reporting.
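
The arithmetic above can be checked with a few lines of Python. This is only a sketch: the function name is made up, and the general time-score form assumes a linear growth model, u*_t = i + t*s + epsilon_t.

```python
import math

def ustar_variance(var_i, var_s, cov_is, var_eps, time_score):
    """Variance of u*_t = i + t*s + epsilon_t for a linear growth model."""
    return (var_i
            + time_score ** 2 * var_s
            + 2.0 * time_score * cov_is
            + var_eps)

# Population values quoted above for the second time point (time score 1)
v2 = ustar_variance(var_i=0.5, var_s=0.1, cov_is=0.0, var_eps=0.6, time_score=1)
scale2 = 1.0 / math.sqrt(v2)  # scale factor = inverted SD of u*
print(round(v2, 3), round(scale2, 3))  # 1.2 0.913
```

The other time points follow the same pattern once their time scores and residual variances are read from MODEL POPULATION.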
 ywang posted on Wednesday, September 29, 2010 - 11:55 am
Thank you very much for the detailed instructions. Now I am clear about how the scale factors were calculated. However, when I ran ex6.4 using the dataset generated from the corresponding Monte Carlo simulation, the scale factors shown in the output file are not the same as those specified in the Monte Carlo simulation,
{u11@1 u12*.913 u13*.745 u14*.598}. Instead they are as follows: {u11@1 u12*1.060 u13*1.012 u14*0.772}. Why do they differ?
U11 1.000 0.000 999.000 99.000
U12 1.060 0.149 7.133 0.000
U13 1.012 0.192 5.259 0.000
U14 0.772 0.153 5.029 0.000
 Bengt O. Muthen posted on Wednesday, September 29, 2010 - 4:29 pm
They are estimates and therefore have a sampling distribution. If you run many replications, the average value should get close to the population value.
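
A toy Python illustration of this point follows. It is not how Mplus estimates scale factors from categorical data; it simply draws continuous values with population variance 1.2, estimates 1/SD in each replication, and shows that single estimates scatter while their average sits near 0.913. The sample size, number of replications, and seed are arbitrary assumptions.

```python
import random
import statistics

def estimated_scales(pop_var=1.2, n_obs=200, n_reps=300, seed=12345):
    """Estimate 1/SD in each of n_reps simulated samples of size n_obs."""
    rng = random.Random(seed)
    pop_sd = pop_var ** 0.5
    estimates = []
    for _ in range(n_reps):
        sample = [rng.gauss(0.0, pop_sd) for _ in range(n_obs)]
        estimates.append(1.0 / statistics.stdev(sample))
    return estimates

estimates = estimated_scales()
print("smallest estimate:", round(min(estimates), 3))
print("largest estimate: ", round(max(estimates), 3))
print("average estimate: ", round(statistics.mean(estimates), 3))
```

Any one replication can land noticeably above or below the population value, which is the situation with a single generated dataset.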
 Jane Robertson posted on Wednesday, April 06, 2011 - 11:47 am
Hello. I am trying to run a Monte Carlo simulation to estimate power for a path model that contains 5 continuous variables and 1 binary (categorical) variable. I used the code: CUTPOINTS = y3(0); to indicate that y3 is the binary variable. The % Sig Coeff, or estimates of power, are much lower for y3, the binary variable, than for the continuous variables. Have I used the correct code to indicate that y3 is a binary variable?
 Linda K. Muthen posted on Wednesday, April 06, 2011 - 12:14 pm
For dependent variables in the model, use the GENERATE option to indicate that a variable is binary. For independent variables, use the CUTPOINTS option.
 Jak posted on Friday, May 13, 2011 - 7:03 am
Dear Linda or Bengt,

I would like to generate data and then analyze it using both MLR estimation and WLSMV estimation on each dataset.

Is this possible without saving the generated datasets to file?

Thanks in advance!
 Bengt O. Muthen posted on Friday, May 13, 2011 - 7:45 am
No, you can't do 2 analyses in one run (yet).
 Jak posted on Friday, May 20, 2011 - 7:28 am
Dear Linda or Bengt,

I am saving the results and datasets of an internal monte carlo run, and then I evaluate a second model in an external run on the saved datasets.

In the second run, the correction factors for MLR estimation are saved, but in the first run they are not.

Is there a way to save the correction factors for the first model as well?

Thanks in advance!
 Linda K. Muthen posted on Friday, May 20, 2011 - 8:04 am
There is no way to save anything that is not saved automatically. If you send the files and your license number to, I can look into this further.
 Xu, Man posted on Thursday, June 09, 2011 - 4:35 pm
Dear Dr. Muthen,

I would like to do a power analysis for a MIMIC model with a latent variable outcome. The items for the latent variable are binary, so I guess the data need to be generated accordingly. Example 12.7 looks perfect for me, but it is for continuous items. I need syntax for an ordinal CFA analysis. I was wondering if you could give me some suggestions to get started, please?

Thanks a lot!

 Linda K. Muthen posted on Thursday, June 09, 2011 - 5:19 pm
Each example comes with a Monte Carlo counterpart where the data for the example are generated. Look at Chapter 5 and find the closest thing to what you want and start from there. The Monte Carlo counterpart for Example 5.2 is mcex5.2.inp.
 Xu, Man posted on Wednesday, July 06, 2011 - 11:00 am
Dear Linda,

Thank you very much for your guide. I have had a look at the relevant examples.

My model is simple. It has got six binary indicators forming a continuous factor. The factor is predicted by a continuous predictor. It is a 2 group analysis, and following the example in Mplus, I use delta parameterization.

I would like to use the unstandardised empirical value for the predictor, but standardised empirical values for the factor loadings, thresholds, and residual variances. I think this makes it easier for me to vary the effect size (the regression path coefficient from the predictor) in the simulation study.

I have not figured out how to calculate the residual variances of the items, as I suspect it is not the same as with continuous items. In the latter case, to get the residual variance of an item, I just need to subtract the square of the standardised factor loading from one, and similarly for the factor residual variance and path coefficient. But I don't think it is done the same way for binary items.

It would be great if you could give some advice as to how to set item-related parameter values for the standardised factor solution.


 Bengt O. Muthen posted on Wednesday, July 06, 2011 - 1:39 pm
You will need a statistical consultant to help you with this.
 WJCAO posted on Wednesday, October 12, 2011 - 11:51 pm
Dear Linda,
I do not know why the procedure below does not work.
It says that:
Number of replications
Requested 10
Completed 0
Title: a monte carlo simulation study for an factor analysis with categorical indicators

names are y1-y6;
nobservations = 500;
nreps = 10;
seed = 12345;
generate = y1-y6(2);
categorical = y1-y6;
SAVE = M1rep*.dat;


Model population:
[y1$1*0.5 y2$1*0.5 y3$1*0.5 y4$1*1 y5$1*1 y6$1*1] ;
[y1$2*2 y2$2*2 y3$2*2 y4$2*-0.5 y5$2*-0.5 y6$2*-0.5] ;
f1 by y1-y3* .4;
f2 by y4-y6* .4;
y1-y6* 1;

 Emil Coman posted on Thursday, October 13, 2011 - 7:03 am
Wjcao, Just a thought: is it because the 2nd thresholds for y4-y6 are smaller than the 1st thresholds? If you reverse them e.g., it runs fine. Thanks for posting syntax, Emil
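
A short Python sketch of why the threshold order matters: with thresholds that are not increasing, the implied probability of the middle category becomes negative. It assumes a standard normal latent response purely for illustration (the posted model does not have unit u* variance), and the helper names are made up.

```python
from statistics import NormalDist

def thresholds_are_increasing(thresholds):
    """True if each threshold is strictly larger than the one before it."""
    return all(lo < hi for lo, hi in zip(thresholds, thresholds[1:]))

def category_probabilities(thresholds):
    """Category probabilities for a standard normal latent response."""
    cdf = [0.0] + [NormalDist().cdf(t) for t in thresholds] + [1.0]
    return [hi - lo for lo, hi in zip(cdf, cdf[1:])]

posted_y4 = [1.0, -0.5]     # y4$1*1 and y4$2*-0.5 as posted
reversed_y4 = [-0.5, 1.0]   # same values in increasing order

for label, thr in (("posted", posted_y4), ("reversed", reversed_y4)):
    probs = [round(p, 3) for p in category_probabilities(thr)]
    print(label, thresholds_are_increasing(thr), probs)
```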
 Linda K. Muthen posted on Thursday, October 13, 2011 - 7:26 am
The TECH9 output should tell you what the problem is.
 Sierra Bainter posted on Thursday, October 20, 2011 - 9:21 am
I would like to run an external simulation study of a LCA with training data where I save out both the results AND the asymptotic covariance matrix (tech3) from each replication for subsequent analysis.

Is it possible to do this using external montecarlo options, or do I need to do this using a batch file of some sort?
 Linda K. Muthen posted on Friday, October 21, 2011 - 10:42 am
TECH3 cannot be saved with the MONTECARLO command or external Monte Carlo. If you wanted to do this, you would need to run each generated data set separately and save TECH3. On the website, see Using Mplus via R. This may help you.

You cannot generate data using the TRAINING option but would have to do it as illustrated in Example 7.24. Example 7.23 is exactly the same but it uses the TRAINING option.
 Emil Coman posted on Friday, October 21, 2011 - 11:55 am
I noticed that in the examples accompanying the guide in chapter 7, only this ex7.23.dat data file is missing, I wanted to see how the training variables were specified... is there another place where I can get it? Thanks, Emil
 Linda K. Muthen posted on Friday, October 21, 2011 - 2:23 pm
There is no data for that example because we cannot generate data with training data. The data for the examples come from Monte Carlo studies of the example model.

See Slide 48 from the Topic 5 course handout on the website. This shows what the training data look like.
 Khaled Alkherainej  posted on Wednesday, October 26, 2011 - 7:20 pm

I need help with my syntax. I always get an error message indicating that the population covariance matrix is not positive definite. I am using Monte Carlo analysis to assess sample size and misspecification for my model. The indicators are categorical; the response scale of the questionnaire is 3 points. Here is the syntax that I used:
Names are cm1-ps6;
categorical are all;
Nobservations = 100;
NREPS = 10000;
SEED = 0;
Generate = cm1-cm6(2) gm1-gm6(2) fm1-fm6(2) cg1-cg6(2) ps1-ps6(2);

Analysis: Estimator = WLSMV;

Model Population: [cm1$1*1.71 cm1$2*1 ...... ps6$2*.5];
F1 by cm1@1 cm2-cm6*.82;
F2 by gm1@1 gm2-gm6*.81;
F3 by fm1@1 fm2-fm6*.80;
F4 by cg1@1 cg2-cg6*.70;
F5 by ps1@1 ps2-ps6*.70; F1-F5@1;
F1 with F2*.50;
F1 with F3*.50;
F1 with F4*.50;
F1 with F5*.50;
F2 with F3*.50;
F2 with F4*.50;
F2 with F5*.50;
F3 with F4*.50;
F3 with F5*.50;
F4 with F5*.50;

same as the information of population model
Output: Tech9;
 Bengt O. Muthen posted on Wednesday, October 26, 2011 - 8:34 pm
You need to give residual variances for the factor indicators. See the Monte Carlo setups that correspond to each example in the UG.
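
A minimal Python sketch of why omitted residual variances cause this error: any population value not given is taken as zero, and with zero residual variances the covariance matrix implied by a factor model is singular. The example uses one factor with two indicators; the loadings 1 and .82 echo the posted input, while the residual variance 0.3 is a hypothetical value.

```python
def implied_cov_two_indicators(l1, l2, factor_var, theta1, theta2):
    """Model-implied covariance matrix for two indicators of one factor."""
    return [
        [l1 * l1 * factor_var + theta1, l1 * l2 * factor_var],
        [l1 * l2 * factor_var, l2 * l2 * factor_var + theta2],
    ]

def is_positive_definite_2x2(m, tol=1e-12):
    """A symmetric 2x2 matrix is positive definite if m[0][0] > 0 and det > 0."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return m[0][0] > tol and det > tol

# Residual variances left at zero (what happens when they are not given)
no_residuals = implied_cov_two_indicators(1.0, 0.82, 1.0, 0.0, 0.0)
# Hypothetical residual variances of 0.3 for both indicators
with_residuals = implied_cov_two_indicators(1.0, 0.82, 1.0, 0.3, 0.3)

print(is_positive_definite_2x2(no_residuals))
print(is_positive_definite_2x2(with_residuals))
```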
 Damien Hoffman posted on Friday, October 28, 2011 - 1:12 pm
I would like to generate a set of theta scores along with their respective standard errors for an item parameter drift simulation study I'm working on. How would I go about doing that??

I created my own set of parameter estimates for the analysis, so I'm not really interested in generating those in MPlus. I'd really appreciate some help!
 Bengt O. Muthen posted on Friday, October 28, 2011 - 8:18 pm
It sounds like you want to fix all parameter estimates to the values you have and only estimate what we would call factor scores plus their SEs.
 Heike B. posted on Thursday, November 10, 2011 - 12:49 am
Dear Drs. Muthen,

as per my previous e-mails I am working on a manifest path model with categorical, non-normal data. I estimated the model using WLSMV, now I would like to assess the test power of my individual effects.

1. Is it possible to use the Monte Carlo Method you described in a paper from 2002 ("How to use a Monte Carlo method to decide on sample size and determine power")?

2. As I have to provide population parameters and it is an a posteriori analysis, can I use the effect parameters from the model estimation?

3. Is it necessary also to assess test power on the model level? And what would be the strategy?

Many thanks in advance & many thanks for all the helpful answers to my previous postings.

 Linda K. Muthen posted on Thursday, November 10, 2011 - 1:52 pm
1. This approach can be applied to any model.

2. This is probably the best you can do although there are issues. See the following paper:

O'Keefe, J. Post Hoc Power, Observed Power, A Priori Power, Retrospective Power, Prospective Power, Achieved Power: Sorting Out Appropriate Uses of Statistical Power Analyses. Communication Methods and Measures, 1(4), 291-299.

3. The approach we write about is for one parameter in the model not the entire model.
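
For readers new to the idea, here is a rough Python sketch of Monte Carlo power for a single parameter: generate many datasets from assumed population values, estimate the parameter in each, and report the proportion of replications in which it is significant (the "% Sig Coeff" idea mentioned earlier in this thread). It uses a simple regression rather than an Mplus model, and the slope, sample size, number of replications, and seed are all assumptions.

```python
import math
import random

def simulate_power(beta=0.3, n_obs=100, n_reps=500, seed=2011):
    """Proportion of replications in which the OLS slope has |z| > 1.96."""
    rng = random.Random(seed)
    n_significant = 0
    for _ in range(n_reps):
        x = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
        y = [beta * xi + rng.gauss(0.0, 1.0) for xi in x]
        mean_x = sum(x) / n_obs
        mean_y = sum(y) / n_obs
        sxx = sum((xi - mean_x) ** 2 for xi in x)
        sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
        slope = sxy / sxx
        intercept = mean_y - slope * mean_x
        sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
        se = math.sqrt(sse / (n_obs - 2) / sxx)
        if abs(slope / se) > 1.96:
            n_significant += 1
    return n_significant / n_reps

power = simulate_power()
print("estimated power:", power)
```

Because the population values drive the result, taking them from the same sample's estimates carries the caveats noted in point 2 above.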
 Heike B. posted on Tuesday, November 15, 2011 - 1:27 pm
Hello Linda,

when assessing test power of a path model through the Monte Carlo approach, is it sufficient to provide the population parameters for regression coefficients and explicitly defined covariances between dependent variables?

Or do I also have to provide the population parameters for means / thresholds / intercepts as well?

Many thanks for your help.
 Linda K. Muthen posted on Tuesday, November 15, 2011 - 2:08 pm
You must provide population parameter values for all model parameters or zero is used.
 Heike B. posted on Wednesday, November 23, 2011 - 5:30 am
I am trying to run a Monte Carlo simulation to assess test power for a path model containing one continuous dependent variable and 6 categorical dependent variables. I use WLSMV, and Mplus aborts with:


1. Which Mplus system matrix does this message refer to?

2. Through which output can I see this matrix? (Mplus aborts before a RESIDUAL output is printed.)

3. What could I try? (I have provided means and variances for the independent variables and residual variances for the dependent variables.)

Thanks a lot in advance.
 Linda K. Muthen posted on Wednesday, November 23, 2011 - 1:42 pm
1. The population covariance matrix.
2. It is not in the output. It is the numbers you give in MODEL POPULATION.
3. Any parameter not given a population parameter value is given the value zero. You have probably not given all variances population parameter values.
 Heike B. posted on Thursday, November 24, 2011 - 10:32 am
Thank you, Linda. That helped.

 Melanie Wall posted on Thursday, January 19, 2012 - 5:05 am
1. Is it possible to use the Monte Carlo functionality in Mplus to get a distribution of eigenvalues out (as are output when one runs Type = EFA)? This would be nice as it would then provide a way (within Mplus) to code up a parallel analysis test of the number of factors and use the fact that Mplus uses polychoric correlations.

2. Is it possible to incorporate complex sampling weights, strata, and cluster into the data generation process within the Monte Carlo functionality? We could generate externally, but it would be nice if it could be done internally.
 Linda K. Muthen posted on Thursday, January 19, 2012 - 10:26 am
1. This is not currently possible. Version 7 will contain parallel analysis.

2. The only complex survey data feature available with the MONTECARLO command is clustering. The two relevant options are CSIZES and NSIZES.
 Mauricio Garnier-Villarreal posted on Tuesday, February 14, 2012 - 9:40 pm

I am running a Monte Carlo simulation with categorical variables. I am testing the effect of generating the indicators as ordinal and analyzing them as continuous.

NAMES ARE V1 V2 V3 V4 V5 V6;
GENERATE = V1-V6(2);
NREPS = 200;
SEED = 3141593;
RESULTS = results_50_0.7_.txt;

But with this code the variables are generated and analyzed as continuous; the saved data set shows them as continuous. Only when I add the CATEGORICAL = V1-V6; option are the indicators generated as categorical, but they are also analyzed as continuous.

How can I generate the data as categorical and analyze them as continuous?

Thank you
 Linda K. Muthen posted on Wednesday, February 15, 2012 - 10:27 am
The GENERATE option controls the generation of the data. The CATEGORICAL option controls the analysis. I'm not sure why you think this is not happening. In your input, you are generating categorical. You have no CATEGORICAL option so the variables will be analyzed as continuous.
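
Conceptually, "generate categorical, analyze continuous" can be pictured with the following Python sketch. It is not the Mplus generator: it cuts a standard normal latent response at two hypothetical thresholds (giving three categories, coded 0, 1, 2 here as an assumption) and then treats the category codes as ordinary numbers.

```python
import random
from bisect import bisect_right
from statistics import mean, pvariance

def generate_ordinal(n_obs, thresholds, seed=3141593):
    """Cut standard normal draws at the thresholds; return category codes."""
    rng = random.Random(seed)
    # bisect_right counts how many thresholds lie at or below the draw
    return [bisect_right(thresholds, rng.gauss(0.0, 1.0)) for _ in range(n_obs)]

# Two hypothetical thresholds, as in GENERATE = V1-V6(2)
data = generate_ordinal(1000, thresholds=[-0.5, 0.5])

print("categories present:", sorted(set(data)))
# "Analyzing as continuous" means using the codes as plain numbers
print("mean:", round(mean(data), 3), "variance:", round(pvariance(data), 3))
```

A saved dataset generated as categorical should therefore contain only a few distinct integer codes, not values such as -0.345664.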
 Mauricio Garnier-Villarreal posted on Wednesday, February 15, 2012 - 5:20 pm

when i use the following code

NAMES ARE V1 V2 V3 V4 V5 V6;
GENERATE = V1-V6(2);
NREPS = 200;
SEED = 3141593;
RESULTS = results_50_0.7_.txt;

the data set file TEST1.DAT shows continuous variables like this
-0.345664 -0.103435 -2.310518

I checked this after seeing the results.

When I use the CATEGORICAL option, it does create categorical variables.
 Linda K. Muthen posted on Friday, February 17, 2012 - 2:08 pm
Please send the full output and your license number to
 Jacky Luo posted on Tuesday, May 01, 2012 - 1:06 pm
I am trying to run a Monte Carlo simulation of multigroup cfa to test for measurement invariance. My indicators are all categorical, and I use WLSMV as the estimator. If I understand correctly, difftest has to be saved for the chi-square test. Would it be possible to run such a Monte Carlo study in Mplus without using any other software? Suppose I've generated data externally.
Thanks very much
 Linda K. Muthen posted on Wednesday, May 02, 2012 - 12:02 pm
You can first save the Monte Carlo data sets. Then you would need to run each data set separately using DIFFTEST first in the SAVEDATA command and then in the ANALYSIS command. See if the Monte Carlo Utility under How To on the website can help.
 Jan Stochl posted on Monday, June 25, 2012 - 4:21 am
Dear Linda,
I have finished simulating data from a multilevel CFA model (30 items, categorical data, 5 correlated factors). I am now writing a paper, and one of the reviewers asked me to provide the between and within population covariance matrices that were used to generate the simulated datasets. Is there any way to get the population correlation/covariance matrices for the within and between levels in Mplus? I cannot see them in the output...

f1 by y1@0.7 y2-y7*0.7;
f2 by y8@0.7 y9-y16*0.7;
f3 by y17@0.7 y18-y21*0.7;
f4 by y22@0.7 y23-y25*0.7;
f5 by y26@0.7 y27-y30*0.7;
f1-f5 WITH f1-f5@0.4;

[y1$1*0]; [y1$2*0.5]; [y1$3*1]; [y1$4*1.5]; [y1$5*2]; [y1$6*2.5];
....I have specified other thresholds in a similar fashion, but do not copy them here
 Linda K. Muthen posted on Monday, June 25, 2012 - 10:30 am
We don't use the between and within population covariance matrices to generate the data. We use population parameter values. I can't think of a way to get these matrices other than to create them using the population parameter values. This seems unnecessary because you know the population parameter values, which is more detailed information than is given in the covariance matrices.
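
If such a matrix is nevertheless requested, it can be built by hand from the population parameter values. The Python sketch below computes Sigma = Lambda * Psi * Lambda' + Theta for a small two-factor model; in a two-level model this would be done once per level. The loading 0.7 and factor covariance 0.4 echo the posted input, while the factor variances of 1 and residual variances of 0.51 are assumptions made only for the illustration.

```python
def implied_covariance(loadings, factor_cov, residual_var):
    """Sigma = Lambda * Psi * Lambda' + Theta (Theta diagonal)."""
    n_items = len(loadings)
    n_factors = len(factor_cov)
    sigma = [[0.0] * n_items for _ in range(n_items)]
    for r in range(n_items):
        for c in range(n_items):
            total = 0.0
            for j in range(n_factors):
                for k in range(n_factors):
                    total += loadings[r][j] * factor_cov[j][k] * loadings[c][k]
            if r == c:
                total += residual_var[r]
            sigma[r][c] = total
    return sigma

# Two factors with two indicators each
loadings = [[0.7, 0.0], [0.7, 0.0], [0.0, 0.7], [0.0, 0.7]]
factor_cov = [[1.0, 0.4], [0.4, 1.0]]   # factor variances of 1 are assumed
residual_var = [0.51, 0.51, 0.51, 0.51]  # assumed residual variances

sigma = implied_covariance(loadings, factor_cov, residual_var)
for row in sigma:
    print([round(v, 3) for v in row])
```

For categorical indicators this gives the covariance matrix of the underlying latent response variables, not of the observed categories.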
 Jan Stochl posted on Tuesday, June 26, 2012 - 3:36 am
thanks a lot Linda for useful info
 Richard E. Zinbarg posted on Friday, July 05, 2013 - 10:44 am
I want to simulate a CFA with categorical indicators and have 2 questions. First, how do I control what the thresholds are? I'd like half the items with low thresholds and the other half with high thresholds.

2nd, I want to analyze the data both continuously and categorically and can't get the categorical analyses to run. How do I get the categorical analyses? The syntax I have tried is as follows:
Names are y1-y8;
Generate = y1-y8(1);
Nobservations = 200;
NREPS = 100;
Repsave = All;
Save = GerardMontePilot1*.dat
Seed = 0000001;
Generate = y1-y8 (1);
Model Population:
f By y1-y8*1;
f By y1*1 (L1)
y2*1 (L2)
y3*1 (L3)
y4*1 (L4)
y5*1 (L5)
y6*1 (L6)
y7*1 (L7)
y8*1 (L8);
y1-y8*1 (R1-R8);

Analysis: parameterization = theta;

But the output says that whereas I requested 100 replications, 0 were completed.
 Linda K. Muthen posted on Saturday, July 06, 2013 - 11:43 am
See the Monte Carlo counterpart of Example 5.17 as a starting point.
 Richard E. Zinbarg posted on Sunday, July 07, 2013 - 7:32 am
thanks for the speedy reply Linda but the only CFA Monte Carlo examples I see are for continuous indicators.
 Linda K. Muthen posted on Sunday, July 07, 2013 - 7:41 am
See the Monte Carlo counterpart for Example 5.17. It is called mcex5.17.inp. It is for categorical variables and the Theta parametrization. It is for two groups but you can adjust for that.
 Richard E. Zinbarg posted on Sunday, July 07, 2013 - 10:29 am
Thanks Linda - where does one find the Monte Carlo counterpart examples? I don't see them in the User's Guide or in the examples one can access from the website.
 Richard E. Zinbarg posted on Sunday, July 07, 2013 - 10:33 am
please ignore my last post - I just found the example. Thanks Linda!
 Andrew Grotzinger posted on Wednesday, August 21, 2013 - 3:28 pm

We are trying to generate data using Monte Carlo that specifies a negative slope over time, but never generates outcomes that go below zero. Currently, we are running into a lot of negative numbers in the outputs, which lacks practical application for what we are wanting to test. Any suggestions?
 Linda K. Muthen posted on Thursday, August 22, 2013 - 10:52 am
Please send the output and your license number to
 Walid Talash posted on Saturday, June 07, 2014 - 5:21 am

I have a question regarding the threshold concept in Mplus. I want to generate data with 5 categories, so I need 4 thresholds. I want the resulting dataset to resemble a normal distribution as closely as possible (skewness and kurtosis ~ 0). My problem is that I have to choose the thresholds on my own, so I can't be sure whether the arbitrarily chosen thresholds could be optimized. Is it somehow possible to let Mplus choose the thresholds on its own? Or is there another way to automatically generate "normally distributed" categorical data?

Thank you very much

 Bengt O. Muthen posted on Saturday, June 07, 2014 - 5:52 pm
No automatic way. And I don't know of a precise way unless you look into the literature on numerical integration with 5 quadrature points.
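
One way to judge candidate thresholds by hand is to compute the skewness and excess kurtosis they imply for the category scores. The Python sketch below does this under the assumption of a standard normal latent response and integer scores 0 to 4; the thresholds are the symmetric set posted earlier in this thread, used only as an example, and no search for better thresholds is attempted.

```python
from statistics import NormalDist

def ordinal_moments(thresholds):
    """Category probabilities, skewness, and excess kurtosis of scores 0..k."""
    cdf = [0.0] + [NormalDist().cdf(t) for t in thresholds] + [1.0]
    probs = [hi - lo for lo, hi in zip(cdf, cdf[1:])]
    scores = range(len(probs))
    mu = sum(p * s for p, s in zip(probs, scores))
    var = sum(p * (s - mu) ** 2 for p, s in zip(probs, scores))
    skew = sum(p * (s - mu) ** 3 for p, s in zip(probs, scores)) / var ** 1.5
    excess_kurt = sum(p * (s - mu) ** 4 for p, s in zip(probs, scores)) / var ** 2 - 3.0
    return probs, skew, excess_kurt

probs, skew, excess_kurt = ordinal_moments([-1.2816, -0.3853, 0.3853, 1.2816])
print([round(p, 3) for p in probs])
print("skewness:", round(skew, 3), "excess kurtosis:", round(excess_kurt, 3))
```

Symmetric thresholds take care of skewness; the kurtosis then depends on how much probability is placed in the outer categories.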

Otherwise, take a look at the paper on my UCLA website that you get to via "About Us" on this website:

12) Muthén, B., & Kaplan, D. (1985). A comparison of some methodologies for the factor analysis of non-normal Likert variables. British Journal of Mathematical and Statistical Psychology, 38, 171-189. [Available as PDF]
 Alvin  posted on Tuesday, September 02, 2014 - 5:23 pm
Hi Dr Muthen, I am attempting to simulate a Monte Carlo path model (with continuous variables). My understanding is that I should specify population values for the path coefficients, variances, and residuals. I notice you use the following population values (0.8 for factor loadings, 0.36 for residual variances, 0.25 for factor correlations) in your 2002 paper. How do I determine the best population values in this case? Many thanks
 Bengt O. Muthen posted on Tuesday, September 02, 2014 - 6:19 pm
It's up to you. Whatever values are motivated by your real-data situation.
 Alvin  posted on Thursday, September 04, 2014 - 8:51 pm
Hi Drs. Muthen, I ran into this error while running an MC path model: *** FATAL ERROR
Based on the comments above, is this because I have not provided population values for all parameters in my model? I have tried including all parameters (path coefficients, variances, covariances, residual variances, means) but still have the same problem. Do I need to include intercepts as well (and if so, how)? Is this the correct way of specifying a value for a mean: [var1*.8]?
 Alvin  posted on Thursday, September 04, 2014 - 10:45 pm
I've included the relevant syntax here -
epdstot2 on ptsdavg*.8 comstress*.7 dvcat*.7 fireshelter*.3;
ptsdavg on hrtraum*.8 witness*.8 dvcat*.7 comstress*.7;
witness with hrtraum*.8;

fireshelter with hrtraum*.5 witness*.3;
comstress with hrtraum*.8 witness*.2 fireshelter*.8;
dvcat with hrtraum*.1 witness*.1 fireshelter*.1 comstress*.3;

hrtraum@1; !variance

ptsdavg*.8; !residual variance

[hrtraum*.8]; !mean
 Linda K. Muthen posted on Friday, September 05, 2014 - 5:16 am
Please send the full output and your license number to
 Emil Coman posted on Wednesday, April 22, 2015 - 8:42 am
Hi, I can't seem to find the description of the data format in the 'SAVEDATA: ESTIMATES ARE" option to save data. I found the 12.7 example, with its ex12.7estimates.dat data, and I looked at it but cannot guess which is which; found also the excellent , but didn't clarify the estimates format itself. The Step 1 output ex12.7step1.html that generated these ESTIMATES only says: Save file \ex12.7estimates.dat \ Save format Free.
Thanks! Emil
 Linda K. Muthen posted on Wednesday, April 22, 2015 - 12:18 pm
The file contains the parameter estimates from the analysis in free format delimited by a space.
 Birhanu Worku posted on Saturday, July 30, 2016 - 5:29 am
Is there another method to generate non-normal data in Mplus apart from the one in the Muthen and Muthen Monte Carlo paper on SEM? My questions about this paper:
1. A mixture approach was used to get non-normal data, but is it possible to use the generated data with type=random under the analysis command?
2. Is it OK to generate normal data using type=random instead of type=mixture, and then use type=mixture to get non-normal data?

 Bengt O. Muthen posted on Saturday, July 30, 2016 - 4:22 pm
1. Yes. You generate in "internal Monte Carlo" and analyze any way you want in "external Monte Carlo" (DATA Type=MonteCarlo).

2. If you generate normal data you will not get more than one mixture class.

You can generate non-normal using Distribution = Skewt.
 Birhanu Worku posted on Sunday, July 31, 2016 - 2:56 am
Thank you Bengt
To make it clear, all my variables are continuous (factors + indicators). I am using Mplus version 7.4. I have done a Monte Carlo simulation assuming that the factors are normally distributed as:
It's because the model has a latent interaction. Next I want to generate non-normal data with specified skew and kurtosis, for example skew=3 and kurtosis=20, with the same model under:
I am looking at your paper "How To Use A Monte Carlo Study To Decide On Sample Size..." (Appendix 1) to see if it can help me generate non-normal data for my model, but I think I can't use it. For example, with syntax similar to that in your paper and type=mixture, I got this message:
"*** ERROR in MODEL command
To declare interaction variables, TYPE = RANDOM must be specified in the ANALYSIS command".
My idea was to follow your paper and use mixture approach to generate non-normal data and then use type =random for my model as you replied (1) for me on July 30,2016. Now the problem is type of my model (having latent interaction) and can't use mixture approach. I also asked you if there is alternative method and found that distribution=skewt "not available with type=random" as error message in mlpus.

Thank you and looking forward for your advice
 Linda K. Muthen posted on Sunday, July 31, 2016 - 11:26 am
 Birhanu Worku posted on Sunday, July 31, 2016 - 11:58 am
Thank you prof Linda,

What about if we want to specify skewness and kurtosis directly?
 Linda K. Muthen posted on Sunday, July 31, 2016 - 4:59 pm
This cannot be done.
 Jan Ivanouw posted on Wednesday, August 17, 2016 - 1:54 am

I am wondering about the note in mcex5.2:

"! Note that the u* variances should be 1 in order for the Delta parametrization to give estimates in the correct metric"

How do I fix u* variances to 1 in a Monte Carlo study?

 Jan Ivanouw posted on Wednesday, August 17, 2016 - 5:40 am
Hi again

I have a second question:

When working with the MC version of the mcex5.2 example I have changed the categorical dependent variables from 2 to 3 categories (2 thresholds).

I specify the list of u1$1, u1$2, u2$1 etc in the MODEL MONTECARLO: section, which works fine.

However, when I add the same specifications to the MODEL: section, the program stops and I get the error message:

"The following MODEL statements are ignored:
* Statements in the GENERAL group:

What am I doing wrong?

 Bengt O. Muthen posted on Wednesday, August 17, 2016 - 2:28 pm
To get V(u*)=1 with Delta you choose the right combination of values for the factor loading, factor variance, and residual variance for the indicator.

To get 2 thresholds (3 categories) you need to change the 1 to 2:

Generate = u(2);
 Jan Ivanouw posted on Wednesday, August 17, 2016 - 11:30 pm
Hi Bengt

Thank you for your answers.

On the question of which combination of factor loading, factor variance and residual variance gives V(u*)=1: can you give me a hint about which conditions these parameters must satisfy?

The problem of the program stopping:
In fact, I did use generate = u(2), and it works to generate the data. The program also runs fine when I do not specify the thresholds in the MODEL: section, but the analysis is then misspecified, since the generated data are analyzed as if there were only two categories (and the default threshold of 0).
The problem is that the program will not allow me to specify the thresholds in the MODEL: section.
So, apparently I have done something wrong, but I don't know what.

 Bengt O. Muthen posted on Thursday, August 18, 2016 - 10:41 am
Say that the factor variance is 1 and the loading is 0.7. Then the u* variance is

0.7*0.7*1 + V(e)

so in Model Population you should choose

V(e) = 0.51

to make the u* variance = 1.
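
The same arithmetic can be written as a small Python helper (a sketch, not Mplus syntax; the function name is made up for illustration). It assumes a single factor loading on the indicator, so that V(u*) = loading^2 * factor variance + V(e).

```python
def residual_variance_for_unit_ustar(loading, factor_variance):
    """Residual variance V(e) that makes V(u*) = loading^2 * V(f) + V(e) equal 1."""
    resid = 1.0 - loading**2 * factor_variance
    if resid <= 0:
        # The explained variance already reaches or exceeds 1, so no valid V(e) exists.
        raise ValueError("loading^2 * factor variance must be below 1")
    return resid

v_e = residual_variance_for_unit_ustar(0.7, 1.0)
print(round(v_e, 2))  # 0.51, matching the worked example
```

The same helper can be applied to each indicator in turn when the loadings differ.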

To see why the program complains, please send output and license number to Support.
 Jan Ivanouw posted on Thursday, August 18, 2016 - 1:36 pm
Thank you for your answer.
While preparing my data analysis for sending, I found the stupid error I have committed.
 Tao Yang posted on Tuesday, January 24, 2017 - 7:47 am
Hello, I'd like to run Monte Carlo power analysis of the interactions of x (binary predictor) with z and w (continuous variables). I generated data sets for x, z, w, and y to be used for external Monte Carlo in Mplus (so that I can create interaction terms using DEFINE). Syntax below. Without "MODEL: y ON x", I got an error message that "Only x-variables may have cutpoints in this type of analysis..." So I included MODEL command line only to tell the program to treat x as an exogenous variable.

Does the MODEL command influence data generation? In other words, are
data sets generated based on specifications in MODEL POPULATION or

NREPS = 100;REPSAVE = ALL; SAVE = rep*.dat;

zb BY z@1; zb@.50; z@.01;
wb BY w@1; wb@.30; w@.01;
xzb | zb XWITH x;
xwb | wb XWITH x;
y ON x*.20 zb*.10 wb*.30 xzb*.21 xwb*.35;
[x@.50 zb@0 wb@0 z*.35 w*.12];
x@.25; [y*.12]; y*.02;

MODEL: y ON x;
 Bengt O. Muthen posted on Tuesday, January 24, 2017 - 2:46 pm
Only Model Population influences the data generation.
 Amanda Hagman posted on Tuesday, March 07, 2017 - 10:42 am
Dear Drs. Muthen,

In a Monte Carlo LCA simulation I am looking to see if class enumeration is complicated when variables, correlated within class, are misspecified.

I used an external monte carlo to create data at different levels of correlation. Then I want to run the model with and without a correlation specified within classes.

I was able to do this at low/moderate correlations, .25 and .35. But my .55 model won't run, and neither will the .75 correlation model.

The output for the .55 correlation model is an exact copy of the input. The output for the .75 model shows the results for 0 data sets. I am not sure this has to do with the correlation strength, but everything else is the same in the input. The data files created in the first step of the external Monte Carlo look good and are complete.

Do you have any suggestions?



Thank you for your help.
 Bengt O. Muthen posted on Tuesday, March 07, 2017 - 6:21 pm
That's not enough information to go on. I don't know how you create the within-class correlation or how you model it.

So we probably need to see the inputs for those 2 problematic runs - send to Support along with your license number.
 Melissa Gordon posted on Tuesday, January 09, 2018 - 7:01 pm

I'm running into the same problem as Janke C. ten Holt (July 18, 2008 - 5:27 am) and Mauricio Garnier-Villarreal (February 14, 2012 - 9:40 pm). Their posts are both on this thread.

I want to misspecify an LCA model by generating continuous data, then dichotomizing it and treating it as binary in the analysis. I am using code similar to theirs and running into the same errors they found. Linda asked them to submit their syntax privately. Has the issue been resolved? Is there different code to use?


 Bengt O. Muthen posted on Wednesday, January 10, 2018 - 10:23 am
You would do that in 2 steps, first generating the continuous data and then in an external Monte Carlo run you dichotomize it in Define. See UG ex 12.6 for an example of a 2-step MC. I don't recall what input errors the previous posters made.
 Jennie Jester posted on Saturday, January 13, 2018 - 1:20 pm
I am running a Monte Carlo simulation to estimate sample size for a proposed study in which we will measure parenting in couples. I plan to do the analysis with TYPE = COMPLEX, so I am following example 12.6, generating the data as twolevel and running the analysis as COMPLEX. Do I have to have both between and within variables? Do all variables need to be declared as either between or within?
My model is a latent growth model of an outcome variable (y) and a mediator (m), and I will also have a time-varying independent variable x (not sure how to code). Syntax:
MONTECARLO: NAMES ARE y11-y13 m11-m13;
NREPS = 10000;
SEED = 53487;
CSIZES = 300(2);
SAVE = ex12.6rep*.dat;

MODEL population:
iout sout | y11@0 y12@1 y13@2;
[iout*.5 sout*1];
iout*1; sout*.2; iout with sout*.1;

!similar model here for the mediator;
sout on smed *.3;! this is the parameter I am interested for power;
imed with iout*.1;
!repeat of above model;
 Bengt O. Muthen posted on Saturday, January 13, 2018 - 4:53 pm
Because you have only 2 cluster members, husband and wife, why don't you instead use a single-level, wide approach? So if you measure the outcome at 3 time points, you will have 6 columns of outcomes in the data. We have examples in our short course handouts of this; see Topics 3 or 4.
 Jennie Jester posted on Sunday, January 14, 2018 - 5:41 am
I am worried that we will not be able to afford a large enough sample size to test the mediation, especially in the LGM context. Currently, the budget is for 300 couples. If we do the wide approach, I think we have a sample size of 300. If we do a long approach with TYPE = COMPLEX, do we have a larger effective sample size, which depends on how closely related our outcome measures are within the couple?
 Bengt O. Muthen posted on Sunday, January 14, 2018 - 10:29 am
The power is the same for the two approaches because the models are identical.
 Jennie Jester posted on Tuesday, January 16, 2018 - 8:48 am
I have looked for examples of this in Topics 3 and 4 (and 7 and 8), but I can't find it. I don't understand how to set up the model. I understand using a wide format for longitudinal data, for instance for an LGM, but not for couples data. Any chance you could be more specific about where to find an example of this?

Thanks so much!
 Bengt O. Muthen posted on Tuesday, January 16, 2018 - 2:28 pm
See Topic 8, slides 52-56 and the Khoo reference on slide 54.
 Dallas posted on Thursday, January 18, 2018 - 12:18 pm
Hi. I'm trying to simulate data to reflect some real data.

Suppose I have a real dataset with 200 observations. U1 is a binary variable with mean= .245 and VAR=.186. X1 is also a binary variable with mean= .545 and variance =.249. The logistic regression of U1 on X1 results in an intercept = 1.47 and a beta of .593.

I think the relevant Montecarlo code to simulate data to match this data would be:

generate = u1(1);
cutpoints = x1(.11303854); !invnormal(.545)=.113
analysis: estimator=ml; link=logit;
model population:
x1@1; ///
u1 ON x1@.5927822 ;

However, when I run that code and read in the data, I find that X1's mean is .425 (VAR=.246) and the logistic results are intercept = 1.498772 and a beta of .734.

What am I missing in the generation step so that the simulated data match the observed data?
 Bengt O. Muthen posted on Thursday, January 18, 2018 - 2:45 pm
2 issues:

Note that mean (X1) = Prob (x1=1, not 0). So the cutpoint needs to be negative for a Prob > 0.5. You probably have the sign reversed.

Note also that if the intercept is 1.47, the threshold that you specify with [$] is the negative of that.
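
As a sanity check on the sign convention, here is a small Python sketch (not Mplus syntax). It assumes the logit form P(u=1 | x) = 1 / (1 + exp(threshold - slope * x)), so the intercept is the negative of the threshold. The numbers are the ones quoted in the question; the function and variable names are made up for illustration. Reading 1.470852 as a threshold reproduces the reported U1 mean of about .245, while reading it as an intercept does not.

```python
import math

def prob_u1(x, threshold, slope):
    """P(u = 1 | x) under a logit link with intercept = -threshold."""
    return 1.0 / (1.0 + math.exp(threshold - slope * x))

def marginal_u1(p_x1, threshold, slope):
    """Marginal P(u = 1) when x is binary with P(x = 1) = p_x1."""
    return (1.0 - p_x1) * prob_u1(0, threshold, slope) + p_x1 * prob_u1(1, threshold, slope)

p_x1 = 0.545
slope = 0.5927822

as_threshold = marginal_u1(p_x1, threshold=1.470852, slope=slope)
as_intercept = marginal_u1(p_x1, threshold=-1.470852, slope=slope)

print("1.470852 read as threshold -> P(u=1) =", round(as_threshold, 3))
print("1.470852 read as intercept -> P(u=1) =", round(as_intercept, 3))
```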
 Dallas posted on Thursday, January 18, 2018 - 5:05 pm
Hi. Thanks for your quick response. I apologize. I meant to write "threshold" and not intercept. The threshold is 1.470852.

I'm not clear about your cutpoint comment. The probability that X1=1 is 0.545. That is a z-score of ~.113.

What happens is that when I run the Monte Carlo code as written:
generate = u1(1);
cutpoints = x1(.11303854);
analysis: estimator=ml; link=logit;
model population:
x1@1; ///
u1 ON x1@.5927822 ;

I expect the probability that X1=1 to be .545 in the simulated data, but it is not; it equals 0.425. Not surprisingly, if I run a logistic regression in Mplus on the simulated data, the parameters differ from their fixed values in the Monte Carlo part.

It seems that a z-score cutpoint of .113 should result in an X1 with a probability of .545, but I'm not getting that. Sorry I wasn't clear and correct in my language earlier. Thanks again!
 Bengt O. Muthen posted on Thursday, January 18, 2018 - 6:12 pm
A N(0,1) variable has 0.5 probability of being above 0. So if you want a 0.545 probability, the cutpoint has to be less than 0. The z-score of 0.113 that you used makes the probability of being below that value 0.545, but the probability of being above it (which is what you are looking for) is 1-0.545 = 0.455. See also UG ex 12.1 which uses Cutpoint.
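
The cutpoint arithmetic can be checked with Python's standard library (a sketch, not Mplus syntax). It assumes, as described above, that the binary variable equals 1 when the underlying N(0,1) variable falls above the cutpoint; the helper name is made up for illustration.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

def prob_above(cutpoint):
    """P(z > cutpoint) for a standard normal z, i.e. P(x = 1)."""
    return 1.0 - std_normal.cdf(cutpoint)

target = 0.545
# Cutpoint that puts probability `target` ABOVE it: the (1 - target) quantile.
cutpoint_for_target = std_normal.inv_cdf(1.0 - target)

print("P(x=1) with cutpoint +0.113:", round(prob_above(0.11303854), 3))
print("P(x=1) with cutpoint -0.113:", round(prob_above(-0.11303854), 3))
print("cutpoint for P(x=1)=0.545:  ", round(cutpoint_for_target, 3))
```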
 Dallas posted on Friday, January 19, 2018 - 3:03 am
Oh. I see. I misunderstood (obviously!). If I change the value to -.113 and leave the rest, x1's resulting mean is .515 not .545 and a logistic regression does not recover the expected parameters. Let me ask the question differently.

Suppose I wanted to generate data with the following properties:
A binary outcome with mean= .245 and variance=.186.
A binary predictor X1 with mean= .545 and variance =.249.
And a logistic regression of U1 on X1 results in
U1 ON X1=0.593

What Monte Carlo code would I use so that the generated data had those EXACT properties? Thanks!
 Bengt O. Muthen posted on Friday, January 19, 2018 - 11:39 am
In Monte Carlo studies, the results won't be exact unless you use a sample size of infinity. You get closer and closer with increasing N.
 Dallas posted on Friday, January 19, 2018 - 12:45 pm
Ah. I thought that was true when one used the "*" symbol but that the "@" symbol would fix the values regardless of the sample size. I specified a very large sample and the parameters were as expected. Thanks.

Given your comment, what is the best way to accurately describe what fixing a value does within the Monte Carlo context?

 Bengt O. Muthen posted on Friday, January 19, 2018 - 1:19 pm
In Model Population it doesn't matter if you use * or @ - the randomly generated data are based on the values given. But data are randomly generated (like drawing a random sample) so no single data set (no single draw) is expected to have exactly the features of the population model.
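
The sampling point can be seen in a few lines of plain Python (an illustration only; it does not reproduce how Mplus generates data internally). The sketch draws a standard normal variable, cuts it at -0.113 so that the population proportion is about 0.545, and compares sample proportions at several sample sizes. The seed and sample sizes are arbitrary choices.

```python
import random

def simulated_proportion(n, cutpoint, seed):
    """Share of n standard-normal draws that fall above the cutpoint."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n) if rng.gauss(0.0, 1.0) > cutpoint)
    return hits / n

cutpoint = -0.11303854  # population P(x = 1) is about 0.545

for n in (200, 20_000, 200_000):
    p_hat = simulated_proportion(n, cutpoint, seed=2018)
    print(f"N = {n:>7}: sample proportion = {p_hat:.4f}")
```

The population value is fixed in every run; only the sample proportion varies, and it settles toward 0.545 as N grows.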
 Dallas posted on Friday, January 19, 2018 - 5:24 pm
I see. This was/is very helpful. Thank you so much for your prompt and informative replies. Have a great weekend.
 Bengt O. Muthen posted on Saturday, January 20, 2018 - 12:55 pm
See also the V8 UG page 465 and on.

Plus chapter 3 of our book Regression and Mediation using Mplus.