
Daniel posted on Thursday, August 28, 2003  8:56 am



I would like to conduct a Monte Carlo model to estimate power in a two group LGM analysis. How would I specify my groups and other related info? 


Example 2 on page 24 of the Addendum to the Mplus User's Guide (at www.statmodel.com under Product Support) is a Monte Carlo for a two-group three-level growth model. You can use that as a starting point. 

Daniel posted on Thursday, August 28, 2003  9:10 am



Thanks very much. You and your husband give the greatest support for any product of any type I have ever purchased. 

Daniel posted on Friday, August 29, 2003  8:06 am



Hi, I'm having quite a bit of difficulty getting started. I have an LGM with two parallel processes. I want to see if I can find an interaction for high and low levels of depression. I want to cut my groups into high and low levels of depression. I am not using multilevel modeling. How should I get started? I'd appreciate any help possible. I looked at the example on page 24, like you suggested, but could not make heads or tails of it because I am not doing multilevel modeling. It keeps on asking me for data. 


Here is the example from page 24 with the multilevel information deleted. Maybe that will help.
TITLE: simulation of a two group analysis of growth model
MONTECARLO:
  NAMES ARE y1-y4;
  NGROUPS = 2;
  NOBSERVATIONS = 960 560;
  NREPS = 100;
  SEED = 9191187;
MODEL MONTECARLO:
  i BY y1-y4@1;
  s BY y1@0 y2@1 y3@2 y4@3;
  y1-y4*.5 i*.7 s*.2;
  [y1-y4@0 i*0 s*.2 ];
MODEL MONTECARLO-G1:
  [i*0 s*.2];
MODEL MONTECARLO-G2:
  [i*1 s*.5];
ANALYSIS:
  TYPE = MEANSTRUCTURE;
MODEL:
  i BY y1-y4@1;
  s BY y1@0 y2@1 y3@2 y4@3;
  y1-y4*.5 i*.7 s*.2;
  [y1-y4@0 i*0 s*.2 ];
MODEL G1:
  [i*0 s*.2];
MODEL G2:
  [i*1 s*.5];
OUTPUT:
  TECH9;


Actually, the input I posted will not run in Version 2 of Mplus. It will run in Version 3. In Version 2, you will need to use the FILE option to enter your population values instead of MODEL MONTECARLO. Or you will have to modify the input to be TYPE=MIXTURE, NCLASSES=1, GCLASSES=1; If I get time, I will modify this. But it won't be until next week. 

Daniel posted on Friday, August 29, 2003  12:30 pm



Thanks again. I figured out a way to run the model. I look forward to version 3. 


Great. Version 3 is going to have unbelievable Monte Carlo capabilities among other things. I think you'll like it. 

Limey posted on Thursday, May 06, 2004  1:07 pm



I would like to use the Monte Carlo facility (version 3 of Mplus) to generate 2 separate populations, and then disproportionately sample a certain number from each population. The samples from these 2 populations need to be combined so that I can then analyze them using GMM. Can this be done all in Mplus or do I need to do some of it in SAS? 


See Example 11.6, which explains how to generate and save the superpopulation files and analyze them with a separate input. Example mcex5.14, which comes with the Mplus CD and is put on the hard disk as part of the installation, shows how to use Monte Carlo to generate multiple group data. In the generation of the population, include your binary subpopulation inclusion indicator U. Using [U$1*0] in the first group and [U$1*1] in the second group would give disproportionate sampling. In the second input, USEOBSERVATIONS = u eq 1 would make sure that you analyze the subpopulation. 
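As a rough check on thresholds such as [U$1*0] and [U$1*1], the sketch below converts a threshold into the implied probability that U = 1. It assumes U is generated with a logit link and no covariates, so that P(U = 1) = 1 / (1 + exp(threshold)). The function name is hypothetical, and this is plain Python rather than Mplus syntax.

```python
import math


def inclusion_probability(threshold: float) -> float:
    """P(U = 1) implied by one threshold, assuming a logit link and no covariates."""
    return 1.0 / (1.0 + math.exp(threshold))


p_group1 = inclusion_probability(0.0)  # threshold from [U$1*0]
p_group2 = inclusion_probability(1.0)  # threshold from [U$1*1]
print(round(p_group1, 3), round(p_group2, 3))
```

Under these assumptions the first group would be included at a rate of 0.5 and the second at roughly 0.27, which is what makes the sampling disproportionate.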

limey posted on Friday, May 07, 2004  1:53 pm



Thank you for the help! However, it doesn't look like Mplus can generate multiple group data for mixed models. Is this the case? 


I think you can do multiple group for a mixed model if we mean the same thing by mixed model. Can you describe your model? 

limey posted on Monday, May 10, 2004  8:21 am



I would like to do a simulation using a Growth Mixture Model (with latent continuous intercept and slope factors, and a categorical latent class factor). I would like to simulate a complex sample design, where individuals are sampled at different rates from 2 strata. Later, I would like to add clustering into the sample design. Thanks again for all of your help! 

Anonymous posted on Monday, May 10, 2004  11:03 am



I modified example mcex8.1.inp to generate two strata. To add clustering into the design you have to modify mcex10.4.inp the same way. Tihomir.
title: this is an example of a GMM for a continuous outcome using automatic starting values and random starts
  two strata
  inclusion variable for subsequent sampling U
montecarlo:
  names are y1-y4 x U;
  genclasses = c(4);
  classes = c(2);
  nobs = 500;
  seed = 3454367;
  nrep = 10;
  REPSAVE = ALL;
  SAVE = ex8.1rep*.dat;
  categorical=U;
  generate=U(1);
analysis:
  type = mixture;
model population:
  %overall%
  [x@0]; x@1;
  y1-y4*.5;
  i s | y1@0 y2@1 y3@2 y4@3;
  i*1; s*.2;
  c#1 on x*1;
  i on x*.5;
  s on x*.3;
  ! first strata
  %c#1%
  [i*1 s*.5]; [U$1*0];
  %c#2%
  [i*3 s*1]; [U$1*0];
  ! second strata
  %c#3%
  [i*2 s*1.5]; [U$1*1];
  %c#4%
  [i*0 s*1.5]; [U$1*1];
model:
  %overall%
  y1-y4*.5;
  i s | y1@0 y2@1 y3@2 y4@3;
  i*1; s*.2;
  c#1 on x*1;
  i on x*.5;
  s on x*.3;
  %c#1%
  [i*1 s*.5];
  %c#2%
  [i*3 s*1];
output:
  tech8 tech9;

limey posted on Tuesday, May 11, 2004  12:18 pm



You rock. Thanks muchly! 

limeygrl posted on Friday, May 28, 2004  2:19 pm



One more thing regarding my previous question. Is there a way to specify the number of observations sampled from the population separately in each stratum? So if nobs=500, 300 would come from stratum one and 200 would come from stratum 2? 

bmuthen posted on Saturday, May 29, 2004  10:50 am



Seems like you can do that by specifying values of fixed probabilities for the classes corresponding to strata, so using the logit parameters of [c#]. I guess the strata could be handled by the V3 knownclass option. 
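To make the logit idea concrete, here is a minimal sketch that turns target stratum proportions into class logits. It assumes one class per stratum and that the [c#] logits are defined relative to the last class; the helper name is hypothetical and the code is plain Python, not Mplus input.

```python
import math


def class_logits(proportions):
    """Logits for every class except the last, with the last class as reference."""
    total = sum(proportions)
    probs = [p / total for p in proportions]
    return [math.log(p / probs[-1]) for p in probs[:-1]]


# Target split of 300 and 200 out of nobs = 500
logits = class_logits([300, 200])
print([round(v, 3) for v in logits])
```

For a 300/200 split this gives a logit of about 0.405 for the first class. Fixing class probabilities this way would set the expected split; the realized counts would presumably still vary from replication to replication.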

Annica posted on Wednesday, November 03, 2004  1:05 am



What information can be saved when using TYPE=MONTECARLO in the DATA statement, i.e. when performing a Monte Carlo study and the data has been generated outside Mplus? Using the RESULTS option in the SAVEDATA statement I obtain the parameter estimates for each data set. However, some of the replications do not reach convergence and I would like to know which. I know this information is possible to save when performing an internal Monte Carlo study in Mplus. Thank you. 


For external Monte Carlo, ask for TECH9 to see which replications did not converge. 


Is it possible to specify an alternative significance level at which each replication in a Monte Carlo analysis is tested? I did not see anything in the documentation for Monte Carlo that allows significance level to be specified. I would like to determine the power for detecting differences in linear growth between two groups, but I want to test at alpha = .03125 as a way of approximating a multiplicity adjustment. thanks, Chuck 


There is no way to specify an alternative significance level. 
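Outside of Mplus, one possible workaround is to recompute power from per-replication results. The sketch below assumes you have already read the estimate and standard error of the parameter of interest for each completed replication into two lists (the parsing step is omitted and depends on how you saved the results). The function name and the numbers are made up for illustration.

```python
from statistics import NormalDist


def power_at_alpha(estimates, std_errors, alpha):
    """Share of replications whose two-sided z-test rejects at the given alpha."""
    critical = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    rejected = sum(
        1 for est, se in zip(estimates, std_errors) if abs(est / se) > critical
    )
    return rejected / len(estimates)


# Illustrative values only, not output from any real run
estimates = [0.50, 0.42, 0.61, 0.15, 0.55]
std_errors = [0.20, 0.21, 0.19, 0.20, 0.22]
print(power_at_alpha(estimates, std_errors, 0.03125))
print(power_at_alpha(estimates, std_errors, 0.05))
```

The stricter alpha raises the critical value, so the estimated power can only stay the same or drop relative to the 0.05 level.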

Emily Blood posted on Thursday, September 27, 2007  12:47 pm



If the data has been generated outside of Mplus, is it possible to enter the true values of the population parameters? In the manual I only see described how to enter true population values if the population values were generated in another Mplus run or if the data were generated within Mplus. I have generated the data in Splus and would like to specify the true population values used so I can use the 95% cover probabilities in the Mplus output from the MC run. Thanks for your help! Emily 


You would do external Monte Carlo if the data are generated outside of Mplus. See Step 2 of Example 11.6. You would enter the population values in the MODEL command. 

Emily Blood posted on Thursday, September 27, 2007  1:22 pm



Thank you. What is the syntax for entering a population value in the MODEL command? I know that, for example, i BY y@2 sets the value of the regression coefficient for y to 2, [i@1] sets the intercept for the i factor to 1, and f ON x*1 sets the starting value for the regression parameter to 1. It is not clear from the manual how to enter the true population values in the MODEL command. Can you please let me know? Thanks, Emily 


In the MODEL command you should use * followed by the true population value for free parameters and @ followed by the true population value for fixed parameters. Please see Step 2 of Example 11.6. The true population values are given in the MODEL command. 

Emily Blood posted on Friday, September 28, 2007  6:44 am



Thank you. Since this is the same syntax for setting starting values for free parameters, does putting these true values in automatically make those the starting values in the estimation of the models? Emily 


Yes, they are also used as starting values. 

Emily Blood posted on Friday, September 28, 2007  7:34 am



Thank you for all of your help. The MC facility in Mplus is really great!!! It seems to do everything I need for my simulation study. 

Emily Blood posted on Tuesday, October 09, 2007  4:57 pm



What is the method of computing the standard error of indirect effects? I have the following code in my model (it is a variation on a latent growth curve model where i is a latent intercept). MODEL INDIRECT: i IND tx; There is a mediating variable between tx and i. The output shows a standard error estimate, but I am not clear on what method is used to compute this since the indirect effect involves a product of two estimated coefficients (a+b*c is the form of the indirect effect, where a is the direct effect of tx on i and b is the effect of tx on a mediating variable and c is the effect of the mediating variable on i). If you could let me know that would be great! Thanks, Emily 


The default is to compute standard errors using the Delta method. Bootstrap standard errors are also available. 

Emily Blood posted on Wednesday, October 10, 2007  1:04 pm



I'm sorry to keep bothering you, but for the a+b*c total effect, the delta method std err would be: sqrt[var(a) + b^2*var(c) + c^2*var(b)], is this correct? If so, this doesn't seem to match the value from the output. I can match the value for the std err of the indirect part of the effect (se(b*c)), but for the total effect, I am getting something larger than what is in the output. I am trying to figure out what I am missing. If you could let me know I would really appreciate it. Thanks! Emily 


You have not included covariance terms in your formula. 
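For illustration, the delta-method variance of a + b*c can be written as g' V g, where g = (1, c, b) is the gradient with respect to (a, b, c) and V is the covariance matrix of the three estimates. The sketch below compares that with the covariance-free formula from the question. All numbers are made up, and the function names are hypothetical.

```python
def total_effect_variance(b, c, cov):
    """Delta-method variance of a + b*c; cov is the 3x3 covariance of (a, b, c)."""
    grad = [1.0, c, b]  # derivatives of a + b*c with respect to a, b, c
    return sum(grad[i] * cov[i][j] * grad[j] for i in range(3) for j in range(3))


def variance_without_covariances(b, c, cov):
    """The formula from the question: var(a) + c^2 var(b) + b^2 var(c)."""
    return cov[0][0] + c ** 2 * cov[1][1] + b ** 2 * cov[2][2]


# Illustrative values only
b, c = 0.5, 0.4
cov = [
    [0.040, 0.010, -0.005],
    [0.010, 0.020, 0.002],
    [-0.005, 0.002, 0.030],
]
var_full = total_effect_variance(b, c, cov)
var_diag = variance_without_covariances(b, c, cov)
print(var_full ** 0.5, var_diag ** 0.5)
```

The two results differ by the terms 2c cov(a,b) + 2b cov(a,c) + 2bc cov(b,c), which can be positive or negative depending on the signs of the covariances.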


I am trying to run a simulation in Mplus for a growth model to estimate power. The proposed model has three covariates (one is binary, two continuous), and three time points. I am assuming complete data. How do you specify the parameter variances of the three intercepts and the three slopes? Is there an example you can direct me to? Thank you! 


The data for the user's guide examples are generated by Monte Carlo simulations. Find the Monte Carlo counterpart of the growth model example in Chapter 6 that is closest to what you are doing to see how this is done. These inputs come on the Mplus CD and are put on the hard disk as part of the default installation. 


Thank you so much!! This is very helpful. MF 


Hi, I am running a Monte Carlo simulation for a 2-class nonlinear growth model where a binary covariate has an effect on the slope in one class, but not the other. I am using the following syntax in the model population as well as in the model block to create a binary variable with a 50/50 split, simulating a treatment variable: [x@0]; x@1; When I look at the estimated sample statistics for the first replication, the mean is not .5. Am I doing something wrong, or is it due to the fact that only the statistics from the first replication are shown? Thank you for your advice. Hanno 


It sounds like you are not using the CUTPOINTS option to categorize the covariate. See Example 11.1. 


Hi Linda, below is my reduced input (I left out the model block due to size). Do you see what is wrong with it? Thanks. Hanno
montecarlo:
  names are y1-y5 x;
  genclasses = c(2);
  classes = c(2);
  nobs = 250;
  seed = 3454367;
  nrep = 1000;
  cutpoints=x(1);
analysis:
  type = mixture;
model population:
  %overall%
  [c#1*1];
  [x@0]; x@1;
  y1-y5*.5;
  i s q | y1@1 y2@0 y3@1 y4@4 y5@7;
  q@0; i*.5; s*.1;
  i on x@0;
  s on x*.3;
  q on x*.01;
  s with q*0;
  i with q*0;
  %c#1%
  [i*2.2 s*.5 q*.1];
  s on x*.3;
  q on x*.01;
  %c#2%
  [i*2.5 s*.5 q*.02];
  s on x*0;
  q on x*0;


You need CUTPOINTS (0). If you read the example, you will see that we did not do 50/50 split. 
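A quick way to see why the cutpoint matters: with x generated as normal with mean 0 and variance 1 (as in [x@0]; x@1;), the share of cases above a cutpoint is one minus the normal CDF at that point. The sketch below assumes the covariate is dichotomized so that values above the cutpoint are scored 1; the function name is hypothetical.

```python
from statistics import NormalDist


def proportion_above(cutpoint, mean=0.0, variance=1.0):
    """Expected share of a normal covariate that falls above the cutpoint."""
    return 1.0 - NormalDist(mean, variance ** 0.5).cdf(cutpoint)


p_cut0 = proportion_above(0.0)  # cutpoint at the mean
p_cut1 = proportion_above(1.0)  # cutpoint one SD above the mean
print(round(p_cut0, 3), round(p_cut1, 3))
```

A cutpoint of 0 gives the intended 50/50 split, while a cutpoint of 1 puts only about 16% of cases in the upper category.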

Erika Wolf posted on Wednesday, October 27, 2010  10:24 am



What do you make of monte carlo power analyses where power appears to be more than adequate yet some replications generated in the analysis do not converge and/or some replications include nonpositive definite matrices? Are the power estimates accurate in this situation, given that the majority of replications are fine? I've noticed this tends to occur as I decrease sample size in an effort to determine the minimum sample size needed, yet power estimates in the output are still well above 80% and estimated parameter estimates do not differ meaningfully from the population. 


That is a potential problem of Monte Carlo based power estimation: it relies on good parameter and SE estimation. An alternative is the Satorra-Saris pop cov matrix approach (see our Topic 3 handout), but that does not cover as many model situations. A good question is how well the parameter estimates and SEs are recovered. You can attempt to avoid non-pos def by various tricks (Cholesky decomp or inequality constraints, i.e. var > 0). 

Erika Wolf posted on Thursday, October 28, 2010  8:15 am



Thanks for your quick response. A follow-up question from yesterday's thread: Can you please clarify how Mplus calculates the mean parameter estimates, SD, and SE when some solutions in the Monte Carlo run fail to converge or produce improper solutions? Does it eliminate those instances and change the denominator for the mean values, or does it factor that in somehow? Also, are those failed replications reflected in any way in the calculation of the 95% cover or power estimate? Thanks again! 


The values from all replications that are shown as completed are used. Information from replications that are not completed is not used. 


Hi, I'm trying to estimate sample sizes for a two-level SEM with the Monte Carlo simulation. I have a couple of questions which are not really clear to me from the User's Guide and Muthen & Muthen (2002). Given that I actually rely on real data, should I run the Monte Carlo from the estimates of my model (obtained with "SAVEDATA: ESTIMATES") or on artificial data generated with the Monte Carlo method? The second question is: in Muthen & Muthen (2002) you fix the variance of the factor to 1, saying that this makes it easier to interpret results. Should I do the same in my case (simulation based on real data)? Thank you very much in advance. 


If you feel that your real data solution offers the best assessment of model parameters, then use those values. You don't have to do that. But if you do, do it for both the real-data run and the simulation. 


Ok, thank you very much Bengt. Just one more question: can I use the bias computed with the Monte Carlo estimates ((average SE - pop SE)/pop SE) to correct the standard errors in my real model? Thank you. 


No, because the real data may have features that are different from the simulated data. The fact that the simulated data uses the model parameters estimated from the real data is not sufficient because the model may not capture all aspects of the real data, even when it fits well. If you see a bias in SEs, you may instead try changing from ML to MLR. 
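For reference, the SE bias quantity mentioned in the question can be computed as below. This is a sketch that assumes the "pop SE" is taken as the standard deviation of the parameter estimates across replications; the values are invented and the function name is hypothetical. It is a diagnostic for the simulation only, not a correction to apply to real-data standard errors.

```python
from statistics import mean, stdev


def se_bias(estimates, std_errors):
    """(average SE - pop SE) / pop SE, with pop SE taken as the SD of the estimates."""
    pop_se = stdev(estimates)
    return (mean(std_errors) - pop_se) / pop_se


# Illustrative values only
estimates = [0.48, 0.52, 0.50, 0.55, 0.45]
std_errors = [0.04, 0.04, 0.04, 0.04, 0.04]
bias = se_bias(estimates, std_errors)
print(round(bias, 4))
```

A positive value means the estimated standard errors are on average larger than the empirical variability of the estimates, and a negative value means they are smaller.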


Ok, thanks. In my MLSEM model I already use MLR, but given that the second-level clusters are countries, the number of parameters ends up much higher than the number of clusters. Can I consider the SE estimates with MLR reliable in this case as well? I think I remember that in one course you gave, you said not to consider the Mplus warning about "trustworthiness of SE" if you use MLR. Thank you very much, your help is really appreciated. 


If the number of between-level parameters is greater than the number of between-level cluster units, this is a problem. If it is the total number of parameters in the model that is greater than the number of clusters, this may also be a problem. This has not been thoroughly studied as far as I know. Note that a minimum of 30-50 clusters is recommended. 


Hi Linda, I want to use the nonparametric GMM in a simulation but I don’t know how to specify the model. For example, how can I get a model with csub (3) cnp(2)? Please can you tell me what is wrong with my syntax below? Thank you in advance.
NAMES ARE y1-y5 ;
NOBSERVATIONS = 1000;
NREPS = 1;
SEED = 53487;
GENCLASSES = csub (3) cnp(2);
CLASSES = c (1);
!SAVE = ex12.6rep*.sav;
ANALYSIS:
  TYPE = MIXTURE;
  ESTIMATOR = MLR ;
MODEL POPULATION:
  %OVERALL%
  i s | y1@0 y2@1 y3@2 y4@3 y5@4;
  i*1; s*.2;
  i with s*.11;
  y1*1.0 y2*1.42 y3*2.24 y4*3.46 y5*5.08;
  %csub#1%
  [i*1.5 s*1.6];
  i*5; s*.2;
  i with s*.069;
  y1*1.0 y2*1.42 y3*2.24 y4*3.46 y5*5.08;
  %csub#2%
  [i*1.8161 s*0.1930];
  i*25.3642; s*0.4757;
  i with s*2.6744;
  y1*1.0098 y2*1.4256 y3*2.2380 y4*3.4435 y5*5.0170;
  %csub#3%
  [i*2 s*3];
  i*25.3642; s*0.4757;
  i with s*2.6744;
  y1*10 y2*1.4256 y3*3.80 y4*10 y5*5;


Please send the output and your license number to support@statmodel.com so I can see the error message. 


Greetings, I am working through a modified version of example 12.4 for simulating a growth model. The default code runs as expected, but when I specify a correlation between iw/sw and ib/sb > .11 I receive a fatal error message (Population covariance matrix is NPD). Is this due to the starting values of the iw/sw and ib/sb? Any help is appreciated.
MONTECARLO:
  NAMES ARE y1-y3;
  NOBSERVATIONS = 244;
  NREPS = 500;
  NCSIZES = 1;
  CSIZES = 61 (4);
MODEL POPULATION:
  %WITHIN%
  iw sw | y1@0 y2@1 y3@2;
  y1-y3*.2;
  iw*1; sw*.2;
  iw with sw*.21;
  %BETWEEN%
  ib sb | y1@0 y2@1 y3@2;
  y1-y3@0;
  [ib*1 sb*.5];
  ib*.2; sb*.1;
  ib with sb*.21;
ANALYSIS:
  TYPE IS TWOLEVEL;
MODEL:
  %WITHIN%
  iw sw | y1@0 y2@1 y3@2;
  y1-y3*.2;
  iw*1; sw*.2;
  iw with sw*.21;
  %BETWEEN%
  ib sb | y1@0 y2@1 y3@2;
  y1-y3@0;
  [ib*1 sb*.5];
  ib*.2; sb*.1;
  ib with sb*.21;


You say ib*.2; sb*.1; ib with sb*.21; But translating this into a correlation gives 0.21/0.1414 > 1, where 0.1414 = sqrt(0.2*0.1), so your covariance can't be greater than 0.1414. 
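The same check can be scripted before running a simulation. The sketch below converts a covariance and two variances into the implied correlation, using the within and between values from the posted input; the function name is hypothetical.

```python
import math


def implied_correlation(var1, var2, cov):
    """Correlation implied by two variances and their covariance."""
    return cov / math.sqrt(var1 * var2)


# Population values from the posted input
between_corr = implied_correlation(0.2, 0.1, 0.21)  # ib*.2; sb*.1; ib with sb*.21
within_corr = implied_correlation(1.0, 0.2, 0.21)   # iw*1; sw*.2; iw with sw*.21
max_between_cov = math.sqrt(0.2 * 0.1)              # largest admissible |covariance|

print(round(between_corr, 3), round(within_corr, 3), round(max_between_cov, 4))
```

Only the between-level value is inadmissible here: its implied correlation is above 1, while the within-level covariance of .21 corresponds to a correlation below 1.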


Thank you. I was in the process of recomputing and found that that seemed to be the case. 


Hello, After getting a Monte Carlo simulation of a bivariate growth-curve model with multiple indicators to run, the program noted that many of the replications derived estimates from non-positive definite matrices. I'm wondering whether these are included in the final results presented, and if so, if I should just disregard the overall results from the power analysis then. Any insight would be greatly appreciated. Thank you! 


All converging replications are included. Non-pos-def matrices can occur with negative residual variances for the outcomes if population values are small or sample size is small, and also if growth factor correlations in the population are high or sample size small. 


Hello! 1. I am trying to start with a Monte Carlo analysis to estimate power for a simple growth model with 6 assessments. I'd like to specify a constant corr across time, but this doesn't work:
y1 WITH y2-y6@0.3 ;
y2 WITH y3-y6@0.3 ;
y3 WITH y4-y6@0.3 ;
y4 WITH y5-y6@0.3 ;
y5 WITH y6@0.3 ;
What am I doing wrong? 2. Since the DV is created as standard normal, should the effect of a predictor at baseline be in SD units (Cohen's d)? 3. And shouldn't the slope parameters be in standard units? Thanks! bac 


1. Send the output and your license number to support@statmodel.com. 2. Yes. 3. Yes, if the predictor is also standardized. 


Thank you for your help with that model, Linda! Now I'm learning how to carry out a Monte Carlo analysis for a growth model using an ordinal outcome. Simple pre-posttest design, testing difference in change for two randomized groups (equal Ns), and controlling for two continuous covariates as nuisance variables. I'm modeling equal proportions of cases in each of the 4 ordinal categories. Do you see any problems with the specified thresholds? Or any other problems I should be asking about? Thanks for any help you can provide!
i s | u1@0 u2@1 ;
[ i@0 s*.1 ] ;
i*.3 ; s@0 ; i WITH s@0 ;
[u1$1*-1.1 u1$2*0 u1$3*1.1] ;
[u2$1*-1.1 u2$2*0 u2$3*1.1] ;
i ON x@0 z1*.18 z2*.1 ;
s ON x*.824 z1*0 z2*0 ; ! OR for s ON x = 2.28, medium


You don't want to do growth modeling with only 2 timepoints. The parameters won't be identified. I would just do an ancova type regression of post on pre and group. 


First post: Thank you for your recommendation, Bengt! That brings back memories of articles long ago that debated the best way to analyze data from a pretest-posttest control group design, and if I recall, your suggested method was one of the top two winners! (The other being the ANCOVA of the change score on the pretest!). I have used the commands below to do the analysis the latent growth/multilevel way, and obtained results for both the LGM and a MC analysis for the model. This model tests the 3-way interaction of two fixed predictors on the (linear) change from pre to post. I think this is analogous to the classic 2-way ANOVA. There was no problem with identification. Am I missing something? bac
Model:
  i s | alcuse0@0 alcuse2@1 ;
  i ; s@0 ; i WITH s@0 ;
  i ON cpeer ;
  gamma1i | i ON ccoa ;
  [gamma1i] ;
  gamma1i ON cpeer ;
  gamma1i@0 ;
  s ON cpeer ;
  gamma1s | s ON ccoa ;
  [ gamma1s ] ;
  gamma1s ON cpeer ;
  gamma1s@0 ;


Second post: Aside from what the best method is for the pre-post analysis, I'm not sure how to do the Monte Carlo analysis using your suggestion, since it treats the pretest as a predictor. The analysis I wrote about is for an ordinal outcome at pre and posttest, with 4 points on the scale. If I use the ANCOVA with posttest on pretest, how can I specify that the pretest predictor is ordinal? I thought that Monte Carlo predictors could only be normal or binary. Next, in this MC analysis, if I want (say) equal proportions in the four categories, with the threshold defined as Ln[p/(1-p)], does this give the correct thresholds? [u1$1*-1.1 u1$2*0 u1$3*1.1] ; ! Equal proportions of 25% Thanks! bac 


When you fix the variance of the slope growth factor to zero and the covariance of the intercept and slope growth factors to zero, this is not really a growth model. At the top of page 493 of the user's guide, you will find the formulas for probit regression. You can generalize these to logistic regression using F(t) = 1 / (1 + exp(-t)). Mplus Discussion posts are limited to 1500 characters. In the future, keep your posts within that limit. 
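As a worked illustration of the threshold question, the sketch below derives thresholds from target category proportions by applying the inverse of F to the cumulative proportions (0.25, 0.50, 0.75 for four equal categories). It ignores covariates and growth-factor variance, which would change the marginal proportions, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def thresholds_from_proportions(proportions, link="logit"):
    """Thresholds whose cumulative probabilities match the target proportions."""
    cumulative = 0.0
    result = []
    for p in proportions[:-1]:  # the last category needs no threshold
        cumulative += p
        if link == "logit":
            result.append(math.log(cumulative / (1.0 - cumulative)))
        else:  # probit
            result.append(NormalDist().inv_cdf(cumulative))
    return result


logit_thresholds = thresholds_from_proportions([0.25, 0.25, 0.25, 0.25], "logit")
probit_thresholds = thresholds_from_proportions([0.25, 0.25, 0.25, 0.25], "probit")
print([round(t, 3) for t in logit_thresholds])
print([round(t, 3) for t in probit_thresholds])
```

Note that p in Ln[p/(1-p)] is the cumulative proportion up to each threshold, not the proportion in a single category; for equal 25% categories the logit thresholds come out near -1.1, 0, and 1.1.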


Hello, I am running a semicontinuous growth model for a continuous outcome as in example 12.9. You note that the exponential function must be applied to continuous variables saved for subsequent two-part analysis. For our data the means in the output for replication 1 for the continuous variable appear to correspond with the means from the generated data. Am I correct in assuming then that the output is on the base e log scale as well? Also, if I am inputting means for the continuous variable for the model population command, should these already be log transformed or in their raw form? Thanks so much for your help! 


If the data are saved using the DATA TWOPART command, the continuous variable has been transformed using elog, so to get back to the original variable exp needs to be used. For montecarlo model population the variables should be in their original form and done in line with the montecarlo version of UG ex 12.9 (see website). 


When you say data saved for the DATA TWOPART command, are you talking about if we had done the analysis on real data and then used that for generation and coverage? This is my fault for not expressing the issue clearly, but I think maybe I am still missing something. If in the Monte Carlo I use REPSAVE to save all the replications, the values for the continuous variable have been elog transformed according to example 12.9. What confuses me is if I input intercepts and mean values for the continuous variable in the Monte Carlo in their original form, they seem to be untransformed in the data files for the replications. For example, if I code [y1*20], then the mean of y1 in the replication .dat files, and in the output, is closer to 20, and not the elog of 20 (somewhere around 3) as I would expect. This made me think that I should be putting in elog transformed values in the model population command, but you seem to be saying otherwise, so I can't quite figure out what I'm missing. I've probably misunderstood the notes to example 12.9, but really appreciate any assistance in clearing up my confusion. 


I agree that this can be presented more clearly. What the text on page 436 tries to convey is how a "Step 1" ("internal") Monte Carlo run relates to a "Step 2" ("external", DATA TYPE=MONTECARLO) Monte Carlo run, where I use the words Step 1 and Step 2 in line with the chapter 12 examples. The Step 1 setup is shown in UG ex 12.9. The data generated are not obtained by a transformation (like log) but correspond to the parameter values given. But if you submit those data to a second run, say a Step 2 Monte Carlo run, the continuous variables will be by default log transformed when using DATA TWOPART. So to get the same parameter estimate results in Step 2 as in Step 1, you want to antilog (exponentiate) the continuous variables in the data generated by Step 1 and that can be done using DEFINE in the Step 2 run. So the exponentiation is just to counteract the DATA TWOPART action in Step 2. Alternatively, the DATA TWOPART command in the Step 2 run can use the option TRANSFORM = NONE. I hope that makes it clearer. 

Anna King posted on Monday, September 15, 2014  8:06 am



Hello Professors, Sorry about my stupid question. When I'm doing a simulation study to compare the influence of different estimation methods (e.g., one-step and three-step methods) on covariate effects, should I generate one set of data for later analyses? Thanks! Anna 


Yes, please see Web Note 15. 
