Message/Author 

Anonymous posted on Friday, December 12, 2003  7:48 pm



I am trying to calculate power for a multilevel studylatent class growth. I have the example from the addendum to the user’s manual (example is for a two group, three level model). I also have Muthén and Muthén (2002) describing how to use a monte carlo study to decide power. A few questions in trying to integrate the two pieces of information. Is the first step to calculate ICCs and design effect to see if I even need to do multilevel modeling? I’m following the example from the product support area. I’ve removed the input for the second group. For the monte carlo study, do I simply run a multilevel model and get the starting values from the output to get starting values for the within and between parts of the model? In the multilevel model, I want to examine the effect of a between variable on slope. Do I only specify that regression in the between part of the model, only in the within part or in both places? Thanks. 

bmuthen posted on Saturday, December 13, 2003  12:24 pm



Answered off line because of further questions. 

Anonymous posted on Sunday, January 25, 2004  3:24 pm



We are presently planning a family study where we would like to use multilevel SEM to examine potential influences on the psychosocial adjustment of children (level 1) nested within family systems (level 2). In the current or upcoming versions of Mplus, is there a way to estimate nonrecursive models? We have not seen this issue mentioned in any of the references we have collected on multilevel SEM. Thank you in advance for whatever help you can provide. 

bmuthen posted on Sunday, January 25, 2004  3:31 pm



The current version of Mplus allows nonrecursive multilevel SEM. 

Anonymous posted on Wednesday, January 28, 2004  3:16 pm



We are planning a family study where we would like to use multilevel SEM to analyze data collected from children nested within families. I have been reading what has been written about power analysis and sample size estimation for multilevel models. In those papers, there appears to be concern about power and the number of clusters. I see little comment about power and the number of observations within each cluster other than a note that more variability in observations per cluster can have a negative effect on power. In this data set, we expect to have more that 400 families with either 1 or 2 children per cluster so a low number of clusters is not a concern. Most (70%), however, will only have 1 child per cluster. Should we be concerned about the number of clusters with only 1 observation? How might this affect power? Thank you for your reply about nonrecursive models. 

bmuthen posted on Wednesday, January 28, 2004  3:26 pm



Having clusters with only 1 child is not a concern. They do not contribute information to the estimation of parameters related to withincluster (child) variation, but do contribute to the estimation of clusterrelated (family) parameters. So the question is for which type of parameter you want high power. If you have a small number of families with more than 1 child and want high power for withinfamily parameters, there might be a problem. As you say, you are likely to have good power for familylevel parameters. You can study these power matters using the Mplus Monte Carlo facility if you have a notion of the approximate model and parameter values. 

David Bard posted on Thursday, May 18, 2006  10:32 am



I am trying to run a power analysis for a longitudinal RCT where data are expected to fit a latent difference score SEM (McArdle and colleagues invention an upcoming SEM pub cited below). This has forced me to ask some hard questions about the structural relations between variables at the within and between levels using the Mplus 2level structure. Can you share some insight? 1) I'm finding it very hard to decide whether or not the SEM model autoregressive and coupling effects (for bivariate growth) need to be constrained to be equal at both levels in the same way that Muthen (1997) suggests constraining the slope and indicator effects to be the same in clustered LGM. How does one go about deciding such things even for more typical multilevel SEM situations? In other words, when should loadings and regression effects be constrained across levels, and when should they be allowed to vary? Is this purely an empirical issue, or should theory drive model specification. 2) In Muthen & Muthen (2002), without 3rd level (or Mplus' 2nd level) clustering, the authors show how to run a monte carlo power analysis for an LGM with a binary covariate. 2a) Slope effect size was defined as the difference in the average group slope estimates divided by the variance of the latent slope term. The difference in group slopes is directly related to the regression of the slope term on the "binary" covariate, which if scored 0/1, has a mean of .5 and a variance of .25. Using these estimates, regression effects of .2 and .1 were chosen to represent medium and small effect sizes. These effects appear in the appendix Mplus code, but the covariate is scaled with a variance of 1 and a mean of 0 and not .25 and .5 for data generation. 2ai). Is the monte carlo SEM used to generate the outcome data modeling the covariate as dichotomous 0/1 (or perhaps now, 1/1, so that mean=0 and var=1), or continuous only later to be transformed into the desired dichotomy? 2aii). Regardless, does this variance rescaling not change the effect size? 2b) When trying to plug in reasonable estimates of slope and intercept residual variability, does it make sense to use the Muthen & Muthen (2002) estimates at the within level, and use a ratio of the between to within variation equal to the expected ICC? For example, I expect an ICC of around .01 (at least for time 1 measurements), so does it sound reasonable to set the between slope residual variance equal to .01*V(WS) i.e., .01 times the estimated within slope residual variance (which = .09 in Muthen & Muthen, 2002). Thank you. McArdle, J. J. (in press). Latent difference score approach to longitudinal dynamic structural analysis. Structural Equation Modeling. 

David Bard posted on Thursday, May 18, 2006  10:39 am



Oh, sorry, one more question related to my first. In older applications of these multilevel models (examples on the Mplus homepage for clustered LGM), residual variation for the observed outcomes is modeled at both the within and between levels; yet, in newer applications, this variation is only modeled in the within level (examples included in newest version of Mplus manual). How does one make these decisions? 


1. Growth models imply a mean structure for the variable that is repeatedly measured over time. If you look at the 3level version of the growth model in standard multilevel books, you see that it has time scores that multiply both the means and residuals of the growth factors. The latter part ends up on within and the former on between, but the two parts are using the same time scores; hence the acrosslevel equality in Mplus. Other models such as McArdle's have to be studied from this point of view to see if the same modelimplied equality across levels holds. I have not yet read the article. 2a The x is first generated as N(0,1) and then dichotomized. It is the dichotomized version that is used in the model to generate the data. 2b. Yes this is reasonable. Followup question: The betweenlevel residual variances are often close to zero. In conventional 3level growth modeling such as HLM, they are fixed at zero. Mplus allows them to be estimated and sometimes that is important. However, with categorical and other outcomes it is computationally timesaving to restrict them to zero. 

David Bard posted on Friday, May 19, 2006  7:59 am



Thank you. Those are excellent explanations. Two quick followups: I cannot seem to get a dichotomous variable to run at the between level in my SEM model. I keep getting the message: Only xvariables may have cutpoints in this type of analysis. Cutpoints for variable W were specified. Yet, when I run the LGM example 11.6 from the manual, I do not get this message. Are cutpoint variables at the between level only allowed for LGMs? Also, if you create a dichotomous variable in the monte carlo routine, is the default scoring 0/1? Does rescaling the variable to a mean of 0 and variance of 1 in the model syntax make the dichotomous variable 1/1? Thanks again. 


You do not create binary dependent variables using the CUTPOINT option. This is for independent variables only. The GENERATE option is used to generate binary dependent variables. See the user's guide for further information and examples. If you have further problems, send your input and license number to support@statmodel.com. 

David Bard posted on Saturday, May 20, 2006  3:57 pm



I'm sorry I should have been more clear, I was creating a binary independent variable and have now discovered it works fine as long as the rescaling and mean change sytnax (e.g., [w@0] and w@1) is left out of the model. Now that it's working, I have another dilemma. I'm interested in determining power for a clinical trial where blockrandomization is used to ensure equal number of treatment and control clinics (only 10 clinics enrolled). Unfortunately, with the syntax we've discussed thus far, my treament variable is not always evenly distributed across clusters. Is there a way to avoid this in the monte carlo routine? 


You can use a multiplegroup approach where you work with a fixed sample size for the treatment group and for the control group. 

David Bard posted on Thursday, July 13, 2006  1:07 pm



I'm back to a similar problem described above (May 20). I'm trying to fit the multiplegroup approach but having trouble with the correct sytnax for this type of monte carlo. I keep getting the following error message when using the Nobservations, Ngroups, and CSIZES syntax below. NOBSERVATIONS = 300 300; NGROUPS =2; . . . NCSIZES = 3; CSIZES = 9 (30) 8 (20) 17 (10); !cluster sizes or CSIZES = 9 (15) 8 (10) 17 (5); !cluster sizes within each group or even NCSIZES = 1; CSIZES = 50 (12); !cluster sizes or CSIZES = 50 (6); !cluster size within each group ***The number of sets of cluster information does not match the number of groups. 


For multiple group analysis, you need to specify the number of unique cluster sizes for each group and the sizes for each group. This is done using the  symbol. See Chapter 18 where the NCSIZES and CSIZES options are discussed for an example of how to use these options for multiple group analysis. 


I'm interested in doing a power analysis for a RCT comparing group tx vs. individual tx (block randomization). Would you have a good quick reference that could help in setting up this power analysis? I have read the info on the website for how to set this up using the Monte Carlo simmulation method, but wondered if there was an applied example that I could also refer to. I am also concerned with looking at mediators and moderators in this RCT and trying to determine power for these analyses as well. Any suggestsions would be greatly appreciated. 


See the article Muthen and Muthen (2002) in SEM on our web site under Papers. 


Bengt, Thank you for the reference, it is very helpful. Should there be any concern with calculating power for a design where individuals nested within groups for the group tx arm of the study but not for the individual arm of the study. Best wishes, Tom 


UG Ex11.4 shows how to do MC studies in a 2level setting. You can build on that example. If one of the groups does not have clustering in this way, you could probably use "external" MC analyzing data generated for the tx arm (2level) separately from the other arm (singlelevel)  and then send the combined data to MC. 

Thomas Olino posted on Wednesday, November 29, 2006  11:19 am



I am attempting to estimate power for moderational analysis in a multilevel model. Would it make sense to generate data based on main effects, then use those generated data to compute the interaction for each of the generated data sets, and then see how often the interaction turned out signifcant? Thanks! 


This makes sense to me. But you will need to use our external Monte Carlo to be able to use the DEFINE command to create the interaction. External Monte Carlo is described in Example 11.6. 

Thomas Olino posted on Wednesday, November 29, 2006  6:21 pm



Thanks for the tip. A colleague had looked at 225 replications by running analyses with individual datasets. The external monte carlo is much more efficient. 

Thomas Olino posted on Thursday, November 30, 2006  6:34 pm



One additional follow up  in the monte carlo simulation, is the appropriate expectation that the interaction should be significant a high percentage of time or approximately 5% of the time? Thanks. 


Interactions have low power in general. The power depends on sample size and effect size. 


Bengt, I have followed your advice from Nov1314 and run two sepearte MC studies to look at power within each arm of the study (first grp then individual). I'm now trying to build upon UG ex11.4 as you suggested with the group and individual data. I'm interested in (1) the power to detect difference in outcome btwn tx (grp vs ind), and (2) whether a mediator (social support) is stronger in grp vs tx. I'm not sure exactly how to set this up in part because I'm not sure what parameter gives me the test for (1). Do I set it up as ex11. but with: NCSIZES = 2; CSIZES = 10 (8) 1 (80); and then interpret the effects of w in the model or do I also create a categorical X variable for tx condition? I think I could figure out (2), if I was sure about (1), but any suggestions would be helpful. Tom 


You want to say csizes = 10(8) 80(1) if you have 80 people who are not clustered within a higher unit. Saying 80(1) means that you have 80 clusters of size 1. But I was actually suggesting "external MC" which is in line with ex11.6. 


Thanks Bengt, I see now my typo and appologize. I can run everything as you suggested, with individual and group data sets generated using SAVE/REPSAVE comands and then input into an "external MC." Do you have any thoughts on number (2) the example of moderated mediation I mentioned on the 6th? 


I think your question of the 6th was how to assess if a mediator is stronger in one group than the other. You can do this via Model constraint using the New option to create a new parameter corresponding to the difference between the 2 mediator estimates. You will then get a monte carlo summary of this new parameter to see how often equality of the mediator estimates is rejected (see last output column). 


Thanks again Bengt, That was very helpful! and makes wonderful sense. I will try that right away. Best wishes, Tom 

Boliang Guo posted on Thursday, January 11, 2007  5:30 am



Hi Prof Muthen, I try to get MC results for estimate estimates in a basic 2 level path analysis, PC clevelly ask me: this is an example of twolevel path analysis with mediating effect MC results? does this mean I can not get MC results for estimates in a 2 level path analysis model with current MPlue yet? 


Examples 9.3, 9.4, and 9.5 are twolevel path analyses. They have Monte Carlo counterparts. These come with the Mplus CD and are also available on the website. This type of Monte Carlo analysis has been available for several years. 

Boliang Guo posted on Thursday, January 11, 2007  10:29 am



thanks, Linda. 

Boliang Guo posted on Thursday, January 11, 2007  12:10 pm



hello again Prof Muthen, I add model constraint to a 2 level path analysis MC model for mediation computation, I compute the estimates from Kennys data in step 1, then in step 2 for MC estimate, when I add same constraint to both model population and model, Mplus say, 'A parameter label has been redeclared in MODEL CONSTRAINT.' if I remove constraint command from Model population or from model,say, leave only one constraint command in model or model population, then, mplus work and the result is same whereever I put model constraint in model or model population. my question is: in MC analysis, 'model constraint' command is only for Model not for model population, right? thanks. 


MODEL CONSTRAINT is only for MODEL. 


Hello, I am doing a Multilevel SEM Monte Carlo study to test the accuracy of estimation with small group sample sizes. I have a 2level factor model with 4 indicators (measured at the individual level). Both at the within and between levels, two independent variables have an effect on the latent factor. Up till now, the indicators (at the individual level) were assumed to follow normal distributions. However, I want to introduce nonnormality into the data generation model by manipulating kurtosis and skewness of the indicators. In the papers on this site if found that you suggest using a mixture of subpopulations for which the latent factor follows a normal distribution with different parameters. Am I getting this more or less right? If so, there are a couple of things I do not see clearly yet... 1. Can this mixture approach also be applied in the case of multilevel sem? 2. How do you choose the proportions and means/variances of the subpopulations to assure that certain skewness and kurtosis values are obtained? 3. Is it possible for indicators that load on the same latent variable to have divergent skewness and curtosis values? Thanks! Bart 


1. Yes. You might want to start with the Monte Carlo counterpart of Example 10.2. You can generate the data as two classes and analyze it as one class. 2. Unfortunately, this is trial and error. 3. Yes. 


Hello again Hopefully a final question wrt a Monte Carlo study for a twolevel SEM model with small group sample sizes. I want to report population within and between covariance matrices. Elsewhere on this discussion board, I found that for onelevel Monte Carlo studies, population covariance matrices can be obtained by running a separate analysis with all parameters constrained to the population values, and the identity matrix as input file ('type is covariance'). However, to me it seems that this 'trick' cannot be used in the case of twolevel models, because these models do not allow a covariance matrix as input (is this right?). Instead, I thought of using one of the datasets that were generated during the monte carlo as input file (and of course also constraining all parameters to the population values). Is this a correct way to obtain population covariance matrices? Thanks beforehand! Bart 


That's right  you can use any data. It only matters that you fix all parameters at the pop values. 


Hello, On page 604 of Muthen and Muthen (2002) you state that the R^2 value for the intercept growth factor is 0.20. Could you clarify how you arrived at this value. I've read through it and seem to be missing something. Thank you! Tom 


This comes about as follows: Beta**2*var(x)/Beta**2*var(x)+ var(r)= .5**2*.25/.5**2*.25 + .25 = .0625/.3125 = .2 


Thank you Linda. Could you also tell me how you arrived at an effect size of 0.63 for a regression slope coefficient of 0.2? My calculations are not no coming out. Also, why are you interested in both an effect and R^2 for the intercept? 


We look at Rsquare for both the intercept and slope growth factors. We look at effect size for the regression coefficient in the regression of the slope growth factor on the covariate because that is the parameter for which power will be assessed. Effect size is used to gauge the size of the effect. The effect size is the difference in the slope means for the two values of the covariate divided by the standard deviation of the slope growth factor: .2 / Sqrt (.04*.25 + .09 = .63) 


This is not a multilevel question but it seemed to fit with Thomas Schmitte's post so I'm placing it here. I would like to run a monte carlo LGM for power estimation with missing data similar to examples 3 and 5 in Muthen and Muthen (2002). However, the estimates for missing data within these examples (12% at T1, 18% at T2, 27% at T3, and 50% at T4) is too high for my purposes. Here is the input for missingness from those examples: MODEL MISSING: %OVERALL% [y1@2 y2@1.5 y3@1 y4@0]; y2y4 ON x@1; I assume that the 2,  1.5, etc. are threshold values for degree of missingness. How can I vary these? Our pilot work suggests retention and data completion rates of .80.90 across 5 time waves. I don't want to run these assuming no missing data either. thanks much! Cindy 


You can read about the input for this type of Monte Carlo run in the UG ex 11.2 discussion. See especially page 338 in the V5 UG. The intercept statement in MODEL MISSING, say [y3@1]; says that the intercept is 1 in the logistic regression for the probability of missing at y3. If we have no covariates in MODEL MISSING, or if we consider the covariate(s) at the value zero, this 1 intercept value translates to 0.27 of having missing y3. A lower logit intercept (say 2) gives a lower probability. Use the formula L = log(p/(1p)) to go from the probability p to the logit L. So if you want p=.10 you get L=2.2. Now, if you want x to influence missingness, which you show, then the missingness percentage goes up for increasing slope on x. You have to experiment to get the values you want. The easiest approach is trial and error using one replication and a large sample size (say 10,000). You generate one rep and save the data, then run the data as real data and request "patterns" in the Output command to get the information about the missingness percentages. 


Hi, I have a question regarding a mulilevel monte carlo simulation for comparing different computation approaches to multilevel data sets. I have simulated data for a multilevel model with one level 1 variable (X), a level two variable (Z), and a cross level interaction (XZ) affecting the outcome (Y). Using the external monte carlo analysis tool I want to compare a multilevel model to an aggregated regression approach. While it is no problem to aggregate the outcome (Y) and the level 1 predictor (X) using the CLUSTER_MEAN option, I can not create an interaction term (X*Z) thereafter, due to the restriction that a variable created using the CLUSTER_MEAN function cannot be used in subsequent DEFINE statements in the same analysis. Is there any possibility to create an interaction term (X*Z) for an aggregate regression analysis to compare it to the original crosslevel interaction? Thank You very much! Till 


You would need to do the analysis in two steps. Save the cluster mean variables and then use DEFINE in the next step to create the interaction term. 


Thank you for your quick respondence! I tried to do that, but failed to save the (1000) data files used in the external monte carlo analysis tool. Using the savedata command is not possible with Type=Montecarlo, neither can the Repsave option be used in the external monte carlo analysis. So is their a possiblity to save the cluster mean variables for all genareted data sets used for the external monte carlo analysis (see input statement below)? Again, thank you for your help! Till File=MC1list.dat; TYPE=Montecarlo; Variable: Names= x2 y x1 clus; UseVariables = x2 x1c yc; Cluster=clus; Define: x1c= CLUSTER_MEAN(x1); yc = CLUSTER_MEAN(y); ... 


I can't think of any way to do this except by doing it for each data set separately. You might check on the website to see if Using Mplus via R under HowTo might be of help. 

Jan Stochl posted on Friday, November 02, 2012  8:10 am



Dear Linda and Bengt, I am trying to assess robustness of estimators against ignorance of clustering for multilevel data(i.e., when researchers use nonmultilevel CFA for clustered data). For that purposes I have simulated datasets with different levels of ICCs (I simulated 100 datasets for each of 11 levels of betweenlevel variability) Beside that, I am trying to show that there is no parameter bias (regardless of ICC level) when clustered data are correctly analyzed using multilevel CFA. In the case when clustered data are correctly analyzed using hierarchical approach, everything works as expected except for the data that were simulated to have ICCs close to zero. In that case I observe unexpectedly large variability within each model parameter and deflated chisquare. I use WLSMV estimator. I hypothesize that this might be consequence of how MPlus handles negative estimates of ICC which may occur when the betweenlevel variability is very small. I cannot find anywhere what MPlus does in this case? Or do you have any other explanation? Thanks a lot for your reply, Jan 


Jan When ICC is close to 0 that means you are dealing with nearly nonexistent between level random effects and singular between level matrix. In such situations in practical setting you would want to abandon these random effects from the model. Look in tech 9 for error messages and warning in the simulation. Possible attempts to improve the estimation would include increasing the number of integration points decreasing the convergence criteria, redefining the model as a more restricted model with fewer random effects. In principle negative ICC estimates can not occur with the WLSMV estimator since it uses the EMalgorithm, but ICC near zero means random effect variances near zero which can lead to estimation difficulties or nonconvergence. Tihomir 

Jan Stochl posted on Wednesday, November 07, 2012  11:48 am



Thanks a lot Tihomir, it helped a lot! Jan 

Joop Hox posted on Tuesday, January 08, 2013  4:37 am



I am trying a MonteCarlo simulation with an intervention that exists only at the between level. I want the intervention to be a binary variable, so I want to use CUTPOINTS to create that. Mplus keeps telling me that CUTPOINTS does not work for dependent variables, but in fact my intervention is an independent variable. Why doesn't this work? Here is my simulation setup: TITLE: Mediation model 2>1>1 MONTECARLO: NAMES ARE atti norm behav interv; NOBSERVATIONS = 1000; NCSIZES = 1; CSIZES = 200 (5); NREP = 100; ! SEED = 100656; ! CUTPOINTS = interv(0); !cutpoints does not work? BETWEEN = interv; MODEL POPULATION: %WITHIN% atti@1; norm@1; atti WITH norm@.2; behav ON atti@.5 norm@.5; %BETWEEN% atti@1; norm@1; atti WITH norm@.2; behav ON atti@.5 norm@.5; interv@1; atti on interv@.5; 


I suspect the you mention the variance of interv in the MODEL command which makes Mplus interpret it as a dependent variable. Remove the variance from the MODEL command. 

mpduser1 posted on Friday, April 05, 2013  8:04 am



I am trying to perform a power analysis for a simple linear regression model (1 dichotomous / dummy predictor) where I have clustered data, but the clustering is not taken into account in the model (analytic model has been misspecified). My objective is to see how much power I am losing due to clustering in the data (i.e., data are not IID). I am using Mplus 7.0. Have I set up my models in the nonIID case correctly? I ask because preliminary results indicate that I'm getting more power using clustered, nonIID data, than IID data. Thank you.  MONTECARLO: NAMES ARE y x; NOBSERVATIONS = 400; NREPS = 1000; SEED = 2013; CUTPOINTS = x(0); NCSIZES = 1; CSIZES = 40(10); WITHIN = x; BETWEEN = ; MODEL POPULATION: %WITHIN% [x@0]; x@1; y on x*.35; %BETWEEN% y@.20 ; ANALYSIS: TYPE IS TWOLEVEL; MODEL: %WITHIN% y on x; %BETWEEN% OUTPUT: TECH9; 


The "increased" power you are seeing is a consequence of the standard errors being underestimated when you do not account for clustering. This is not truly increased power, you are using inappropriate/standard errors for you tests 


The power estimate is not valid because you have not given any coverage values in the MODEL command. Coverage values are taken from the MODEL command. Population parameter values for data generation are taken from the MODEL POPULATION command. Your MODEL command should be: MODEL: %WITHIN% y on x*.35; %BETWEEN% y*.20; 

mpduser1 posted on Friday, April 05, 2013  9:35 am



Dan and Linda, thank you. Linda, If I specify the following, am I the setting up the model appropriately, while also fixing an ICC of .20? Also, a propos of Dan's comment, do I need to specify COMPLEX to get correct S.E.s and power calculations? Thank you.  MODEL POPULATION: %WITHIN% [x@0]; x@1; y on x*.35; y@.8; %BETWEEN% y@.2; ANALYSIS: TYPE IS TWOLEVEL; MODEL: %WITHIN% y on x; %BETWEEN% [y@0]; y@0; 


See Example 12.6. 


Hi Linda  How can I create an ordinal IV (representing assessment times) in a Monte Carlo analysis? I'm trying to set up a program to estimate power for a multilevel approach to longitudinal analysis, and I can't figure out how to create an ordinal predictor representing time. Thanks! bac 


See the GENERATE option. You would generate a variable with more than one threshold. 


Thanks, Linda  I tried that with: names are y t x ; CUTPOINTS = x (0); GENERATE = t (4 p) ; but since time is an xvariable, I get the warning: Xvariables should not be specified as categorical in the GENERATE option. The CUTPOINTS option should be used for xvariables instead. The following variable was specified as categorical in the GENERATE option: T But CUTPOINTS can only be used for binary predictors. I need to generate (say) 5 records per case to represent 5 occasions 15 (or preferably, 04), to allow predicting y ON t. Thanks! bac 


Please send the full output and your license number to support@statmodel.com. 


Is it the case that for a Monte Carlo MLSEM with a dummycode at level 2 to reflect treatment status, that if the residual variances for the indicators are fixed at 1 then the factor variances at between and within levels become the proportion of variance at that specific level (e.g., fb2*.70, fw2*.30 with y11y32*1 would indicate 70% of the total variance is due to between level) or is it the between+within+residual? 


At each level the total variance at that level is as usual lambda^2*V(factor)+V(residual) 


I'm interested in estimating sample size need to achieve power in a crossclassified or threelevel model with categorical outcomes in Mplus 7.2. Reviewing Mplus's manual and statmodel, it seems like this models would need to be estimated using Bayesian estimation. Is it possible to estimate sample size in this type of model via Mplus's Monte Carlo methods? 


Yes. And you find the underlying Monte Carlo runs for all the UG examples on our website. 

dlsherry posted on Saturday, January 24, 2015  9:57 am



Hello. I am trying to run a monte carlo power analyses for a two level path model. I can get the model to run as a regular analysis but I cannot get the model to run as a monte carlo analyses. I have trying fixed the variances to zero but I keep getting the following error: *** FATAL ERROR THE POPULATION COVARIANCE MATRIX THAT YOU GAVE AS INPUT IS NOT POSITIVE DEFINITE AS IT SHOULD BE. Could you provide any syntax for a two level path analysis for monte carlo? I have not been able to find one. Thank you. 


Fixing variances to zero will give that message. See mcex9.3.inp or 9.4 or 9.5. These are the Monte Carlo analyses that generate the data for Examples 9.3, 9.4, 9.5. 


DearLinda & Bengt, I am trying to estimate power using monte carlo approach for a twolevel (time points nested within individuals) regression model, with one withinparticipants preidctor (time 'x'), and one betweenparticipants predictor (group 'w'). It is the interaction between group and time that I am interested in checking the power for. The outcome is continuous. I am relatively new to using Mplus for multilevel analyses, so was hoping to run the syntax past you: TITLE: two level regression MONTECARLO: NAMES ARE y x w; WITHIN = x; BETWEEN = W; NOBSERVATIONS = 5040; NCSIZES = 1; CSIZES = 120 (42); NREP = 100; ANALYSIS: TYPE = TWOLEVEL RANDOM; MODEL POPULATION: %WITHIN% S  y ON x; Y*1 X*1; %BETWEEN% y on w*0.24; S on w*0.23; MODEL: %WITHIN% S  y ON x; Y*1 X*1; %BETWEEN% y on w*0.24; S on w*0.23; A specific questions: I presume the regression weights (e.g., y on w*0.2) represent unstandardized values (assume I am correct and these are regression weights). Many thanks for your help 


Looks ok, except you have to give a w variance on Between in Model Population. And you don't want to give an x variance on Within in Model (x is exogenous). The coefficients are unstandardized. 


Many thanks. I have adapted the model slightly to include the mean and variance for w (its a binary variable, so mean = .5, variance = .25). I've noticed that whether I set (or leave out) the variance of Y influences the power I get and also the ICC, but I am unsure how to control this. Ideally I want to set the ICC for Y to a value around .4, how would I achieve this in this model? Syntax below: TITLE: random intercept two level MONTECARLO: NAMES ARE y x w; WITHIN = x; BETWEEN = W; NOBSERVATIONS = 5040; NCSIZES = 1; CSIZES = 120 (42); NREP = 100; ANALYSIS: TYPE = TWOLEVEL RANDOM; MODEL POPULATION: %WITHIN% y*1; S  y ON x; %BETWEEN% y on w*0.25; S on w*0.25; w*0.25; [w*0.5]; MODEL: %WITHIN% y*1; S  y ON x; %BETWEEN% y on w*0.25; S on w*0.25; w*0.25; [w*0.5]; 


Sorry, just found a thread on this  so I can manipulate the ICC of Y by asjusting the ration of between and within variance for y  so this moel is reasonable: TITLE: random intercept two level MONTECARLO: NAMES ARE y x w; WITHIN = x; BETWEEN = W; NOBSERVATIONS = 5040; NCSIZES = 1; CSIZES = 120 (42); NREP = 100; ANALYSIS: TYPE = TWOLEVEL RANDOM; MODEL POPULATION: %WITHIN% S  y ON x; y*.6; %BETWEEN% y on w*0.25; S on w*0.25; w*0.25; [w*0.5]; y*.4; MODEL: %WITHIN% S  y ON x; y*.6; %BETWEEN% y on w*0.25; S on w*0.25; w*0.25; [w*0.5]; y*.4; 


Looks ok, except you want to give a variance for x on Within in Model Population (not in Model). 

QianLi Xue posted on Wednesday, October 07, 2015  11:22 am



Hi, I would like to do a power calculation for paired data (e.g. left and right eyes) and multiple measurements are taken over time for both eyes. The effect size of interest is rate of change in one eye relative to the other eye (i.e., withincluster effect) as a result of giving one treatment to one eye and placebo to the other). I'm trying to adapt your Example 9.12 (twolevel growth model) to this paired study design. Is there a way to simulate the treatment indicator "x" to take value 1 or 0 within each subject, because treatment is only offered to one of the two eyes. In your example 9.12, the x variable can take the same value for both members of the same cluster. Thanks! 


Is there a reason not to do this as a wide, singlelevel growth model for 2 parallel processes, one for each eye? With 2 binary treatment variables each influencing one eye. 

QianLi Xue posted on Thursday, October 08, 2015  12:28 pm



Thanks so much for your response! I think the use of the parallel model is not able to account for the fact that if one eye gets the treatment, the other must be the control. If we use the parallel model with 2 binary treatment indicators, one for each eye, I guess I will have to constrain the path coefficients of these two treatment indicators to be the same between the two eyes (i.e., parallel processes), right? which is fine, but again, by adopting this approach, we're breaking the pairing of a treatment with a control within the same person. I'm wondering if I can use the twolevel growth model, but drop cases in the simulated datasets where the treatment assignment is the same within a person (i.e., treating both eyes, or treating neither). 


If you start from ex 9.12, are you thinking that Within consists of 2 "members", that is, the 2 eyes, and Between consists of persons? And X the treatment? One can use Cutpoints to make X binary. 


In which case one views the 2 eyes as statistically equivalent  which is perhaps ok. 

Noah Emery posted on Sunday, August 07, 2016  5:47 pm



Hello, I am attempting to conduct a Monte Carlo simulation for a twolevel model with random intercepts and a 111 mediation. As part of this, I want to generate a data set then analyze it to get standardized effects. However, I am struggling to create an ID variable to use as the cluster for level two. Is there a command that can accomplish this? I have looked around but have been unsuccessful locating one thus far. Thanks in advance 


See Examples 12.4. The CSIZES and NCSIZES options are used to define the clusters. 


I'm struggling with the algebra for simulating a clustereddata CFA. I've set it up as an external Montecarlo so that I can use CLUSTER in the data analysis, while generating data as twolevel. Where I'm running into problems is deriving the population parameters to reflect a given ICC for the given loadings. The goal is a dataset which, when correctly analyzed, has expected values of standardized factor loadings of .75 and appropriate SEs, with an ICC of .10. I'd appreciate any help in figuring out how to set %WITHIN% F1*; F1 BY Y1Y5*; Y1Y5*; %BETWEEN% Y1Y5*; in the model population. At least I *think* that's where the magic happens. I can do this easily enough in a singlelevel simulation. But my efforts to do the twolevel version are failing. If all else fails, I can simulate as singlelevel adjusting the N for a design effect, but I'd really like to learn how to do this. Thanks. 


Let me expand that question. I expect the ICC to also influence by factor covariances, so, for example, a desired F1F2 correlation of .65. %WITHIN% F1 BY Y1Y5* Y6Y10@0; F2 BY Y1Y5@0 Y6Y10*; F1* F2*; F1 WITH F2*; Y1Y10*; %BETWEEN% Y1Y10*; 


Because you have no factor on Between, I assume you want the icc for the Y variables, not for the factor. But you probably still want the Y intercepts of Between to correlate (we know that aggregates typically do). But to answer the question, try: %WITHIN% F1@1; F1 BY Y1Y5*0.75; Y1Y5*0.4375; ! this makes the within variances = 1 %BETWEEN% Y1Y5*x; where you get x from icc = B/(W+B): 0.10 = x/(1+x). That's x = 0.10/(10.10)= 0.11. 


Thank you! I kept making it harder than that. 


(part 1 of 2) Bengt, I'm still struggling with this. Does the answer get more complicated for multiple factors? (I sent a chaser question but you might not have seen it.) My estimates are coming in lower than I'd expect. The generation model: MODEL POPULATION: %WITHIN% f1 BY y1y18*.75 y19y90*0; f2 by y1y18*0 y19y36*.75 y37y90*0; f3 by y1y36*0 y37y54*.75 y55y90*0; f4 by y1y54*0 y55y72*.75 y73y90*0; f5 by y1y72*0 y73y90*.75; f1f5@1; y1y90*.4375; f1f5 with f1f5*.65; [f1f5@0]; %BETWEEN% y1y90*.11; [y1y90@0]; ******* The analysis model: ANALYSIS: TYPE IS COMPLEX; MODEL: f1 BY y1y18*.75 y19y90*0 (*1); f2 BY y1y18*0 y19y36*.75 y37y90*0 (*1); f3 BY y1y36*0 y37y54*.75 y55y90*0 (*1); f4 BY y1y54*0 y55y72*.75 y73y90*0 (*1); f5 BY y1y72*0 y73y90*.75 (*1); f1f5 with f1f5*.85; 


(part 2 of 2) So I'm going for standardized factor loadings of .75. But I'm typically getting standardized loadings of ~.6. Factor correlations are also low. The below is for the ICC of .1. With an ICC of .01 in the generation (BETWEEN variances of Y at .01, no other changes), the standardized loadings are around .70.71. Is analyzing with EFA (vs CFA) part of the problem? This is a large model, so I'm only running 100 samples, ~90 of which converge, but the results are consistent across all loadings on all factors, so it seems to be "real." I appreciate your help. ******* STDYX Standardization ESTIMATES S. E. % Sig Average Std. Dev. Average Coeff F1 BY Y1 0.607 0.140 0.190 0.809 Y2 0.611 0.117 0.185 0.831 Y3 0.608 0.134 0.195 0.787 Y4 0.591 0.152 0.187 0.798 Y5 0.605 0.161 0.165 0.787 etc. 


I think it has to do with using EFA. The default Geomin rotation criterion for a simple structure may not be compatible with your pattern of loadings. It sounds like Geomin gives you higher loadings in the zero spots and lower factor correlations; the nonzero loadings give you higher observed variable variances and therefore lower standardized loadings. We mention this issue briefly in the FAQ Lambda is not compatible with the notion of simplicity of the rotation criterion This makes simulations of EFA hard to do. 


Thanks, Bengt. That helps a great deal. 


Also, note that the variance of the outcomes in your step 2 Type=complex analysis is the total variance, so also including the 0.11. This would make your standardized loadings smaller than the true withinlevel loadings. 

Jas Wer posted on Sunday, October 09, 2016  8:45 pm



I have a family sample, with two children per family. Using this data I've conducted analyses to see if differences between siblings in X predict differences in Y. Because the number of sibling pairs who differ in their Y is quite small for some Ys , I would like to estimate the power to detect a withinfamily effect on Y. I'm new to multilevel and Monte Carlo analyses and trying to write an Mplus syntax to estimate power, however the User's Guide examples seem almost too complex for this application. Could you point me to some answers for these questions: 1) My X is measured per child, so could be modeled as a between or within predictor. Since I'm only interested in withineffects, is it correct to specify within=X in the MONTE CARLO subcommand and %within% Y on X; in the MODEL subcommand, simply leaving out any "between" commands/models? 2) In the MODEL POPULATION section, I'm unsure which model parameters to use for X. I seem to need to specify "%within%"before the model parameters, so I think I can't just use the mean and variance across the whole sample. Which parameters do I use here? 3) Some of my outcomes are binary, others continuous. The binary outcomes are not always 50%/50% but also 90%/10% which I understand influences power. Is there a way to specify this, other than generate=Y(1); categorical=Y; ? Thank you for your help! 


Why not take the simpler approach of a multivariate model instead of a multilevel model? That is, you have 2 Xs and 2 Ys, one for each child. You can use 3 sources for guidance. First, each UG example has a Monte Carlo version on our website. Second, you can study the article on our website: Muthén, L.K. & Muthén, B.O. (2002). How to use a Monte Carlo study to decide on sample size and determine power. Structural Equation Modeling, 4, 599620. Mplus inputs and outputs used in this paper can be viewed and/or downloaded from the Examples page. download paper contact first author show abstract Third, you can study our new book: http://www.statmodel.com/Mplus_Book.shtml Regarding your question 3), you can specify [y$1*c] where c is the value of the threshold that gives you the percentage you want. 

Ryan Snead posted on Monday, August 21, 2017  10:38 am



Good afternoon! I am running a post hoc power analysis on a path model using complex survey data. When I run the path model for my analysis I use TYPE = COMPLEX. My Monte Carlo syntax specifies the correct NCSIZES/ CSIZES, but gives me an error for TYPE = COMPLEX and says to use TWOLEVEL. Using the latter type requires me to insert between/within syntax that I don't think is relevant for my model. Is there a different method you would recommend that still allows me to account for clustering in the data? Thanks for your help in advance! I attended the Wednesday workshop at Hopkins last week. It was very helpful! 


Clustered data are generated using TWOLEVEL even though you want to analyze the model using COMPLEX. Examples 12.6 shows this twostep procedure for a growth model. You would do it for a path model. 

Ryan Snead posted on Tuesday, August 22, 2017  7:31 am



Linda, Thank your for your quick response and helpful recommendation. I have two follow up questions. 1) Our sample is multilevel because the design links mothers to providers. Is it necessary to specify between/within slope and intercept like in the 12.6 example? Is what I have come up with satisfactory given only one level 2 variable (x1)? 2) If I wanted to view the results given a larger sample, say 100 to 400, would I need to proportionally increase my CSIZES in this first step? MONTECARLO: NAMES = x1 x2 x3 x4 y1 y2 y3; NOBSERVATIONS = 100; NREPS = 1000; SEED = 8675309; POPULATION = estimates.dat; COVERAGE = estimates.dat; NCSIZES = 4; CSIZES = 25 (1) 12 (2) 13 (3) 3 (4); ANALYSIS: TYPE = TWOLEVEL; MODEL POPULATION: ! current path model y1 ON x1 x2; y2 ON y1; y3 ON x1 x2 x3 x4 y1 y2; y1 WITH y2@0; y1 WITH y3@0; y2 WITH y3@0; MODEL POPULATION: ! two level path model %within% y1 ON x2; y2 ON y1; y3 ON x2 x3 x4 y1 y2; y1 WITH y2@0; y1 WITH y3@0; y2 WITH y3@0; %between% y1 ON x1; y3 ON x1; 


Yes, you need to proportionally increase CSIZES. Note that you have very small cluster sizes of 1, 2, 3, and 4 subjects. And you are estimating about 10 parameters within cluster  that is, the small cluster sizes can lead to poor estimates. Regarding the model, I note that y2 is not on the Within list while at the same time it isn't mentioned on Between. 

Ryan Snead posted on Thursday, August 24, 2017  6:45 am



Thank you for your help! 


Hello, I used Monte Carlo simulation to determine the power of multiple linear regression with 1 continuous dependent variable, 3 continuous independent variables and their interaction (4 interaction terms) (sample N=100). Since I did not find a syntax to generate interaction terms with observed variables, I generated them assuming that the 3 independent variables are latent (3 indicators each). Then, I saved the samples generated and analyzed the data treating the latent factors and their interactions as observed variables (using the DEFINE syntax). To minimize the measurement error when the latent factors are treated as observed, each factor loading is 0.95 (residual variance 0.09). Do you see any problem with this approach? Thanks! MONTECARLO: NAMES ARE Y X1X9; NOBSERVATIONS = 100; NREPS = 100; MODEL POPULATION: f1 by x1x3*0.95; f2 by x4x6*0.95; f3 by x7x9*0.95; x1x9*0.09; f1 f2 with f3*.40; f1 with f2*.40; f1f3@1; int1  f1 xwith f2; int2  f3 xwith f2; int3  f1 xwith f3; int4  int1 xwith f3; Y on f1*0.20 f2*0.20 f3*0.20 int1*0.20 int2*0.20 int3*0.20 int4*0.20; Y*.72; 


A simpler way is to do a 2step Monte Carlo run. Step 1: generate all needed variables and save the data sets. Step 2: Do an external Monte Carlo run which uses Define to create the variables. See the V8 UG ex 12.16 for how to do a 2step Monte Carlo. 


Thank you so much for the prompt response. I could not find the example you refer to. However, I did use the 2step Monte Carlo but I only posted the step 1. The second step looks like this: variable: NAMES ARE Y X1X9; USEVARIABLES ARE Y f1 f2 f3 int1 int2 int3 int4; DEFINE: f1=mean(x1x3); f2=mean(x4x6); f3=mean(x7x9); INT1=f1*f2; INT2=f1*f3; INT3=f3*f2; int4=f1*f2*f3; standardize f1f3; MODEL: [f1int4@0]; f1int4@1; Y ON f1int4*.2; y*0.72; Unless you meant that there is a simpler way to generate interaction terms. Thank you! 


You find the UG ex 12.6 in the V8 UG that is posted on our website. This shows what step 1 and step means and how they are carried out. In the second step, you would use the Define command to created the interaction. for instance: Define: inter = x*y; 

Kelly S posted on Monday, April 02, 2018  4:59 pm



Hello, I am using multilevel sem and trying to check for mediation. Mplus suggests using Monte Carlo simulation for my hypothesised model (assuming due to the number of degrees of freedom). However, no modification indices are available for Monte Carlo and multilevel analysis. How can I check my model's goodness of fit? At the moment I only get AIC and BIC, which are not easily interpreted... how can I check RMSEA, SMMR, CFI and TLI? Any ideas? Thank you in advance 


Perhaps you are referring to Monte Carlo integration when using ML. If fit indices are not given it is because the usual sufficient statistics are not relevant in your case. You can always work with neighboring models that have fewer restrictions (e.g. more paths) and test the significance of those extra parameters. 


How can I get an effect size for an individual predictor in a regression, using TYPE=COMPLEX? 


Just express the effect size in Model Constraint and it takes into account Complex. 


Greetings, I’m running a multigroup factor analysis on national data will cluster, weight, and stratum variables. Given a small sample size of our minority group (n=126 for minority, n=400 for majority), we wanted to generate new data using montecarlo simulation that can be used for additional analysis. Ran into a few problems. Please see below Multigroup factor analysis on original data ran successfully. Sample code used, which ran correctly. Estimates produced. I’m not lost on how to proceed with generating new data using montecarlo simulation. Based on the book, I entered the following code, but I’m not able to run the analysis. I looked at the file that I created, and it only has 12 data columns, so I’m not sure whats actually going on here. Thank you for your time and I look forward to your response. MONTECARLO: NAMES ARE ID WT STRAT CLUS RACE U1U13; NOBSERVATIONS = 500 500; NREPS = 500; SEED = 12345; NGROUPS=2; POPULATION = NAME.dat; COVERAGE = NAME.dat; MODEL POPULATION: F1  F3 BY U1U13 (*1); [F1F3@0]; MODEL POPULATIONg2: F1  F3 BY RU1U13 (*1); MODEL: F1 BY U1U2 (*.35); F2 BY U3U5 (*.40); F3 BY U6U10 (*.45); MODEL g2: F1 BY U1U2 (*.35); F2 BY U3U5 (*.40); F3 BY U6U10 (*.45); 


You need to give (residual) variances for F and the U's. 


Thank you for your response. I tried something different to make it easier for me understand how to go about doing this. Im trying to follow example 12.7 in the book, but ran an ESEM. Given that I'm using national data, strata, weight, and cluster variables are included. The data are also categorical. The model computed and I saved the estimated using the SAVEDATA: ESTIMATES= command. This model contained 9 variables. f1f3 by y1y9 (*1); However, my Montecarlo simulation following Example 12.7 step two using ESEM is giving me the following error statement: The COVERAGE file contains too much data. Check your data. File: MONTEGRA.dat please help 


Please send relevant files to Support along with your license number. 


I'm trying to calculate the power for a crossclassified model but I think there must be something I'm not understanding about the CSIZES and NOBSERVATIONS options. I've run the model and level2a has 245 clusters across 5 sizes and level2b has 69 clusters across 15 sizes. I also have a total of 396 observations and 1 observation per cell. I understood from the UG that when specifying CSIZES that the numbers outside the brackets should sum to level2b's clusters and the numbers inside the brackets should sum to level2a's clusters. And that the NCSIZES should describe the number of unique cluster sizes for each level. But I can't seem to get it to where the number of clusters are correct at the same time as the number of observations. NCSIZES = 15[5] CSIZES = 5[2(1)] 4[14(1)] 3[10(1)] 2[81(1)] 1[138(1)] NOBSERVATIONS = 396 In this iteration the numbers inside the brackets sum to level2a's clusters, and the observations are correct, but the numbers outside the brackets sum to 15 when I thought they should sum to 69. Hopefully someone can spot the mistake I'm making. Thanks for the help. 


Please send the full output and your license number to support@statmodel.com. 


When I run the following model and vary the # of clusters I'm finding that power for the dv1 ON fw paths do not change from 1 and the coverage is always above .91. This seems curious and highly unlikely to me especially since I even see this when I lower the number of clusters to 20. In fact, the only thing I see worsen is SRMRbetween. Can you explain why this is? Is there something off in my code? Thanks. MONTECARLO: NAMES ARE y1y10 exp1 exp2 exp3 dv1 dv2; CSIZES=30(20); NOBS = 600; NREPS = 1000; SEED = 4533; NCSIZES =1; WITHIN ARE exp1 exp2 exp3 y1y10; ANALYSIS: TYPE IS TWOLEVEL RANDOM; MODEL POPULATION: %WITHIN% fW by y1*0.5 y2*0.5 y3*0.5 y4*0.5 y5*0.5 y6*0.5 y7*.50 y8*.50 y9*.50 y10*.50; fw@1; fw ON exp1*.20 exp2*.20 exp3*.20; dv1 ON fw*.3; dv2 ON fw*.3; exp1*0.60 exp2*0.60 exp3*0.60; y1y10*.75; dv1dv2*.91; %BETWEEN% dv1dv2@.25; MODEL: %WITHIN% fW by y1*0.5 y2*0.5 y3*0.5 y4*0.5 y5*0.5 y6*0.5 y7*.50 y8*.50 y9*.50 y10*.50; fw@1; fw ON exp1*.20 exp2*.20 exp3*.20; dv1 ON fw*.3; dv2 ON fw*.3; y1y10*.75; dv1dv2*.91; %BETWEEN% dv1dv2@.25; 


I wonder why in your Model command you fix residual variances: %BETWEEN% dv1dv2@.25; Perhaps the results are also affected by having 20 observations per cluster. And where the indicators for fw have zero intraclass correlations so essentially #clusters * 20 independent observations 


Thanks for the response. A few additional questions regarding your response... 1. Shouldn't the small number of obs per cluster results in lower power? 2. I fixed the dv1dv2 at .25 because I wanted an ICC of .20 for the DVs, but I guess I should have freely estimated it. Is that correct? 3. I see your point about 0 ICC on the indicators. If I want those to also have an ICC of .20, would I take them off the WITHIN line and just list them on BETWEEN with a variance of .25 so the code looks like the following? If that's correct, I'm still getting similar results (very high power with small # of clusters) and I'm not sure why. MODEL POPULATION: %WITHIN% fW by y1*0.5 y2*0.5 y3*0.5 y4*0.5 y5*0.5 y6*0.5 y7*.50 y8*.50 y9*.50 y10*.50; fw@1; fw ON exp1*.20 exp2*.20 exp3*.20; dv1 ON fw*.3; dv2 ON fw*.3; exp1*0.60 exp2*0.60 exp3*0.60; y1y10*.75; dv1dv2*.91; %BETWEEN% dv1dv2*.25; y1y10*.25; MODEL: %WITHIN% fW by y1*0.5 y2*0.5 y3*0.5 y4*0.5 y5*0.5 y6*0.5 y7*.50 y8*.50 y9*.50 y10*.50; fw@1; fw ON exp1*.20 exp2*.20 exp3*.20; dv1 ON fw*.3; dv2 ON fw*.3; y1y10*.75; dv1dv2*.91; %BETWEEN% dv1dv2*.25; y1y10*.25; 


1. 20 per cluster isn't that small when you consider that you have many clusters. 2. Yes, free them. 3. Looks right  you can send your output to Support along with your license number. 

Back to top 