Mplus Discussion >> Power/monte carlo studies with multilevel data

Topics
Last Day
Last 3 Days
Last Week
Tree View

Edit Profile


Power/monte carlo studies with multil...

Mplus Discussion > Multilevel Data/Complex Sample >

Message/Author

Anonymous posted on Friday, December 12, 2003 - 7:48 pm

I am trying to calculate power for a multilevel study-latent class growth. I have the example from the addendum to the user’s manual (example is for a two group, three level model). I also have Muthén and Muthén (2002) describing how to use a monte carlo study to decide power.

A few questions in trying to integrate the two pieces of information.

Is the first step to calculate ICCs and design effect to see if I even need to do multilevel modeling?

I’m following the example from the product support area. I’ve removed the input for the second group. For the monte carlo study, do I simply run a multilevel model and get the starting values from the output to get starting values for the within and between parts of the model?

In the multilevel model, I want to examine the effect of a between variable on slope. Do I only specify that regression in the between part of the model, only in the within part or in both places?

Thanks.

bmuthen posted on Saturday, December 13, 2003 - 12:24 pm

Answered off line because of further questions.

Anonymous posted on Sunday, January 25, 2004 - 3:24 pm

We are presently planning a family study where we would like to use multi-level SEM to examine potential influences on the psychosocial adjustment of children (level 1) nested within family systems (level 2). In the current or upcoming versions of Mplus, is there a way to estimate nonrecursive models? We have not seen this issue mentioned in any of the references we have collected on multi-level SEM. Thank you in advance for whatever help you can provide.

bmuthen posted on Sunday, January 25, 2004 - 3:31 pm

The current version of Mplus allows nonrecursive multi-level SEM.

Anonymous posted on Wednesday, January 28, 2004 - 3:16 pm

We are planning a family study where we would like to use multilevel SEM to analyze data collected from children nested within families. I have been reading what has been written about power analysis and sample size estimation for multilevel models. In those papers, there appears to be concern about power and the number of clusters. I see little comment about power and the number of observations within each cluster other than a note that more variability in observations per cluster can have a negative effect on power. In this data set, we expect to have more that 400 families with either 1 or 2 children per cluster so a low number of clusters is not a concern. Most (70%), however, will only have 1 child per cluster. Should we be concerned about the number of clusters with only 1 observation? How might this affect power? Thank you for your reply about nonrecursive models.

bmuthen posted on Wednesday, January 28, 2004 - 3:26 pm

Having clusters with only 1 child is not a concern. They do not contribute information to the estimation of parameters related to within-cluster (child) variation, but do contribute to the estimation of cluster-related (family) parameters. So the question is for which type of parameter you want high power. If you have a small number of families with more than 1 child and want high power for within-family parameters, there might be a problem. As you say, you are likely to have good power for family-level parameters. You can study these power matters using the Mplus Monte Carlo facility if you have a notion of the approximate model and parameter values.

David Bard posted on Thursday, May 18, 2006 - 10:32 am

I am trying to run a power analysis for a longitudinal RCT where data are expected to fit a latent difference score SEM (McArdle and colleagues invention- an upcoming SEM pub cited below). This has forced me to ask some hard questions about the structural relations between variables at the within and between levels using the Mplus 2-level structure. Can you share some insight?

1) I'm finding it very hard to decide whether or not the SEM model auto-regressive and coupling effects (for bivariate growth) need to be constrained to be equal at both levels in the same way that Muthen (1997) suggests constraining the slope and indicator effects to be the same in clustered LGM. How does one go about deciding such things even for more typical multilevel SEM situations? In other words, when should loadings and regression effects be constrained across levels, and when should they be allowed to vary? Is this purely an empirical issue, or should theory drive model specification.

2) In Muthen & Muthen (2002), without 3rd level (or Mplus' 2nd level) clustering, the authors show how to run a monte carlo power analysis for an LGM with a binary covariate.

2a) Slope effect size was defined as the difference in the average group slope estimates divided by the variance of the latent slope term. The difference in group slopes is directly related to the regression of the slope term on the "binary" covariate, which if scored 0/1, has a mean of .5 and a variance of .25. Using these estimates, regression effects of .2 and .1 were chosen to represent medium and small effect sizes. These effects appear in the appendix Mplus code, but the covariate is scaled with a variance of 1 and a mean of 0 and not .25 and .5 for data generation. 2ai). Is the monte carlo SEM used to generate the outcome data modeling the covariate as dichotomous 0/1 (or perhaps now, -1/1, so that mean=0 and var=1), or continuous- only later to be transformed into the desired dichotomy? 2aii). Regardless, does this variance rescaling not change the effect size?

2b) When trying to plug in reasonable estimates of slope and intercept residual variability, does it make sense to use the Muthen & Muthen (2002) estimates at the within level, and use a ratio of the between to within variation equal to the expected ICC? For example, I expect an ICC of around .01 (at least for time 1 measurements), so does it sound reasonable to set the between slope residual variance equal to .01*V(WS)- i.e., .01 times the estimated within slope residual variance (which = .09 in Muthen & Muthen, 2002).

Thank you.

McArdle, J. J. (in press). Latent difference score approach to longitudinal dynamic structural analysis.
Structural Equation Modeling.

David Bard posted on Thursday, May 18, 2006 - 10:39 am

Oh, sorry, one more question related to my first. In older applications of these multilevel models (examples on the Mplus homepage for clustered LGM), residual variation for the observed outcomes is modeled at both the within and between levels; yet, in newer applications, this variation is only modeled in the within level (examples included in newest version of Mplus manual). How does one make these decisions?

Bengt O. Muthen posted on Friday, May 19, 2006 - 4:56 am

1. Growth models imply a mean structure for the variable that is repeatedly measured over time. If you look at the 3-level version of the growth model in standard multilevel books, you see that it has time scores that multiply both the means and residuals of the growth factors. The latter part ends up on within and the former on between, but the two parts are using the same time scores; hence the across-level equality in Mplus. Other models such as McArdle's have to be studied from this point of view to see if the same model-implied equality across levels holds. I have not yet read the article.

2a The x is first generated as N(0,1) and then dichotomized. It is the dichotomized version that is used in the model to generate the data.

2b.

Yes this is reasonable.

Follow-up question:

The between-level residual variances are often close to zero. In conventional 3-level growth modeling such as HLM, they are fixed at zero. Mplus allows them to be estimated and sometimes that is important. However, with categorical and other outcomes it is computationally time-saving to restrict them to zero.

David Bard posted on Friday, May 19, 2006 - 7:59 am

Thank you. Those are excellent explanations. Two quick follow-ups:
I cannot seem to get a dichotomous variable to run at the between level in my SEM model. I keep getting the message:

Only x-variables may have cutpoints in this type of analysis. Cutpoints for variable W were specified.

Yet, when I run the LGM example 11.6 from the manual, I do not get this message. Are cutpoint variables at the between level only allowed for LGMs?

Also, if you create a dichotomous variable in the monte carlo routine, is the default scoring 0/1? Does rescaling the variable to a mean of 0 and variance of 1 in the model syntax make the dichotomous variable -1/1?

Thanks again.

Linda K. Muthen posted on Friday, May 19, 2006 - 9:50 am

You do not create binary dependent variables using the CUTPOINT option. This is for independent variables only. The GENERATE option is used to generate binary dependent variables. See the user's guide for further information and examples. If you have further problems, send your input and license number to support@statmodel.com.

David Bard posted on Saturday, May 20, 2006 - 3:57 pm

I'm sorry I should have been more clear, I was creating a binary independent variable and have now discovered it works fine as long as the rescaling and mean change sytnax (e.g., [w@0] and w@1) is left out of the model. Now that it's working, I have another dilemma. I'm interested in determining power for a clinical trial where block-randomization is used to ensure equal number of treatment and control clinics (only 10 clinics enrolled). Unfortunately, with the syntax we've discussed thus far, my treament variable is not always evenly distributed across clusters. Is there a way to avoid this in the monte carlo routine?

Bengt O. Muthen posted on Sunday, May 21, 2006 - 1:01 pm

You can use a multiple-group approach where you work with a fixed sample size for the treatment group and for the control group.

David Bard posted on Thursday, July 13, 2006 - 1:07 pm

I'm back to a similar problem described above (May 20). I'm trying to fit the multiple-group approach but having trouble with the correct sytnax for this type of monte carlo. I keep getting the following error message when using the Nobservations, Ngroups, and CSIZES syntax below.

NOBSERVATIONS = 300 300;
NGROUPS =2;
.
.
.
NCSIZES = 3;
CSIZES = 9 (30) 8 (20) 17 (10); !cluster sizes
or
CSIZES = 9 (15) 8 (10) 17 (5); !cluster sizes within each group
or even
NCSIZES = 1;
CSIZES = 50 (12); !cluster sizes
or
CSIZES = 50 (6); !cluster size within each group

***The number of sets of cluster information does not match the number of
groups.

Linda K. Muthen posted on Friday, July 14, 2006 - 10:03 am

For multiple group analysis, you need to specify the number of unique cluster sizes for each group and the sizes for each group. This is done using the | symbol. See Chapter 18 where the NCSIZES and CSIZES options are discussed for an example of how to use these options for multiple group analysis.

Tom Hildebrandt posted on Monday, November 13, 2006 - 12:50 pm

I'm interested in doing a power analysis for a RCT comparing group tx vs. individual tx (block randomization). Would you have a good quick reference that could help in setting up this power analysis? I have read the info on the website for how to set this up using the Monte Carlo simmulation method, but wondered if there was an applied example that I could also refer to. I am also concerned with looking at mediators and moderators in this RCT and trying to determine power for these analyses as well. Any suggestsions would be greatly appreciated.

Bengt O. Muthen posted on Monday, November 13, 2006 - 5:58 pm

See the article Muthen and Muthen (2002) in SEM on our web site under Papers.

Tom Hildebrandt posted on Tuesday, November 14, 2006 - 6:37 am

Bengt,

Thank you for the reference, it is very helpful. Should there be any concern with calculating power for a design where individuals nested within groups for the group tx arm of the study but not for the individual arm of the study.

Best wishes,
Tom

Bengt O. Muthen posted on Tuesday, November 14, 2006 - 7:21 am

UG Ex11.4 shows how to do MC studies in a 2-level setting. You can build on that example. If one of the groups does not have clustering in this way, you could probably use "external" MC analyzing data generated for the tx arm (2-level) separately from the other arm (single-level) - and then send the combined data to MC.

Thomas Olino posted on Wednesday, November 29, 2006 - 11:19 am

I am attempting to estimate power for moderational analysis in a multilevel model. Would it make sense to generate data based on main effects, then use those generated data to compute the interaction for each of the generated data sets, and then see how often the interaction turned out signifcant?

Thanks!

Linda K. Muthen posted on Wednesday, November 29, 2006 - 3:38 pm

This makes sense to me. But you will need to use our external Monte Carlo to be able to use the DEFINE command to create the interaction. External Monte Carlo is described in Example 11.6.

Thomas Olino posted on Wednesday, November 29, 2006 - 6:21 pm

Thanks for the tip. A colleague had looked at 225 replications by running analyses with individual datasets. The external monte carlo is much more efficient.

Thomas Olino posted on Thursday, November 30, 2006 - 6:34 pm

One additional follow up - in the monte carlo simulation, is the appropriate expectation that the interaction should be significant a high percentage of time or approximately 5% of the time?

Thanks.

Linda K. Muthen posted on Friday, December 01, 2006 - 9:36 am

Interactions have low power in general. The power depends on sample size and effect size.

Tom Hildebrandt posted on Wednesday, December 06, 2006 - 6:10 am

Bengt,
I have followed your advice from Nov13-14 and run two sepearte MC studies to look at power within each arm of the study (first grp then individual). I'm now trying to build upon UG ex11.4 as you suggested with the group and individual data. I'm interested in (1) the power to detect difference in outcome btwn tx (grp vs ind), and (2) whether a mediator (social support) is stronger in grp vs tx. I'm not sure exactly how to set this up in part because I'm not sure what parameter gives me the test for (1). Do I set it up as ex11. but with:

NCSIZES = 2;
CSIZES = 10 (8) 1 (80);

and then interpret the effects of w in the model or do I also create a categorical X variable for tx condition?

I think I could figure out (2), if I was sure about (1), but any suggestions would be helpful.

Tom

Bengt O. Muthen posted on Thursday, December 07, 2006 - 6:41 pm

You want to say csizes = 10(8) 80(1) if you have 80 people who are not clustered within a higher unit. Saying 80(1) means that you have 80 clusters of size 1.

But I was actually suggesting "external MC" which is in line with ex11.6.

Tom Hildebrandt posted on Saturday, December 09, 2006 - 3:37 pm

Thanks Bengt,

I see now my typo and appologize. I can run everything as you suggested, with individual and group data sets generated using SAVE/REPSAVE comands and then input into an "external MC." Do you have any thoughts on number (2) the example of moderated mediation I mentioned on the 6th?

Bengt O. Muthen posted on Monday, December 11, 2006 - 10:12 am

I think your question of the 6th was how to assess if a mediator is stronger in one group than the other. You can do this via Model constraint using the New option to create a new parameter corresponding to the difference between the 2 mediator estimates. You will then get a monte carlo summary of this new parameter to see how often equality of the mediator estimates is rejected (see last output column).

Tom Hildebrandt posted on Monday, December 11, 2006 - 1:08 pm

Thanks again Bengt,

That was very helpful! and makes wonderful sense. I will try that right away.

Best wishes,
Tom

Boliang Guo posted on Thursday, January 11, 2007 - 5:30 am

Hi Prof Muthen, I try to get MC results for estimate estimates in a basic 2 level path analysis, PC clevelly ask me:

this is an example of two-level path
analysis with mediating effect MC results?

does this mean I can not get MC results for estimates in a 2 level path analysis model with current MPlue yet?

Linda K. Muthen posted on Thursday, January 11, 2007 - 8:19 am

Examples 9.3, 9.4, and 9.5 are two-level path analyses. They have Monte Carlo counterparts. These come with the Mplus CD and are also available on the website. This type of Monte Carlo analysis has been available for several years.

Boliang Guo posted on Thursday, January 11, 2007 - 10:29 am

thanks, Linda.

Boliang Guo posted on Thursday, January 11, 2007 - 12:10 pm

hello again Prof Muthen,
I add model constraint to a 2 level path analysis MC model for mediation computation,
I compute the estimates from Kennys data in step 1, then in step 2 for MC estimate,
when I add same constraint to both model population and model, Mplus say,
'A parameter label has been redeclared in MODEL CONSTRAINT.'
if I remove constraint command from Model population or from model,say, leave only one constraint command in model or model population, then, mplus work and the result is same whereever I put model constraint in model or model population. my question is:
in MC analysis, 'model constraint' command is only for Model not for model population, right?
thanks.

Linda K. Muthen posted on Thursday, January 11, 2007 - 2:35 pm

MODEL CONSTRAINT is only for MODEL.

Bart Meuleman posted on Tuesday, May 22, 2007 - 7:45 am

Hello,

I am doing a Multilevel SEM Monte Carlo study to test the accuracy of estimation with small group sample sizes. I have a 2-level factor model with 4 indicators (measured at the individual level). Both at the within and between levels, two independent variables have an effect on the latent factor.

Up till now, the indicators (at the individual level) were assumed to follow normal distributions. However, I want to introduce nonnormality into the data generation model by manipulating kurtosis and skewness of the indicators.

In the papers on this site if found that you suggest using a mixture of subpopulations for which the latent factor follows a normal distribution with different parameters. Am I getting this more or less right?

If so, there are a couple of things I do not see clearly yet...
1. Can this mixture approach also be applied in the case of multilevel sem?
2. How do you choose the proportions and means/variances of the subpopulations to assure that certain skewness and kurtosis values are obtained?
3. Is it possible for indicators that load on the same latent variable to have divergent skewness and curtosis values?

Thanks!

Bart

Linda K. Muthen posted on Tuesday, May 22, 2007 - 8:49 am

1. Yes. You might want to start with the Monte Carlo counterpart of Example 10.2. You can generate the data as two classes and analyze it as one class.
2. Unfortunately, this is trial and error.
3. Yes.

Bart Meuleman posted on Saturday, August 09, 2008 - 3:11 pm

Hello again

Hopefully a final question wrt a Monte Carlo study for a two-level SEM model with small group sample sizes.

I want to report population within and between covariance matrices. Elsewhere on this discussion board, I found that for one-level Monte Carlo studies, population covariance matrices can be obtained by running a separate analysis with all parameters constrained to the population values, and the identity matrix as input file ('type is covariance').

However, to me it seems that this 'trick' cannot be used in the case of two-level models, because these models do not allow a covariance matrix as input (is this right?).

Instead, I thought of using one of the datasets that were generated during the monte carlo as input file (and of course also constraining all parameters to the population values). Is this a correct way to obtain population covariance matrices?

Thanks beforehand!

Bart

Bengt O. Muthen posted on Saturday, August 09, 2008 - 3:33 pm

That's right - you can use any data. It only matters that you fix all parameters at the pop values.

Thomas A. Schmitt posted on Monday, August 24, 2009 - 9:14 am

Hello,

On page 604 of Muthen and Muthen (2002) you state that the R^2 value for the intercept growth factor is 0.20. Could you clarify how you arrived at this value. I've read through it and seem to be missing something. Thank you!

Tom

Linda K. Muthen posted on Monday, August 24, 2009 - 9:50 am

This comes about as follows:

Beta**2*var(x)/Beta**2*var(x)+ var(r)=

.5**2*.25/.5**2*.25 + .25 =

.0625/.3125 = .2

Thomas A. Schmitt posted on Monday, August 24, 2009 - 2:10 pm

Thank you Linda. Could you also tell me how you arrived at an effect size of 0.63 for a regression slope coefficient of 0.2? My calculations are not no coming out. Also, why are you interested in both an effect and R^2 for the intercept?

Linda K. Muthen posted on Monday, August 24, 2009 - 3:34 pm

We look at R-square for both the intercept and slope growth factors. We look at effect size for the regression coefficient in the regression of the slope growth factor on the covariate because that is the parameter for which power will be assessed. Effect size is used to gauge the size of the effect. The effect size is the
difference in the slope means for the two values of the covariate divided by the standard deviation of the slope growth factor:

.2 / Sqrt (.04*.25 + .09 = .63)

Cindy Schaeffer posted on Friday, September 11, 2009 - 12:46 pm

This is not a multilevel question but it seemed to fit with Thomas Schmitte's post so I'm placing it here.

I would like to run a monte carlo LGM for power estimation with missing data similar to examples 3 and 5 in Muthen and Muthen (2002). However, the estimates for missing data within these examples (12% at T1, 18% at T2, 27% at T3, and 50% at T4) is too high for my purposes. Here is the input for missingness from those examples:

MODEL MISSING:
%OVERALL%
[y1@-2 y2@-1.5 y3@-1 y4@0];
y2-y4 ON x@1;

I assume that the -2, - 1.5, etc. are threshold values for degree of missingness. How can I vary these? Our pilot work suggests retention and data completion rates of .80-.90 across 5 time waves. I don't want to run these assuming no missing data either.

thanks much!
Cindy

Bengt O. Muthen posted on Friday, September 11, 2009 - 1:37 pm

You can read about the input for this type of Monte Carlo run in the UG ex 11.2 discussion. See especially page 338 in the V5 UG. The intercept statement in MODEL MISSING, say

[y3@-1];

says that the intercept is -1 in the logistic regression for the probability of missing at y3. If we have no covariates in MODEL MISSING, or if we consider the covariate(s) at the value zero, this -1 intercept value translates to 0.27 of having missing y3. A lower logit intercept (say -2) gives a lower probability. Use the formula

L = log(p/(1-p))

to go from the probability p to the logit L. So if you want p=.10 you get L=-2.2.

Now, if you want x to influence missingness, which you show, then the missingness percentage goes up for increasing slope on x. You have to experiment to get the values you want. The easiest approach is trial and error using one replication and a large sample size (say 10,000). You generate one rep and save the data, then run the data as real data and request "patterns" in the Output command to get the information about the missingness percentages.

Till Haumann posted on Thursday, June 16, 2011 - 8:39 am

Hi,

I have a question regarding a mulilevel monte carlo simulation for comparing different computation approaches to multilevel data sets.

I have simulated data for a multilevel model with one level 1 variable (X), a level two variable (Z), and a cross level interaction (XZ) affecting the outcome (Y).

Using the external monte carlo analysis tool I want to compare a multilevel model to an aggregated regression approach.

While it is no problem to aggregate the outcome (Y) and the level 1 predictor (X) using the CLUSTER_MEAN option, I can not create an interaction term (X*Z) thereafter, due to the restriction that a variable created using the CLUSTER_MEAN function cannot be used in subsequent DEFINE statements in the same analysis.

Is there any possibility to create an interaction term (X*Z) for an aggregate regression analysis to compare it to the original cross-level interaction?

Thank You very much!

Till

Linda K. Muthen posted on Thursday, June 16, 2011 - 10:48 am

You would need to do the analysis in two steps. Save the cluster mean variables and then use DEFINE in the next step to create the interaction term.

Till Haumann posted on Friday, June 17, 2011 - 1:08 am

Thank you for your quick respondence!

I tried to do that, but failed to save the (1000) data files used in the external monte carlo analysis tool.

Using the savedata command is not possible with Type=Montecarlo, neither can the Repsave option be used in the external monte carlo analysis.

So is their a possiblity to save the cluster mean variables for all genareted data sets used for the external monte carlo analysis (see input statement below)?

Again, thank you for your help!
Till

File=MC1list.dat;
TYPE=Montecarlo;

Variable:
Names= x2 y x1 clus;
UseVariables = x2 x1c yc;
Cluster=clus;

Define:
x1c= CLUSTER_MEAN(x1);
yc = CLUSTER_MEAN(y);
...

Linda K. Muthen posted on Friday, June 17, 2011 - 1:39 pm

I can't think of any way to do this except by doing it for each data set separately. You might check on the website to see if Using Mplus via R under How-To might be of help.

Jan Stochl posted on Friday, November 02, 2012 - 8:10 am

Dear Linda and Bengt,
I am trying to assess robustness of estimators against ignorance of clustering for multilevel data(i.e., when researchers use non-multilevel CFA for clustered data). For that purposes I have simulated datasets with different levels of ICCs (I simulated 100 datasets for each of 11 levels of between-level variability) Beside that, I am trying to show that there is no parameter bias (regardless of ICC level) when clustered data are correctly analyzed using multilevel CFA. In the case when clustered data are correctly analyzed using hierarchical approach, everything works as expected except for the data that were simulated to have ICCs close to zero. In that case I observe unexpectedly large variability within each model parameter and deflated chi-square. I use WLSMV estimator. I hypothesize that this might be consequence of how MPlus handles negative estimates of ICC which may occur when the between-level variability is very small. I cannot find anywhere what MPlus does in this case? Or do you have any other explanation?

Thanks a lot for your reply,
Jan

Tihomir Asparouhov posted on Friday, November 02, 2012 - 4:23 pm

Jan

When ICC is close to 0 that means you are dealing with nearly non-existent between level random effects and singular between level matrix. In such situations in practical setting you would want to abandon these random effects from the model. Look in tech 9 for error messages and warning in the simulation. Possible attempts to improve the estimation would include increasing the number of integration points decreasing the convergence criteria, redefining the model as a more restricted model with fewer random effects.

In principle negative ICC estimates can not occur with the WLSMV estimator since it uses the EM-algorithm, but ICC near zero means random effect variances near zero which can lead to estimation difficulties or non-convergence.

Tihomir

Jan Stochl posted on Wednesday, November 07, 2012 - 11:48 am

Thanks a lot Tihomir, it helped a lot!

Jan

Joop Hox posted on Tuesday, January 08, 2013 - 4:37 am

I am trying a MonteCarlo simulation with an intervention that exists only at the between level. I want the intervention to be a binary variable, so I want to use CUTPOINTS to create that. Mplus keeps telling me that CUTPOINTS does not work for dependent variables, but in fact my intervention is an independent variable. Why doesn't this work?

Here is my simulation setup:

TITLE: Mediation model 2-->1-->1
MONTECARLO:
NAMES ARE atti norm behav interv;
NOBSERVATIONS = 1000;
NCSIZES = 1;
CSIZES = 200 (5);
NREP = 100;
! SEED = 100656;
! CUTPOINTS = interv(0); !cutpoints does not work?
BETWEEN = interv;

MODEL POPULATION:
%WITHIN%
atti@1; norm@1;
atti WITH norm@.2;
behav ON atti@.5 norm@.5;

%BETWEEN%
atti@1; norm@1;
atti WITH norm@.2;
behav ON atti@.5 norm@.5;
interv@1;
atti on interv@.5;

Linda K. Muthen posted on Tuesday, January 08, 2013 - 1:26 pm

I suspect the you mention the variance of interv in the MODEL command which makes Mplus interpret it as a dependent variable. Remove the variance from the MODEL command.

mpduser1 posted on Friday, April 05, 2013 - 8:04 am

I am trying to perform a power analysis for a simple linear regression model (1 dichotomous / dummy predictor) where I have clustered data, but the clustering is not taken into account in the model (analytic model has been mis-specified).

My objective is to see how much power I am losing due to clustering in the data (i.e., data are not IID).

I am using Mplus 7.0.

Have I set up my models in the non-IID case correctly? I ask because preliminary results indicate that I'm getting more power using clustered, non-IID data, than IID data.

Thank you.

--------------------------

MONTECARLO:
NAMES ARE y x;
NOBSERVATIONS = 400;
NREPS = 1000;
SEED = 2013;
CUTPOINTS = x(0);
NCSIZES = 1;
CSIZES = 40(10);
WITHIN = x;
BETWEEN = ;

MODEL POPULATION:

%WITHIN%

[x@0];
x@1;

y on x*.35;

%BETWEEN%
y@.20
;

ANALYSIS:
TYPE IS TWOLEVEL;

MODEL:
%WITHIN%
y on x;

%BETWEEN%

OUTPUT:
TECH9;

Dan Feaster posted on Friday, April 05, 2013 - 9:17 am

The "increased" power you are seeing is a consequence of the standard errors being underestimated when you do not account for clustering. This is not truly increased power, you are using inappropriate/standard errors for you tests

Linda K. Muthen posted on Friday, April 05, 2013 - 9:24 am

The power estimate is not valid because you have not given any coverage values in the MODEL command. Coverage values are taken from the MODEL command. Population parameter values for data generation are taken from the MODEL POPULATION command. Your MODEL command should be:

MODEL:
%WITHIN%
y on x*.35;
%BETWEEN%
y*.20;

mpduser1 posted on Friday, April 05, 2013 - 9:35 am

Dan and Linda, thank you.

Linda,

If I specify the following, am I the setting up the model appropriately, while also fixing an ICC of .20?

Also, a propos of Dan's comment, do I need to specify COMPLEX to get correct S.E.s and power calculations?

Thank you.

------------------------------

MODEL POPULATION:

%WITHIN%

[x@0];
x@1;

y on x*.35;
y@.8;

%BETWEEN%
y@.2;

ANALYSIS:
TYPE IS TWOLEVEL;

MODEL:
%WITHIN%
y on x;

%BETWEEN%
[y@0];
y@0;

Linda K. Muthen posted on Monday, April 08, 2013 - 6:58 am

See Example 12.6.

Bruce A. Cooper posted on Friday, May 03, 2013 - 3:59 pm

Hi Linda -
How can I create an ordinal IV (representing assessment times) in a Monte Carlo analysis? I'm trying to set up a program to estimate power for a multilevel approach to longitudinal analysis, and I can't figure out how to create an ordinal predictor representing time.
Thanks!
-bac

Linda K. Muthen posted on Friday, May 03, 2013 - 4:58 pm

See the GENERATE option. You would generate a variable with more than one threshold.

Bruce A. Cooper posted on Monday, May 06, 2013 - 11:10 am

Thanks, Linda -
I tried that with:
names are y t x ;
CUTPOINTS = x (0);
GENERATE = t (4 p) ;

but since time is an x-variable, I get the warning: X-variables should not be specified as categorical in the GENERATE option. The CUTPOINTS option should be used for x-variables instead. The following variable was specified as categorical in the GENERATE option: T

But CUTPOINTS can only be used for binary predictors. I need to generate (say) 5 records per case to represent 5 occasions 1-5 (or preferably, 0-4), to allow predicting y ON t.
Thanks!
-bac

Linda K. Muthen posted on Monday, May 06, 2013 - 3:57 pm

Please send the full output and your license number to support@statmodel.com.

Yaacov Petscher posted on Thursday, July 10, 2014 - 12:45 pm

Is it the case that for a Monte Carlo ML-SEM with a dummy-code at level 2 to reflect treatment status, that if the residual variances for the indicators are fixed at 1 then the factor variances at between and within levels become the proportion of variance at that specific level (e.g., fb2*.70, fw2*.30 with y11-y32*1 would indicate 70% of the total variance is due to between level) or is it the between+within+residual?

Bengt O. Muthen posted on Thursday, July 10, 2014 - 6:46 pm

At each level the total variance at that level is as usual

lambda^2*V(factor)+V(residual)

Christopher Harper posted on Friday, September 05, 2014 - 11:29 am

I'm interested in estimating sample size need to achieve power in a cross-classified or three-level model with categorical outcomes in Mplus 7.2. Reviewing Mplus's manual and statmodel, it seems like this models would need to be estimated using Bayesian estimation. Is it possible to estimate sample size in this type of model via Mplus's Monte Carlo methods?

Bengt O. Muthen posted on Friday, September 05, 2014 - 2:20 pm

Yes. And you find the underlying Monte Carlo runs for all the UG examples on our website.

dlsherry posted on Saturday, January 24, 2015 - 9:57 am

Hello. I am trying to run a monte carlo power analyses for a two level path model. I can get the model to run as a regular analysis but I cannot get the model to run as a monte carlo analyses. I have trying fixed the variances to zero but I keep getting the following error:

*** FATAL ERROR
THE POPULATION COVARIANCE MATRIX THAT YOU GAVE
AS INPUT IS NOT POSITIVE DEFINITE AS IT SHOULD BE.

Could you provide any syntax for a two level path analysis for monte carlo? I have not been able to find one.

Thank you.

Linda K. Muthen posted on Saturday, January 24, 2015 - 11:02 am

Fixing variances to zero will give that message. See mcex9.3.inp or 9.4 or 9.5. These are the Monte Carlo analyses that generate the data for Examples 9.3, 9.4, 9.5.

Peter Taylor posted on Tuesday, August 18, 2015 - 6:07 am

DearLinda & Bengt,

I am trying to estimate power using monte carlo approach for a two-level (time points nested within individuals) regression model, with one within-participants preidctor (time 'x'), and one between-participants predictor (group 'w'). It is the interaction between group and time that I am interested in checking the power for. The outcome is continuous.

I am relatively new to using Mplus for multi-level analyses, so was hoping to run the syntax past you:

TITLE: two level regression
MONTECARLO:
NAMES ARE y x w;
WITHIN = x;
BETWEEN = W;
NOBSERVATIONS = 5040;
NCSIZES = 1;
CSIZES = 120 (42);
NREP = 100;
ANALYSIS: TYPE = TWOLEVEL RANDOM;
MODEL POPULATION:
%WITHIN%
S | y ON x;
Y*1 X*1;
%BETWEEN%
y on w*0.24;
S on w*0.23;
MODEL:
%WITHIN%
S | y ON x;
Y*1 X*1;
%BETWEEN%
y on w*0.24;
S on w*0.23;

A specific questions:

I presume the regression weights (e.g., y on w*0.2) represent unstandardized values (assume I am correct and these are regression weights).

Many thanks for your help

Bengt O. Muthen posted on Tuesday, August 18, 2015 - 10:52 am

Looks ok, except you have to give a w variance on Between in Model Population. And you don't want to give an x variance on Within in Model (x is exogenous).

The coefficients are unstandardized.

Peter Taylor posted on Thursday, August 20, 2015 - 4:39 am

Many thanks. I have adapted the model slightly to include the mean and variance for w (its a binary variable, so mean = .5, variance = .25). I've noticed that whether I set (or leave out) the variance of Y influences the power I get and also the ICC, but I am unsure how to control this. Ideally I want to set the ICC for Y to a value around .4, how would I achieve this in this model?
Syntax below:

TITLE: random intercept two level
MONTECARLO:
NAMES ARE y x w;
WITHIN = x;
BETWEEN = W;
NOBSERVATIONS = 5040;
NCSIZES = 1;
CSIZES = 120 (42);
NREP = 100;
ANALYSIS: TYPE = TWOLEVEL RANDOM;
MODEL POPULATION:
%WITHIN%
y*1;
S | y ON x;
%BETWEEN%
y on w*0.25;
S on w*0.25;
w*0.25; [w*0.5];
MODEL:
%WITHIN%
y*1;
S | y ON x;
%BETWEEN%
y on w*0.25;
S on w*0.25;
w*0.25; [w*0.5];

Peter Taylor posted on Thursday, August 20, 2015 - 4:56 am

Sorry, just found a thread on this - so I can manipulate the ICC of Y by asjusting the ration of between and within variance for y - so this moel is reasonable:

TITLE: random intercept two level
MONTECARLO:
NAMES ARE y x w;
WITHIN = x;
BETWEEN = W;
NOBSERVATIONS = 5040;
NCSIZES = 1;
CSIZES = 120 (42);
NREP = 100;
ANALYSIS: TYPE = TWOLEVEL RANDOM;
MODEL POPULATION:
%WITHIN%
S | y ON x; y*.6;
%BETWEEN%
y on w*0.25;
S on w*0.25;
w*0.25; [w*0.5]; y*.4;
MODEL:
%WITHIN%
S | y ON x; y*.6;
%BETWEEN%
y on w*0.25;
S on w*0.25;
w*0.25; [w*0.5]; y*.4;

Bengt O. Muthen posted on Thursday, August 20, 2015 - 6:31 pm

Looks ok, except you want to give a variance for x on Within in Model Population (not in Model).

QianLi Xue posted on Wednesday, October 07, 2015 - 11:22 am

Hi, I would like to do a power calculation for paired data (e.g. left and right eyes) and multiple measurements are taken over time for both eyes. The effect size of interest is rate of change in one eye relative to the other eye (i.e., within-cluster effect) as a result of giving one treatment to one eye and placebo to the other). I'm trying to adapt your Example 9.12 (two-level growth model) to this paired study design. Is there a way to simulate the treatment indicator "x" to take value 1 or 0 within each subject, because treatment is only offered to one of the two eyes. In your example 9.12, the x variable can take the same value for both members of the same cluster. Thanks!

Bengt O. Muthen posted on Wednesday, October 07, 2015 - 5:50 pm

Is there a reason not to do this as a wide, single-level growth model for 2 parallel processes, one for each eye? With 2 binary treatment variables each influencing one eye.

QianLi Xue posted on Thursday, October 08, 2015 - 12:28 pm

Thanks so much for your response! I think the use of the parallel model is not able to account for the fact that if one eye gets the treatment, the other must be the control. If we use the parallel model with 2 binary treatment indicators, one for each eye, I guess I will have to constrain the path coefficients of these two treatment indicators to be the same between the two eyes (i.e., parallel processes), right? which is fine, but again, by adopting this approach, we're breaking the pairing of a treatment with a control within the same person. I'm wondering if I can use the two-level growth model, but drop cases in the simulated datasets where the treatment assignment is the same within a person (i.e., treating both eyes, or treating neither).

Bengt O. Muthen posted on Thursday, October 08, 2015 - 6:34 pm

If you start from ex 9.12, are you thinking that Within consists of 2 "members", that is, the 2 eyes, and Between consists of persons? And X the treatment?

One can use Cutpoints to make X binary.

Bengt O. Muthen posted on Thursday, October 08, 2015 - 6:35 pm

In which case one views the 2 eyes as statistically equivalent - which is perhaps ok.

Noah Emery posted on Sunday, August 07, 2016 - 5:47 pm

Hello,

I am attempting to conduct a Monte Carlo simulation for a two-level model with random intercepts and a 1-1-1 mediation. As part of this, I want to generate a data set then analyze it to get standardized effects. However, I am struggling to create an ID variable to use as the cluster for level two. Is there a command that can accomplish this? I have looked around but have been unsuccessful locating one thus far.

Thanks in advance

Linda K. Muthen posted on Monday, August 08, 2016 - 9:59 am

See Examples 12.4. The CSIZES and NCSIZES options are used to define the clusters.

Patrick Malone posted on Thursday, September 15, 2016 - 2:31 pm

I'm struggling with the algebra for simulating a clustered-data CFA. I've set it up as an external Montecarlo so that I can use CLUSTER in the data analysis, while generating data as twolevel.

Where I'm running into problems is deriving the population parameters to reflect a given ICC for the given loadings.

The goal is a dataset which, when correctly analyzed, has expected values of standardized factor loadings of .75 and appropriate SEs, with an ICC of .10.

I'd appreciate any help in figuring out how to set

%WITHIN%
F1*;
F1 BY Y1-Y5*;
Y1-Y5*;
%BETWEEN%
Y1-Y5*;

in the model population. At least I *think* that's where the magic happens. I can do this easily enough in a single-level simulation. But my efforts to do the twolevel version are failing.

If all else fails, I can simulate as single-level adjusting the N for a design effect, but I'd really like to learn how to do this.

Thanks.

Patrick Malone posted on Thursday, September 15, 2016 - 2:34 pm

Let me expand that question. I expect the ICC to also influence by factor covariances, so, for example, a desired F1-F2 correlation of .65.

%WITHIN%
F1 BY Y1-Y5* Y6-Y10@0;
F2 BY Y1-Y5@0 Y6-Y10*;
F1* F2*;
F1 WITH F2*;
Y1-Y10*;
%BETWEEN%
Y1-Y10*;

Bengt O. Muthen posted on Thursday, September 15, 2016 - 3:28 pm

Because you have no factor on Between, I assume you want the icc for the Y variables, not for the factor. But you probably still want the Y intercepts of Between to correlate (we know that aggregates typically do).

But to answer the question, try:

%WITHIN%
F1@1;
F1 BY Y1-Y5*0.75;
Y1-Y5*0.4375;
! this makes the within variances = 1
%BETWEEN%
Y1-Y5*x;

where you get x from icc = B/(W+B):

0.10 = x/(1+x).

That's x = 0.10/(1-0.10)= 0.11.

Patrick Malone posted on Friday, September 16, 2016 - 6:08 am

Thank you! I kept making it harder than that.

Patrick Malone posted on Monday, September 19, 2016 - 6:53 am

(part 1 of 2)

Bengt,

I'm still struggling with this. Does the answer get more complicated for multiple factors? (I sent a chaser question but you might not have seen it.) My estimates are coming in lower than I'd expect.

The generation model:

MODEL POPULATION:

%WITHIN%
f1 BY y1-y18*.75 y19-y90*0;
f2 by y1-y18*0 y19-y36*.75 y37-y90*0;
f3 by y1-y36*0 y37-y54*.75 y55-y90*0;
f4 by y1-y54*0 y55-y72*.75 y73-y90*0;
f5 by y1-y72*0 y73-y90*.75;

f1-f5@1;
y1-y90*.4375;
f1-f5 with f1-f5*.65;
[f1-f5@0];

%BETWEEN%
y1-y90*.11;
[y1-y90@0];

*******
The analysis model:
ANALYSIS:
TYPE IS COMPLEX;

MODEL:
f1 BY y1-y18*.75 y19-y90*0 (*1);
f2 BY y1-y18*0 y19-y36*.75 y37-y90*0 (*1);
f3 BY y1-y36*0 y37-y54*.75 y55-y90*0 (*1);
f4 BY y1-y54*0 y55-y72*.75 y73-y90*0 (*1);
f5 BY y1-y72*0 y73-y90*.75 (*1);
f1-f5 with f1-f5*.85;

Patrick Malone posted on Monday, September 19, 2016 - 6:54 am

(part 2 of 2)
So I'm going for standardized factor loadings of .75. But I'm typically getting standardized loadings of ~.6. Factor correlations are also low.

The below is for the ICC of .1. With an ICC of .01 in the generation (BETWEEN variances of Y at .01, no other changes), the standardized loadings are around .70-.71.

Is analyzing with EFA (vs CFA) part of the problem? This is a large model, so I'm only running 100 samples, ~90 of which converge, but the results are consistent across all loadings on all factors, so it seems to be "real."

I appreciate your help.
*******

STDYX Standardization

ESTIMATES S. E. % Sig
Average Std. Dev. Average Coeff

F1 BY
Y1 0.607 0.140 0.190 0.809
Y2 0.611 0.117 0.185 0.831
Y3 0.608 0.134 0.195 0.787
Y4 0.591 0.152 0.187 0.798
Y5 0.605 0.161 0.165 0.787
etc.

Bengt O. Muthen posted on Monday, September 19, 2016 - 11:45 am

I think it has to do with using EFA. The default Geomin rotation criterion for a simple structure may not be compatible with your pattern of loadings. It sounds like Geomin gives you higher loadings in the zero spots and lower factor correlations; the non-zero loadings give you higher observed variable variances and therefore lower standardized loadings.

We mention this issue briefly in the FAQ

Lambda is not compatible with the notion of simplicity of the rotation criterion

This makes simulations of EFA hard to do.

Patrick Malone posted on Monday, September 19, 2016 - 12:03 pm

Thanks, Bengt. That helps a great deal.

Bengt O. Muthen posted on Monday, September 19, 2016 - 12:22 pm

Also, note that the variance of the outcomes in your step 2 Type=complex analysis is the total variance, so also including the 0.11. This would make your standardized loadings smaller than the true within-level loadings.

Jas Wer posted on Sunday, October 09, 2016 - 8:45 pm

I have a family sample, with two children per family. Using this data I've conducted analyses to see if differences between siblings in X predict differences in Y. Because the number of sibling pairs who differ in their Y is quite small for some Ys , I would like to estimate the power to detect a within-family effect on Y. I'm new to multilevel and Monte Carlo analyses and trying to write an Mplus syntax to estimate power, however the User's Guide examples seem almost too complex for this application. Could you point me to some answers for these questions:

1) My X is measured per child, so could be modeled as a between or within predictor. Since I'm only interested in within-effects, is it correct to specify within=X in the MONTE CARLO subcommand and %within% Y on X; in the MODEL subcommand, simply leaving out any "between" commands/models?

2) In the MODEL POPULATION section, I'm unsure which model parameters to use for X. I seem to need to specify "%within%"before the model parameters, so I think I can't just use the mean and variance across the whole sample. Which parameters do I use here?

3) Some of my outcomes are binary, others continuous. The binary outcomes are not always 50%/50% but also 90%/10% which I understand influences power. Is there a way to specify this, other than generate=Y(1); categorical=Y; ?

Thank you for your help!

Bengt O. Muthen posted on Monday, October 10, 2016 - 2:44 pm

Why not take the simpler approach of a multivariate model instead of a multilevel model? That is, you have 2 Xs and 2 Ys, one for each child.

You can use 3 sources for guidance. First, each UG example has a Monte Carlo version on our website. Second, you can study the article on our website:

Muthén, L.K. & Muthén, B.O. (2002). How to use a Monte Carlo study to decide on sample size and determine power. Structural Equation Modeling, 4, 599-620. Mplus inputs and outputs used in this paper can be viewed and/or downloaded from the Examples page.
download paper contact first author show abstract

Third, you can study our new book:

http://www.statmodel.com/Mplus_Book.shtml

Regarding your question 3), you can specify [y$1*c] where c is the value of the threshold that gives you the percentage you want.

Ryan Snead posted on Monday, August 21, 2017 - 10:38 am

Good afternoon!

I am running a post hoc power analysis on a path model using complex survey data. When I run the path model for my analysis I use TYPE = COMPLEX. My Monte Carlo syntax specifies the correct NCSIZES/ CSIZES, but gives me an error for TYPE = COMPLEX and says to use TWOLEVEL. Using the latter type requires me to insert between/within syntax that I don't think is relevant for my model. Is there a different method you would recommend that still allows me to account for clustering in the data?

Thanks for your help in advance! I attended the Wednesday workshop at Hopkins last week. It was very helpful!

Linda K. Muthen posted on Monday, August 21, 2017 - 10:51 am

Clustered data are generated using TWOLEVEL even though you want to analyze the model using COMPLEX. Examples 12.6 shows this two-step procedure for a growth model. You would do it for a path model.

Ryan Snead posted on Tuesday, August 22, 2017 - 7:31 am

Linda,

Thank your for your quick response and helpful recommendation. I have two follow up questions. 1) Our sample is multilevel because the design links mothers to providers. Is it necessary to specify between/within slope and intercept like in the 12.6 example? Is what I have come up with satisfactory given only one level 2 variable (x1)? 2) If I wanted to view the results given a larger sample, say 100 to 400, would I need to proportionally increase my CSIZES in this first step?

MONTECARLO:
NAMES = x1 x2 x3 x4 y1 y2 y3;
NOBSERVATIONS = 100;
NREPS = 1000;
SEED = 8675309;
POPULATION = estimates.dat;
COVERAGE = estimates.dat;
NCSIZES = 4;
CSIZES = 25 (1) 12 (2) 13 (3) 3 (4);

ANALYSIS: TYPE = TWOLEVEL;

MODEL POPULATION: ! current path model

y1 ON x1 x2;
y2 ON y1;
y3 ON x1 x2 x3 x4 y1 y2;

y1 WITH y2@0;
y1 WITH y3@0;
y2 WITH y3@0;

MODEL POPULATION: ! two level path model

%within%
y1 ON x2;
y2 ON y1;
y3 ON x2 x3 x4 y1 y2;

y1 WITH y2@0;
y1 WITH y3@0;
y2 WITH y3@0;

%between%
y1 ON x1;
y3 ON x1;

Bengt O. Muthen posted on Tuesday, August 22, 2017 - 6:04 pm

Yes, you need to proportionally increase CSIZES. Note that you have very small cluster sizes of 1, 2, 3, and 4 subjects. And you are estimating about 10 parameters within cluster - that is, the small cluster sizes can lead to poor estimates.

Regarding the model, I note that y2 is not on the Within list while at the same time it isn't mentioned on Between.

Ryan Snead posted on Thursday, August 24, 2017 - 6:45 am

Thank you for your help!

Andrea Norcini Pala posted on Saturday, January 13, 2018 - 1:37 pm

Hello,
I used Monte Carlo simulation to determine the power of multiple linear regression with 1 continuous dependent variable, 3 continuous independent variables and their interaction (4 interaction terms) (sample N=100). Since I did not find a syntax to generate interaction terms with observed variables, I generated them assuming that the 3 independent variables are latent (3 indicators each). Then, I saved the samples generated and analyzed the data treating the latent factors and their interactions as observed variables (using the DEFINE syntax). To minimize the measurement error when the latent factors are treated as observed, each factor loading is 0.95 (residual variance 0.09). Do you see any problem with this approach? Thanks!

MONTECARLO:
NAMES ARE Y X1-X9;
NOBSERVATIONS = 100;
NREPS = 100;
MODEL POPULATION:
f1 by x1-x3*0.95;
f2 by x4-x6*0.95;
f3 by x7-x9*0.95;
x1-x9*0.09;
f1 f2 with f3*.40;
f1 with f2*.40;
f1-f3@1;
int1 | f1 xwith f2;
int2 | f3 xwith f2;
int3 | f1 xwith f3;
int4 | int1 xwith f3;
Y on f1*0.20 f2*0.20 f3*0.20 int1*0.20 int2*0.20 int3*0.20 int4*0.20;
Y*.72;

Bengt O. Muthen posted on Saturday, January 13, 2018 - 4:51 pm

A simpler way is to do a 2-step Monte Carlo run. Step 1: generate all needed variables and save the data sets. Step 2: Do an external Monte Carlo run which uses Define to create the variables. See the V8 UG ex 12.16 for how to do a 2-step Monte Carlo.

Andrea Norcini Pala posted on Sunday, January 14, 2018 - 7:44 am

Thank you so much for the prompt response. I could not find the example you refer to. However, I did use the 2-step Monte Carlo but I only posted the step 1. The second step looks like this:

variable:
NAMES ARE Y X1-X9;
USEVARIABLES ARE Y f1 f2 f3 int1 int2 int3 int4;

DEFINE:
f1=mean(x1-x3);
f2=mean(x4-x6);
f3=mean(x7-x9);
INT1=f1*f2;
INT2=f1*f3;
INT3=f3*f2;
int4=f1*f2*f3;
standardize f1-f3;
MODEL:
[f1-int4@0];
f1-int4@1;
Y ON f1-int4*.2;
y*0.72;

Unless you meant that there is a simpler way to generate interaction terms.
Thank you!

Bengt O. Muthen posted on Sunday, January 14, 2018 - 10:39 am

You find the UG ex 12.6 in the V8 UG that is posted on our website. This shows what step 1 and step means and how they are carried out. In the second step, you would use the Define command to created the interaction. for instance:

Define:

inter = x*y;

Kelly S posted on Monday, April 02, 2018 - 4:59 pm

Hello,

I am using multilevel sem and trying to check for mediation. Mplus suggests using Monte Carlo simulation for my hypothesised model (assuming due to the number of degrees of freedom). However, no modification indices are available for Monte Carlo and multilevel analysis. How can I check my model's goodness of fit? At the moment I only get AIC and BIC, which are not easily interpreted... how can I check RMSEA, SMMR, CFI and TLI? Any ideas?

Thank you in advance

Bengt O. Muthen posted on Tuesday, April 03, 2018 - 11:04 am

Perhaps you are referring to Monte Carlo integration when using ML. If fit indices are not given it is because the usual sufficient statistics are not relevant in your case. You can always work with neighboring models that have fewer restrictions (e.g. more paths) and test the significance of those extra parameters.

Jennie Jester posted on Wednesday, May 09, 2018 - 2:01 pm

How can I get an effect size for an individual predictor in a regression, using TYPE=COMPLEX?

Bengt O. Muthen posted on Thursday, May 10, 2018 - 2:45 pm

Just express the effect size in Model Constraint and it takes into account Complex.

Raheem Paxton posted on Thursday, July 05, 2018 - 2:28 pm

Greetings,

I’m running a multigroup factor analysis on national data will cluster, weight, and stratum variables. Given a small sample size of our minority group (n=126 for minority, n=400 for majority), we wanted to generate new data using montecarlo simulation that can be used for additional analysis. Ran into a few problems. Please see below

Multigroup factor analysis on original data ran successfully. Sample code used, which ran correctly. Estimates produced.

I’m not lost on how to proceed with generating new data using montecarlo simulation. Based on the book, I entered the following code, but I’m not able to run the analysis. I looked at the file that I created, and it only has 12 data columns, so I’m not sure whats actually going on here. Thank you for your time and I look forward to your response.
MONTECARLO: NAMES ARE ID WT STRAT CLUS RACE U1-U13;
NOBSERVATIONS = 500 500;
NREPS = 500;
SEED = 12345;
NGROUPS=2;
POPULATION = NAME.dat;
COVERAGE = NAME.dat;
MODEL POPULATION:
F1 - F3 BY U1-U13 (*1);
[F1-F3@0];

MODEL POPULATION-g2:
F1 - F3 BY RU1-U13 (*1);

MODEL: F1 BY U1-U2 (*.35);
F2 BY U3-U5 (*.40);
F3 BY U6-U10 (*.45);

MODEL g2:

F1 BY U1-U2 (*.35);
F2 BY U3-U5 (*.40);
F3 BY U6-U10 (*.45);

Bengt O. Muthen posted on Thursday, July 05, 2018 - 5:49 pm

You need to give (residual) variances for F and the U's.

Raheem Paxton posted on Saturday, July 07, 2018 - 9:52 am

Thank you for your response. I tried something different to make it easier for me understand how to go about doing this. Im trying to follow example 12.7 in the book, but ran an ESEM. Given that I'm using national data, strata, weight, and cluster variables are included. The data are also categorical.

The model computed and I saved the estimated using the SAVEDATA: ESTIMATES= command. This model contained 9 variables.
f1-f3 by y1-y9 (*1);

However, my Montecarlo simulation following Example 12.7 step two using ESEM is giving me the following error statement:
The COVERAGE file contains too much data. Check your data.
File: MONTEGRA.dat

please help

Bengt O. Muthen posted on Sunday, July 08, 2018 - 5:07 pm

Please send relevant files to Support along with your license number.

Joseph Harris posted on Thursday, August 23, 2018 - 2:29 pm

I'm trying to calculate the power for a cross-classified model but I think there must be something I'm not understanding about the CSIZES and NOBSERVATIONS options. I've run the model and level2a has 245 clusters across 5 sizes and level2b has 69 clusters across 15 sizes. I also have a total of 396 observations and 1 observation per cell.

I understood from the UG that when specifying CSIZES that the numbers outside the brackets should sum to level2b's clusters and the numbers inside the brackets should sum to level2a's clusters. And that the NCSIZES should describe the number of unique cluster sizes for each level. But I can't seem to get it to where the number of clusters are correct at the same time as the number of observations.

NCSIZES = 15[5]
CSIZES = 5[2(1)] 4[14(1)] 3[10(1)] 2[81(1)] 1[138(1)]
NOBSERVATIONS = 396

In this iteration the numbers inside the brackets sum to level2a's clusters, and the observations are correct, but the numbers outside the brackets sum to 15 when I thought they should sum to 69. Hopefully someone can spot the mistake I'm making.

Thanks for the help.

Linda K. Muthen posted on Friday, August 24, 2018 - 9:40 am

Please send the full output and your license number to support@statmodel.com.

Michael Strambler posted on Saturday, October 06, 2018 - 1:35 pm

When I run the following model and vary the # of clusters I'm finding that power for the dv1 ON fw paths do not change from 1 and the coverage is always above .91. This seems curious and highly unlikely to me especially since I even see this when I lower the number of clusters to 20. In fact, the only thing I see worsen is SRMR-between. Can you explain why this is? Is there something off in my code? Thanks.

MONTECARLO:
NAMES ARE y1-y10 exp1 exp2 exp3 dv1 dv2;
CSIZES=30(20);
NOBS = 600;
NREPS = 1000;
SEED = 4533;
NCSIZES =1;
WITHIN ARE exp1 exp2 exp3 y1-y10;
ANALYSIS: TYPE IS TWOLEVEL RANDOM;

MODEL POPULATION:
%WITHIN%
fW by y1*0.5 y2*0.5 y3*0.5 y4*0.5 y5*0.5 y6*0.5 y7*.50 y8*.50 y9*.50 y10*.50;
fw@1;
fw ON exp1*.20 exp2*.20 exp3*.20;
dv1 ON fw*.3;
dv2 ON fw*.3;
exp1*0.60 exp2*0.60 exp3*0.60;
y1-y10*.75;
dv1-dv2*.91;
%BETWEEN%
dv1-dv2@.25;

MODEL:
%WITHIN%
fW by y1*0.5 y2*0.5 y3*0.5 y4*0.5 y5*0.5 y6*0.5 y7*.50 y8*.50 y9*.50 y10*.50;
fw@1;
fw ON exp1*.20 exp2*.20 exp3*.20;
dv1 ON fw*.3;
dv2 ON fw*.3;
y1-y10*.75;
dv1-dv2*.91;
%BETWEEN%
dv1-dv2@.25;

Bengt O. Muthen posted on Sunday, October 07, 2018 - 12:41 pm

I wonder why in your Model command you fix residual variances:

%BETWEEN%
dv1-dv2@.25;

Perhaps the results are also affected by having 20 observations per cluster. And where the indicators for fw have zero intraclass correlations so essentially #clusters * 20 independent observations

Michael Strambler posted on Sunday, October 07, 2018 - 4:02 pm

Thanks for the response. A few additional questions regarding your response...

1. Shouldn't the small number of obs per cluster results in lower power?
2. I fixed the dv1-dv2 at .25 because I wanted an ICC of .20 for the DVs, but I guess I should have freely estimated it. Is that correct?
3. I see your point about 0 ICC on the indicators. If I want those to also have an ICC of .20, would I take them off the WITHIN line and just list them on BETWEEN with a variance of .25 so the code looks like the following? If that's correct, I'm still getting similar results (very high power with small # of clusters) and I'm not sure why.

MODEL POPULATION:
%WITHIN%
fW by y1*0.5 y2*0.5 y3*0.5 y4*0.5 y5*0.5 y6*0.5 y7*.50 y8*.50 y9*.50 y10*.50;
fw@1;
fw ON exp1*.20 exp2*.20 exp3*.20;
dv1 ON fw*.3;
dv2 ON fw*.3;
exp1*0.60 exp2*0.60 exp3*0.60;
y1-y10*.75;
dv1-dv2*.91;
%BETWEEN%
dv1-dv2*.25;
y1-y10*.25;

MODEL:
%WITHIN%
fW by y1*0.5 y2*0.5 y3*0.5 y4*0.5 y5*0.5 y6*0.5 y7*.50 y8*.50 y9*.50 y10*.50;
fw@1;
fw ON exp1*.20 exp2*.20 exp3*.20;
dv1 ON fw*.3;
dv2 ON fw*.3;
y1-y10*.75;
dv1-dv2*.91;
%BETWEEN%
dv1-dv2*.25;
y1-y10*.25;

Bengt O. Muthen posted on Monday, October 08, 2018 - 10:53 am

1. 20 per cluster isn't that small when you consider that you have many clusters.

2. Yes, free them.

3. Looks right - you can send your output to Support along with your license number.

Sara Riese posted on Thursday, October 25, 2018 - 6:11 am

Hello,
I am running a Monte Carlo simulation to estimate power for a multilevel path analysis with a binary outcome, using ESTIMATOR=MLR, LINK=LOGIT, and TYPE=TWOLEVEL.

I want to be sure I am entering the hypothesized associations correctly at each level. At the within-level, I am entering the hypothesized log odds of the outcome, while at the between-level it is the linear regression coefficient. Is that correct?

Thank you!

Bengt O. Muthen posted on Thursday, October 25, 2018 - 4:14 pm

Sounds right.

Sara Riese posted on Friday, October 26, 2018 - 10:40 am

Thank you! Now when I run the MC simulation, most of the population parameters are 0, not the value from the model population statement. Below is my code (i removed the mean/variance values for size), do you see what could be causing this?

MONTECARLO:

NAMES = inf hr techany inter educ
firstanc parity dist level public idl;
GENERATE = idl(1);
categorical = idl;
nobservations = 1857;
ncsizes = 12;
csize = 7(1) 26(2) 15(3) 12(4)
243(5) 10(6) 8(7) 7(8)
9(9) 20(10) 1(11) 2(13);

seed = 1314;
nreps = 10;

BETWEEN = inf hr public level;
WITHIN = firstanc educ parity dist;

SAVE = MC_idl.dat;

ANALYSIS:

TYPE = TWOLEVEL;
ALGORITHM = INT;
ESTIMATOR = MLR;
LINK = LOGIT;

model population:

%WITHIN%
!variances/means of all within covariates here

!covariances of within covars here

!hypothesized values
idl ON educ*.9 firstanc*.2 parity*.9 dist*.9 techany*.54 inter*.54;

%BETWEEN%
!hypothesized values here
idl ON inf*.3 hr*.3 techany*.5 inter*.5 public*.1 level*.5;
techany ON inf*.5 hr*.3;
inter ON inf*.1 hr*.1;

!variances/ means of all between covariates here

!covariances for btwn level here

Bengt O. Muthen posted on Friday, October 26, 2018 - 3:22 pm

You need the Model command - typically you just copy the Model Population statements.

George Saunders posted on Monday, February 11, 2019 - 9:15 am

If I have three between-subjects covariates (Categorical) that I wish to use in the model: race (with 4 levels), gender (with 3 levels), proactive personality (with 5 levels). How do I specify this with? Should I be using the ngroups or NCSIZES commands? I would like all those between-subjects variables present in the model. I would like all the groups to be balanced with the same number of participants. Thank you.

Bengt O. Muthen posted on Monday, February 11, 2019 - 4:49 pm

Use multiple groups with the NGROUPS option (not NCSIZES).

George Saunders posted on Monday, February 11, 2019 - 5:31 pm

Sorry for the beginner questions but: how, then, do I account for the fact that there is only one ngroups statement but there are several factors (3)? Do I just use the ngroups statement 3 times, once for each covariate? Or, multiple all my levels (5x4x4) and thus have 80 groups (ngroups=80)? And then how do would I code the model population statements after this? Thank a lot for your assistance.

Bengt O. Muthen posted on Tuesday, February 12, 2019 - 5:19 pm

Yes, this would lead to 80 groups (or 60=4*3*5). Multiple-group Monte Carlo examples are shown on our website for the UG examples.

Y.A. posted on Saturday, March 30, 2019 - 3:21 am

Dear Prof. Muthen,

I am trying a montecarlo simulation with a two level model, I have tried to give all the variance I can think of a starting value, I still get the fatal error. Could you give me some suggestions please? Thank you very much.

montecarlo:
names=cwb unhappy stress mindful;
within=stress cwb;
between=mindful;
nobserv=400;
ncsizes=1;
csizes=40(10);
nreps=1000;
seed=57456;

ANALYSIS:
TYPE = TWOLEVEL RANDOM;

model population:
%within%
unhappy@1 stress@1;
cwb on unhappy*0.6 stress*0.3;
s|unhappy on stress;
cwb*1;
%between%
mindful@1;
s on mindful*0.3; [s*0.3]; s*0.1;
unhappy on mindful*0.2; unhappy*0.2;
unhappy with s*0.2;

model:
%within%
cwb on unhappy(b)
stress;
s|unhappy on stress;
%between%
s on mindful(a1);
[s](a0);
unhappy on mindful;
unhappy with s;

*** FATAL ERROR
THE POPULATION COVARIANCE MATRIX THAT YOU GAVE
AS INPUT IS NOT POSITIVE DEFINITE AS IT SHOULD BE.

Michael Strambler posted on Sunday, March 31, 2019 - 9:32 am

When modeling a MCFA at the WITHIN level (just variances listed on BETWEEN) with items not declared on WITHIN or BETWEEN, is the SRMR-between meaningful? if so, in what way? Thank you.

Bengt O. Muthen posted on Sunday, March 31, 2019 - 3:13 pm

Answer for YA:

Your statements

s*0.1;
unhappy*0.2;
unhappy with s*0.2;

create a correlation greater than 1.

Tihomir Asparouhov posted on Monday, April 01, 2019 - 9:03 am

Answer to Michael Strambler:

Take a look at section 2.8 in

http://www.statmodel.com/download/SRMR2.pdf

for the general definition. If you have only variances on the between level then SRMR on the between level would be an approximate test for the covariances=0. If you have covariances as well then it would be a "test/check" whether potential misspecifications on the within level have distorted the within-between split.

Maria Pavlova posted on Tuesday, April 07, 2020 - 10:33 am

Hi all,

yet another question concerning a Monte-Carlo simulation. Am I right that there is no way to combine twolevel analysis with multiple group analysis if the DV is ordered categorical? I am trying to simulate an intervention study where each group has a different number of observations per person (dynamic wait-listed design). Repeated measures are nested within participants. I want to look at within-level effects of intervention on an ordinal DV and at the between-group differences in the rate of DV. It appears that I cannot specify different cluster sizes for different groups because I have to use genclasses instead of ngroups, and the program doesn't seem to allow group-specific number of observations and cluster sizes for a model with classes.

Bengt O. Muthen posted on Tuesday, April 07, 2020 - 4:58 pm

Have you tried ML and Knownclass?

Maria Pavlova posted on Wednesday, April 08, 2020 - 12:41 am

Hi Bengt,

I am talking about Monte-Carlo simulation. If I try MLR with the following syntax (excerpt),

montecarlo:
nobservations = 60000 70000 80000 90000;
genclasses = c(4 b);
classes = c(4);
ncsizes = 1 | 1 | 1 | 1;
csizes = 10000(6) | 10000(7) | 10000(8) | 10000(9);
names = intent motiv voli c;
generate = intent(4) ;
cutpoints = motiv(-0.43) voli(0.43) | motiv(-0.18) voli(0.565) |
motiv(0) voli(0.675) | motiv(0.14) voli(0.765);
within = motiv voli;
between = c;
categorical = intent;
analysis: type = twolevel mixture;
estimator = mlr;

the program produces an error message:
ERROR in MONTECARLO command
The number of observations per group has exceeded the number of groups.

Tihomir Asparouhov posted on Thursday, April 09, 2020 - 3:24 pm

You are mixing multiple group and mixture setups.

Start with something like that

montecarlo:
nobservations = 300000;
genclasses = c(4 b);
classes = c(4);
ncsizes = 4;
csizes = 10000(6) 10000(7) 10000(8) 10000(9);
names = intent motiv voli;
generate = intent(4) ;
within = motiv voli;
between = c;

The cut points feature can be implemented using categorical variables.

Alternatively - drop the mixture setup - generate it as multiple group and later analyze as mixture. I am not sure really if you want 4 groups or 4x4 = 16 separate groups (mixture classes times 4 groups)

margot peeters posted on Thursday, August 27, 2020 - 5:05 am

Dear all,

I am trying to run a Monte Carlo Simulation study to determine sample size for a planned study. It is multilevel multigroup model.

I have applied (real)data for all parameters (b's and variances) from a previous study that is lacking power/sample size for this model.

I have been trying many combinations (leaving some variables out), however, I continue to receive the message

*** FATAL ERROR
THE POPULATION COVARIANCE MATRIX THAT YOU GAVE
AS INPUT IS NOT POSITIVE DEFINITE AS IT SHOULD BE.

Since I do not get any output it is difficult to determine which parameter is causing troubles.

Could you help me or provide some further suggestions to overcome this problem.Thank you!

margot peeters posted on Thursday, August 27, 2020 - 5:25 am

sorry, forgot to post my syntax

NOBSERVATIONS = 3840;

NREPS = 500;
SEED = 53487;
NCsizes = 1;
csizes = 160 (24);
GenCLASSES = c(4);
classes = c(4);
within is QF2 selfcontrol sex;

between is alc_norm grade;

SAVE = MSsimulmultilevel.sav;

ANALYSIS: TYPE = twolevel mixture random;
ESTIMATOR = MLR;

MODEL population:

%WITHIN%
%Overall%
selfcontrol*.14;
QF3 on QF2*.33;
! QF3 on selfcontrol*.24;
QF3 on sex*.11;

slopeself | QF3 on selfcontrol;

%BETWEEN%
%Overall%
QF3*.05;
alc_norm*.14;
QF3 on grade*-.20;
slopeself on alc_norm*.15;
QF3 on alc_norm*.60;

%c#1%
QF3 on alc_norm*1.14;

%C#2%
QF3 on alc_norm*.20;

%C#3%
QF3 on alc_norm*.38;

%C#4%
QF3 on alc_norm*.35;

OUTPUT: PATTERNS TECH9;

Bengt O. Muthen posted on Thursday, August 27, 2020 - 4:23 pm

All predictors need to be given variances in Model Population and you haven't done that for the variable sex.

margot peeters posted on Friday, August 28, 2020 - 1:29 am

Dear Bengt,

Thank you for your fast reply. I have changed my model and added variances for all predictors (including grade and QF2) but unfortunately that has not changed the error message. Are there any other options I can try?
Thank you.
Margot

Bengt O. Muthen posted on Friday, August 28, 2020 - 6:07 am

To diagnose this, we need to see your full output - send to Support along with your license number.