Anonymous posted on Friday, December 12, 2003 - 7:48 pm
I am trying to calculate power for a multilevel study-latent class growth. I have the example from the addendum to the userís manual (example is for a two group, three level model). I also have Muthén and Muthén (2002) describing how to use a monte carlo study to decide power.
A few questions in trying to integrate the two pieces of information.
Is the first step to calculate ICCs and design effect to see if I even need to do multilevel modeling?
Iím following the example from the product support area. Iíve removed the input for the second group. For the monte carlo study, do I simply run a multilevel model and get the starting values from the output to get starting values for the within and between parts of the model?
In the multilevel model, I want to examine the effect of a between variable on slope. Do I only specify that regression in the between part of the model, only in the within part or in both places?
bmuthen posted on Saturday, December 13, 2003 - 12:24 pm
Answered off line because of further questions.
Anonymous posted on Sunday, January 25, 2004 - 3:24 pm
We are presently planning a family study where we would like to use multi-level SEM to examine potential influences on the psychosocial adjustment of children (level 1) nested within family systems (level 2). In the current or upcoming versions of Mplus, is there a way to estimate nonrecursive models? We have not seen this issue mentioned in any of the references we have collected on multi-level SEM. Thank you in advance for whatever help you can provide.
bmuthen posted on Sunday, January 25, 2004 - 3:31 pm
The current version of Mplus allows nonrecursive multi-level SEM.
Anonymous posted on Wednesday, January 28, 2004 - 3:16 pm
We are planning a family study where we would like to use multilevel SEM to analyze data collected from children nested within families. I have been reading what has been written about power analysis and sample size estimation for multilevel models. In those papers, there appears to be concern about power and the number of clusters. I see little comment about power and the number of observations within each cluster other than a note that more variability in observations per cluster can have a negative effect on power. In this data set, we expect to have more that 400 families with either 1 or 2 children per cluster so a low number of clusters is not a concern. Most (70%), however, will only have 1 child per cluster. Should we be concerned about the number of clusters with only 1 observation? How might this affect power? Thank you for your reply about nonrecursive models.
bmuthen posted on Wednesday, January 28, 2004 - 3:26 pm
Having clusters with only 1 child is not a concern. They do not contribute information to the estimation of parameters related to within-cluster (child) variation, but do contribute to the estimation of cluster-related (family) parameters. So the question is for which type of parameter you want high power. If you have a small number of families with more than 1 child and want high power for within-family parameters, there might be a problem. As you say, you are likely to have good power for family-level parameters. You can study these power matters using the Mplus Monte Carlo facility if you have a notion of the approximate model and parameter values.
David Bard posted on Thursday, May 18, 2006 - 10:32 am
I am trying to run a power analysis for a longitudinal RCT where data are expected to fit a latent difference score SEM (McArdle and colleagues invention- an upcoming SEM pub cited below). This has forced me to ask some hard questions about the structural relations between variables at the within and between levels using the Mplus 2-level structure. Can you share some insight?
1) I'm finding it very hard to decide whether or not the SEM model auto-regressive and coupling effects (for bivariate growth) need to be constrained to be equal at both levels in the same way that Muthen (1997) suggests constraining the slope and indicator effects to be the same in clustered LGM. How does one go about deciding such things even for more typical multilevel SEM situations? In other words, when should loadings and regression effects be constrained across levels, and when should they be allowed to vary? Is this purely an empirical issue, or should theory drive model specification.
2) In Muthen & Muthen (2002), without 3rd level (or Mplus' 2nd level) clustering, the authors show how to run a monte carlo power analysis for an LGM with a binary covariate.
2a) Slope effect size was defined as the difference in the average group slope estimates divided by the variance of the latent slope term. The difference in group slopes is directly related to the regression of the slope term on the "binary" covariate, which if scored 0/1, has a mean of .5 and a variance of .25. Using these estimates, regression effects of .2 and .1 were chosen to represent medium and small effect sizes. These effects appear in the appendix Mplus code, but the covariate is scaled with a variance of 1 and a mean of 0 and not .25 and .5 for data generation. 2ai). Is the monte carlo SEM used to generate the outcome data modeling the covariate as dichotomous 0/1 (or perhaps now, -1/1, so that mean=0 and var=1), or continuous- only later to be transformed into the desired dichotomy? 2aii). Regardless, does this variance rescaling not change the effect size?
2b) When trying to plug in reasonable estimates of slope and intercept residual variability, does it make sense to use the Muthen & Muthen (2002) estimates at the within level, and use a ratio of the between to within variation equal to the expected ICC? For example, I expect an ICC of around .01 (at least for time 1 measurements), so does it sound reasonable to set the between slope residual variance equal to .01*V(WS)- i.e., .01 times the estimated within slope residual variance (which = .09 in Muthen & Muthen, 2002).
McArdle, J. J. (in press). Latent difference score approach to longitudinal dynamic structural analysis. Structural Equation Modeling.
David Bard posted on Thursday, May 18, 2006 - 10:39 am
Oh, sorry, one more question related to my first. In older applications of these multilevel models (examples on the Mplus homepage for clustered LGM), residual variation for the observed outcomes is modeled at both the within and between levels; yet, in newer applications, this variation is only modeled in the within level (examples included in newest version of Mplus manual). How does one make these decisions?
1. Growth models imply a mean structure for the variable that is repeatedly measured over time. If you look at the 3-level version of the growth model in standard multilevel books, you see that it has time scores that multiply both the means and residuals of the growth factors. The latter part ends up on within and the former on between, but the two parts are using the same time scores; hence the across-level equality in Mplus. Other models such as McArdle's have to be studied from this point of view to see if the same model-implied equality across levels holds. I have not yet read the article.
2a The x is first generated as N(0,1) and then dichotomized. It is the dichotomized version that is used in the model to generate the data.
Yes this is reasonable.
The between-level residual variances are often close to zero. In conventional 3-level growth modeling such as HLM, they are fixed at zero. Mplus allows them to be estimated and sometimes that is important. However, with categorical and other outcomes it is computationally time-saving to restrict them to zero.
David Bard posted on Friday, May 19, 2006 - 7:59 am
Thank you. Those are excellent explanations. Two quick follow-ups: I cannot seem to get a dichotomous variable to run at the between level in my SEM model. I keep getting the message:
Only x-variables may have cutpoints in this type of analysis. Cutpoints for variable W were specified.
Yet, when I run the LGM example 11.6 from the manual, I do not get this message. Are cutpoint variables at the between level only allowed for LGMs?
Also, if you create a dichotomous variable in the monte carlo routine, is the default scoring 0/1? Does rescaling the variable to a mean of 0 and variance of 1 in the model syntax make the dichotomous variable -1/1?
You do not create binary dependent variables using the CUTPOINT option. This is for independent variables only. The GENERATE option is used to generate binary dependent variables. See the user's guide for further information and examples. If you have further problems, send your input and license number to firstname.lastname@example.org.
David Bard posted on Saturday, May 20, 2006 - 3:57 pm
I'm sorry I should have been more clear, I was creating a binary independent variable and have now discovered it works fine as long as the rescaling and mean change sytnax (e.g., [w@0] and w@1) is left out of the model. Now that it's working, I have another dilemma. I'm interested in determining power for a clinical trial where block-randomization is used to ensure equal number of treatment and control clinics (only 10 clinics enrolled). Unfortunately, with the syntax we've discussed thus far, my treament variable is not always evenly distributed across clusters. Is there a way to avoid this in the monte carlo routine?
You can use a multiple-group approach where you work with a fixed sample size for the treatment group and for the control group.
David Bard posted on Thursday, July 13, 2006 - 1:07 pm
I'm back to a similar problem described above (May 20). I'm trying to fit the multiple-group approach but having trouble with the correct sytnax for this type of monte carlo. I keep getting the following error message when using the Nobservations, Ngroups, and CSIZES syntax below.
NOBSERVATIONS = 300 300; NGROUPS =2; . . . NCSIZES = 3; CSIZES = 9 (30) 8 (20) 17 (10); !cluster sizes or CSIZES = 9 (15) 8 (10) 17 (5); !cluster sizes within each group or even NCSIZES = 1; CSIZES = 50 (12); !cluster sizes or CSIZES = 50 (6); !cluster size within each group
***The number of sets of cluster information does not match the number of groups.
For multiple group analysis, you need to specify the number of unique cluster sizes for each group and the sizes for each group. This is done using the | symbol. See Chapter 18 where the NCSIZES and CSIZES options are discussed for an example of how to use these options for multiple group analysis.
I'm interested in doing a power analysis for a RCT comparing group tx vs. individual tx (block randomization). Would you have a good quick reference that could help in setting up this power analysis? I have read the info on the website for how to set this up using the Monte Carlo simmulation method, but wondered if there was an applied example that I could also refer to. I am also concerned with looking at mediators and moderators in this RCT and trying to determine power for these analyses as well. Any suggestsions would be greatly appreciated.
Thank you for the reference, it is very helpful. Should there be any concern with calculating power for a design where individuals nested within groups for the group tx arm of the study but not for the individual arm of the study.
UG Ex11.4 shows how to do MC studies in a 2-level setting. You can build on that example. If one of the groups does not have clustering in this way, you could probably use "external" MC analyzing data generated for the tx arm (2-level) separately from the other arm (single-level) - and then send the combined data to MC.
Thomas Olino posted on Wednesday, November 29, 2006 - 11:19 am
I am attempting to estimate power for moderational analysis in a multilevel model. Would it make sense to generate data based on main effects, then use those generated data to compute the interaction for each of the generated data sets, and then see how often the interaction turned out signifcant?
Bengt, I have followed your advice from Nov13-14 and run two sepearte MC studies to look at power within each arm of the study (first grp then individual). I'm now trying to build upon UG ex11.4 as you suggested with the group and individual data. I'm interested in (1) the power to detect difference in outcome btwn tx (grp vs ind), and (2) whether a mediator (social support) is stronger in grp vs tx. I'm not sure exactly how to set this up in part because I'm not sure what parameter gives me the test for (1). Do I set it up as ex11. but with:
NCSIZES = 2; CSIZES = 10 (8) 1 (80);
and then interpret the effects of w in the model or do I also create a categorical X variable for tx condition?
I think I could figure out (2), if I was sure about (1), but any suggestions would be helpful.
I see now my typo and appologize. I can run everything as you suggested, with individual and group data sets generated using SAVE/REPSAVE comands and then input into an "external MC." Do you have any thoughts on number (2) the example of moderated mediation I mentioned on the 6th?
I think your question of the 6th was how to assess if a mediator is stronger in one group than the other. You can do this via Model constraint using the New option to create a new parameter corresponding to the difference between the 2 mediator estimates. You will then get a monte carlo summary of this new parameter to see how often equality of the mediator estimates is rejected (see last output column).
Examples 9.3, 9.4, and 9.5 are two-level path analyses. They have Monte Carlo counterparts. These come with the Mplus CD and are also available on the website. This type of Monte Carlo analysis has been available for several years.
Boliang Guo posted on Thursday, January 11, 2007 - 10:29 am
Boliang Guo posted on Thursday, January 11, 2007 - 12:10 pm
hello again Prof Muthen, I add model constraint to a 2 level path analysis MC model for mediation computation, I compute the estimates from Kennys data in step 1, then in step 2 for MC estimate, when I add same constraint to both model population and model, Mplus say, 'A parameter label has been redeclared in MODEL CONSTRAINT.' if I remove constraint command from Model population or from model,say, leave only one constraint command in model or model population, then, mplus work and the result is same whereever I put model constraint in model or model population. my question is: in MC analysis, 'model constraint' command is only for Model not for model population, right? thanks.
I am doing a Multilevel SEM Monte Carlo study to test the accuracy of estimation with small group sample sizes. I have a 2-level factor model with 4 indicators (measured at the individual level). Both at the within and between levels, two independent variables have an effect on the latent factor.
Up till now, the indicators (at the individual level) were assumed to follow normal distributions. However, I want to introduce nonnormality into the data generation model by manipulating kurtosis and skewness of the indicators.
In the papers on this site if found that you suggest using a mixture of subpopulations for which the latent factor follows a normal distribution with different parameters. Am I getting this more or less right?
If so, there are a couple of things I do not see clearly yet... 1. Can this mixture approach also be applied in the case of multilevel sem? 2. How do you choose the proportions and means/variances of the subpopulations to assure that certain skewness and kurtosis values are obtained? 3. Is it possible for indicators that load on the same latent variable to have divergent skewness and curtosis values?
1. Yes. You might want to start with the Monte Carlo counterpart of Example 10.2. You can generate the data as two classes and analyze it as one class. 2. Unfortunately, this is trial and error. 3. Yes.
Hopefully a final question wrt a Monte Carlo study for a two-level SEM model with small group sample sizes.
I want to report population within and between covariance matrices. Elsewhere on this discussion board, I found that for one-level Monte Carlo studies, population covariance matrices can be obtained by running a separate analysis with all parameters constrained to the population values, and the identity matrix as input file ('type is covariance').
However, to me it seems that this 'trick' cannot be used in the case of two-level models, because these models do not allow a covariance matrix as input (is this right?).
Instead, I thought of using one of the datasets that were generated during the monte carlo as input file (and of course also constraining all parameters to the population values). Is this a correct way to obtain population covariance matrices?
On page 604 of Muthen and Muthen (2002) you state that the R^2 value for the intercept growth factor is 0.20. Could you clarify how you arrived at this value. I've read through it and seem to be missing something. Thank you!
Thank you Linda. Could you also tell me how you arrived at an effect size of 0.63 for a regression slope coefficient of 0.2? My calculations are not no coming out. Also, why are you interested in both an effect and R^2 for the intercept?
We look at R-square for both the intercept and slope growth factors. We look at effect size for the regression coefficient in the regression of the slope growth factor on the covariate because that is the parameter for which power will be assessed. Effect size is used to gauge the size of the effect. The effect size is the difference in the slope means for the two values of the covariate divided by the standard deviation of the slope growth factor:
This is not a multilevel question but it seemed to fit with Thomas Schmitte's post so I'm placing it here.
I would like to run a monte carlo LGM for power estimation with missing data similar to examples 3 and 5 in Muthen and Muthen (2002). However, the estimates for missing data within these examples (12% at T1, 18% at T2, 27% at T3, and 50% at T4) is too high for my purposes. Here is the input for missingness from those examples:
I assume that the -2, - 1.5, etc. are threshold values for degree of missingness. How can I vary these? Our pilot work suggests retention and data completion rates of .80-.90 across 5 time waves. I don't want to run these assuming no missing data either.
says that the intercept is -1 in the logistic regression for the probability of missing at y3. If we have no covariates in MODEL MISSING, or if we consider the covariate(s) at the value zero, this -1 intercept value translates to 0.27 of having missing y3. A lower logit intercept (say -2) gives a lower probability. Use the formula
L = log(p/(1-p))
to go from the probability p to the logit L. So if you want p=.10 you get L=-2.2.
Now, if you want x to influence missingness, which you show, then the missingness percentage goes up for increasing slope on x. You have to experiment to get the values you want. The easiest approach is trial and error using one replication and a large sample size (say 10,000). You generate one rep and save the data, then run the data as real data and request "patterns" in the Output command to get the information about the missingness percentages.
I have a question regarding a mulilevel monte carlo simulation for comparing different computation approaches to multilevel data sets.
I have simulated data for a multilevel model with one level 1 variable (X), a level two variable (Z), and a cross level interaction (XZ) affecting the outcome (Y).
Using the external monte carlo analysis tool I want to compare a multilevel model to an aggregated regression approach.
While it is no problem to aggregate the outcome (Y) and the level 1 predictor (X) using the CLUSTER_MEAN option, I can not create an interaction term (X*Z) thereafter, due to the restriction that a variable created using the CLUSTER_MEAN function cannot be used in subsequent DEFINE statements in the same analysis.
Is there any possibility to create an interaction term (X*Z) for an aggregate regression analysis to compare it to the original cross-level interaction?
I can't think of any way to do this except by doing it for each data set separately. You might check on the website to see if Using Mplus via R under How-To might be of help.
Jan Stochl posted on Friday, November 02, 2012 - 8:10 am
Dear Linda and Bengt, I am trying to assess robustness of estimators against ignorance of clustering for multilevel data(i.e., when researchers use non-multilevel CFA for clustered data). For that purposes I have simulated datasets with different levels of ICCs (I simulated 100 datasets for each of 11 levels of between-level variability) Beside that, I am trying to show that there is no parameter bias (regardless of ICC level) when clustered data are correctly analyzed using multilevel CFA. In the case when clustered data are correctly analyzed using hierarchical approach, everything works as expected except for the data that were simulated to have ICCs close to zero. In that case I observe unexpectedly large variability within each model parameter and deflated chi-square. I use WLSMV estimator. I hypothesize that this might be consequence of how MPlus handles negative estimates of ICC which may occur when the between-level variability is very small. I cannot find anywhere what MPlus does in this case? Or do you have any other explanation?
When ICC is close to 0 that means you are dealing with nearly non-existent between level random effects and singular between level matrix. In such situations in practical setting you would want to abandon these random effects from the model. Look in tech 9 for error messages and warning in the simulation. Possible attempts to improve the estimation would include increasing the number of integration points decreasing the convergence criteria, redefining the model as a more restricted model with fewer random effects.
In principle negative ICC estimates can not occur with the WLSMV estimator since it uses the EM-algorithm, but ICC near zero means random effect variances near zero which can lead to estimation difficulties or non-convergence.
Jan Stochl posted on Wednesday, November 07, 2012 - 11:48 am
Thanks a lot Tihomir, it helped a lot!
Joop Hox posted on Tuesday, January 08, 2013 - 4:37 am
I am trying a MonteCarlo simulation with an intervention that exists only at the between level. I want the intervention to be a binary variable, so I want to use CUTPOINTS to create that. Mplus keeps telling me that CUTPOINTS does not work for dependent variables, but in fact my intervention is an independent variable. Why doesn't this work?
Here is my simulation setup:
TITLE: Mediation model 2-->1-->1 MONTECARLO: NAMES ARE atti norm behav interv; NOBSERVATIONS = 1000; NCSIZES = 1; CSIZES = 200 (5); NREP = 100; ! SEED = 100656; ! CUTPOINTS = interv(0); !cutpoints does not work? BETWEEN = interv;
MODEL POPULATION: %WITHIN% atti@1; norm@1; atti WITH norm@.2; behav ON atti@.5 norm@.5;
%BETWEEN% atti@1; norm@1; atti WITH norm@.2; behav ON atti@.5 norm@.5; interv@1; atti on interv@.5;
I suspect the you mention the variance of interv in the MODEL command which makes Mplus interpret it as a dependent variable. Remove the variance from the MODEL command.
mpduser1 posted on Friday, April 05, 2013 - 8:04 am
I am trying to perform a power analysis for a simple linear regression model (1 dichotomous / dummy predictor) where I have clustered data, but the clustering is not taken into account in the model (analytic model has been mis-specified).
My objective is to see how much power I am losing due to clustering in the data (i.e., data are not IID).
I am using Mplus 7.0.
Have I set up my models in the non-IID case correctly? I ask because preliminary results indicate that I'm getting more power using clustered, non-IID data, than IID data.
MONTECARLO: NAMES ARE y x; NOBSERVATIONS = 400; NREPS = 1000; SEED = 2013; CUTPOINTS = x(0); NCSIZES = 1; CSIZES = 40(10); WITHIN = x; BETWEEN = ;
The "increased" power you are seeing is a consequence of the standard errors being underestimated when you do not account for clustering. This is not truly increased power, you are using inappropriate/standard errors for you tests
The power estimate is not valid because you have not given any coverage values in the MODEL command. Coverage values are taken from the MODEL command. Population parameter values for data generation are taken from the MODEL POPULATION command. Your MODEL command should be:
MODEL: %WITHIN% y on x*.35; %BETWEEN% y*.20;
mpduser1 posted on Friday, April 05, 2013 - 9:35 am
Dan and Linda, thank you.
If I specify the following, am I the setting up the model appropriately, while also fixing an ICC of .20?
Also, a propos of Dan's comment, do I need to specify COMPLEX to get correct S.E.s and power calculations?
Hi Linda - How can I create an ordinal IV (representing assessment times) in a Monte Carlo analysis? I'm trying to set up a program to estimate power for a multilevel approach to longitudinal analysis, and I can't figure out how to create an ordinal predictor representing time. Thanks! -bac
Thanks, Linda - I tried that with: names are y t x ; CUTPOINTS = x (0); GENERATE = t (4 p) ;
but since time is an x-variable, I get the warning: X-variables should not be specified as categorical in the GENERATE option. The CUTPOINTS option should be used for x-variables instead. The following variable was specified as categorical in the GENERATE option: T
But CUTPOINTS can only be used for binary predictors. I need to generate (say) 5 records per case to represent 5 occasions 1-5 (or preferably, 0-4), to allow predicting y ON t. Thanks! -bac
Is it the case that for a Monte Carlo ML-SEM with a dummy-code at level 2 to reflect treatment status, that if the residual variances for the indicators are fixed at 1 then the factor variances at between and within levels become the proportion of variance at that specific level (e.g., fb2*.70, fw2*.30 with y11-y32*1 would indicate 70% of the total variance is due to between level) or is it the between+within+residual?
I'm interested in estimating sample size need to achieve power in a cross-classified or three-level model with categorical outcomes in Mplus 7.2. Reviewing Mplus's manual and statmodel, it seems like this models would need to be estimated using Bayesian estimation. Is it possible to estimate sample size in this type of model via Mplus's Monte Carlo methods?
Yes. And you find the underlying Monte Carlo runs for all the UG examples on our website.
dlsherry posted on Saturday, January 24, 2015 - 9:57 am
Hello. I am trying to run a monte carlo power analyses for a two level path model. I can get the model to run as a regular analysis but I cannot get the model to run as a monte carlo analyses. I have trying fixed the variances to zero but I keep getting the following error:
*** FATAL ERROR THE POPULATION COVARIANCE MATRIX THAT YOU GAVE AS INPUT IS NOT POSITIVE DEFINITE AS IT SHOULD BE.
Could you provide any syntax for a two level path analysis for monte carlo? I have not been able to find one.
I am trying to estimate power using monte carlo approach for a two-level (time points nested within individuals) regression model, with one within-participants preidctor (time 'x'), and one between-participants predictor (group 'w'). It is the interaction between group and time that I am interested in checking the power for. The outcome is continuous.
I am relatively new to using Mplus for multi-level analyses, so was hoping to run the syntax past you:
TITLE: two level regression MONTECARLO: NAMES ARE y x w; WITHIN = x; BETWEEN = W; NOBSERVATIONS = 5040; NCSIZES = 1; CSIZES = 120 (42); NREP = 100; ANALYSIS: TYPE = TWOLEVEL RANDOM; MODEL POPULATION: %WITHIN% S | y ON x; Y*1 X*1; %BETWEEN% y on w*0.24; S on w*0.23; MODEL: %WITHIN% S | y ON x; Y*1 X*1; %BETWEEN% y on w*0.24; S on w*0.23;
A specific questions:
I presume the regression weights (e.g., y on w*0.2) represent unstandardized values (assume I am correct and these are regression weights).
Many thanks. I have adapted the model slightly to include the mean and variance for w (its a binary variable, so mean = .5, variance = .25). I've noticed that whether I set (or leave out) the variance of Y influences the power I get and also the ICC, but I am unsure how to control this. Ideally I want to set the ICC for Y to a value around .4, how would I achieve this in this model? Syntax below:
TITLE: random intercept two level MONTECARLO: NAMES ARE y x w; WITHIN = x; BETWEEN = W; NOBSERVATIONS = 5040; NCSIZES = 1; CSIZES = 120 (42); NREP = 100; ANALYSIS: TYPE = TWOLEVEL RANDOM; MODEL POPULATION: %WITHIN% y*1; S | y ON x; %BETWEEN% y on w*0.25; S on w*0.25; w*0.25; [w*0.5]; MODEL: %WITHIN% y*1; S | y ON x; %BETWEEN% y on w*0.25; S on w*0.25; w*0.25; [w*0.5];
Sorry, just found a thread on this - so I can manipulate the ICC of Y by asjusting the ration of between and within variance for y - so this moel is reasonable:
TITLE: random intercept two level MONTECARLO: NAMES ARE y x w; WITHIN = x; BETWEEN = W; NOBSERVATIONS = 5040; NCSIZES = 1; CSIZES = 120 (42); NREP = 100; ANALYSIS: TYPE = TWOLEVEL RANDOM; MODEL POPULATION: %WITHIN% S | y ON x; y*.6; %BETWEEN% y on w*0.25; S on w*0.25; w*0.25; [w*0.5]; y*.4; MODEL: %WITHIN% S | y ON x; y*.6; %BETWEEN% y on w*0.25; S on w*0.25; w*0.25; [w*0.5]; y*.4;
Looks ok, except you want to give a variance for x on Within in Model Population (not in Model).
QianLi Xue posted on Wednesday, October 07, 2015 - 11:22 am
Hi, I would like to do a power calculation for paired data (e.g. left and right eyes) and multiple measurements are taken over time for both eyes. The effect size of interest is rate of change in one eye relative to the other eye (i.e., within-cluster effect) as a result of giving one treatment to one eye and placebo to the other). I'm trying to adapt your Example 9.12 (two-level growth model) to this paired study design. Is there a way to simulate the treatment indicator "x" to take value 1 or 0 within each subject, because treatment is only offered to one of the two eyes. In your example 9.12, the x variable can take the same value for both members of the same cluster. Thanks!
Is there a reason not to do this as a wide, single-level growth model for 2 parallel processes, one for each eye? With 2 binary treatment variables each influencing one eye.
QianLi Xue posted on Thursday, October 08, 2015 - 12:28 pm
Thanks so much for your response! I think the use of the parallel model is not able to account for the fact that if one eye gets the treatment, the other must be the control. If we use the parallel model with 2 binary treatment indicators, one for each eye, I guess I will have to constrain the path coefficients of these two treatment indicators to be the same between the two eyes (i.e., parallel processes), right? which is fine, but again, by adopting this approach, we're breaking the pairing of a treatment with a control within the same person. I'm wondering if I can use the two-level growth model, but drop cases in the simulated datasets where the treatment assignment is the same within a person (i.e., treating both eyes, or treating neither).
In which case one views the 2 eyes as statistically equivalent - which is perhaps ok.
Noah Emery posted on Sunday, August 07, 2016 - 5:47 pm
I am attempting to conduct a Monte Carlo simulation for a two-level model with random intercepts and a 1-1-1 mediation. As part of this, I want to generate a data set then analyze it to get standardized effects. However, I am struggling to create an ID variable to use as the cluster for level two. Is there a command that can accomplish this? I have looked around but have been unsuccessful locating one thus far.
Because you have no factor on Between, I assume you want the icc for the Y variables, not for the factor. But you probably still want the Y intercepts of Between to correlate (we know that aggregates typically do).
But to answer the question, try:
%WITHIN% F1@1; F1 BY Y1-Y5*0.75; Y1-Y5*0.4375; ! this makes the within variances = 1 %BETWEEN% Y1-Y5*x;
******* The analysis model: ANALYSIS: TYPE IS COMPLEX;
MODEL: f1 BY y1-y18*.75 y19-y90*0 (*1); f2 BY y1-y18*0 y19-y36*.75 y37-y90*0 (*1); f3 BY y1-y36*0 y37-y54*.75 y55-y90*0 (*1); f4 BY y1-y54*0 y55-y72*.75 y73-y90*0 (*1); f5 BY y1-y72*0 y73-y90*.75 (*1); f1-f5 with f1-f5*.85;
(part 2 of 2) So I'm going for standardized factor loadings of .75. But I'm typically getting standardized loadings of ~.6. Factor correlations are also low.
The below is for the ICC of .1. With an ICC of .01 in the generation (BETWEEN variances of Y at .01, no other changes), the standardized loadings are around .70-.71.
Is analyzing with EFA (vs CFA) part of the problem? This is a large model, so I'm only running 100 samples, ~90 of which converge, but the results are consistent across all loadings on all factors, so it seems to be "real."
I appreciate your help. *******
ESTIMATES S. E. % Sig Average Std. Dev. Average Coeff
F1 BY Y1 0.607 0.140 0.190 0.809 Y2 0.611 0.117 0.185 0.831 Y3 0.608 0.134 0.195 0.787 Y4 0.591 0.152 0.187 0.798 Y5 0.605 0.161 0.165 0.787 etc.
I think it has to do with using EFA. The default Geomin rotation criterion for a simple structure may not be compatible with your pattern of loadings. It sounds like Geomin gives you higher loadings in the zero spots and lower factor correlations; the non-zero loadings give you higher observed variable variances and therefore lower standardized loadings.
We mention this issue briefly in the FAQ
Lambda is not compatible with the notion of simplicity of the rotation criterion
Also, note that the variance of the outcomes in your step 2 Type=complex analysis is the total variance, so also including the 0.11. This would make your standardized loadings smaller than the true within-level loadings.
Jas Wer posted on Sunday, October 09, 2016 - 8:45 pm
I have a family sample, with two children per family. Using this data I've conducted analyses to see if differences between siblings in X predict differences in Y. Because the number of sibling pairs who differ in their Y is quite small for some Ys , I would like to estimate the power to detect a within-family effect on Y. I'm new to multilevel and Monte Carlo analyses and trying to write an Mplus syntax to estimate power, however the User's Guide examples seem almost too complex for this application. Could you point me to some answers for these questions:
1) My X is measured per child, so could be modeled as a between or within predictor. Since I'm only interested in within-effects, is it correct to specify within=X in the MONTE CARLO subcommand and %within% Y on X; in the MODEL subcommand, simply leaving out any "between" commands/models?
2) In the MODEL POPULATION section, I'm unsure which model parameters to use for X. I seem to need to specify "%within%"before the model parameters, so I think I can't just use the mean and variance across the whole sample. Which parameters do I use here?
3) Some of my outcomes are binary, others continuous. The binary outcomes are not always 50%/50% but also 90%/10% which I understand influences power. Is there a way to specify this, other than generate=Y(1); categorical=Y; ?
Why not take the simpler approach of a multivariate model instead of a multilevel model? That is, you have 2 Xs and 2 Ys, one for each child.
You can use 3 sources for guidance. First, each UG example has a Monte Carlo version on our website. Second, you can study the article on our website:
Muthťn, L.K. & Muthťn, B.O. (2002). How to use a Monte Carlo study to decide on sample size and determine power. Structural Equation Modeling, 4, 599-620. Mplus inputs and outputs used in this paper can be viewed and/or downloaded from the Examples page. download paper contact first author show abstract
Regarding your question 3), you can specify [y$1*c] where c is the value of the threshold that gives you the percentage you want.
Ryan Snead posted on Monday, August 21, 2017 - 10:38 am
I am running a post hoc power analysis on a path model using complex survey data. When I run the path model for my analysis I use TYPE = COMPLEX. My Monte Carlo syntax specifies the correct NCSIZES/ CSIZES, but gives me an error for TYPE = COMPLEX and says to use TWOLEVEL. Using the latter type requires me to insert between/within syntax that I don't think is relevant for my model. Is there a different method you would recommend that still allows me to account for clustering in the data?
Thanks for your help in advance! I attended the Wednesday workshop at Hopkins last week. It was very helpful!
Clustered data are generated using TWOLEVEL even though you want to analyze the model using COMPLEX. Examples 12.6 shows this two-step procedure for a growth model. You would do it for a path model.
Ryan Snead posted on Tuesday, August 22, 2017 - 7:31 am
Thank your for your quick response and helpful recommendation. I have two follow up questions. 1) Our sample is multilevel because the design links mothers to providers. Is it necessary to specify between/within slope and intercept like in the 12.6 example? Is what I have come up with satisfactory given only one level 2 variable (x1)? 2) If I wanted to view the results given a larger sample, say 100 to 400, would I need to proportionally increase my CSIZES in this first step?
Yes, you need to proportionally increase CSIZES. Note that you have very small cluster sizes of 1, 2, 3, and 4 subjects. And you are estimating about 10 parameters within cluster - that is, the small cluster sizes can lead to poor estimates.
Regarding the model, I note that y2 is not on the Within list while at the same time it isn't mentioned on Between.
Ryan Snead posted on Thursday, August 24, 2017 - 6:45 am
Hello, I used Monte Carlo simulation to determine the power of multiple linear regression with 1 continuous dependent variable, 3 continuous independent variables and their interaction (4 interaction terms) (sample N=100). Since I did not find a syntax to generate interaction terms with observed variables, I generated them assuming that the 3 independent variables are latent (3 indicators each). Then, I saved the samples generated and analyzed the data treating the latent factors and their interactions as observed variables (using the DEFINE syntax). To minimize the measurement error when the latent factors are treated as observed, each factor loading is 0.95 (residual variance 0.09). Do you see any problem with this approach? Thanks!
MONTECARLO: NAMES ARE Y X1-X9; NOBSERVATIONS = 100; NREPS = 100; MODEL POPULATION: f1 by x1-x3*0.95; f2 by x4-x6*0.95; f3 by x7-x9*0.95; x1-x9*0.09; f1 f2 with f3*.40; f1 with f2*.40; f1-f3@1; int1 | f1 xwith f2; int2 | f3 xwith f2; int3 | f1 xwith f3; int4 | int1 xwith f3; Y on f1*0.20 f2*0.20 f3*0.20 int1*0.20 int2*0.20 int3*0.20 int4*0.20; Y*.72;
A simpler way is to do a 2-step Monte Carlo run. Step 1: generate all needed variables and save the data sets. Step 2: Do an external Monte Carlo run which uses Define to create the variables. See the V8 UG ex 12.16 for how to do a 2-step Monte Carlo.
You find the UG ex 12.6 in the V8 UG that is posted on our website. This shows what step 1 and step means and how they are carried out. In the second step, you would use the Define command to created the interaction. for instance:
inter = x*y;
Kelly S posted on Monday, April 02, 2018 - 4:59 pm
I am using multilevel sem and trying to check for mediation. Mplus suggests using Monte Carlo simulation for my hypothesised model (assuming due to the number of degrees of freedom). However, no modification indices are available for Monte Carlo and multilevel analysis. How can I check my model's goodness of fit? At the moment I only get AIC and BIC, which are not easily interpreted... how can I check RMSEA, SMMR, CFI and TLI? Any ideas?
Perhaps you are referring to Monte Carlo integration when using ML. If fit indices are not given it is because the usual sufficient statistics are not relevant in your case. You can always work with neighboring models that have fewer restrictions (e.g. more paths) and test the significance of those extra parameters.
Iím running a multigroup factor analysis on national data will cluster, weight, and stratum variables. Given a small sample size of our minority group (n=126 for minority, n=400 for majority), we wanted to generate new data using montecarlo simulation that can be used for additional analysis. Ran into a few problems. Please see below
Multigroup factor analysis on original data ran successfully. Sample code used, which ran correctly. Estimates produced.
Iím not lost on how to proceed with generating new data using montecarlo simulation. Based on the book, I entered the following code, but Iím not able to run the analysis. I looked at the file that I created, and it only has 12 data columns, so Iím not sure whats actually going on here. Thank you for your time and I look forward to your response. MONTECARLO: NAMES ARE ID WT STRAT CLUS RACE U1-U13; NOBSERVATIONS = 500 500; NREPS = 500; SEED = 12345; NGROUPS=2; POPULATION = NAME.dat; COVERAGE = NAME.dat; MODEL POPULATION: F1 - F3 BY U1-U13 (*1); [F1-F3@0];
MODEL POPULATION-g2: F1 - F3 BY RU1-U13 (*1);
MODEL: F1 BY U1-U2 (*.35); F2 BY U3-U5 (*.40); F3 BY U6-U10 (*.45);
F1 BY U1-U2 (*.35); F2 BY U3-U5 (*.40); F3 BY U6-U10 (*.45);
Thank you for your response. I tried something different to make it easier for me understand how to go about doing this. Im trying to follow example 12.7 in the book, but ran an ESEM. Given that I'm using national data, strata, weight, and cluster variables are included. The data are also categorical.
The model computed and I saved the estimated using the SAVEDATA: ESTIMATES= command. This model contained 9 variables. f1-f3 by y1-y9 (*1);
However, my Montecarlo simulation following Example 12.7 step two using ESEM is giving me the following error statement: The COVERAGE file contains too much data. Check your data. File: MONTEGRA.dat
I'm trying to calculate the power for a cross-classified model but I think there must be something I'm not understanding about the CSIZES and NOBSERVATIONS options. I've run the model and level2a has 245 clusters across 5 sizes and level2b has 69 clusters across 15 sizes. I also have a total of 396 observations and 1 observation per cell.
I understood from the UG that when specifying CSIZES that the numbers outside the brackets should sum to level2b's clusters and the numbers inside the brackets should sum to level2a's clusters. And that the NCSIZES should describe the number of unique cluster sizes for each level. But I can't seem to get it to where the number of clusters are correct at the same time as the number of observations.
In this iteration the numbers inside the brackets sum to level2a's clusters, and the observations are correct, but the numbers outside the brackets sum to 15 when I thought they should sum to 69. Hopefully someone can spot the mistake I'm making.
When I run the following model and vary the # of clusters I'm finding that power for the dv1 ON fw paths do not change from 1 and the coverage is always above .91. This seems curious and highly unlikely to me especially since I even see this when I lower the number of clusters to 20. In fact, the only thing I see worsen is SRMR-between. Can you explain why this is? Is there something off in my code? Thanks.
MONTECARLO: NAMES ARE y1-y10 exp1 exp2 exp3 dv1 dv2; CSIZES=30(20); NOBS = 600; NREPS = 1000; SEED = 4533; NCSIZES =1; WITHIN ARE exp1 exp2 exp3 y1-y10; ANALYSIS: TYPE IS TWOLEVEL RANDOM;
MODEL POPULATION: %WITHIN% fW by y1*0.5 y2*0.5 y3*0.5 y4*0.5 y5*0.5 y6*0.5 y7*.50 y8*.50 y9*.50 y10*.50; fw@1; fw ON exp1*.20 exp2*.20 exp3*.20; dv1 ON fw*.3; dv2 ON fw*.3; exp1*0.60 exp2*0.60 exp3*0.60; y1-y10*.75; dv1-dv2*.91; %BETWEEN% dv1-dv2@.25;
MODEL: %WITHIN% fW by y1*0.5 y2*0.5 y3*0.5 y4*0.5 y5*0.5 y6*0.5 y7*.50 y8*.50 y9*.50 y10*.50; fw@1; fw ON exp1*.20 exp2*.20 exp3*.20; dv1 ON fw*.3; dv2 ON fw*.3; y1-y10*.75; dv1-dv2*.91; %BETWEEN% dv1-dv2@.25;
Thanks for the response. A few additional questions regarding your response...
1. Shouldn't the small number of obs per cluster results in lower power? 2. I fixed the dv1-dv2 at .25 because I wanted an ICC of .20 for the DVs, but I guess I should have freely estimated it. Is that correct? 3. I see your point about 0 ICC on the indicators. If I want those to also have an ICC of .20, would I take them off the WITHIN line and just list them on BETWEEN with a variance of .25 so the code looks like the following? If that's correct, I'm still getting similar results (very high power with small # of clusters) and I'm not sure why.
MODEL POPULATION: %WITHIN% fW by y1*0.5 y2*0.5 y3*0.5 y4*0.5 y5*0.5 y6*0.5 y7*.50 y8*.50 y9*.50 y10*.50; fw@1; fw ON exp1*.20 exp2*.20 exp3*.20; dv1 ON fw*.3; dv2 ON fw*.3; exp1*0.60 exp2*0.60 exp3*0.60; y1-y10*.75; dv1-dv2*.91; %BETWEEN% dv1-dv2*.25; y1-y10*.25;
MODEL: %WITHIN% fW by y1*0.5 y2*0.5 y3*0.5 y4*0.5 y5*0.5 y6*0.5 y7*.50 y8*.50 y9*.50 y10*.50; fw@1; fw ON exp1*.20 exp2*.20 exp3*.20; dv1 ON fw*.3; dv2 ON fw*.3; y1-y10*.75; dv1-dv2*.91; %BETWEEN% dv1-dv2*.25; y1-y10*.25;
1. 20 per cluster isn't that small when you consider that you have many clusters.
2. Yes, free them.
3. Looks right - you can send your output to Support along with your license number.
Sara Riese posted on Thursday, October 25, 2018 - 6:11 am
Hello, I am running a Monte Carlo simulation to estimate power for a multilevel path analysis with a binary outcome, using ESTIMATOR=MLR, LINK=LOGIT, and TYPE=TWOLEVEL.
I want to be sure I am entering the hypothesized associations correctly at each level. At the within-level, I am entering the hypothesized log odds of the outcome, while at the between-level it is the linear regression coefficient. Is that correct?
Sara Riese posted on Friday, October 26, 2018 - 10:40 am
Thank you! Now when I run the MC simulation, most of the population parameters are 0, not the value from the model population statement. Below is my code (i removed the mean/variance values for size), do you see what could be causing this?
If I have three between-subjects covariates (Categorical) that I wish to use in the model: race (with 4 levels), gender (with 3 levels), proactive personality (with 5 levels). How do I specify this with? Should I be using the ngroups or NCSIZES commands? I would like all those between-subjects variables present in the model. I would like all the groups to be balanced with the same number of participants. Thank you.
Sorry for the beginner questions but: how, then, do I account for the fact that there is only one ngroups statement but there are several factors (3)? Do I just use the ngroups statement 3 times, once for each covariate? Or, multiple all my levels (5x4x4) and thus have 80 groups (ngroups=80)? And then how do would I code the model population statements after this? Thank a lot for your assistance.
I am trying a montecarlo simulation with a two level model, I have tried to give all the variance I can think of a starting value, I still get the fatal error. Could you give me some suggestions please? Thank you very much.
model population: %within% unhappy@1stress@1; cwb on unhappy*0.6 stress*0.3; s|unhappy on stress; cwb*1; %between% mindful@1; s on mindful*0.3; [s*0.3]; s*0.1; unhappy on mindful*0.2; unhappy*0.2; unhappy with s*0.2;
model: %within% cwb on unhappy(b) stress; s|unhappy on stress; %between% s on mindful(a1); [s](a0); unhappy on mindful; unhappy with s;
*** FATAL ERROR THE POPULATION COVARIANCE MATRIX THAT YOU GAVE AS INPUT IS NOT POSITIVE DEFINITE AS IT SHOULD BE.
for the general definition. If you have only variances on the between level then SRMR on the between level would be an approximate test for the covariances=0. If you have covariances as well then it would be a "test/check" whether potential misspecifications on the within level have distorted the within-between split.