Message/Author 

Anonymous posted on Friday, January 28, 2005  6:44 am



Hello, I am trying to use the method described in your article "How To Use A Monte Carlo Study To Decide on Sample Size and Determine Power" dated April 9, 2002 to estimate the power for a 2-factor CFA. I understand the examples for the normally distributed variables, but am having difficulty with the nonnormal examples. Specifically, I am not sure why the residual variance for the 2nd factor (which contains the nonnormal data) is changed from .36 (as it was in the example for normally distributed data) to 9. I'm also unsure why the factor correlation in the MODEL MONTECARLO statement is .95 and in the MODEL statement is .20. Shouldn't these be the same? I'd appreciate any help. Thank you! 


I will have to look into this when I return to the US after February 2. 

bmuthen posted on Friday, February 04, 2005  2:20 pm



See the "second step" on page 603 of the article. The residual variance and factor covariance values in MODEL MONTECARLO are the 2-class values needed to produce the 1-class reliability of 0.64 and the desired 1-class factor correlation. 
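The mixing itself is simple to compute. Here is a sketch of how class proportions and within-class values combine into the one-class (total) residual variance; the function name and the class proportions below are hypothetical illustrations, not the article's actual numbers:

```python
def mixture_variance(p, means, variances):
    """Total variance of a mixture from class proportions p,
    within-class means, and within-class variances."""
    mu_bar = sum(pc * m for pc, m in zip(p, means))
    within = sum(pc * v for pc, v in zip(p, variances))
    between = sum(pc * (m - mu_bar)**2 for pc, m in zip(p, means))
    return within + between

# Hypothetical illustration: a small outlier class with residual
# variance 9 inflates the overall residual variance well above .36,
# so the mixed one-class reliability is no longer .8^2/(.8^2 + .36) = .64.
p = [0.88, 0.12]                       # hypothetical class proportions
total_resid = mixture_variance(p, [0.0, 0.0], [0.36, 9.0])
mixed_rel = 0.8**2 / (0.8**2 + total_resid)
```

This is why the 2-class input values have to be tuned until the mixed, one-class quantities hit their targets.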

Anonymous posted on Monday, February 28, 2005  3:44 pm



Dear Dr. Muthen, I tried to follow "How To Use A Monte Carlo Study To Decide on Sample Size and Determine Power" for a MIMIC model, but could not make it work. I copied exactly the same code from the paper for CFA, and only added "f1-f2 on var*0.3;" as the last line in both MODEL MONTECARLO and MODEL to see what I would get (hoping to test the hypothesis that the structure is invariant with the covariate "var"; I made up the number 0.3), but I got the error message "** FATAL ERROR A POPULATION VARIANCE FOR A COVARIATE IS ZERO." May I ask what I should code here? Thank you very much! PS: I had to comment out %overall% as I kept getting an error message with that line. 

montecarlo: 
names are y1-y10 var; 
nobservations = 400; 
nreps = 10000; 
seed = 53487; 
save = test.sav; 
variable: 
names are y1-y10 var; 
categorical are var; 
analysis: 
type = general; 
estimator = ML; 
model montecarlo: 
!%overall% 
f1 by y1-y5*0.8; 
f2 by y6-y10*0.8; 
f1@1 f2@1; 
y1-y10*0.36; 
f1 with f2*0.25; 
f1-f2 on hist*0.3; 
model: 
!%overall% 
f1 by y1-y5*0.8; 
f2 by y6-y10*0.8; 
f1@1 f2@1; 
y1-y10*0.36; 
f1 with f2*0.25; 
f1-f2 on hist*0.3; 
output: tech9; 

bmuthen posted on Monday, February 28, 2005  6:34 pm



Your statement f1-f2 on var*0.3 implies that the variable "var" is an "x variable" in Mplus terms, i.e. a covariate as referred to in the error message. Such a variable must also be given a population variance value in the MODEL MONTECARLO paragraph, e.g. var@1; Leaving this out gives the default of 0 variance; this is chosen so that users become aware of having to make a choice. The statement should not be included in the MODEL paragraph. 


I'm also having problems understanding the example with nonnormally distributed data in your article "How To Use A Monte Carlo Study To Decide on Sample Size and Determine Power". I'm trying to adapt the example to my own CFA model with nonnormally distributed data. The indicator variables of one of my factors have on average a skewness of 2.3 and a kurtosis of 5.7. Let's say I use the same proportion of individuals in the normal and outlier class as in your example. Which mean and variance do I have to use for the outlier class to generate the desired skewness and kurtosis in the population? Is there a formula to calculate these two numbers? 


There is no formula. I believe it is a matter of trial and error. 


Thanks! But how do I check the resulting skewness and kurtosis? 


Generate one replication for 10,000 observations, save the data, and compute skewness and kurtosis using one class and TECH12. 
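As a cross-check on the generated data, the skewness and excess kurtosis of a two-class normal mixture can also be computed analytically from the class proportions, means, and variances, which speeds up the trial-and-error search. A minimal sketch (the function name and trial values are mine, not from the thread):

```python
def mixture_skew_kurt(p, means, variances):
    """Skewness and excess kurtosis of a normal mixture with class
    proportions p, within-class means, and within-class variances."""
    # Raw moments of N(mu, s2): E[X]=mu, E[X^2]=mu^2+s2,
    # E[X^3]=mu^3+3*mu*s2, E[X^4]=mu^4+6*mu^2*s2+3*s2^2.
    M1 = M2 = M3 = M4 = 0.0
    for pc, mu, s2 in zip(p, means, variances):
        M1 += pc * mu
        M2 += pc * (mu**2 + s2)
        M3 += pc * (mu**3 + 3*mu*s2)
        M4 += pc * (mu**4 + 6*mu**2*s2 + 3*s2**2)
    # Convert raw moments to central moments.
    m2 = M2 - M1**2
    m3 = M3 - 3*M1*M2 + 2*M1**3
    m4 = M4 - 4*M1*M3 + 6*M1**2*M2 - 3*M1**4
    return m3 / m2**1.5, m4 / m2**2 - 3.0

# Hypothetical trial: 90% "normal" class, 10% outlier class shifted up.
skew, kurt = mixture_skew_kurt([0.9, 0.1], [0.0, 3.0], [1.0, 1.0])
```

Adjusting the class proportion, outlier mean, and outlier variance and re-evaluating this function is much faster than regenerating data for each trial.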


Thanks! It worked out very well, but now I still have another problem. I tried to simulate a skewness of 3.12 and a kurtosis of 9.91 in one factor, since this is what I find in my raw data. I tried several combinations of means and variances in the outlier class, but only found the right combination by decreasing the proportion of individuals in the outlier class as well. In another factor, I want to simulate a skewness of 1.27 and a kurtosis of 0.9, but with the outlier proportion I set for the first factor, it seems not to be possible. I would have to increase the proportion of individuals in the outlier class to find the right combination of mean and variance for the second factor. How can I escape this dilemma? Another problem is that the correlations between the nonnormally distributed factor and the other factors are decreased quite a lot. Even by setting the factor correlation to 0.95 in the model population, I cannot simulate the original size of the factor correlation that I had in my raw data. Any solutions or thoughts are highly appreciated. Thanks! 


You could try having a separate latent class variable influencing each of your factors to have more flexibility. One point is that you say you saw a certain degree of nonnormality in your raw data. I assume you refer to the factor scores estimated for each person from the posterior distribution. Note that this does not necessarily give the true factor distribution, but could show a skewed distribution merely because there are no items covering one end of the factor dimension. In other words, data generated by a normal factor and such asymmetric items will give a skewed posterior. 


Thanks. Actually, I just estimated the factor skewness and kurtosis by averaging the skewness and kurtosis of the items loading on that factor. Since you mentioned skewness and kurtosis of the factor scores, I looked at that too, and it's quite close to my previous estimates. I followed your advice and introduced a second latent class variable. Unfortunately, whenever I introduce it, I get the error message "unknown class label in model population %C#1%". Can you tell me what I'm doing wrong? 


That would have to be looked at by support@statmodel.com. Send input, output, data, and license number. 


Although I can simulate the same degree of kurtosis and skewness as in my raw data by generating 1 or 2 latent class variables with 2 latent classes and analyzing them as one, the resulting distribution seems to be quite different from my raw data. I generated one replication for 10,000 observations, saved the data, imported it into SPSS, and generated histograms. I compared the resulting histograms with my raw data and found that the distribution is totally different, although skewness and kurtosis are about the same. In my raw data, nonnormality is primarily the result of censoring from below or zero inflation. Some variables even have a piling up on both ends. The piling up on the lower end is quite extreme. All variables have at least 10% zero cases; some variables have up to 70% zero cases. The variables are from visual analog scales with possible values of 0 to 100. I already tried to generate censored dependent variables in my CFA model and then estimate the model as if the variables were normally distributed, but it seems not to be allowed in Mplus. My idea was to estimate the bias of not taking zero inflation into account. Do you see another possibility to do that? Anyway, I guess I have to go back to EFA and CFA taking zero inflation and censoring into account from the beginning. 


The problem is also that some variables have a piling up on both ends, and I read that you can only estimate variables with censoring at one end. How would you treat such variables? Another problem is that my questionnaire has 66 items. It seems to be impossible to do an EFA with so many censored variables, since it needs numerical integration. 


It sounds like you want to generate censored data. See the Monte Carlo counterpart of Example 5.4. Mplus can only have censoring at one end. Numerical integration is influenced by the number of factors, not the number of variables. 


Thank you. I don't think I can use numerical integration, because I have to extract more than 4 factors. Do you think it would be feasible to categorize the variables so I can do my EFA with categorical variables and WLSMV? If so, how many categories would you suggest? I'm also not quite sure whether I can treat my variables as censored from below, because the zero values are probably true zero values. As I said earlier, the variables are visual analogue scales (with values from 0 to 100) from a questionnaire that measures altered states of consciousness. The lower end of every scale is labeled "not more than usual" and the upper end "much more than usual". Subjects are asked to rate certain aspects of altered states of consciousness and how much they changed compared to their usual state of consciousness. I suppose one can only totally negate any change in consciousness, and therefore values below zero wouldn't make any sense. I read that it's possible to model this kind of zero inflation with a two-part model, but so far I have only seen examples for this in the growth modeling literature. Has anybody ever done this in the test construction field? I know that it's possible to generate censored variables with MONTECARLO, but it seems not to be allowed to analyze the same variables as uncensored. 


See Kim and Muthen 2008 on the website. With external Monte Carlo, you can generate censored and analyze continuous. See Example 11.6 for external Monte Carlo. 


I am trying to estimate the power of a simple one-factor congeneric model. The reason behind my analysis is that excessive power due to a large sample size may lead to the rejection of the proposed model. I have one latent variable with 7 observed variables. The indicators are ordinal, hence I would like to use the WLSM estimator. Considering that my analysis is post hoc, since I already have data, how do I estimate the power of such a model? Thx in advance, Andrea 


The power estimation of Mplus' Monte Carlo facility concerns a single parameter, but it sounds like you are instead concerned with overall model fit measures such as chi-square. In those cases, I prefer a sensitivity analysis where you start with your initial model and use MODINDICES to relax the restrictions of the model until chi-square is acceptable. Then consider if this new model has changed the parameters of the initial model in substantively important ways. If not, then the test is overpowered in this sense. 


Thx Bengt. Is there any way to estimate the power using such a method, or do I need to run Mplus' Monte Carlo facility for each parameter? 


There is a power paper related to model fit: MacCallum, R. C., Browne, M. W., & Sugawara, H. M. (1996). Power analysis and determination of sample size for covariance structure modeling. Psychological Methods, 1, 130-149. 


Thx again. I had a look at the paper, and in order to undertake such a procedure one needs to run a few SAS commands, and SAS is not my forte. I then wonder if I could use Mplus' Monte Carlo facility. Given that I already have the data, do I need to estimate the implied population covariance matrix using the WLSM estimator first? And what then? I have read your paper Muthen and Muthen (2002, published in Structural Equation Modeling), but only synthetic data is used in that paper. 


The M & M (2002) paper is about power for a single parameter, using the Mplus Monte Carlo facility. It sounded like you were instead interested in the power of the test of overall model fit, which is the MacCallum et al. topic. If the MacCallum approach can be done in SAS, it should be possible to translate it into Mplus. I can't recall the procedures in that paper, so I can't say offhand how it would be done. It is easy in Mplus to produce the implied population covariance matrix by fixing every parameter and asking for RESIDUAL output. 
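For what it's worth, the core of the MacCallum et al. calculation is small enough to sketch directly. Assuming SciPy is available, a hedged translation of their noncentral chi-square power computation for the test of close fit (the function name and default values are mine) might look like:

```python
from scipy.stats import ncx2

def rmsea_power(n, df, eps0=0.05, eps1=0.08, alpha=0.05):
    """Power of the RMSEA-based test of close fit (MacCallum, Browne &
    Sugawara, 1996): noncentrality lambda = (N - 1) * df * eps^2."""
    ncp0 = (n - 1) * df * eps0**2          # noncentrality under H0: eps = eps0
    ncp1 = (n - 1) * df * eps1**2          # noncentrality under H1: eps = eps1
    crit = ncx2.ppf(1 - alpha, df, ncp0)   # critical value of the test
    return 1.0 - ncx2.cdf(crit, df, ncp1)  # P(reject | eps = eps1)
```

Power grows with N and df. Note that this rests on normal-theory ML chi-square assumptions, which is exactly the limitation raised later in this thread for WLSM/WLSMV.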


Many thanks Bengt. It would be great if the next release of Mplus could embed MacCallum et al.'s power estimation for both retrospective and prospective models. I think it would add a lot of value to the product. How's that for a marketer's speech? 


We can certainly add it to our list. 


By the way, after some consultation on SEMNET it would appear that the method used by MacCallum et al. (1996) to compute power can only be applied to normal theory with continuous observed variables and a covariance matrix as input. The procedure involves the noncentral chi-square distribution (which is linked to RMSEA's distribution for the test of not-close fit). But with ordinal variables, polychoric correlations, and estimators such as WLSM, WLSMV, or even WLS, I wonder whether this method will yield the correct results. I was not able to find any literature in that regard. My gut feeling is that such literature is yet to be written. Could you please suggest any alternative method to estimate model power using estimators such as WLSM or WLSMV? Thanks in advance. 


This sounds like a methods research topic. We are not aware of any such methods. 

Lois Downey posted on Tuesday, March 24, 2009  8:17 pm



I want to use the Monte Carlo facility to assess power for several sample sizes for analyzing a 16-indicator, 4-factor CFA model, in which the indicators are generated as 10-category ordinal variables but analyzed as continuous, using the MLR estimator. I've included the following two options in the MONTECARLO command: GENERATE = y1-y16 (9 p); CATEGORICAL = y1-y16; It was my understanding that y1-y16 would initially be generated as normally distributed with mean=0 and variance=1, and that in order to recode them into 10 categories, I should specify the 9 thresholds as part of the MODEL POPULATION command. Therefore, I entered 9 statements similar to the following to define the 9 thresholds: [y1$1 y2$1 y3$1 y4$1 y5$1 y6$1 y7$1 y8$1 y9$1 y10$1 y11$1 y12$1 y13$1 y14$1 y15$1 y16$1](-1.75); This produces error messages like the following for each threshold: *** ERROR in MODEL POPULATION command True value must be specified for [ Y1$1 ]. Where have I gone wrong? And a second question: the choice of -1.75 for the first threshold was based on a desire to have approximately 4% of the sample falling in the lowest category. Is this the correct method for determining the appropriate threshold for each category, given a goal of producing negatively skewed samples with a specific shape? Thanks. 


You need to give a population value for each threshold. 
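On the second question: on a standard normal latent response variable, a threshold is just the normal quantile of the desired cumulative proportion. A small sketch assuming SciPy is available (the cumulative proportions below are hypothetical, apart from the 4% target from the post):

```python
from scipy.stats import norm

# First threshold: put about 4% of cases in the lowest category.
first_threshold = norm.ppf(0.04)        # roughly -1.75

# For a 10-category item, 9 cumulative proportions give the 9 thresholds
# (hypothetical targets, chosen to produce a negatively skewed item):
cum_props = [0.04, 0.08, 0.12, 0.16, 0.31, 0.42, 0.53, 0.65, 0.80]
thresholds = [norm.ppf(p) for p in cum_props]
```

The thresholds come out in ascending order automatically, since the cumulative proportions do.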

Lois Downey posted on Wednesday, March 25, 2009  11:10 am



The full set of thresholds I specified is as follows: 

[y1$1 y2$1 y3$1 y4$1 y5$1 y6$1 y7$1 y8$1 y9$1 y10$1 y11$1 y12$1 y13$1 y14$1 y15$1 y16$1](-1.75); 
[y1$2 y2$2 y3$2 y4$2 y5$2 y6$2 y7$2 y8$2 y9$2 y10$2 y11$2 y12$2 y13$2 y14$2 y15$2 y16$2](-1.41); 
[y1$3 y2$3 y3$3 y4$3 y5$3 y6$3 y7$3 y8$3 y9$3 y10$3 y11$3 y12$3 y13$3 y14$3 y15$3 y16$3](-1.17); 
[y1$4 y2$4 y3$4 y4$4 y5$4 y6$4 y7$4 y8$4 y9$4 y10$4 y11$4 y12$4 y13$4 y14$4 y15$4 y16$4](-1.00); 
[y1$5 y2$5 y3$5 y4$5 y5$5 y6$5 y7$5 y8$5 y9$5 y10$5 y11$5 y12$5 y13$5 y14$5 y15$5 y16$5](-0.50); 
[y1$6 y2$6 y3$6 y4$6 y5$6 y6$6 y7$6 y8$6 y9$6 y10$6 y11$6 y12$6 y13$6 y14$6 y15$6 y16$6](-0.20); 
[y1$7 y2$7 y3$7 y4$7 y5$7 y6$7 y7$7 y8$7 y9$7 y10$7 y11$7 y12$7 y13$7 y14$7 y15$7 y16$7](0.08); 
[y1$8 y2$8 y3$8 y4$8 y5$8 y6$8 y7$8 y8$8 y9$8 y10$8 y11$8 y12$8 y13$8 y14$8 y15$8 y16$8](0.39); 
[y1$9 y2$9 y3$9 y4$9 y5$9 y6$9 y7$9 y8$9 y9$9 y10$9 y11$9 y12$9 y13$9 y14$9 y15$9 y16$9](0.84); 

Are you saying that I cannot indicate, say, the 8th threshold for all 16 indicators in one statement, but need to specify each threshold for EACH indicator in a separate statement? 


The assignment is to the last entry unless the variables are part of a list, for example, y1$1-y16$1; 

ywang posted on Tuesday, April 20, 2010  12:50 pm



Hello, Dr. Muthens, I have two questions about Monte Carlo simulation studies. In Example 11.1, a Monte Carlo simulation study for a CFA with covariates with continuous factor indicators and patterns of missing data (user's guide), the variances for y1-y5 are listed in MODEL POPULATION (y1-y4*.5), but the means for y1-y4 are not specified. Does this mean that all of the means for y1-y5 are 0? The second question is about @ and *. Can you verify whether @ and * are exchangeable in a Monte Carlo simulation study? If so, are they exchangeable only in the MODEL POPULATION part, or in both the MODEL POPULATION and MODEL parts? Thanks a lot! 


Any parameter not specified has a population parameter value of zero. The @ and * symbols are interchangeable in MODEL POPULATION but not in MODEL. 

ywang posted on Tuesday, April 20, 2010  8:52 pm



Thank you very much for the reply. I have further questions about the Monte Carlo study. For a Monte Carlo simulation for path analysis with continuous dependent variables (Example 3.11), the following is the input: 

MODEL POPULATION: 
[x1-x3@0]; 
x1-x3@1; 
y1 ON x1*1 x2*2 x3*3; 
y2 ON x1*3 x2*2 x3*1; 
y3 ON y1*.5 y2*.75 x2*1; 
[y1*1 y2*0 y3*1]; 
y1*1 y2*1.5 y3*2; 

I cannot figure out why the variance of y1 is 17, the variance of y2 is 17, and the variance of y3 is 32 in the generated dataset. Why aren't they 1, 1.5, or 2 as specified in the MODEL POPULATION command (y1*1, y2*1.5, y3*2)? Am I missing anything? Thanks again! 


The population values given in MODEL POPULATION are residual variances, not variances. It is a conditional model. 
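To make the distinction concrete, the implied total variances can be computed from the population values quoted above. A sketch with NumPy, assuming uncorrelated unit-variance x's as the input specifies (no WITH statements among them):

```python
import numpy as np

# With y = B*y + G*x + zeta, covariate covariance Phi, and residual
# (co)variances Psi, the implied covariance matrix of y is
#   (I - B)^{-1} (G Phi G' + Psi) (I - B)^{-T}.
B = np.array([[0.0, 0.0, 0.0],
              [0.0, 0.0, 0.0],
              [0.5, 0.75, 0.0]])      # y3 ON y1*.5 y2*.75
G = np.array([[1.0, 2.0, 3.0],
              [3.0, 2.0, 1.0],
              [0.0, 1.0, 0.0]])       # regressions on x1-x3
Psi = np.diag([1.0, 1.5, 2.0])        # residual variances from MODEL POPULATION
Phi = np.eye(3)                       # x1-x3@1, uncorrelated

A = np.linalg.inv(np.eye(3) - B)
V = A @ (G @ Phi @ G.T + Psi) @ A.T
# diag(V) -> 15, 15.5, 27.97 under these assumptions: the implied total
# variances, well above the residual variances 1, 1.5, and 2.
```

The gap between the diagonal of V and Psi is exactly the explained variance from the covariates and upstream y's, which is why the generated data show much larger variances than the residual values entered in MODEL POPULATION.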
