Power Analysis in CFA
Message/Author
 Anonymous posted on Friday, January 28, 2005 - 6:44 am
Hello, I am trying to use the method described in your aricle "How To Use A Monte Carlo Study To Decide on Sample Size and Determine Power" dated April 9, 2002 to estimate the power for a 2-factor CFA. I understand the examples for the normally distributed variables, but am having difficulty with the non-normal examples. Specifically, I am not sure why the residual variance for the 2nd factor (which contains the non-normal data) is changed from .36 (as it was in the example for normally distributed data) to 9. I'm also unsure why the factor correlation in the MODEL MONTECARLO statement is .95 and in the MODEL statement is .20. Shouldn't these be the same? I'd appreciate any help. Thank you!
 Bengt O. Muthen posted on Friday, January 28, 2005 - 2:45 pm
I will have to look into this after I return to the US -- after February 2.
 bmuthen posted on Friday, February 04, 2005 - 2:20 pm
See the "second step" on page 603 of the article. The modelmontecarlo values residual variance and factor covariance values are the 2-class values needed to get the 1-class reliability of 0.64 and the 1-class factor correlation desired.
 Anonymous posted on Monday, February 28, 2005 - 3:44 pm
Dear Dr. Muthen,

I tried to follow "How To Use A Monte Carlo Study To Decide on Sample Size and Determine Power" for MIMIC model, but could not make it right. I copied exactly the same code from the paper for CFA, and only added "f1-f2 on var*0.3;" as the last line in both :model montecarlo" and "model" and try to see what I can get (hoping to test the hypothesis that the structure is invarient with the covariate "var", I made up the number 0.3), but I got error message "** FATAL ERROR A POPULATION VARIANCE FOR A COVARIATE IS ZERO." May I ask what should I code here? Thanks you very much!

PS: I have to comment %overall% as I kept getting error message with this line.

montecarlo: names are y1-y10 var;
nobservations = 400;
nreps = 10000;
seed = 53487;
save = test.sav;

variable: names are y1-y10 var;
categorical are var;

analysis: type = general;
estimator = ML;

model montecarlo: !%overall%
f1 by y1-y5*0.8;
f2 by y6-y10*0.8;
f1@1 f2@1;
y1-y10*0.36;
f1 with f2*0.25;
f1-f2 on hist*0.3;
model: !%overall%
f1 by y1-y5*0.8;
f2 by y6-y10*0.8;
f1@1 f2@1;
y1-y10*0.36;
f1 with f2*0.25;
f1-f2 on hist*0.3;

output: tech9;
 bmuthen posted on Monday, February 28, 2005 - 6:34 pm

f1-f2 on var*0.3

implies that the variable "var" is an "x variable" in Mplus terms, i.e. a covariate as referred to in the error message. Such a variable must also be given a population variance value in the Model MonteCarlo paragraph, e.g.

var@1;

Leaving this out gives the default of 0 variance - this is chosen so that user's become aware of having to make a choice. The statement should not be included in the Model paragraph.
 Erich Studerus posted on Wednesday, July 02, 2008 - 9:19 am
I'm also having problems to understand the example with non-normally distributed data in your your aricle "How To Use A Monte Carlo Study To Decide on Sample Size and Determine Power". I'm trying to adapt the example to my own cfa-model with non-normally distributed data.

The indicator variables of one of my factors have on average a skewness of 2.3 and and kurtosis of 5.7. Let's say I use the same proportion of indivdiuals in the normal and outlier-class as in your example. Which mean and variance do I have to take for the outlier class to get the desired skewness and kurtosis generated for the population? Is there a formula to calculate these two numbers?
 Linda K. Muthen posted on Wednesday, July 02, 2008 - 6:08 pm
There is no formula. I believe it is a matter of trial and error.
 Erich Studerus posted on Thursday, July 03, 2008 - 1:28 am
Thanks! But how do I check the resulting skewness and kurtosis?
 Linda K. Muthen posted on Thursday, July 03, 2008 - 4:08 pm
Generate one replication for 10,000 observations, save the data, and compute skewness and kurtosis using one class and TECH12.
 Erich Studerus posted on Monday, July 07, 2008 - 2:29 am
Thanks! It worked out very well, but now I have still another problem. I tried to simulate a skewness of 3.12 and a kurtosis of 9.91 in one factor, since this is what I find in my raw data. I tried several combinations of means and variances in the outlier class, but only found the right combination by decreasing the proportion of individuels in the outlier class as well. In another factor, I want to simulate a skewness of 1.27 and a kurtosis of 0.9, but with the oulier proportion I set for the first factor, it seems not to be possible. I would have to increase the proportion of individuals in the oulier class to find the right combination of mean and variance of the second factor in the outlier class. How can I escape this dilemma?

Another problem is, that the correlations between the non-normal distributed factor and other factors is decreased quite a lot. Even by setting the factor correlation to 0.95 in the model population, I can not simulate the original size of the factor correlation, that I had in my raw data.

Any solutions or thoughts are highly appreciated. Thanks!
 Bengt O. Muthen posted on Monday, July 07, 2008 - 9:43 am
You could try having a separate latent class variable influencing each of your factors to have more flexibility.

One point is that you say you saw a certain degree of non-normality in your raw data. I assume you refer to the factor scores estimated for each person from the posterior distribution. Note that this does not necessarily give the true factor distribution, but could show a skewed distribution merely because there are no items covering one end of the factor dimension. In other words, data generated by a normal factor and such asymmetric items will give a skewed posterior.
 Erich Studerus posted on Monday, July 07, 2008 - 1:25 pm
Thanks. Actually, I just estimated the factor skewness and kurtosis by averaging the skewness and kurtosis of the items loading on that factor. Since you mentioned skewness and kurtosis of the factor scores, I looked at that too and it's quite close to my previous estimates.

I followed your advice and introduced a second latent class variable. Unfortunately, whenever I introduce it, I get the error message "unknown class label in model population %C#1%". Can you tell me, what I'm doing wrong?
 Bengt O. Muthen posted on Monday, July 07, 2008 - 3:36 pm
That would have to be looked at by support@statmodel.com. Send input, output, data, and license number.
 Erich Studerus posted on Monday, July 14, 2008 - 3:41 pm
Although I can simulate the same degree of kurtosis and skewness as in my raw data by generating 1 or 2 latent class variables with 2 latent classes and analysing them as one, the resulting distribution seems to be quite different from my raw data. I generated one replication for 10'000 observations, saved the data, imported it into SPSS and generated histograms. I compared the resulting histograms with my raw data and figured out, that the distribution is totally different, although skewness and kurtosis is about the same.

In my raw data, non-normality is primarily the result of censoring from below or zero-inflation. Some variables even have a piling up on both ends. The piling up on the lower end is quite extreme. All variables have at least 10% zero cases, some variables have up to 70% zero cases. The variables are from visual analog scales with possible values of 0 to 100.

I already tried to generate censored dependent variables in my cfa-model and then estimate the model, as if the variables were normally distributed, but It seems not to be allowed in Mplus. My idea was to estimate the bias of not taking zero-inflation into account. Do you see another possibility to do that?

Anyway, I guess I have to go back to EFA and CFA taking zero-inflation and censoring into account from the beginning.
 Erich Studerus posted on Monday, July 14, 2008 - 3:45 pm
The problem is also that some variables have a piling up on both ends and I read that you can only estimate variables with censoring at one end. How would you treat such variables? The problem is also that my questionnaire has 66items. It seems to be impossible to do an EFA with so many censored variables, since it needs numerical integration.
 Linda K. Muthen posted on Tuesday, July 15, 2008 - 11:24 am
It sounds like you want to generate censored data. See the Monte Carlo counterpart of Example 5.4.

Mplus can only have censoring at one end.

Numerical integration is influenced by the number of factors not the number of variables.
 Erich Studerus posted on Tuesday, July 15, 2008 - 3:06 pm
Thank you. I don't think, I can use numerical integration, because I have to extract more than 4 factors. Do you think, it would be feasible to categorize the variables, so I can do my EFA with categorical variables and WLSMV? If so, how many categories would you suggest?

I'm also not quite sure, whether I can treat my variables as censored from below, because the zero values are probably true zero values. As I've said earlier, the variables are visual analogue scales (with values from 0 to 100) from a questionnaire that measures altered states of consciousness. The lower end of every scale is labeled as "not more than usual" and the upper end as "much more than usual". Subjects are asked to rate certain aspects of altered states of consiousness and how much it changed compared to their usual state of consciousness. I suppose, one can only totally negate any change in consciousness and therefore values below zero wouldn't make any sense. I read that it's possible to model this kind of zero-inflation with a 2 part model, but so far, I only saw examples in the growth modeling literature for this. Has anybody ever done this in the test construction field?

I know, that it's possible to generate censored variables with montecarlo, but it seems not to be allowed to analyse the same variables as uncensored.
 Linda K. Muthen posted on Wednesday, July 16, 2008 - 3:11 pm
See Kim and Muthen 2008 on the website.

With external Monte Carlo, you can generate censored and analyze continuous. See Example 11.6 for external Monte Carlo.
 Andrea Vocino posted on Thursday, August 14, 2008 - 6:30 am
I am trying to estimate the power of a simple one factor congeneric model. The reason behind my analysis is that excessive power due to large sample size may lead to the rejection of the proposed model.

I have one latent with 7 observed variables. The indicators are ordinal hence I would like to use the WLSM estimator. Considering that my analysis is post hoc since I already have data, how do I estimate the power of such a model?

Andrea
 Bengt O. Muthen posted on Thursday, August 14, 2008 - 8:33 am
The power estimation of Mplus' Monte Carlo facility concerns a single parameter, but it sounds like you are instead concerned with overall model fit measures such as chi-square. In those cases, I prefer a sensitivity analysis where you start with your initial model and use Modindices to relax the restrictions of the model until chi-square is acceptable. Then consider if this new model has changed the parameters of the initial model in substantively important ways. If not, then the test is overpowered in this sense.
 Andrea Vocino posted on Thursday, August 14, 2008 - 4:46 pm
Thx Bengt. Is there any way to estimate the power using such a method or do I need to run the Mplus' Monte Carlo facility for each parameter?
 Bengt O. Muthen posted on Thursday, August 14, 2008 - 5:16 pm
There is the power paper related to model fit

MacCallum, R. C., Browne, M. W., & Sugawara, H. M. (1996). Power analysis and determination of sample size for covariance structure modeling. Psychological Methods, 1, 130-149.
 Andrea Vocino posted on Friday, August 15, 2008 - 11:09 pm
Thx again. I had a look at the paper and in order to undertake such a procedure one needs to run a few SAS commands and SAS is not my forte.

I then wonder if I could use the Mplus' Monte Carlo facility. Given that I already have the data do I need to estimate the implied population covariance matrix using the WLSM estimator first? And what then?

I have read your paper Muthen and Muthen (2002, pubished in Structural Equation Modeling) but only synthetic data is used in that paper.
 Bengt O. Muthen posted on Saturday, August 16, 2008 - 9:39 am
The M & M (2002) paper is about power for a single parameter - so using the Mplus Monte Carlo facility. It sounded like you were instead interested in power of the test of overall model fit which is the MacCallum et al topic. If the MacCallum approach can be done in SAS, it should be possible to translate that into being done in Mplus. I can't recall the procedures in that paper, so I can't say off hand how it would be done. It is easy in Mplus to produce the implied population covariance matrix by fixing every parameter and asking for Residual output.
 Andrea Vocino posted on Tuesday, August 19, 2008 - 6:48 pm
Many thanks Bengt. It would be great if the next release of Mplus could embed the MacCallum et al's power estimation for both retrospective and prospective models. I think it would add a lot of value to the product. How's that for a marketer's speech ?
 Linda K. Muthen posted on Wednesday, August 20, 2008 - 9:34 am
We can certainly add it to our list.
 Andrea Vocino posted on Sunday, August 24, 2008 - 9:10 pm
By the way, after some consultation on SEMNet it would appear that the method used by MacCallum et al. (1996) to computing power can only be applied to normal theory with continuous observed variables and a covariance matrix as input. Such a procedure involves the non-central chi-square distribution (which is linked to RMSEA's distribution for the test of not-close-fit). But with ordinal variables, polychoric correlations, and estimators such as i.e., WLSM, WLSMV or even WLS I wonder whether sure this method will yield the correct results. I was not able to find any literature in that regard. My gut feeling is that such literature is yet to be written. Could you please suggest any alternative method to estimate model power using estimators such as WLSM or WLSMV? Thanks in advance.
 Linda K. Muthen posted on Monday, August 25, 2008 - 8:23 am
This sounds like a methods research topic. We are not aware of any such methods.
 Lois Downey posted on Tuesday, March 24, 2009 - 8:17 pm
I want to use the Monte Carlo facility to assess power for several sample sizes for analyzing a 16-indicator 4-factor CFA model, in which the indicators are generated as 10-category ordinal variables, but analyzed as continuous, using the MLR estimator.

I've included the following two options in the MONTECARLO command:
GENERATE = y1-y16 (9 p);
CATEGORICAL = y1-y16;

It was my understanding that y1-y16 would be initially generated as normally distributed with mean=0 and variance=1, and that in order to recode them into 10 categories, I should specify the 9 thresholds as part of the MODEL POPULATION command. Therefore, I entered 9 statements similar to the following to define the 9 thresholds:
[y1\$1 y2\$1 y3\$1 y4\$1 y5\$1 y6\$1 y7\$1 y8\$1 y9\$1 y10\$1 y11\$1 y12\$1 y13\$1 y14\$1 y15\$1 y16\$1](-1.75);

This produces error messages like the following for each threshold:
*** ERROR in MODEL POPULATION command
True value must be specified for [ Y1\$1 ].

Where have I gone wrong?

And a second question: The choice of -1.75 for the first threshold was based on a desire to have approximately 4% of the sample falling in the lowest category. Is this the correct method for determining the appropriate threshold for each category, given a goal of producing negatively skewed samples with a specific shape?

Thanks.
 Linda K. Muthen posted on Wednesday, March 25, 2009 - 10:32 am
You need to give a population value for each threshold.
 Lois Downey posted on Wednesday, March 25, 2009 - 11:10 am
The full set of thresholds I specified are as follows:
[y1\$1 y2\$1 y3\$1 y4\$1 y5\$1 y6\$1 y7\$1 y8\$1 y9\$1 y10\$1 y11\$1 y12\$1 y13\$1 y14\$1 y15\$1 y16\$1](-1.75);
[y1\$2 y2\$2 y3\$2 y4\$2 y5\$2 y6\$2 y7\$2 y8\$2 y9\$2 y10\$2 y11\$2 y12\$2 y13\$2 y14\$2 y15\$2 y16\$2](-1.41);
[y1\$3 y2\$3 y3\$3 y4\$3 y5\$3 y6\$3 y7\$3 y8\$3 y9\$3 y10\$3 y11\$3 y12\$3 y13\$3 y14\$3 y15\$3 y16\$3](-1.17);
[y1\$4 y2\$4 y3\$4 y4\$4 y5\$4 y6\$4 y7\$4 y8\$4 y9\$4 y10\$4 y11\$4 y12\$4 y13\$4 y14\$4 y15\$4 y16\$4](-1.00);
[y1\$5 y2\$5 y3\$5 y4\$5 y5\$5 y6\$5 y7\$5 y8\$5 y9\$5 y10\$5 y11\$5 y12\$5 y13\$5 y14\$5 y15\$5 y16\$5](-0.50);
[y1\$6 y2\$6 y3\$6 y4\$6 y5\$6 y6\$6 y7\$6 y8\$6 y9\$6 y10\$6 y11\$6 y12\$6 y13\$6 y14\$6 y15\$6 y16\$6](-0.20);
[y1\$7 y2\$7 y3\$7 y4\$7 y5\$7 y6\$7 y7\$7 y8\$7 y9\$7 y10\$7 y11\$7 y12\$7 y13\$7 y14\$7 y15\$7 y16\$7](0.08);
[y1\$8 y2\$8 y3\$8 y4\$8 y5\$8 y6\$8 y7\$8 y8\$8 y9\$8 y10\$8 y11\$8 y12\$8 y13\$8 y14\$8 y15\$8 y16\$8](0.39);
[y1\$9 y2\$9 y3\$9 y4\$9 y5\$9 y6\$9 y7\$9 y8\$9 y9\$9 y10\$9 y11\$9 y12\$9 y13\$9 y14\$9 y15\$9 y16\$9](0.84);

Are you saying that I cannot indicate, say, the 8th threshold for all 16 indicators in one statement, but need to specify each threshold for EACH indicator in a separate statement?
 Linda K. Muthen posted on Wednesday, March 25, 2009 - 11:19 am
The assignment is to the last entry unless the variables are part of a list, for example,

y1\$1-y16\$1;
 ywang posted on Tuesday, April 20, 2010 - 12:50 pm
Hello, Dr. Muthens, I have two questions about Monte Carlo simulation study. In example 11.1. Monte Carlo Simulation study for a CFA with covariates with continuous factor indicators and patterns of missing data (user's guide), the variance for y1-y5 are listed in Model population (y1-y4*.5), but the means for y1-y4 are not specified. Does this mean that all of the means for y1-y5 are 0?

The second question is about @ and *. Can you verify whether @ and * are exchangeable in Monte Carlo simulation study? If so, are they exchangeable for part of "model population" or both part of "model population" and part of "model"?

Thanks a lot!
 Linda K. Muthen posted on Tuesday, April 20, 2010 - 1:01 pm
Any parameter not specified has a population parameter value of zero.

The @ and * symbols are interchangeable in MODEL POPULATION but not in MODEL.
 ywang posted on Tuesday, April 20, 2010 - 8:52 pm
Thank you very much for the reply. I have further questions for Monte Carlo study. For Monte Carlo simulation for path analysis with continuous dependent variables(example 3.11), the following is the input:
MODEL POPULATION:
[x1-x3@0];
x1-x3@1;
y1 ON x1*1 x2*2 x3*3;
y2 ON x1*3 x2*2 x3*1;
y3 ON y1*.5 y2*.75 x2*1;
[y1*-1 y2*0 y3*1];
y1*1 y2*1.5 y3*2;

I cannot figure why the variance of y1 is 17, the variance of y2 is 17 and y3 is 32 in the generated dataset. Why aren't they 1, 1.5 or 2 as specified in the Model population command (y1*1, y2*1.5, y3*2)? Do I miss anything?

Thanks again!
 Linda K. Muthen posted on Wednesday, April 21, 2010 - 9:56 am
The population values given in MODEL POPULATION are residual variances not variances. It is a conditional model.
 Xuan Chen posted on Saturday, December 02, 2017 - 9:34 pm
Hi Dr Muthens,

I would like to compute the power for my cfa model, which is consisted of 25 zero-inflated categorical respond items. Do you have example in UG that I can follow to do that?

And could the power calculation be applied to MCFA?

Many many thanks

Xuan
 Bengt O. Muthen posted on Sunday, December 03, 2017 - 11:21 am
The UG does not have examples of zero-inflated categorical modeling. I assume you are not referring to counts. A high probability for the zero category is typically handled by ordinal DV modeling, but you can try to add zero-inflation by adding a mixture of two classes.
 Xuan Chen posted on Sunday, December 03, 2017 - 12:06 pm
It is not couting data. Only a three-likert scale with many 0 zeros. Do you have the UG example of power calculation for categorical data in CFA? For the real data power calculation, do I need bootstraping to get it? And then I can try to add more features i need.

Xuan
 Bengt O. Muthen posted on Monday, December 04, 2017 - 3:02 pm
See a UG example with categorical FA. It has a corresponding monte carlo version on our website at

http://www.statmodel.com/ugexcerpts.shtml

Real-data power analysis is still a montecarlo run but with real-data parameter estimates.