Anonymous posted on Friday, January 28, 2005 - 6:44 am
Hello, I am trying to use the method described in your article "How To Use A Monte Carlo Study To Decide on Sample Size and Determine Power," dated April 9, 2002, to estimate the power for a 2-factor CFA. I understand the examples for the normally distributed variables, but am having difficulty with the non-normal examples. Specifically, I am not sure why the residual variance for the 2nd factor (which contains the non-normal data) is changed from .36 (as it was in the example for normally distributed data) to 9. I'm also unsure why the factor correlation is .95 in the MODEL MONTECARLO statement but .20 in the MODEL statement. Shouldn't these be the same? I'd appreciate any help. Thank you!
I will have to look into this after I return to the US -- after February 2.
bmuthen posted on Friday, February 04, 2005 - 2:20 pm
See the "second step" on page 603 of the article. The residual variance and factor covariance values in MODEL MONTECARLO are the 2-class values needed to obtain the 1-class reliability of 0.64 and the desired 1-class factor correlation.
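In case the arithmetic behind that step is helpful: when two latent classes are mixed, the aggregate (1-class) item reliability follows from the mixture variances. Here is a sketch in Python; the class proportions and outlier values below are made-up illustrations, not the article's exact numbers.

```python
# Sketch: how 2-class population values aggregate to a 1-class reliability.
# For an item y = lam*f + e, with class proportions p[k], factor means
# alpha[k], factor variances psi[k], and residual variances theta[k]
# (intercepts equal across classes), the 1-class reliability is
#   lam^2 * psi_bar / (lam^2 * psi_bar + theta_bar),
# where psi_bar and theta_bar are the mixture factor and residual variances.

def one_class_reliability(lam, p, alpha, psi, theta):
    """Aggregate reliability implied by a latent class mixture."""
    mean_f = sum(pk * ak for pk, ak in zip(p, alpha))
    # mixture factor variance: within-class part plus between-class part
    psi_bar = sum(pk * (vk + ak ** 2) for pk, vk, ak in zip(p, psi, alpha)) - mean_f ** 2
    theta_bar = sum(pk * tk for pk, tk in zip(p, theta))  # residual means are 0
    true_var = lam ** 2 * psi_bar
    return true_var / (true_var + theta_bar)

# One class with loading .8, factor variance 1, residual variance .36
# gives the familiar reliability .64 / (.64 + .36) = .64:
print(one_class_reliability(0.8, [1.0], [0.0], [1.0], [0.36]))

# A hypothetical 10% outlier class with a large residual variance (9)
# pulls the aggregate reliability well below .64:
print(one_class_reliability(0.8, [0.9, 0.1], [0.0, 1.0], [1.0, 1.0], [0.36, 9.0]))
```

This is why the 2-class input values (e.g., a residual variance of 9 in one class) differ from the 1-class quantities they are chosen to produce.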
Anonymous posted on Monday, February 28, 2005 - 3:44 pm
Dear Dr. Muthen,
I tried to follow "How To Use A Monte Carlo Study To Decide on Sample Size and Determine Power" for a MIMIC model, but could not make it work. I copied exactly the same CFA code from the paper and only added "f1-f2 on var*0.3;" as the last line in both MODEL MONTECARLO and MODEL to see what I would get (hoping to test the hypothesis that the structure is invariant with respect to the covariate "var"; I made up the number 0.3). However, I got the error message "** FATAL ERROR A POPULATION VARIANCE FOR A COVARIATE IS ZERO." May I ask what I should code here? Thank you very much!
PS: I had to comment out %OVERALL%, as I kept getting an error message with that line.
montecarlo:
  names are y1-y10 var;
  nobservations = 400;
  nreps = 10000;
  seed = 53487;
  save = test.sav;
variable:
  names are y1-y10 var;
  categorical are var;
analysis:
  type = general;
  estimator = ML;
model montecarlo:
  !%overall%
  f1 by y1-y5*0.8;
  f2 by y6-y10*0.8;
  f1@1; f2@1;
  y1-y10*0.36;
  f1 with f2*0.25;
  f1-f2 on var*0.3;
model:
  !%overall%
  f1 by y1-y5*0.8;
  f2 by y6-y10*0.8;
  f1@1; f2@1;
  y1-y10*0.36;
  f1 with f2*0.25;
  f1-f2 on var*0.3;
bmuthen posted on Monday, February 28, 2005 - 6:34 pm
f1-f2 on var*0.3
implies that the variable "var" is an "x variable" in Mplus terms, i.e., a covariate as referred to in the error message. Such a variable must also be given a population variance value in the MODEL MONTECARLO paragraph, e.g. "var*1;".
I'm also having problems understanding the example with non-normally distributed data in your article "How To Use A Monte Carlo Study To Decide on Sample Size and Determine Power". I'm trying to adapt the example to my own CFA model with non-normally distributed data.
The indicator variables of one of my factors have on average a skewness of 2.3 and a kurtosis of 5.7. Let's say I use the same proportions of individuals in the normal and outlier classes as in your example. Which mean and variance do I have to choose for the outlier class to generate the desired skewness and kurtosis in the population? Is there a formula to calculate these two numbers?
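There is in fact a closed form for this: the skewness and excess kurtosis of a mixture of normals follow directly from its central moments. A sketch in Python (the function name and the trial outlier values are mine, for illustration only):

```python
# Skewness and excess kurtosis of a mixture of normal components.
# For class weights p[k], means mu[k], variances s2[k], with
# d[k] = mu[k] - grand mean:
#   m2 = sum p[k]*(s2[k] + d[k]^2)
#   m3 = sum p[k]*(3*s2[k]*d[k] + d[k]^3)
#   m4 = sum p[k]*(3*s2[k]^2 + 6*s2[k]*d[k]^2 + d[k]^4)
# skewness = m3 / m2^1.5, excess kurtosis = m4 / m2^2 - 3.

def mixture_skew_kurt(p, mu, s2):
    m = sum(pk * mk for pk, mk in zip(p, mu))
    d = [mk - m for mk in mu]
    m2 = sum(pk * (vk + dk ** 2) for pk, vk, dk in zip(p, s2, d))
    m3 = sum(pk * (3 * vk * dk + dk ** 3) for pk, vk, dk in zip(p, s2, d))
    m4 = sum(pk * (3 * vk ** 2 + 6 * vk * dk ** 2 + dk ** 4)
             for pk, vk, dk in zip(p, s2, d))
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3

# A single normal class is symmetric with zero excess kurtosis:
print(mixture_skew_kurt([1.0], [0.0], [1.0]))  # (0.0, 0.0)

# An 88%/12% mixture with a shifted outlier class (trial values):
print(mixture_skew_kurt([0.88, 0.12], [0.0, 3.0], [1.0, 2.0]))
```

Given a target skewness and kurtosis, one can then grid-search over the outlier-class mean and variance (and, if necessary, its proportion) until the two targets are matched.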
Thanks! It worked out very well, but now I have another problem. I tried to simulate a skewness of 3.12 and a kurtosis of 9.91 in one factor, since this is what I find in my raw data. I tried several combinations of means and variances in the outlier class, but only found the right combination by also decreasing the proportion of individuals in the outlier class. In another factor, I want to simulate a skewness of 1.27 and a kurtosis of 0.9, but with the outlier proportion I set for the first factor, it does not seem to be possible. I would have to increase the proportion of individuals in the outlier class to find the right combination of mean and variance for the second factor. How can I escape this dilemma?
Another problem is that the correlations between the non-normally distributed factor and the other factors are decreased quite a lot. Even by setting the factor correlation to 0.95 in the model population, I cannot reproduce the original size of the factor correlation that I had in my raw data.
Any solutions or thoughts are highly appreciated. Thanks!
You could try having a separate latent class variable influencing each of your factors to have more flexibility.
One point is that you say you saw a certain degree of non-normality in your raw data. I assume you refer to the factor scores estimated for each person from the posterior distribution. Note that this does not necessarily give the true factor distribution, but could show a skewed distribution merely because there are no items covering one end of the factor dimension. In other words, data generated by a normal factor and such asymmetric items will give a skewed posterior.
Thanks. Actually, I had just estimated the factor skewness and kurtosis by averaging the skewness and kurtosis of the items loading on each factor. Since you mentioned skewness and kurtosis of the factor scores, I looked at those too, and they are quite close to my previous estimates.
I followed your advice and introduced a second latent class variable. Unfortunately, whenever I introduce it, I get the error message "unknown class label in model population %C#1%". Can you tell me what I'm doing wrong?
Although I can simulate the same degree of skewness and kurtosis as in my raw data by generating 1 or 2 latent class variables with 2 latent classes each and analyzing them as one, the resulting distribution seems to be quite different from my raw data. I generated one replication with 10,000 observations, saved the data, imported it into SPSS, and generated histograms. Comparing these histograms with my raw data, I found that the distribution is totally different, although the skewness and kurtosis are about the same.
In my raw data, non-normality is primarily the result of censoring from below or zero-inflation. Some variables even have a piling up on both ends. The piling up on the lower end is quite extreme. All variables have at least 10% zero cases, some variables have up to 70% zero cases. The variables are from visual analog scales with possible values of 0 to 100.
I already tried to generate censored dependent variables in my CFA model and then estimate the model as if the variables were normally distributed, but this does not seem to be allowed in Mplus. My idea was to estimate the bias from not taking zero-inflation into account. Do you see another possibility to do that?
Anyway, I guess I have to go back to EFA and CFA taking zero-inflation and censoring into account from the beginning.
A further problem is that some variables have a piling up on both ends, and I have read that you can only estimate variables with censoring at one end. How would you treat such variables? Another difficulty is that my questionnaire has 66 items. It seems impossible to do an EFA with so many censored variables, since it requires numerical integration.
Thank you. I don't think I can use numerical integration, because I have to extract more than 4 factors. Do you think it would be feasible to categorize the variables so I can do my EFA with categorical variables and WLSMV? If so, how many categories would you suggest?
I'm also not quite sure whether I can treat my variables as censored from below, because the zero values are probably true zero values. As I've said earlier, the variables are visual analogue scales (with values from 0 to 100) from a questionnaire that measures altered states of consciousness. The lower end of every scale is labeled "not more than usual" and the upper end "much more than usual". Subjects are asked to rate certain aspects of altered states of consciousness and how much they changed compared to their usual state of consciousness. I suppose one can only totally negate any change in consciousness, and therefore values below zero wouldn't make any sense. I have read that it's possible to model this kind of zero-inflation with a two-part model, but so far I have only seen examples of this in the growth modeling literature. Has anybody ever done this in the test construction field?
I know that it's possible to generate censored variables with MONTECARLO, but it seems it is not allowed to analyze the same variables as uncensored.
I am trying to estimate the power of a simple one factor congeneric model. The reason behind my analysis is that excessive power due to large sample size may lead to the rejection of the proposed model.
I have one latent variable with 7 observed indicators. The indicators are ordinal, hence I would like to use the WLSM estimator. Considering that my analysis is post hoc, since I already have data, how do I estimate the power of such a model?
The power estimation of Mplus' Monte Carlo facility concerns a single parameter, but it sounds like you are instead concerned with overall model fit measures such as chi-square. In those cases, I prefer a sensitivity analysis where you start with your initial model and use Modindices to relax the restrictions of the model until chi-square is acceptable. Then consider if this new model has changed the parameters of the initial model in substantively important ways. If not, then the test is overpowered in this sense.
Thx again. I had a look at the paper, and in order to undertake such a procedure one needs to run a few SAS commands, and SAS is not my forte.
I then wonder if I could use the Mplus' Monte Carlo facility. Given that I already have the data do I need to estimate the implied population covariance matrix using the WLSM estimator first? And what then?
I have read your paper, Muthen and Muthen (2002, published in Structural Equation Modeling), but only synthetic data are used in that paper.
The M & M (2002) paper is about power for a single parameter - so using the Mplus Monte Carlo facility. It sounded like you were instead interested in power of the test of overall model fit which is the MacCallum et al topic. If the MacCallum approach can be done in SAS, it should be possible to translate that into being done in Mplus. I can't recall the procedures in that paper, so I can't say off hand how it would be done. It is easy in Mplus to produce the implied population covariance matrix by fixing every parameter and asking for Residual output.
Many thanks Bengt. It would be great if the next release of Mplus could embed MacCallum et al.'s power estimation for both retrospective and prospective models. I think it would add a lot of value to the product. How's that for a marketer's speech?
By the way, after some consultation on SEMNET, it would appear that the method used by MacCallum et al. (1996) to compute power can only be applied under normal theory, with continuous observed variables and a covariance matrix as input. The procedure involves the noncentral chi-square distribution (which is linked to the RMSEA's distribution for the test of not-close fit). But with ordinal variables, polychoric correlations, and estimators such as WLSM, WLSMV, or even WLS, I wonder whether this method will yield correct results. I was not able to find any literature in that regard. My gut feeling is that such literature is yet to be written. Could you please suggest an alternative method to estimate model power using estimators such as WLSM or WLSMV? Thanks in advance.
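For reference, the normal-theory computation in MacCallum et al. (1996) is itself short; a sketch in Python using SciPy, with their conventional test-of-close-fit values (null RMSEA .05, alternative .08) as placeholder inputs. As the thread notes, this chi-square machinery is exactly what is in question for WLSM/WLSMV, so the sketch only covers the continuous-variable case:

```python
from scipy.stats import ncx2  # noncentral chi-square distribution


def rmsea_power(n, df, eps0=0.05, eps_a=0.08, alpha=0.05):
    """Power of the MacCallum et al. (1996) RMSEA-based test of close fit.

    The noncentrality parameter is (n - 1) * df * epsilon**2; the critical
    value comes from the noncentral chi-square under the null RMSEA, and
    power is the upper-tail probability under the alternative RMSEA.
    """
    nc0 = (n - 1) * df * eps0 ** 2
    nc_a = (n - 1) * df * eps_a ** 2
    crit = ncx2.ppf(1 - alpha, df, nc0)
    return 1 - ncx2.cdf(crit, df, nc_a)


print(rmsea_power(n=200, df=50))
```

Varying n at fixed df reproduces the power curves tabled in that paper.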
This sounds like a methods research topic. We are not aware of any such methods.
Lois Downey posted on Tuesday, March 24, 2009 - 8:17 pm
I want to use the Monte Carlo facility to assess power for several sample sizes for analyzing a 16-indicator 4-factor CFA model, in which the indicators are generated as 10-category ordinal variables, but analyzed as continuous, using the MLR estimator.
I've included the following two options in the MONTECARLO command: GENERATE = y1-y16 (9 p); CATEGORICAL = y1-y16;
It was my understanding that y1-y16 would be initially generated as normally distributed with mean=0 and variance=1, and that in order to recode them into 10 categories, I should specify the 9 thresholds as part of the MODEL POPULATION command. Therefore, I entered 9 statements similar to the following to define the 9 thresholds: [y1$1 y2$1 y3$1 y4$1 y5$1 y6$1 y7$1 y8$1 y9$1 y10$1 y11$1 y12$1 y13$1 y14$1 y15$1 y16$1](-1.75);
This produces error messages like the following for each threshold: *** ERROR in MODEL POPULATION command True value must be specified for [ Y1$1 ].
Where have I gone wrong?
And a second question: The choice of -1.75 for the first threshold was based on a desire to have approximately 4% of the sample falling in the lowest category. Is this the correct method for determining the appropriate threshold for each category, given a goal of producing negatively skewed samples with a specific shape?
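On the second question, the logic checks out: for a standard normal latent response variable, the threshold placing a desired cumulative proportion below it is simply the normal quantile, which is where -1.75 for 4% comes from. A quick check in Python (standard library only; the list of cumulative proportions below is a hypothetical illustration, not a recommendation):

```python
from statistics import NormalDist


def threshold_for_proportion(cum_prop):
    """Threshold on a standard normal latent response variable that puts
    the given cumulative proportion below it."""
    return NormalDist().inv_cdf(cum_prop)


print(round(threshold_for_proportion(0.04), 3))  # -1.751

# For 10 categories, the 9 thresholds come from the 9 cumulative
# proportions implied by the desired category percentages:
cum = [0.04, 0.10, 0.18, 0.28, 0.40, 0.54, 0.69, 0.84, 0.95]  # hypothetical
print([round(threshold_for_proportion(c), 2) for c in cum])
```

Choosing these proportions to pile mass in the upper categories yields the negatively skewed generated samples described above.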
The assignment is to the last entry unless the variables are part of a list, for example, [y1$1-y16$1*-1.75];
ywang posted on Tuesday, April 20, 2010 - 12:50 pm
Hello, Drs. Muthen, I have two questions about Monte Carlo simulation studies. In Example 11.1 of the user's guide (Monte Carlo simulation study for a CFA with covariates with continuous factor indicators and patterns of missing data), the variances for y1-y4 are listed in MODEL POPULATION (y1-y4*.5), but the means for y1-y4 are not specified. Does this mean that the means for y1-y4 are all 0?
The second question is about @ and *. Can you verify whether @ and * are interchangeable in a Monte Carlo simulation study? If so, are they interchangeable in MODEL POPULATION only, or in both MODEL POPULATION and MODEL?
Thank you very much for the reply. I have further questions about the Monte Carlo study. For the Monte Carlo simulation for path analysis with continuous dependent variables (Example 3.11), the following is the input:

MODEL POPULATION:
  [x1-x3@0];
  x1-x3@1;
  y1 ON x1*1 x2*2 x3*3;
  y2 ON x1*3 x2*2 x3*1;
  y3 ON y1*.5 y2*.75 x2*1;
  [y1*-1 y2*0 y3*1];
  y1*1 y2*1.5 y3*2;
I cannot figure out why the variance of y1 is 17, the variance of y2 is 17, and the variance of y3 is 32 in the generated dataset. Why aren't they 1, 1.5, and 2, as specified in the MODEL POPULATION command (y1*1, y2*1.5, y3*2)? Am I missing anything?
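One piece of the answer: the values 1, 1.5, and 2 are residual variances; the total variance of each y also includes the part explained by its predictors. The implied covariance algebra can be sketched in Python/NumPy; the matrices below just transcribe the population values above, assuming uncorrelated unit-variance covariates (any nonzero x covariances in the actual generation would raise the figures further):

```python
import numpy as np

# y = B y + G x + e, so Cov(y) = (I - B)^{-1} (G Phi G' + Psi) (I - B)^{-T}
B = np.array([[0.0, 0.0, 0.0],     # y1 on no other y
              [0.0, 0.0, 0.0],     # y2 on no other y
              [0.5, 0.75, 0.0]])   # y3 ON y1*.5 y2*.75
G = np.array([[1.0, 2.0, 3.0],     # y1 ON x1*1 x2*2 x3*3
              [3.0, 2.0, 1.0],     # y2 ON x1*3 x2*2 x3*1
              [0.0, 1.0, 0.0]])    # y3 ON x2*1
Phi = np.eye(3)                    # x1-x3@1, assumed uncorrelated
Psi = np.diag([1.0, 1.5, 2.0])     # residual variances y1*1 y2*1.5 y3*2

A = np.linalg.inv(np.eye(3) - B)
V = A @ (G @ Phi @ G.T + Psi) @ A.T
print(np.diag(V))  # implied total variances: well above 1, 1.5, 2
```

For example, y1's implied total variance here is 1² + 2² + 3² + 1 = 15, already far from the residual value of 1.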
The UG does not have examples of zero-inflated categorical modeling. I assume you are not referring to counts. A high probability for the zero category is typically handled by ordinal DV modeling, but you can try to add zero-inflation by adding a mixture of two classes.
Xuan Chen posted on Sunday, December 03, 2017 - 12:06 pm
Thanks for your clarification. It is not count data, only a 3-point Likert scale with many zeros. Do you have a UG example of power calculation for categorical data in a CFA? For the real-data power calculation, do I need bootstrapping? And then I can try to add the other features I need.