Daniel posted on Monday, August 16, 2004 - 8:19 am
Hi, I'm working on a proposal looking at the effects of physical activity on smoking. However, I only have two time points and a relatively small budget, so I need the smallest sample that can still detect an adequate effect. With only two time points, I'm thinking the benefits of repeated measures aren't what they would be with a larger sample. Do you have any suggestions for how I should approach this question? By the way, my outcome will be an ordered categorical variable.
I would have to see the saved data set to know for sure, but I suspect that you are reading it with a free format and have blanks in the data for missing values causing it to be read incorrectly. You would need to send the input and data to email@example.com for me to give a definite answer.
Todd Huschka posted on Tuesday, September 28, 2004 - 11:41 am
Is there a way to perform Power Calculations with Mplus?
On the website, we are given the following SAS code to calculate power after obtaining an estimate of the noncentrality parameter:
DATA POWER;
  DF = 1;
  CRIT = 3.841459;
  LAMBDA = 9.286;
  POWER = 1 - PROBCHI(CRIT, DF, LAMBDA);
RUN;
I don't have access to SAS and am hoping someone can give me some code that will allow me to make this calculation in SYSTAT, SPSS or even Excel. Or, can someone recommend a free-standing executable or a web-based java program I can use for this?
(Of course, I can always ask colleagues who use SAS to run this for me, but I'd prefer to be able to do the calculation myself if possible.)
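For anyone in the same position without SAS, the same calculation is a few lines in Python with SciPy (a sketch using the values from the SAS snippet above; scipy.stats.ncx2 is SciPy's noncentral chi-square distribution):

```python
from scipy.stats import ncx2

# Values from the SAS example on the Mplus web site
df = 1            # degrees of freedom
crit = 3.841459   # chi-square critical value at alpha = .05, 1 df
lam = 9.286       # estimated noncentrality parameter (LAMBDA)

# Power = P(noncentral chi-square > critical value)
power = 1 - ncx2.cdf(crit, df, lam)
print(round(power, 3))  # about 0.862
```

The same one-liner works in Excel as 1-NCHISQ.DIST-style functions or in any tool exposing the noncentral chi-square CDF.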
Now, there is an online non-central chi-square calculator at UCLA at http://calculators.stat.ucla.edu/cdf. Can I just use that to perform the power calculation described in the "How To" on the Mplus web site?
I'm not sure what the X Value parameter is in the calculator web form. I'm trying to replicate the result in the Mplus power calculation example as a check to make sure I can do this correctly. I assume that DF=1, that the Noncentrality Parameter=9.286, and that I should leave Probability as a ? to be solved, but I don't know what the X Value represents. Is it the sample size?
I apologize if this is a really basic question. I couldn't find any help on the calculator, and I'm hoping it will solve my problem.
Thanks for your patience.
bmuthen posted on Sunday, October 30, 2005 - 6:37 pm
I would think X is the chi-square value (the value on the x axis). Have you tried using X = CRIT?
When I set X=3.841459, df=1, and NCP=9.286, the online calculator returns Probability = 0.138, which corresponds to power = 1 - 0.138 = .862.
I assume that the difference between that estimate and the .85 you show in the table in the example is due to a difference in precision of the web-based calculator and the higher precision calculations performed in SAS. Does that sound right?
By the way, would you consider adding a utility/procedure for non-central chi-square calculations into a future version of Mplus? I don't know how much of an effort that would be or how it fits into your vision of what belongs in Mplus, but looking at the code for the online calculator on the UCLA site it doesn't look like it would be a major undertaking.
Thanks again for all your help and for this wonderful program.
bmuthen posted on Monday, October 31, 2005 - 2:59 pm
Yes, rounding matters. The 4 values on our web site should be:
I am currently estimating power/sample size for a study in which the outcomes are longitudinal binary measures. I'm planning on using a growth curve for the anticipated analysis, and I may elect to model the treatment effect in the form of a multi-group analysis. If I simulate a full and restricted model (to test a treatment effect), can the difference between the -2*loglikelihoods be used to approximate the non-centrality parameter used in the Satorra-Saris approach to estimating power?
Satorra-Saris was developed for continuous-normal outcomes where the mean vector and covariance matrix are sufficient statistics, whereas with binary outcomes all moments are needed. I don't know of studies that simulate the non-centrality parameter for this case. If you simulate, it would seem that using the last column in the Mplus Monte Carlo output would be most straightforward - the proportion of replications rejecting the zero value for the key parameter.
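The "proportion of replications rejecting" idea can be illustrated outside Mplus with a deliberately simplified stand-in: instead of a growth model for binary outcomes, the sketch below uses a two-group binary outcome and a two-proportion z-test per replication. The group sizes and true proportions are hypothetical; only the logic (power = fraction of replications with p < .05) carries over.

```python
import numpy as np
from scipy.stats import norm

# Monte Carlo power as the proportion of replications rejecting H0.
# Hypothetical setup: binary outcome, control vs. treatment group.
rng = np.random.default_rng(12345)
n_per_group, p_control, p_treat = 100, 0.30, 0.45
n_reps, alpha = 2000, 0.05

rejections = 0
for _ in range(n_reps):
    y0 = rng.binomial(1, p_control, n_per_group)
    y1 = rng.binomial(1, p_treat, n_per_group)
    # Pooled two-proportion z-test for this replication
    p_pool = (y0.sum() + y1.sum()) / (2 * n_per_group)
    se = np.sqrt(p_pool * (1 - p_pool) * 2 / n_per_group)
    z = (y1.mean() - y0.mean()) / se
    if 2 * (1 - norm.cdf(abs(z))) < alpha:
        rejections += 1

# Analogous to the last column of the Mplus Monte Carlo output
power = rejections / n_reps
print(power)
```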
My sample in a paper has 133 observations, and a reviewer doubts whether I can run structural equation modeling on such a small sample (I have three equations, each of which has about 2-3 endogenous variables and 6-8 exogenous variables – Mplus finishes the computation normally). I remember that Mplus is particularly useful for small-sample SEM, but I think the reviewer wants more technical details – could you please give me suggestions on this issue?
Critical factors include whether the outcomes are continuous or categorical, how skewed they are, how many parameters the model has, how much missing data there is, etc. n=133 may or may not be sufficient depending on these factors. Also, see the web site's Muthen & Muthen (2002) article on using Monte Carlo simulation to determine if n is large enough.
I have a question regarding an error message that I received when running a model: "THE STANDARD ERRORS OF THE MODEL PARAMETER ESTIMATES MAY NOT BE TRUSTWORTHY FOR SOME PARAMETERS DUE TO A NON-POSITIVE DEFINITE FIRST-ORDER DERIVATIVE PRODUCT MATRIX. THIS MAY BE DUE TO THE STARTING VALUES BUT MAY ALSO BE AN INDICATION OF MODEL NONIDENTIFICATION. THE CONDITION NUMBER IS -0.180D-16. PROBLEM INVOLVING PARAMETER 35. THIS IS MOST LIKELY DUE TO HAVING MORE PARAMETERS THAN THE SAMPLE SIZE IN ONE OF THE GROUPS."
I have a sample size of 172 and I am doing a group comparison and one of the groups has n=35. Is this error message regarding my sample size or something else? Thanks!
Hi, I am doing a multiple group SEM with continuous (actually 5-point Likert) indicators. I have a small sample (N=161), with 49 in one group and 112 in the other. I am examining a multiple group model that estimates 80 or so parameters (80 free parms). Everything seems fine - the model terminates normally, the solution is positive definite, etc. Model fit is generally marginal (e.g. CFI = .89, RMSEA = .07). My question is whether there is anything wrong with this. I know this is sometimes called "empirical under-identification", but with no convergence problems (etc.), do I need to worry?
In the case that I do need to worry, would you recommend that I test how many parameters I can (tenably) constrain to equality across groups, so that I can (perhaps) get the number of free parameters under 49 (the n of the smallest group)? Maybe I can also fix the loadings and residuals for my latent variables at values obtained from a related CFA to reduce the number of free parms? (I already know that measurement invariance (loadings) is tenable across groups.)
Last thing - I am using the MLR estimator in this case, which I believe to be the best for a small-sample SEM with continuous indicators. Is this accurate, or should I be using another estimator?
When you say several more parameters "in a group", I assume you mean the number of parms estimated for that group and not overall (e.g. 80 parameters estimated overall, but 40 parms per group)?
Another option I have to increase N is to include more observations that have missing data on 2 of the 3 waves. But this means some of my indicators will have up to 50% missing data at waves 2 and 3. Can FIML handle this? Is there any indication, perhaps a reference, of how much missing data is allowable, or how much FIML (or any other imputation method) can handle?
I am estimating two SEM models similar in all respects except the final outcome (dependent) variable. In Model 1 this variable is continuous; in Model 2 it is categorical (3 categories). By default, Mplus estimates the first model with ML and the second with WLSMV.
I also know that the final outcome categorical variable (in model 2) has four missing cases.
However, when I estimate the models, the difference in the number of observations between Models 1 and 2 is 16 (not four). How is this possible?
alia aishah posted on Wednesday, April 04, 2012 - 11:15 am
When I run my path analysis, this message came up: THE MINIMUM COVARIANCE COVERAGE WAS NOT FULFILLED FOR ALL GROUPS. CATEGORICAL VARIABLE ATLEASTO HAS ZERO OBSERVATIONS IN CATEGORY 1.
I noticed that the Mplus output showed my outcome variable had 4 categories, but in truth it is a binary variable. What do I do? Also, I used the USEOBS command, which means I only used a subsample with no missing data, but the output said there were missing data for x, and the observations noted in the output did not tally with my subset. How do I solve this? Am I missing something?
It sounds like you are not reading your data correctly. This could be caused by blanks in the data set. If you can't figure it out, please send your input, data, output, and license number to firstname.lastname@example.org.
Chie Kotake posted on Tuesday, April 29, 2014 - 7:01 pm
I'm new to Mplus. I'm doing a multi-group analysis with 5 groups. Unfortunately, 2 of the groups are quite small (24 and 27). I am trying to test for configural invariance in a model that includes one construct (with 4 indicators) and a manifest outcome variable.
I'm testing each group separately to see if the model actually works, and for the group of 24 participants, I get the error that the standard errors may not be trustworthy. From reading the forums, this can be due to my small sample size? Even after fixing the problematic parameter, I keep getting an error.
1) Am I correct that the issue is my small sample size? The model actually works with my other group of 40.
2) Is there a way to use this group and still make the multi-group analysis work?
I am using the Mplus 7.3 demo version to run the code in Muthén, L.K. & Muthén, B.O. (2002). How to use a Monte Carlo study to decide on sample size and determine power. Structural Equation Modeling, 9, 599-620.
However, I am getting error messages. For example, gclasses in the original code didn't work. So I changed it to genclasses, but that didn't work either.
Also, I am unable to find the complete outputs from the above study on the website.
Thank you for your previous response on GENCLASSES. I have another question that is more procedural related.
In general, I am interested in sample size determination, so I ran the syntax for Example 12.5 in the User's Guide, but with a smaller sample size (N=50). This resulted in error messages. I increased the sample size incrementally until there were no more error messages in the Tech9 output (at N=350). From there, I checked bias, CI, and power. Is this an appropriate strategy?
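The checks mentioned here (bias, confidence-interval coverage, power across replications) can be summarized with a few lines of code once the replication estimates are saved. The sketch below is a hypothetical illustration: the arrays of estimates and standard errors are simulated around an assumed true value rather than taken from actual Mplus output, but the three summary statistics are computed the standard way.

```python
import numpy as np

# Summarizing Monte Carlo replications: relative parameter bias,
# 95% confidence-interval coverage, and power to reject zero.
# 'estimates' and 'ses' stand in for saved replication results
# (hypothetical values, simulated here for illustration).
rng = np.random.default_rng(7)
true_value = 0.5
n_reps = 1000
ses = np.full(n_reps, 0.15)                       # hypothetical SEs
estimates = rng.normal(true_value, 0.15, n_reps)  # hypothetical estimates

# Relative bias of the average estimate
bias = (estimates.mean() - true_value) / true_value
# Proportion of 95% CIs containing the true value
lo, hi = estimates - 1.96 * ses, estimates + 1.96 * ses
coverage = np.mean((lo <= true_value) & (true_value <= hi))
# Proportion of replications rejecting H0: parameter = 0
power = np.mean(np.abs(estimates / ses) > 1.96)

print(f"bias={bias:.3f} coverage={coverage:.3f} power={power:.3f}")
```

With the sample size held at a candidate value, one would look for small bias, coverage near .95, and power of at least .80 before settling on that n.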
I am using Monte Carlo simulation to estimate the sample size for a second-order CFA model. Is there anything wrong with this input? For the chi-square test, the expected proportion is very different from the observed (0.050 vs 0.000). The % Sig Coeff values under "Model Results" are all 0.0000.
It depends too much on the specific model. You can do a Monte Carlo simulation to get a better feel for it. See our book Regression and Mediation Analysis using Mplus, Chapter 3.
Xu, Man posted on Wednesday, July 11, 2018 - 11:36 am
Dear Dr.s Muthen,
I am trying an SEM approach on a small sample (n=100) for an experimental study. There are a few latent variables, each with 3 items, so the total number of parameters approaches the sample size. However, the model converges and the fit is fine.
Now the issue is, I know there are some rules of thumb for sample size, such as 5 cases per parameter, but there also seem to be some papers indicating that sample size is not so much an issue for parameter estimation as for overall fit indices. Also, a quick Monte Carlo simulation analysis does not seem to indicate severe power problems.
Another approach may be to estimate factor scores for each factor separately, then use them in a path analysis to estimate the experimental effect.
However, do you think it is generally acceptable if one just goes for the full SEM approach?