
Sample size

Mplus Discussion > Structural Equation Modeling >

Daniel posted on Monday, August 16, 2004 - 8:19 am

Hi, I'm working on a proposal in which I am looking at the effects of physical activity on smoking. However, I only have two time points and a relatively small budget. The issue is that I need to have as small a sample as possible to find an adequate result. Since I only have two time points, I'm thinking the benefits of repeated measures are not there like they would be had I had a larger sample. Do you have any suggestions for how I should approach this question?
By the way, my outcome will be an ordered categorical variable.

Linda K. Muthen posted on Monday, August 16, 2004 - 6:01 pm

You can use Monte Carlo simulations to determine how large a sample you will need with two time points to have the power to detect the effect that you are interested in.

Anonymous posted on Tuesday, August 17, 2004 - 5:14 am

Hi, I am new to MPlus. I am having some trouble converting .sav files into .dat files. I saved an existing .sav file as an ASCII .dat file and ran the program. It gave me the following message:

My categorical variable has 39 categories, which exceeds the maximum of 10 for a categorical variable.

My categorical variable has only 2 categories, 0 and 1, and some missing cases.
What went wrong? Do you have any idea?

Linda K. Muthen posted on Tuesday, August 17, 2004 - 7:22 am

I would have to see the saved data set to know for sure, but I suspect that you are reading it with a free format and have blanks in the data for missing values causing it to be read incorrectly. You would need to send the input and data to support@statmodel.com for me to give a definite answer.
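If blanks standing in for missing values are indeed the cause, one way to spot them before running Mplus is to count the values on each row of the exported file. The following is a minimal Python sketch, not part of Mplus; the function name `find_short_rows` and the sample rows are hypothetical.

```python
def find_short_rows(lines, n_vars):
    """Return (line_number, value_count) for rows that do not hold n_vars values.

    In free format, values are separated by blanks or commas, so a blank left
    in place of a missing value makes the row come up short and shifts every
    later value into the wrong variable.
    """
    problems = []
    for number, line in enumerate(lines, start=1):
        tokens = line.replace(",", " ").split()
        if tokens and len(tokens) != n_vars:
            problems.append((number, len(tokens)))
    return problems


# Hypothetical export with three variables; row 2 has a blank for a missing value.
rows = ["1 0 3", "1   3", "2 1 4"]
print(find_short_rows(rows, n_vars=3))
```

Rows flagged this way would need an explicit missing-value code in place of the blank, with that code declared through the MISSING option of the VARIABLE command.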

Todd Huschka posted on Tuesday, September 28, 2004 - 11:41 am

Is there a way to perform Power Calculations with Mplus?

Linda K. Muthen posted on Wednesday, September 29, 2004 - 3:51 pm

There are two ways to do this in Mplus. One is listed in the left margin under Power Calculation. The other is described in the following paper:

Muthén, L.K. & Muthén, B.O. (2002). How to use a Monte Carlo study to decide on sample size and determine power. Structural Equation Modeling, 9(4), 599-620.

which can be downloaded from the Mplus website.

RO posted on Sunday, October 30, 2005 - 11:35 am

On the website, we are given the following SAS code to calculate power after obtaining an estimate of the noncentrality parameter:

DATA POWER;
DF=1; CRIT=3.841459;
LAMBDA=9.286;
POWER=(1-(PROBCHI(CRIT,DF,LAMBDA)));
RUN;

I don't have access to SAS and am hoping someone can give me some code that will allow me to make this calculation in SYSTAT, SPSS or even Excel. Or, can someone recommend a free-standing executable or a web-based java program I can use for this?

(Of course, I can always ask colleagues who use SAS to run this for me, but I'd prefer to be able to do the calculation myself if possible.)

Thanks.

RO

Linda K. Muthen posted on Sunday, October 30, 2005 - 12:53 pm

I don't know how to do this in any other program. Perhaps someone else does??? It is basically calling the non-central chi-square distribution.
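For readers without SAS, the same quantity can be computed with nothing beyond the Python standard library. This is a minimal sketch, not an Mplus feature: it assumes the standard representation of the noncentral chi-square CDF as a Poisson-weighted mixture of central chi-square CDFs, and the function names `chi2_cdf` and `ncx2_cdf` are made up for illustration. The inputs are the ones from the SAS step above.

```python
import math


def chi2_cdf(x, df):
    """Central chi-square CDF via the series for the regularized incomplete gamma."""
    if x <= 0:
        return 0.0
    a, t = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    k = 1
    while term > 1e-16 * total:
        term *= t / (a + k)
        total += term
        k += 1
    return total * math.exp(a * math.log(t) - t - math.lgamma(a))


def ncx2_cdf(x, df, ncp, tol=1e-12):
    """Noncentral chi-square CDF as a Poisson(ncp/2) mixture of central CDFs."""
    half = ncp / 2.0
    weight = math.exp(-half)  # Poisson weight for j = 0
    total, cum_weight, j = 0.0, 0.0, 0
    while 1.0 - cum_weight > tol and j < 1000:
        total += weight * chi2_cdf(x, df + 2 * j)
        cum_weight += weight
        j += 1
        weight *= half / j
    return total


# Same inputs as the SAS code: DF=1, CRIT=3.841459, LAMBDA=9.286.
power = 1.0 - ncx2_cdf(3.841459, 1, 9.286)
print(round(power, 4))
```

The series in `chi2_cdf` converges quickly for critical values of this size; for very large chi-square values a library routine would be a safer choice.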

RO posted on Sunday, October 30, 2005 - 2:14 pm

Linda,

Thanks. That moves me a step forward.

Now, there is an online non-central chi-square calculator at UCLA at http://calculators.stat.ucla.edu/cdf. Can I just use that to perform the power calculation described in the "How To" on the Mplus web site?

I'm not sure what the X Value parameter is in the calculator web form. I'm trying to replicate the result in the Mplus power calculation example as a check to make sure I can do this correctly. I assume that DF=1, that the Noncentrality Parameter=9.286, and that I should leave Probability as a ? to be solved, but I don't know what the X Value represents. Is it the sample size?

I apologize if this is a really basic question. I couldn't find any help on the calculator, and I'm hoping it will solve my problem.

Thanks for your patience.

RO

bmuthen posted on Sunday, October 30, 2005 - 6:37 pm

I would think X is the chi-square value (the value on the x axis). Have you tried using X = CRIT?

RO posted on Monday, October 31, 2005 - 10:10 am

Bengt and Linda,

Thanks.

When I set X=3.841459, df=1, and NCP=9.286, the online calculator returns p=0.138, for a power of .862.

I assume that the difference between that estimate and the .85 you show in the table in the example is due to a difference in precision of the web-based calculator and the higher precision calculations performed in SAS. Does that sound right?

By the way, would you consider adding a utility/procedure for non-central chi-square calculations into a future version of Mplus? I don't know how much of an effort that would be or how it fits into your vision of what belongs in Mplus, but looking at the code for the online calculator on the UCLA site it doesn't look like it would be a major undertaking.

Thanks again for all your help and for this wonderful program.

RO

bmuthen posted on Monday, October 31, 2005 - 2:59 pm

Yes, rounding matters. The 4 values on our web site should be:

0.815583
0.861557
0.990604
0.999982

Linda K. Muthen posted on Saturday, November 05, 2005 - 8:09 am

Here's a post that came via Webmaster:

Linda and RO,

One way is to download a free probability
calculator available at

http://www.ncss.com/download.html

Once installed, choose the right-hand radio button
next to the chi-square option and fill in the
appropriate values shown in the SAS syntax and
then press Calculate.

Power is given in the box under "Prob(x >= X)",
which in this case is 0.8615547557

The left-hand radio button next to the chi-square
option gives chi-square values for specified df and
chosen P values, and therefore can provide
critical chi-squares values.

Best wishes,

Paul Dudgeon

Charles Green posted on Friday, September 15, 2006 - 8:33 am

I am currently estimating power/sample size for a study in which the outcomes are longitudinal binary measures. I'm planning on using a growth curve model for the anticipated analysis, and I may elect to model the treatment effect in the form of a multi-group analysis. If I simulate a full and a restricted model (to test a treatment effect), can the difference between the -2*loglikelihoods be used to approximate the non-centrality parameter used in the Satorra-Saris approach to estimating power?

Bengt O. Muthen posted on Sunday, October 01, 2006 - 11:36 am

Satorra-Saris was developed for continuous-normal outcomes where the mean vector and covariance matrix are sufficient statistics, whereas with binary outcomes all moments are needed. I don't know of studies that simulate the non-centrality parameter. If you simulate, it would seem that using the last column in the Mplus Monte Carlo output would be most straightforward: the proportion of replications rejecting the zero value for the key parameter.
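The proportion-of-rejections idea can be illustrated outside Mplus. The following is a minimal Python sketch under a much simpler design than the growth model discussed above: two groups, a single binary outcome, and a Wald z-test on the difference in proportions. The function name `rejection_rate`, the proportions 0.3 and 0.6, the group size, and the number of replications are all illustrative assumptions.

```python
import math
import random


def rejection_rate(n_per_group, p_control, p_treat, reps=2000, seed=1):
    """Share of replications in which the treatment effect is significant at 5%."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        # Generate one replication: binary outcomes for each group.
        c = sum(rng.random() < p_control for _ in range(n_per_group))
        t = sum(rng.random() < p_treat for _ in range(n_per_group))
        pc, pt = c / n_per_group, t / n_per_group
        se = math.sqrt(pc * (1 - pc) / n_per_group + pt * (1 - pt) / n_per_group)
        # Wald test of "difference = 0"; a zero SE is counted as no rejection.
        if se > 0 and abs((pt - pc) / se) > 1.959964:
            rejections += 1
    return rejections / reps


print("no effect (type I error):", rejection_rate(100, 0.3, 0.3))
print("effect present (power):  ", rejection_rate(100, 0.3, 0.6))
```

With no true effect the rate estimates the type I error, and with a true effect it estimates power, which is the same reading given to the rejection column of a Monte Carlo run.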

dm posted on Sunday, May 20, 2007 - 12:57 pm

Hi,

My sample in a paper has 133 observations, and a reviewer doubts whether I can run structural equation modeling on such a small sample (I have three equations, each of which has about 2-3 endogenous variables and 6-8 exogenous variables; MPLUS finishes the computation normally). I remember that MPLUS is particularly useful for small-sample SEM, but I think the reviewer wants more technical details. Could you please give me suggestions on this issue?

Thanks!

Bengt O. Muthen posted on Sunday, May 20, 2007 - 1:30 pm

Critical factors are whether the outcomes are continuous or categorical, how skewed the outcomes are, how many parameters the model has, how much missing data there is, etc. n=133 may or may not be sufficient depending on these factors. Also, see the web site's Muthen & Muthen (2002) article on Monte Carlo simulation to determine if n is large enough.

Lisa Melander posted on Wednesday, February 06, 2008 - 9:56 am

Drs. Muthen,

I have a question regarding an error message that I received when running a model: "THE STANDARD ERRORS OF THE MODEL PARAMETER ESTIMATES MAY NOT BE TRUSTWORTHY FOR SOME PARAMETERS DUE TO A NON-POSITIVE DEFINITE FIRST-ORDER DERIVATIVE PRODUCT MATRIX. THIS MAY BE DUE TO THE STARTING VALUES BUT MAY ALSO BE AN INDICATION OF MODEL NONIDENTIFICATION. THE CONDITION NUMBER IS -0.180D-16. PROBLEM INVOLVING PARAMETER 35. THIS IS MOST LIKELY DUE TO HAVING MORE PARAMETERS THAN THE SAMPLE SIZE IN ONE OF THE GROUPS."

I have a sample size of 172 and I am doing a group comparison and one of the groups has n=35. Is this error message regarding my sample size or something else? Thanks!

Linda K. Muthen posted on Wednesday, February 06, 2008 - 10:33 am

It sounds like you have more than 35 parameters in the group with n=35.

James L. Lewis posted on Thursday, February 25, 2010 - 2:29 pm

Hi,
I am doing a multiple group SEM with continuous (actually 5-point Likert) indicators. I have a small sample size (N=161) with 49 in one group and 112 in the other. I am examining a multiple group model that estimates 80 or so free parameters. Everything seems fine: the model terminates normally, the solution is positive definite, etc. Model fit is generally marginal (e.g. CFI = .89, RMSEA = .07). My question is whether there is anything wrong with this. I know this is sometimes called "empirical under-identification", but with no convergence problems (etc.), do I need to worry?

In the case that I do need to worry, would you recommend that I test to see how many parameters I can (tenably) constrain to equality across groups, such that I can (perhaps) get the number of free parameters under 49 (the n of the smallest group)? Maybe I can also fix the loadings and residuals for my latent variables at values obtained from a related CFA to get the number of free parameters down? (I already know that measurement invariance (loadings) is tenable across groups.)

Last thing: I am using the MLR estimator in this case, which I believe to be the best for a small-sample SEM with continuous indicators. Is this accurate, or should I be using another estimator?

Thanks much.

James

Linda K. Muthen posted on Friday, February 26, 2010 - 9:47 am

At a minimum you need several more observations in a group than you have parameters in the group.

If your Likert variables have floor or ceiling effects, you should declare them as categorical. You can use either maximum likelihood or weighted least squares in this case.

James L. Lewis posted on Friday, February 26, 2010 - 9:56 am

Thanks.

When you say "in a group", I assume you mean the number of parameters estimated for that group and not overall (e.g. 80 parameters being estimated overall, but 40 parameters for each group)?

Another option I have to increase N is to include more observations that have missing data on 2 of the 3 waves. But this means that some of my indicators will have up to 50% missing data at waves 2 & 3. Can FIML handle this? Is there any indication, perhaps a reference, of how much missing data is allowable, or how much FIML (or any other imputation method) can handle?

Linda K. Muthen posted on Friday, February 26, 2010 - 10:15 am

You need to compare the number of free parameters in a group to the number of observations in the group. You can see this in TECH1.

The only way to really understand your data is to do a simulation study. To me 50% missing is not a good thing. You might want to consider a simpler model for these data.

James L. Lewis posted on Friday, February 26, 2010 - 10:30 am

Thank you!

Ramzi Mabsout posted on Tuesday, April 27, 2010 - 12:49 pm

Hello,

I am estimating two SEM models that are similar in all respects except the final outcome (dependent) variable. In model 1 this variable is continuous. In model 2 it is categorical (3 categories). By default, MPLUS estimates the first model with ML and the second with WLSMV.

I also know that the final outcome categorical variable (in model 2) has four missing cases.

However, when I estimate the models, the difference in the number of observations between models 1 and 2 is 16 (not four). How is this possible?

Linda K. Muthen posted on Tuesday, April 27, 2010 - 1:14 pm

Please send the two outputs and your license number to support@statmodel.com.

alia aishah posted on Wednesday, April 04, 2012 - 11:15 am

When I run my path analysis, this message comes up:

THE MINIMUM COVARIANCE COVERAGE WAS NOT FULFILLED FOR ALL GROUPS.
CATEGORICAL VARIABLE ATLEASTO HAS ZERO OBSERVATIONS IN CATEGORY 1.

I noticed that the Mplus output showed that my outcome variable had 4 categories, but in truth it is a binary variable. What do I do? Also, I used the USEOBS command, which means I only used a subsample with no missing data, but the output said that there were missing data for x, and the number of observations noted in the output did not tally with my subset. How do I solve this? Am I missing something?

Linda K. Muthen posted on Wednesday, April 04, 2012 - 1:27 pm

It sounds like you are not reading your data correctly. This could be caused by blanks in the data set. If you can't figure it out, please send your input, data, output, and license number to support@statmodel.com.

Chie Kotake posted on Tuesday, April 29, 2014 - 7:01 pm

Hi!

I'm new to MPlus. I'm doing a multi-group analysis with 5 groups. Unfortunately, 2 of the groups are quite small (24 and 27). I am trying to test for configural invariance for a model that includes one construct (with 4 indicators) and a manifest outcome variable.

I'm testing each group separately to see if the model actually works, and for the group of 24 participants, I get the error about the standard errors not being trustworthy. From reading the forums, it seems this can be due to my small sample size? Even after fixing the problematic parameter, I keep getting an error.

1) Am I correct that the issue is my small sample size? The model actually works with my other group of 40.

2) Is there a way to use this group and still make the multi-group analysis work?

Thank you!

Linda K. Muthen posted on Wednesday, April 30, 2014 - 8:24 am

Your samples are very small.

Please send the output with the error and your license number to support@statmodel.com.

Wanchen Chang posted on Tuesday, April 21, 2015 - 8:20 pm

I am using the MPlus 7.3 demo version to run the code in
Muthén, L.K. & Muthén, B.O. (2002). How to use a Monte Carlo study to decide on sample size and determine power. Structural Equation Modeling, 9(4), 599-620.

However, I am getting error messages. For example, gclasses in the original code didn't work. So I changed it to genclasses, but that didn't work either.

Also, I am unable to find the complete outputs from the above study on the website.

Thanks in advance for your help!

Linda K. Muthen posted on Wednesday, April 22, 2015 - 5:58 am

What error message do you get? GENCLASSES is the option name.

Wanchen Chang posted on Wednesday, April 22, 2015 - 6:33 am

Here is the error message:

*** ERROR in MONTECARLO command
The number of classes must be specified with each categorical latent variable
in GENCLASSES option.
1

Linda K. Muthen posted on Wednesday, April 22, 2015 - 9:23 am

The GENCLASSES option is specified as follows:

GENCLASSES = c1 (2) c2 (3) c3 (4);

It sounds like you are not doing that.

Wanchen Chang posted on Thursday, April 30, 2015 - 9:56 am

Hello,

Thank you for your previous response on GENCLASSES. I have another question that is more about procedure.

In general, I am interested in sample size determination, so I ran the syntax for Example 12.5 in the User's Guide, but with a smaller sample size (N=50). This resulted in error messages. I increased the sample size incrementally until there were no more error messages in the Tech9 output (at N=350). From there, I checked bias, CI, and power. Is this an appropriate strategy?

Bengt O. Muthen posted on Thursday, April 30, 2015 - 6:21 pm

Well, it doesn't help you if you want to know about n=200, say. Perhaps the parameter value choices were too difficult to handle at smaller sample sizes.
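One reading of this reply is that the simulation should be run over a grid of candidate sample sizes, with bias, coverage, and power reported at each one, instead of stopping at the first n that runs without errors. The following is a minimal Python sketch of that idea, not Mplus syntax: it assumes a deliberately trivial model (the mean of a normal variable with SD 1), and the function name `monte_carlo_summary`, the true mean of 0.2, and the grid of n values are illustrative assumptions.

```python
import math
import random


def monte_carlo_summary(n, true_mean, reps=1000, seed=1):
    """Bias, 95% CI coverage, and power for a sample mean at sample size n."""
    rng = random.Random(seed)
    estimates, covered, significant = [], 0, 0
    for _ in range(reps):
        sample = [rng.gauss(true_mean, 1.0) for _ in range(n)]
        est = sum(sample) / n
        var = sum((v - est) ** 2 for v in sample) / (n - 1)
        se = math.sqrt(var / n)
        estimates.append(est)
        # Coverage: does the 95% interval contain the population value?
        covered += abs(est - true_mean) <= 1.959964 * se
        # Power: is the estimate significantly different from zero?
        significant += abs(est / se) > 1.959964
    average = sum(estimates) / reps
    return {
        "n": n,
        "bias": average - true_mean,
        "coverage": covered / reps,
        "power": significant / reps,
    }


for n in (50, 100, 200, 350):
    print(monte_carlo_summary(n, true_mean=0.2))
```

Reading the whole grid shows how each criterion changes with n, so a sample size such as n=200 can be judged directly.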

Cheng posted on Sunday, May 31, 2015 - 12:20 am

I am using Monte Carlo simulation to estimate the sample size for a second-order CFA model. Is anything wrong with this input? For the chi-square test, the expected proportion is very different from the observed one (0.050 vs 0.000). The % Sig Coeff values under "Model Results" are all 0.0000.

MODEL POPULATION:
[X1-X19@0];
F1 BY X1-X7@0.75;
F2 BY X8-X12@0.75;
F3 BY X13-X15@0.75;
F4 BY X16-X19@0.75;
Stress BY F1@0.75 F3@0.75;
Recovery BY F2@0.75 F4@0.75;
F1-F4@1;
Stress@1;
Recovery@1;
X1-X19@0.30;
Stress WITH Recovery@0.60;
MODEL:
[X1-X19*0];
F1 BY X1-X7*0.75;
F2 BY X8-X12*0.75;
F3 BY X13-X15*0.75;
F4 BY X16-X19*0.75;
Stress BY F1*0.75 F3*0.75;
Recovery BY F2*0.75 F4*0.75;
F1-F4*1;
Stress*1;
Recovery*1;
X1-X19*0.30;
Stress WITH Recovery*0.60;
OUTPUT: TECH9;

Linda K. Muthen posted on Sunday, May 31, 2015 - 11:53 am

Please send the output and your license number to support@statmodel.com.

cecilia posted on Monday, February 05, 2018 - 12:25 pm

Dear all, I commonly use SEM (or path analysis) in the framework of psychological or social problems. Now, a colleague who works in Psychobiology showed me some papers where path analysis is used with very small samples (n < 20) (see e.g. https://www.ncbi.nlm.nih.gov/pubmed/24933661).
I have also seen other applications of SEM with small samples: https://www.ncbi.nlm.nih.gov/pubmed/9408041
https://www.ncbi.nlm.nih.gov/pubmed/1884204

I would like to know what you think and suggest about these types of applications.

Thanks!

Bengt O. Muthen posted on Monday, February 05, 2018 - 1:39 pm

If you don't have repeated measures, N=20 samples don't sound like they could be very useful for SEM analysis; for one, there can't be much power to reject models. Check with SEMNET as well.

Jinxin ZHU posted on Thursday, June 14, 2018 - 7:05 pm

Is there any rule of thumb about the requirement of sample size for path analysis (no latent trait was specified)?

For instance, I have a very complex model with 150 parameters to be estimated, whereas each dependent variable has only 3 independent variables.

The sample size is 200. The standard errors seem fine. I am wondering whether it is problematic to have so many parameters with such a small sample size.

Bengt O. Muthen posted on Friday, June 15, 2018 - 2:32 pm

It depends too much on the specific model. You can do a Monte Carlo simulation to get a better feel for it. See our book Regression and Mediation Analysis using Mplus, Chapter 3.

Xu, Man posted on Wednesday, July 11, 2018 - 11:36 am

Dear Drs. Muthen,

I am trying an SEM approach on a small sample (n=100) for an experimental study. There are a few latent variables, each with 3 items, so the total number of parameters approaches the sample size. However, the model converges and fit is fine.

Now the issue is, I know there are some rules of thumb for sample size, such as 5 cases per parameter, but there also seem to be some papers indicating that sample size is not so much an issue for parameter estimation as for overall fit indices.
Also, a quick monte carlo simulation analysis does not seem to indicate severe power problems.

Another approach may be to estimate factor scores for each factor separately, then use them in a path analysis to estimate the experimental effect.

However, do you think it is generally acceptable if one just goes for the full SEM approach?

Thanks for your suggestions!

Kate

Bengt O. Muthen posted on Wednesday, July 11, 2018 - 5:45 pm

Seems ok to do the full SEM if you can show a relevant Monte Carlo run that supports good estimates. I assume the SEs won't be great and the fit performance worse.