
Daniel posted on Monday, August 16, 2004  8:19 am



Hi, I'm working on a proposal where I am looking at the effects of physical activity on smoking. However, I only have two time points and a relatively small budget. The issue is that I need to have as small a sample as possible to find an adequate result. Since I only have two time points, I'm thinking the benefits of repeated measures are not there like there would be had I had a larger sample. Do you have any suggestions of how I would approach this question? By the way, my outcome will be an ordered categorical variable. 


You can use Monte Carlo simulations to determine how large of a sample you will need with two time points to have the power to detect the effect that you are interested in. 

Anonymous posted on Tuesday, August 17, 2004  5:14 am



Hi, I am new to Mplus. I am having trouble converting .sav files into .dat files. I saved an existing .sav file as an ASCII .dat file and ran the program. It gave me the following message: my categorical variable has 39 categories, which exceeds the maximum of 10 for a categorical variable. But my categorical variable has only 2 categories, 0 and 1, plus some missing cases. What went wrong? Do you have an idea? 


I would have to see the saved data set to know for sure, but I suspect that you are reading it with a free format and have blanks in the data for missing values causing it to be read incorrectly. You would need to send the input and data to support@statmodel.com for me to give a definite answer. 
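The suspected problem can be illustrated outside Mplus. The following is a minimal Python sketch, not Mplus code, and it assumes a made-up three-variable layout (an ID, a binary variable, and a second binary variable) with a hypothetical missing-value flag of -999. It only shows why a whitespace-delimited (free format) reader shifts columns when a missing value is stored as a blank, while a fixed-width reader does not.

```python
MISSING_CODE = "-999"  # hypothetical missing-value flag, chosen for illustration

def read_free(line):
    """Split on runs of whitespace, as a free-format reader would."""
    return line.split()

def read_fixed(line, widths, missing=MISSING_CODE):
    """Cut the line into fixed-width fields; a blank field becomes the missing flag."""
    fields, start = [], 0
    for width in widths:
        raw = line[start:start + width].strip()
        fields.append(raw if raw else missing)
        start += width
    return fields

# Made-up records: the second variable is blank (missing) in the second row.
rows = [
    "101 0 1",
    "102   1",
    "103 1 0",
]
widths = [3, 2, 2]  # assumed column widths for this toy layout

for row in rows:
    print(read_free(row), read_fixed(row, widths))
```

In the second row the free-format read returns only two values, so the third variable's value lands in the second variable's position. In a real file, values from a neighbouring variable with many distinct values could end up in the categorical variable this way, which would be consistent with a spurious category count. Recoding blanks to an explicit missing-value flag before saving, or reading the file with a fixed format, avoids the shift.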

Todd Huschka posted on Tuesday, September 28, 2004  11:41 am



Is there a way to perform Power Calculations with Mplus? 


There are two ways to do this in Mplus. One is listed in the left margin under Power Calculation. The other is described in the following paper, which can be downloaded from the Mplus website: Muthén, L.K. & Muthén, B.O. (2002). How to use a Monte Carlo study to decide on sample size and determine power. Structural Equation Modeling, 4, 599-620. 

RO posted on Sunday, October 30, 2005  11:35 am



On the website, we are given the following SAS code to calculate power after obtaining an estimate of the noncentrality parameter:

DATA POWER;
DF=1;
CRIT=3.841459;
LAMBDA=9.286;
POWER=(1-(PROBCHI(CRIT,DF,LAMBDA)));
RUN;

I don't have access to SAS and am hoping someone can give me some code that will allow me to make this calculation in SYSTAT, SPSS or even Excel. Or, can someone recommend a freestanding executable or a web-based Java program I can use for this? (Of course, I can always ask colleagues who use SAS to run this for me, but I'd prefer to be able to do the calculation myself if possible.) Thanks. RO 


I don't know how to do this in any other program. Perhaps someone else does??? It is basically calling the noncentral chi-square distribution. 

RO posted on Sunday, October 30, 2005  2:14 pm



Linda, Thanks. That moves me a step forward. Now, there is an online noncentral chi-square calculator at UCLA at http://calculators.stat.ucla.edu/cdf. Can I just use that to perform the power calculation described in the "How To" on the Mplus web site? I'm not sure what the X Value parameter is in the calculator web form. I'm trying to replicate the result in the Mplus power calculation example as a check to make sure I can do this correctly. I assume that DF=1, that the Noncentrality Parameter=9.286, and that I should leave Probability as a ? to be solved, but I don't know what the X Value represents. Is it the sample size? I apologize if this is a really basic question. I couldn't find any help on the calculator, and I'm hoping it will solve my problem. Thanks for your patience. RO 

bmuthen posted on Sunday, October 30, 2005  6:37 pm



I would think X is the chi-square value (the value on the x axis). Have you tried using X = CRIT? 

RO posted on Monday, October 31, 2005  10:10 am



Bengt and Linda, Thanks. When I set X=3.841459, df=1, and NCP=9.286, the online calculator solves p=0.138 for power of .862. I assume that the difference between that estimate and the .85 you show in the table in the example is due to a difference in precision between the web-based calculator and the higher-precision calculations performed in SAS. Does that sound right? By the way, would you consider adding a utility/procedure for noncentral chi-square calculations into a future version of Mplus? I don't know how much of an effort that would be or how it fits into your vision of what belongs in Mplus, but looking at the code for the online calculator on the UCLA site it doesn't look like it would be a major undertaking. Thanks again for all your help and for this wonderful program. RO 

bmuthen posted on Monday, October 31, 2005  2:59 pm



Yes, rounding matters. The 4 values on our web site should be: 0.815583 0.861557 0.990604 0.999982 


Here's a post that came via Webmaster: Linda and RO, One way is to download a free probability calculator available at http://www.ncss.com/download.html. Once installed, choose the right-hand radio button next to the chi-square option, fill in the appropriate values shown in the SAS syntax, and then press Calculate. Power is given in the box under "Prob(x >= X)", which in this case is 0.8615547557. The left-hand radio button next to the chi-square option gives chi-square values for specified df and chosen P values, and therefore can provide critical chi-square values. Best wishes, Paul Dudgeon 
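For readers without SAS or a separate calculator, the same quantity, 1 minus the noncentral chi-square CDF at the critical value, can be sketched in plain Python. This is a minimal sketch, not an Mplus or SAS facility: the function names are made up for illustration, and it assumes the standard representation of the noncentral chi-square CDF as a Poisson-weighted sum of central chi-square CDFs, which is suitable for moderate noncentrality values like the one in this thread.

```python
import math

def reg_lower_gamma(a, x, tol=1e-15, max_terms=10000):
    """Regularized lower incomplete gamma P(a, x), computed from its power series."""
    if x <= 0.0:
        return 0.0
    term = 1.0 / a
    total = term
    for n in range(1, max_terms):
        term *= x / (a + n)
        total += term
        if term < tol * total:
            break
    return total * math.exp(a * math.log(x) - x - math.lgamma(a))

def noncentral_chi2_power(crit, df, ncp, tol=1e-13, max_terms=1000):
    """P(X > crit) for X ~ noncentral chi-square with df and noncentrality ncp."""
    half = ncp / 2.0
    weight = math.exp(-half)  # Poisson(ncp/2) probability of j = 0
    cdf = 0.0
    remaining = 1.0           # Poisson mass not yet included in the sum
    for j in range(max_terms):
        # A central chi-square CDF with df + 2j degrees of freedom is P(df/2 + j, crit/2).
        cdf += weight * reg_lower_gamma(df / 2.0 + j, crit / 2.0)
        remaining -= weight
        if remaining < tol:
            break
        weight *= half / (j + 1)
    return 1.0 - cdf

# Values taken from the SAS code quoted in this thread.
power = noncentral_chi2_power(crit=3.841459, df=1, ncp=9.286)
print(round(power, 4))
```

With the thread's values (DF=1, CRIT=3.841459, LAMBDA=9.286) this should agree with the figure of about 0.8616 reported above. Setting ncp to 0 returns the nominal alpha of about 0.05, which is a quick sanity check on the critical value.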


I am currently estimating power/sample size for a study in which the outcomes are longitudinal binary measures. I'm planning on using a growth curve for the anticipated analysis, and I may elect to model the treatment effect in the form of a multigroup analysis. If I simulate a full and a restricted model (to test a treatment effect), can the difference between the -2*loglikelihoods be used to approximate the noncentrality parameter used in the Satorra-Saris approach to estimating power? 


Satorra-Saris was developed for continuous-normal outcomes where the mean vector and covariance matrix are sufficient statistics, whereas with binary outcomes all moments are needed. I don't know of studies to simulate the noncentrality parameter. If you simulate, it would seem that using the last column in the Mplus Monte Carlo output would be most straightforward: the proportion of replications rejecting the zero value for the key parameter. 
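The idea of reading power off as the proportion of replications that reject the zero value can be illustrated with a deliberately simple Python sketch. It does not reproduce the growth model or Mplus output discussed here: it assumes a hypothetical two-group design with a single binary outcome, made-up event probabilities, and a pooled two-sided z-test for the difference in proportions.

```python
import math
import random

def simulate_power(n_per_group, p_control, p_treat, reps=1000, seed=1, z_crit=1.959964):
    """Monte Carlo power: share of replications in which the z-test rejects H0."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        # Generate one replication of binary outcomes for each group.
        x0 = sum(rng.random() < p_control for _ in range(n_per_group))
        x1 = sum(rng.random() < p_treat for _ in range(n_per_group))
        p0 = x0 / n_per_group
        p1 = x1 / n_per_group
        pooled = (x0 + x1) / (2 * n_per_group)
        se = math.sqrt(pooled * (1.0 - pooled) * 2.0 / n_per_group)
        # Count the replication as a rejection if |z| exceeds the critical value.
        if se > 0 and abs(p1 - p0) / se > z_crit:
            rejections += 1
    return rejections / reps

# Made-up effect: 30% versus 50% event probability, at three sample sizes per group.
for n in (50, 100, 200):
    print(n, simulate_power(n, 0.30, 0.50))
```

Running the same function with equal probabilities in both groups should give a rejection rate near the nominal 5%, which is a useful check that the test behaves as intended before interpreting the power estimates.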

dm posted on Sunday, May 20, 2007  12:57 pm



Hi, My sample in a paper has 133 observations and a reviewer doubts whether I can run structural equation modeling on such a small sample size (I have three equations, each of which has about 2-3 endogenous variables and 6-8 exogenous variables; MPLUS finishes the computation normally). I remember that MPLUS is particularly useful for small-sample SEM, but I think the reviewer wants more technical details. Could you please give me suggestions on this issue? Thanks! 


Critical factors are if the outcomes are continuous or categorical, how skewed the outcomes are, how many parameters the model has, how much missing data there is, etc. n=133 may or may not be sufficient depending on these factors. Also, see the web site's Muthen & Muthen (2002) article on Monte Carlo simulation to determine if n is large enough. 


Drs. Muthen, I have a question regarding an error message that I received when running a model: "THE STANDARD ERRORS OF THE MODEL PARAMETER ESTIMATES MAY NOT BE TRUSTWORTHY FOR SOME PARAMETERS DUE TO A NON-POSITIVE DEFINITE FIRST-ORDER DERIVATIVE PRODUCT MATRIX. THIS MAY BE DUE TO THE STARTING VALUES BUT MAY ALSO BE AN INDICATION OF MODEL NONIDENTIFICATION. THE CONDITION NUMBER IS 0.180D-16. PROBLEM INVOLVING PARAMETER 35. THIS IS MOST LIKELY DUE TO HAVING MORE PARAMETERS THAN THE SAMPLE SIZE IN ONE OF THE GROUPS." I have a sample size of 172 and I am doing a group comparison, and one of the groups has n=35. Is this error message regarding my sample size or something else? Thanks! 


It sounds like you have more than 35 parameters in the group with n=35. 


Hi, I am doing a multiple group SEM with continuous (actually 5-pt Likert) indicators. I have a small sample size (N=161) with 49 in one group and 112 in the other. I am examining a multiple group model that estimates 80 or so parameters (80 free parms). Everything seems fine: the model terminates normally, the solution is positive definite, etc. Model fit is generally marginal (e.g. CFI = .89, RMSEA = .07). My question is whether there is anything wrong with this. I know this is sometimes called "empirical underidentification", but with no problems with convergence (etc.) do I need to worry? In the case that I do need to worry, would you recommend that I test to see how many parameters I can (tenably) constrain to equality across groups, such that I can (perhaps) get the number of free parameters under 49 (n of smallest group)? Maybe I can impute the loadings and residuals for my latent variables by obtaining them from a related CFA to get the number of free parms down as well? (I already know that measurement invariance (loadings) is tenable across groups.) Last thing: I am using the MLR estimator in this case, which I believe to be the best for a small-sample SEM with continuous indicators. Is this accurate, or should I be using another estimator? Thanks much. James 


At a minimum you need several more observations in a group than you have parameters in the group. If your Likert variables have floor or ceiling effects, you should declare them as categorical. You can use either maximum likelihood or weighted least squares in this case. 


Thanks. When you say several more parameters "in a group" I assume you mean the number of parms estimated for that group and not overall (e.g. 80 parameters being estimated overall, but 40 parms for each group)?? Another option I have to increase N is to include more observations that have missing data on 2 of the 3 waves. But this makes it such that some of my indicators will have up to 50% missing data at waves 2 & 3. Can FIML handle this? Is there any indication, perhaps a reference, of how much missing data is allowable, or how much FIML (or any other imputation method) can handle? 


You need to compare the number of free parameters in a group to the number of observations in the group. You can see this in TECH1. The only way to really understand your data is to do a simulation study. To me 50% missing is not a good thing. You might want to consider a simpler model for these data. 


Thank you! 


Hello, I am estimating two SEM models similar in all respects except in the final outcome (dependent) variable. In model 1 this variable is continuous. In Model 2 it is categorical (3x categ). By default, MPLUS estimates the first model with ML and the second with WLSMV. I also know that the final outcome categorical variable (in model 2) has four missing cases. However, when I estimate the models, the difference in the number of observations between model 1 & 2 is 16 (not four). How is this possible? 


Please send the two outputs and your license number to support@statmodel.com. 

alia aishah posted on Wednesday, April 04, 2012  11:15 am



When I run my path analysis, this message comes up: THE MINIMUM COVARIANCE COVERAGE WAS NOT FULFILLED FOR ALL GROUPS. CATEGORICAL VARIABLE ATLEASTO HAS ZERO OBSERVATIONS IN CATEGORY 1. I noticed that the Mplus output showed that my outcome variable had 4 categories, but in truth it is a binary variable. What do I do? Also, I used the USEOBS command, which means I only used a subsample with no missing data, but the output said that there were missing data for x, and the number of observations noted in the output did not tally with my subset. How do I solve this? Am I missing something? 


It sounds like you are not reading your data correctly. This could be caused by blanks in the data set. If you can't figure it out, please send your input, data, output, and license number to support@statmodel.com. 

Chie Kotake posted on Tuesday, April 29, 2014  7:01 pm



Hi! I'm new to MPlus. I'm doing a multi-group analysis with 5 groups. Unfortunately, 2 of the groups are quite small (24 and 27). I am trying to test for configural invariance for a model that includes one construct (with 4 indicators) and a manifest outcome variable. I'm testing each group separately to see if the model actually works, and for the group of 24 participants, I get the error about the standard errors not being trustworthy. From reading the forums, could it be due to my small sample size? Even after fixing the problematic parameter, I keep getting an error. 1) Am I correct that the issue is my small sample size? The model actually works with my other group with 40. 2) Is there a way to use this group and still make the multi-group analysis work? Thank you! 


Your samples are very small. Please send the output with the error and your license number to support@statmodel.com. 


I am using the MPlus 7.3 demo version to run the code in Muthén, L.K. & Muthén, B.O. (2002). How to use a Monte Carlo study to decide on sample size and determine power. Structural Equation Modeling, 4, 599-620. However, I am getting error messages. For example, gclasses in the original code didn't work. So I changed it to genclasses, but that didn't work either. Also, I am unable to find the complete outputs from the above study on the website. Thanks in advance for your help! 


What error message do you get? GENCLASSES is the option name. 


Here is the error message: *** ERROR in MONTECARLO command The number of classes must be specified with each categorical latent variable in GENCLASSES option. 1 


The GENCLASSES option is specified as GENCLASSES = c1 (2) c2 (3) c3 (4); It sounds like you are not doing that. 


Hello, Thank you for your previous response on GENCLASSES. I have another question that is more procedural related. In general, I am interested in sample size determination, so I ran the syntax for Example 12.5 in the User's Guide, but with a smaller sample size (N=50). This resulted in error messages. I increased the sample size incrementally until there were no more error messages in the Tech9 output (at N=350). From there, I checked bias, CI, and power. Is this an appropriate strategy? 


Well, it doesn't help you if you want to know about n=200, say. Perhaps the parameter value choices were too difficult to handle at smaller sample sizes. 

Cheng posted on Sunday, May 31, 2015  12:20 am



I am using Monte Carlo simulation to estimate sample size for a second-order CFA model. Is anything wrong with this input? For the chi-square test, the expected value is very different from the observed one (0.050 vs 0.000). The % Sig Coeff values under "model result" are all 0.0000.

MODEL POPULATION:
[X1-X19@0];
F1 BY X1-X7@0.75;
F2 BY X8-X12@0.75;
F3 BY X13-X15@0.75;
F4 BY X16-X19@0.75;
Stress BY F1@0.75 F3@0.75;
Recovery BY F2@0.75 F4@0.75;
F1-F4@1;
Stress@1;
Recovery@1;
X1-X19@0.30;
Stress WITH Recovery@0.60;
MODEL:
[X1-X19*0];
F1 BY X1-X7*0.75;
F2 BY X8-X12*0.75;
F3 BY X13-X15*0.75;
F4 BY X16-X19*0.75;
Stress BY F1*0.75 F3*0.75;
Recovery BY F2*0.75 F4*0.75;
F1-F4*1;
Stress*1;
Recovery*1;
X1-X19*0.30;
Stress WITH Recovery*0.60;
OUTPUT: TECH9;


Please send the output and your license number to support@statmodel.com. 

cecilia posted on Monday, February 05, 2018  12:25 pm



Dear, I commonly use SEM (or path analysis) in the framework of psychological or social problems. Now, a colleague who works in Psychobiology showed me some papers where path analysis is used with really very small samples (n < 20) (see e.g. https://www.ncbi.nlm.nih.gov/pubmed/24933661). I have also seen other applications of SEM with small samples: https://www.ncbi.nlm.nih.gov/pubmed/9408041 https://www.ncbi.nlm.nih.gov/pubmed/1884204 I would like to know what you think of and suggest about these types of applications. Thanks! 


If you don't have repeated measures, N=20 samples don't sound like they could be very useful for SEM analysis; for one, there can't be much power to reject models. Check with SEMNET as well. 

Jinxin ZHU posted on Thursday, June 14, 2018  7:05 pm



Is there any rule of thumb about the requirement of sample size for path analysis (no latent trait was specified)? For instance, I have a very complex model with 150 parameters to be estimated, whereas each dependent variable has only 3 independent variables. The sample size is 200. The standard error seems fine. I am wondering whether it is problematic to have so many parameters with such a small sample size. 


It depends too much on the specific model. You can do a Monte Carlo simulation to get a better feel for it. See our book Regression and Mediation Analysis using Mplus, Chapter 3. 

Xu, Man posted on Wednesday, July 11, 2018  11:36 am



Dear Drs. Muthen, I am trying an SEM approach on a small sample (n=100) for an experimental study. There are a few latent variables, each with 3 items, so the total number of parameters approaches the sample size. However, the model converges and fit is fine. Now the issue is, I know there are some rules of thumb for sample size such as 5 cases per parameter, but there also seem to be some papers indicating sample size is not so much an issue for parameter estimation as for overall fit indices. Also, a quick Monte Carlo simulation analysis does not seem to indicate severe power problems. Another approach may be to estimate factor scores for each factor separately, then use them in a path analysis to estimate the experimental effect. However, do you think it is generally acceptable if one just goes for the full SEM approach? Thanks for your suggestions! Kate 


Seems ok to do the full SEM if you can show a relevant Monte Carlo run that supports good estimates. I assume the SEs won't be great and the fit performance worse. 
