Sample size
 Daniel posted on Wednesday, February 16, 2005 - 10:24 am
I'm running an analysis type=meanstructure, with option "missing" added, and some covariates. What determines the sample size? I ask this question because my sample size is inflated above the true sample size. My four repeated measures have sample sizes of s9, n=1115; s10, n=1068; s11, n=1043; s12, n=1002. However, my sample with the option "missing" is 1133.
 Linda K. Muthen posted on Wednesday, February 16, 2005 - 10:40 am
The sample size should be the total number of observations. I would have to see the output and data to understand what is going on. Please send them. You may be reading your data incorrectly.
 Linda K. Muthen posted on Wednesday, February 16, 2005 - 12:06 pm
The data set you sent has 1143 observations. Seven cases were eliminated because all variables to be used in the analysis had missing data. This results in 1136 cases being used in the analysis. You have 26 variable names in the NAMES statement and 27 variables in the data set. Perhaps you are not using the data that you mean to be using.
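The counting rule described in this reply can be sketched in Python. This is only an illustration on a made-up toy data set, not Mplus itself: it assumes, as the reply states, that a case is dropped only when every analysis variable is missing, and the `MISSING` marker and function names are hypothetical.

```python
# Toy illustration (made-up data): the analysis N can exceed every
# per-variable n when cases with partial data are kept.
MISSING = None  # hypothetical missing-value marker for this sketch


def count_analysis_cases(rows):
    """Count rows that have at least one non-missing analysis variable."""
    return sum(1 for row in rows if any(v is not MISSING for v in row))


def per_variable_n(rows):
    """Count the non-missing values in each column (variable)."""
    return [sum(1 for v in col if v is not MISSING) for col in zip(*rows)]


rows = [
    (1.0, 2.0, None, None),
    (None, 1.5, 2.5, None),
    (None, None, 3.0, 4.0),
    (2.0, None, None, 1.0),
    (None, None, None, None),  # all variables missing: dropped
]

print(per_variable_n(rows))        # [2, 2, 2, 2]
print(count_analysis_cases(rows))  # 4

# The arithmetic from the reply: 1143 observations minus 7 all-missing cases.
print(1143 - 7)  # 1136
```

In the toy data each variable has only 2 observed values, yet 4 cases enter the analysis, which mirrors how the reported sample size can be larger than the n for any single repeated measure.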
 Anonymous posted on Monday, May 30, 2005 - 8:17 pm
If I have non-normal data but a very large sample size (>9000), is it OK to use MLE?
I found that: "GLS (generalized least squares) is the second most popular method after MLE. GLS works well for large samples (n>2500) even for non-normal data." Thank you.
 bmuthen posted on Monday, May 30, 2005 - 8:25 pm
I answered the first part of the question earlier. Conventional GLS is not robust to non-normality. The so-called ADF version of GLS is robust to non-normality, but it needs very large samples for this robustness to come into effect. "ADF" is obtained using the Mplus WLS estimator with continuous outcomes.
 Anonymous posted on Friday, July 15, 2005 - 1:23 pm
I have seen references to 10/1 and 5/1 ratio guidelines for "sample size" adequacy in SEM (among other discussions of the issue). Do these ratios refer to sample size/# parameters in the measurement and structural models, to degrees of freedom/# parameters in the measurement and structural models, or to some other ratio?

For example, suppose a paper is using a sample of 180 and is estimating a model with 25 indicators of 6 latent variables in the structural model. If the output indicates 80 parameters are being estimated, is the relevant ratio 180/80 = 2.25 or 325/80 = 4.06, where 325 = (25*26)/2.

Do you have a good reference with a straightforward discussion of this issue?

Thank you.
 Linda K. Muthen posted on Friday, July 15, 2005 - 6:32 pm
I believe these refer to the number of observations per parameter in the model. So for a model with 80 parameters, using 10 observations per parameter would require 800 observations. I don't think these rules of thumb have been studied extensively, and they probably don't give a very good estimate of the necessary sample size because this depends on the model and the data. In the following paper, Bengt and I suggest a way to determine sample size using a Monte Carlo study that is tailored to your model and data:

Muthén, L.K. & Muthén, B.O. (2002). How to use a Monte Carlo study to decide on sample size and determine power. Structural Equation Modeling, 9(4), 599-620.

This paper is available on the Mplus website.
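For readers who want the arithmetic from this exchange in one place, here is a minimal Python sketch. It only reproduces the numbers quoted in the question and the reply; the function names are hypothetical, and the rules of thumb themselves carry the caveats given above.

```python
def unique_moments(p):
    """Number of distinct variances and covariances among p observed variables."""
    return p * (p + 1) // 2


def required_n(n_params, obs_per_param):
    """Sample size implied by an observations-per-parameter rule of thumb."""
    return n_params * obs_per_param


n_obs = 180        # sample size in the example from the question
n_params = 80      # free parameters reported in the output
n_indicators = 25  # observed indicators

# The two candidate ratios raised in the question.
print(n_obs / n_params)                         # 2.25
print(unique_moments(n_indicators))             # 325
print(unique_moments(n_indicators) / n_params)  # 4.0625

# The reply reads the rule as observations per parameter.
print(required_n(n_params, 10))  # 800
print(required_n(n_params, 5))   # 400
```

Under the observations-per-parameter reading, the example's ratio is 2.25, well below both the 5/1 and 10/1 guidelines.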
 Carillon J Skrzynski posted on Saturday, October 22, 2016 - 8:21 am

I have a dataset I'm running analyses on where only 660 people out of ~710 have information on the outcome variable, but there are 664 observations reported in the output. I believe this is possible due to the default in Mplus to estimate the model under missing data theory using all available data (which is stated throughout the manual), but I was wondering where I could find a more detailed explanation of this for reporting purposes?

Thanks so much,
 Bengt O. Muthen posted on Saturday, October 22, 2016 - 8:37 am
A detailed explanation is given in Chapter 10 of our new book:

To explain why you get 664 in your case, send the output from your run and a Type=Basic run for those variables to Support along with your license number.
 Lauren Molloy Elreda posted on Tuesday, September 05, 2017 - 10:03 am

I used USEOBSERVATIONS to select cases for an MLM analysis in Mplus (so as to include only consented students, only classrooms with 70% participation rates, and only classrooms with at least one participating English language learner student).

Here is the useobservations syntax that I used, in case it is helpful:

USEOBSERVATIONS ARE consent eq 1 and fsgte70 eq 1 and numELp1 gt 0 and numELp3 gt 0;

The Mplus output shows the resulting sample for this analysis is 501. However, when I use these exact same selection criteria in SPSS and SAS, the sample size meeting these criteria is only 421. Could you help me understand why this discrepancy might be happening?

Thanks in advance!
 Bengt O. Muthen posted on Tuesday, September 05, 2017 - 1:44 pm
Please send a pdf of the SAS output, your data, and your Mplus output to Mplus Support along with your license number.
 Anshuman Sharma posted on Monday, September 10, 2018 - 12:26 am
Hi Dr Muthen,

This is regarding your paper Muthén, L.K. & Muthén, B.O. (2002). How to use a Monte Carlo study to decide on sample size and determine power. Structural Equation Modeling, 9(4), 599-620. The paper is beautifully written and very informative.

I have the following questions:
1) Can I calculate power if I have only one factor in CFA? As mentioned in the paper "The focus of the power investigation in the CFA model is the factor correlation."

2) How should I select the population values in an MC simulation if there is little or no literature to provide them? Any reference regarding this would be of great help.

Thanks and regards,
 Bengt O. Muthen posted on Monday, September 10, 2018 - 2:47 pm
1) You need to first decide which parameter the power is for: The power to reject that a parameter is zero.

2) Draw on real-data analyses.
 Anshuman Sharma posted on Monday, September 10, 2018 - 6:11 pm
Thank you so much Dr Muthen.

I am estimating an SEM model with a small sample size (70).

Could you please suggest how I can ensure that the estimated parameters are robust and reliable?

Please note that I am estimating simple models, e.g., one latent factor with 5 indicators, where the latent factor is regressed on one predictor, say age.
 Bengt O. Muthen posted on Tuesday, September 11, 2018 - 2:43 pm
One way is to do a Monte Carlo study for the particular model and sample size that you have. See our UG examples, which have Monte Carlo counterparts on our website. See also the Muthén & Muthén (2002) article on MC on our website.
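The logic of such a Monte Carlo study can be sketched in plain Python. This is not Mplus and not the factor model from the question: as a deliberately simplified stand-in, it simulates a single regression slope with n = 70 and counts how often a z-test rejects the hypothesis that the slope is zero. The population slope (0.3), the unit residual variance, the number of replications, and the seed are all assumptions chosen for illustration.

```python
import random
from statistics import NormalDist


def simulate_rejection_rate(n, slope, reps=500, alpha=0.05, seed=12345):
    """Proportion of replications in which H0: slope = 0 is rejected.

    Simplified stand-in for a Monte Carlo power study: data come from
    y = slope * x + e with x and e standard normal (assumed values).
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided z critical value
    rejections = 0
    for _ in range(reps):
        x = [rng.gauss(0, 1) for _ in range(n)]
        y = [slope * xi + rng.gauss(0, 1) for xi in x]
        mx = sum(x) / n
        my = sum(y) / n
        sxx = sum((xi - mx) ** 2 for xi in x)
        sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
        b = sxy / sxx                    # OLS slope estimate
        a = my - b * mx                  # OLS intercept estimate
        rss = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
        se = (rss / (n - 2) / sxx) ** 0.5  # standard error of the slope
        if abs(b / se) > z_crit:
            rejections += 1
    return rejections / reps


# Estimated power at the sample size from the question (assumed slope 0.3).
print(simulate_rejection_rate(n=70, slope=0.3))
# Rejection rate when the true slope is zero, which should be near alpha.
print(simulate_rejection_rate(n=70, slope=0.0))
```

Rerunning the function over a grid of sample sizes shows how power changes with n. A study of the actual model would replace the data-generating step and the test with the model and parameter of interest, which is what the Mplus Monte Carlo facilities do.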