
Daniel posted on Wednesday, February 16, 2005  10:24 am



Hi, I'm running an analysis with type=meanstructure, the "missing" option added, and some covariates. What determines the sample size? I ask because the reported sample size is larger than what I take to be the true sample size. My four repeated measures have sample sizes of s9, n=1115; s10, n=1068; s11, n=1043; s12, n=1002. However, my sample size with the "missing" option is 1133. 


The sample size should be the total number of observations. I would have to see the output and data to understand what is going on. Please send them to support@statmodel.com. You may be reading your data incorrectly. 


The data set you sent has 1143 observations. Seven cases were eliminated because all variables to be used in the analysis had missing data. This results in 1136 cases being used in the analysis. You have 26 variable names in the NAMES statement and 27 variables in the data set. Perhaps you are not using the data that you mean to be using. 
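The case count described in the reply above can be sketched in a few lines of Python. This is only an illustration of the rule as stated in the reply (a case is dropped only when every analysis variable is missing), not Mplus code; the function name `count_analysis_cases`, the toy rows, and the use of `None` as the missing-value marker are assumptions made for the example.

```python
def count_analysis_cases(rows, analysis_vars):
    """Return (kept, dropped) under the rule described in the reply:
    a case is dropped only if ALL analysis variables are missing."""
    kept = dropped = 0
    for row in rows:
        if all(row.get(var) is None for var in analysis_vars):
            dropped += 1
        else:
            kept += 1
    return kept, dropped


# Hypothetical toy data; None marks a missing value.
rows = [
    {"s9": 3.1, "s10": None, "s11": 2.8, "s12": None},    # partly observed: kept
    {"s9": None, "s10": None, "s11": None, "s12": None},  # all missing: dropped
    {"s9": None, "s10": 4.0, "s11": None, "s12": None},   # one value observed: kept
]

kept, dropped = count_analysis_cases(rows, ["s9", "s10", "s11", "s12"])
print(kept, dropped)
```

Under this rule the analysis sample can exceed the sample size of any single repeated measure, because a case missing on one wave still counts as long as it is observed on another.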

Anonymous posted on Monday, May 30, 2005  8:17 pm



If I have nonnormal data but a very large sample size (>9000) am I ok if using MLE? I found that: "GLS (generalized least squares) is the second most popular method after MLE. GLS works well for large samples (n>2500) even for nonnormal data." Thank you 

bmuthen posted on Monday, May 30, 2005  8:25 pm



I answered the first part of the question earlier. Conventional GLS is not robust to nonnormality. The so-called ADF version of GLS is robust to nonnormality, but it needs very large samples for this robustness to take effect. "ADF" is obtained using the Mplus WLS estimator with continuous outcomes. 

Anonymous posted on Friday, July 15, 2005  1:23 pm



I have seen references to 10/1 and 5/1 ratio guidelines for "sample size" adequacy in SEM (among other discussion of the issue). Do these guidelines refer to sample size/# parameters in the measurement and structural models, to degrees of freedom/# parameters in the measurement and structural models, or to some other ratio? For example, suppose a paper is using a sample of 180 and is estimating a model with 25 indicators of 6 latent variables in the structural model. If the output indicates 80 parameters are being estimated, is the relevant ratio 180/80 = 2.25 or 325/80 = 4.06, where 325 = (25*26)/2? Do you have a good reference with a straightforward discussion of this issue? Thank you. 


I believe these refer to the number of observations per parameter in the model. So for a model with 80 parameters, using 10 observations per parameter would require 800 observations. I don't think these rules of thumb have been studied extensively, and they probably don't give a very good estimate of the necessary sample size because this depends on the model and the data. In the following paper, Bengt and I suggest a way to determine sample size using a Monte Carlo study that is tailored to your model and data: Muthén, L.K. & Muthén, B.O. (2002). How to use a Monte Carlo study to decide on sample size and determine power. Structural Equation Modeling, 9(4), 599-620. This paper is available on the Mplus website. 
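The arithmetic in this exchange can be checked with a short Python sketch. The helper names `ratio_summary` and `required_n` are hypothetical and introduced only for this example; the numbers (180 observations, 25 indicators, 80 parameters) come from the question above. It illustrates the rule of thumb only and says nothing about whether the rule is adequate for a given model.

```python
def ratio_summary(n_obs, n_indicators, n_params):
    """Compute the two candidate ratios raised in the question."""
    # Unique variances and covariances among the indicators: p(p + 1) / 2.
    n_moments = n_indicators * (n_indicators + 1) // 2
    return {
        "n_moments": n_moments,
        "obs_per_param": n_obs / n_params,          # the ratio the reply endorses
        "moments_per_param": n_moments / n_params,  # the alternative in the question
    }


def required_n(n_params, obs_per_param):
    """Sample size implied by an observations-per-parameter rule of thumb."""
    return n_params * obs_per_param


summary = ratio_summary(n_obs=180, n_indicators=25, n_params=80)
print(summary["n_moments"])                    # 325
print(summary["obs_per_param"])                # 2.25
print(round(summary["moments_per_param"], 2))  # 4.06
print(required_n(80, 10), required_n(80, 5))   # 800 400
```

By the observations-per-parameter reading, the example sample of 180 falls well short of both the 10/1 and the 5/1 guideline.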


Hello, I have a dataset I'm running analyses on where only 660 people out of ~710 have information on the outcome variable, but there are 664 observations reported in the output. I believe this is possible due to the default in Mplus to estimate the model under missing data theory using all available data (which is stated throughout the manual), but I was wondering where I could find a more detailed explanation of this for reporting purposes? Thanks so much, Cari 


A detailed explanation is given in Chapter 10 of our new book: http://www.statmodel.com/Mplus_Book.shtml To explain why you get 664 in your case, send the output from your run and a Type=Basic run for those variables to Support along with your license number. 


Hello, I used USEOBSERVATIONS to select cases for an MLM analysis in Mplus (so as to include only consented students, only classrooms with 70% participation rates, and only classrooms with at least one participating English language learner student). Here is the USEOBSERVATIONS syntax that I used, in case it is helpful: USEOBSERVATIONS ARE consent eq 1 and fsgte70 eq 1 and numELp1 gt 0 and numELp3 gt 0; The Mplus output shows that the resulting sample for this analysis is 501. However, when I use these exact same selection criteria in SPSS and SAS, the sample size meeting these criteria is only 421. Could you help me understand why this discrepancy might be happening? Thanks in advance! Lauren 


Please send a pdf of the SAS output, your data, and your Mplus output to Mplus Support along with your license number. 
