Hello, I have been working with very large government databases that include treatment episodes for 100,000 individuals over many time points. I am interested in using Mplus to explore trajectories of treatment utilization over time in a database this large. Can Mplus handle data sets of this size?
The number of observations will not pose a problem. How many time points do you want to model?
Kurt Beron posted on Wednesday, September 08, 2010 - 12:45 pm
I did not see other posts on this topic, so I wanted to ask a related question. I have a data set with approximately 2 million observations spread over roughly 8 time periods (depending on how they are cut). These observations are in turn nested within a grouping structure of about 8,000 groups.
I have been able to use Stata with these data, but I would like to use some of Mplus's features (ideally a multilevel SEM, but realistically some subset of it).
Is this possible with the current version, given the right Windows configuration? (I assume no server version is available.)
There is no limit on the number of observations. The only constraint is computing time, in cases where the raw data are needed in the iterations. The Monte Carlo simulation example below took only 28 seconds, and running it as "real" data took 2.5 minutes. You can try your own hypothetical model this way.
An Mplus Linux version is forthcoming this fall.
montecarlo:
  names are y1-y8;
  nobservations = 2000000;   ! 8000 clusters x 250 members = 2,000,000
  ncsizes = 1;               ! one distinct cluster size
  csizes = 8000(250);        ! 8000 clusters of size 250
  save = ex9.1.dat;          ! keep the generated data for a "real" data run
ANALYSIS: TYPE = TWOLEVEL;
model population:
  %within%
  e by y1-y8*1; e@1; y1-y8*1;
  %between%
  eb by y1-y8*1; eb@1; y1-y8*1;
model:
  %within%
  e by y1-y8*1; e@1; y1-y8*1;
  %between%
  eb by y1-y8*1; eb@1; y1-y8*1;
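A minimal sketch of the second step, analyzing the saved file ex9.1.dat as if it were real data, might look like the input below. In the Monte Carlo setup, csizes = 8000(250) requests 8,000 clusters of 250 observations each, which reproduces the 2,000,000 total (8,000 x 250). The cluster variable name clus is hypothetical, and the column order (y1-y8 followed by the cluster identifier) is an assumption; check the saved-data description in the Monte Carlo output before relying on it.

```
TITLE:    Two-level factor model fit to the saved file ex9.1.dat
DATA:     FILE = ex9.1.dat;
VARIABLE: NAMES = y1-y8 clus;  ! assumed order: y1-y8, then cluster ID
          CLUSTER = clus;      ! clus is a hypothetical name
ANALYSIS: TYPE = TWOLEVEL;
MODEL:
  %WITHIN%
  e BY y1-y8*1;     ! free all eight within-level loadings
  e@1;              ! fix the factor variance to 1 for identification
  %BETWEEN%
  eb BY y1-y8*1;    ! free all eight between-level loadings
  eb@1;
```

Residual variances are estimated by default at both levels, so they are not listed here. The timing difference quoted above comes from this step: the real-data run works from the raw data in the iterations, whereas the Monte Carlo run does not need to in the same way.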