

Data simulation in Mplus


ffan posted on Wednesday, November 20, 2013  7:36 pm



Hi everyone,

I am a new Mplus user. Is anybody familiar with simulating data in Mplus, and could you recommend some resources to read for my project?

First I want to simulate item responses to 40 items, all multiple choice. Then I want to simulate 2 conditions.

Condition 1 (structural equivalence): for both groups, 36 items load on dimension 1 and 4 items load on dimension 2.

Condition 2 (structural nonequivalence):
1st condition: for both groups, 36 items load on dimension 1 and 4 items load on dimension 2.
2nd condition: for group 1, 36 items load on dimension 1 and 4 items load on dimension 2, as in condition 1. For group 2, 36 items load on dimension 1 (2 of these 36 items come from the original dimension 2) and 4 items load on dimension 2 (2 items come from the original 36, and 2 are from the original 4).

Factor correlation: 0.8

Thank you!
Fen


That should be no problem in Mplus. See Chapter 12 of the Version 7 User's Guide and also the Muthén & Muthén (2002) article referred to in the User's Guide.

ffan posted on Monday, November 25, 2013  3:04 am



Thank you, Bengt. I followed the example in the Version 7 User's Guide and ran the following syntax, but I have a few questions.

MONTECARLO:
  NAMES ARE y1-y40;
  NOBSERVATIONS = 2000;
  NREPS = 50;
  SEED = 4533;
  GENERATE = y1-y40(1);
  SAVE = simulated_data_5.dat;
MODEL POPULATION:
  f1 BY y1-y36*.25;
  f2 BY y37-y40*.5;
  f1-f2*.5;
  f1 WITH f2*.8;
  y1-y40*.5;

1. In MODEL POPULATION, I found that the estimation starting values such as 0.25 and 0.5 influence my factor correlation: if I choose different starting values, r = 0.8 will not run, and the output indicates that the matrix is not positive definite. Could you give me more information about the starting values and their effect on the data matrix or the factor correlation?
2. To specify multiple groups, what should I add to the commands?
3. If I want to specify factor-indicator loadings for all 40 items (say f1-y1 0.3, f1-y2 0.5, etc.), what commands should I use?

Thank you very much.
Fan


1. The values given in MODEL POPULATION are not starting values; they are population parameter values. Please read Example 12.1.
2. See the Monte Carlo counterpart of Example 5.16.
3. This is what the following does: f1 BY y1-y36*.25; f2 BY y37-y40*.5;
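As an illustration of points 1 and 3 (a sketch, not part of the original replies): because the values in MODEL POPULATION are population parameters, together they have to describe a valid covariance matrix. With factor variances of .5, a factor covariance of .8 implies a correlation of .8 / sqrt(.5 x .5) = 1.6, which is not a valid correlation and may be what triggers the "not positive definite" message. One way to obtain a factor correlation of .8 is to set both factor variances to 1, so that the covariance equals the correlation. The per-item loadings for y1 and y2 below are hypothetical values chosen only to show the syntax, and only the factor part of the model is shown.

```mplus
MODEL POPULATION:
  ! hypothetical per-item loadings for y1 and y2;
  ! the remaining items take the value given for their range
  f1 BY y1*.3 y2*.5 y3-y36*.25;
  f2 BY y37-y40*.5;
  ! factor variances of 1, so the covariance below is also the correlation
  f1-f2@1;
  f1 WITH f2*.8;
```

If the factor variances are kept at .5 instead, the covariance would have to be .8 x .5 = .4 to give the same correlation.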

ffan posted on Tuesday, November 26, 2013  3:49 am



Thank you, Linda. The example you pointed out was helpful, but I am still not sure whether I have everything correct, and I would like to double-check with you.

[f1-f2@0]; f1-f2@1;
Do these two lines specify the distribution of factors 1 and 2, with a mean of 0 and a standard deviation of 1?

f1 BY y1-y36*.25; f2 BY y37-y40*.5;
Do these two lines specify the factor-indicator loadings?

f1-f2*.5;
Does this specify the variance of f1-f2 to be 0.5?

f1 WITH f2*.8;
Does this specify the covariance of f1 and f2 to be 0.8?

y1-y40*.5;
Does this specify the error variance of y1-y40 to be 0.5?

I still have several questions:
1. Apart from fixing the first indicator of each factor with @1, I assume we free all the other parameters, and that is why we mostly use * instead of @ in MODEL POPULATION?
2. What does f ON x1*1 x2*0.3 in the example mean?
3. As I am simulating data for a MIRT model, how can I incorporate the value range of the item parameters in the simulation? Say I want to specify that the b parameters range from -2 to 2, or that the c parameters range from 0 to 0.25, etc.

Thank you very much.
Fan


1. In MODEL POPULATION, * and @ have the same meaning. In the MODEL command, they do not.
2. It gives population parameter values for the regression coefficients.
3. We do not have c. You cannot give a range of values, only a single value for each parameter.
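One possible way to work within the single-value rule in point 3, sketched under assumptions that are not stated in the thread: choose the item values yourself so that they cover the intended range, and write one value per parameter. Assuming binary items generated with a logit link and a factor with mean 0 and variance 1, a 2PL item with discrimination a and difficulty b corresponds to a loading of a and a threshold of a x b. The three items, the common loading of .5, and the b values below are hypothetical.

```mplus
MODEL POPULATION:
  f1 BY y1-y3*.5;   ! a = .5 for each item (hypothetical)
  f1@1;             ! factor variance of 1
  [y1$1*-1];        ! b = -2, threshold = .5 x -2 = -1
  [y2$1*0];         ! b =  0, threshold = 0
  [y3$1*1];         ! b =  2, threshold = .5 x 2 = 1
```

For 40 items, the b values could be drawn or spaced across the range outside Mplus and then pasted in as one threshold statement per item. Under a probit link the conversion differs, so the link assumption should be checked against the User's Guide before relying on these numbers.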


