Dan Bauer posted on Friday, August 11, 2000 - 8:34 am
I'm interested in running a simulation study to see how well mixture models will recover the structure of a population under various conditions. It's not clear to me whether Mplus can generate data for this purpose. My first thought is to (1) write a Monte Carlo statement that simulates data from multiple groups; and (2) use the Analysis and Model commands to specify a mixture model. Would that be correct? Is there a more direct way to do this?
Mplus does not currently do Monte Carlo for mixtures. You can generate data outside the program. Or, you can do what you suggest - generate data from multiple groups in which case the Monte Carlo option allows you to print out the data from the first replication. In either case, Mplus will have to be run separately for each data set.
I have conducted a similar study using MECOSA, and the next phase of the study will be to compare those results with those generated by Mplus.
I have submitted a paper for publication, but if you are interested in a copy, let me know.
Dan Bauer posted on Tuesday, May 15, 2001 - 7:44 am
I'm trying to use your RUNALL utility to work with simulated data files that I generated externally. I followed the instructions given for setting the environment variables in RUNSTART.DAT, and compared my setup to the example you provided. Everything looks okay, but when I run the utility from the command prompt I get the error messages:
'RUNONE.BAT' is not recognized as an internal or external command, operable program or batch file. (repeated 100 times, once for each data file)
'RUNEND.BAT' is not recognized as an internal or external command, operable program or batch file. (once at the end)
I didn't modify either RUNONE or RUNEND, so I'm puzzled that it doesn't recognize them. I'd appreciate any suggestions you have for solving this problem. Thanks in advance.
Thuy Nguyen posted on Wednesday, May 16, 2001 - 9:56 am
It sounds like RUNONE.BAT and RUNEND.BAT are not located in a directory that is listed in your PATH environment. In an MS-DOS window, type SET to see all the settings for all environment variables. The directory containing Mplus should be listed in your PATH environment. This directory should also be the directory in which you should place all your RUN*.BAT files.
If this doesn't help, can you give us more information about all the directories involved, i.e., where Mplus is installed, whether that directory is in the PATH environment, where the RUN*.BAT files are located, the directory you are running RUNALL in, etc.?
Hanno Petras posted on Wednesday, October 23, 2002 - 2:43 pm
Dear Linda & Bengt,
I am conducting a Monte Carlo simulation to determine the power to detect a class-specific intervention impact on the slope in a three-class growth mixture model. While I learned how to determine the effect size of an intervention impact on a distal outcome in a growth mixture model, I am unclear about the effect size of the intervention impact on a class-specific slope. Any hints would be greatly appreciated.
bmuthen posted on Wednesday, October 23, 2002 - 5:50 pm
That's a good question. The Muthen-Curran (1997) article in Psych Methods struggled with the same thing. One can certainly define effect size of a slope by dividing the slope mean difference with the slope SD. But this doesn't get to the tangible observed-data level. So in MC we settled on how the slope mean difference affected the outcome mean difference divided by its SD. This then calls for deciding on the time point that you want to evaluate power for.
Hanno Petras posted on Thursday, October 24, 2002 - 12:57 pm
Thank you for the response, Bengt. Given the class-specific slope and the intervention impact on that slope, it is fairly easy to draw the two trajectories (control, intervention) and compute the differences between the time-specific means. However, I am not so sure which SD to use. The MC output provides the SD for the slope as well as for the intervention impact on the slope. My guess would be to use the SD of the intervention impact on the slope. In my example, the mean difference at the last time point is 0.5 and the SD of the intervention impact on the slope is 0.1329. Since I am simulating data, I do not really know the overall SD for the group in that class. Dividing the mean difference by that SD gives 3.76. Based on Cohen's measure of nonoverlap (U) this would be a large effect. Is this correct? I am looking forward to your response.
bmuthen posted on Friday, October 25, 2002 - 9:30 am
To get the model-estimated SD for time specific outcome means, you should look at the formula for how the model estimates this outcome and get the SD from that formula.
For example, at time 1, where y1 = i + e1, the situation is simple because y1 is not influenced by the slope. Here the model-estimated variance is V(i) + V(e). For later time points, you need to involve the slope(s), which may covary with the intercept. A simple, approximate approach is to just use the sample SD for the time point.
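The variance formula described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a linear growth model y_t = i + t*s + e_t within one class, with time score t = 0 at the first time point, and population values that are hypothetical rather than taken from this thread.

```python
import math

def outcome_sd(t, var_i, var_s, cov_is, var_e):
    """Model-implied SD of y_t = i + t*s + e_t for a linear growth model.

    V(y_t) = V(i) + t^2 * V(s) + 2 * t * Cov(i, s) + V(e_t)
    """
    variance = var_i + (t ** 2) * var_s + 2 * t * cov_is + var_e
    return math.sqrt(variance)

def effect_size(mean_diff, sd):
    """Outcome mean difference divided by the outcome SD at that time point."""
    return mean_diff / sd

# Hypothetical population values (assumptions, not from the thread).
params = dict(var_i=1.0, var_s=0.2, cov_is=0.1, var_e=0.5)

sd_first = outcome_sd(t=0, **params)  # slope drops out: V(i) + V(e)
sd_last = outcome_sd(t=3, **params)   # slope variance and covariance enter

# Effect size for a mean difference of 0.5 at the last time point.
d_last = effect_size(0.5, sd_last)
print(round(sd_first, 3), round(sd_last, 3), round(d_last, 3))
```

The point of the sketch is that the denominator is the SD of the outcome at the chosen time point, not the standard deviation of the estimated intervention effect.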
Anonymous posted on Wednesday, September 29, 2004 - 11:38 pm
I am using Mplus 3 to conduct a simulation study on mixture modeling. Is it possible to specify the percentages for the mixtures? Say 70% for the 1st mixture and 30% for the 2nd mixture. Thanks!
bmuthen posted on Thursday, September 30, 2004 - 10:57 am
You give logit starting values that correspond to the proportions that you want. As explained in Chapter 13 of the User's Guide, the logit is the log of the ratio of the two proportions.
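As a rough illustration of the conversion described above, the following Python sketch turns desired class proportions into logits, taking the last class as the reference class. The function name is hypothetical; the resulting value is what one would presumably supply as the population value for the categorical latent variable mean.

```python
import math

def class_logits(proportions):
    """Logits for classes 1..K-1, each relative to the last (reference) class."""
    if abs(sum(proportions) - 1.0) > 1e-9:
        raise ValueError("class proportions must sum to 1")
    reference = proportions[-1]
    return [math.log(p / reference) for p in proportions[:-1]]

# 70% in the first class, 30% in the second class.
logits = class_logits([0.70, 0.30])
print(logits)  # a single logit, ln(0.7 / 0.3), roughly 0.847
```

With more than two classes the same function returns one logit per non-reference class.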
Anonymous posted on Sunday, October 10, 2004 - 2:15 am
I am trying to conduct a simulation study on mixture modeling with Mplus 3. Although I know that it is possible to generate data with different mixtures, I want to generate data based on multiple groups and then test the data with mixture models. Mplus complained that mixture models are not allowed with multiple-group analysis. Are there any ways to handle this issue?
A second question is that Mplus saves data from the first replication only. Is it possible to save data from all replications for further analysis? Thanks in advance!
Multiple group analysis with mixture models is carried out using the KNOWNCLASS option. Examples 7.21 and 8.8 show how this option is used. The Monte Carlo counterparts of these examples, which come with Version 3, show how to generate data for these examples.
You can save data from all replications in Version 3. See the REPSAVE option of the MONTECARLO command.
Anonymous posted on Friday, October 15, 2004 - 10:56 pm
Thanks a lot for the reply. Could you tell me what are the differences between mixture modeling with known classes vs. multiple-group analysis? Or could you suggest some references in this topic?
bmuthen posted on Saturday, October 16, 2004 - 7:56 am
They amount to exactly the same analysis. Knowing the class membership is like observing group membership. It is just that in the mixture context, multiple-group analysis is arranged via a KNOWNCLASS approach.
Sorry to bother you again. I have a similar question to the one that was asked here on Wednesday, September 29, 2004 - 11:38 pm. I conducted a simulation study on a 3-class LCA model. I took the parameter estimates of my model as population values using the POPULATION = estimates3.dat; option in the MONTECARLO command. However, this seemed to work only for the threshold parameters but not for the class proportions, which were different in my MC model. How exactly can I fix the class proportions to equal the real data class proportions in the MC input? Thanks again.
You give logit starting values to the intercepts/means of the categorical latent variable in the overall part of the model.
For example, a logit of 0 would put 50 percent of the cases in each of the two classes. If you continue to have problems send your output to email@example.com.
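To check which class proportions a given set of logits implies, the conversion can be run in the other direction. The Python sketch below assumes the usual multinomial logit parameterization in which the last class is the reference class with logit 0; the function name is hypothetical.

```python
import math

def class_proportions(logits):
    """Class proportions implied by logits for classes 1..K-1 (last class = reference)."""
    exps = [math.exp(a) for a in logits] + [1.0]  # exp(0) = 1 for the reference class
    total = sum(exps)
    return [e / total for e in exps]

# A logit of 0 splits two classes evenly.
print(class_proportions([0.0]))

# Three classes: hypothetical logits for classes 1 and 2 relative to class 3.
print(class_proportions([math.log(2.0), math.log(3.0)]))
```

Comparing these implied proportions with the real-data class proportions is a quick way to see whether the values given in the overall part of the model are the intended ones.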
Anonymous posted on Tuesday, December 14, 2004 - 10:04 am
I did a simulation study using Mplus 2.14. The reviewers asked that I update this using Mplus 3, given possible improvements in estimation/optimization. I complied with this request, only to find a lower rate of convergence in Mplus 3 than Mplus 2.14. For instance, where I had 100% convergence before, the rate dropped to 90% with Mplus 3. I used the exact same script files, except that I turned off the random starts in Mplus 3 so it would be estimating from the same start values. Have any of the defaults changed between versions (e.g., convergence criteria) or can you think of any other reason to explain these results?
Thanks in advance for your help.
bmuthen posted on Tuesday, December 14, 2004 - 10:39 am
Anonymous posted on Tuesday, December 14, 2004 - 10:46 am
I generated the data myself rather than using the Monte Carlo feature in Mplus.
bmuthen posted on Tuesday, December 14, 2004 - 10:49 am
In your Version 3 run you used the new "external MC" feature, right, where you submit several data replications for an MC run to get summaries across these replications? If so, please send the Mplus input for that.
I'll just add my two cents here. If you are generating your own data, then it is not possible to do the same thing in 2.14 and 3 because you would have to use external Monte Carlo and that was not available in 2.14. So I suspect you have generated your data differently than in 2.14 where it was generated by Mplus and that this is the cause of your convergence problems.
Anonymous posted on Tuesday, December 14, 2004 - 11:10 am
My mistake -- I had omitted the STARTS command thinking this would turn off the random starts -- didn't realize it would do this by default. Turns out that it was the random start routine that was causing the increased rate of non-convergence. When I override this with STARTS=0 I once again get 100% convergence. Sorry for the trouble.
I am trying to do a Monte Carlo study using the results of one study as population values. With these data, I found two classes. When I do the Monte Carlo study specifying two classes, it works well. However, when I try to specify 3 classes (again using results from an analysis of the same data with 3 classes as population values), the program seems to get stuck in one of the replications. That is, several replications go well enough, then suddenly Mplus gets stuck. I tried changing the seed and using more starting values (STARTS = 50 10), but neither worked.
I was able to solve my last problem alone. However, I have a new problem. Still doing the same Monte Carlo analysis, I used an input that worked for 1500 subjects to evaluate the model with 2000 subjects. This new input does not work even though it is almost exactly the same as the last one that worked perfectly well. It computes and then suddenly stops and no output appears in Mplus. When I open the output, it contains only two lines after the input: INPUT READING TERMINATED NORMALLY then the title I gave
I am trying to simulate categorical variables with a specific probability of occurrence per category (I used the CUTPOINTS option with z scores). I then want to specify a model with continuous latent variables. When I use GENERATE, I get error messages indicating that I cannot use this option for y-variables. When I use only CUTPOINTS, I get the same message. Whether or not I use CATEGORICAL does not seem to change anything. Is there a way to simulate categorical y variables with specific endorsement probabilities and then specify a model for the relations between these variables? Thank you in advance.
I am writing an article using mixture modeling and doing Monte Carlo studies on the results of the application. One reviewer asked how we knew label switching was not a problem. One known solution to limit label switching is to constrain the classes so that the smallest (or the largest) is the first and so on to the last class. I would like to know if Mplus orders the classes according to class size and, if yes, in what order. Thank you very much in advance.
Mplus does not order the classes in any way. Label switching can be a problem. You need to give good starting values to avoid this.
sallua posted on Wednesday, February 28, 2007 - 8:13 pm
What is the appropriate use of the STSTART value in simulation research? For example, should a different STSTART value be used for each replication within a condition or should the same STSTART value be used within the same condition or should the same STSTART value be used across all replications and conditions?
I need to conduct a Monte Carlo study to determine if my sample size is adequate. Is there a syntax for conducting Monte Carlo studies using LCA. I know that Nylund & Muthen conducted a Monte Carlo for LCA. Is the syntax they used available?
All of the examples in Chapter 7 come with an input for their Monte Carlo counterpart. These are on the website and the Mplus CD. This is a good place to start for a Monte Carlo for an LCA. The files you allude to are not available.
I am conducting an external simulation for an LCA. I created 500 datasets in other software, and then analyzed the datasets in Mplus. In the data statement, I used "type=monte carlo" to do the Monte Carlo study, and used "savedata: results is results.txt" to save the results. But it seems only 498 datasets were computed successfully by Mplus, and Mplus gave results for just those 498 datasets. So I wonder if there is some way to determine which datasets were not computed successfully by Mplus.
I'm trying to conduct an external Monte Carlo study of FMA and LPA. Is it possible to obtain the information criteria corresponding to each individual data file and the LMR, and BLRT so that I can obtain hit rates for both methods?
The results option produces average fit statistics, percentiles, and proportions, but no statistics for the individual data sets. Also, I can't seem to locate the SET ALL_RESULTS, SET ERROR_LOG, and COMPLETED_LOG files that I specified in the runstart.bat of the RUNALL utility. This is the input that I am using for the test run.
TITLE: Factor Mixture Analysis on Dissertation Data
DATA: FILE IS "C:\LVM\LVM Data\runalldata.inp";
  TYPE = MONTECARLO;
VARIABLE: NAMES ARE u1 u2 u3 u4;
  USEVARIABLES ARE u1 u2 u3 u4;
  CLASSES = c(2);
ANALYSIS: TYPE = MIXTURE;
  STARTS = 100 20;
  STITERATIONS = 20;
  LRTBOOTSTRAP = 10;
  LRTSTARTS = 100 20 100 20;
MODEL:
  %OVERALL%
  f BY u1-u4;
  [f@0];
  %c#1%
  f BY u1@1 u2-u4;
  f;
  [u1-u4];
  %c#2%
  f BY u1@1 u2-u4;
  f;
  [u1-u4];
OUTPUT: TECH1 TECH2 TECH3 TECH8 TECH9 TECH11 TECH13 TECH14;
You are using external Monte Carlo which is a replacement of the RUNALL utility. So RUNALL is not needed. You should get results for each replication if you are using a current version of the program. You will not obtain TECH11 and TECH14 unless you use internal Monte Carlo. Please send further questions on this topic and your license number to firstname.lastname@example.org.
In-Hee Choi posted on Thursday, February 04, 2010 - 11:43 pm
I am doing a simulation study about the effects of DIF patterns on detecting latent classes. I have 27 conditions, and in each condition 100 data sets have been generated using the MONTECARLO command. What I want to do is fit the model with one, two, and three latent classes to each of the 100 data sets, one at a time, within each condition. So now I am looking for a way to do multiple runs with the same model but different data.
This is an example input file for the one latent class model in the first condition.
DATA: FILE IS 11.dat;
VARIABLE: NAMES ARE y1-y30;
  USEVARIABLES ARE y1-y30;
  CATEGORICAL=y1-y30;
ANALYSIS: ALGORITHM=INTEGRATION;
MODEL: f BY y1-y30@1;
SAVEDATA: FILE IS gmem1_1_1.dat;
  SAVE=CPROB;
DATA: FILE IS 1100.dat; ... SAVEDATA: FILE IS gmem1_100_1.dat; SAVE=CPROB;
I hope to find some way to let Mplus run all of the 100 runs with just one input file. Can I write some kind of loop (as in R) that specifies the data set Mplus uses in each iteration? Except for the data set, the other commands in the input file are the same in each run.
I have a general question regarding simulated data. I am attempting to simulate data under different conditions to analyze. Is there a way to simulate data with a certain level of skewness and heterogeneous data with classes/clusters a certain distance apart?
Dear Dr. Muthen, I am very interested in studying mixture modeling after reading the study, "Performance of factor mixture models as a function of model size, covariate effects, and class-specific parameters". However, I am wondering how you calculate Mahalanobis distance using the formula provided in the article, especially, population parameter values for the part2 in the appendix. I am not clear how to use the formula for the part 2. For example, regression intercepts class 2 MDy=1.5, v=[.2 -.2 .36 -.4 .2 -.2 .36 -.4] covariate effect, MDx=0.5, rc=[.5]
How can I calculate those things using the formula provided in the article?
Hello, I have a question regarding simulating mixture data. I am having problems with one issue in particular. Is it possible to specify class separation between clusters in Mplus? If so, how? For example, if I want to simulate 3 classes 2SD apart, how do I specify such in Mplus? Where / how is this included in the syntax? Thank you.
You would separate the classes by the population parameter values you give for the means or thresholds of the latent class indicators. All of the mixture examples in Chapters 7 and 8 come with a Monte Carlo counterpart where the data for the examples were generated. These would be a good place to start.
QianLi Xue posted on Thursday, May 26, 2011 - 5:31 pm
Hello, I would like to simulate a latent class model with 4 classes and 6 binary latent class indicators y1-y6. What's wrong with the statements below? Help, please.
MONTECARLO: NAMES ARE y1-y6;
  NOBSERVATIONS = 300;
  NREPS = 500;
  SEED = 4533;
  GENERATE = y1-y6 (1 p);
  CATEGORICAL ARE y1-y6;
  GENCLASSES = c (4);
  Classes = c(4);
Please send the output and your license number to email@example.com so I can see what error message you get.
QianLi Xue posted on Thursday, May 26, 2011 - 5:50 pm
Never mind. I found the problem. The threshold level following y2$ should be 1, not 2, because all ys are binary.
Here are two more questions:
1) Do I need the statement: GENERATE = y1-y6 (1 p);
2) I assume that the thresholds for generating the binary variables y1-y6 are calculated internally using the latent class prevalence estimates given under %overall% and the logits of conditional probabilities given under %C#1%, %C#2%, etc. right?
With maximum likelihood, logistic regression is used.
2. The variables are generated using the population parameter values that you give in MODEL POPULATION. Almost all of the Mplus examples have a Monte Carlo counterpart that was used to generate the data for the examples. You should try to find an example that is close to what you want and use the Monte Carlo counterpart for that example as a starting point.
Hello, how do you determine a reasonable number of starts? I am doing an FMM simulation study. I was not getting some results (results for more than one class), and when I requested TECH9, I saw that I need more starts. The problem is that I am running 200 different models with 1000 datasets each. I do not want to use more starts than necessary because of the amount of time it takes to run the models. I am really unclear on how to determine this.
I am conducting a Monte Carlo study for an LC model with categorical covariates and saved the results using the RESULTS option in MONTECARLO. Can I read this RESULTS file or the output file in R to extract summary statistics from a Monte Carlo study? I understand that results can be read for a non-simulation study fitted in Mplus (using MplusAutomation). But how about results from an MC study?
I am the developer of MplusAutomation and got your message. I recently finished an update to the package that reads the parameters from Monte Carlo output. Please run update.packages() and check that you have 0.4-3, which is the latest version. This should give you access to the parameters you mentioned. Please feel free to contact me directly if you have further troubles with Monte Carlo output following this update.
Hello, I would like to run an external Monte Carlo analysis for a correctly specified and mis-specified mixture models. And I want to compare fit indices such as AIC, BIC, and aBIC across correctly and mis-specified models to see whether fit indices perform well in supporting better fit for the correctly specified model than for the mis-specified models. When I looked at the output of Monte Carlo analysis, only average fit statistics were provided. Is there any option to save each replication result to compare fit indices across replications? In other words, is there any way to know how many times fit indices select the correct model? Thank you in advance.
You do not need data for a Monte Carlo simulation. The data are generated. However, you may need a pilot study to know the population parameter values to use for data generation.
Unkyung No posted on Wednesday, September 11, 2013 - 10:43 pm
I want to know about setting the entropy values in a Monte Carlo simulation study. "We also vary the values of alpha1c to obtain different entropy levels. Choosing alpha11=1, alpha12=-1 yields entropy of 0.6. Choosing alpha11=2, alpha12=-2 yields entropy of 0.85. Choosing alpha11=3, alpha12=-3 yields entropy of 0.95." (Webnote 15, page 15). In these statements, alpha1c is related to the entropy level. - Is there an equation to calculate the entropy values using alpha1c? - Is it possible to set the entropy values exactly?
I would like to set the entropy value (.5, .7, .9) in a GMM with three classes. Please let me know the answer. Thank you!!
Hi, I am working on a Monte Carlo study looking at the power of latent transition probabilities in LTA. I have two questions:
1) Simulation output doesn't include the estimates for the last column of transition probabilities in a probability matrix. Is there a way to get the estimates for the last column for the sake of understanding power for all transition probabilities in the matrix?
2) When reporting power of latent transition probabilities, would it be best to report all power values for each transition probability or can I average them to show the average power of latent transition probabilities for the overall model?
I am running a power simulation for an LTA model with two time points, multiple (nine) covariates, and a distal outcome. As in Mplus Web Note 13, I want to vary the effect of the covariates on the latent variable at Time 2 (c2) for different classes of the latent variable at Time 1 (c1).
However, when I try to implement this in my power simulation, similar to mcex8.14, I get the following error: "The following MODEL POPULATION statements are ignored: * Statements in Class %C1#1% of MODEL C1: C2#1 on X2 ... C2#1 on X9 * Statements in Class %C1#2% of MODEL C1: C2#1 on X2 ... C2#1 on X9"
I am using Mplus Version 7.11. Please let me know if you have any ideas on how to run a Monte Carlo simulation for an LTA model with multiple covariates while taking into account the interaction of the covariates between c1 and their influence on c2. Thanks for your help.
Thanks for your prompt reply. I really just wanted to know if there was a way to extend Monte Carlo example 8.14 (MC simulation for LTA with a continuous covariate) to LTA models with multiple covariates. Sorry for the confusion and thanks for understanding.
I am analyzing Army data sets as a government contractor and am using Mplus in a virtual enclave, so I don't have a license number to share. However, I can provide a snapshot of my LTA Monte Carlo code:
TITLE: Monte Carlo simulation for LTA with 9 covariates
Mplus allows regression of c2 on all covariates x1-x9. To find out why your setup gets stopped we need to see your full output and that you have a support contract.
Peng qian posted on Monday, October 31, 2016 - 1:34 am
First, sorry to trouble you!
I want to compute the correct classification rates (CCRs) in LTA simulations. I have three questions:
1) How does Mplus simulate the true latent class for each person in LCA?
I can find the latent class for each person based on the estimated model in the generated data file. However, in neither the input nor the output file can I find anything indicating the simulated true latent class for each person in mixture models.
2) How does Mplus simulate the true latent class for each person at each time point in LTA?
3) In addition, I think it is meaningful to compute CCRs for the first time point, but I wonder whether it is meaningful to compute CCRs for the other time points or for all time points as a whole.
In the LTA, I can set the class proportions for the first time point; for the other time points, however, the class proportions are determined by the transition probabilities and the class proportions at the first time point.
Look at the Monte Carlo versions of each mixture example. For instance, for ex7.3 you can use the lines
nrep = 100; repsave = all; save = ex7.3rep*.dat;
and you will find the true (generated) class membership of each subject shown in the last column of the saved data.
I think it is meaningful for all time points.
Peng qian posted on Tuesday, November 01, 2016 - 12:51 am
Dr. Muthen, thanks for your helpful and prompt reply. I think I mistook the generated class membership of each subject for the estimated class membership.
I have another three questions: 1) I want to understand more deeply how Mplus simulates the class membership of each subject.
In the simulation of the LCA model, I can set the number of subjects, replications, and items, and the item thresholds, but I do not know how the class membership of each subject is simulated in Mplus.
2-1) Can I also get the model-estimated results for each subject in the Monte Carlo study at the same time? When I try, I get:
*** ERROR in SAVEDATA command The SAVEDATA command is not available in conjunction with the MONTECARLO commnand. Use the SAVE and RESULTS options in the MONTECARLO command as desired.
2-2) If 2-1) is not possible, I have to analyze each generated data set in Mplus. I want to know if there is some way to use all the generated data sets (e.g., nrep = 100;) in one input file to get all the estimated results, just as in the simulation (repsave = all; save = ex7.3rep*.dat;).
3) I want to compute the mean absolute parameter recovery deviations (MADs), but I get just one output file. Does it contain the results of the last replication or the average results across all replications? How can I get an output file for each replication?
You choose the [c#1] value (assuming 2 classes) and this gives the probability of being in the first class. Then you do a random draw (Mplus does) and a person gets put in one of the two classes (as a function of that probability).
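The random draw described above can be illustrated with a short Python sketch. It only mimics the idea for two classes and is not Mplus's actual random number generator; the function name and the seed are hypothetical.

```python
import math
import random

def draw_classes(logit_c1, n, seed=0):
    """Assign each of n subjects to class 1 or 2 from the [c#1] logit."""
    rng = random.Random(seed)
    p1 = 1.0 / (1.0 + math.exp(-logit_c1))  # probability of class 1
    # One uniform draw per subject: below p1 means class 1, otherwise class 2.
    return [1 if rng.random() < p1 else 2 for _ in range(n)]

# A logit of ln(0.7 / 0.3) corresponds to roughly 70% of subjects in class 1.
membership = draw_classes(math.log(0.7 / 0.3), n=20000, seed=1)
share_class1 = membership.count(1) / len(membership)
print(round(share_class1, 3))
```

In any single replication the observed share differs from the population proportion by sampling error, which shrinks as the sample size grows.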
3) You would have to do External Monte Carlo (see UG Monte Carlo chapter 12 Step 1 and Step 2 examples) and then run each generated data set one at a time (like you do real data), saving the information you want. Summaries can be done using "MplusAutomation" - see our website.
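One way to prepare the "one run per generated data set" step is to write one input file per replication from a template. The Python sketch below is a minimal illustration: the template text reuses the one-class input quoted earlier in this thread purely as a placeholder, the file names (rep1.inp, 11.dat, and so on) are assumptions, and actually running Mplus on each file is left out.

```python
from pathlib import Path

# Placeholder input; {data_file} and {rep} are filled in per replication.
TEMPLATE = (
    "DATA: FILE IS {data_file};\n"
    "VARIABLE: NAMES ARE y1-y30; USEVARIABLES ARE y1-y30; CATEGORICAL=y1-y30;\n"
    "ANALYSIS: ALGORITHM=INTEGRATION;\n"
    "MODEL: f BY y1-y30@1;\n"
    "SAVEDATA: FILE IS gmem1_{rep}_1.dat; SAVE=CPROB;\n"
)

def build_inputs(template, data_files):
    """Return a dict mapping input file name to input text, one per data file."""
    inputs = {}
    for rep, data_file in enumerate(data_files, start=1):
        inputs[f"rep{rep}.inp"] = template.format(data_file=data_file, rep=rep)
    return inputs

def write_inputs(inputs, out_dir):
    """Write each generated input to out_dir, creating the directory if needed."""
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    for name, text in inputs.items():
        (out_path / name).write_text(text)

# Hypothetical data file names for condition 1: 11.dat, 12.dat, ..., 1100.dat.
data_files = [f"1{rep}.dat" for rep in range(1, 101)]
inputs = build_inputs(TEMPLATE, data_files)
print(len(inputs), "input files prepared")
```

Calling write_inputs(inputs, "some_directory") writes the files to disk; the resulting outputs can then be summarized afterwards, for example with MplusAutomation as mentioned above.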
Ni Yuhan posted on Tuesday, November 15, 2016 - 1:49 am
Dear Dr. Muthen,
I'm confused about how to define a correlation between two categorical indicators in an LCA Monte Carlo study.
Assuming there are 5 categorical indicators y1-y5, here is my code:
f1 by y1-y2 @ 0.4;
f2 by y3-y5 @ 0.4;
Does this mean that there is a correlation between y1 and y2, and there are correlations among y3-y5?
Those zero logits are correct. But you have weak class separation in that the variable means are only about half a standard deviation apart. This means that there may be class switching over the replications, particularly when the sample is relatively small as it is here. We recommend the means to be at least 2 SDs apart.
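The class separation mentioned above can be checked before running a simulation by expressing the distance between class means in within-class SD units. The following Python sketch uses hypothetical means and a common within-class SD; the function names are illustrative only.

```python
from itertools import combinations

def separation_in_sd(mean_a, mean_b, sd):
    """Distance between two class means in within-class SD units."""
    return abs(mean_a - mean_b) / sd

def min_pairwise_separation(means, sd):
    """Smallest separation among all pairs of class means."""
    return min(separation_in_sd(a, b, sd) for a, b in combinations(means, 2))

# Weakly separated classes: means half an SD apart (hypothetical values).
weak = min_pairwise_separation([0.0, 0.5], sd=1.0)

# Three classes whose adjacent means are 2 SDs apart (hypothetical values).
strong = min_pairwise_separation([0.0, 2.0, 4.0], sd=1.0)

print(weak, strong)
```

A minimum below 2 would flag a design in which class switching across replications is more likely, following the recommendation above.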