Dan Bauer posted on Friday, August 11, 2000  8:34 am



I'm interested in running a simulation study to see how well mixture models will recover the structure of a population under various conditions. It's not clear to me whether Mplus can generate data for this purpose. My first thought is to (1) write a Monte Carlo statement that simulates data from multiple groups; and (2) use the ANALYSIS and MODEL commands to specify a mixture model. Would that be correct? Is there a more direct way to do this?


Mplus does not currently do Monte Carlo for mixtures. You can generate data outside the program. Or you can do what you suggest: generate data from multiple groups, in which case the Monte Carlo option allows you to print out the data from the first replication. In either case, Mplus will have to be run separately for each data set.


I have conducted a similar study using MECOSA, and the next phase of the study will be to compare those results with those generated by Mplus. I have submitted a paper for publication, but if you are interested in a copy, let me know. 

Dan Bauer posted on Tuesday, May 15, 2001  7:44 am



I'm trying to use your RUNALL utility to work with simulated data files that I generated externally. I followed the instructions given for setting the environment variables in RUNSTART.DAT, and compared my setup to the example you provided. Everything looks okay, but when I run the utility from the command prompt I get the error messages: 'RUNONE.BAT' is not recognized as an internal or external command, operable program or batch file. (repeated 100 times, once for each data file) 'RUNEND.BAT' is not recognized as an internal or external command, operable program or batch file. (once at the end) I didn't modify either RUNONE or RUNEND, so I'm puzzled that it doesn't recognize them. I'd appreciate any suggestions you have for solving this problem. Thanks in advance.

Thuy Nguyen posted on Wednesday, May 16, 2001  9:56 am



It sounds like RUNONE.BAT and RUNEND.BAT are not located in a directory that is listed in your PATH environment variable. In an MS-DOS window, type SET to see the settings for all environment variables. The directory containing Mplus should be listed in your PATH. This is also the directory in which you should place all your RUN*.BAT files. If this doesn't help, can you give us more information about all the directories involved, i.e., where Mplus is installed, whether that directory is in the PATH, where the RUN*.BAT files are located, the directory you are running RUNALL in, etc.?

Hanno Petras posted on Wednesday, October 23, 2002  2:43 pm



Dear Linda & Bengt, I am conducting a Monte Carlo simulation to determine the power to detect a class-specific intervention impact on the slope in a three-class growth mixture model. While I learned how to determine the effect size of an intervention impact on a distal outcome in a growth mixture model, I am unclear about the effect size of the intervention impact on a class-specific slope. Any hints would be greatly appreciated. Best, Hanno

bmuthen posted on Wednesday, October 23, 2002  5:50 pm



That's a good question. The Muthen-Curran (1997) article in Psych Methods struggled with the same thing. One can certainly define the effect size of a slope by dividing the slope mean difference by the slope SD. But this doesn't get to the tangible observed-data level. So in Muthen-Curran we settled on how the slope mean difference affected the outcome mean difference divided by its SD. This then calls for deciding on the time point that you want to evaluate power for.

Hanno Petras posted on Thursday, October 24, 2002  12:57 pm



Thank you for the response, Bengt. Given the class-specific slope and the intervention impact on that slope, it is fairly easy to draw the two trajectories (control, intervention) and compute the differences between the time-specific means. However, I am not so sure which SD to use. The MC output provides the SD for the slope as well as for the intervention impact on the slope. My guess would be to use the SD of the intervention impact on the slope. In my example, the mean difference at the last time point is 0.5 and the SD of the intervention impact on the slope is 0.1329. Since I am simulating data, I do not really know the overall SD for the group in that class. Dividing the mean difference by that SD gives 3.76. Based on Cohen's measure of nonoverlap (U), this would be a large effect. Is this correct? I am looking forward to your response. Best, Hanno

bmuthen posted on Friday, October 25, 2002  9:30 am



To get the model-estimated SD for time-specific outcome means, you should look at the formula for how the model estimates this outcome and get the SD from that formula. For example, at time 1, where y1 = i + e1, the situation is simple because y1 is not influenced by the slope. Here the model-estimated variance is V(i) + V(e). For later time points, you need to involve the slope(s), which may covary with the intercept. A simple, approximate approach is to just use the sample SD for the time point.
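The formula-based route described above can be sketched for a linear growth model, y_t = i + x_t*s + e_t, whose model-implied variance is V(i) + x_t^2 V(s) + 2 x_t Cov(i,s) + V(e_t). This is only a minimal sketch: the function names and all numeric values below are illustrative assumptions and do not come from the thread.

```python
import math

def outcome_sd(var_i, var_s, cov_is, var_e, time_score):
    """Model-implied SD of y_t = i + time_score * s + e_t (linear growth model)."""
    variance = (var_i
                + time_score ** 2 * var_s
                + 2 * time_score * cov_is
                + var_e)
    return math.sqrt(variance)

def effect_size(mean_difference, sd):
    """Outcome mean difference divided by the outcome SD at that time point."""
    return mean_difference / sd

# Hypothetical population values, chosen only for illustration.
sd_first = outcome_sd(var_i=1.0, var_s=0.2, cov_is=0.1, var_e=0.5, time_score=0)
sd_last = outcome_sd(var_i=1.0, var_s=0.2, cov_is=0.1, var_e=0.5, time_score=4)
d_last = effect_size(0.5, sd_last)
print(sd_first, sd_last, d_last)
```

At time score 0 the slope terms drop out, which reproduces the V(i) + V(e) case mentioned in the reply.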

Anonymous posted on Wednesday, September 29, 2004  11:38 pm



I am using Mplus 3 to conduct a simulation study on mixture modeling. Is it possible to specify the percentages for the mixtures? Say 70% for the 1st mixture and 30% for the 2nd mixture. Thanks! 

bmuthen posted on Thursday, September 30, 2004  10:57 am



You give logit starting values that correspond to the proportions that you want. As explained in Chapter 13 of the User's Guide, the logit is the log of the ratio of the two proportions. 
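As a worked example of the logit calculation, the sketch below converts target class proportions into logits measured against the last class, which serves as the reference class. The function name is an assumption made for illustration.

```python
import math

def class_logits(proportions):
    """Logits of each class relative to the last class: log(p_k / p_K)."""
    reference = proportions[-1]
    return [math.log(p / reference) for p in proportions[:-1]]

# 70% / 30% split: a single logit for the first class.
two_class = class_logits([0.70, 0.30])

# Equal proportions give a logit of zero.
equal = class_logits([0.50, 0.50])

print(two_class, equal)
```

For the 70/30 split the logit is log(0.7/0.3), about 0.847, so the value to supply for the first class would be roughly 0.847.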

Anonymous posted on Sunday, October 10, 2004  2:15 am



I am trying to conduct a simulation study on mixture modeling with Mplus 3. Although I know that it is possible to generate data with different mixtures, I want to generate data based on multiple groups and then test the data with mixture models. Mplus complained that mixture models are not allowed with multiple-group analysis. Are there any ways to handle this issue? A second question is that Mplus saves data from the first replication only. Is it possible to save data from all replications for further analysis? Thanks in advance!


Multiple-group analysis with mixture models is carried out using the KNOWNCLASS option. Examples 7.21 and 8.8 show how this option is used. The Monte Carlo counterparts of these examples, which come with Version 3, show how to generate data for these examples. You can save data from all replications in Version 3. See the REPSAVE option of the MONTECARLO command.

Anonymous posted on Friday, October 15, 2004  10:56 pm



Thanks a lot for the reply. Could you tell me what the differences are between mixture modeling with known classes and multiple-group analysis? Or could you suggest some references on this topic?

bmuthen posted on Saturday, October 16, 2004  7:56 am



They amount to exactly the same analysis. Knowing the class membership is like observing group membership. It is just that in the mixture context, multiple-group analysis is arranged via a KNOWNCLASS approach.


Sorry to bother you again. I have a question similar to the one asked here on Wednesday, September 29, 2004. I conducted a simulation study on a 3-class LCA model. I took the parameter estimates of my model as population values using the POPULATION = estimates3.dat; option in the MONTECARLO command. However, this seemed to work only for the threshold parameters but not for the class proportions, which were different in my MC model. How exactly can I fix the class proportions to equal the real-data class proportions in the MC input? Thanks again.


You give logit starting values to the intercepts/means of the categorical latent variable in the overall part of the model. For example, %OVERALL% [c#1*0]; would put 50 percent of the cases in each of two classes. If you continue to have problems, send your output to support@statmodel.com.

Anonymous posted on Tuesday, December 14, 2004  10:04 am



I did a simulation study using Mplus 2.14. The reviewers asked that I update this using Mplus 3, given possible improvements in estimation/optimization. I complied with this request, only to find a lower rate of convergence in Mplus 3 than Mplus 2.14. For instance, where I had 100% convergence before, the rate dropped to 90% with Mplus 3. I used the exact same script files, except that I turned off the random starts in Mplus 3 so it would be estimating from the same start values. Have any of the defaults changed between versions (e.g., convergence criteria) or can you think of any other reason to explain these results? Thanks in advance for your help. 

bmuthen posted on Tuesday, December 14, 2004  10:39 am



Our experience is the opposite. Please send your MC input to support@statmodel.com. 

Anonymous posted on Tuesday, December 14, 2004  10:46 am



I generated the data myself rather than using the Monte Carlo feature in Mplus. 

bmuthen posted on Tuesday, December 14, 2004  10:49 am



In your Version 3 run you used the new "external MC" feature, right, where you submit several data replications for an MC run to get summaries across these replications? If so, please send the Mplus input for that. 


I'll just add my two cents here. If you are generating your own data, then it is not possible to do the same thing in 2.14 and 3 because you would have to use external Monte Carlo and that was not available in 2.14. So I suspect you have generated your data differently than in 2.14 where it was generated by Mplus and that this is the cause of your convergence problems. 

Anonymous posted on Tuesday, December 14, 2004  11:10 am



My mistake. I had omitted the STARTS option thinking this would turn off the random starts; I didn't realize that random starts are used by default. It turns out that it was the random start routine that was causing the increased rate of nonconvergence. When I override this with STARTS=0, I once again get 100% convergence. Sorry for the trouble.


I am trying to do a Monte Carlo study using the results of one study as population values. With these data, I found two classes. When I do the Monte Carlo study specifying two classes, it works well. However, when I try to specify 3 classes (again using results from an analysis of the same data with 3 classes as population values), the program seems to get stuck in one of the replications. That is, several replications go well enough, then suddenly Mplus gets stuck. I tried changing the seed and using more random starts (STARTS = 50 10), but neither worked.


Please send the output that generated the data and the Monte Carlo output to support@statmodel.com. It is hard to say without seeing the details of the generation and subsequent analysis. 


I was able to solve my last problem alone. However, I have a new problem. Still doing the same Monte Carlo analysis, I used an input that worked for 1500 subjects to evaluate the model with 2000 subjects. This new input does not work even though it is almost exactly the same as the last one that worked perfectly well. It computes and then suddenly stops and no output appears in Mplus. When I open the output, it contains only two lines after the input: INPUT READING TERMINATED NORMALLY then the title I gave 


This is once again something where I would have to see the output that worked and the output that didn't work at support@statmodel.com. What you have experienced could happen for a variety of reasons. 


I am trying to simulate categorical variables with a specific probability of occurrence per category (I used the command CUTPOINTS with z-scores). I then want to specify a model with continuous latent variables. When I use GENERATE, I get error messages indicating that I cannot use this command for y-variables. When I use only CUTPOINTS, I get the same message. Whether or not I use CATEGORICAL does not seem to change anything. Is there a way to simulate categorical y-variables with specific endorsement probabilities and then specify a model for the relations among these variables? Thank you in advance.


You should be able to do what you are trying to do. Please send your input, output, and license number to support@statmodel.com. 


Dear Dr. Muthen, I am writing an article using mixture modeling and doing Monte Carlo studies on the results of the application. One reviewer asked how we knew label switching was not a problem. One known solution to limit label switching is to constrain the classes so that the smallest (or the largest) is the first and so on to the last class. I would like to know whether Mplus orders the classes according to class size and, if so, in what order. Thank you very much in advance. Best regards,


Mplus does not order the classes in any way. Label switching can be a problem. You need to give good starting values to avoid this. 

sallua posted on Wednesday, February 28, 2007  8:13 pm



What is the appropriate use of the STSTART value in simulation research? For example, should a different STSTART value be used for each replication within a condition or should the same STSTART value be used within the same condition or should the same STSTART value be used across all replications and conditions? thanks! 


I don't know of an STSTART option. 

sallua posted on Thursday, March 01, 2007  7:23 am



Sorry, I meant STSEED.


STSEED is not for Monte Carlo studies. The SEED option is. 


I need to conduct a Monte Carlo study to determine if my sample size is adequate. Is there a syntax for conducting Monte Carlo studies using LCA? I know that Nylund & Muthen conducted a Monte Carlo study for LCA. Is the syntax they used available?


All of the examples in Chapter 7 come with an input for their Monte Carlo counterpart. These are on the website and the Mplus CD. This is a good place to start for a Monte Carlo for an LCA. The files you allude to are not available. 


Hi, I am conducting an external simulation for an LCA. I created 500 datasets in other software, and then analyzed the datasets in Mplus. In the data statement, I used "type=monte carlo" to do the Monte Carlo study, and used "savedata: results is results.txt" to save the results. But it seems that only 498 datasets were computed successfully by Mplus, and Mplus gave results for just those 498 datasets. So I wonder if there is some way to determine which datasets were not computed successfully by Mplus. Thanks a lot!


Ask for TECH9 in the OUTPUT command. 


Hi, I'm trying to conduct an external Monte Carlo study of FMA and LPA. Is it possible to obtain the information criteria, the LMR, and the BLRT for each individual data file so that I can obtain hit rates for both methods? Thanks


Yes, you can do this using the RESULTS option of the MONTECARLO command. 


The RESULTS option produces average fit statistics, percentiles, and proportions, but no statistics for individual data sets. Also, I can't seem to locate the SET ALL_RESULTS, SET ERROR_LOG, and the COMPLETED_LOG file that I specified in the runstart.bat of the RUNALL utility. This is the input that I am using for the test run.
TITLE: Factor Mixture Analysis on Dissertation Data
DATA: FILE IS "C:\LVM\LVM Data\runalldata.inp";
  TYPE = MONTECARLO;
VARIABLE: NAMES ARE u1-u4;
  USEVARIABLES ARE u1-u4;
  CLASSES = c(2);
ANALYSIS: TYPE = MIXTURE;
  STARTS = 100 20;
  STITERATIONS = 20;
  LRTBOOTSTRAP = 10;
  LRTSTARTS = 100 20 100 20;
MODEL:
  %OVERALL%
  f BY u1-u4;
  [f@0];
  %c#1%
  f BY u1@1 u2-u4;
  f;
  [u1-u4];
  %c#2%
  f BY u1@1 u2-u4;
  f;
  [u1-u4];
OUTPUT: TECH1 TECH2 TECH3 TECH8 TECH9 TECH11 TECH13 TECH14;
SAVEDATA: RESULTS = disall1.txt;


You are using external Monte Carlo which is a replacement of the RUNALL utility. So RUNALL is not needed. You should get results for each replication if you are using a current version of the program. You will not obtain TECH11 and TECH14 unless you use internal Monte Carlo. Please send further questions on this topic and your license number to support@statmodel.com. 

InHee Choi posted on Thursday, February 04, 2010  11:43 pm



I am doing a simulation study about the effects of DIF patterns on detecting latent classes. I have 27 conditions, and in each condition 100 data sets have been generated using the MONTECARLO command. What I want to do is fit the model with one, two, and three latent classes to each of the 100 data sets within a condition, one by one. So now I am looking for a way to do multiple runs with the same model but different data. This is an example input file for the one-latent-class model in the first condition.
DATA: FILE IS 11.dat;
VARIABLE: NAMES ARE y1-y30;
  USEVARIABLES ARE y1-y30;
  CATEGORICAL = y1-y30;
ANALYSIS: ALGORITHM = INTEGRATION;
MODEL: f BY y1-y30@1;
SAVEDATA: FILE IS gmem1_1_1.dat;
  SAVE = CPROB;
....through....
DATA: FILE IS 1100.dat;
...
SAVEDATA: FILE IS gmem1_100_1.dat;
  SAVE = CPROB;
I hope to find some way to let Mplus do all 100 runs with just one input file. Can I write some kind of loop that specifies the data set Mplus uses on each pass (as in R)? Apart from the data set, the commands in the input file are the same on each pass.


There's a DOS batch file that is part of Web Note 10 that shows a way to do this. See Web Note 10.
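A scripted alternative to writing 100 input files by hand is to generate them from a template. The sketch below is an illustration only, not the batch file from Web Note 10. The template reproduces the poster's input as given; the data file pattern "cond1_rep{rep}.dat" and the input file names "run_{rep}.inp" are hypothetical and would need to match the actual file names.

```python
# Template taken from the poster's example input; only the file names vary.
TEMPLATE = """DATA: FILE IS {data_file};
VARIABLE: NAMES ARE y1-y30;
  USEVARIABLES ARE y1-y30;
  CATEGORICAL = y1-y30;
ANALYSIS: ALGORITHM = INTEGRATION;
MODEL: f BY y1-y30@1;
SAVEDATA: FILE IS {save_file};
  SAVE = CPROB;
"""

def build_inputs(n_reps, data_pattern, save_pattern):
    """Return a dict mapping input file names to input text, one per data set."""
    inputs = {}
    for rep in range(1, n_reps + 1):
        text = TEMPLATE.format(
            data_file=data_pattern.format(rep=rep),
            save_file=save_pattern.format(rep=rep),
        )
        inputs["run_{}.inp".format(rep)] = text
    return inputs

# Hypothetical data file pattern; the save pattern follows the poster's names.
inputs = build_inputs(100, "cond1_rep{rep}.dat", "gmem1_{rep}_1.dat")

# To write the files to disk, one could loop over the dict:
# for name, text in inputs.items():
#     with open(name, "w") as handle:
#         handle.write(text)
```

Each generated file can then be submitted to Mplus in turn, for example from a batch file or shell loop.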


I have a general question regarding simulated data. I am attempting to simulate data under different conditions to analyze. Is there a way to simulate data with a certain level of skewness and heterogeneous data with classes/clusters a certain distance apart? Thank you. 


Nonnormality can be generated using mixtures as we did in the paper on our web site: Muthén, L.K. & Muthén, B.O. (2002). How to use a Monte Carlo study to decide on sample size and determine power. Structural Equation Modeling, 9(4), 599-620. You have to experiment to get the nonnormality you want. There are other ways as well, one being the "Vale-Maurelli" approach, but that is not in Mplus.


Thank you for the follow-up. I will have a look at the paper. How about the distance between classes, using Mahalanobis distance or some other method? Is that possible?


There have been a couple of mixture papers on that, advocating at least 2 SD apart in the means for good estimation. I think some of it is in: Lubke, G. & Muthén, B. (2007). Performance of factor mixture models as a function of model size, covariate effects, and class-specific parameters. Structural Equation Modeling, 14(1), 26–47.


Dear Dr. Muthen, I am very interested in studying mixture modeling after reading the study, "Performance of factor mixture models as a function of model size, covariate effects, and class-specific parameters". However, I am wondering how you calculate the Mahalanobis distance using the formula provided in the article, especially the population parameter values for part 2 in the appendix. I am not clear how to use the formula for part 2. For example, regression intercepts class 2 MDy=1.5, v=[.2 .2 .36 .4 .2 .2 .36 .4] covariate effect, MDx=0.5, rc=[.5] How can I calculate those things using the formula provided in the article? Thank you so much.


Please contact the first author of the article with these questions. 


Hello, I have a question regarding simulating mixture data. I am having problems with one issue in particular. Is it possible to specify class separation between clusters in Mplus? If so, how? For example, if I want to simulate 3 classes 2SD apart, how do I specify such in Mplus? Where / how is this included in the syntax? Thank you. 


You would separate the classes by the population parameter values you give for the means or thresholds of the latent class indicators. All of the mixture examples in Chapters 7 and 8 come with a Monte Carlo counterpart where the data for the examples were generated. These would be a good place to start. 
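To illustrate how population means translate into class separation, the sketch below spaces class means a fixed number of within-class SDs apart and computes a Mahalanobis distance for the simple case of uncorrelated indicators. The function names, and the assumptions of equal within-class variances and a diagonal covariance matrix, are introduced here for illustration.

```python
import math

def class_means(n_classes, within_sd, separation_in_sd, first_mean=0.0):
    """Means for one continuous indicator, spaced separation_in_sd SDs apart."""
    step = separation_in_sd * within_sd
    return [first_mean + k * step for k in range(n_classes)]

def mahalanobis_diagonal(mean_a, mean_b, variances):
    """Mahalanobis distance between two mean vectors, diagonal covariance only."""
    squared = sum((a - b) ** 2 / v for a, b, v in zip(mean_a, mean_b, variances))
    return math.sqrt(squared)

# Three classes, 2 SD apart, with a within-class SD of 1.5.
means = class_means(n_classes=3, within_sd=1.5, separation_in_sd=2)

# Two indicators, each 2 SD apart between the classes, unit variances.
distance = mahalanobis_diagonal([0.0, 0.0], [2.0, 2.0], [1.0, 1.0])
print(means, distance)
```

The resulting means would be supplied as class-specific population values; with several indicators the multivariate distance is larger than the per-indicator separation.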

QianLi Xue posted on Thursday, May 26, 2011  5:31 pm



Hello, I would like to simulate a latent class model with 4 classes and 6 binary latent class indicators y1-y6. What's wrong with the statements below? Help pls.
MONTECARLO: NAMES ARE y1-y6;
  NOBSERVATIONS = 300;
  NREPS = 500;
  SEED = 4533;
  GENERATE = y1-y6 (1 p);
  CATEGORICAL ARE y1-y6;
  GENCLASSES = c (4);
  Classes = c(4);
Model Population:
  %overall%
  [c#1*1.944 c#2*1.476 c#3*1.695];
  %c#1%
  [y1$1@1.325 y2$2@1.658 y3$1@1.735 y4$1@1.735 y5$1@1.516 y6$1@1.208];
  %c#2%
  [y1$1@0.364 y2$2@0.364 y3$1@0 y4$1@0.754 y5$1@1.992 y6$1@3.892];
...


Please send the output and your license number to support@statmodel.com so I can see what error message you get. 

QianLi Xue posted on Thursday, May 26, 2011  5:50 pm



Never mind. I found the problem. The threshold level following y2$ should be 1, not 2, because all ys are binary. Here are two more questions: 1) Do I need the statement GENERATE = y1-y6 (1 p);? 2) I assume that the thresholds for generating the binary variables y1-y6 are calculated internally using the latent class prevalence estimates given under %overall% and the logits of conditional probabilities given under %C#1%, %C#2%, etc. Is that right?


1. The statement should be either GENERATE = y1-y6 (1 l); or GENERATE = y1-y6 (1); With maximum likelihood, logistic regression is used.
2. The variables are generated using the population parameter values that you give in MODEL POPULATION. Almost all of the Mplus examples have a Monte Carlo counterpart that was used to generate the data for the examples. You should try to find an example that is close to what you want and use the Monte Carlo counterpart for that example as a starting point.
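To see what a set of population values implies, the sketch below converts class logits into class proportions and a threshold into a conditional probability. It assumes the usual logit parameterization, in which the last class is the reference (logit 0) and P(y = 1 | class) = 1 / (1 + exp(threshold)). The function names are introduced here for illustration, and the numbers are used as they appear in the post above.

```python
import math

def logits_to_proportions(logits):
    """Class proportions from logits measured against the last class (logit 0)."""
    exps = [math.exp(value) for value in logits] + [1.0]
    total = sum(exps)
    return [e / total for e in exps]

def threshold_to_probability(threshold):
    """P(y = 1 | class) under a logit link: 1 / (1 + exp(threshold))."""
    return 1.0 / (1.0 + math.exp(threshold))

# Logits from the %overall% part of the posted input.
proportions = logits_to_proportions([1.944, 1.476, 1.695])

# First threshold listed under %c#1% in the posted input.
p_y1_class1 = threshold_to_probability(1.325)
print(proportions, p_y1_class1)
```

With three positive logits, the reference (fourth) class ends up as the smallest class, which is a quick check that the generated proportions are the intended ones.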


Hello, how do you determine a reasonable number of starts? I am doing an FMM simulation study. I was not getting some results (results for more than one class), and when I requested TECH9, I saw that I need more starts. The problem is that I am running 200 different models with 1000 datasets each. I do not want to use more starts than necessary because of the time it takes to run the models. I am really unclear on how to determine this. Thank you.


I think this is just a matter of trial and error. You can start with 100 25 and if that is not enough try 200 50. The second number should be about 1/4th of the first. 


Thank you. A follow-up question: should I ONLY use my own start values if the TECH9 output says I should use more starts (I am running FMM using MC simulations)? Also, what is the difference between STARTS = 100 and STARTS = 100 25, for example? Are the starts explained in the manual somewhere? I can't seem to find them.


Starting values and random starts are not the same thing. See the STARTS option in the user's guide for a full description of this option. The Version 6 index shows it is on pages 550-551.


Okay. Thank you. I understand the difference now. However, I still remain with the question...should I just use the default and only use my own if required by the tech9 output? 


If you don't replicate the best loglikelihood, you need to increase the number of random starts using the STARTS option. 


I am conducting a Monte Carlo study for an LC model with categorical covariates and saved the results using the RESULTS option in MONTECARLO. Can I read this RESULTS file or the output file using R to extract summary statistics from a Monte Carlo study? I understand that the results can be read for a non-simulation study fitted in Mplus (using MplusAutomation). But how about results from an MC study?


Hello, Puneet, I am the developer of MplusAutomation and got your message. I recently finished an update to the package that reads the parameters from Monte Carlo output. Please run update.packages() and check that you have 0.43, which is the latest version. This should give you access to the parameters you mentioned. Please feel free to contact me directly if you have further troubles with Monte Carlo output following this update. Best wishes, Michael


Hello, I would like to run an external Monte Carlo analysis for correctly specified and misspecified mixture models. I want to compare fit indices such as AIC, BIC, and aBIC across the correctly specified and misspecified models to see whether the fit indices favor the correctly specified model over the misspecified ones. When I looked at the output of the Monte Carlo analysis, only average fit statistics were provided. Is there any option to save each replication's results so that I can compare fit indices across replications? In other words, is there any way to know how many times the fit indices select the correct model? Thank you in advance.


Use the RESULTS option of the SAVEDATA command. 
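Once per-replication fit statistics have been extracted from the saved results, the hit rate is the share of replications in which the correct model has the lowest value of the criterion. The sketch below assumes the values have already been read into Python lists aligned by replication; the model labels and BIC values are made up for illustration, and reading the results file itself is not shown because its layout depends on the model.

```python
def hit_rate(ic_by_model, correct_model):
    """Share of replications in which correct_model has the smallest criterion.

    ic_by_model maps a model label to a list of information-criterion values,
    one per replication, in the same replication order for every model.
    """
    n_reps = len(ic_by_model[correct_model])
    hits = 0
    for rep in range(n_reps):
        best = min(ic_by_model, key=lambda model: ic_by_model[model][rep])
        if best == correct_model:
            hits += 1
    return hits / n_reps

# Hypothetical BIC values for four replications of two competing models.
bic = {
    "2-class": [1010.2, 998.7, 1005.1, 1001.3],
    "3-class": [1012.9, 997.5, 1009.8, 1004.0],
}
rate = hit_rate(bic, correct_model="2-class")
print(rate)
```

Replications that failed to converge for either model would need to be dropped from both lists first so that the comparison stays aligned.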


I think I might be missing a simple point here: can you do a Monte Carlo simulation with no data (i.e., prior to collecting any data at all)? Or do you need pilot data of some sort? Thank you.


You do not need data for a Monte Carlo simulation. The data are generated. However, you may need a pilot study to know the population parameter values to use for data generation. 

Unkyung No posted on Wednesday, September 11, 2013  10:43 pm



Hello, I want to know about setting the entropy values in a Monte Carlo simulation study. "We also vary the values of alpha1c to obtain different entropy levels. Choosing alpha11=1, alpha12=1 yields entropy of 0.6. Choosing alpha11=2, alpha12=2 yields entropy of 0.85. Choosing alpha11=3, alpha12=3 yields entropy of 0.95." (Web Note 15, page 15). In these statements, alpha1c is related to the entropy level. Is there an equation for calculating the entropy value from alpha1c? Is it possible to set the entropy value exactly? I want to set the entropy values (.5, .7, .9) in a GMM with three classes. Please let me know the answer. Thank you!!


There is no formula. This is done by trial and error. The farther apart the means of the classes are, the better the entropy. 

Unkyung No posted on Wednesday, October 02, 2013  3:22 am



Thank you. I have another question. As far as I know, entropy is related to sample size. In the example from my previous post (Web Note 15, page 15), if the sample size is different from 5000, should the values of alpha11 and alpha12 be changed? The smaller the sample size, the larger the entropy. So the means of the classes would have to be set closer together. Is that right? Thanks in advance.


Entropy is not related to sample size. 
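One way to see why is the commonly cited relative entropy formula, E = 1 - [sum over individuals and classes of -p_ik ln(p_ik)] / (n ln K), which averages over the n individuals. The sketch below implements that formula on posterior class probabilities; the function name and the example probabilities are assumptions for illustration, and the formula is stated here as an assumption about what the reported entropy corresponds to.

```python
import math

def relative_entropy(posteriors):
    """Relative entropy from posterior class probabilities (one row per person)."""
    n = len(posteriors)
    k = len(posteriors[0])
    total = 0.0
    for row in posteriors:
        for p in row:
            if p > 0:  # treat 0 * log(0) as 0
                total -= p * math.log(p)
    return 1.0 - total / (n * math.log(k))

# Hypothetical posterior probabilities for four people and two classes.
sample = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.05, 0.95]]
small = relative_entropy(sample)

# Repeating the same rows ten times changes n but not the entropy.
large = relative_entropy(sample * 10)
print(small, large)
```

Because the sum is divided by n, the value depends on how sharply individuals are classified rather than on how many individuals there are.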

Unkyung No posted on Wednesday, October 02, 2013  7:10 pm



Ah.. I was misunderstanding. Thank you very much! 


Hi, I am working on a Monte Carlo study looking at the power of latent transition probabilities in LTA. I have two questions: 1) Simulation output doesn't include the estimates for the last column of transition probabilities in a probability matrix. Is there a way to get the estimates for the last column for the sake of understanding power for all transition probabilities in the matrix? 2) When reporting power of latent transition probabilities, would it be best to report all power values for each transition probability or can I average them to show the average power of latent transition probabilities for the overall model? 


You can use MODEL CONSTRAINT to compute the probability for the reference class. I would not average as the power is probably different for each. 

