Mplus Discussion >> Monte Carlo Study

Monte Carlo Study

Mplus Discussion > Latent Variable Mixture Modeling >

Dan Bauer posted on Friday, August 11, 2000 - 8:34 am

I'm interested in running a simulation study to see how well mixture models will recover the structure of a population under various conditions. It's not clear to me whether Mplus can generate data for this purpose. My first thought is to (1) write a Monte Carlo statement that simulates data from multiple groups, and (2) use the ANALYSIS and MODEL commands to specify a mixture model. Would that be correct? Is there a more direct way to do this?

Bengt O. Muthen posted on Friday, August 11, 2000 - 4:33 pm

Mplus does not currently do Monte Carlo for mixtures. You can generate data outside the program. Or, you can do what you suggest - generate data from multiple groups in which case the Monte Carlo option allows you to print out the data from the first replication. In either case, Mplus will have to be run separately for each data set.

John Williams posted on Tuesday, November 07, 2000 - 1:00 pm

I have conducted a similar study using MECOSA, and
the next phase of the study will be to compare
those results with those generated by Mplus.

I have submitted a paper for publication, but if
you are interested in a copy, let me know.

Dan Bauer posted on Tuesday, May 15, 2001 - 7:44 am

I'm trying to use your RUNALL utility to work with simulated data files that I generated externally. I followed the instructions given for setting the environment variables in RUNSTART.DAT, and compared my setup to the example you provided. Everything looks okay, but when I run the utility from the command prompt I get the error messages:
'RUNONE.BAT' is not recognized as an internal or external command, operable program or batch file.
(repeated 100 times, once for each data file)
'RUNEND.BAT' is not recognized as an internal or external command, operable program or batch file.
(once at the end)
I didn't modify either RUNONE or RUNEND, so I'm puzzled that it doesn't recognize them. I'd appreciate any suggestions you have for solving this problem. Thanks in advance.

Thuy Nguyen posted on Wednesday, May 16, 2001 - 9:56 am

It sounds like RUNONE.BAT and RUNEND.BAT are not located in a directory that is listed in your PATH environment. In an MS-DOS window, type SET to see all the settings for all environment variables. The directory containing Mplus should be listed in your PATH environment. This directory should also be the directory in which you should place all your RUN*.BAT files.

If this doesn't help, can you give us more information about all the directories involved, i.e., where Mplus is installed, whether that directory is in the PATH environment, where the RUN*.BAT files are located, the directory you are running RUNALL in, etc.?

Hanno Petras posted on Wednesday, October 23, 2002 - 2:43 pm

Dear Linda & Bengt,

I am conducting a Monte Carlo simulation to determine the Power to detect a class specific intervention impact on the slope in a three class growth mixture model. While I learned how to determine the effect size of an intervention impact on a distal outcome in a growth mixture model, I am unclear about the effect size of the intervention impact on a class specific slope. Any hints would be greatly appreciated.

Best,

Hanno

bmuthen posted on Wednesday, October 23, 2002 - 5:50 pm

That's a good question. The Muthen-Curran (1997) article in Psych Methods struggled with the same thing. One can certainly define the effect size of a slope by dividing the slope mean difference by the slope SD. But this doesn't get to the tangible observed-data level. So in Muthen-Curran we settled on how the slope mean difference affected the outcome mean difference divided by its SD. This then calls for deciding on the time point that you want to evaluate power for.

Hanno Petras posted on Thursday, October 24, 2002 - 12:57 pm

Thank you for the response, Bengt. Given the class-specific slope and the intervention impact on that slope, it is fairly easy to draw the two trajectories (control, intervention) and compute the differences between the time-specific means. However, I am not so sure which SD to use. The MC output provides the SD for the slope as well as for the intervention impact on the slope. My guess would be to use the SD of the intervention impact on the slope. In my example, the mean difference at the last time point is 0.5 and the SD of the intervention impact on the slope is 0.1329. Since I am simulating data, I do not really know the overall SD for the group in that class. Dividing the mean difference by this SD gives 3.76 (0.5 / 0.1329). Based on Cohen's measure of nonoverlap (U) this would be a large effect. Is this correct? I am looking forward to your response.

Best,

Hanno

bmuthen posted on Friday, October 25, 2002 - 9:30 am

To get the model-estimated SD for time specific outcome means, you should look at the formula for how the model estimates this outcome and get the SD from that formula.

For example, at time 1 where y1 = i + e1, the situation is simple because y1 is not influenced by the slope. Here the model-estimated variance is V(i) + V(e). For later time points, you need to involve the slope(s), which may covary with the intercept. A simple, approximate approach is to just use the sample SD for the time point.
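
As a rough illustration of that formula (a sketch under stated assumptions, not Mplus output): for a linear growth model y_t = i + x_t*s + e_t with time score x_t, the model-implied variance is V(i) + x_t^2 V(s) + 2 x_t Cov(i,s) + V(e_t), which reduces to V(i) + V(e) when x_t = 0. The Python below uses made-up parameter values and hypothetical function names; in a growth mixture model the inputs would be the class-specific values.

```python
import math


def outcome_variance(time_score, var_i, var_s, cov_is, var_e):
    """Model-implied variance of y_t for a linear growth model.

    Assumes y_t = i + time_score * s + e_t, with e_t independent of i and s.
    """
    return var_i + time_score ** 2 * var_s + 2 * time_score * cov_is + var_e


def effect_size(mean_diff, time_score, var_i, var_s, cov_is, var_e):
    """Outcome mean difference at one time point divided by the outcome SD."""
    sd = math.sqrt(outcome_variance(time_score, var_i, var_s, cov_is, var_e))
    return mean_diff / sd


if __name__ == "__main__":
    # Hypothetical population values, chosen only for illustration.
    params = dict(var_i=1.0, var_s=0.2, cov_is=0.1, var_e=0.5)
    for t in (0, 1, 2, 3):
        print(t, round(outcome_variance(t, **params), 3))
    print(round(effect_size(0.5, 3, **params), 3))
```

The time point has to be chosen first, because both the mean difference and the SD in the denominator change with the time score.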

Anonymous posted on Wednesday, September 29, 2004 - 11:38 pm

I am using Mplus 3 to conduct a simulation study on mixture modeling. Is it possible to specify the percentages for the mixtures? Say 70% for the 1st mixture and 30% for the 2nd mixture. Thanks!

bmuthen posted on Thursday, September 30, 2004 - 10:57 am

You give logit starting values that correspond to the proportions that you want. As explained in Chapter 13 of the User's Guide, the logit is the log of the ratio of the two proportions.
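
For the 70%/30% split asked about, the arithmetic can be sketched in Python (this only computes the number; it is not Mplus syntax, and the function name is made up for illustration). The last class is assumed to be the reference class.

```python
import math


def class_logit(p_class, p_reference):
    """Log of the ratio of a class proportion to the reference-class proportion."""
    return math.log(p_class / p_reference)


if __name__ == "__main__":
    # 70% in the first class, 30% in the second (reference) class.
    print(round(class_logit(0.70, 0.30), 3))  # 0.847
```

A value of 0 corresponds to equal proportions, and a positive value puts more cases in the first class than in the reference class.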

Anonymous posted on Sunday, October 10, 2004 - 2:15 am

I am trying to conduct a simulation study on mixture modeling with Mplus 3. Although I know that it is possible to generate data with different mixtures, I want to generate data based on multiple groups and then test the data with mixture models. Mplus complained that mixture models are not allowed with multiple-group analysis. Are there any ways to handle this issue?

A second question is that Mplus saves data from the first replication only. Is it possible to save data from all replications for further analysis? Thanks in advance!

Linda K. Muthen posted on Tuesday, October 12, 2004 - 5:09 pm

Multiple group analysis with mixture models is carried out using the KNOWNCLASS option. Examples 7.21 and 8.8 show how this option is used. The Monte Carlo counterparts of these examples, which come with Version 3, show how to generate data for these examples.

You can save data from all replications in Version 3. See the REPSAVE option of the MONTECARLO command.

Anonymous posted on Friday, October 15, 2004 - 10:56 pm

Thanks a lot for the reply. Could you tell me what are the differences between mixture modeling with known classes vs. multiple-group analysis? Or could you suggest some references in this topic?

bmuthen posted on Saturday, October 16, 2004 - 7:56 am

They amount to exactly the same analysis. Knowing the class membership is like observing group membership. It is just that in the mixture context, multiple-group analysis is arranged via a KNOWNCLASS approach.

Christian Geiser posted on Wednesday, November 17, 2004 - 3:46 am

Sorry to bother you again. I have a question similar to the one asked here on Wednesday, September 29, 2004 - 11:38 pm. I conducted a simulation study on a 3-class LCA model. I took the parameter estimates of my model as population values using the POPULATION = estimates3.dat; option in the MONTECARLO command. However, this seemed to work only for the threshold parameters but not for the class proportions, which were different in my MC model. How exactly can I fix the class proportions to equal the real-data class proportions in the MC input? Thanks again.

Linda K. Muthen posted on Wednesday, November 17, 2004 - 8:47 am

You give logit starting values to the intercepts/means of the categorical latent variable in the overall part of the model.

%OVERALL%

[c#1*0]

would put 50 percent of the cases in each of the two classes. If you continue to have problems send your output to support@statmodel.com.
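
With more than two classes there is one logit per class except the last, each taken relative to the last class. A minimal Python sketch of the conversion in both directions follows; it assumes the usual multinomial logit form with the last class as reference (logit fixed at 0), and the function names and the 3-class proportions are invented for illustration.

```python
import math


def proportions_to_logits(proportions):
    """Logits of each class relative to the last (reference) class."""
    reference = proportions[-1]
    return [math.log(p / reference) for p in proportions[:-1]]


def logits_to_proportions(logits):
    """Inverse mapping; the reference class has an implicit logit of 0."""
    exps = [math.exp(value) for value in logits] + [1.0]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    # Hypothetical 3-class proportions, not taken from any real analysis.
    logits = proportions_to_logits([0.5, 0.3, 0.2])
    print([round(value, 3) for value in logits])
    print([round(p, 3) for p in logits_to_proportions(logits)])
    # A single logit of 0 gives the 50/50 split mentioned above.
    print(logits_to_proportions([0.0]))
```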

Anonymous posted on Tuesday, December 14, 2004 - 10:04 am

I did a simulation study using Mplus 2.14. The reviewers asked that I update this using Mplus 3, given possible improvements in estimation/optimization. I complied with this request, only to find a lower rate of convergence in Mplus 3 than Mplus 2.14. For instance, where I had 100% convergence before, the rate dropped to 90% with Mplus 3. I used the exact same script files, except that I turned off the random starts in Mplus 3 so it would be estimating from the same start values. Have any of the defaults changed between versions (e.g., convergence criteria) or can you think of any other reason to explain these results?

Thanks in advance for your help.

bmuthen posted on Tuesday, December 14, 2004 - 10:39 am

Our experience is the opposite. Please send your MC input to support@statmodel.com.

Anonymous posted on Tuesday, December 14, 2004 - 10:46 am

I generated the data myself rather than using the Monte Carlo feature in Mplus.

bmuthen posted on Tuesday, December 14, 2004 - 10:49 am

In your Version 3 run you used the new "external MC" feature, right, where you submit several data replications for an MC run to get summaries across these replications? If so, please send the Mplus input for that.

Linda K. Muthen posted on Tuesday, December 14, 2004 - 10:58 am

I'll just add my two cents here. If you are generating your own data, then it is not possible to do the same thing in 2.14 and 3 because you would have to use external Monte Carlo and that was not available in 2.14. So I suspect you have generated your data differently than in 2.14 where it was generated by Mplus and that this is the cause of your convergence problems.

Anonymous posted on Tuesday, December 14, 2004 - 11:10 am

My mistake -- I had omitted the STARTS command thinking this would turn off the random starts -- didn't realize it would do this by default. Turns out that it was the random start routine that was causing the increased rate of non-convergence. When I override this with STARTS=0 I once again get 100% convergence. Sorry for the trouble.

Delphine Gross posted on Sunday, January 30, 2005 - 8:21 am

I am trying to do a Monte Carlo study using the results of one study as population values. With these data, I found two classes. When I do the Monte Carlo study specifying two classes, it works well. However, when I try to specify 3 classes (again using results from an analysis of the same data with 3 classes as population values), the program seems to get stuck in one of the replications. That is, several replications go well enough, then suddenly Mplus gets stuck. I tried changing the seed and using more random starts (STARTS = 50 10), but neither worked.

Linda K. Muthen posted on Sunday, January 30, 2005 - 3:43 pm

Please send the output that generated the data and the Monte Carlo output to support@statmodel.com. It is hard to say without seeing the details of the generation and subsequent analysis.

Delphine Gross posted on Monday, March 21, 2005 - 6:54 am

I was able to solve my last problem alone. However, I have a new problem. Still doing the same Monte Carlo analysis, I used an input that worked for 1500 subjects to evaluate the model with 2000 subjects. This new input does not work even though it is almost exactly the same as the last one that worked perfectly well. It computes and then suddenly stops and no output appears in Mplus. When I open the output, it contains only two lines after the input:
INPUT READING TERMINATED NORMALLY
then the title I gave

Linda K. Muthen posted on Monday, March 21, 2005 - 8:14 am

This is once again something where I would have to see the output that worked and the output that didn't work at support@statmodel.com. What you have experienced could happen for a variety of reasons.

Delphine Gross posted on Monday, March 20, 2006 - 3:11 am

I am trying to simulate categorical variables with a specific probability of occurrence per category (I used the CUTPOINTS option with z-scores). I then want to specify a model with continuous latent variables.
When I use GENERATE, I get error messages indicating that I cannot use this option for y-variables. When I use only CUTPOINTS, I get the same message. Whether or not I use CATEGORICAL does not seem to change anything. Is there a way to simulate categorical y-variables with specific endorsement probabilities and then specify a model for the relations between these variables?
Thank you in advance

Bengt O. Muthen posted on Monday, March 20, 2006 - 6:16 am

You should be able to do what you are trying to do. Please send your input, output, and license number to support@statmodel.com.

Delphine Gross posted on Tuesday, June 27, 2006 - 6:34 am

Dear Dr. Muthen,

I am writing an article using mixture modeling and doing Monte Carlo studies on the results of the application. One reviewer asked how we knew label switching was not a problem. One known solution to limit label switching is to constrain the classes so that the smallest (or the largest) is the first and so on to the last class. I would like to know if Mplus orders the classes according to class size and, if so, in what order. Thank you very much in advance.

best regards,

Linda K. Muthen posted on Wednesday, June 28, 2006 - 2:21 am

Mplus does not order the classes in any way. Label switching can be a problem. You need to give good starting values to avoid this.

sallua posted on Wednesday, February 28, 2007 - 8:13 pm

What is the appropriate use of the STSTART value in simulation research? For example, should a different STSTART value be used for each replication within a condition or should the same STSTART value be used within the same condition or should the same STSTART value be used across all replications and conditions?

thanks!

Linda K. Muthen posted on Thursday, March 01, 2007 - 7:09 am

I don't know of an STSTART option.

sallua posted on Thursday, March 01, 2007 - 7:23 am

sorry - STSEED

Linda K. Muthen posted on Thursday, March 01, 2007 - 2:04 pm

STSEED is not for Monte Carlo studies. The SEED option is.

Chiquitia Welch posted on Thursday, May 17, 2007 - 3:07 pm

I need to conduct a Monte Carlo study to determine if my sample size is adequate. Is there syntax for conducting Monte Carlo studies using LCA? I know that Nylund & Muthen conducted a Monte Carlo study for LCA. Is the syntax they used available?

Linda K. Muthen posted on Thursday, May 17, 2007 - 3:37 pm

All of the examples in Chapter 7 come with an input for their Monte Carlo counterpart. These are on the website and the Mplus CD. This is a good place to start for a Monte Carlo for an LCA. The files you allude to are not available.

Qingqing Yang posted on Tuesday, September 25, 2007 - 1:51 pm

Hi,

I am conducting an external simulation for an LCA. I created 500 datasets in other software and then analyzed the datasets in Mplus. In the DATA command, I used "type=monte carlo" to do the Monte Carlo study, and used "savedata: results is results.txt" to save the results. But it seems only 498 datasets were computed successfully by Mplus, and Mplus gave results for just those 498 datasets. So I wonder if there is some way to determine which datasets were not computed successfully by Mplus.

Thanks a lot!

Linda K. Muthen posted on Tuesday, September 25, 2007 - 1:59 pm

Ask for TECH9 in the OUTPUT command.

Anthony Ahmed posted on Wednesday, March 11, 2009 - 8:05 am

Hi,

I'm trying to conduct an external Monte Carlo study of FMA and LPA. Is it possible to obtain the information criteria corresponding to each individual data file and the LMR, and BLRT so that I can obtain hit rates for both methods?

Thanks

Linda K. Muthen posted on Wednesday, March 11, 2009 - 9:01 am

Yes, you can do this using the RESULTS option of the MONTECARLO command.

Anthony Ahmed posted on Wednesday, March 11, 2009 - 12:06 pm

The RESULTS option produces average fit statistics, percentiles, and proportions, but no statistics for the individual data sets. Also, I can't seem to locate the files for the SET ALL_RESULTS, SET ERROR_LOG, and COMPLETED_LOG settings that I specified in the runstart.bat of the RUNALL utility. This is the input that I am using for the test run.

TITLE: Factor Mixture Analysis on Dissertation Data

DATA:
FILE IS "C:\LVM\LVM Data\runalldata.inp";
TYPE = MONTECARLO;

VARIABLE:
NAMES ARE u1 u2 u3 u4;
USEVARIABLES ARE u1 u2 u3 u4;
CLASSES = c(2);

ANALYSIS:
TYPE = MIXTURE;
STARTS = 100 20;
STITERATIONS = 20;
LRTBOOTSTRAP = 10;
LRTSTARTS = 100 20 100 20;
MODEL:
%OVERALL%
f BY u1-u4;
[f@0];
%c#1%
f BY u1@1 u2-u4;
f;
[u1-u4];
%c#2%
f BY u1@1 u2-u4;
f;
[u1-u4];
OUTPUT: TECH1 TECH2 TECH3 TECH8 TECH9 TECH11 TECH13 TECH14;

SAVEDATA:
RESULTS = disall1.txt;

Linda K. Muthen posted on Wednesday, March 11, 2009 - 5:11 pm

You are using external Monte Carlo which is a replacement of the RUNALL utility. So RUNALL is not needed. You should get results for each replication if you are using a current version of the program. You will not obtain TECH11 and TECH14 unless you use internal Monte Carlo. Please send further questions on this topic and your license number to support@statmodel.com.

In-Hee Choi posted on Thursday, February 04, 2010 - 11:43 pm

I am doing a simulation study on the effects of DIF patterns on detecting latent classes.
I have 27 conditions, and in each condition 100 data sets have been generated using the MONTECARLO command.
What I want to do is fit the model with one, two, and three latent classes to each of the 100 data sets (one by one) within a condition.
So now I am looking for a way to do multiple runs with the same model but different data.

This is an example input file for the one latent class model in the first condition.

DATA:
FILE IS 11.dat;

VARIABLE:
NAMES ARE y1-y30;
USEVARIABLES ARE y1-y30;
CATEGORICAL=y1-y30;

ANALYSIS: ALGORITHM=INTEGRATION;
MODEL: f BY y1-y30@1;
SAVEDATA:
FILE IS gmem1_1_1.dat;
SAVE=CPROB;

....through....

DATA:
FILE IS 1100.dat;
...
SAVEDATA:
FILE IS gmem1_100_1.dat;
SAVE=CPROB;

I hope to find some way to let Mplus do all 100 runs with just one input file.
Can I write some kind of loop that specifies the data set Mplus uses in each iteration (as in R)?
Apart from the data set, the commands in the input file are the same in each iteration.

Linda K. Muthen posted on Friday, February 05, 2010 - 10:01 am

There's a DOS bat that is part of Web Note 10 that shows a way to do this. See Web Note 10.
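
Another way to get the same effect, sketched here in Python rather than as the DOS batch file from Web Note 10 (which is not reproduced here), is to generate one input file per data set from a template. The template copies the input posted above; the file-name pattern (condition number followed by replication number, as in 11.dat through 1100.dat), the reading of the last digit of the SAVEDATA file name as the number of classes, and all function names are assumptions made for illustration. Each generated .inp file would still have to be run through Mplus separately.

```python
from pathlib import Path

# Template copied from the one-class input posted above; only the two
# file names change from run to run.
TEMPLATE = """DATA:
FILE IS {data_file};

VARIABLE:
NAMES ARE y1-y30;
USEVARIABLES ARE y1-y30;
CATEGORICAL=y1-y30;

ANALYSIS: ALGORITHM=INTEGRATION;
MODEL: f BY y1-y30@1;
SAVEDATA:
FILE IS {save_file};
SAVE=CPROB;
"""


def render_input(condition, rep, n_classes=1):
    """Return (input file name, input text) for one data set.

    The naming scheme is a guess based on the examples in the post.
    """
    data_file = f"{condition}{rep}.dat"
    save_file = f"gmem{condition}_{rep}_{n_classes}.dat"
    name = f"run{condition}_{rep}_{n_classes}.inp"  # hypothetical name
    return name, TEMPLATE.format(data_file=data_file, save_file=save_file)


def write_inputs(out_dir, condition, n_reps, n_classes=1):
    """Write one .inp file per replication and return the paths written."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for rep in range(1, n_reps + 1):
        name, text = render_input(condition, rep, n_classes)
        path = out_dir / name
        path.write_text(text)
        paths.append(path)
    return paths


if __name__ == "__main__":
    # Show the generated input for condition 1, data set 100.
    print(render_input(1, 100)[1])
```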

Renee McDonald posted on Thursday, February 17, 2011 - 2:05 pm

I have a general question regarding simulated data. I am attempting to simulate data under different conditions to analyze. Is there a way to simulate data with a certain level of skewness and heterogeneous data with classes/clusters a certain distance apart?

Thank you.

Bengt O. Muthen posted on Thursday, February 17, 2011 - 4:29 pm

Non-normality can be generated using mixtures as we did in the paper on our web site:

Muthén, L.K. & Muthén, B.O. (2002). How to use a Monte Carlo study to decide on sample size and determine power. Structural Equation Modeling, 4, 599-620.

You have to experiment to get the non-normality you want.

There are other ways as well, one being the "Vale-Maurelli" approach, but that is not in Mplus.

Renee McDonald posted on Thursday, February 17, 2011 - 8:02 pm

Thank you for the follow-up. I will have a look at the paper. How about specifying the distance between classes using Mahalanobis distance or some other method? Is that possible?

Bengt O. Muthen posted on Thursday, February 17, 2011 - 8:30 pm

There have been a couple of mixture papers on that advocating at least 2 SD apart in the means for good estimation. I think some is in:

Lubke, G. & Muthén, B. (2007). Performance of factor mixture models as a function of model size, covariate effects, and class-specific parameters. Structural Equation Modeling, 14(1), 26-47.

HwaYoung Lee posted on Tuesday, March 15, 2011 - 2:56 pm

Dear Dr. Muthen,
I am very interested in studying mixture modeling after reading the study, "Performance of factor mixture models as a function of model size, covariate effects, and class-specific parameters".
However, I am wondering how you calculate Mahalanobis distance using the formula provided in the article, especially, population parameter values for the part2 in the appendix.
I am not clear how to use the formula for the part 2.
For example, regression intercepts class 2 MDy=1.5, v=[.2 -.2 .36 -.4 .2 -.2 .36 -.4]
covariate effect, MDx=0.5, rc=[.5]

How can I calculate those things using the formula provided in the article?

Thank you so much.

Linda K. Muthen posted on Tuesday, March 15, 2011 - 5:52 pm

Please contact the first author of the article with these questions.

Renee McDonald posted on Thursday, March 31, 2011 - 11:28 pm

Hello,
I have a question regarding simulating mixture data. I am having problems with one issue in particular. Is it possible to specify class separation between clusters in Mplus? If so, how? For example, if I want to simulate 3 classes 2SD apart, how do I specify such in Mplus? Where / how is this included in the syntax? Thank you.

Linda K. Muthen posted on Friday, April 01, 2011 - 10:36 am

You would separate the classes by the population parameter values you give for the means or thresholds of the latent class indicators. All of the mixture examples in Chapters 7 and 8 come with a Monte Carlo counterpart where the data for the examples were generated. These would be a good place to start.
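
As a small illustration of picking such population values, the Python sketch below spaces class means a chosen number of SDs apart and computes a Mahalanobis distance between two classes. It assumes the "SD" is the common within-class SD of a continuous indicator and that indicators are uncorrelated within class (a diagonal covariance matrix); both are simplifying assumptions, and the function names are hypothetical.

```python
import math


def separated_means(n_classes, sd, n_sd_apart, start=0.0):
    """Class means for one indicator, spaced n_sd_apart within-class SDs apart."""
    return [start + k * n_sd_apart * sd for k in range(n_classes)]


def mahalanobis_diag(means_a, means_b, variances):
    """Mahalanobis distance between two class mean vectors.

    Assumes a common within-class covariance matrix that is diagonal.
    """
    return math.sqrt(
        sum((a - b) ** 2 / v for a, b, v in zip(means_a, means_b, variances))
    )


if __name__ == "__main__":
    # Three classes, 2 SD apart, on an indicator with within-class SD 1.5.
    print(separated_means(3, 1.5, 2))
    # Two classes differing by 2 SD on each of four unit-variance indicators.
    print(mahalanobis_diag([0, 0, 0, 0], [2, 2, 2, 2], [1, 1, 1, 1]))
```

The resulting means would then be entered as the class-specific population values in the data-generation step.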

QianLi Xue posted on Thursday, May 26, 2011 - 5:31 pm

Hello, I would like to simulate a latent class model with 4 classes and 6 binary latent class indicators y1-y6. What's wrong with the statements below? Help pls.

MONTECARLO:
NAMES ARE y1-y6;
NOBSERVATIONS = 300;
NREPS = 500;
SEED = 4533;
GENERATE = y1-y6 (1 p);
CATEGORICAL ARE y1-y6;
GENCLASSES = c (4);
Classes = c(4);

Model Population:

%overall%
[c#1*-1.944 c#2*-1.476 c#3*-1.695];

%c#1%
[y1$1@-1.325 y2$2@-1.658 y3$1@-1.735 y4$1@-1.735
y5$1@-1.516 y6$1@-1.208];

%c#2%
[y1$1@-0.364 y2$2@-0.364 y3$1@0 y4$1@0.754
y5$1@1.992 y6$1@3.892];

...

Linda K. Muthen posted on Thursday, May 26, 2011 - 5:37 pm

Please send the output and your license number to support@statmodel.com so I can see what error message you get.

QianLi Xue posted on Thursday, May 26, 2011 - 5:50 pm

Never mind. I found the problem. The threshold number following y2$ should be 1, not 2, because all the y variables are binary.

Here are two more questions:

1) Do I need the statement: GENERATE = y1-y6 (1 p);

2) I assume that the thresholds for generating the binary variables y1-y6 are calculated internally using the latent class prevalence estimates given under %overall% and the logits of conditional probabilities given under %C#1%, %C#2%, etc. right?

Linda K. Muthen posted on Thursday, May 26, 2011 - 6:19 pm

1. The statement should be either

GENERATE = y1-y6 (1 l);

or

GENERATE = y1-y6 (1);

With maximum likelihood, logistic regression is used.

2. The variables are generated using the population parameter values that you give in MODEL POPULATION. Almost all of the Mplus examples have a Monte Carlo counterpart that was used to generate the data for the examples. You should try to find an example that is close to what you want and use the Monte Carlo counterpart for that example as a starting point.
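
To see what a population threshold implies for a binary indicator, the logit threshold can be converted to an endorsement probability. The Python sketch below assumes the logistic parameterization in which P(y = 1 | class) = 1 / (1 + exp(threshold)), so a threshold of 0 means 0.5 and negative thresholds mean probabilities above 0.5; check this against the User's Guide before relying on it. The function name is made up for illustration.

```python
import math


def threshold_to_probability(threshold):
    """P(y = 1 | class) for a binary item, assuming a logit threshold."""
    return 1.0 / (1.0 + math.exp(threshold))


if __name__ == "__main__":
    # Thresholds taken from the class 1 and class 2 statements posted above.
    for tau in (-1.325, -0.364, 0.0, 0.754, 1.992, 3.892):
        print(tau, round(threshold_to_probability(tau), 3))
```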

Renee McDonald posted on Friday, June 24, 2011 - 11:58 am

Hello,
How do you determine a reasonable number of starts? I am doing an FMM simulation study. I was not getting some results (results for more than one class), and when I requested TECH9, I saw that I need more starts. The problem is that I am running 200 different models with 1000 datasets each. I do not want to use more starts than necessary because of the time it takes to run the models. I am really unclear on how to determine this.

Thank you.

Linda K. Muthen posted on Friday, June 24, 2011 - 4:52 pm

I think this is just a matter of trial and error. You can start with 100 25 and if that is not enough try 200 50. The second number should be about 1/4th of the first.

Renee McDonald posted on Tuesday, June 28, 2011 - 9:55 am

Thank you. A follow up question....should I ONLY use my own start values if the tech9 says I should use more starts (I am running FMM using MC simulations)?

Also, what is the difference between STARTS = 100 and STARTS = 100 25, for example? Are the starts explained in the manual somewhere? I can't seem to find it.

Linda K. Muthen posted on Tuesday, June 28, 2011 - 10:07 am

Starting values and random starts are not the same thing. See the STARTS option in the user's guide for a full description of this option. The Version 6 index shows it is on pages 550-551.

Renee McDonald posted on Tuesday, June 28, 2011 - 12:14 pm

Okay. Thank you. I understand the difference now. However, I still remain with the question...should I just use the default and only use my own if required by the tech9 output?

Linda K. Muthen posted on Tuesday, June 28, 2011 - 2:50 pm

If you don't replicate the best loglikelihood, you need to increase the number of random starts using the STARTS option.

Puneet Tiwari posted on Saturday, July 02, 2011 - 5:49 am

I am conducting a Monte Carlo study for an LC model with categorical covariates and saved the results using the RESULTS option in MONTECARLO.
Can I read this RESULTS file or the output file using R to extract summary statistics from a Monte Carlo study?
I understand that the results can be read for a non-simulation analysis fitted in Mplus (using MplusAutomation). But how about results from an MC study?

Michael Hallquist posted on Wednesday, August 03, 2011 - 11:01 am

Hello, Puneet,

I am the developer of MplusAutomation and got your message. I recently finished an update to the package that reads the parameters from Monte Carlo output. Please run update.packages() and check that you have 0.4-3, which is the latest version. This should give you access to the parameters you mentioned. Please feel free to contact me directly if you have further trouble with Monte Carlo output following this update.

Best wishes,
Michael

HwaYoung Lee posted on Friday, February 10, 2012 - 11:47 am

Hello,
I would like to run an external Monte Carlo analysis for correctly specified and mis-specified mixture models. I want to compare fit indices such as AIC, BIC, and aBIC across the correctly specified and mis-specified models to see whether the fit indices favor the correctly specified model over the mis-specified ones.
When I looked at the output of Monte Carlo analysis, only average fit statistics were provided.
Is there any option to save each replication result to compare fit indices across replications?
In other words, is there any way to know how many times fit indices select the correct model?
Thank you in advance.

Linda K. Muthen posted on Friday, February 10, 2012 - 12:34 pm

Use the RESULTS option of the SAVEDATA command.
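
Once per-replication values of a fit index are available for each model, counting how often the correct model is selected is straightforward. The Python sketch below assumes the values have already been read into one list per model, aligned by replication (so a replication that failed for any model has been dropped from all of them); it does not parse the RESULTS file, and the model labels and numbers are invented.

```python
def hit_rate(ic_by_model, correct_model):
    """Proportion of replications in which the correct model has the lowest value.

    ic_by_model maps a model label to a list of information-criterion values,
    one per replication, with all lists aligned and of equal length.
    """
    n_reps = len(ic_by_model[correct_model])
    if any(len(values) != n_reps for values in ic_by_model.values()):
        raise ValueError("all models need the same number of replications")
    hits = 0
    for rep in range(n_reps):
        best = min(ic_by_model, key=lambda model: ic_by_model[model][rep])
        if best == correct_model:
            hits += 1
    return hits / n_reps


if __name__ == "__main__":
    # Made-up BIC values for four replications of two competing models.
    bic = {
        "2-class": [100.0, 105.0, 98.0, 110.0],
        "3-class": [102.0, 101.0, 99.0, 120.0],
    }
    print(hit_rate(bic, "2-class"))  # 0.75
```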

Tenisha Smith posted on Monday, June 17, 2013 - 7:05 pm

I think I might be missing a simple point here - can you do a monte carlo simulation with no data (i.e. prior to collecting any data at all)? Or do you need pilot data of some sort?

Thank you.

Linda K. Muthen posted on Tuesday, June 18, 2013 - 7:44 am

You do not need data for a Monte Carlo simulation. The data are generated. However, you may need a pilot study to know the population parameter values to use for data generation.

Unkyung No posted on Wednesday, September 11, 2013 - 10:43 pm

Hello,

I want to know about setting the entropy values in a Monte Carlo simulation study.
"We also vary the values of alpha1c to obtain different entropy levels. Choosing alpha11=1, alpha12=-1 yields entropy of 0.6. Choosing alpha11=2, alpha12=-2 yields entropy of 0.85. Choosing alpha11=3, alpha12=-3 yields entropy of 0.95." (Web Note 15, page 15).
In these statements, alpha1c is related to the entropy level.
- Is there an equation to calculate the entropy values using alpha1c?
- Is it possible to set the entropy values exactly?

I'll try to set the entropy value (.5, .7, .9) in GMM with three classes. Please let me know the answer. Thank you!!

Linda K. Muthen posted on Thursday, September 12, 2013 - 7:08 am

There is no formula. This is done by trial and error. The farther apart the means of the classes are, the better the entropy.
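
Although there is no formula linking the population means to entropy, the entropy of a given solution can be computed from the posterior class probabilities, which makes the trial-and-error loop easy to check. The Python sketch below assumes the usual relative entropy definition, E = 1 - sum_i sum_k(-p_ik ln p_ik) / (n ln K); the function name and the example probabilities are made up.

```python
import math


def relative_entropy(posteriors):
    """Relative entropy from an n x K table of posterior class probabilities.

    Values near 1 mean clear classification; values near 0 mean the
    posterior probabilities are close to uniform.
    """
    n = len(posteriors)
    k = len(posteriors[0])
    total = 0.0
    for row in posteriors:
        for p in row:
            if p > 0:  # treat 0 * ln(0) as 0
                total -= p * math.log(p)
    return 1.0 - total / (n * math.log(k))


if __name__ == "__main__":
    # Hypothetical posterior probabilities for four cases and two classes.
    print(round(relative_entropy([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.3, 0.7]]), 3))
```

Because the sum is divided by the number of cases, the statistic is an average per case rather than a quantity that grows or shrinks with the sample size.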

Unkyung No posted on Wednesday, October 02, 2013 - 3:22 am

Thank you.

I have another question.
As far as I know, entropy is related to sample size.

In the example from my previous post (Web Note 15, page 15), if the sample size is different from 5000, should the values of alpha11 and alpha12 be changed?

My understanding is that the smaller the sample size, the larger the entropy, so the class means would have to be set closer together. Is that right?

Thanks in advance.

Linda K. Muthen posted on Wednesday, October 02, 2013 - 12:06 pm

Entropy is not related to sample size.

Unkyung No posted on Wednesday, October 02, 2013 - 7:10 pm

Ah.. I was misunderstanding.
Thank you very much!

Erika Baldwin posted on Tuesday, May 13, 2014 - 1:54 pm

Hi, I am working on a Monte Carlo study looking at the power of latent transition probabilities in LTA. I have two questions:

1) Simulation output doesn't include the estimates for the last column of transition probabilities in a probability matrix. Is there a way to get the estimates for the last column for the sake of understanding power for
all transition probabilities in the matrix?

2) When reporting power of latent transition probabilities, would it be best to report all power values for each transition probability or can I average them to show the average power of latent transition probabilities for the overall model?

Linda K. Muthen posted on Wednesday, May 14, 2014 - 7:40 am

You can use MODEL CONSTRAINT to compute the probability for the reference class.

I would not average as the power is probably different for each.
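
The arithmetic behind the reference-class probability can be sketched in Python (this is not MODEL CONSTRAINT syntax). It assumes a multinomial logit parameterization in which each row of the transition matrix has one logit per class except the last, and the last (reference) class has a logit of 0; the function name and the logit values are invented for illustration.

```python
import math


def transition_row(logits):
    """Transition probabilities for one row, given logits for all but the last class.

    The last entry is the reference class, obtained as 1 minus the others.
    """
    exps = [math.exp(value) for value in logits]
    denominator = 1.0 + sum(exps)
    probabilities = [e / denominator for e in exps]
    probabilities.append(1.0 - sum(probabilities))
    return probabilities


if __name__ == "__main__":
    # Hypothetical logits for moving from one time-1 class into
    # time-2 classes 1 and 2, with class 3 as the reference.
    print([round(p, 3) for p in transition_row([1.0, -0.5])])
```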

Raghav Ramachandran posted on Monday, April 27, 2015 - 11:05 am

Hi Drs. Muthen,

I am running a power simulation for an LTA model with two time points, multiple (nine) covariates, and a distal outcome. As in Mplus Web Note 13, I want to vary the effect of the covariates on the latent variable at Time 2 (c2) for different classes of the latent variable at Time 1 (c1).

However, when I try to implement this in my power simulation, similar to mcex8.14, I get the following error: The following MODEL POPULATION statements are ignored:
* Statements in Class %C1#1% of MODEL C1:
C2#1 on X2
...
C2#1 on X9
* Statements in Class %C1#2% of MODEL C1:
C2#1 on X2
...
C2#1 on X9

I am using MPlus Version 7.11. Please let me know if you have any ideas on how to run a Monte Carlo simulation for an LTA model with multiple covariates while taking into account the interaction of the covariates between c1 and their influence on c2. Thanks for your help.

Thanks,
Raghav

Bengt O. Muthen posted on Monday, April 27, 2015 - 11:14 am

We need to see the full output to say - send to support along with your license number.

Raghav Ramachandran posted on Monday, April 27, 2015 - 11:20 am

Dr. Muthen,

Thanks for your prompt reply. I really just wanted to know if there was a way to extend Monte Carlo example 8.14 (MC simulation for LTA with a continuous covariate) to LTA models with multiple covariates. Sorry for the confusion and thanks for understanding.

Thanks,
Raghav

Raghav Ramachandran posted on Monday, April 27, 2015 - 1:21 pm

Dr. Muthen,

I am analyzing Army data sets as a government contractor and am using Mplus in a virtual enclave, so I don't have a license number to share. However, I can provide a snapshot of my LTA Monte Carlo code:

TITLE: Monte Carlo simulation for LTA with 9 covariates

montecarlo:
names are u1-u20 x1-x9 y1;
generate = u1-u20(1);
categorical = u1-u20;
genclasses = c1(2) c2(2);
classes = c1(2) c2(2);
...

analysis:
type = mixture;

model population:

%overall%
[x1@0.5]; ...
x1@0.2; ...

c1#1 on x1-x9*0.1;
c2#1 on x1-x9*-0.1;
c2#1 on c1#1*0.4;

model population-c1:

%c1#1% !Class 1 at Time 1
c2#1 on x1-x9*0.3;
[y1*4.5];
[u1$1*-2];...[u5$1*-2];
[u6$1*2];...[u10$1*2];

%c1#2% !Class 2 at Time 1
...

ERROR
The following MODEL POPULATION statements are ignored:
Statements in Class %C1#1% of MODEL C1:
C2#1 on X2 ... C2#1 on X9

I hope this snapshot of my MPlus output helps and I wanted to know if there was a way for me to regress C2 on all covariates (x1-x9) in the class specific statements of the model for C1.

Thanks,
Raghav

Bengt O. Muthen posted on Monday, April 27, 2015 - 2:25 pm

Mplus allows regression of c2 on all covariates x1-x9. To find out why your setup gets stopped, we need to see your full output and confirm that you have a support contract.

Peng qian posted on Monday, October 31, 2016 - 1:34 am

Dr. Muthen,

First, sorry to trouble you!

I want to compute the correct classification rates (CCRs) in LTA simulations.
I have three questions:

1) How does Mplus simulate the true latent class for each person in LCA?

I can find the latent class for each person based on the estimated model in the generated data file.
However, in neither the input nor the output file can I find anything showing the simulated true latent class for each person in mixture models.

2) How does Mplus simulate the true latent class for each person at each time point in LTA?

3) In addition, I think it is meaningful to compute the CCRs for the first time point, but I wonder whether it is meaningful to compute CCRs for the other time points, or for all time points as a whole.

In the LTA, I can set the class proportions for the first time point; for the other time points, however, the proportions of each class are determined by the transition probabilities and the first-time class proportions.

Thanks a lot!

Bengt O. Muthen posted on Monday, October 31, 2016 - 3:46 pm

1) - 2):

Look at the Monte Carlo versions of each mixture example. For instance, for ex7.3 you can use the lines

nrep = 100;
repsave = all;
save = ex7.3rep*.dat;

and you will find the true (generated) class membership of each subject shown in the last column of the saved data.

3):

I think it is meaningful for all time points.
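
As a rough illustration of the CCR computation discussed above, the following Python sketch compares a vector of generated class memberships with a vector of assigned classes. It assumes the true class has already been read from the last column of a saved replication file and that the assigned (most likely) class comes from a separate analysis of that file; the function name and the toy vectors are hypothetical, and class labels are assumed to be aligned already.

```python
def correct_classification_rate(true_class, assigned_class):
    """Share of subjects whose assigned class equals the generated class."""
    if len(true_class) != len(assigned_class):
        raise ValueError("both vectors need one entry per subject")
    hits = sum(t == a for t, a in zip(true_class, assigned_class))
    return hits / len(true_class)

# Hypothetical toy vectors for six subjects (not real Mplus output).
true_c = [1, 1, 2, 2, 2, 1]
assigned_c = [1, 2, 2, 2, 2, 1]

ccr = correct_classification_rate(true_c, assigned_c)
print(round(ccr, 3))
```

For an LTA, the same function could be applied separately to the class columns of each time point.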

Peng qian posted on Tuesday, November 01, 2016 - 12:51 am

Dr. Muthen,
Thanks for your helpful and prompt reply.
I think I mistook the generated class membership of each subject for the estimated one.

I have another three questions:
1):
I want to understand in more depth how Mplus simulates the class membership of each subject.

In the simulation of the LCA model, I can set the number of subjects, replications, items, and item thresholds, but I do not know how the class membership of each subject is simulated in Mplus.

2-1):
Can I get the model-estimated results for each subject within the Monte Carlo study at the same time?

*** ERROR in SAVEDATA command
The SAVEDATA command is not available in conjunction with the MONTECARLO command.
Use the SAVE and RESULTS options in the MONTECARLO command as desired.

2-2):
If the answer to 2-1) is no, that is, if I have to analyze each generated data set separately in Mplus, is there some way to use all the generated data sets (e.g., nrep = 100;) in one input file to get all the estimated results, just as in the simulation (repsave = all; save = ex7.3rep*.dat;)?

3):
I want to compute the mean absolute deviations (MADs) for parameter recovery, but I get just one output file. Does it contain the results of the last replication or the average over all replications? How can I get a separate output file for each replication?

Thanks a lot!

Bengt O. Muthen posted on Wednesday, November 02, 2016 - 5:18 pm

1) - 2):

You choose the [c#1] value (assuming 2 classes), and this gives the probability of being in the first class. Mplus then does a random draw, and each person is put in one of the two classes as a function of that probability.

3) You would have to do External Monte Carlo (see the Step 1 and Step 2 examples in the Monte Carlo chapter, Chapter 12, of the UG) and then run each generated data set one at a time (as you do with real data), saving the information you want. Summaries can be done using "MplusAutomation" - see our website.
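
The random draw described in 1) - 2) can be sketched in Python as follows. This is only an illustration of the idea, not Mplus's internal code: it assumes the usual multinomial-logit parameterization in which the last class is the reference class with logit 0, and the logit value 0.5, the sample size, and the seed are hypothetical.

```python
import math
import random

def class_probabilities(logits):
    """Class probabilities from logits; the last class is the reference (logit 0)."""
    expd = [math.exp(v) for v in logits] + [1.0]
    total = sum(expd)
    return [e / total for e in expd]

def draw_classes(logits, n, seed=0):
    """Draw a class label (1..K) for each of n subjects from those probabilities."""
    rng = random.Random(seed)
    probs = class_probabilities(logits)
    labels = list(range(1, len(probs) + 1))
    return rng.choices(labels, weights=probs, k=n)

# Hypothetical two-class example corresponding to [c#1*0.5].
probs = class_probabilities([0.5])
members = draw_classes([0.5], n=1000, seed=1)
share_class1 = members.count(1) / len(members)
print(probs, share_class1)
```

The observed share in class 1 varies around the implied probability from one replication to the next, which is why the generated class counts differ across saved data sets.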

Ni Yuhan posted on Tuesday, November 15, 2016 - 1:49 am

Dear Dr. Muthen,

I'm confused about how to define a correlation between two categorical indicators in an LCA Monte Carlo study.

Assuming there are 5 categorical indicators y1-y5, here is my code:

f1 by y1-y2 @ 0.4;

f2 by y3-y5 @ 0.4;

Does this mean that there is a correlation between y1 and y2, and there are correlations among y3-y5?

Thanks a lot.

Yuhan

Bengt O. Muthen posted on Tuesday, November 15, 2016 - 5:24 pm

Yes because the factors make them correlated.

Ni Yuhan posted on Saturday, November 19, 2016 - 5:29 am

Dear Dr. Muthen,

I remember you mentioned that if we coded [c#1*0], then 50% of the cases would be put in each of the two classes.
So I guess that [c#1*0 c#2*0] would work for 3 classes.

But the output was not as expected.

As below:

INPUT INSTRUCTIONS

montecarlo:
names are y1-y5;
genclasses = c(3);
classes = c(3);
SEED = 23458256;
nobs = 250;
nrep = 100;
repsave = ALL;
save = 5 250_*.DAT;

analysis:
type = mixture;
PROCESSORS=8(STARTS);

model population:
%overall%
[c#1*0 c#2@0];

%c#1%
[y1-y5*0.224];
y1-y5*1;
%c#2%
[y1-y5*0.7303];
y1-y5*1;
%c#3%
[y1-y5*1.2366];
y1-y5*1;

FINAL CLASS COUNTS AND PROPORTIONS FOR THE LATENT CLASSES
BASED ON THE ESTIMATED MODEL

Latent
Classes

1 22.28730 0.08915
2 111.08100 0.44432
3 116.63170 0.46653

Is there any problem in the input?

Thanks a lot.

Yuhan

Ni Yuhan posted on Saturday, November 19, 2016 - 5:30 am

Sorry, there is a mistake in the input above. The line should read:

[c#1*0 c#2*0]

Bengt O. Muthen posted on Saturday, November 19, 2016 - 1:40 pm

Those zero logits are correct. But you have weak class separation in that the variable means are only about half a standard deviation apart. This means that there may be class switching over the replications, particularly when the sample is relatively small as it is here. We recommend the means to be at least 2 SDs apart.
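
Both points in this reply can be checked with a few lines of Python. The sketch below uses the class means and variance from the posted input; it assumes the multinomial-logit parameterization with the last class as the reference class, and the variable names are made up for illustration.

```python
import math

# Class-specific means and common within-class variance from the posted input.
means = [0.224, 0.7303, 1.2366]
variance = 1.0
sd = math.sqrt(variance)

# Distance between adjacent class means in within-class SD units.
gaps = [(b - a) / sd for a, b in zip(means, means[1:])]

# Zero logits for c#1 and c#2; the third entry is the reference class (logit 0).
logits = [0.0, 0.0, 0.0]
weights = [math.exp(v) for v in logits]
proportions = [w / sum(weights) for w in weights]

print(gaps)
print(proportions)
```

The gaps come out at about half a standard deviation, well below the recommended 2 SDs, while the zero logits imply equal proportions for the three classes.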

Peng qian posted on Monday, February 27, 2017 - 5:42 pm

Dear Dr. Muthen,

First, sorry to trouble you!

My question is how to set random threshold values (e.g., between -1.5 and 1.5) in the LTA simulation.

Thanks a lot!

Bengt O. Muthen posted on Monday, February 27, 2017 - 6:51 pm

I don't know what you mean. Are you talking about a multilevel model?

Peng qian posted on Thursday, March 02, 2017 - 12:43 am

Dear Dr. Muthen,

First, sorry to trouble you again!

It is not a multilevel model.

E.g., in the LTA simulation: I = 3 items, T = 3 time points

model population-c1:
%c1#1%
[u1$1*1] ;
[u2$1*1] ;
[u3$1*1] ;

%c1#2%
[u1$1*-1] ;
[u2$1*-1] ;
[u3$1*-1] ;
In this example, the item threshold values are fixed (1 or -1).
Now I want to set random threshold values (e.g., between -1.5 and 1.5).
The question is how I can achieve this.

Thanks a lot!

Amanda Hagman posted on Wednesday, March 08, 2017 - 2:04 pm

Hello Drs. Muthen,

I am running a Monte Carlo simulation to look at class enumeration under violations of conditional independence. I would like to replicate the graphs in Nylund, Asparouhov, and Muthen (2007): (1) the percentage of times the lowest IC value occurred for each class solution, and (2) the percentage of times a nonsignificant p-value was seen in the BLRT/LRT.

The output of my external Monte Carlo run gives the average ICs, a list of IC values, and the BLRT average (from specifying TECH14). Is there an easy way to see the individual results, or more detailed reports in the output, so I can replicate this type of table?

Thanks,

Amanda

Bengt O. Muthen posted on Wednesday, March 08, 2017 - 6:07 pm

You can do that using the R program MplusAutomation on our website.
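
Once the per-replication values have been extracted (for example with MplusAutomation), the two percentages asked about can be tabulated along the following lines. This Python sketch does not use MplusAutomation's API; it assumes the IC values and p-values are already collected in plain lists, and all numbers and function names are hypothetical.

```python
from collections import Counter

def percent_lowest_ic(ic_by_replication):
    """ic_by_replication: one dict per replication, number of classes -> IC value."""
    winners = Counter(min(rep, key=rep.get) for rep in ic_by_replication)
    n = len(ic_by_replication)
    return {k: 100.0 * winners.get(k, 0) / n for k in sorted(ic_by_replication[0])}

def percent_nonsignificant(p_values, alpha=0.05):
    """Percentage of replications whose LRT/BLRT p-value is at or above alpha."""
    return 100.0 * sum(p >= alpha for p in p_values) / len(p_values)

# Hypothetical BIC values for 2-, 3-, and 4-class models in four replications.
bic = [
    {2: 5010.2, 3: 4990.5, 4: 4995.1},
    {2: 5003.0, 3: 4988.7, 4: 4986.9},
    {2: 5020.4, 3: 4999.3, 4: 5004.8},
    {2: 4979.9, 3: 4981.0, 4: 4990.2},
]
shares = percent_lowest_ic(bic)

# Hypothetical BLRT p-values, one per replication, for a single comparison.
nonsig = percent_nonsignificant([0.01, 0.20, 0.03, 0.60])
print(shares, nonsig)
```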

Peng qian posted on Tuesday, March 21, 2017 - 7:48 am

Dear Dr. Muthen,

Thanks a lot!

I am having trouble with an LTA simulation study.

I cannot get the analysis data in my output file when I use the simulated data with 1000 persons, but the data with 200 persons work fine.

The output information is as follows:

SAVEDATA INFORMATION

Class probabilities were not computed.
No data were saved.

Bengt O. Muthen posted on Tuesday, March 21, 2017 - 5:59 pm

We can answer this only when seeing your output - send to Support along with your license number.

Peng qian posted on Sunday, April 02, 2017 - 10:17 pm

Dr. Muthen,

Now I am doing LTA simulations. I set up two steps: in the first step, I get the generated (real) data in the simulation; in the second step, I use the real data to estimate the model.

I have two questions:

(1) In the first step, I would like to know whether the MODEL command (used for analysis) that follows the MODEL POPULATION command (used for generating data) needs to be present.
My concern is whether the MODEL command influences the generated data.

(2) The second question is about MODEL CONSTRAINT (e.g. 1) in the first step. How can I set starting values or population parameter values for the new variables (e.g., L1_0, L1_11)?

eg.1:

model population-c1:

%c1#1% ! Model for Class 1
[mitemf1$1*1] (T1_1); ! Item 1 Thresh 1
[mitemf2$1*1] (T2_1); ! Item 2 Thresh 1...
...

MODEL CONSTRAINT: ! Used to define LCDM parameters

! Item 1: Define LCDM parameters present for item 1

NEW(L1_0 L1_11);
T1_1=-(L1_0); ! Item 1 Thresh 1
T1_2=-(L1_0+L1_11);

Peng qian posted on Sunday, April 02, 2017 - 10:23 pm

Thanks a lot!

Bengt O. Muthen posted on Wednesday, April 05, 2017 - 5:50 pm

Send output to Support along with your license number.

Fredrik Falkenström posted on Thursday, May 10, 2018 - 10:27 am

Dear Drs. Muthen,

Is there a way in Mplus to just save the generated samples from a Montecarlo analysis without estimating any models on those samples? As it is, it takes many hours to estimate the models, and I would like to save the time.

Best,

Fredrik Falkenström

Bengt O. Muthen posted on Thursday, May 10, 2018 - 3:15 pm

Choose convergence criteria such that only 1 or 2 iterations are done. Save the data using the example of 12.6 Step 1 (then do Step 2).

Fredrik Falkenström posted on Thursday, May 10, 2018 - 11:08 pm

Great, thanks!

Rose Stafford posted on Thursday, September 13, 2018 - 12:54 pm

Hi Drs. Muthén,

I am conducting a simulation study comparing continuous LCA (LPA) and mixture CFA (FMM). I would like the total (population) indicator variance to be the same for the LPA and FMM generated data, meaning that they would only differ in whether indicator covariance is solely due to a latent class factor (LPA) or both a latent class factor & a continuous latent factor (FMM).

I have questions concerning the variance specification in the LPA data-generation model. For example:

MODEL POPULATION:
%Overall%
y1-y10*1;
%c#1%
[y1-y10*.317]; !MD=1
%c#2%
[y1-y10*0];

A) Is 'y1-y10*1' referring to total indicator variance? Or is it referring to the indicators' residual variance after accounting for covariation between indicators due to the latent class factor?

B) If it is referring to total indicator variance, is there a way to specify or calculate the variance due to the latent class factor (or the residual variance)?

Multiple mixture MC sim studies provide 'residual variances' data generating population parameters, including your (Bengt's) 2007 publication with Lubke (Performance of FMM as a function of ...), and I am unsure how this would be specified.

I greatly appreciate your help!

Bengt O. Muthen posted on Thursday, September 13, 2018 - 6:02 pm

The statement

%Overall%
y1-y10*1;

gives the variance for y1-y10 in each class (because y is not regressed on any variables).
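
Given that the stated variance is the within-class variance, the total indicator variance follows from the law of total variance: within-class variance plus the variance of the class means. The Python sketch below applies this to the posted means (.317 and 0) with a within-class variance of 1. The equal class proportions are an assumption, since the posted input does not show a [c#1] value, and the function name is hypothetical.

```python
def mixture_total_variance(proportions, means, within_variance):
    """Total variance of an indicator in a mixture with a common within-class variance."""
    grand_mean = sum(p * m for p, m in zip(proportions, means))
    between = sum(p * (m - grand_mean) ** 2 for p, m in zip(proportions, means))
    return within_variance + between

# Assumed equal class proportions; means and within-class variance from the post.
total = mixture_total_variance([0.5, 0.5], [0.317, 0.0], 1.0)

# Within-class variance that would make the total variance exactly 1
# under the same assumed proportions and means.
within_for_unit_total = 1.0 - (total - 1.0)
print(total, within_for_unit_total)
```

Under these assumptions the class factor adds only about 0.025 to the variance of each indicator, so the within-class and total variances are close.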

Mingchi Tseng posted on Saturday, September 22, 2018 - 8:34 am

I read Web Note No. 21. Following its content, I can reproduce the results in Tables 2 and 3 with Monte Carlo. But I don't know how to write the syntax for '5 Simulation study with a non-normal distal auxiliary outcome' to reproduce the results in Table 4. Please help.

Mingchi Tseng posted on Monday, September 24, 2018 - 7:10 am

Dear Dr. Muthen,
How can I write syntax to create the "bimodal distribution" for the non-normal distal auxiliary outcome, as on p. 17 of Mplus Web Notes: No. 21? Please teach me. Thank you.

Tihomir Asparouhov posted on Tuesday, September 25, 2018 - 9:03 am

Regressing a continuous variable (with a normal residual) on a binary variable gives a bimodal distribution.

Here is some sample code - the bimodal variable is X (non-normal distal auxiliary)

Montecarlo:
Names are u1-u5 x u;
Generate = u1-u5(1) u(1);
Categorical = u1-u5 u;
Genclasses = c(2);
Classes = c(2);
Nobservations = 1000;
Nrep = 1;

Analysis: Type = Mixture; algo=int;

Model Population:
%Overall%
[x@0];
x@1;
x on u*1;
u on x@0;

%c#1%
[u1$1*-1 u2$1*-1 u3$1*-1 u4$1*-1 u5$1*-1];
[x*0]; [u$1*1.098];
x on u*1;

%c#2%
[u1$1*1 u2$1*1 u3$1*1 u4$1*1 u5$1*1];
[x*1]; [u$1*1.098];
x on u*1;
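
To see what this population model implies for x within a class, the following Python sketch mimics the generation of u and x in class 1. It is an illustration only: it assumes the logit link for the binary u, so that P(u = 1) = 1 / (1 + exp(threshold)), and the function names, sample size, and seed are hypothetical.

```python
import math
import random

def prob_u_equals_1(threshold):
    """P(u = 1) for a binary item under a logit link with the given threshold."""
    return 1.0 / (1.0 + math.exp(threshold))

def simulate_x(n, intercept, slope=1.0, threshold=1.098, seed=0):
    """Generate x = intercept + slope * u + e, with e ~ N(0, 1) and binary u."""
    rng = random.Random(seed)
    p1 = prob_u_equals_1(threshold)
    xs = []
    for _ in range(n):
        u = 1 if rng.random() < p1 else 0
        xs.append(intercept + slope * u + rng.gauss(0.0, 1.0))
    return xs

p1 = prob_u_equals_1(1.098)                          # close to 0.25
x_class1 = simulate_x(20000, intercept=0.0, seed=1)  # class 1 has [x*0]
mean_x = sum(x_class1) / len(x_class1)
print(p1, mean_x)
```

Within a class, x is therefore a two-component normal mixture whose components sit at the intercept and at the intercept plus the slope; a larger slope relative to the residual SD pulls the two components further apart.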

Mingchi Tseng posted on Tuesday, September 25, 2018 - 8:31 pm

Thank you so much. Now I can create a bimodal distal outcome in Mplus.

Peter Edelsbrunner posted on Monday, March 11, 2019 - 3:14 pm

I am conducting a Monte Carlo study on latent transition analysis in educational science. Since I would like to compute RMSEs and bias across all data sets, I am wondering how Mplus handles label switching when computing coverage etc. in the Monte Carlo output, and whether I can use the same method for the outputs from my 1000 data sets, which I analyzed individually (via MplusAutomation).

Any help on this issue would be appreciated. Perhaps label switching is still the reason that none of the available simulation studies on mixture modeling report bias and RMSEs, and instead only ever report on identifying the correct number of mixtures?

The only option I could find is provided in a paper that has been mentioned here: Tueller, Drotar, & Lubke (2011) "Addressing the Problem of Switched Class Labels in Latent Variable Mixture Model Simulation Studies" - is this still the best/only sufficiently well-working option for handling label switching in Mplus?

Bengt O. Muthen posted on Tuesday, March 12, 2019 - 5:55 pm

This is hard to do. The paper you refer to is the only approach I can think of - except having well-separated classes to begin with so switches across replications don't occur.
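
For readers who want to experiment, one simple way to relabel classes in a replication is to try every permutation of the estimated classes and keep the one whose parameter estimates lie closest to the population values. The Python sketch below does this by brute force. It is not the algorithm of Tueller, Drotar, & Lubke (2011) and not what Mplus does internally; it is only practical for a small number of classes, it can still mislabel when classes are poorly separated, and all names and values are hypothetical.

```python
from itertools import permutations

def align_labels(true_means, est_means):
    """Return a tuple perm where perm[k] is the estimated class matched to true class k."""
    k = len(true_means)
    best_perm, best_cost = None, float("inf")
    for perm in permutations(range(k)):
        # Squared distance between true and estimated means under this labeling.
        cost = sum(
            (t - e) ** 2
            for cls, src in enumerate(perm)
            for t, e in zip(true_means[cls], est_means[src])
        )
        if cost < best_cost:
            best_perm, best_cost = perm, cost
    return best_perm

# Hypothetical population means for three classes on two indicators.
true_means = [[0.0, 0.0], [2.0, 2.0], [4.0, 4.0]]
# Hypothetical estimates from one replication with switched labels.
est_means = [[3.9, 4.1], [0.1, -0.1], [2.1, 1.9]]

perm = align_labels(true_means, est_means)
relabeled = [est_means[src] for src in perm]
print(perm, relabeled)
```

After relabeling, bias and RMSE can be computed per parameter across replications in the usual way.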

Anna Brown posted on Sunday, October 11, 2020 - 6:17 am

Dear Mplus team,

I am running a simulation study for a multilevel mixture model with 500 replications, and it is taking quite a long time. I would normally wait, but on this occasion, I need to release my machine and am happy to stop at fewer replications (say 300) to see results without wasting all the work already done.
Is there any way of doing this rather than cancelling the whole run and losing everything?

Many thanks for your help,
Anna

Bengt O. Muthen posted on Sunday, October 11, 2020 - 4:17 pm

Sorry, no.