Monte Carlo Modeling
 Daniel posted on Thursday, August 28, 2003 - 8:56 am
I would like to run a Monte Carlo simulation to estimate power in a two-group LGM analysis. How would I specify my groups and other related information?
 Linda K. Muthen posted on Thursday, August 28, 2003 - 9:08 am
Example 2 on page 24 of the Addendum to the Mplus User's Guide (at www.statmodel.com under Product Support) is a Monte Carlo setup for a two-group three-level growth model. You can use that as a starting point.
 Daniel posted on Thursday, August 28, 2003 - 9:10 am
Thanks very much. You and your husband give the greatest support for any product of any type I have ever purchased.
 Daniel posted on Friday, August 29, 2003 - 8:06 am
Hi, I'm having quite a bit of difficulty getting started. I have an LGM with two parallel processes. I want to see if I can find an interaction for high and low levels of depression. I want to cut my groups into high and low levels of depression. I am not using multi-level modeling. How should I get started?
I'd appreciate any help possible. I looked at the example on page 24, like you suggested, but could not make heads or tails of it because I am not doing multi-level modeling. It keeps on asking me for data.
 Linda K. Muthen posted on Friday, August 29, 2003 - 8:40 am
Here is the example from page 24 with the multilevel information deleted. Maybe that will help.

TITLE: simulation of a two group analysis of growth model
MONTECARLO: NAMES ARE y1-y4;
NGROUPS = 2;
NOBSERVATIONS = 960 560;
NREPS = 100;
SEED = 9191187;

MODEL MONTECARLO:
i BY y1-y4@1;
s BY y1@0 y2@1 y3@2 y4@3;
y1-y4*.5 i*.7 s*.2;
[y1-y4@0 i*0 s*.2 ];

MODEL MONTECARLO-G1:
[i*0 s*.2];

MODEL MONTECARLO-G2:
[i*1 s*.5];
ANALYSIS: TYPE = MEANSTRUCTURE;
MODEL:
i BY y1-y4@1;
s BY y1@0 y2@1 y3@2 y4@3;
y1-y4*.5 i*.7 s*.2;
[y1-y4@0 i*0 s*.2 ];
MODEL G1:
[i*0 s*.2];
MODEL G2:
[i*1 s*.5];
OUTPUT: TECH9;
 Linda K. Muthen posted on Friday, August 29, 2003 - 9:09 am
Actually, the input I posted will not run in Version 2 of Mplus. It will run in Version 3. In Version 2, you will need to use the FILE option to enter your population values instead of MODEL MONTECARLO. Or you will have to modify the input to be TYPE=MIXTURE, NCLASSES=1, GCLASSES=1; If I get time, I will modify this. But it won't be until next week.
 Daniel posted on Friday, August 29, 2003 - 12:30 pm
Thanks again. I figured out a way to run the model. I look forward to version 3.
 Linda K. Muthen posted on Friday, August 29, 2003 - 2:16 pm
Great. Version 3 is going to have unbelievable Monte Carlo capabilities, among other things. I think you'll like it.
 Limey posted on Thursday, May 06, 2004 - 1:07 pm
I would like to use the Monte Carlo facility (version 3 of Mplus) to generate 2 separate populations, and then disproportionately sample a certain number from each population. The samples from these 2 populations need to be combined so that I can then analyze them using GMM. Can this all be done in Mplus, or do I need to do some of it in SAS?
 Linda K. Muthen posted on Friday, May 07, 2004 - 6:15 am
See Example 11.6, which explains how to generate and save the superpopulation files and analyze them with a separate input. Example mcex5.14, which comes with the Mplus CD and is put on the hard disk as part of the installation, shows how to use Monte Carlo to generate multiple group data. In the generation of the population, include your binary subpopulation inclusion indicator U. Specifying [U$1*0] in the first group and [U$1*1] in the second group would do disproportionate sampling. In the second input, USEOBSERVATIONS = u eq 1 would make sure that you analyze the subpopulation.
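For readers who want to see what sampling rates the two thresholds imply, here is a minimal Python sketch. It assumes U is generated with a logit link, so that P(U = 1) = 1 / (1 + exp(threshold)). That link is an assumption made here, not something stated in the post, and the function name is made up for illustration.

```python
import math

def inclusion_probability(threshold: float) -> float:
    """P(U = 1) for a binary item with the given threshold, assuming a logit link."""
    return 1.0 / (1.0 + math.exp(threshold))

# Thresholds taken from the post: [U$1*0] in group 1 and [U$1*1] in group 2.
rates = {
    "group 1, [U$1*0]": inclusion_probability(0.0),
    "group 2, [U$1*1]": inclusion_probability(1.0),
}
for label, p in rates.items():
    print(f"{label}: P(U = 1) = {p:.3f}")
```

Under that assumption the first group is retained at a rate of one half and the second at a bit over one quarter, which is what makes the sampling disproportionate.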
 limey posted on Friday, May 07, 2004 - 1:53 pm
Thank you for the help! However, it doesn't look like Mplus can generate multiple group data for mixed models. Is this the case?
 Linda K. Muthen posted on Friday, May 07, 2004 - 3:09 pm
I think you can do multiple group for a mixed model if we mean the same thing by mixed model. Can you describe your model?
 limey posted on Monday, May 10, 2004 - 8:21 am
I would like to do a simulation using a Growth Mixture Model (with latent continuous intercept and slope factors, and a categorical latent class factor). I would like to simulate a complex sample design, where individuals are sampled at different rates from 2 strata. Later, I would like to add clustering into the sample design. Thanks again for all of your help!
 Anonymous posted on Monday, May 10, 2004 - 11:03 am
I modified example mcex8.1.inp to generate two strata. To add clustering into the design you have to modify mcex10.4.inp the same way. Tihomir.

-----

title:
this is an example of a GMM for a
continuous outcome using automatic
starting values and random starts

two strata
inclusion variable for subsequent sampling U

montecarlo:
names are y1-y4 x U;
genclasses = c(4);
classes = c(2);
nobs = 500;
seed = 3454367;
nrep = 10;
REPSAVE = ALL;
SAVE = ex8.1rep*.dat;
categorical=U;
generate=U(1);

analysis:
type = mixture;

model population:

%overall%

[x@0]; x@1;

y1-y4*.5;
i s | y1@0 y2@1 y3@2 y4@3;
i*1; s*.2;

c#1 on x*1;
i on x*.5;
s on x*.3;

! first strata
%c#1%
[i*1 s*.5];
[U$1*0];

%c#2%
[i*3 s*1];
[U$1*0];

! second strata
%c#3%
[i*2 s*1.5];
[U$1*1];

%c#4%
[i*0 s*1.5];
[U$1*1];

model:

%overall%

y1-y4*.5;
i s | y1@0 y2@1 y3@2 y4@3;
i*1; s*.2;

c#1 on x*1;
i on x*.5;
s on x*.3;

%c#1%
[i*1 s*.5];

%c#2%
[i*3 s*1];

output:
tech8 tech9;
 limey posted on Tuesday, May 11, 2004 - 12:18 pm
You rock. Thanks muchly!
 limeygrl posted on Friday, May 28, 2004 - 2:19 pm
One more thing regarding the previous question: is there a way to specify the number of observations sampled from the population separately in each stratum? So if nobs=500, 300 would come from stratum one and 200 would come from stratum two?
 bmuthen posted on Saturday, May 29, 2004 - 10:50 am
Seems like you can do that by specifying values of fixed probabilities for the classes corresponding to strata, so using the logit parameters of [c#]. I guess the strata could be handled by the V3 knownclass option.
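As a rough illustration of the logit idea, the Python sketch below converts target class proportions into multinomial logits with the last class as the reference, which is the form a [c#k] parameter takes. The 30/30/20/20 split across the four generating classes is a hypothetical choice (an even split within each stratum), and the helper names are invented for this example. Note that fixed logits set expected proportions, not exact counts, and that a covariate effect such as c#1 ON x would shift the proportions further.

```python
import math

def class_logits(probabilities):
    """Multinomial logits ln(p_k / p_K) for classes 1..K-1 (last class is the reference)."""
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("class probabilities must sum to 1")
    reference = probabilities[-1]
    return [math.log(p / reference) for p in probabilities[:-1]]

def class_probabilities(logits):
    """Inverse mapping: recover all K class probabilities from the K-1 logits."""
    weights = [math.exp(value) for value in logits] + [1.0]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical target: 300 of 500 from stratum 1 (classes 1-2) and
# 200 of 500 from stratum 2 (classes 3-4), split evenly within each stratum.
target = [0.30, 0.30, 0.20, 0.20]
logits = class_logits(target)
for k, value in enumerate(logits, start=1):
    print(f"[c#{k}*{value:.3f}]")
```

Converting the logits back with class_probabilities is a quick way to check that the values entered reproduce the intended split.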
 Annica posted on Wednesday, November 03, 2004 - 1:05 am
What information can be saved when using TYPE=MONTECARLO in the DATA statement, i.e. when performing a Monte Carlo study and the data has been generated outside Mplus? Using the RESULTS option in the SAVEDATA statement I obtain the parameter estimates for each data set. However, some of the replications do not reach convergence and I would like to know which. I know this information is possible to save when performing an internal Monte Carlo study in Mplus.

Thank you.
 Linda K. Muthen posted on Wednesday, November 03, 2004 - 10:47 am
For external Monte Carlo, ask for TECH9 to see which replications did not converge.
 Chuck Cleland posted on Tuesday, July 31, 2007 - 7:39 am
Is it possible to specify an alternative significance level at which each replication in a Monte Carlo analysis is tested? I did not see anything in the documentation for Monte Carlo that allows significance level to be specified. I would like to determine the power for detecting differences in linear growth between two groups, but I want to test at alpha = .03125 as a way of approximating a multiplicity adjustment.

thanks,

Chuck
 Linda K. Muthen posted on Tuesday, July 31, 2007 - 8:05 am
There is no way to specify an alternative significance level.
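One possible workaround outside Mplus, offered only as a sketch: if the per-replication estimates and standard errors are saved, the rejection rate at any alpha can be recomputed afterwards. The Python below assumes a two-sided Wald z-test of each estimate against zero. The input lists are hypothetical numbers, and reading them from a saved results file is left out because the file layout is not described in this thread.

```python
from statistics import NormalDist

def power_at_alpha(estimates, standard_errors, alpha):
    """Share of replications whose two-sided Wald z-test rejects zero at level alpha."""
    critical = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    rejections = sum(
        1 for est, se in zip(estimates, standard_errors) if abs(est / se) > critical
    )
    return rejections / len(estimates)

# Hypothetical per-replication results for one parameter (not real Mplus output).
estimates = [0.30, 0.25, 0.10, 0.40, 0.20]
standard_errors = [0.10, 0.10, 0.10, 0.10, 0.10]

print(power_at_alpha(estimates, standard_errors, alpha=0.05))
print(power_at_alpha(estimates, standard_errors, alpha=0.03125))
```

In this made-up example the replication with z = 2.0 is significant at .05 but not at .03125, so the estimated power drops when the stricter level is applied.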
 Emily Blood posted on Thursday, September 27, 2007 - 12:47 pm
If the data has been generated outside of Mplus, is it possible to enter the true values of the population parameters? In the manual I only see described how to enter true population values if the population values were generated in another Mplus run or if the data were generated within Mplus. I have generated the data in Splus and would like to specify the true population values used so I can use the 95% cover probabilities in the Mplus output from the MC run. Thanks for your help!
Emily
 Linda K. Muthen posted on Thursday, September 27, 2007 - 1:15 pm
You would do external Monte Carlo if the data are generated outside of Mplus. See Step 2 of Example 11.6. You would enter the population values in the MODEL command.
 Emily Blood posted on Thursday, September 27, 2007 - 1:22 pm
Thank you. What is the syntax for entering a population value in the MODEL command? I know that, for example, i BY y@2 sets the value of the regression coefficient for y to 2, [i@1] sets the intercept for the i factor to 1, and f ON x*1 sets the starting value for the regression parameter to 1. It is not clear from the manual how to enter the true population values in the MODEL command. Can you please let me know?
Thanks,
Emily
 Linda K. Muthen posted on Thursday, September 27, 2007 - 2:14 pm
In the MODEL command you should use * followed by the true population value for free parameters and @ followed by the true population value for fixed parameters. Please see Step 2 of Example 11.6. The true population values are given in the MODEL command.
 Emily Blood posted on Friday, September 28, 2007 - 6:44 am
Thank you. Since this is the same syntax for setting starting values for free parameters, does putting these true values in automatically make those the starting values in the estimation of the models?
Emily
 Linda K. Muthen posted on Friday, September 28, 2007 - 7:31 am
Yes, they are also used as starting values.
 Emily Blood posted on Friday, September 28, 2007 - 7:34 am
Thank you for all of your help. The MC facility in Mplus is really great!!! It seems to do everything I need for my simulation study.
 Emily Blood posted on Tuesday, October 09, 2007 - 4:57 pm
What is the method of computing the standard error of indirect effects? I have the following code in my model (it is a variation on a latent growth curve model where i is a latent intercept).
MODEL INDIRECT:
i IND tx;
There is a mediating variable between tx and i. The output shows a standard error estimate, but I am not clear on what method is used to compute this since the indirect effect involves a product of two estimated coefficients (a+b*c is the form of the indirect effect, where a is the direct effect of tx on i and b is the effect of tx on a mediating variable and c is the effect of the mediating variable on i). If you could let me know that would be great!
Thanks,
Emily
 Linda K. Muthen posted on Tuesday, October 09, 2007 - 6:05 pm
The default is to compute standard errors using the Delta method. Bootstrap standard errors are also available.
 Emily Blood posted on Wednesday, October 10, 2007 - 1:04 pm
I'm sorry to keep bothering you, but for the a+b*c total effect, the delta method std err would be: sqrt[var(a) + b^2*var(c) + c^2*var(b)], is this correct? If so, this doesn't seem to match the value from the output. I can match the value for the std err of the indirect part of the effect (se(b*c)), but for the total effect, I am getting something larger than what is in the output. I am trying to figure out what I am missing. If you could let me know I would really appreciate it.
Thanks!
Emily
 Linda K. Muthen posted on Wednesday, October 10, 2007 - 6:13 pm
You have not included covariance terms in your formula.
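To make the role of the covariance terms concrete, here is a minimal Python sketch of the delta method for the total effect a + b*c. The gradient with respect to (a, b, c) is (1, c, b), and the variance is the quadratic form of that gradient with the full covariance matrix of the three estimates. All numbers are hypothetical, and the function name is invented for this example. It is not a description of the internal Mplus computation beyond what the post states.

```python
import math

def delta_se_total(b, c, cov):
    """Delta-method SE of a + b*c; cov is the 3x3 covariance matrix of (a, b, c)."""
    gradient = [1.0, c, b]  # partial derivatives with respect to a, b and c
    variance = sum(
        gradient[i] * cov[i][j] * gradient[j] for i in range(3) for j in range(3)
    )
    return math.sqrt(variance)

# Hypothetical estimates and covariance matrix of (a, b, c).
b, c = 0.4, 0.3
cov_full = [
    [0.0100, -0.0020, -0.0010],
    [-0.0020, 0.0040, 0.0005],
    [-0.0010, 0.0005, 0.0090],
]
# Same variances with every covariance set to zero (the formula in the question).
cov_diag = [[cov_full[i][j] if i == j else 0.0 for j in range(3)] for i in range(3)]

print(delta_se_total(b, c, cov_full))
print(delta_se_total(b, c, cov_diag))
```

With negative covariances between a and the other two estimates, as in this made-up matrix, the variance-only formula gives a larger standard error than the full one, which is the direction of the mismatch described in the question.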
 Michelle Finney posted on Thursday, November 13, 2008 - 10:53 pm
I am trying to run a simulation in Mplus for a growth model to estimate power. The proposed model has three covariates (one is binary, two continuous), and three time points. I am assuming complete data. How do you specify the parameter variances of the three intercepts and the three slopes? Is there an example you can direct me to?

Thank you!
 Linda K. Muthen posted on Friday, November 14, 2008 - 6:14 am
The data for the user's guide examples are generated by Monte Carlo simulations. Find the Monte Carlo counterpart of the growth model example in Chapter 6 that is closest to what you are doing to see how this is done. These inputs come on the Mplus CD and are put on the hard disk as part of the default installation.
 Michelle Finney posted on Friday, November 14, 2008 - 1:21 pm
Thank you so much!! This is very helpful.
MF
 Hanno Petras posted on Wednesday, March 10, 2010 - 12:08 pm
Hi,

I am running a Monte Carlo simulation for a 2-class nonlinear growth model where a binary covariate has an effect on the slope in one class, but not the other. I am using the following syntax in the model population block as well as in the model block to create a binary variable with a 50/50 split, simulating a treatment variable: [x@0]; x@1;
When I look at the estimated sample statistics for the first replication, the mean is not .5. Am I doing something wrong, or is it because only the statistics from the first replication are shown? Thank you for your advice.

Hanno
 Linda K. Muthen posted on Wednesday, March 10, 2010 - 12:26 pm
It sounds like you are not using the CUTPOINTS option to categorize the covariate. See Example 11.1.
 Hanno Petras posted on Wednesday, March 10, 2010 - 12:43 pm
Hi Linda,
Below is my reduced input (I left out the model block due to size). Do you see what is wrong with it? Thanks.

Hanno

montecarlo:
names are y1-y5 x;
genclasses = c(2);
classes = c(2);
nobs = 250;
seed = 3454367;
nrep = 1000;
cutpoints=x(1);

analysis:
type = mixture;

model population:
%overall%
[c#1*-1];
[x@0]; x@1;
y1-y5*.5;
i s q| y1@-1 y2@0 y3@1 y4@4 y5@7;
q@0; i*.5; s*.1;
i on x@0; s on x*.3; q on x*.01;
s with q*0; i with q*0;
%c#1%
[i*2.2 s*.5 q*-.1];
s on x*.3; q on x*.01;
%c#2%
[i*2.5 s*.5 q*-.02];
s on x*0; q on x*0;
 Linda K. Muthen posted on Wednesday, March 10, 2010 - 12:59 pm
You need CUTPOINTS = x(0). If you read the example, you will see that we did not do a 50/50 split.
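A small Python check of why the cutpoint matters, assuming x is generated as standard normal (as [x@0]; x@1; specifies) and is scored 1 above the cutpoint and 0 below it. The scoring direction and the function name are assumptions made for this illustration.

```python
from statistics import NormalDist

def share_above(cutpoint: float, mean: float = 0.0, sd: float = 1.0) -> float:
    """Expected proportion of a normal variable that falls above the cutpoint."""
    return 1.0 - NormalDist(mu=mean, sigma=sd).cdf(cutpoint)

for cut in (0.0, 1.0):
    print(f"cutpoint {cut}: expected share scored 1 = {share_above(cut):.3f}")
```

A cutpoint of 0 splits a standard normal variable in half, while a cutpoint of 1 leaves only about 16 percent above it, so a sample mean far from .5 is what one would expect with x(1).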
 Erika Wolf posted on Wednesday, October 27, 2010 - 10:24 am
What do you make of monte carlo power analyses where power appears to be more than adequate yet some replications generated in the analysis do not converge and/or some replications include non-positive definite matrices? Are the power estimates accurate in this situation, given that the majority of replications are fine? I've noticed this tends to occur as I decrease sample size in an effort to determine the minimum sample size needed, yet power estimates in the output are still well above 80% and estimated parameter estimates do not differ meaningfully from the population.
 Bengt O. Muthen posted on Wednesday, October 27, 2010 - 12:29 pm
That is a potential problem of Monte Carlo based power estimation - it relies on good parameter and SE estimation. An alternative is the Satorra-Saris pop cov matrix approach (see our Topic 3 handout), but that does not cover as many model situations.

A good question is how well the parameter estimates and SEs are recovered.

You can attempt to avoid non-pos def by various tricks (Cholesky decomp or inequality constraints, i.e. var >0).
 Erika Wolf posted on Thursday, October 28, 2010 - 8:15 am
Thanks for your quick response. A follow-up question from the thread from yesterday: Can you please clarify how mplus calculates the mean parameter estimates, SD, and SE when some solutions in the montecarlo run fail to converge or produce improper solutions? Does it eliminate those instances and change the denominator for the mean values or does it factor that in some how? Also, are those non-replications reflected in any way in the calculation of the 95% cover or power estimate? Thanks again!
 Linda K. Muthen posted on Thursday, October 28, 2010 - 10:33 am
The values from all replications that are shown as completed are used. Information from replications that are not completed is not used.
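The following Python sketch illustrates what "only completed replications are used" means for the summary statistics: failed replications are dropped and the denominator becomes the number of completed ones. The data are hypothetical, a failed replication is represented by None, and the coverage rule (a symmetric 95% interval of estimate plus or minus 1.96 standard errors) is an assumption made for this example.

```python
from statistics import NormalDist, mean, stdev

def summarize(replications, population_value, level=0.95):
    """Summaries over completed replications; None marks a replication that failed."""
    completed = [rep for rep in replications if rep is not None]
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    estimates = [est for est, _ in completed]
    covered = sum(
        1 for est, se in completed if abs(est - population_value) <= z * se
    )
    return {
        "completed": len(completed),
        "mean_estimate": mean(estimates),
        "sd_of_estimates": stdev(estimates),
        "average_se": mean(se for _, se in completed),
        "coverage": covered / len(completed),
    }

# Hypothetical (estimate, standard error) pairs; the third replication did not converge.
replications = [(0.50, 0.10), (0.45, 0.10), None, (0.80, 0.10), (0.55, 0.10)]
summary = summarize(replications, population_value=0.5)
for name, value in summary.items():
    print(name, value)
```

Because the failed replication simply disappears from the calculation, it is worth reporting the number of completed replications alongside the power and coverage figures.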
 davide morselli posted on Tuesday, December 14, 2010 - 1:50 am
Hi,
I'm trying to estimate sample sizes for a two-level SEM with a Monte Carlo simulation. I have a couple of questions that are not really clear to me from the User's Guide and Muthen & Muthen (2002).
Given that I actually rely on real data, should I run the Monte Carlo from the estimates of my model (obtained with "SAVEDATA: ESTIMATES") or on artificial data generated with the Monte Carlo method?
The second question is: in Muthen & Muthen (2002) you fix the variance of the factor to 1, saying that this makes the results easier to interpret. Should I do the same in my case (simulation based on real data)?
Thank you very much in advance.
 Bengt O. Muthen posted on Tuesday, December 14, 2010 - 8:53 am
If you feel that your real data solution offers the best assessment of model parameters, then use those values.

You don't have to fix the factor variance at 1. But if you do, do it for both the real-data run and the simulation.
 davide morselli posted on Friday, December 17, 2010 - 6:16 am
ok thank you very much Bengt,
just one more question: can I use the bias computed with the montecarlo estimates ((average SE-pop SE)/pop SE) to correct the standard errors in my real model?
thank you
 Bengt O. Muthen posted on Friday, December 17, 2010 - 6:12 pm
No, because the real data may have features that are different from the simulated data. The fact that the simulated data uses the model parameters estimated from the real data is not sufficient because the model may not capture all aspects of the real data - even when it fits well.

If you see a bias in SEs, you may instead try changing from ML to MLR.
 davide morselli posted on Monday, December 20, 2010 - 1:18 am
Ok, thanks.
In my ML-SEM model I already use MLR, but given that the second-level clusters are countries, the number of parameters ends up much higher than the number of clusters. Can I consider the SE estimates with MLR reliable in this case as well? I think I remember that in one course you gave, you said not to consider the Mplus warning about "trustworthiness of SE" if you use MLR. Thank you very much, your help is really appreciated.
 Linda K. Muthen posted on Monday, December 20, 2010 - 6:27 am
If the number of between-level parameters is greater than the number of between-level cluster units, this is a problem. If it is the total number of parameters in the model that is greater than the number of clusters, this may also be a problem. This has not been thoroughly studied as far as I know. Note that a minimum of 30-50 clusters is recommended.
 Thierno Diallo posted on Monday, April 04, 2011 - 12:50 pm
Hi Linda,
I want to use the non-parametric GMM in a simulation but I don’t know how to specify the model. For example, how can I get a model with csub (3) cnp(2)? Please can you tell me what is wrong with my syntax below ? Thank you in advance.

NAMES ARE y1-y5 ;
NOBSERVATIONS = 1000;
NREPS = 1;
SEED = 53487;
GENCLASSES = csub (3) cnp(2);
CLASSES = c (1);

!SAVE = ex12.6rep*.sav;

ANALYSIS: TYPE = MIXTURE;

ESTIMATOR = MLR ;

MODEL POPULATION:

%OVERALL%

i s | y1@0 y2@1 y3@2 y4@3 y5@4;
i*1; s*.2; i with s*.11;
y1*1.0 y2*1.42 y3*2.24 y4*3.46 y5*5.08;

%csub#1%
[i*1.5 s*1.6];
i*5; s*.2; i with s*.069;
y1*1.0 y2*1.42 y3*2.24 y4*3.46 y5*5.08;

%csub#2%
[i*1.8161 s*0.1930];
i*25.3642; s*0.4757; i with s*2.6744;
y1*1.0098 y2*1.4256 y3*2.2380 y4*3.4435 y5*5.0170;

%csub#3%
[i*2 s*3];
i*25.3642; s*0.4757; i with s*2.6744;
y1*10 y2*1.4256 y3*3.80 y4*10 y5*5;
 Linda K. Muthen posted on Monday, April 04, 2011 - 3:23 pm
Please send the output and your license number to support@statmodel.com so I can see the error message.
 Yaacov Petscher posted on Thursday, August 25, 2011 - 8:53 am
Greetings,

I am working through a modified version of example 12.4 for simulating a growth model. The default code runs as expected, but when I specify a correlation between iw/sw and ib/sb > .11 I receive a fatal error message (Population covariance matrix is NPD). Is this due to the starting values of the iw/sw and ib/sb? Any help is appreciated.

MONTECARLO:
NAMES ARE y1-y3;
NOBSERVATIONS = 244;
NREPS = 500;
NCSIZES = 1;
CSIZES = 61 (4);
MODEL POPULATION:
%WITHIN%
iw sw | y1@0 y2@1 y3@2;
y1-y3*.2;
iw*1; sw*.2;
iw with sw*.21;
%BETWEEN%
ib sb | y1@0 y2@1 y3@2;
y1-y3@0;
[ib*1 sb*.5];
ib*.2; sb*.1;
ib with sb*.21;
ANALYSIS: TYPE IS TWOLEVEL;
MODEL:
%WITHIN%
iw sw | y1@0 y2@1 y3@2;
y1-y3*.2;
iw*1; sw*.2;
iw with sw*.21;
%BETWEEN%
ib sb | y1@0 y2@1 y3@2;
y1-y3@0;
[ib*1 sb*.5];
ib*.2; sb*.1;
ib with sb*.21;
 Bengt O. Muthen posted on Thursday, August 25, 2011 - 9:08 am
You say

ib*.2; sb*.1;
ib with sb*.21;

But translating this into a correlation gives

0.21 / sqrt(0.2 * 0.1) = 0.21 / 0.1414 = 1.49 > 1,

so your covariance can't be greater than 0.1414.
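The same check can be scripted before running a simulation. The Python sketch below uses the variances and covariances from the posted input. The function names are invented for this example, and the bound it checks is only a necessary condition for a positive definite population covariance matrix, not a sufficient one.

```python
import math

def max_abs_covariance(var1: float, var2: float) -> float:
    """Largest covariance magnitude that keeps the implied correlation within [-1, 1]."""
    return math.sqrt(var1 * var2)

def implied_correlation(cov: float, var1: float, var2: float) -> float:
    """Correlation implied by a covariance and the two variances."""
    return cov / math.sqrt(var1 * var2)

# (level, variance 1, variance 2, covariance) as given in the posted input.
settings = [
    ("between: ib*.2 sb*.1, ib with sb*.21", 0.2, 0.1, 0.21),
    ("within:  iw*1  sw*.2, iw with sw*.21", 1.0, 0.2, 0.21),
]
for label, v1, v2, cov in settings:
    r = implied_correlation(cov, v1, v2)
    bound = max_abs_covariance(v1, v2)
    status = "ok" if abs(r) <= 1 else "not admissible"
    print(f"{label}: r = {r:.3f}, max |cov| = {bound:.4f} ({status})")
```

Only the between-level covariance violates the bound here. The within-level value of .21 implies a correlation below 1 because the within-level variances are larger.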
 Yaacov Petscher posted on Thursday, August 25, 2011 - 9:15 am
Thank you. I was in the process of recomputing and found that that seemed to be the case.
 Robert A. Ackerman posted on Monday, June 03, 2013 - 2:30 pm
Hello,

After getting a Monte Carlo simulation of a bivariate growth-curve model with multiple indicators to run, the program noted that many of the replications derived estimates from non-positive definite matrices. I'm wondering whether these are included in the final results presented, and if so, if I should just disregard the overall results from the power analysis then.

Any insight would be greatly appreciated.

Thank you!
 Bengt O. Muthen posted on Monday, June 03, 2013 - 3:33 pm
All converging replications are included. Non-pos-def matrices can occur with negative residual variances for the outcomes if population values are small or sample size is small, and also if growth factor correlations in the population are high or sample size small.
 Bruce A. Cooper posted on Monday, September 09, 2013 - 4:03 pm
Hello!
1. I am trying to start with a Monte Carlo analysis to estimate power for a simple growth model with 6 assessments. I'd like to specify a constant corr across time, but this doesn't work:
y1 WITH y2-y6@0.3 ;
y2 WITH y3-y6@0.3 ;
y3 WITH y4-y6@0.3 ;
y4 WITH y5-y6@0.3 ;
y5 WITH y6@0.3 ;
What am I doing wrong?
2. Since the DV is created as standard normal, should the effect of a predictor at baseline be in SD units (Cohen's d).
3. And shouldn't the slope parameters be in standard units?
Thanks! bac
 Linda K. Muthen posted on Tuesday, September 10, 2013 - 11:00 am

2. Yes.

3. Yes, if the predictor is also standardized.
 Bruce A. Cooper posted on Monday, September 23, 2013 - 4:16 pm
Thank you for your help with that model, Linda!

Now I'm learning how to carry out a Monte Carlo analysis for a growth model using an ordinal outcome. It is a simple pretest-posttest design, testing the difference in change for two randomized groups (equal Ns) and controlling for two continuous covariates as nuisance variables. I'm modeling equal proportions of cases in each of the 4 ordinal categories. Do you see any problems with the specified thresholds? Or any other problems I should be asking about?

i s | u1@0 u2@1 ;
[ i@0 s*.1 ] ;
i*.3 ; s@0 ; i WITH s@0 ;
[u1$1*-1.1 u1$2*0 u1$3*1.1] ;
[u2$1*-1.1 u2$2*0 u2$3*1.1] ;

i ON x@0 z1*.18 z2*.1 ;
s ON x*.824 z1*0 z2*0 ;
! OR for s ON x = 2.28, medium
 Bengt O. Muthen posted on Monday, September 23, 2013 - 5:28 pm
You don't want to do growth modeling with only 2 timepoints. The parameters won't be identified. I would just do an ancova type regression of post on pre and group.
 Bruce A. Cooper posted on Tuesday, September 24, 2013 - 3:51 pm
First post: Thank you for your recommendation, Bengt! That brings back memories of articles long ago that debated the best way to analyze data from a pretest-posttest control group design, and if I recall, your suggested method was one of the top two winners! (The other being the ANCOVA of the change score on the pretest!). I have used the commands below to do the analysis the latent growth/multilevel way, and obtained results for both the LGM and a MC analysis for the model. This model tests the 3-way interaction of two fixed predictors on the (linear) change from pre to post. I think this is analogous to the classic 2-way ANOVA. There was no problem with identification. Am I missing something? -bac

Model:
i s | alcuse0@0 alcuse2@1 ;
i ; s@0 ;
i WITH s@0 ;

i ON cpeer ;
gamma1i | i ON ccoa ;
[gamma1i] ;
gamma1i ON cpeer ;
gamma1i@0 ;

s ON cpeer ;
gamma1s | s ON ccoa ;
[ gamma1s ] ;
gamma1s ON cpeer ;
gamma1s@0 ;
 Bruce A. Cooper posted on Tuesday, September 24, 2013 - 3:58 pm
Second post: Aside from what the best method is for the pre-post analysis, I'm not sure how to do the Monte Carlo analysis using your suggestion, since it treats the pretest as a predictor. The analysis I wrote about is for an ordinal outcome at pre- and posttest, with 4 points on the scale. If I use the ANCOVA with posttest on pretest, how can I specify that the pretest predictor is ordinal? I thought that Monte Carlo predictors could only be normal or binary.

Next, in this MC analysis, if I want (say) equal proportions in the four categories, with the threshold defined as Ln[p/(1-p)], does this give the correct thresholds?
[u1$1*-1.1 u1$2*0 u1$3*1.1] ;
! Equal proportions of 25%

Thanks! -bac
 Linda K. Muthen posted on Wednesday, September 25, 2013 - 11:49 am
When you fix the variance of the slope growth factor to zero and the covariance of the intercept and slope growth factors to zero, this is not really a growth model.

At the top of page 493 of the user's guide, you will find the formulas for probit regression. You can generalize these to logistic regression using

F(t) = 1 / (1 + exp(-t))

Mplus Discussion posts are limited to 1500 characters. In the future, keep your posts within that limit.
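As a worked check of the threshold question, the Python sketch below inverts the logistic function at the cumulative proportions .25, .50 and .75 to get the three thresholds for four equally likely categories. It treats the latent response as having mean zero and ignores the variance of the intercept growth factor, which is a simplifying assumption: with i*.3 in the model, the marginal category proportions would differ somewhat from 25 percent each. The function names are invented for this example.

```python
import math
from itertools import accumulate

def logistic(t: float) -> float:
    """F(t) = 1 / (1 + exp(-t))."""
    return 1.0 / (1.0 + math.exp(-t))

def logit_thresholds(category_proportions):
    """Thresholds ln(p / (1 - p)) at each cumulative proportion except the last (which is 1)."""
    cumulative = list(accumulate(category_proportions))[:-1]
    return [math.log(p / (1.0 - p)) for p in cumulative]

thresholds = logit_thresholds([0.25, 0.25, 0.25, 0.25])
for k, tau in enumerate(thresholds, start=1):
    print(f"threshold {k}: {tau:.4f} (cumulative probability {logistic(tau):.2f})")
```

Under those assumptions the thresholds come out close to -1.1, 0 and 1.1, matching the values in the question.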
 Andrew Grotzinger posted on Monday, November 25, 2013 - 2:41 pm
Hello,

I am running a semi-continuous growth model for a continuous outcome as in example 12.9. You note that the exponential function must be applied to continuous variables saved for subsequent two-part analysis. For our data, the means in the output for replication 1 for the continuous variable appear to correspond with the means from the generated data. Am I correct in assuming then that the output is on the base e log scale as well? Also, if I am inputting means for the continuous variable for the model population command, should these already be log transformed or in their raw form? Thanks so much for your help!
 Bengt O. Muthen posted on Tuesday, November 26, 2013 - 11:56 am
If the data are saved using the DATA TWOPART command, the continuous variable has been transformed using elog, so to get back to the original variable exp needs to be used.

For montecarlo model population the variables should be in their original form and done in line with the montecarlo version of UG ex 12.9 (see website).
 Andrew Grotzinger posted on Tuesday, November 26, 2013 - 5:31 pm
When you say data saved for the DATA TWOPART command, are you talking about if we had done the analysis on real data and then used that for generation and coverage?
This is my fault for not expressing the issue clearly, but I think maybe I am still missing something. If in the Monte Carlo I use REPSAVE to save all the replications, the values for the continuous variable have been elog transformed according to example 12.9. What confuses me is if I input intercepts and mean values for the continuous variable in the Monte Carlo in their original form, they seem to be untransformed in the data files for the replications. For example, if I code [y1*20], then the mean of y1 in the replication .dat files, and in the output, is closer to 20, and not the elog of 20 (somewhere around 3) as I would expect. This made me think that I should be putting in elog transformed values in the model population command, but you seem to be saying otherwise, so I can't quite figure out what I'm missing. I've probably misunderstood the notes to example 12.9, but really appreciate any assistance in clearing up my confusion.
 Bengt O. Muthen posted on Thursday, November 28, 2013 - 1:58 pm
I agree that this can be presented more clearly. What the text on page 436 tries to convey is how a "Step 1" ("internal") Monte Carlo run relates to a "Step 2" ("external", DATA TYPE=MONTECARLO) Monte Carlo run, where I use the words Step 1 and Step 2 in line with the chapter 12 examples.

The Step 1 setup is shown in UG ex 12.9. The data generated are not obtained by a transformation (like log) but correspond to the parameter values given. But if you submit those data to a second run, say a Step 2 Monte Carlo run, the continuous variables will be by default log transformed when using DATA TWOPART. So to get the same parameter estimate results in Step 2 as in Step 1, you want to anti-log (exponentiate) the continuous variables in the data generated by Step 1 and that can be done using DEFINE in the Step 2 run. So the exponentiation is just to counteract the DATA TWOPART action in Step 2. Alternatively, the DATA TWOPART command in the Step 2 run can use the option TRANSFORM = NONE. I hope that makes it clearer.
 Anna King posted on Monday, September 15, 2014 - 8:06 am
Hello Professors,

Sorry about my stupid question. When I'm doing a simulation study to compare the influence of different estimation methods (e.g, one-step and three-step methods) on covariate effects, should I generate one set of data for later analyses? Thanks!

Anna
 Linda K. Muthen posted on Monday, September 15, 2014 - 10:17 am
Yes, please see Web Note 15.
 Eric Deemer posted on Tuesday, November 18, 2014 - 11:20 am
Hello,
I'm trying to do a monte carlo simulation for a latent growth model and I get the following error message:

"Only x-variables may have cutpoints in this type of analysis.
Cutpoints for variable tx were specified."

Do predictor variables have to be specified in a certain order in the NAMES command? I'm not sure where I went wrong.

Eric
 Linda K. Muthen posted on Tuesday, November 18, 2014 - 1:20 pm
Predictor variables must be on the right-hand side of an ON statement. If this does not help, please send the output and your license number to support@statmodel.com.
 Eric Deemer posted on Tuesday, November 18, 2014 - 3:42 pm
Hmm, okay. I have my predictor on the right side of the ON statement. I'll send my output.

Eric
 Bengt O. Muthen posted on Tuesday, November 18, 2014 - 5:54 pm
Perhaps you mentioned the variance of tx in the Model part, changing it from an X to a Y.
 Eric Deemer posted on Tuesday, November 18, 2014 - 6:16 pm
Thanks, Bengt. Linda helped me. I had the mean and variance of the predictor specified in the MODEL command when I shouldn't have.

Eric
 Anne Black posted on Monday, August 08, 2016 - 8:20 am
Hello,
I am using the Monte Carlo feature to estimate power for a linear growth model, and want to test power at alpha=.017 instead of the standard .05. Is there an option to change the alpha level?
 Bengt O. Muthen posted on Monday, August 08, 2016 - 9:48 am
No.
 Anne Black posted on Monday, August 08, 2016 - 10:01 am
Thank you!
 Chunhua Cao posted on Wednesday, September 20, 2017 - 11:20 am
Dear Dr. Muthén,

My colleague and I are working on a simulation study to estimate treatment effects in interrupted time series designs using latent growth model and Bayesian estimation (default priors). We are interested in small sample sizes. We varied the total series length for baseline and treatment phases at 6,8,10 …,60 (same length across phases). For each series length, sample size was fixed to be one more than the total series length (i.e., 7 to 61), which we believe is the minimal sample size needed to ensure convergence. The treatment effect is estimated as the difference in the intercept factor mean between phases (assuming no slope) with “model constraint”. We ran into two problems:

a) Series length 10 had large negative absolute bias (-.129) in the treatment effect, but other lengths had negligible bias (ranging from -.022 to .034).
b) When we examined the 95% CI coverage, some series lengths had much lower coverage (e.g., series lengths 52 and 30 had coverages .863 and .918, respectively) than others (ranging from .939 to 1.000).

We wonder why these particular series lengths behaved in this way. We tried increasing the number of replications and Mplus 8 but results remained close to what we originally got with 1000 replications and Mplus 7.3. It would be great if you can share your insights. Thank you for your time and help!

Chunhua
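To judge whether a bias or coverage value like those above is larger than simulation noise, it can help to attach a Monte Carlo standard error to each summary. The sketch below is a generic illustration, not output from the study: the function names and the four toy replications are hypothetical. With the 1000 replications mentioned in the post, the Monte Carlo standard error of a coverage rate near .95 is about .007.

```python
import math


def coverage_mcse(nominal, reps):
    """Monte Carlo standard error of a coverage proportion under the nominal rate."""
    return math.sqrt(nominal * (1 - nominal) / reps)


def mc_summary(estimates, lowers, uppers, true_value):
    """Bias, its Monte Carlo SE, and interval coverage across replications."""
    reps = len(estimates)
    mean_est = sum(estimates) / reps
    # Sample variance of the estimates across replications
    var_est = sum((e - mean_est) ** 2 for e in estimates) / (reps - 1)
    covered = sum(1 for lo, hi in zip(lowers, uppers) if lo <= true_value <= hi)
    return {
        "bias": mean_est - true_value,
        "bias_mcse": math.sqrt(var_est / reps),
        "coverage": covered / reps,
    }


# Toy replications, for illustration only
summary = mc_summary(
    estimates=[0.4, 0.5, 0.6, 0.5],
    lowers=[0.2, 0.3, 0.55, 0.3],
    uppers=[0.6, 0.7, 0.8, 0.7],
    true_value=0.5,
)
print(summary)
print(coverage_mcse(0.95, 1000))
```

A bias several times its Monte Carlo standard error, or a coverage rate many standard errors below .95, is unlikely to be explained by the number of replications alone.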
 Bengt O. Muthen posted on Wednesday, September 20, 2017 - 4:24 pm
Why so few subjects - e.g. having only 11 subjects observed over 10 time points?
 Chunhua Cao posted on Friday, September 22, 2017 - 9:06 am
Thank you for your question, Dr. Muthén. As far as we know, one more than the total series length is what we need to ensure a proper posterior distribution of factor variances and covariances, if we use the default Inverse Wishart prior. Earlier we tried even smaller sample sizes (e.g., 8 subjects with 10 time points), but got this error message, "THE SAMPLE SIZE PLUS THE PRIOR DEGREES OF FREEDOM OF PSI MUST BE GREATER THAN THE NUMBER OF LATENT VARIABLES".

We also have larger sample sizes (e.g., 80 subjects over 10 time points). We want to examine the bias and 95% CI coverage of the treatment effect estimate under extremely small and relatively large sample sizes.

We observed outliers (in terms of bias and 95% CI coverage) for some series lengths (e.g., 10, 30, 52 time points). But there is no clear pattern to explain why they occurred. Do you have any recommendations on this issue?

Thank you for sharing your insights!

Best,
Chunhua
 Bengt O. Muthen posted on Friday, September 22, 2017 - 4:51 pm
You can explore what's going on by saving the results and data from all reps with REPSAVE=ALL; and SAVEDATA. Look at histograms of the parameter of interest from all reps - look for extreme values in a few reps that could drive the average estimate.
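As a complement to looking at histograms, the check for a few extreme replications can be sketched numerically. This is a minimal illustration that assumes the per-replication estimates have already been read into a list; the function name, the 3.5 cutoff, and the eight values are hypothetical choices, and the median/MAD rule is just one common screening heuristic.

```python
from statistics import mean, median


def flag_extreme_reps(estimates, threshold=3.5):
    """Indices of replications that are extreme by the median/MAD rule."""
    med = median(estimates)
    mad = median([abs(e - med) for e in estimates])
    if mad == 0:
        return []  # no spread to scale against
    # 0.6745 rescales the MAD so the score is comparable to a normal z-score
    return [
        i for i, e in enumerate(estimates)
        if 0.6745 * abs(e - med) / mad > threshold
    ]


# Hypothetical estimates; the last replication is deliberately extreme
estimates = [0.48, 0.52, 0.50, 0.47, 0.53, 0.51, 0.49, -2.10]

print(flag_extreme_reps(estimates))
print(mean(estimates), median(estimates))
```

A large gap between the mean and the median of the estimates points to a handful of replications driving the average, whereas a similar mean and median suggest the bias is shared across replications.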
 Chunhua Cao posted on Wednesday, October 11, 2017 - 5:19 am
This is a follow-up post on the treatment effect estimates with small sample sizes. I asked why a particular series length had much larger bias than others.

Thank you very much for your suggestions, Dr. Muthén. Following your suggestion, my colleague and I checked the distribution of the treatment effect estimate across all reps for the problematic condition (series length 10, sample size 11). The distribution was normal, centered at -.129. Looking across several adjacent series lengths, the biases for series lengths 6, 8, 10, 12, and 14 were .028, -.022, -.129, .003, and .002, respectively.

We tried a couple of things: 1) in addition to series lengths 8, 10, and 12, we checked series lengths 9 and 11 for a closer examination of the bias; 2) for series length 10, we used informative priors for the variance-covariance matrix, where the priors came from the FIML estimation of the same series length but with more subjects; 3) in addition to 2), we added informative priors for the intercept factor means of the baseline and treatment phases, with priors coming from the same FIML estimation; 4) we changed the population treatment effect to a different value (e.g., zero).

However, series length 10 was still the only condition that had much larger negative bias compared with other conditions. Could you please provide some insights and suggestions? Thank you very much!

Best,
Chunhua
 Bengt O. Muthen posted on Wednesday, October 11, 2017 - 2:35 pm
I suggested checking if a particular replication obtained a much larger value than other replications. It sounds like that was not the case. I have no other suggestion or insight; it is very hard to understand sampling behavior for such a small number of subjects.
 Chunhua Cao posted on Thursday, October 12, 2017 - 10:53 am
Thank you very much for your time, Dr. Muthén. I completely agree with you. It is hard to understand the behavior for such extremely small sample sizes. I appreciate your help.

Best,
Chunhua
 Denise Kerkhoff posted on Thursday, April 05, 2018 - 7:26 am
Hello,

I would like to generate data for a two-level growth curve (three-level analysis) with measurements y1 to y4 on level 1, a level-2 predictor x, a level-3 predictor w, and their cross-level interaction. The population model is as follows:

MODEL POPULATION:
%WITHIN%
iw sw | y1@0 y2@1 y3@2 y4@3;
y1-y4*1;
int | iw ON x;
iw*.05; sw*.05;
x*1;
%BETWEEN%
ib sb | y1@0 y2@1 y3@2 y4@3;
y1-y4@0;
int ON w*.3;
w*1;
ib ON w*.3;
ib*.035; sb*.035;

I can specify the main effect of w on the intercept (ib ON w*.3;) and the cross-level interaction effect (int | iw ON x; and int ON w*.3;). However, I do not seem to be able to specify the main effect of x on the intercept: specifying int | iw ON x*.3; gives an error message.

How can I specify the main effect of x on iw? Is it specified by the intercept of "int"?

Thank you very much
Denise
 Bengt O. Muthen posted on Thursday, April 05, 2018 - 2:40 pm
Yes, the main effect is [int] on Between because that is the part of the random slope that isn't influenced by w.

Saying

int | iw ON x*.3;

doesn't make sense because int is a variable that has a mean, a variance, and relations to other variables, so it has several parameters associated with it rather than a single value.
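To illustrate the point, here is a small simulation sketch of a random slope written as int_j = gamma0 + gamma1*w_j + u_j. Under this reading, gamma0 plays the role of [int] (the effect of x on iw when w = 0) and gamma1 is the cross-level interaction. The names gamma0, gamma1, and resid_sd, the value .3 for gamma0, and the residual SD are assumptions made for illustration; only the .3 for the effect of w comes from the posted model.

```python
import random


def simulate_random_slopes(n_clusters, gamma0=0.3, gamma1=0.3, resid_sd=0.1, seed=1):
    """Draw a cluster-level predictor w and random slopes int_j = gamma0 + gamma1*w_j + u_j."""
    rng = random.Random(seed)
    w = [rng.gauss(0, 1) for _ in range(n_clusters)]
    slopes = [gamma0 + gamma1 * wj + rng.gauss(0, resid_sd) for wj in w]
    return w, slopes


def ols(x, y):
    """Intercept and slope of a simple least-squares regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return my - slope * mx, slope


w, slopes = simulate_random_slopes(5000)
intercept, interaction = ols(w, slopes)
# intercept recovers gamma0 (the main effect of x); interaction recovers gamma1
print(round(intercept, 3), round(interaction, 3))
```

Regressing the simulated slopes on w returns an intercept near gamma0 and a slope near gamma1, which is why the main effect of x shows up as the intercept of int rather than as a value attached to the int | iw ON x statement.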