Anonymous posted on Tuesday, July 06, 2004 - 2:06 pm
Hello, I am writing a paper in which I estimated a mixture model using the randomization procedure new to Mplus 3.0. I want to describe specifically how the start values are generated, but I cannot find documentation either in the new manual or in the online technical appendix. Can you give me some information regarding how this is done? Thank you.
We have not yet written up all of the technical information related to Version 3. As we do, we will add it to the Technical Appendices on the website. If you look under STARTS in the Index of the Mplus User's Guide, there is a brief description.
Anonymous posted on Friday, July 23, 2004 - 7:19 am
I really appreciate the addition of random starts to Mplus. A couple of questions so that I understand how to use this better:
In the manual it indicates that random starts are random perturbations around the user-specified or automatic start values for all parameters except variances and covariances. So are variances and covariances held constant at the user-specified or automatic values?
Also, the default for the STSCALE is 5 -- does that control the dispersion of the random perturbations? Is it the width of a uniform distribution, or is the metric in sds of a normal distribution, or something else? 5 is indicated to be a medium value, but I don't understand the scale.
Anonymous posted on Friday, July 23, 2004 - 10:11 pm
The variances and covariances will get the same starting values across the different perturbation runs.
STSCALE controls the dispersion of the random perturbations by multiplying a perturbation in the sds metric by a uniform distribution with that width. The rule of thumb is that if the default scale doesn't produce enough diverse solutions, you would increase STSCALE; if it produces too many improper solutions, such as classes collapsing and singular variances, you would decrease it. You can also get the perturbed starting values with OPTSEED and MITER=1. Tihomir
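As a rough sketch of the rule of thumb above, the perturbation scale is set in the ANALYSIS command. The specific values 10 and 2 are illustrative assumptions only; the thread states only that the default is 5.

```mplus
ANALYSIS:
  TYPE = MIXTURE;
  STSCALE = 10;    ! assumed value above the default of 5, to try for more diverse solutions
  ! STSCALE = 2;   ! assumed value below the default, if too many improper solutions appear
```

Only one STSCALE setting applies per run, so the second line is left as a comment to show the alternative.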
Anonymous posted on Friday, November 18, 2005 - 6:42 am
I am trying to use the OPTSEED option to specify a start seed for a latent class analysis but cannot get Mplus to run more than the default 500 iterations.
I've tried using MITERATIONS = 2000, but this doesn't seem to work with the OPTSEED option.
Please send your input, data, output, and license number to email@example.com so we can see what is happening. Also, include the output where you found the seed that you are using.
anonymous posted on Thursday, March 02, 2006 - 5:16 am
I have a question with regard to starting values. I have been running LCA with different sets of starting values in order to examine whether there exist different local maxima. I am a little unsure how to evaluate the TECH8 output. Do I simply check the column labelled 'loglikelihood at local maxima' and then examine whether there are vast differences between the values? In all my runs (with different sets of starting values), the loglikelihood values in that column are almost identical, and the estimated loglikelihood value listed along with the fit statistics is the same across all runs. Does this mean I can be confident in the obtained solution, in that it is not reaching too many different local maxima?
I'm not sure if you are changing the starting values yourself. It sounds that way given that you are looking at TECH8 for each solution. You can use the STARTS option which will randomly generate sets of starting values. This option and related options are described in the user's guide on pages 436-438. On pages 325-328, you will find a description of how to know if you have found a good solution. Note that the new user's guide is available online.
I have a question concerning the Mplus output with regard to random starts. I noticed that while Mplus versions 3 and 4 always provided the loglikelihood values, seeds, and initial stage start numbers for *all* sets of starting values (initial + final stage of the optimization), Mplus 5 provides this information only for the final stage of the optimization. Is there a way to make the initial stage starting value information available in Mplus 5 in addition to the final stage values (other than TECH8)?
(I am teaching LCA with Mplus 5 and for didactic reasons, it would be nice to have both sets in the output to illustrate what is done in order to avoid local maxima.)
We decided to delete the initial starts because with faster computers and more complex models many starts are being used and the output is lengthy. Plus we typically didn't look at the values of the initial starts. For instance, the final set of values tells us how many fewer random starts we might have been able to get away with. There is not a way to make them available in Version 5. So the pedagogical presentation would have to draw on displays outside the output.
I am running an "LCA with covariates" model. The model estimation terminated normally. However, I got the following warning: "WARNING: THE BEST LOGLIKELIHOOD VALUE WAS NOT REPLICATED. THE SOLUTION MAY NOT BE TRUSTWORTHY DUE TO LOCAL MAXIMA. INCREASE THE NUMBER OF RANDOM STARTS."
I tried four different STARTS settings, (500 10), (1000 10), (500 20), and (1000 20), with STITERATIONS=20; however, I keep getting the same warning.
With (1000 10) the Log-likelihood values at local maxima, seeds, and initial stage start numbers are the following:
Regarding "Anonymous posted on Friday, July 23, 2004 - 10:11 pm", what does "diverse solutions" mean in the sentence "the default scale doesn't produce enough diverse solutions"? Diverse seeds? Diverse LL values? Or something else? Thanks!!
Diverse LL values, which are a function of the diversity of the starting values.
Ruixue Wang posted on Tuesday, February 15, 2011 - 12:37 pm
Hi, I have a couple of questions about TECH8 and starting values.

1. RANDOM STARTS RESULTS RANKED FROM THE BEST TO THE WORST LOGLIKELIHOOD VALUES

Final stage loglikelihood values at local maxima, seeds, and initial stage start numbers:

-4545.804  569131  26
-4545.804  608496  4

Here I know the final stage uses start numbers 26 and 4 from the initial stage. But what are seeds, and how does Mplus generate them? Can I use the seeds to identify the exact starting values?

2. In TECH8:

TECHNICAL 8 OUTPUT FOR STARTING VALUE SET 1

ITER  LOGLIKELIHOOD    ABS CHANGE    REL CHANGE  CLASS COUNTS      ALGORITHM
1     -0.60111874D+04     0.0000000  0.0000000   339.588  160.412  EM
2     -0.45545041D+04  1456.6833360  0.2423287   337.311  162.689  EM
3     -0.45536646D+04     0.8394853  0.0001843   335.247  164.753  EM

How can I find the starting values in set 1? Are they random values? Can I specify the starting values? ITER is 3. Does Mplus run only 3 iterations, or does it stop when the absolute change is small enough?
The seed is a random variable that determines what the starting values will be. So for your first LL value of -4545.804 the seed is 569131. You can use this seed in a new run to get those starting values, saying
OPTSEED = 569131;
You can see the starting values in the Tech1 output.
Mplus runs iterations until the first-order derivatives are small enough.
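Putting the reply above together, a rerun from the reported seed might look like the fragment below. This is a sketch, not a complete input: the DATA, VARIABLE, and MODEL commands are assumed to be unchanged from the original run. The commented MITERATIONS line follows the earlier suggestion in this thread for viewing the perturbed starting values.

```mplus
ANALYSIS:
  TYPE = MIXTURE;
  OPTSEED = 569131;     ! seed listed next to the best loglikelihood, -4545.804
  ! MITERATIONS = 1;    ! optional: stop after one iteration to stay close to the starting values
OUTPUT:
  TECH1 TECH8;          ! TECH1 shows starting values; TECH8 shows the iteration history
```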
Ruixue Wang posted on Tuesday, February 15, 2011 - 2:22 pm
Thank you. Is it possible to decide the starting values myself?
Dear Bengt and Linda, I have been trying to compare a 2 versus a 3 class mixture model. The 2 class model converged appropriately, showed a repetition of the best log likelihood value, had a lower BIC, and had a significant bootstrapped likelihood ratio test, all favoring the 2 versus 1 class model.
To attempt to obtain a repetition of the lowest log likelihood value for the 3 class model, I increased the initial random starts to 1000 and final stage optimizations to 10. I also increased the initial stage iterations to 20. This still did not result in a repetition of the lowest log likelihood value.
Therefore, I followed the suggestion in the user guide and used the optseed command to examine the parameter estimates in the model solutions. These model solutions showed different estimates for each seed. The Mplus user guide suggests that this indicates that the model is not well-defined, possibly due to there being too many classes.
Are there any other steps or recommendations that you would have for trying to achieve a replicated lowest log likelihood value before I conclude that the 3 class model is not well-defined and go with the 2 class model?
Linda, thank you for your reply. I was able to achieve model replication by trying your suggestion of increasing the starts to 2000 and the final stage optimizations to 500. I will keep this ratio in mind for future analyses.
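For readers following along, the setting that achieved replication here would be written roughly as below. The STITERATIONS line is an assumption carried over from the poster's earlier description of using 20 initial stage iterations; the post does not say whether it was kept in the final run.

```mplus
ANALYSIS:
  TYPE = MIXTURE;
  STARTS = 2000 500;     ! 2000 initial stage random starts, 500 final stage optimizations
  STITERATIONS = 20;     ! assumed: initial stage iterations, as in the earlier attempt
```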
Stata posted on Thursday, March 08, 2012 - 7:06 pm
Dear Bengt and Linda,
I am trying to follow Example 7.6 for my study. 1) How should I determine starting values? In addition, the example in the manual assigns negative values to some variables but not others, and I can't find further explanation of this in the manual. 2) Do Mplus automatic starting values with random starts take care of the problem of converging on local solutions?
1. This example uses starting values but not random starts. This would usually occur when an analysis is in the final stages and one does not want a lot of random starts, so one uses starting values and no random starts to speed things up. You would take the starting values from the analysis.
3. Using random starts and the Mplus default starting values will help to avoid local solutions.
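A minimal sketch of the kind of setup described in point 1: user-specified threshold starting values with random starts switched off. The file name, variable names, number of classes, and starting values are all illustrative assumptions and are not copied from the User's Guide example.

```mplus
DATA:
  FILE = mydata.dat;          ! hypothetical file name
VARIABLE:
  NAMES = u1-u4;              ! hypothetical binary indicators
  CATEGORICAL = u1-u4;
  CLASSES = c (2);
ANALYSIS:
  TYPE = MIXTURE;
  STARTS = 0;                 ! no random starts; rely on the starting values below
MODEL:
  %OVERALL%
  %c#1%
  [u1$1*1 u2$1*1 u3$1*-1 u4$1*-1];    ! threshold starting values for class 1
  %c#2%
  [u1$1*-1 u2$1*-1 u3$1*1 u4$1*1];    ! opposite signs so the two classes start apart
```

The asterisk gives a starting value for a free parameter, and giving the classes thresholds of opposite sign starts them from clearly different response profiles.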
Stata posted on Saturday, March 10, 2012 - 9:05 am
Dear Professor Muthens,
I previously did not mention that there are 60 variables with 4-point, 3-point, and binary indicators in my study. When I used default starting values with random starts, I got the following message:
WARNING: THE BEST LOGLIKELIHOOD VALUE WAS NOT REPLICATED. THE SOLUTION MAY NOT BE TRUSTWORTHY DUE TO LOCAL MAXIMA. INCREASE THE NUMBER OF RANDOM STARTS.
I am really confused about deciding threshold starting values. The manual uses 1 and -1 for binary indicators, and 0.5 and 1 or -0.5 and 0 for three-category indicators. I am not sure whether I assigned correct threshold starting values to my 9 indicators with a 4-point scale:
Julia Lee posted on Tuesday, April 17, 2012 - 7:30 am
I am running an LTA with 5 classes and I am using STARTS = 800 40; The output did not include the final stage loglikelihood values. Is it because the model was not terminated normally? I did not see the message about normal termination. From covariance, the output jumped straight to model fit information. Thank you for your advice.
Hello, I am testing several latent class mixture models with categorical indicators for 2 latent factors (u1-u21 have 3 categories and u22-u36 have 4 categories). Sample setup code for allowing the variances and covariance to vary across classes looks like this:

Analysis:
  Type = mixture;
  algorithm = integration;
  integ = 7;
  estimator = mlr;
  starts = 1000 250;
Model:
  %overall%
  f1 by u1-u21;
  f2 by u22-u36;
  [f1-f2@0];
  %c#1%
  f1-f2;
  f1 with f2;
  %c#2%
  f1-f2;
  f1 with f2;
  %c#3%
  f1-f2;
  f1 with f2;
I have had success with having the means vary (variance and cov invariant across classes) but I am trying to determine if there is a way to use my start values from the basic latent class model to speed up the analysis? I know using the SVALUES gives this information and I have used it previously to change my reference class. Are these start values usable for the more complex variations?
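For reference, the SVALUES request mentioned above is made in the OUTPUT command, as sketched below. Whether the resulting values are a useful starting point for a more complex model is a separate question that this fragment does not settle.

```mplus
OUTPUT:
  SVALUES;    ! writes the final estimates as model statements usable as starting values
```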
Thanks for the quick response. I had a feeling that would be the case. I am looking at different variations; my next model frees factor loadings, thresholds, and variances. I was just concerned because with the current model, 4 classes is taking upwards of 8-10 hours, and I assume the fewer restrictions I place, the longer the model will take.
Dr. Muthen, just to amend/update my previous post (and to thank you): part of my 10-hour run was a technical problem, I just realized. Also, per your comment on being too restrictive, I started to think that was the reason the loglikelihoods were not replicating even with max random starts. I allowed the thresholds and means to vary, and the model completed with the LL replicated within 3 hours for 4 classes. Does this just mean my data do not fit the more restricted models? And should I report that the LLs were not replicated in the restricted models? Thank you!!