I am working on a multigroup mean structure analysis. There are 2 groups (boys and girls) who rated their mothers on a 25-item, 4-point (strongly agree, agree, etc.) rating scale. The items are very skewed. The scale consists of three subscales or factors. When the items are simply summed into subscales, girls (on average) rate their mothers better than boys on all three subscales. I have been perplexed by the results produced by different estimators when testing the latent means. In particular, with WLS (when factor loadings and thresholds are invariant between the groups), one of the latent means goes negative, indicating that girls (the second group) rate their mothers more negatively than boys on this factor, despite the raw data saying the opposite. Any ideas on why or how this happens would be much appreciated. 


The only thing that comes to mind is that perhaps girls are not the second group. Do they have the higher code on the gender variable? If so, can you send your input or output and data so we can take a look at it and give you a better answer? 

Anonymous posted on Wednesday, June 01, 2005  2:21 pm



I don't know why there are differences between MPLUS probit regression and STATA probit regression. Is it because the default MPLUS probit is estimated by weighted least squares while STATA probit is estimated by maximum likelihood? If I specify "ANALYSIS: ESTIMATOR=ML;" then the coefficients and s.e. of the MPLUS logistic regression are the same as those of the STATA logit regression. Can I get the same probit regression results in both MPLUS and STATA? Thanks! 

bmuthen posted on Wednesday, June 01, 2005  5:59 pm



The Mplus "Sample Statistics" (requested with SAMPSTAT in the OUTPUT command) give the ML probit regression when there is a single dependent variable; this should agree with STATA. These sample statistics represent the first stage of the Mplus weighted least squares estimator. 


Dear Linda and Bengt, I have a few questions concerning categorical data and the TYPE=TWOLEVEL option. 1. Is it true that Mplus uses a logistic regression for all multilevel analyses (TYPE=TWOLEVEL) with a categorical outcome variable, because estimators available are MLR, ML and MLF, and not WLSMV? Is it therefore correct to interpret the beta coefficient as the log odds ratio? 2. In my model I would like to correlate the errors of my two dependent variables, of which one is normal and the other categorical. Is that somehow possible with the option TYPE=TWOLEVEL, or is the only way out using the options TYPE=COMPLEX with ESTIMATOR=WLSMV? 3. Do you have any plans to make it possible to use censored data with TYPE=TWOLEVEL in Mplus in the future? Thank you very much in advance! Kind regards, Marleen de Moor 

BMuthen posted on Monday, September 05, 2005  2:46 pm



1. Yes. 2. You cannot use WITH to specify a residual covariance when one or more outcomes are categorical in TWOLEVEL analysis with maximum likelihood. You could consider putting a factor behind the two variables as shown in Example 7.16. 3. Yes. 

Sally Czaja posted on Thursday, October 12, 2006  1:58 pm



I am testing a path model with 1 independent variable predicting 2 intermediate variables which predict a dependent variable. Each of the endogenous variables has 24 control variables. One of the intermediate variables is dichotomous, which makes the default estimator WLSMV. I’ve read in the MPlus manual and discussion board that this gives a probit regression and that I can specify the estimator as ML to get logistic regression, which makes sense for the dichotomous DV. But what kind of regression is done with the continuous DVs (i.e., what are these path coefficients/how are they to be interpreted?)? (continued in 2nd post) Sally 

Sally Czaja posted on Thursday, October 12, 2006  2:10 pm



(continued from prior post re path model with 1 IV predicting 2 intermediate variables which predict a DV) The path coefficients differ, sometimes substantially:

Coefficients for the path                                WLSMV           ML
from IV to the dichotomous variable                      .04 (n.s.)      .15 (p<.001)
from dichotomous variable to the final DV                .28 (p<.001)    .53 (p<.001)
from IV to the other intermediate variable               .10 (p<.01)     .07 (p.05)
from other intermediate variable to the final DV         .20 (p<.001)    .14 (p<.001)

What accounts for these differences? They are both more & less than the approx. 1.7 scale difference between logistic and probit. I would have thought the pattern of significance would be the same, even with different methods. Finally, on what basis do I choose an estimator? The dichotomous variable has a 76/24 split and skewness & kurtosis statistics are n.s., which suggests it could be treated as normally distributed. But if I don’t declare it categorical, the fit becomes awful. I’d really appreciate your help in understanding this area. Sally 


The regression coefficients for the continuous dependent variables are simple linear regression coefficients. The coefficients will differ between WLSMV and ML because one is probit and the other is logit. They are on a different scale. You should be comparing the ratios. I would choose WLSMV with a 76/24 split. 
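
To make the "different scale" point concrete, here is a minimal Python sketch (not Mplus code, and not part of the original reply). It compares the logistic curve with a normal curve whose argument is divided by the rule-of-thumb factor of 1.7 mentioned earlier in the thread. The grid of values and the helper names are assumptions chosen only for illustration.

```python
import math

def logistic_cdf(x):
    """Standard logistic CDF, the link behind a logit coefficient."""
    return 1.0 / (1.0 + math.exp(-x))

def normal_cdf(x):
    """Standard normal CDF, the link behind a probit coefficient."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

SCALE = 1.7  # rule-of-thumb logit/probit scale factor quoted in the thread

# Largest gap between the logistic curve and the rescaled normal curve
# over an illustrative grid of linear-predictor values from -4 to 4.
grid = [i / 10.0 for i in range(-40, 41)]
max_gap = max(abs(logistic_cdf(x) - normal_cdf(x / SCALE)) for x in grid)
print(f"max |logistic(x) - Phi(x / 1.7)| on the grid: {max_gap:.4f}")
```

The two curves are close but not identical, so the factor is only an approximation. That is one reason to compare estimate-to-standard-error ratios rather than raw coefficients across the two estimators.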

Sally Czaja posted on Friday, October 13, 2006  12:46 pm



Hi Linda Sorry, but what ratios are you referring to in your 2nd paragraph? Could you elaborate on why I should use WLSMV? I'll have to explain this to someone else. Thanks. 


The ratio of a parameter estimate to its standard error. It is the third column of the results. It seems you want residual covariances. You can't have more than four with maximum likelihood because a model with four dimensions of integration is probably the maximum you can estimate. This is why I recommended WLSMV. 
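
As a rough illustration of the ratio described above, the following Python sketch computes estimate/S.E. and a two-sided p-value from a standard normal reference. The two estimates are the WLSMV and ML values quoted in the thread for one path; the standard errors are hypothetical, since the thread does not report them.

```python
import math

def z_ratio(estimate, std_error):
    """Estimate divided by its standard error (third column of the results)."""
    return estimate / std_error

def two_sided_p(z):
    """Two-sided p-value for a z ratio under a standard normal reference."""
    return math.erfc(abs(z) / math.sqrt(2.0))

# Estimates from the thread; standard errors are made up for illustration.
wlsmv_z = z_ratio(0.28, 0.07)
ml_z = z_ratio(0.53, 0.13)

for label, z in (("WLSMV", wlsmv_z), ("ML", ml_z)):
    print(f"{label}: est./S.E. = {z:.2f}, two-sided p = {two_sided_p(z):.5f}")
```

With these made-up standard errors the two ratios are nearly equal even though the raw coefficients differ by almost a factor of two, which is the kind of comparison the reply has in mind.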

Sally Czaja posted on Monday, October 16, 2006  12:37 pm



Hi Linda Thank you for your quick responses last week. I have 2 more related questions: If, as I understand, the coefficients for predictors of continuous DVs are simple linear regression coef. regardless of the estimator (WLSMV or ML), shouldn't they be identical? For 2 paths, I get .20 in WLSMV vs .14 in MLR (both p<.001); and .13 (p<.01) in WLSMV vs .05 (p<.05) in MLR (and smaller differences on other paths). Also, for a predictor of the dichotomous variable, MLR gives an OR of 2.26 and est./SE of 4.92, while WLSMV gives an OR of 2.55 (using exp(Estimate*1.7)) with est./SE of 2.98. Should they be this far apart? Thanks for your help. 


They should be the same. You would need to send me your inputs, data, outputs and license number to support@statmodel.com for me to see why they are not. Odds ratios cannot be computed for probit regression coefficients. 
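
One way to see why an odds ratio does not follow from a probit coefficient is sketched below in Python (an illustration, not Mplus output). Under a logit model the odds ratio for a one-unit increase equals exp(slope) at every starting value of x; under a probit model with the same intercept and slope, the implied odds ratio changes with x. The intercept and slope values are hypothetical.

```python
import math

def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def logistic_cdf(x):
    return 1.0 / (1.0 + math.exp(-x))

def odds(p):
    return p / (1.0 - p)

def odds_ratio(cdf, intercept, slope, x):
    """Odds ratio for moving from x to x + 1 under the given link function."""
    p0 = cdf(intercept + slope * x)
    p1 = cdf(intercept + slope * (x + 1.0))
    return odds(p1) / odds(p0)

INTERCEPT, SLOPE = 0.0, 0.5  # hypothetical values for illustration

for x in (0.0, 2.0):
    or_logit = odds_ratio(logistic_cdf, INTERCEPT, SLOPE, x)
    or_probit = odds_ratio(normal_cdf, INTERCEPT, SLOPE, x)
    print(f"x = {x}: logit OR = {or_logit:.3f}, probit-implied OR = {or_probit:.3f}")
```

The logit column stays constant while the probit column moves with x, so no single number can be reported as "the" odds ratio for a probit slope.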


Hi, From version 5, I see that WLSMV can be used with TWOLEVEL. In a CFA with categorical variables and no covariates, are the loadings probit coefficients? Why can't I conduct a multigroup analysis with a two-level categorical CFA and WLSMV? Is my only alternative to use integration in that case? Thank you very much. 


Your only option in this case is numerical integration. 


I also cannot conduct analysis with integration: I am requested to use KNOWNCLASS & MIXTURE. Why? 


When numerical integration is required, multiple group analysis uses the KNOWNCLASS option and TYPE=MIXTURE. 


I am conducting multiple logistic regression on a binary outcome. I have missing data, so I am allowing the default to use missing data theory, and I also included INTEGRATION=MONTECARLO;. I would like to get unbiased estimates of confidence intervals, and I know that I can’t use bootstrap CIs when I am using Monte Carlo integration. For logistic regression, there are two options for the estimation procedure (ML and MLR). For both of these, I asked for confidence intervals in the output. When I use ESTIMATOR = ML I get the same point estimates as when I use ESTIMATOR = MLR. So I assume that I get log odds (or odds ratios) with either ML estimator. However, I get different standard errors. Which estimator should I use? 


What I meant to ask: When conducting multiple logistic regression with missing data, which estimation procedure would give me the least biased estimates of the standard errors (or confidence intervals)? Thanks 

Paul Silvia posted on Wednesday, June 17, 2009  5:59 am



When ML and MLR diverge in their SE estimates, MLR is generally more trustworthy. Broadly, though, this is often a sign to explore residuals, distributions, and possible influential cases. 

Cecily Na posted on Monday, February 07, 2011  3:07 pm



Hi Professors, I am new to Mplus. I used the syntax TYPE = BASIC; ESTIMATOR = ML; to generate a covariance matrix in Mplus. It was not the same as the one produced in SPSS. What's the reason (supposing I treated all variables as continuous)? Also, when can I use ML? Can I use it for ordered categorical variables? Thanks! 


It is likely that the sample sizes are not the same. If they are, you may be reading the data incorrectly and should send the problem along with your license number to support@statmodel.com. Yes, ML can be used for ordered categorical data. See the ESTIMATOR option in the user's guide where there is a table that shows the cases when each estimator can be used. 


Hi, An article named "Propensity score adjustment for multiple groups SEM" (Hoshino, Kurata & Shigemasu, 2006) uses a weighted M-estimator. The weights are propensity scores. I wonder if the WLS estimator does the same job? Thanks. 


The Mplus WLS estimator is not based on propensity scores. M estimators are sometimes connected with GEE. The connection between GEE and WLSM is shown in Muthén, B., du Toit, S.H.C. & Spisic, D. (1997). Robust inference using weighted least squares and quadratic estimating equations in latent variable modeling with categorical and continuous outcomes. Unpublished technical report. This report is on our web site under Papers, SEM. 


Perhaps this can be done using weighted ML, which we call quasiML in some of Asparouhov's writing on complex survey data analysis on our web site? 

burak aydin posted on Wednesday, May 11, 2011  4:06 pm



I did some further searching and figured out that a residual-based GLS estimator is what I need. I know Mplus has the traditional GLS estimator. Is there a way to modify the GLS estimator into a residual-based GLS estimator? (Yuan & Bentler, 1997, Mean and covariance structure analysis: theoretical and practical improvements.) Furthermore, I'd like to learn whether there is an estimator that is robust to both nonnormality and outliers. Thanks. 


Don't know the answer to that. The Mplus GLS does not allow weights. Outlier detection is available in Mplus; see the UG. MLR is in principle robust to model misspecification, but how well that works with outliers I'm not sure of. 

Heike B. posted on Thursday, October 20, 2011  3:28 am



Dear Drs. Muthen, I intend to build a manifest path model containing two exogenous variables and 5 endogenous variables. Three of them are mediators. The observed variables are means from four-step Likert scales (two variables actually are single items). That's why I wanted to treat the data as ordinal. My sample is small (230 objects), and the data are skewed and not normally distributed. I tried to estimate the model using WLSMV; however, now I would like to add an interaction. Besides, one endogenous variable ended up with eleven categories, so Mplus did not allow me to declare it as categorical. Given all this: 1. Which estimator would you recommend? 2. If an ML-based estimator is recommended, should I declare all my variables as continuous? Many thanks in advance. Heike 


If the original Likert variables have floor or ceiling effects, I would not recommend summing them. I think you want an interaction between two observed variables. You can create that as the product of the two variables using the DEFINE command. Both weighted least squares and maximum likelihood estimation can be used with categorical dependent variables. 

Miho Tanaka posted on Monday, February 13, 2012  11:01 am



Hi, I have been working on a SEM for my dissertation. The primary outcome in my model is binary (whether the participant did a hepatitis B screening or not). The predictors are three latent variables measured by nonnormally distributed continuous factor indicators. By default, Mplus uses the WLSMV estimator for both the structural and the measurement part. I would like to know what happens to the measurement model if I allow the default estimator (WLSMV), that is, if WLSMV is applied to nonnormally distributed continuous factor indicators. For the CFA (only the measurement part), I may choose to use MLR rather than WLSMV. Is there any significant difference between these two estimators? I understand both estimators are robust to nonnormality. Thanks for your advice. 


WLSMV is not robust to nonnormality of continuous variables. I would use MLR. 


Hello, I am doing a path analysis. I have 5 intermediate continuous variables and one dependent variable. I am not sure which type of estimation I should use. Thanks, Owis 


I would use ML or MLR. 


Hi again, Thanks for your response. I used MLR and I got this error message: "*** FATAL ERROR THIS MODEL CAN BE DONE ONLY WITH MONTECARLO INTEGRATION." Is that because I have missing values? Thanks, Owis 


Yes, you must have missing values on a mediator. Add INTEGRATION=MONTECARLO; to the ANALYSIS command. 


OK, if I remove the missing values, can I use the MLR or ML estimator? I don't want to use WLSMV. Thanks, Owis 


When you add Integration=MonteCarlo you are still doing ML/MLR, it's just that you specify a certain algorithm for doing it. Your dependent variable must have been categorical or count, in which case missing on mediators leads to numerical integration with MonteCarlo when using the ML or MLR estimator. 


Actually, it is my independent variables that have these missing values. Thanks a lot, Owis 


Hello again, I used INTEGRATION=MONTECARLO with the ML/MLR estimator, but I didn't get a chi-square value or RMSEA in the output. Is that normal? Also, I got different results (i.e., a different direction of the relationships between variables) with ML/MLR versus WLSMV! 


When means, variances, and covariances are not sufficient statistics for model estimation, chi-square and related fit statistics are not available. Please send the two outputs and your license number to support@statmodel.com. 


When I use WLSMV estimation, I get the fit statistics. I am using my supervisor's program, and neither of us knows the license number. Where is it usually written? Thanks, Owis 


With WLSMV, the statistics for model estimation are thresholds and correlations. You can login to your account on the website and see it. 


Sorry for bothering you, but does that mean that with WLSMV I get a wrong result? I got a good-fitting model with WLSMV! Thanks, Owis 


We don't make a habit of giving wrong results. WLSMV gives chi-square and related fit statistics. 


One more question: with WLSMV we get chi-square and related fit statistics, while with ML/MLR we don't. Is that true? Also, if I use ML or WLSMV I get similar results, don't I? That is what I understood from your video! Thanks, Owis 


To understand the different aspects of testing model fit in this situation, see Muthén, B. (1993). Goodness of fit with categorical and other nonnormal variables. In K. A. Bollen & J. S. Long (Eds.), Testing Structural Equation Models (pp. 205-243). Newbury Park, CA: Sage. This is paper #45 at http://pages.gseis.ucla.edu/faculty/muthen/full_paper_list.htm The chapter makes the distinction between testing the underlying structure (as WLSMV does) versus testing the model against the data (which isn't always feasible, as presumably in your case). 


ML and WLSMV tend to give similar results when the missing data are MCAR (missing completely at random) or MAR as a function of covariates. 


Hi, I am running a simulation study with categorical indicators using the BAYES estimator. I have heard that Mplus uses two methods for handling categorical variables: tetrachoric correlations and direct ML. In the specific case of the BAYES estimator, which method does Mplus use? Thank you 


Bayes does not use tetrachorics and does not use ML. But like ML, Bayes is a "full-information" estimator that uses all available data in an optimal way. It is equivalent to ML in its missing data handling. Bayes is an estimator in its own right. So Mplus offers 3 major estimators: WLSMV (which builds on tetrachorics/polychorics), ML, and Bayes. 


Hello, I would like to ask a technical question about Mplus. When I use the WLSMV estimator, I get chi-square, RMSEA, and CFI values automatically. My question is: can I get chi-square, RMSEA, and CFI values with the ML estimator? Thanks, Owis 


With maximum likelihood and categorical variables, means, variances, and covariances are not sufficient statistics for model estimation. Because of this, chi-square and related fit statistics are not available. 


Dear all, I have some questions regarding the ODLL algorithm implemented in Mplus. I’m running a large IRT model including many nonlinear parameter constraints (around 700). ML estimation on basis of the EM algorithm is no longer feasible and the constraints are not supported in the Bayes framework. I’ve tried out different algorithms and found out that ODLL (in combination with MLF) works well in reasonable time. Unfortunately, I was not able to find any documentation of the ODLL algorithm. I only found out that ODLL optimizes the observed data likelihood directly. Is ODLL something like JML (Joint Maximum Likelihood)? Is ODLL an iterative algorithm (Tech 8 doesn’t report an iteration history for ODLL)? What is ODLL exactly doing? Are there any references about this algorithm that might be cited in a manuscript? What about the performance of ODLL relative to other algorithms, such as EM? I suspect that there might be some reasons that the much slower EM algorithm is routinely used in the IRT framework. Thank you for your help! 


ODLL stands for observed-data log-likelihood. The algorithm optimizes the log-likelihood using the quasi-Newton method: http://en.wikipedia.org/wiki/QuasiNewton_method You can look at Tech5 for the iterations. Use the Mplus manual as a reference. My experience is that in most cases (but definitely not in all cases) the default EMA algorithm is faster. EMA actually contains ODLL, which it occasionally deploys. My suggestion is to spend time simplifying your model constraints. There are 3 types of constraints, listed in order of complexity: 1) new parameters = function of model parameters; 2) dependent parameters = function of independent parameters; 3) anything else. Try to use 1 and 2 as much as you can instead of 3. Model constraints can be written in many different ways, and choosing the most efficient formulation can improve the estimation dramatically. 
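
For readers unfamiliar with quasi-Newton optimization, here is a toy Python sketch of the idea in one dimension, where the method reduces to the secant method: the curvature of the log-likelihood is approximated from two successive gradient values instead of an exact second derivative. This is not the Mplus ODLL implementation. The intercept-only logit model, the data (24 successes in 100 trials), and the starting values are all assumptions for illustration.

```python
import math

def score(b, successes, n):
    """Derivative of the log-likelihood successes*b - n*log(1 + exp(b))."""
    return successes - n / (1.0 + math.exp(-b))

def secant_maximize(successes, n, b0=0.0, b1=0.5, tol=1e-10, max_iter=100):
    """Maximize the log-likelihood of an intercept-only logit model."""
    g0, g1 = score(b0, successes, n), score(b1, successes, n)
    for _ in range(max_iter):
        if abs(g1) < tol:
            break
        # Curvature estimated from two gradient evaluations (secant update).
        curvature = (g1 - g0) / (b1 - b0)
        b0, g0 = b1, g1
        b1 = b1 - g1 / curvature
        g1 = score(b1, successes, n)
    return b1

b_hat = secant_maximize(successes=24, n=100)
print(f"estimated intercept on the logit scale: {b_hat:.6f}")
```

In this toy case the answer can be checked against the closed form log(24/76). Real models with many parameters use a matrix analogue of the same update, and the choice of starting values matters more.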

Anna posted on Sunday, June 02, 2013  11:21 pm



Hello, I have a model with five observed variables, A, B, C, D, and E. E is categorical. The model proposes an indirect link, A -> C -> D -> E, while B moderates the A -> C path. There are missing values on A, B, C, D. Sample size is around 250. I would like to know which estimator is more appropriate for testing this kind of model: it has a categorical outcome, aims to test a moderated mediation effect, and has missing values. I have tried WLSMV, MLR, and BAYES. The results from these three estimators are actually comparable, and the fit indices in WLSMV and the Bayesian PPC and PSR indicate good fit. I tend to favor Bayesian estimation because it handles missing data well and does not require a normal distribution. But I am not sure to what extent it is favored over the other two estimators in my situation. (I don't have specific estimates for the priors.) Thank you very much for your help! 


Bayes and ML handle missing data in the same way. I would choose them over WLSMV if there is a lot of missing data. You can use noninformative priors in Bayes. 

Anna posted on Monday, June 03, 2013  11:44 am



Dear Linda, Thank you! I would like to ask more about these estimators. Besides the difference in handling missing data, are there any other concerns in choosing among these methods? 1. Is WLSMV robust for models with interaction terms and a nonnormal distribution of indirect effects (e.g., the a*b term)? I read the Muthen, du Toit, and Spisic (1997) technical report and I think that WLSMV often underestimates SEs when the sample is small and skewed. 2. I also wonder if I should correlate the IVs with the interaction term (and perhaps correlate the exogenous covariates), because WLSMV does not automatically do so in the sequential modeling. 3. For MLR, since bootstrapping is not allowed with numerical integration, will this be a big deal for estimation of indirect effects with a nonnormal distribution? Thanks! 


1. You can use bootstrap with WLSMV. 2. The model is estimated conditioned on the exogenous variables. Their means, variances, and covariances should not be mentioned in the MODEL command. To obtain these values, do a TYPE=BASIC with no MODEL command. 3. If they have a nonnormal distribution, this will not be taken into account. 
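
As background to the bootstrap suggestion, the following Python sketch shows a percentile bootstrap interval for an indirect effect a*b, which does not rely on the product being normally distributed. It is a simplified stand-in, not the Mplus procedure: it uses simulated continuous data, ordinary least squares slopes, and a toy model with no direct X -> Y path. The sample size of 250, the path values, and the seeds are assumptions.

```python
import random

def slope(x, y):
    """OLS slope of y on x (single predictor with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def indirect_effect(x, m, y):
    """Product of the X -> M slope (a) and the M -> Y slope (b)."""
    return slope(x, m) * slope(m, y)

def percentile_bootstrap(x, m, y, draws=1000, level=0.95, seed=1):
    """Percentile interval from resampling cases with replacement."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(draws):
        idx = [rng.randrange(n) for _ in range(n)]
        estimates.append(indirect_effect([x[i] for i in idx],
                                         [m[i] for i in idx],
                                         [y[i] for i in idx]))
    estimates.sort()
    lower = estimates[int((1 - level) / 2 * draws)]
    upper = estimates[int((1 + level) / 2 * draws) - 1]
    return lower, upper

# Simulated data (hypothetical): a = 0.5, b = 0.4, n = 250.
sim = random.Random(0)
x = [sim.gauss(0, 1) for _ in range(250)]
m = [0.5 * xi + sim.gauss(0, 1) for xi in x]
y = [0.4 * mi + sim.gauss(0, 1) for mi in m]

point = indirect_effect(x, m, y)
lower, upper = percentile_bootstrap(x, m, y)
print(f"indirect effect = {point:.3f}, 95% percentile interval = ({lower:.3f}, {upper:.3f})")
```

The interval comes directly from the sorted bootstrap estimates, so it can be asymmetric around the point estimate, which is the usual motivation for bootstrapping a product of coefficients.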

db40 posted on Friday, August 29, 2014  6:59 am



Hi Linda, when estimating a model using the Bayes estimator with outcome variables specified as binary, are the parameters linear, logit, or something else? 


Probit. 

db40 posted on Sunday, August 31, 2014  5:21 pm



Linda, thanks for clearing that up. Might I ask whether a probit coefficient, once standardized, gives estimates comparable to those from logistic regression? 


No. You should listen to our Topic 2 course video on the website where probit and logistic regression are discussed. 


Hi Dr Muthen, Sorry to have another question, but I wanted to confirm my analysis method as I finally write up my results. In my SEM model my observed variables are continuous but my dependent/outcome variable is categorical (binary). I read that WLSMV is the preferred option for analysis of categorical outcomes, and this has successfully provided me with the different measures of model fit. However, I want to compare several different models. These models all have different variables, making the difference test not an option. Normally I would use AIC, as this takes into account the number of variables, but this is not available under WLSMV. I understand that it is available using ML (although fit indices would not be), but this option is not available to select when I set up my analysis. It seems that neither method provides all of the information I would normally use in model comparison. Could you please advise how I might best compare models given the lack of AIC? I have more than one model that meets the criteria for a good fit. I could then consider R-squared and account for the number of variables in the model. Would that suffice? I just need to be able to justify my choice of model. Thanks in advance for your help 


I don't know if by "these models all have different variables" you mean covariates or DVs. If the latter, AIC cannot be used because the different models have AIC in different metrics. If the former, for WLSMV you can use the largest set of covariates for all models and fix to zero the slopes of the covariates not included in certain models. 
