ML with Numerical Integration vs. WLSMV
 Scott Weaver posted on Saturday, May 22, 2004 - 10:15 am
Drs. Muthen & Muthen,
I plan to estimate a full structural equation model (with missing data) with 11 latent factors (all continuous) for 4 groups (crossed 2x2 of gender and ethnicity, 200-300 subjects per group). 8 of the 11 latent factors are represented by categorical indicators (4-11 per factor). Because most categorical indicators have only 3-4 response options, with some showing floor effects, I don't believe that I have the option of treating these variables as continuous and using MLR.
One option that I have (which I've used thus far to examine each latent construct separately for the groups) is WLSMV. The Mplus manual (and the short course handouts) refers to using ML with numerical integration. Can you speak to the relative merits of each approach in general and as it might pertain to my specific model? My understanding is that numerical integration will involve considerable computational resources - do you suspect that it would be too much for my model? Thank you! Scott
 bmuthen posted on Saturday, May 22, 2004 - 2:57 pm
ML is an efficient estimator, so has in theory somewhat smaller SEs than WLSMV, but in practice the difference may not be large given that WLSMV uses bivariate information which probably contains the most essential information. ML can take advantage of MAR missingness, that is missingness predicted by the variables that are observed for the individual (both covariates and outcomes), whereas with WLSMV missingness can only be related to covariates not outcomes. ML via numerical integration is intended for models with a few latent variables and gets heavy whenever the number of factors gets above 3-4. It is not feasible with your 11 factors.
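To put rough numbers on why numerical integration gets heavy as factors are added: with a fixed product grid, the likelihood is evaluated at (points per dimension) raised to the power of (number of dimensions) points. The sketch below assumes 15 points per dimension (the STANDARD default mentioned later in this thread) and treats each factor as one dimension of integration. `grid_points` is an illustrative helper, not an Mplus function.

```python
def grid_points(points_per_dim: int, dims: int) -> int:
    """Total number of points in a full product grid (illustrative helper)."""
    return points_per_dim ** dims

# Assumption: 15 points per dimension, one dimension per continuous factor.
for dims in (1, 2, 3, 4, 11):
    print(f"{dims:>2} dimensions: {grid_points(15, dims):,} points")
```

Under these assumptions the grid grows by a factor of 15 with every added dimension, which is consistent with the advice that 3-4 dimensions is a practical ceiling.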
 Scott Weaver posted on Sunday, May 23, 2004 - 9:25 am
Dr. Muthen -
Thank you for your informative response. I have a short follow-up question. As a disadvantage, you state that with WLSMV missingness can only be related to covariates, not outcomes. Is it then potentially problematic if I specify type=missing (WLSMV) in a CFA with categorical indicators, since there are only observed outcomes and no observed covariates? How does Mplus account for missingness, and under what assumptions, for this type of model? Thank you, Scott
 bmuthen posted on Sunday, May 23, 2004 - 10:50 am
Without covariates and with categorical outcomes, WLSMV computes polychoric correlations which are based on the pairwise present data for the two variables at hand. So this is drawing on an MCAR assumption for the pair.
 Scott Weaver posted on Sunday, May 23, 2004 - 11:39 am
Thanks again for your responses. Am I correct in interpreting your response as saying that pairwise deletion of subjects with missing data is occurring with WLSMV when type=missing and no covariates are present? Do the standard problems with pairwise deletion (such as greater risk of non-positive definite matrices and biased standard error estimates) apply here as well?
 bmuthen posted on Sunday, May 23, 2004 - 1:01 pm
Yes. There is not much else one can do in the framework of using polychorics and WLS. One can always bring in covariates that are predictive of missingness - covariates that probably also have a causal role in the model.
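As a small illustration of what "pairwise present" means in the exchange above, the sketch below counts, for each pair of variables, the cases in which both are observed. The data and the names `u1`-`u3` are hypothetical, and the code shows only the case-selection step, not the estimation of polychoric correlations.

```python
from itertools import combinations

# Hypothetical responses to three ordinal items; None marks a missing value.
data = [
    {"u1": 1, "u2": 2, "u3": None},
    {"u1": 3, "u2": None, "u3": 2},
    {"u1": 2, "u2": 1, "u3": 1},
    {"u1": None, "u2": 3, "u3": 3},
]

def pairwise_present(rows, a, b):
    """Return the rows in which both variables a and b are observed."""
    return [r for r in rows if r[a] is not None and r[b] is not None]

# Number of usable cases for each pair of variables.
pair_counts = {
    (a, b): len(pairwise_present(data, a, b))
    for a, b in combinations(["u1", "u2", "u3"], 2)
}

# Cases with no missing values at all (what listwise deletion would keep).
complete_cases = [r for r in data if None not in r.values()]

print(pair_counts)
print(len(complete_cases))
```

Each pair here draws on a different subset of cases, which is why correlations assembled this way are not guaranteed to form a positive definite matrix.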
 Anonymous posted on Tuesday, June 29, 2004 - 12:59 pm
I’ve been experimenting with the numerical integration features in Mplus and had a couple of questions.

1. I have a 3 factor MIMIC model (categorical latent variable indicators) that I ran using WLSMV. The model is identified and converges with no problem. I then attempted to run the model using INTEGRATION=MONTECARLO(500), using the WLSMV estimates as start values, and the model hits a saddle point. Turning CHOLESKY off, using MONTECARLO(250) or MONTECARLO(750), and perturbing my start values did not rectify the situation, nor did setting the convergence criteria at really low values. When I used INTEGRATION=STANDARD(15), however, the model converged fine (after running overnight for 8 hours), and produced estimates by and large in line with the WLSMV model.

So my first question is this: is the default numerical integration algorithm somehow inherently better / more stable than MONTECARLO (or GAUSSHERMITE) ?

2. When setting the convergence criteria, for example, MCONVERGENCE=.05, Mplus actually reads these parameters as the inputted value / 10000, is this correct ? This is the sense I get examining the TECH8 output from my runs – that the estimation doesn’t terminate at (for example) .05, but at some much smaller value.

3. In terms of getting rather complicated models to converge using numerical integration in Mplus, how critical is the choice of convergence criteria ? I.e., what would be the consequences of using critical values of .001 versus .0001 (etc.) ? Can one “compensate” for more modest convergence criteria by increasing the number of MONTECARLO draws or integration points ?

4. Is there a way to set the parameters on my machine (via a .CFG file, for example) to allocate the maximum amount of RAM possible to Mplus when running numerical integration jobs ? For example, I have >750MB of RAM, and would like to make sure Mplus is using as much of it possible (in the event that this speeds up estimation).

Thanks very much.
 Anonymous posted on Tuesday, June 29, 2004 - 3:13 pm
1. Yes. The default numerical integration algorithm is better than MONTECARLO and GAUSSHERMITE. It is more precise and stable than MONTECARLO and is generally more reliable than GAUSSHERMITE. We do not recommend MONTECARLO for 1-3 dimensions of integration since the regular integration is possible for most models. If time is an issue you can use INTEGRATION=STANDARD(7), with more lenient convergence criteria, which will be less precise but 8 times faster for 3 dimensions.

2. No. Mplus uses multiple convergence criteria. Convergence is reported only if all convergence criteria are satisfied. MCONVERGENCE refers to the log-likelihood derivative. The second column in Tech 8 is controlled by LOGCRITERION and the third by RLOGCRITERION. The user manual has complete information about the convergence criteria.

3. The choice of convergence criteria is critical for models with a "flat" log-likelihood. If the log-likelihood is "pointy" the convergence criteria are not critical. The log-likelihood tends to be more pointy for models with fewer parameters and a large sample size. In practice the only way to see if 0.0001 is better than 0.001 is to run both and compare the difference in the parameters. Generally speaking one cannot “compensate” for modest convergence criteria by using a large number of integration points; however, I would not be surprised if there is a small such effect with MONTECARLO integration.

4. No, but this is not needed since Mplus would use all the memory that is needed if it is available. You can monitor what Mplus uses in the Windows Task Manager. If the memory is not enough then Mplus may use less than 100% of the CPU.
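The point in answer 2, that convergence is reported only when all criteria are satisfied, can be sketched as follows. The option names come from the reply above; the limits, the iteration values, and the simple "at or below the limit" comparison are assumptions made for illustration and may not match what Mplus does internally.

```python
def converged(observed: dict, limits: dict) -> bool:
    """True only if every criterion is at or below its limit."""
    return all(observed[name] <= limits[name] for name in limits)

# Hypothetical limits and iteration values, chosen only for illustration.
limits = {"MCONVERGENCE": 0.05, "LOGCRITERION": 0.001, "RLOGCRITERION": 0.000001}
iteration = {"MCONVERGENCE": 0.03, "LOGCRITERION": 0.004, "RLOGCRITERION": 0.0000005}

# MCONVERGENCE is already met, but LOGCRITERION is not, so iterations continue.
print(converged(iteration, limits))
```

This is one way to read the TECH8 behavior described in question 2: the run can continue well past the point where a single criterion such as MCONVERGENCE is met.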

 Anonymous posted on Thursday, July 15, 2004 - 5:24 am
Who can give me a short description of the WLSMV estimator and how it works (including the formula and the meanings of the symbols)?
 Linda K. Muthen posted on Thursday, July 15, 2004 - 6:55 am
Try Chapter 4 of the Technical Appendices which are posted on the Mplus website. See Technical Appendices in the left margin.
 Anonymous posted on Tuesday, September 07, 2004 - 7:37 am
I am interested in the robust ML estimator, MLMV. It is my understanding that this technique could be used with ordinal data, but it may also be more computationally intensive than WLS (due to the full-information ML technique) when larger models are used.
If this is correct, do we know the limitations of MLMV in terms of model size? Although a requirement for the ML estimator is normally distributed data, is MLMV as sensitive to distributional nonnormality (e.g., item or parcel level skew/kurtosis) as ML or does the mean/variance adjustment help to accommodate this problem?
 Linda K. Muthen posted on Wednesday, September 29, 2004 - 4:30 pm
MLM and MLMV are not available for categorical dependent variables. Only ML, MLR, and MLF are available for categorical dependent variables. With categorical dependent variables, no normality assumption of the data is made.
 Annonymous posted on Tuesday, January 03, 2006 - 5:30 pm
Dr. Muthen,

We are running a series of model comparisons for categorical data. Since the models are not nested we are using the information criteria, which are available through MLR estimation. For one of the potential models, there are convergence problems which we have attempted to address in multiple ways (e.g. investigating different start values, including the use of different numbers of random start values). This may be a moot point, however, if the AIC, BIC, and BCC are at all informative. Even though the model fails to converge, the output provides these values. Are the information criteria useful in this context?
 Linda K. Muthen posted on Wednesday, January 04, 2006 - 9:02 am
I don't think that we would give these values if the model did not converge. Can you send your input, data, output, and license number to so I can see your problem?
 Eisuke Segawa posted on Thursday, July 13, 2006 - 10:02 am
On Tuesday, June 29, 2004 Tihomir replied to a question,

"1. Yes. The default numerical integration algorithm is better than MONTECARLO and GAUSSHERMITE. It is more precise and stable than MONTECARLO and is generally more reliable than GAUSSHERMITE."

Is there a document that describes why the default method is more reliable than GAUSSHERMITE? Also, is there a document which describes how quadrature points and weights are determined in the default method?

Thank you.
 Bengt O. Muthen posted on Friday, July 14, 2006 - 4:40 pm
No, there is not. But given an interest, we can produce more information.
 RDU posted on Saturday, November 15, 2008 - 9:08 pm
Three questions:

1.) I have been experimenting with a series of Factor Mixture Models with ordinal indicators. I have noticed that the command "ALGORITHM = INTEGRATION" is required whenever I specify the indicators as being categorical. So is integration needed (e.g., using MLR estimation) because we are treating the ordinal data as a discretized version of a continuous underlying variable, and we need to integrate in order to derive thresholds and other estimates?

2.) Is it correct that polychoric correlations are not used in these models, since we are not using WLS or WLSMV as the estimator and due to the general assumptions of mixture models?

3.) Are there any papers written on this particular topic so that I can better understand what's happening? Thank you.
 Bengt O. Muthen posted on Sunday, November 16, 2008 - 10:28 am
1) No, that's not the reason. Numerical integration is needed for ML estimation when the model involves a combination of continuous latent variables and categorical observed variables. Same thing in IRT where single-class models are considered - see that literature for more technical details in the single-class case, e.g. the Psychometrika paper by Bock & Aitkin (see References on our web site).

2) Yes.

3) See the Muthen-Asparouhov (2008) chapter

Muthén, B. & Asparouhov, T. (2008). Growth mixture modeling: Analysis with non-Gaussian random effects. In Fitzmaurice, G., Davidian, M., Verbeke, G. & Molenberghs, G. (eds.), Longitudinal Data Analysis, pp. 143-165. Boca Raton: Chapman & Hall/CRC Press.

which is on our web site (Papers, GMM).
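To make the first answer above concrete: with a continuous factor and categorical items, the probability of an observed response pattern is an integral over the factor, which has no closed form and is approximated by a weighted sum over grid points. The sketch below does this for one factor and three binary items. The logit link, the equally spaced 15-point grid on [-5, 5], and the loadings and thresholds are all assumptions for illustration, not a description of Mplus internals.

```python
import math
from itertools import product

def normal_grid(n_points=15, lo=-5.0, hi=5.0):
    """Equally spaced nodes with normalized standard-normal weights."""
    step = (hi - lo) / (n_points - 1)
    nodes = [lo + i * step for i in range(n_points)]
    raw = [math.exp(-0.5 * z * z) for z in nodes]
    total = sum(raw)
    return nodes, [w / total for w in raw]

def pattern_probability(pattern, loadings, thresholds, nodes, weights):
    """Marginal probability of a binary response pattern, factor integrated out."""
    prob = 0.0
    for eta, w in zip(nodes, weights):
        conditional = 1.0
        for u, lam, tau in zip(pattern, loadings, thresholds):
            p = 1.0 / (1.0 + math.exp(-(lam * eta - tau)))  # P(u = 1 | eta)
            conditional *= p if u == 1 else 1.0 - p
        prob += w * conditional
    return prob

# Hypothetical loadings and thresholds for three binary items.
loadings, thresholds = [1.0, 1.5, 0.8], [-0.5, 0.0, 1.0]
nodes, weights = normal_grid()

# Sanity check: the probabilities of all 2^3 patterns should sum to one.
total = sum(
    pattern_probability(p, loadings, thresholds, nodes, weights)
    for p in product([0, 1], repeat=3)
)
print(round(total, 6))
```

With several factors the single loop over `nodes` becomes a loop over a product grid, which is where the computational burden discussed throughout this thread comes from.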
 Syd posted on Wednesday, August 04, 2010 - 4:03 am

A note above states that "The default numerical integration algorithm is better than MONTECARLO and GAUSSHERMITE. It is more precise and stable than MONTECARLO and is generally more reliable than GAUSSHERMITE. We do not recommend MONTECARLO for 1-3 dimensions of integration since the regular integration is possible for most models. If time is an issue you can use INTEGRATION=STANDARD(7), with more lenient convergence criteria, which will be less precise but 8 times faster for 3 dimensions."

I'm trying to run a model with 4 dimensions of integration. Due to the complexity of the model, it appears that the model will run for approximately 10 hours if I use INTEGRATION=STANDARD, so I am looking for a way to decrease the duration. However, as MONTECARLO is stated to be less precise than STANDARD, I'm not sure about how to proceed. Would it be preferable to use MONTECARLO or STANDARD(7) in terms of the accuracy of the results?
 Linda K. Muthen posted on Wednesday, August 04, 2010 - 9:40 am
I would use STANDARD (7) and then for the final model use the default using starting values from the SVALUES option of the OUTPUT command to speed things up.
 Syd posted on Wednesday, August 04, 2010 - 12:18 pm
Thank you for the guidance. Just to clarify what you mean by "for the final model," I understand that I need to run my model twice, once with STANDARD(7), and a second time with STANDARD(15) using the starting values from the STANDARD(7) run. Is this correct?
 Linda K. Muthen posted on Wednesday, August 04, 2010 - 4:54 pm
Whichever model is your final model should be run with the default of STANDARD (15). Sometimes, one runs more than one model before the final model.
 Syd posted on Wednesday, August 04, 2010 - 5:52 pm
Thank you very much for the help. This really decreased the running time for the final model to about 2 hours.
 Mary Campa posted on Monday, October 24, 2011 - 5:42 pm
Hello. I am running a two level mixture model with 4 between level dimensions of integration. I have tried to reduce the initial running time by using standard(7) statement as indicated in the 8/4/10 post but it did not change the final number of integration points (50625). The program has issued a warning about the computational burden and suggested reducing integration points or using montecarlo but neither of these statements has made a difference.

Here is the code I am using:

STARTS = 20 10;


C#1; C#2*1; C#3; C#4; C#1 with C#2 C#3 C#4; C#2 with C#3 c#4; C#3 with C#4;

Any suggestions appreciated. Thank you.
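A quick arithmetic check on the grid size reported above, offered as one possible reading rather than a diagnosis: if the grid is a full product grid, the number of points per dimension is the integer root of the total. `points_per_dimension` is an illustrative helper, not an Mplus function.

```python
def points_per_dimension(total_points: int, dims: int):
    """Return n such that n ** dims == total_points, or None if no integer fits."""
    n = round(total_points ** (1.0 / dims))
    for candidate in (n - 1, n, n + 1):  # guard against floating-point rounding
        if candidate > 0 and candidate ** dims == total_points:
            return candidate
    return None

print(points_per_dimension(50625, 4))  # grid size reported in the post, 4 dimensions
print(7 ** 4)                          # grid size if 7 points per dimension were in effect
```

Under that assumption, 50625 points over 4 dimensions corresponds to 15 points per dimension, so the reported total looks like the default grid rather than a 7-point one.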
 Bengt O. Muthen posted on Monday, October 24, 2011 - 8:41 pm
Sounds like you have a very large sample size - what is it?

How many dimensions do you have on Within?

If you like, you can send your full outputs for integration=7 and integration=MonteCarlo to Support.
 Mary Campa posted on Monday, October 24, 2011 - 9:19 pm
There are no dimensions on within and the sample size is 707 relationships (clustered in 83 people).

My problem is that I cannot get the 5 class model to run (I have tried twice, leaving it overnight to work, and came in to find my computer shut down from "unexpected errors"). I just updated my computer to 64 bit to run the program and was able to go through 4 classes but not the fifth on the two-level. The simple model (without clustering) suggests that 5 classes is the best fit, so I need to press on. However, neither the integration = 7 nor the integration = MonteCarlo statement seems to change what is happening. Even with integration = 7 I get an error that says the paging file is too small for this operation to complete. Sadly, I cannot send output I cannot produce.

Is there another way to run this model that will produce class memberships and be computationally tractable?
 Linda K. Muthen posted on Tuesday, October 25, 2011 - 2:45 pm
Please send your input, data, and license number to
 Tim Konold posted on Wednesday, March 14, 2012 - 9:14 am
Can you please clarify for me whether it is possible to deal with categorical variables and missing data in EFA/CFA models with V6 and how this is done.

I have specified my categorical variables and the fact that data are missing. If I request ML estimation, will I obtain results based on a polychoric correlation matrix with missingness handled through FIML? If not, what is occurring with the specification of both categorical variables and missing?

Many thanks.
 Linda K. Muthen posted on Wednesday, March 14, 2012 - 5:56 pm
The default in Version 6 is to use all available information. With categorical outcomes, you can use weighted least squares, maximum likelihood, or multiple imputation. Maximum likelihood does not analyze correlations.

Mplus provides maximum likelihood estimation under MCAR (missing completely at random), MAR (missing at random), and NMAR (not missing at random) for continuous, censored, binary, ordered categorical (ordinal), unordered categorical (nominal), counts, or combinations of these variable types (Little & Rubin, 2002). MAR means that missingness can be a function of observed covariates and observed outcomes. For censored and categorical outcomes using weighted least squares estimation, missingness is allowed to be a function of the observed covariates but not the observed outcomes. When there are no covariates in the model, this is analogous to pairwise present analysis.
 Sarah Ryan posted on Friday, October 05, 2012 - 6:32 am
Is it possible to use the two-tier approach with categorical outcomes, a new feature discussed in the version 7 release, for a model in which I have four latent factors- but in which I do expect the factors to exhibit some level of correlation?

I am also curious as to whether any sort of "probability calculator" (such as the new latent transition probability calculator for LCA) will eventually be available to use with categorical outcomes in Mplus?
 Bengt O. Muthen posted on Friday, October 05, 2012 - 9:49 am
No, two-tier is used when you for instance have 2 uncorrelated factors that are the only factors influencing an item. So a typical candidate is a bi-factor model.