
bmuthen posted on Friday, May 17, 2002  9:15 pm



Every individual-level variable can in principle be decomposed into variance and covariance components for two orthogonal parts: the within and the between parts. Mplus automatically generates both parts. This is different from conventional multilevel programs, where the user has to create a variable containing the cluster-specific mean of an individual-level variable for use on the between level. This mean is the between part of what Mplus automatically generates. So, with Mplus the user should not create cluster-specific mean variables, unless he/she wants to use the individual-level variable in cluster-mean form only in the between part of the model. Because of this, individual-level variables have a between covariance matrix part in Mplus when using the MUML approach. If you don't want the between parts of individual-level variables to be predictors on the between level, you fix the between regression coefficients at zero. But one still needs to capture the between-level covariances among those predictors, so the program includes that part automatically. The new FIML approach makes things a little easier in this regard. The individual-level variables that you do not want to use as predictors on the between level can be specified on the Within = list and they will then be excluded from the between part of the model automatically. So with FIML, one doesn't have to fix those between-level regression coefficients to zero.
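As a rough illustration of the decomposition described above (not of Mplus internals), the sketch below splits an observed variable into a between part, taken here as the observed cluster mean, and a within part, the deviation from that mean. The function name decompose and the toy data are hypothetical.

```python
from statistics import mean

def decompose(values, clusters):
    """Split each value into a between part (its cluster mean)
    and a within part (its deviation from that cluster mean)."""
    by_cluster = {}
    for v, c in zip(values, clusters):
        by_cluster.setdefault(c, []).append(v)
    cluster_mean = {c: mean(vs) for c, vs in by_cluster.items()}
    between = [cluster_mean[c] for c in clusters]
    within = [v - cluster_mean[c] for v, c in zip(values, clusters)]
    return between, within

# Hypothetical toy data: seven individuals in three clusters.
y = [2.0, 4.0, 6.0, 1.0, 3.0, 9.0, 11.0]
cluster = ["a", "a", "a", "b", "b", "c", "c"]
between, within = decompose(y, cluster)

# Orthogonality in the sample: within deviations sum to zero inside
# every cluster, so their cross-product with the between part is zero.
grand = mean(between)
cross = sum((b - grand) * w for b, w in zip(between, within))
print(between, within, cross)
```

The zero cross-product is what allows the total variation to be written as a within piece plus a between piece.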

Feiming Li posted on Tuesday, April 26, 2005  1:29 pm



Does the new FIML approach refer to MLM and MLMV? What's the difference between MLM and MLMV? Where can I find a paper or some formulas about them? Thanks a lot!


The new FIML approach refers to ML, MLR, and MLF. See the technical appendices on our website. See Chapter 15 under Estimator for a brief description of the estimators and also for a table that shows which estimators are available for various analyses. 

Feiming Li posted on Wednesday, April 27, 2005  7:50 pm



I checked the technical appendices and the table you mentioned, and I want to make sure my understanding of MLM, MLMV and MUML is right. 1. These three are all quasi-likelihood estimators, so they belong to LIML, right? 2. All three can be used to estimate a multilevel model with unbalanced group sizes. But the difference is: MLM gives ML parameter estimates with robust standard errors and a mean-adjusted chi-square test statistic; MLMV gives ML parameter estimates with robust standard errors and a mean- and variance-adjusted chi-square test statistic. And what's the difference between MUML and these two? Does it also come with robust standard errors and chi-square? When we have multilevel data with unbalanced group sizes, which one is better to use? I'm so confused by these three methods; could you correct my understanding and clarify the difference between them? Thank you so much!


With balanced data MUML is full-information maximum likelihood. With unbalanced data, it is limited information. It is not robust. MLM and MLMV are not limited information. You must be using Version 2 because in Version 3 the estimator choices are ML, MLR, and MLF. I would recommend MLR.


Hi Bengt, Using FIML, I am getting fit statistics for a model with 6 indicator variables, for which a single factor is specified at both the within and between levels. I was under the impression that only MUML would return fit statistics because it's the only estimator which allows for a logical saturated model. I am guessing, now, that FIML will give fit statistics in the cases where a logical saturated model can be computed. Can you please tell me if I am right and, if so, under what circumstances logical saturated models exist (e.g., without random coefficients, etc.)? For example, if there are variables included at the between or within level that are not specified at the other level, will I still get fit statistics? Is there any publication I can read about this issue? Thanks!

bmuthen posted on Monday, July 18, 2005  11:10 pm



Mplus does not give fit indices for models with random slopes, but does give them if the model has only random intercepts. Random slope models have variances for dependent variables that vary with the values of the covariates that have the random slope, so the usual SEM test against a single covariance matrix is less well motivated. This hasn't been written about, I think, nor do I know of anybody doing research on overall fit indices for random slope latent variable models; it could be useful (dissertation, anyone?).
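To make the point about covariate-dependent variances concrete, here is a minimal sketch for a simple two-level regression y = b0 + b1*x + e with a random intercept b0 and a random slope b1. The symbols tau00, tau11, tau01 and sigma2 (intercept variance, slope variance, their covariance, and residual variance) and the numeric values are illustrative assumptions, not taken from any model in this thread.

```python
def conditional_variance(x, tau00, tau11, tau01, sigma2):
    """Var(y | x) for y = b0 + b1*x + e, where Var(b0) = tau00,
    Var(b1) = tau11, Cov(b0, b1) = tau01 and Var(e) = sigma2."""
    return tau00 + 2.0 * x * tau01 + x * x * tau11 + sigma2

# With a random slope (tau11 > 0) the variance of y changes with x ...
for x in (0.0, 1.0, 2.0):
    print(x, conditional_variance(x, tau00=1.0, tau11=0.5, tau01=0.1, sigma2=2.0))

# ... whereas with a random intercept only it is the same for every x.
for x in (0.0, 1.0, 2.0):
    print(x, conditional_variance(x, tau00=1.0, tau11=0.0, tau01=0.0, sigma2=2.0))
```

Because the first case has no single variance for y, there is no single covariance matrix against which to test the model, which is the issue raised in the reply.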


Hi Bengt, If I understand FIML correctly, a different covariance structure is computed for each set of groups with a similar size, with "d" many sets of groups. If this is true, then one should be unable to recover a single covariance matrix at the between-group level when one models groups with "d" many different sizes(?). I have such a model (without random coefficients) and I am getting a single covariance matrix at the between-group level when I use the SAMPSTAT option. Can you please tell me why? Also, when getting fit statistics for this model, are these fit statistics based on the single covariance matrix provided by Mplus, or are the fit statistics based on comparisons of a saturated model for each set of groups with a similar size? I.e., do the FIML fit statistics for the between model represent a kind of multi-group model with "d" many groups? Thanks for your time, Michael Zyphur

bmuthen posted on Monday, July 25, 2005  11:31 pm



I think you are referring to the FIML approach of using "d"-specific sample covariance matrices S_Bd in the old multiple-group approach for random intercepts (not random slopes) models (this was described in my tech report Muthén, B. (1990). Mean and covariance structure analysis of hierarchical data. Paper presented at the Psychometric Society meeting in Princeton, NJ, June 1990. UCLA Statistics Series 62. (#32)). But one needs to make a distinction between sample and population: note that even though there are several S_Bd, there is still only one Sigma_B matrix. Also, note that the more general FIML approach does not compute such "d"-specific covariance matrices when doing modeling, but works with raw data. SAMPSTAT computes an ML estimate of the unrestricted (H1) within and between covariance matrix, so one estimated Sigma_B. The fit statistics for the model (H0) are based on a comparison between the H0-estimated Sigma_B and the H1-estimated Sigma_B (and the same for within of course).


Interesting. So, when using FIML with many different group sizes (without random coefficients), the parameter estimates should be more accurate than under MUML because the individual-level data are used. However, in such a case, if the model fit statistics are based on a single Sigma_B, then shouldn't these fit statistics be as problematic as when using MUML? In other words, fit statistics will always be based on a single Sigma_B matrix, so these fit statistics will suffer when, for example, ICCs are low, the between-group sample size is small, and there are groups of widely variant sizes.

bmuthen posted on Sunday, July 31, 2005  4:46 pm



First a quick and to-the-point answer, which is that model fitting that considers a single Sigma_B is appropriate even if we use cluster-size-specific S_Bd sample covariance matrices. It is good to disentangle this. (1) Your 2nd sentence talks about the virtue of using more information than the 2 sample covariance matrices S_B and S_PW used by MUML. (2) Your 3rd sentence talks about model testing focusing on estimating the population matrix Sigma_B. So in (1) the issues are sample information and estimation (how to use sample information for parameter estimation), while in (2) the issues are model testing and population structures (how to test the model for Sigma_B). (1) is not directly related to (2). Another way of answering is to note that, yes, MUML uses limited information and is only equal to ML when cluster sizes are the same, but it does give good estimates and tests of fit also with unequal cluster sizes. FIML can be seen as using several S_Bd sample matrices and S_PW (although this is a correct view only if we don't have random slopes), but this doesn't mean that we have a different Sigma_B model for each cluster size. So, fit statistics based on a single Sigma_B don't "suffer" in the cases you mention last. It's a lot to keep track of in this area and rather little pedagogical material is available so far, unfortunately. But there is always our annual course in November.

Marco posted on Thursday, November 24, 2005  2:18 pm



The Muthén (1994) paper says that the sample between covariance matrix can be obtained with standard statistical software. I guess that this is simple, but I still don't know how to do it. There seems to be no special option (in SPSS), so is there a general procedure to "compute" S_B? Thanks for your help!

Bmuthen posted on Friday, November 25, 2005  2:21 pm



To compute S_B you first create cluster means. The cluster means are the values on a variable (z, say) that has as many observations as there are clusters (C say). Then you simply use any program to create a regular sample covariance matrix for the variable z with n = C. 
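The two steps in this reply can be sketched as follows, taking the description literally: average each variable within cluster, then compute an ordinary sample covariance matrix over the C rows of cluster means. The helper names and the toy data are hypothetical. No cluster-size weighting is applied here, since the reply does not mention one; whether a weighted version is wanted depends on the definition of S_B being followed.

```python
from statistics import mean

def cluster_means(rows, clusters):
    """Return one row of variable means per cluster (first-seen order)."""
    grouped = {}
    for row, c in zip(rows, clusters):
        grouped.setdefault(c, []).append(row)
    return [[mean(col) for col in zip(*members)] for members in grouped.values()]

def sample_covariance(rows):
    """Ordinary sample covariance matrix with divisor n - 1."""
    n, p = len(rows), len(rows[0])
    m = [mean(r[j] for r in rows) for j in range(p)]
    return [[sum((r[j] - m[j]) * (r[k] - m[k]) for r in rows) / (n - 1)
             for k in range(p)] for j in range(p)]

# Hypothetical data: two variables, seven individuals, three clusters.
data = [[1.0, 2.0], [3.0, 4.0], [5.0, 1.0], [7.0, 3.0],
        [9.0, 8.0], [11.0, 10.0], [10.0, 9.0]]
cluster = ["a", "a", "b", "b", "c", "c", "c"]

z = cluster_means(data, cluster)   # C = 3 rows, one per cluster
s_b = sample_covariance(z)         # covariance computed with n = C
print(z)
print(s_b)
```

In a package such as SPSS the same two steps would be an aggregate-by-cluster operation followed by a covariance computation on the aggregated file.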


The discussion board talks about using FIML (full-information maximum likelihood) but I can't make it run in Mplus. Also, there are some other techniques (without weights) to determine percentages of covariance structures explained by the missing patterns, etc. Can you help?


Please send input, data, output, and license number to support@statmodel.com so I can see why you are having a problem. 


Dear Linda,

These are the details of what I am trying to accomplish:

I have N (a subpopulation) = 5964.
I have weights = gweights, stratum, PSU.

When I run a model in SEM (shown in the attached output), I find that I am missing a large portion of data (N=3464) and they are MAR. My concern is that listwise deletion will not be appropriate. These are the things I need to do (as per your notes on the web): run a FIML (“Saying Analysis type = missing implies using all available data. With FIML this is the standard "MAR" approach to missingness” from your notes).

These are the commands I have been trying to use:
- subpop option
- Type= Missing
- Type=complex
- Estimator=FIML (or equivalent)
- The dependent variable (HIVT) is categorical (0/1).
- Independent vars are: slhc, doc5, urb, zedu, zinc, mcare

However, I cannot run TYPE= Complex, Missing both at the same time. I also cannot use the subpop option with Type=missing. Is there any way to accomplish all this at the same time?

Thanks for your reply,
Judy Baer

Mplus VERSION 4.0
MUTHEN & MUTHEN
05/03/2006 2:10 PM

INPUT INSTRUCTIONS

Title: CFA on AOC: 1 groups,
!SUBPOP=ANYSX
montecarlo
Data: File is C:\Ranjana\Thesis\Noidu2.dat ;
Variable: Names are race gend stratum swght urb chnA chnS hivt prcv regc
    doc ons2 age2 schl2 slhc doc4 doc5 age idu peer graf dmgp lie shop
    run car st50 rob weap slldr st49 rowd fight gang educ inc mcare prcvR
    curf owfrn owcl tvA tvP bed owdt eatp zedu zinc msex mrskp msup sxdm2
    anysx sxdm casl alcsx drgsx anal cndm psu aid;
USEVAR= slhc zedu urb zinc mcare doc5 ;!stratum swght psu
CATEGORICAL ARE SLHC urb;
!SUBPOP= anysx EQ 2;
Missing are all (9999) ;
!these are the weights:
!WEIGHT=swght;
!STRAT= stratum;
!CLUSTER=psu ;
Analysis: !Type = COMPLEX MISSING;
TYPE=MISSING;
ESTIMATOR=MLR; !WLSM
!INTEGRATION=MONTECARLO;
ITERATIONS=10000;
MODEL: f1 BY slhc doc5 urb;
f1 ON zedu zinc mcare;
OUTPUT: MODINDICES(0) TECH3 ;

*** WARNING in Output command
  MODINDICES option is not available for ALGORITHM=INTEGRATION.
  Request for MODINDICES is ignored.
*** WARNING
  Data set contains cases with missing on x-variables.
  These cases were not included in the analysis.
  Number of cases with missing on x-variables: 2418
2 WARNING(S) FOUND IN THE INPUT INSTRUCTIONS

CFA on AOC: 1 groups, montecarlo

SUMMARY OF ANALYSIS

Number of groups                          1
Number of observations                    11079
Number of dependent variables             3
Number of independent variables           3
Number of continuous latent variables     1

Observed dependent variables
  Continuous: DOC5
  Binary and ordered categorical (ordinal): SLHC URB
Observed independent variables: ZEDU ZINC MCARE
Continuous latent variables: F1

Estimator                                 MLR
Information matrix                        OBSERVED
Optimization Specifications for the Quasi-Newton Algorithm for Continuous Outcomes
  Maximum number of iterations            10000
  Convergence criterion                   0.100D-05
Optimization Specifications for the EM Algorithm
  Maximum number of iterations            500
  Convergence criteria
    Loglikelihood change                  0.100D-02
    Relative loglikelihood change         0.100D-05
    Derivative                            0.100D-02
Optimization Specifications for the M step of the EM Algorithm for Categorical Latent variables
  Number of M step iterations             1
  M step convergence criterion            0.100D-02
  Basis for M step termination            ITERATION
Optimization Specifications for the M step of the EM Algorithm for Censored, Binary or Ordered Categorical (Ordinal), Unordered Categorical (Nominal) and Count Outcomes
  Number of M step iterations             1
  M step convergence criterion            0.100D-02
  Basis for M step termination            ITERATION
  Maximum value for logit thresholds      15
  Minimum value for logit thresholds      -15
  Minimum expected cell size for chi-square   0.100D-01
Maximum number of iterations for H1       2000
Convergence criterion for H1              0.100D-03
Optimization algorithm                    EMA
Integration Specifications
  Type                                    STANDARD
  Number of integration points            15
  Dimensions of numerical integration     1
  Adaptive quadrature                     ON
Link                                      LOGIT
Cholesky                                  OFF

Input data file(s): C:\Ranjana\Thesis\Noidu2.dat
Input data format: FREE

SUMMARY OF DATA

Number of patterns                        4

COVARIANCE COVERAGE OF DATA

Minimum covariance coverage value         0.100

PROPORTION OF DATA PRESENT

Covariance Coverage
          SLHC    URB     DOC5    ZEDU    ZINC
SLHC      0.984
URB       0.982   0.994
DOC5      0.984   0.994   1.000
ZEDU      0.984   0.994   1.000   1.000
ZINC      0.984   0.994   1.000   1.000   1.000
MCARE     0.984   0.994   1.000   1.000   1.000

Covariance Coverage
          MCARE
MCARE     1.000

PROPORTION OF DATA PRESENT FOR U

Covariance Coverage
          SLHC    URB
SLHC      0.984
URB       0.982   0.994

PROPORTION OF DATA PRESENT FOR Y

Covariance Coverage
          DOC5    ZEDU    ZINC    MCARE
DOC5      1.000
ZEDU      1.000   1.000
ZINC      1.000   1.000   1.000
MCARE     1.000   1.000   1.000   1.000

SUMMARY OF CATEGORICAL DATA PROPORTIONS

SLHC
  Category 1    0.186
  Category 2    0.814
URB
  Category 1    0.443
  Category 2    0.557

THE LOGLIKELIHOOD DECREASED IN THE LAST EM ITERATION. CHANGE YOUR MODEL, STARTING VALUES AND/OR THE NUMBER OF INTEGRATION POINTS.

THE MODEL ESTIMATION DID NOT TERMINATE NORMALLY DUE TO AN ERROR IN THE COMPUTATION. CHANGE YOUR MODEL AND/OR STARTING VALUES.

MODEL RESULTS

                  Estimates
F1 BY
  SLHC            1.000
  DOC5            1.503
  URB             2.801
F1 ON
  ZEDU            0.005
  ZINC            0.001
  MCARE           0.001
Intercepts
  DOC5            2.383
Thresholds
  SLHC$1          1.572
  URB$1           0.330
Residual Variances
  DOC5            0.000
  F1              0.321

Beginning Time: 14:10:21
Ending Time: 14:12:23
Elapsed Time: 00:02:02

MUTHEN & MUTHEN
3463 Stoner Ave.
Los Angeles, CA 90066
Tel: (310) 391-9971
Fax: (310) 391-8971
Web: www.StatModel.com
Support: Support@StatModel.com
Copyright (c) 1998-2006 Muthen & Muthen


It is not possible for me to help you with this problem on Mplus Discussion. It is a support question. Please send your input, data, output, and license number to support@statmodel.com so I can see why you are having a problem. 

Ringo Ho posted on Sunday, December 03, 2006  10:22 am



Dear Prof. Muthen, I cannot find the details of the FIML (MLR estimator) for fitting multilevel models (with random slopes) in the technical appendices. Are there any papers you can suggest so that I can learn more about the FIML implemented in Mplus? Thanks a lot for your great help!!


The paper describing this has not yet been written. A related paper is: Muthén, B. & Asparouhov, T. (2006). Growth mixture analysis: Models with non-Gaussian random effects. Forthcoming in Fitzmaurice, G., Davidian, M., Verbeke, G. & Molenberghs, G. (eds.), Advances in Longitudinal Data Analysis. Chapman & Hall/CRC Press. You can download it from the website.

yang posted on Friday, February 29, 2008  7:07 pm



Drs. Muthen, I am running CFA/MIMIC on a multidimensional structure (with a mixture of categorical and continuous indicators and covariates). Some (about 15%) of the observations have missing values on the indicators, and I am using the WLSMV estimator (the Mplus default for this situation). Since the percentage (15%) of observations with missing values is pretty high, I also ran the analysis on the complete cases (by adding LISTWISE = ON in the DATA section) and did a sensitivity analysis. My question is, what is the relation between WLSMV and FIML? I checked the outputs, and found that without LISTWISE = ON, WLSMV used all of the observations. Thank you very much. Yang


I think you are asking about missing data estimation with weighted least squares estimation versus maximum likelihood estimation. For censored and categorical outcomes using weighted least squares estimation, missingness is allowed to be a function of the observed covariates but not the observed outcomes. When there are no covariates in the model, this is analogous to pairwise present analysis. 


Hi, In a two-level model with unbalanced data, I preferred to use MUML. However, when I typed ESTIMATOR IS MUML in the ANALYSIS command, the output warned me that MUML is not used with the two-level analysis. As far as I know, MUML equals ML with balanced data. Does the ML estimator handle the unbalanced case? (I have a large sample with a sufficient number of groups (n=130), and continuous latent variables are under investigation.) Which estimator is better for such an unbalanced case with continuous latents in Mplus 6.1? Thanks... Utkun


MUML is allowed with TWOLEVEL and continuous outcomes. I would need to see your output to understand why you get this message. Yes, ML can handle the unbalanced case. I would recommend using ML or one of the other maximum likelihood estimators. The default is MLR. 


Linda, out of interest I wanted to run a multilevel model using total scores, and I got this message when I created a latent variable from the total scores. I mean, for instance: f BY x1 x2 x3 (x1-x3 are 0/1 coded categorical variables); g BY x4 x5 x6 (x4-x6 are 0/1 coded categorical variables). I formed latent variables f' and g' by adding the items (e.g., x1+x2+x3) into total scores. Then I z-standardized f' and g'. I wanted to model f' ON g'; with the MUML estimator because my data are unbalanced. However, I got the error message. And when I used ML to investigate this "total score" issue, the data fit the model. Maybe the z-standardization caused the problem? Because when I ran the multilevel model below with categorical indicators using a WLSM estimator, I obtained a good fit... f BY x1 x2 x3; g BY x4 x5 x6; f ON g; Thanks... Utkun


Sorry Linda, a follow-up question: if I am to go on with the analysis using latents obtained from total scores, maybe I should have conducted a covariance structure analysis following Bengt's 1994 article. Might the problem be that I did not use the covariance matrices? Thanks again... Utkun


MUML is allowed only with continuous factor indicators. See the ESTIMATOR option in the user's guide. There is a table that shows which estimators are allowed in different situations. 


Hello, I have run a 2-level regression model with a random slope. I find that when I run the model without a random slope, it converges and I get very reasonable estimates. However, when I add the random slope (and I use algorithm=integration; integration=montecarlo), I get the following message: THE MODEL ESTIMATION DID NOT TERMINATE NORMALLY DUE TO A NON-POSITIVE DEFINITE FISHER INFORMATION MATRIX. CHANGE YOUR MODEL AND/OR STARTING VALUES. THE MODEL ESTIMATION HAS REACHED A SADDLE POINT OR A POINT WHERE THE OBSERVED AND THE EXPECTED INFORMATION MATRICES DO NOT MATCH. THE CONDITION NUMBER IS 0.116D01. THE PROBLEM MAY ALSO BE RESOLVED BY DECREASING THE VALUE OF THE MCONVERGENCE OR LOGCRITERION OPTIONS. THE STANDARD ERRORS OF THE MODEL PARAMETER ESTIMATES COULD NOT BE COMPUTED.... PROBLEM INVOLVING PARAMETER 69. I then provided the random slope model with start values from the model without the random slopes but got the following error: THE ESTIMATED WITHIN COVARIANCE MATRIX COULD NOT BE INVERTED. COMPUTATION COULD NOT BE COMPLETED IN ITERATION 1. CHANGE YOUR MODEL AND/OR STARTING VALUES. THE MODEL ESTIMATION DID NOT TERMINATE NORMALLY DUE TO AN ERROR IN THE COMPUTATION. CHANGE YOUR MODEL AND/OR STARTING VALUES. What do you think I could do to move forward? Thank you for your time.


If you look at the variance of the random slope on the between level, you will likely see that it is zero suggesting you fix it at zero or run the model as a fixed slope model. 


Hi Linda, Thanks very much for this suggestion. The model at the between level has: s on efmsoa; s has a residual variance of 1; efmsoa has a variance of .03. The residual variance of s being so large has to do, I think, with the fact that it didn't even start converging (s on efmsoa yields a beta of 0). However, the fact that efmsoa (which is a between-level variable) has so little variance across clusters suggests perhaps that the slope would not vary according to this variable, and so it needs to be a fixed slope model. Is this right? Thank you, Ananthi


Please send the output and your license number to support@statmodel.com. 


Hi, I have to choose between MLM and MLMV. What are the arguments for and against using each of the estimators? Does anyone know a useful reference? Thanks a lot!


Our informal simulation studies suggest MLM performs better. 
