
Anonymous posted on Friday, January 07, 2000  12:20 pm



Some multilevel packages have begun to include methods for multilevel models with binary and other categorical outcomes. Does MPLUS plan to develop procedures for multilevel models with categorical outcomes? If yes, when might we look for those to become available? 


This is an important topic. We continue to expand Mplus modeling opportunities in many ways. To both not disappoint people and not place undue pressure on ourselves, we try to avoid preannouncements of our developments. 

David Klein posted on Tuesday, April 10, 2001  11:16 am



I have data with probability weights and a categorical outcome. I'd like to run a weighted analysis (structural equation modeling) but I want to make sure that simply adding the WEIGHT statement will adjust the standard errors properly. A read of the manual gives me the feeling that I should use TYPE=COMPLEX analysis, but that requires that I use a continuous outcome, which is not the case. The manual also says that the WEIGHT statement "identifies the variable that contains case or sampling weight information," but does not say how to differentiate between case vs. sampling weights or whether MPlus in fact treats them differently. Any insight would be greatly appreciated. Thanks! 


Yes, just adding the WEIGHT statement is sufficient. TYPE=COMPLEX requires that you have a cluster variable and that the outcomes be continuous. If you do not, just use the WEIGHT statement alone. Mplus does not treat case or sampling weights differently. 

Yeong Chang posted on Monday, September 10, 2001  1:15 pm



I hope the person who started this topic would check this out: In your message, you said: Some multilevel packages have begun to include methods for multilevel models with binary and other categorical outcomes. I hope you could name one or two of these packages because I want to use them for my analysis. Thanks a lot. Another question is for Muthen & Muthen: Are we closer to the stage where MPlus can do multilevel modeling with binary outcomes? 


Packages that can handle categorical outcomes are Mixor, MLWin, and HLM as far as I know. We are currently working on adding categorical outcomes to the multilevel modeling in Mplus. 

Yeong Chang posted on Thursday, September 13, 2001  2:27 pm



Thank you, Linda. Among the three packages you mentioned (Mixor, MLWin, and HLM), can any of them do multilevel factor analysis (like CFA with categorical outcomes)? I guess HLM cannot, but I don't know much about Mixor or MLWin. Thanks again. 


Not to my knowledge. 


What are the risks and implications of modeling dichotomous indicators of latent continuous variables as if they were continuous, nonnormal indicators under the complex sampling paradigm? 


The issues of modeling dichotomous indicators as though they are continuous are the same for clustered data as for nonclustered data. As you get away from a 50/50 split, correlations become attenuated. 


But since the COMPLEX modeling procedure allows for nonnormally distributed variables, isn't the effect of unequally distributed dichotomous indicators also accounted for?


Parameter estimates are not adjusted for nonnormality when a robust estimator is used. Only standard errors and chi-square test statistics are adjusted. With dichotomous indicators that are not 50/50, the sample statistics (correlations) will be attenuated. This means that the parameter estimates will be distorted. So although the standard errors may be correct for the parameter estimates, the parameter estimates themselves will not be correct.
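The attenuation described above can be illustrated with a small simulation. This is only a sketch, not Mplus output: it assumes two latent continuous variables with a hypothetical correlation of 0.6, dichotomizes both at the same cut point, and computes the ordinary Pearson correlation of the resulting 0/1 variables. The function names, the sample size, and the value 0.6 are illustrative assumptions.

```python
import math
import random


def pearson(x, y):
    """Plain Pearson correlation of two equally long lists."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate(rho, cut, n=20000, seed=1):
    """Return (latent correlation, correlation after dichotomizing at `cut`)."""
    rng = random.Random(seed)
    z1, z2 = [], []
    for _ in range(n):
        a = rng.gauss(0.0, 1.0)
        # Second latent variable correlated rho with the first (assumed value).
        b = rho * a + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        z1.append(a)
        z2.append(b)
    u1 = [1 if v > cut else 0 for v in z1]
    u2 = [1 if v > cut else 0 for v in z2]
    return pearson(z1, z2), pearson(u1, u2)


latent_r, r_5050 = simulate(0.6, cut=0.0)      # 50/50 split
_, r_9010 = simulate(0.6, cut=1.2816)          # roughly a 90/10 split
print(latent_r, r_5050, r_9010)
```

Even at a 50/50 split the correlation of the dichotomized variables falls below the latent correlation (for a median split of bivariate normal data the theoretical value is (2/pi)*arcsin(0.6), about 0.41), and it drops further as the split moves away from 50/50.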


Help wanted for some basic questions. I want to explore alternative descriptions of the same model. The data consist of 508 persons making 9 responses to stated choice experiments. If we initially ignore the fact that each person gave 9 responses, we can view the data as 4572 cases. The model is then simply:

categorical ch1;
ch1 on pd1 td1 wd1;

The (dichotomous) choice depends on a price difference, a time difference, and a waiting time difference. A multilevel approach could be applied to check the size of the within-subject variation and possibly to improve the estimates. However, how do we specify that the 9 responses for each cluster are correlated? Type=complex; and cluster=id; and what more?

The data could alternatively be dealt with by applying a multivariate framework where the 9 choices are represented as 9 variables. The number of cases is then 508. The simple model then looks like this:

ch1 ON pd1 (1); ch2 ON pd2 (1); ch3 ON pd3 (1); ch4 ON pd4 (1); ch5 ON pd5 (1); ch6 ON pd6 (1); ch7 ON pd7 (1); ch8 ON pd8 (1); ch9 ON pd9 (1);
ch1 ON Td1 (2); ch2 ON Td2 (2); ch3 ON Td3 (2); ch4 ON Td4 (2); ch5 ON Td5 (2); ch6 ON Td6 (2); ch7 ON Td7 (2); ch8 ON Td8 (2); ch9 ON Td9 (2);
ch1 ON wd1 (3); ch2 ON wd2 (3); ch3 ON wd3 (3); ch4 ON wd4 (3); ch5 ON wd5 (3); ch6 ON wd6 (3); ch7 ON wd7 (3); ch8 ON wd8 (3); ch9 ON wd9 (3);

Again I wonder how to specify that the 9 choices are correlated (since the dependent variables are categorical, the WITH statement will not work). Could one introduce a factor, f by ch1-ch9@1, to signify that the 9 choices share a common identical source of error? Or should one introduce 9 continuous formative indicators of the respective pd, td, and wd inputs and let each of them list the respective dichotomous choice as its sole indicator?

f1 by ch1; ... f9 by ch9;
[ch1-ch9] (4); !constrain intercepts to be equal
f1 on pd1 (1); f1 on td1 (2); f1 on wd1 (3); .....
f1-f9 with f1-f9 (5);

I am thankful for any answers.


To account for clustering in the univariate approach that you showed, where the cluster variable is id, specifying CLUSTER = ID; and TYPE = COMPLEX; is all you have to do to account for the lack of independence of observations. Regarding the multivariate framework, you could do one of the following:

Weighted least squares estimator:
MODEL: ch1-ch9 ON pd1-pd9 td1-td9 wd1-wd9;

Maximum likelihood estimator:
f BY ch1-ch9;
f ON pd1-pd9 td1-td9 wd1-wd9;

CMW posted on Sunday, June 26, 2005  12:20 pm



Greetings, In example 9.2 in the Mplus manual on page 213, a two-level regression analysis for a categorical dependent variable, is that probit or logistic regression? Also, assuming a model like example 9.2 with a random slope, how can I get the actual slope estimate for each person? CMW

BMuthen posted on Tuesday, June 28, 2005  8:13 am



It is a logistic regression. You can get the slope estimates by requesting factor scores in the SAVEDATA command.


I have data in which nearly 1000 children were recruited among 60 doctors. I want to include child self-reported symptoms and doctor-reported child symptoms in the same latent class model. My only goal is to find latent classes combining the doctor and child reports of child symptoms. The symptoms are binary.
1) Is there anything theoretically or conceptually violated by including data from multiple levels in a single latent class analysis, i.e., including responses from doctors and children about the child in the same model? If they are measures of the underlying latent construct and are conditionally independent then it seems acceptable.
2) Can MPLUS take into account the clustering of children around doctors when the items are binary? There is no regression outcome, just latent classes based on the binary symptoms. How do I know if it's even necessary to take into account the clustering?
3) If yes to the second question, could you please direct me to where I would find the syntax for clustering?
4) I have not found any 'multilevel' latent class analysis. Could you provide any references or examples of this, in which, for example, latent classes are found using community-level, school-level, teacher-level, and child-level data, not from the same reporter?
Thank you so much for having an excellent site and for all your help.

BMuthen posted on Monday, September 05, 2005  2:34 pm



The doctor and the child measures are not independent but can be analyzed in a single model by using TYPE=TWOLEVEL MIXTURE; or TYPE=COMPLEX MIXTURE; when symptoms are binary. See Chapter 10 Example 10.3 for multilevel latent class analysis. 

CMW posted on Thursday, September 15, 2005  12:15 pm



Greetings, In the context of two-level logistic regression with a random slope, does Mplus compute empirical Bayes estimates of the slope for each cluster? CMW


These can be obtained by requesting factor scores using the FSCORES option of the SAVEDATA command. 


Thank you for the reply in September. Now I am doing a two-level logistic regression with a random intercept as well as a random slope and would like to get EB estimates of the intercepts. When I use this code: "savedata: file is slopesall.out; save = fscores;" I get EB estimates of the slopes only. Is there a way to get estimates of the intercepts too? CW


It sounds like you are using an old version of Mplus. You should get factor scores for both. If this is not the case, please send your input, data, output, and license number to support@statmodel.com. 


Can Mplus 3.0 do multilevel SEM with two continuous latent variables (CFA) that both have categorical indicators?


Following up on my question: when I ran the two-level model as an exploratory attempt, the output stated "*** FATAL ERROR THERE IS NOT ENOUGH MEMORY SPACE TO RUN THE PROGRAM ON THE CURRENT INPUT FILE. YOU CAN TRY TO FREE UP SOME MEMORY BY CLOSING OTHER APPLICATIONS THAT ARE CURRENTLY RUNNING. ANOTHER SUGGESTION IS CLEANING UP YOUR HARD DRIVE BY DELETING UNNECESSARY FILES". What are the basic hardware requirements? How do I overcome this problem? Thanks for your help. The syntax follows:

data: file is ag.prn;
variable: names are cl edu inc gen bmi year at1-at7 ba1-ba5 em im;
usevariables are at1-at7 ba1-ba5;
categorical are at1-at7 ba1-ba5;
cluster is cl;
analysis: type is twolevel random;
model:
%within%
aw1 by at1-at3; aw2 by at4-at7;
bw1 by ba1-ba3 ba5; bw2 by ba4;
s1 | bw1 on aw1; s2 | bw1 on aw2; s3 | bw2 on aw1; s4 | bw2 on aw2;
%between%
ab1 by at1-at3; ab2 by at4-at7;
bb1 by ba1-ba3 ba5; bb2 by ba4;
output: tech1 tech8;


The model you are estimating requires numerical integration. There is a section on numerical integration and suggestions for using numerical integration in Chapter 13 of the User's Guide. If you are still having problems, please send your input and data to support@statmodel.com. 

Boliang Guo posted on Friday, January 12, 2007  10:11 am



When conducting a 2-level logistic regression of y on x, I find the result from MLwiN is quite different from Mplus with the same data, but the results are virtually the same when I run a 1-level logistic regression with the same data. I tested this with the ex9.2 data, regressing u on x in both MLwiN and Mplus: the results are almost the same when I run a regression with a nonrandom slope, but if I make the slope random, the results differ. Is the difference due to the algorithm used in each software?


The results should be the same if you use the ML estimator in MLwiN (I think they also have a PQL estimator). If you use ML, please send input, output (from both programs), data and license number to support@statmodel.com. 


I have just finished running a multilevel analysis (using TWOLEVEL RANDOM) with a binary outcome (smoking). As part of the output I received a Threshold value at the Between level for the binary outcome that has a value of .086. How is this value interpreted? Can it be transformed into a more easily interpreted value (e.g., as logits can be transformed into odds ratios)? Thanks in advance!


It is on a logit scale. See the section in Chapter 13, Calculating Probability Coefficients From Logistic Regression Coefficients, to see how the threshold is used. In the example, intercepts are used rather than thresholds. Note that an intercept is the threshold with the opposite sign.
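The relation between threshold, intercept, and probability can be sketched as follows. This is a minimal illustration, not Mplus code: it takes the threshold value .086 as quoted in the question, and it gives the probability for a cluster whose random intercept is at its mean (zero), not the probability averaged over all clusters. The function names are made up for the example.

```python
import math


def prob_from_threshold(threshold):
    """P(y = 1) on the logit scale with no covariates.

    The intercept is the threshold with the opposite sign, so
    P(y = 1) = 1 / (1 + exp(-intercept)) = 1 / (1 + exp(threshold)).
    """
    intercept = -threshold
    return 1.0 / (1.0 + math.exp(-intercept))


def odds_from_threshold(threshold):
    """Odds of y = 1 versus y = 0, i.e. exp(intercept)."""
    return math.exp(-threshold)


p = prob_from_threshold(0.086)       # threshold as quoted in the question
odds = odds_from_threshold(0.086)
print(round(p, 3), round(odds, 3))
```

A positive threshold therefore corresponds to a probability below 0.5 and odds below 1; a threshold of zero corresponds to a probability of exactly 0.5.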


In a two-level multinomial logistic regression, Mplus v6 gives odds ratios only for the within but not the between part of the model. Why is that?


The variable in the between part of the model is a continuous random intercept.

Susan Haws posted on Wednesday, November 30, 2011  4:35 pm



I am estimating multilevel models using type=complex twolevel, with binary outcome smoking (0/1). Slope coefficients are fixed. Are the R-squared within and R-squared between estimates provided in the Mplus output legitimate ways to convey the proportion of variance explained at each level? Secondly, is it safe to compare them as one builds models in steps, for example, comparing the R2 between before and after adding level 2 predictors?


Q1. These R-squares have been proposed in, for example, the multilevel book by Snijders & Bosker. Q2. R-square can change in unexpected ways due to the two levels interacting. See the multilevel literature such as the Raudenbush & Bryk book.


Dear Dr. Muthen, I have the following problem: I want to transform the results of a two-level logit model into probabilities. But apparently I am making some mistake, because I get values which don't seem correct. I tried a two-level model using only the binary outcome Y (fixed intercept). According to the univariate proportions, Y = 1 accounts for 36%. The threshold is 0.581. I calculated p(Y=1) as 1/(1+exp(-0.581)) = 0.64, which is actually p(Y=0). If I treat Y as continuous, the intercept = 0.36. Shouldn't p also be 0.36? Some more info: I use within-level weights and WTSCALE IS UNSCALED; Thanks, Christoph Weber


Sorry, I already found the mistake. p(Y=1) = 1/(1+exp(-a)), and a = -t. Isn't it?


Getting this probability requires taking into account the distribution of the random effect which can be done only using numerical integration. 
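What "taking into account the distribution of the random effect" means can be sketched numerically. The code below is an illustration under stated assumptions, not the Mplus algorithm: it assumes a logit link and a normally distributed random intercept, uses the threshold 0.581 quoted earlier in the thread, and uses a between-level variance of 2.0 that is purely hypothetical. The integration is a simple trapezoid rule; the function names are made up for the example.

```python
import math


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


def marginal_prob(threshold, between_var, n_points=2001, width=8.0):
    """Average P(y = 1 | u) over a random intercept u ~ N(0, between_var).

    P(y = 1 | u) = logistic(-threshold + u); the average is computed with
    a trapezoid rule on [-width*sd, +width*sd].
    """
    if between_var <= 0.0:
        # No random effect: the conditional probability is the answer.
        return logistic(-threshold)
    sd = math.sqrt(between_var)
    lo, hi = -width * sd, width * sd
    step = (hi - lo) / (n_points - 1)
    total = 0.0
    for i in range(n_points):
        u = lo + i * step
        density = math.exp(-0.5 * (u / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))
        weight = 0.5 if i in (0, n_points - 1) else 1.0
        total += weight * logistic(-threshold + u) * density * step
    return total


conditional = marginal_prob(0.581, 0.0)   # random intercept fixed at zero
marginal = marginal_prob(0.581, 2.0)      # hypothetical between variance
print(round(conditional, 3), round(marginal, 3))
```

With a nonzero between-level variance the averaged probability is pulled toward 0.5 relative to the probability computed from the threshold alone, which is why the simple formula 1/(1+exp(t)) does not reproduce the observed proportion when the intercept is random.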


Could you explain the required steps, or give some references? I use the following input (now with a random intercept). Is this correct?

weight = wgtstud;
WTSCALE IS UNSCALED;
cluster = idclass;
USEVARIABLES = ahs;
MISSING = all (99);
categorical = ahs;
Analysis: type = twolevel; algorithm = integration;
MODEL:
%within%
%between%
ahs;

I get a threshold = 0.769, thus p = 0.32? Thanks


See Larsen & Merlo (2005). Appropriate assessment of neighborhood effects on individual health: Integrating random and fixed effects in multilevel logistic regression. American Journal of Epidemiology, 161, 81-88. See also our teaching on slides 60-66 of the Topic 7 handout and video on our web site.

Jan Super posted on Friday, September 28, 2012  8:56 am



Hello, We are trying to run a multilevel analysis with a 2-2-1-2 model. Our primary IV is categorical (paytype) and our DV is categorical (ranking). MPLUS won't allow us to use a within-level predictor for a between-level DV that has been declared categorical. We're getting error messages such as the following:

*** ERROR in MODEL command Variances for between-only categorical variables are not currently defined on the BETWEEN level. Variance given for: GP_E_RNK

A sample of our code is as follows:

MISSING ARE ALL (999999);
CLUSTER IS Grp_N;
CATEGORICAL = pay_t gp_E_rnk;
BETWEEN ARE pay_t disc_t gp_E_rnk;
ANALYSIS: TYPE IS TWOLEVEL;
MODEL:
%WITHIN%
tot_Ed;
%BETWEEN%
pay_t disc_t gp_E_rnk;
disc_t on pay_t (a1);
tot_Ed on disc_t (b1);
gp_E_rnk on tot_Ed (c1);
gp_E_rnk on disc_t (d1);
tot_Ed on pay_t;

Thanks in advance for your help.


You mention the variances of categorical variables. Their variances are not model parameters. You should remove the variances. Also, observed exogenous variables should not be on the CATEGORICAL list. The CATEGORICAL list is for dependent variables only. 

Jan Super posted on Friday, September 28, 2012  2:36 pm



Thank you for your reply. I'd like to ask a follow-up question. Could you elaborate on removing variances from dependent categorical variables? We aren't sure how to go about that. Thanks, Jan


Remove pay_t from the CATEGORICAL list. I find it only on the right-hand side of ON. Remove the variances: pay_t disc_t gp_E_rnk; gp_E_rnk is categorical so no variances are estimated. You should not mention the variance of pay_t because it is an observed exogenous variable. The other residual variance is estimated as the default. If you run into other errors, send the full output and your license number to support@statmodel.com.

Kate Song posted on Friday, April 12, 2013  1:43 pm



Hi Linda. I am using a 2-level ordered logit model. The Mplus code I use is as follows:

VARIABLE: NAMES ARE a3_le a3_eff id_yr health a3_p;
CATEGORICAL = health;
BETWEEN = a3_le a3_p a3_eff;
CLUSTER IS id_yr;
ANALYSIS: TYPE = TWOLEVEL; ESTIMATOR = WLSM;
MODEL:
%BETWEEN%
health on a3_p a3_le a3_eff;

Health is the categorical variable and the only within variable. However, I've compared the results with the STATA ordered logit model, and they look very different. Could you please explain the reason? FYI, I use the following code in STATA for the same model:

gllamm health a3_p a3_le a3_eff, i(id_yr) link(ologit)

Thank you.


You would need to send the Mplus and STATA outputs and your license number to support@statmodel.com. Be sure you are using the same estimator in both analyses and that your sample sizes are the same. 


Dear Drs. Muthén, we are trying to analyze diary data with Mplus. We use emotion as the independent variable and a dichotomous dependent variable (defined as categorical in the syntax) on level 1. Since the daily emotions depend on the individual (level 2), we try to model emotion as a random effect. However, Mplus can't compute the model, showing the following error message: Unrestricted x-variables for analysis with TYPE=TWOLEVEL and ALGORITHM=INTEGRATION must be specified as either a WITHIN or BETWEEN variable. So my question is: Is there a way to compute a multilevel analysis with a categorical dependent variable with a random effect? And if so, how can I perform the analysis? Thank you in advance. Andreas


Please send the output with the error and your license number to support@statmodel.com so I can see the full context. 

Adel Pwell posted on Monday, August 26, 2013  9:09 am



When modeling multilevel categorical data with sample size 20, I am getting this error message: THE STANDARD ERRORS OF THE MODEL PARAMETER ESTIMATES MAY NOT BE TRUSTWORTHY FOR SOME PARAMETERS DUE TO A NON-POSITIVE DEFINITE FIRST-ORDER DERIVATIVE PRODUCT MATRIX. THIS MAY BE DUE TO THE STARTING VALUES BUT MAY ALSO BE AN INDICATION OF MODEL NONIDENTIFICATION. THE CONDITION NUMBER IS 0.646D-17. PROBLEM INVOLVING PARAMETER 20. Parameters 15-20 are under Tau, and I supplied the cut points when I generated the data. What could this error point to as the reason the standard errors could not be estimated?


Please send the output and your license number to support@statmodel.com. 


I have a hierarchical dataset (people nested within states), a dichotomous outcome, a number of within-level covariates, and a single between-level dichotomous predictor. Here is some of my code:

CATEGORICAL = drunkdr;
BETWEEN = spsobch;
CLUSTER = state;
STRATIFICATION = strata;
WEIGHT = weight;
WTSCALE = ECLUSTER;
Analysis: TYPE = TWOLEVEL COMPLEX;
Model:
%BETWEEN%
drunkdr ON spsobch;

I want to interpret the effect of spsobch (a dichotomous L2 predictor) on my dichotomous outcome (drunkdr). I understand that the within part of the model is regressing the L2 random intercept on the predictor. Here is the output for this part of the model:

MODEL RESULTS
Between Level
                     Estimate    S.E.   Est./S.E.   P-Value
DRUNKDR ON
  SPSOBCH              0.280    0.058      4.803     0.000
Thresholds
  DRUNKDR$1            6.028    0.088     68.774     0.000
Residual Variances
  DRUNKDR              0.134    0.028      4.744     0.000

I'm struggling with how to interpret the estimate = 0.28. Given that my outcome variable is dichotomous, does this mean clusters with a 1 on the predictor have an average proportion .28 lower than those with a 0? Is there a way to output the random intercepts for each of the clusters? Thanks, Darin


The variable drunkdr is a random intercept on the between level. 0.28 is a linear regression coefficient.


I am trying to calculate predicted probabilities for certain values of a higher-level covariate using a multilevel model with a 4-category ordinal dependent variable and a cross-level interaction term. From what I've read on the forum, I must calculate this by hand. I have tried to write out the equation so I can input covariate values, but I cannot visualize how the pieces fit together in a single equation. The inclusion of the cross-level interaction term is confusing me. Any help you can offer in interpreting the output would be greatly appreciated.

MODEL:
%WITHIN%
worry ON jobsec easefind union private univ abovehi higher age spwrkft;
%BETWEEN%
worry ON plmp unemp;


Apologies for posting the wrong equation. The correct equation is below: MODEL: %WITHIN% worry ON jobsec easefind union private univ abovehi higher; %BETWEEN% easefind worry ON almp ; worry ON unemp ; 


I assume "worry" is your categorical DV. With categorical DVs, the multilevel model involves numerical integration to get the probabilities. Also, I don't see a cross-level interaction term. That would appear as e.g. %within% s | worry on jobsec; %between% s on w; so that w*jobsec is the interaction.


Perhaps I used the wrong terminology (maybe the term is cross-level effect rather than interaction?). I only want random intercepts. Here is what I am trying to simultaneously achieve: (1) Categorical DV worry is regressed on individual-level IVs (jobsec easefind union private univ abovehi higher). (2) DV worry is regressed on country-level IVs (almp unemp). (3) easefind, an IV at the individual level, becomes a DV at the country level and is regressed on a country-level IV (almp). In short, I'm trying to create a multilevel mediation model where X is measured at a higher level (country) than the mediator or Y, both of which are measured at the individual level. It sounds like calculating predicted probabilities for such a model might not be possible, at least not by hand, if their computation requires numerical integration. If not, how can it be done?


See the multilevel teaching we do on categorical DVs in our short course website handout and video from Johns Hopkins, Topic 7.


Hi, I am trying to fit a two-level model with ordered categorical dependent variables and two measurement time points. I tried to impose measurement invariance by constraining the thresholds to be equal over time. When I do this I get an error message saying Internal Error Code: PARAMETER EQUALITIES ARE NOT SUPPORTED. Is it not possible to constrain thresholds in this kind of model? Even if I just try to assign labels to the thresholds I get the same error message... Thanks!


Please send your output and license number to support@statmodel.com. 

Hoda Vaziri posted on Thursday, September 18, 2014  1:12 pm



Does the two-level multinomial regression provide any chi-square test of fit? It does give me the loglikelihood and AIC, but I can't run a null model to get the loglikelihood for the null model to compare. What about R^2? How can I compute that?


Q1. No. You can run a null model by fixing slopes at zero. Q2. Which R-square reference do you have in mind?

Hoda Vaziri posted on Friday, September 19, 2014  7:34 am



Thank you! I mean Cox and Snell's R-square, Nagelkerke's R-square, or Hosmer & Lemeshow's R-square.


Those can be obtained by getting the likelihood for the full model and for the model where all covariates have slopes fixed at zero. But I thought those measures were more used for binary response than multinomial. And with 2-level modeling they would be relevant mostly for level-1 I presume, whereas R-square for level-2 is the usual one since the DV is continuous.
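Given the two loglikelihoods, the measures can be computed by hand. The sketch below uses the standard single-level definitions of Cox and Snell's and Nagelkerke's R-square, plus the likelihood-ratio form 1 - LL_full/LL_null that is often attributed to Hosmer and Lemeshow. The loglikelihood values and the sample size are hypothetical placeholders, not output from any model in this thread, and which n to use in a two-level model is itself a judgment call.

```python
import math


def cox_snell_r2(ll_null, ll_full, n):
    """Cox and Snell: 1 - (L_null / L_full)^(2/n), written with loglikelihoods."""
    return 1.0 - math.exp(2.0 * (ll_null - ll_full) / n)


def nagelkerke_r2(ll_null, ll_full, n):
    """Cox and Snell rescaled by its maximum, 1 - L_null^(2/n)."""
    max_r2 = 1.0 - math.exp(2.0 * ll_null / n)
    return cox_snell_r2(ll_null, ll_full, n) / max_r2


def likelihood_ratio_r2(ll_null, ll_full):
    """Proportional reduction in -2 loglikelihood: 1 - LL_full / LL_null."""
    return 1.0 - ll_full / ll_null


# Hypothetical values: null model (slopes fixed at zero) and full model.
ll0, ll1, n = -650.0, -600.0, 1000
cs = cox_snell_r2(ll0, ll1, n)
nk = nagelkerke_r2(ll0, ll1, n)
lr = likelihood_ratio_r2(ll0, ll1)
print(round(cs, 3), round(nk, 3), round(lr, 3))
```

The null-model loglikelihood comes from a run with all slopes fixed at zero, as described above; the three measures then differ only in how the same loglikelihood difference is scaled.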

Hoda Vaziri posted on Thursday, September 25, 2014  12:17 pm



Thanks Bengt. I also have 2 other questions. 1. Does the two-level multinomial model give a test for each of the predictor variables, so that I can see whether their inclusion significantly improves the model? If not, is there any way to calculate those using the data provided? 2. The output only gives me the results relative to the last category. Is there any way to get the results with other categories as the reference automatically? Or should I change the reference category using the DEFINE command? Thanks


1. You get a z-test for the significance of each predictor. And you can check how BIC changes. 2. You have to use DEFINE in this model.


I am running a two-level multinomial model with random intercepts and fixed slopes. After reviewing the discussion board, user guide, and training handout for Topic 7, I still have a question about how/if Mplus (version 7.0) provides output for the estimated variance components for the level-1 and level-2 models in multinomial models. It seems like a variance estimate is provided for two-level logistic models, but not for multinomial models?


Both give variance estimates on level-2. You may have to mention the variances on level-2 to activate them.

Yoon Oh posted on Wednesday, June 10, 2015  10:17 pm



I was running a multilevel analysis of count outcome variables. The output contains both "unstandardized model results" and "standardized model results". I found that the p-values for each predictor variable differed between the standardized and unstandardized model results. I wonder why the p-values differ and which version should be used. Your help would be greatly appreciated.


Please send input, output and data to support so we can tell. 

Andrew Percy posted on Thursday, December 10, 2015  10:02 am



I'm currently analysing data from a clustered RCT (pupils in schools) with a binary primary outcome (binge drinking). The model I'm using is a two-level logistic with a random intercept. I have been asked (by STATA users) to calculate an effect size for the treatment effect (a binary between-level covariate), preferably as an odds ratio (along with other estimates that they would be familiar with). The Heck and Thomas book indicates that the relevant parameter is a linear regression coefficient for the regression of the underlying latent response variable on the covariate. However, I'm struggling to understand the scale of the parameter. Is it on a log odds scale?


This type of general statistical question should be posted on a general discussion forum like SEMNET or MultilevelNet. 


If you use two-level analysis with a binary outcome and ML you get the same results as in Stata, so ask your Stata colleagues what kinds of effect size measures they use and then you can create the same ones using Model Constraint. On the other hand, if you are talking about the effect of a binary between-level covariate then the DV is a continuous variable and regular linear regression effect estimates apply (the DV is continuous because you consider the random intercept of the binary outcome).


I am running a 2-level model (students in schools) using 12 student-report ordinal categorical indicators (scale 1-4) that are nonnormal to create 3 latent outcome variables (within) and the same 3 latent outcome variables (between). My within independent variables are categorical (gender, SES, grade level), and my between independent variables are continuous (3) and binary (1). I declared the 12 categorical indicators using CATEGORICAL ARE. I specified in Analysis: Type = twolevel; Estimator = wlsmv; and requested Standardized in Output. Which output should I use to report standardized results for within and between: STD for both? Or STDYX for continuous and STDY for categorical? Thanks for your guidance!


For factor loadings use StdY. For binary covariates, use StdY. For continuous covariates, use StdYX. 

L posted on Sunday, April 16, 2017  6:45 am



I am testing several hypotheses based on a multilevel mediation model. All of them have continuous predictors, continuous mediators, and a categorical outcome with 4 categories. Here's the code (modified from Preacher et al., 2010) that I am planning to use.

USEVARIABLES ARE QA FPreD PBF Prior;
CLUSTER IS QA;
Categorical = Prior;
ANALYSIS: TYPE IS TWOLEVEL; Estimator is WLSMV;
MODEL:
%WITHIN%
PBF on FPreD (aw);
Prior on PBF (bw);
Prior on FPreD (cpw);
%BETWEEN%
FPreD PBF Prior;
PBF on FPreD (ab);
Prior on PBF (bb);
Prior on FPreD (cpb);
MODEL CONSTRAINT:
NEW(indb indw);
indw = (aw*bw);
indb = (ab*bb);
OUTPUT: TECH1 TECH8 Tech3 CINTERVAL;

1. Are my modifications appropriate? 2. Should I be using the MLR estimator? 3. Further, most of the indirect effects at the between person level (where our interest lies) are not significant, in which case, does it make sense to interpret the probit coefficients?


1. This looks ok, except that the indirect effect on Within concerns the continuous latent response variable behind your ordinal outcome. For causal effects pertaining to the observed ordinal outcome, a counterfactual approach is needed (but more advanced given the 2-level model). 2. You can use WLSMV, ML, or Bayes. 3. It sounds like between person is your Within level given that you mention probit coefficients. You can interpret probit coefficients if they are significant (even if indirect effects are not).

L posted on Wednesday, April 19, 2017  2:06 pm



Dr. Muthen, Thank you so much for your input. I have a couple of follow-up questions. My model itself consists of 8 continuous IVs, 8 continuous mediators, and 1 outcome variable (categorical). I have alternate continuous variables that I can use as outcome variables, but the reliability could be somewhat low for the continuous variable since it appears that participants have interpreted the question (on how much time they spent on a work or family activity when a conflict occurred) differently. So, I am using a categorical variable (which activity did you prioritize when a conflict arose? 1. work, 2. both, 3. family, 4. neither) as the outcome (level 1 variable) instead. I have done my initial analysis assuming this could be thought of as ordinal; would it be a mistake to make that assumption? Also, a regular SEM framework would not work given the categorical outcome. Would you have any references for how to adopt a counterfactual approach with multilevel data?

John C posted on Wednesday, August 02, 2017  5:56 pm



Hello, I would like to check on the correct syntax for a model with a binary outcome and two predictors, one binary and the other continuous. Neither predictor varies within a cluster. Only the outcome variable can vary within a cluster. Is the following syntax correct? In particular, does one leave the %within% section empty (even though the outcome is not level 2)?

Variable: ...
CATEGORICAL ARE u1;
USEVARIABLES ARE u1 u2 x1;
CLUSTER = clus;
BETWEEN = u2 x1;
Analysis: Type is TwoLevel;
Model:
%within%
%between%
u1 on u2 x1;


With a binary outcome you have nothing on Within. On Between you have the threshold and the residual variance for the continuous between part of u: [u$1]; u; 

John C posted on Monday, August 07, 2017  7:43 am



Hello, I also have a mediation relationship at the between level, where u2 mediates the effect of x1 on u1. It seems you cannot request estimates for indirect/total effects with multilevel, but are the indirect estimates done by means of the model constraint command still valid? 


Mediated effects on Between are formulated just like for single-level models. Yes, you can use Model Constraint to express that.


Hello, My question is similar to John C's with one variation. My DV varies both within and between. It is binary at the within-person level and thus at the between level is the proportion of 0s/1s. I have one binary predictor (experimental condition) and several continuous predictors, and I am also interested in two- and three-way interactions between the binary and continuous predictors. My DV is zero-inflated. I am not sure what the best way to model these data is. Do you suggest using the model that John C is using (leaving the within-person part of the model blank)? If so, could you please explain to me what exactly MPLUS is doing in this model? How is it modelling the DV, and what is being predicted? Alternatively, as I am not interested in predicting anything at the within-person level, would it be better to not model this as multilevel, and instead compute a count variable counting the 1s (vs 0s) that exist at the within-person level, and use a negative binomial regression predicting this count variable at the between-person level, to account for the fact that the data are zero-inflated? And is my understanding correct that in the negative binomial model the DV is the probability of non-0 scores? If I create a count variable the scores range from 0 to 215 and are zero-inflated. Thank you


I don't know that there is any point in doing two-level analysis when there is no within variation to model (as in this case). The negbin (negative binomial) model concerns outcomes that range from zero and up. Inflated models add a part for the probability of being at zero. 
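To make the distinction in this reply concrete, here is a minimal sketch of how a zero-inflated negative binomial adds a separate probability of being at zero on top of the ordinary negative binomial. The function names, the mean/dispersion parameterization (variance = mu + alpha*mu^2), and the numbers are illustrative assumptions, not Mplus output or Mplus's internal code.

```python
import math

def nb_pmf(k, mu, alpha):
    """Negative binomial P(Y = k) with mean mu and dispersion alpha
    (variance = mu + alpha * mu**2). Requires mu > 0 and alpha > 0."""
    r = 1.0 / alpha
    p = r / (r + mu)
    log_pmf = (math.lgamma(k + r) - math.lgamma(k + 1) - math.lgamma(r)
               + r * math.log(p) + k * math.log(1.0 - p))
    return math.exp(log_pmf)

def zinb_pmf(k, mu, alpha, pi_zero):
    """Zero-inflated negative binomial: with probability pi_zero the count
    is a structural zero, otherwise it follows the negative binomial."""
    base = (1.0 - pi_zero) * nb_pmf(k, mu, alpha)
    return pi_zero + base if k == 0 else base

# Hypothetical values chosen only for illustration.
mu, alpha, pi_zero = 20.0, 1.5, 0.3
print("P(Y=0), negbin:", nb_pmf(0, mu, alpha))
print("P(Y=0), zero-inflated negbin:", zinb_pmf(0, mu, alpha, pi_zero))
```

In this sketch the negative binomial part describes the count itself (zero and up), while `pi_zero` is the extra part for being at zero.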


Thank you. Not sure if I was clear about this before, but the DV does vary within. It is a binary variable collected at many moments across time, so participants have scores of 0 or 1 for each moment. But we are not interested in predicting this variable at the within-person level. Is it still not worth modelling it as two-level in your opinion? 


Right, but there is no variance parameter on Within; the binary variation is taken care of by the threshold parameter, given that for a binary y, V(y) = p*(1-p), where p is the probability, which is a function of the threshold. So maybe not worth modeling as two-level. 
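The relation V(y) = p*(1-p) can be sketched numerically. The snippet below assumes a single threshold, no covariates, and the convention that category 2 (y = 1) is observed when the latent response exceeds the threshold; the function names are hypothetical.

```python
import math
from statistics import NormalDist

def prob_y1(threshold, link="logit"):
    """P(y = 1) for a binary y with one threshold and no covariates,
    assuming y = 1 when the latent response exceeds the threshold."""
    if link == "logit":
        return 1.0 / (1.0 + math.exp(threshold))
    if link == "probit":
        return 1.0 - NormalDist().cdf(threshold)
    raise ValueError("link must be 'logit' or 'probit'")

def bernoulli_variance(p):
    """Variance of a binary variable with P(y = 1) = p."""
    return p * (1.0 - p)

# Hypothetical threshold, for illustration only.
p = prob_y1(1.0, link="logit")
print(p, bernoulli_variance(p))
```

Once the threshold fixes p, the variance follows, which is why no separate within-level variance parameter is estimated.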

sunil posted on Thursday, November 30, 2017  11:14 am



Hello, I am trying to estimate the following model. I have an unbalanced panel with observations at the individual and business unit levels. The individual-level variables are continuous and are nested within the business-unit-level variables, which are categorical. The model is a random slope and intercept model with first a DV that is measured at the individual level (the mediator) and an ultimate DV at the business unit level that is categorical. What I gathered from the manual is that for unbalanced data we can use the MUML estimator, but MUML works for continuous variables only. Is there a workaround that I can use? Any help will be much appreciated. 


Full ML is available in newer versions of Mplus for any type of DV. See examples in the User's Guide on our website, chapter 9. 


Dear Bengt, dear Linda, we are estimating a two-level model for categorical outcomes (measured at both levels) with a Bayesian estimator. We wanted to compute predicted probabilities and noticed that the derived probabilities deviate extremely from the raw proportions in the data (even in the model where only the DV is included, without any predictors). However, if we use standardized thresholds in the calculation, we do get probabilities that are very close to the raw proportions. Is it really the case that we need to use standardized thresholds if we have a DV measured on both levels? Excerpts from our output:
UNIVARIATE PROPORTIONS AND COUNTS FOR CATEGORICAL VARIABLES
SMOKEA
  Category 1    0.665    18313.000
  Category 2    0.335     9224.000
Between Level
  Thresholds
    SMOKEA$1    1.791
  Variances
    SMOKEA     16.659
STDYX Standardization
Between Level
  Thresholds
    SMOKEA$1    0.438
  Variances
    SMOKEA      1.000


Please send your output to Support along with your license number. Show how you calculate the probabilities. With random effects, it is not straightforward. 
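One reason such calculations are not straightforward is that the threshold is conditional on the random intercept, whereas a raw proportion is marginal over it. The following is a rough sketch only, under assumptions that are not confirmed for the poster's model: a probit link (the thread later notes that Bayes offers only probit), a normally distributed random intercept, a within-level residual variance fixed at 1, no predictors, and the threshold sign as printed. The function name is hypothetical.

```python
import math
from statistics import NormalDist

def marginal_prob_category1(threshold, between_variance):
    """Marginal P(category 1) for a two-level probit model with a normal
    random intercept and no predictors (assumed setup, see lead-in).
    The latent response has total variance 1 + between_variance."""
    return NormalDist().cdf(threshold / math.sqrt(1.0 + between_variance))

# Unstandardized estimates quoted in the post above.
p1 = marginal_prob_category1(1.791, 16.659)
print("category 1:", round(p1, 3), "category 2:", round(1.0 - p1, 3))
```

Under these assumptions the unstandardized threshold, once divided by the total latent standard deviation, gives a probability close to the raw proportion of 0.665. That would be consistent with the standardized threshold appearing to work, but whether it applies here depends on the actual model, which is why the full output is needed.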


Dear Mplus team, if I use algorithm = GAUSSHERMITE to fit a two-level logistic model, does this correspond to the approach outlined by Liu and Pierce (1994)? Best, Christoph
Liu, Q., & Pierce, D. A. (1994). A note on Gauss-Hermite quadrature. Biometrika, 81(3), 624-629. 


Not quite, because by default Mplus uses adaptive quadrature. You need both of these commands to get that method: algorithm = GAUSSHERMITE; adaptive = off; 


Thanks, actually I'm looking at the performance of the different algorithms (bias of the level 2 variance component in a null model) when there is a small number of clusters. The Liu and Pierce approach as implemented in the glmer R package was used in a simulation study by Austin (2010). Do you have internal simulation results, or do you know of other studies on the performance of the available algorithms? Christoph 


Our own unpublished simulations, which are used for setting up Mplus defaults and have a slightly different objective (than your objective of optimal performance with a small number of level 2 units), showed that the rectangular (evenly spaced integration points) adaptive integration is generally slightly better than the alternatives. You may find this article useful, and it gives references to other related articles: http://www.statmodel.com/bmuthen/articles/Article_128.pdf 


Thanks again, I've conducted a small MC study regarding the bias of the level 2 variance component. Now I want to compare ML estimation with Bayes estimation, where only a probit link is available. How is this best done? Use a probit population model that is equivalent to the logit parameters? 


You can get only approximate correspondence between logit and probit. I would recommend comparing ML and Bayes by using the link=probit option for ML. 


Is this done by just adding link = probit in the analysis section? I compared a probit model with a logit model (roughly the same population parameters based on empirical data), and the probit model (ML) yields a much greater bias in the threshold as well as in the L2 variance than the logit model. What might be the reason for this? 


Q1: Yes. Q2: This should not be. Perhaps you have set it up incorrectly. 


I used thresholds of 2.197 (logit) and 1.282 (probit), corresponding to 10% in category 2 of the outcome. The level 2 variances are 0.41 (logit) and 0.12 (probit). Thus, the setups should be roughly equivalent, shouldn't they? Bias in thresholds is 0.2% (logit) and 39% (probit), and in variance estimates 18% (logit) and 73% (probit). I generated data for 7 clusters of size n = 35. 
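The population values quoted in this post can be checked with a short calculation. This is a sketch under stated assumptions: the thresholds are matched for a cluster whose random intercept is zero, and the variance is rescaled by the variance of the standard logistic distribution (pi^2/3), which is one common approximate conversion and not necessarily the one the poster used. Variable names are hypothetical.

```python
import math
from statistics import NormalDist

p_category2 = 0.10  # target proportion in category 2

# Threshold such that P(latent response > threshold) = 0.10 under each link.
logit_threshold = math.log((1.0 - p_category2) / p_category2)
probit_threshold = NormalDist().inv_cdf(1.0 - p_category2)

# Approximate logit-to-probit rescaling of the level 2 variance:
# divide by the variance of the standard logistic distribution.
logistic_variance = math.pi ** 2 / 3.0
probit_variance_from_logit = 0.41 / logistic_variance

print(round(logit_threshold, 3), round(probit_threshold, 3),
      round(probit_variance_from_logit, 3))
```

Because the correspondence between logit and probit is only approximate, a different rescaling constant would give a somewhat different probit variance.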


Doesn't seem plausible, but I would have to see the full outputs to see what's going on. 7 clusters for two-level analysis is much too few with ML, even with continuous outcomes. 


May I send the outputs? Regarding the number of clusters: a simulation study by Austin (2010; also including an L2 predictor) shows that the L2 variance component (and this is the only estimate I'm interested in) is biased by about 10% for ML-based estimators with 7 clusters. My model doesn't include predictors, so I was interested in whether ML would work even for 7 clusters. 


Please send them to Support (I think you have an up-to-date license). 


Hello, we would like to test for the effects of various predictors on the probability of experiencing a certain event. The data contain many waves, and with these data we usually use two-level modeling with clustering within participants. My question is, what is the most sensible way to disentangle the within and between variance in the outcome?
1. Within: for each wave, code 1/0 for whether the event was experienced or not between the waves; Between: code 1/0 for whether the event had ever been experienced (the events are not exactly rare but not very frequent either).
2. Within: same as above; Between: use the individual rate of experiencing the event (however, most experience it only once).
3. Use the latent variance decomposition and centering (in this case I don't understand the meaning of the within coefficients).
4. Don't use two-level analysis and switch to type=complex to correct for non-independence of observations within participants (but in this case, interindividual differences in the probability of experiencing the event are confounded with intraindividual factors).
Many thanks in advance! 


I would start with 4 and then move to 3. When you use method 3 you probably need to fix the variance of the binary outcome to 0, given that it is a rare event like that and most experience it one time. Even if you have it as a free parameter, it will probably get estimated to 0. The reason you want the binary item to be within-between is not the latent centering but so that you can include time-invariant (between-level) predictors if you have such. The coefficients on the within level will have exactly the same interpretation as in method 4. In method 2, having an almost constant variable is not a great idea. In method 1, adding the additional variable also doesn't sound like a great idea: that is an observed variable that doesn't have any information in it (all the information is already contained in the within variable). 

Y.A. posted on Thursday, June 11, 2020  12:39 am



Dear Prof. Muthen, I am practicing with the ex9.3 data on a two-level logistic regression. I have read the Larsen & Merlo (2005) paper you recommended and succeeded in calculating the MOR and IOR; now I am trying to calculate the predictive probability for the DV. The paper did not discuss this issue in particular, so I followed the instructions in the User's Guide. I assume that the logits at the different levels of the model can be summed; is that reasonable? I created some new parameters, but I am not sure if my logic makes sense. Could you give me some advice please?
MODEL:
%WITHIN%
!s1u on x1; s2u on x2;
u ON x1 (b1_sub)
x2 (b2_sub);
%BETWEEN%
u ON w (b1_clus);
u (v);
[u$1] (a);
model constraint:
new (b1_pop b2_pop ior_l_w ior_h_w log1_l_w log1_h_w p1_l_w p1_h_w
log1_x_sub p1_x_sub log1_x_pop p1_x_pop log_f5 log_f6 log_f7 log_f8
pred_f5 pred_f6 pred_f7 pred_f8);
b1_pop=b1_sub/(sqrt(1+(16**2*3/(15*3.1415926)**2)*v));
b2_pop=b2_sub/(sqrt(1+(16**2*3/(15*3.1415926)**2)*v));
ior_l_w=exp(b1_clus*1+sqrt(2*v)*(-1.2816)); !lower bound of ior
ior_h_w=exp(b1_clus*1+sqrt(2*v)*(1.2816)); !upper bound of ior

Y.A. posted on Thursday, June 11, 2020  12:41 am



Syntax follow-up:
! logits and predictive probability of category 1 using w=1
log1_l_w=a+b1_clus*1+sqrt(2*v)*(-1.2816); !lower bound of logit of category 1 predicted by w
log1_h_w=a+b1_clus*1+sqrt(2*v)*(1.2816); !upper bound of logit of category 1 predicted by w
p1_l_w=exp(log1_l_w)/(exp(0)+exp(log1_l_w));
p1_h_w=exp(log1_h_w)/(exp(0)+exp(log1_h_w));
! logits and predictive probability of category 1 using x1=0.3 and x2=0.6
log1_x_sub=a+b1_sub*0.3+b2_sub*0.6;
p1_x_sub=exp(log1_x_sub)/(exp(0)+exp(log1_x_sub));
log1_x_pop=a+b1_pop*0.3+b2_pop*0.6;
p1_x_pop=exp(log1_x_pop)/(exp(0)+exp(log1_x_pop));
! full model logits and predictive probability of category 1 using w=1, x1=0.3, and x2=0.6
log_f5=log1_l_w+log1_x_pop;
log_f6=log1_h_w+log1_x_pop;
log_f7=log1_h_w+log1_x_pop;
log_f8=log1_h_w+log1_x_pop;
pred_f5=exp(log_f5)/(exp(0)+exp(log_f5));
pred_f6=exp(log_f6)/(exp(0)+exp(log_f6));
pred_f7=exp(log_f7)/(exp(0)+exp(log_f7));
pred_f8=exp(log_f8)/(exp(0)+exp(log_f8));

Y.A. posted on Thursday, June 11, 2020  3:15 am



Sorry, the full model part should be as follows:
! full model logits and predictive probability of category 1 using w=1, x1=0.3, and x2=0.6
log_f1=log1_l_w+log1_x_pop;
log_f2=log1_h_w+log1_x_pop;
log_f3=log1_l_w+log1_x_sub;
log_f4=log1_h_w+log1_x_sub;
pred_f1=exp(log_f1)/(exp(0)+exp(log_f1));
pred_f2=exp(log_f2)/(exp(0)+exp(log_f2));
pred_f3=exp(log_f3)/(exp(0)+exp(log_f3));
pred_f4=exp(log_f4)/(exp(0)+exp(log_f4));
My questions are:
1. How do I estimate the random slopes of x1 and x2 on the binary u?
2. Is the between-level [u$1](a) the right parameter to use for calculating the predictive probability for the within-level x1 and x2?
3. Can the logits calculated using the between-level w and the within-level x1 and x2 be summed to an overall logit for subsequent calculation of the full-model predictive probability?
I hope I have described my questions clearly. Thank you very much! 
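For readers following the constraint formulas in these posts, the three quantities involved can be sketched outside Mplus. The code assumes the Larsen & Merlo (2005) definitions of the median odds ratio and the 80% interval odds ratio, and the approximation the poster uses to convert cluster-specific slopes to population-average slopes; the function names and input values are hypothetical, and this is not a check of the poster's model.

```python
import math
from statistics import NormalDist

def median_odds_ratio(v):
    """MOR for a random intercept variance v (logit scale)."""
    return math.exp(math.sqrt(2.0 * v) * NormalDist().inv_cdf(0.75))

def interval_odds_ratio(b_cluster, v, coverage=0.80):
    """Lower and upper IOR bounds for a cluster-level slope b_cluster.
    For 80% coverage the normal quantile is about 1.2816."""
    z = NormalDist().inv_cdf(0.5 + coverage / 2.0)
    half_width = math.sqrt(2.0 * v) * z
    return math.exp(b_cluster - half_width), math.exp(b_cluster + half_width)

def population_average_slope(b_subject, v):
    """Approximate population-average slope from a cluster-specific slope,
    using the constant (16*sqrt(3)/(15*pi))**2 as in the posted syntax."""
    c_squared = (16.0 ** 2 * 3.0) / (15.0 * math.pi) ** 2
    return b_subject / math.sqrt(1.0 + c_squared * v)

# Hypothetical estimates, for illustration only.
b_clus, b_sub, v = 0.5, 0.8, 1.2
print(median_odds_ratio(v))
print(interval_odds_ratio(b_clus, v))
print(population_average_slope(b_sub, v))
```

Note that the lower IOR bound subtracts the quantile term and the upper bound adds it, so the two bounds differ only in the sign of 1.2816.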


We need to see your full output; please send it to Support along with your license number. Also, we request that postings be limited to one window. If it can't fit, send it to Support. 


I have a question regarding correct specification of a multilevel model with a categorical L1 outcome. Is it necessary to model the variance of the outcome at L2 to correctly specify the model, and if so, why? Example:
MODEL:
%WITHIN%
cnsq ON day013;
%BETWEEN%
cnsq; !is this line required?
cnsq ON male;
Many thanks. 


I think you get the residual variance of cnsq on Between even without mentioning it, given the regression on male. But you do want it estimated, because cnsq on Between is a random intercept, which is likely to vary, and its variation is unlikely to be fully explained by male. 
