
Anonymous posted on Friday, January 07, 2000  12:20 pm



Some multilevel packages have begun to include methods for multilevel models with binary and other categorical outcomes. Does MPLUS plan to develop procedures for multilevel models with categorical outcomes? If yes, when might we look for those to become available? 


This is an important topic. We continue to expand Mplus modeling opportunities in many ways. To both not disappoint people and not place undue pressure on ourselves, we try to avoid preannouncements of our developments. 

David Klein posted on Tuesday, April 10, 2001  11:16 am



I have data with probability weights and a categorical outcome. I'd like to run a weighted analysis (structural equation modeling) but I want to make sure that simply adding the WEIGHT statement will adjust the standard errors properly. A read of the manual gives me the feeling that I should use TYPE=COMPLEX analysis, but that requires that I use a continuous outcome, which is not the case. The manual also says that the WEIGHT statement "identifies the variable that contains case or sampling weight information," but does not say how to differentiate between case vs. sampling weights or whether MPlus in fact treats them differently. Any insight would be greatly appreciated. Thanks! 


Yes, just adding the WEIGHT statement is sufficient. TYPE=COMPLEX requires that you have a cluster variable and that the outcomes be continuous. If you do not, just use the WEIGHT statement alone. Mplus does not treat case or sampling weights differently. 

Yeong Chang posted on Monday, September 10, 2001  1:15 pm



I hope the person who started this topic would check this out: In your message, you said: Some multilevel packages have begun to include methods for multilevel models with binary and other categorical outcomes. I hope you could name one or two of these packages because I want to use them for my analysis. Thanks a lot. Another question is for Muthen & Muthen: Are we closer to the stage where MPlus can do multilevel modeling with binary outcomes? 


Packages that can handle categorical outcomes are Mixor, MLwiN, and HLM, as far as I know. We are currently working on adding categorical outcomes to the multilevel modeling in Mplus. 

Yeong Chang posted on Thursday, September 13, 2001  2:27 pm



Thank you, Linda. Among the three packages you mentioned (Mixor, MLwiN, and HLM), can any of them do multilevel factor analysis (like CFA with categorical outcomes)? I guess HLM cannot, but I don't know much about Mixor or MLwiN. Thanks again. 


Not to my knowledge. 


What are the risks and implications of modeling dichotomous indicators of latent continuous variables as if they were continuous, nonnormal indicators under the complex sampling paradigm? 


The issues of modeling dichotomous indicators as though they are continuous are the same for clustered data as for nonclustered data. As you get away from a 50/50 split, correlations become attenuated. 


But since the COMPLEX modeling procedure allows for nonnormally distributed variables, isn't the effect of unequally distributed dichotomous indicators also accounted for? 


Parameter estimates are not adjusted for nonnormality when a robust estimator is used. Only standard errors and chi-square test statistics are adjusted. With dichotomous indicators that are not 50/50, the sample statistics (correlations) will be attenuated. This means that the parameter estimates will be distorted. So although the standard errors may be correct for the parameter estimates, the parameter estimates themselves will not be correct. 
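A small Python sketch (not part of Mplus) that makes the attenuation concrete. It assumes each binary indicator is created by cutting a standard-normal latent response, with the two latent responses correlated at an illustrative 0.5, and computes the Pearson (phi) correlation that the 0/1 data would show. The function names are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()

def prob_both_one(rho: float, a: float, b: float, steps: int = 4000) -> float:
    """P(X > a, Y > b) for standard bivariate normal X, Y with correlation rho,
    by trapezoidal integration of pdf(x) * P(Y > b | X = x) over x in [a, 8]."""
    upper = 8.0
    if a >= upper:
        return 0.0
    h = (upper - a) / steps
    cond_sd = math.sqrt(1.0 - rho * rho)  # SD of Y given X
    total = 0.0
    for i in range(steps + 1):
        x = a + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        p_y_above = 1.0 - STD_NORMAL.cdf((b - rho * x) / cond_sd)
        total += weight * STD_NORMAL.pdf(x) * p_y_above
    return total * h

def phi_after_dichotomizing(rho: float, p1: float, p2: float) -> float:
    """Phi correlation of two 0/1 items made by cutting the latent variables
    so that P(item1 = 1) = p1 and P(item2 = 1) = p2."""
    a = STD_NORMAL.inv_cdf(1.0 - p1)  # item1 = 1 when X > a
    b = STD_NORMAL.inv_cdf(1.0 - p2)  # item2 = 1 when Y > b
    p11 = prob_both_one(rho, a, b)
    return (p11 - p1 * p2) / math.sqrt(p1 * (1 - p1) * p2 * (1 - p2))

latent_r = 0.5  # illustrative latent correlation
for split in (0.5, 0.7, 0.9):
    print(f"{split:.0%}/{1 - split:.0%} split: phi = "
          f"{phi_after_dichotomizing(latent_r, split, split):.3f}")
```

At a 50/50 split phi equals (2/pi) * arcsin(rho), already below rho, and it shrinks further as the split moves away from 50/50, which is the distortion a robust standard error cannot undo.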


Help wanted for some basic questions. I want to explore alternative descriptions of the same model. The data consist of 508 persons, each making 9 responses to stated-choice experiments. If we initially ignore the fact that each person gave 9 responses, we can view the data as 4572 cases. The model is then simply:

categorical ch1;
ch1 on pd1 td1 wd1;

The (dichotomous) choice depends on a price difference, a time difference, and a waiting-time difference. A multilevel approach could be applied to check the size of the within-subject variation and possibly to improve the estimates. However, how do we specify that the 9 responses for each cluster are correlated? Type=complex; and cluster=id; and what more?

The data could alternatively be handled in a multivariate framework where the 9 choices are represented as 9 variables. The number of cases is then 508. The simple model then looks like this:

ch1 ON pd1 (1); ch2 ON pd2 (1); ch3 ON pd3 (1); ch4 ON pd4 (1); ch5 ON pd5 (1); ch6 ON pd6 (1); ch7 ON pd7 (1); ch8 ON pd8 (1); ch9 ON pd9 (1);
ch1 ON td1 (2); ch2 ON td2 (2); ch3 ON td3 (2); ch4 ON td4 (2); ch5 ON td5 (2); ch6 ON td6 (2); ch7 ON td7 (2); ch8 ON td8 (2); ch9 ON td9 (2);
ch1 ON wd1 (3); ch2 ON wd2 (3); ch3 ON wd3 (3); ch4 ON wd4 (3); ch5 ON wd5 (3); ch6 ON wd6 (3); ch7 ON wd7 (3); ch8 ON wd8 (3); ch9 ON wd9 (3);

Again I wonder how to specify that the 9 choices are correlated (since the dependent variables are categorical, the WITH statement will not work). Could one introduce a factor f BY ch1-ch9@1 to signify that the 9 choices share a common, identical source of error? Or should one introduce 9 continuous formative indicators of the respective pd, td and wd inputs and let each of them have the respective dichotomous choice as its sole indicator?

f1 by ch1; ... f9 by ch9;
[ch1-ch9] (4);   ! constrain intercepts to be equal
f1 on pd1 (1); f1 on td1 (2); f1 on wd1 (3); ...
f1-f9 with f1-f9 (5);

I am thankful for any answers. 


To account for clustering in the univariate approach that you showed, where the cluster is id, saying CLUSTER = ID; and TYPE = COMPLEX; is all you have to do to account for the lack of independence of observations. Regarding the multivariate framework, you could do one of the following:

Weighted least squares estimator:
MODEL: ch1-ch9 ON pd1-pd9 td1-td9 wd1-wd9;

Maximum likelihood estimator:
f BY ch1-ch9;
f ON pd1-pd9 td1-td9 wd1-wd9; 
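A minimal Python sketch of the data step behind the univariate approach. It assumes the wide file has one row per person with columns id, ch1-ch9, pd1-pd9, td1-td9 and wd1-wd9 (the names used in the question); the function name is hypothetical. Each person becomes 9 rows sharing the same id, and that id is the variable given to CLUSTER.

```python
def wide_to_long(people: list[dict], n_tasks: int = 9) -> list[dict]:
    """Turn one row per person into one row per choice task.
    The person 'id' is repeated on every row so it can serve as the cluster variable."""
    rows = []
    for person in people:
        for t in range(1, n_tasks + 1):
            rows.append({
                "id": person["id"],
                # long-format columns reuse the names from the univariate model
                "ch1": person[f"ch{t}"],
                "pd1": person[f"pd{t}"],
                "td1": person[f"td{t}"],
                "wd1": person[f"wd{t}"],
            })
    return rows

# 508 hypothetical persons with placeholder values
fake_people = [
    {"id": i, **{f"{v}{t}": 0.0 for v in ("ch", "pd", "td", "wd") for t in range(1, 10)}}
    for i in range(1, 509)
]
print(len(wide_to_long(fake_people)))  # 508 persons * 9 tasks
```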

CMW posted on Sunday, June 26, 2005  12:20 pm



Greetings, In Example 9.2 in the Mplus manual (page 213), two-level regression analysis for a categorical dependent variable, is that probit or logistic regression? Also, assuming a model like Example 9.2 with a random slope, how can I get the actual slope estimate for each person? CMW 

BMuthen posted on Tuesday, June 28, 2005  8:13 am



It is a logistic regression. You can get this by requesting factor scores in the SAVEDATA command. 


I have data in which nearly 1000 children were recruited among 60 doctors. I want to include child self-reported symptoms and doctor-reported child symptoms in the same latent class model. My only goal is to find latent classes combining the doctor and child reports of child symptoms. The symptoms are binary.
1) Is anything theoretically or conceptually violated by including data from multiple levels in a single latent class analysis, i.e., including responses from doctors and children about the child in the same model? If they are measures of the underlying latent construct and are conditionally independent, then it seems acceptable.
2) Can Mplus take into account the clustering of children within doctors when the items are binary? There is no regression outcome, just latent classes based on the binary symptoms. How do I know whether it is even necessary to take the clustering into account?
3) If yes to the second question, could you please direct me to where I would find the syntax for clustering?
4) I have not found any 'multilevel' latent class analysis. Could you provide any references or examples of this, in which, for example, latent classes are found using community-level, school-level, teacher-level, and child-level data, not from the same reporter?
Thank you so much for having an excellent site and for all your help. 

BMuthen posted on Monday, September 05, 2005  2:34 pm



The doctor and the child measures are not independent but can be analyzed in a single model by using TYPE=TWOLEVEL MIXTURE; or TYPE=COMPLEX MIXTURE; when symptoms are binary. See Chapter 10 Example 10.3 for multilevel latent class analysis. 

CMW posted on Thursday, September 15, 2005  12:15 pm



Greetings, In the context of two-level logistic regression with a random slope, does Mplus compute empirical Bayes estimates of the slope for each cluster? CMW 


These can be obtained by requesting factor scores using the FSCORES option of the SAVEDATA command. 


Thank you for the reply in September. Now I am doing a two-level logistic regression with a random intercept as well as a random slope and would like to get EB estimates of the intercepts. When I use this code: "savedata: file is slopesall.out; save = fscores;" I get EB estimates of the slopes only. Is there a way to get estimates of the intercepts too? CW 


It sounds like you are using an old version of Mplus. You should get factor scores for both. If this is not the case, please send your input, data, output, and license number to support@statmodel.com. 


Can Mplus 3.0 do multilevel SEM with two continuous latent variables (CFA) that both have categorical indicators? 


Following up on that question: after I tried to run the two-level model, the output stated: "*** FATAL ERROR THERE IS NOT ENOUGH MEMORY SPACE TO RUN THE PROGRAM ON THE CURRENT INPUT FILE. YOU CAN TRY TO FREE UP SOME MEMORY BY CLOSING OTHER APPLICATIONS THAT ARE CURRENTLY RUNNING. ANOTHER SUGGESTION IS CLEANING UP YOUR HARD DRIVE BY DELETING UNNECESSARY FILES". What are the basic hardware requirements? How do I overcome this problem? Thanks for your help. Following is the syntax:

data: file is ag.prn;
variable: names are cl edu inc gen bmi year at1-at7 ba1-ba5 em im;
usevariables are at1-at7 ba1-ba5;
categorical are at1-at7 ba1-ba5;
cluster is cl;
analysis: type is twolevel random;
model:
%within%
aw1 by at1-at3;
aw2 by at4-at7;
bw1 by ba1-ba3 ba5;
bw2 by ba4;
s1 | bw1 on aw1;
s2 | bw1 on aw2;
s3 | bw2 on aw1;
s4 | bw2 on aw2;
%between%
ab1 by at1-at3;
ab2 by at4-at7;
bb1 by ba1-ba3 ba5;
bb2 by ba4;
output: tech1 tech8; 


The model you are estimating requires numerical integration. There is a section on numerical integration and suggestions for using numerical integration in Chapter 13 of the User's Guide. If you are still having problems, please send your input and data to support@statmodel.com. 
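A rough Python sketch of why memory runs out, assuming a full rectangular grid in which every dimension of integration gets the same number of points. The 15 points per dimension used below is an illustrative value, and the function name is hypothetical; each random effect or continuous latent variable that has to be integrated over adds one dimension, so the grid grows exponentially.

```python
def total_integration_points(points_per_dimension: int, dimensions: int) -> int:
    """Size of a rectangular integration grid: points ** dimensions."""
    return points_per_dimension ** dimensions

for dims in (1, 2, 4, 8):
    print(f"{dims} dimension(s): {total_integration_points(15, dims):,} points")
```

Fewer points per dimension, fewer random effects, or integration methods whose total number of points does not grow with the dimension (the User's Guide describes the available alternatives) are the usual ways to keep this manageable.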

Boliang Guo posted on Friday, January 12, 2007  10:11 am



When conducting a two-level logistic regression of y on x, I find the results from MLwiN are quite different from Mplus with the same data, but the results are virtually the same when I run a single-level logistic regression with the same data. I tested this with the ex9.2 data (u ON x) in both MLwiN and Mplus: the results are almost the same when I run a non-random-slope regression, but if I make the slope random, the results differ. Is the difference due to the algorithm used in each program? 


The results should be the same if you use the ML estimator in MLwiN (I think they also have a PQL estimator). If you use ML, please send input, output (from both programs), data and license number to support@statmodel.com. 


I have just finished running a multilevel analysis (using TWOLEVEL RANDOM) with a binary outcome (smoking). As part of the output I received a threshold value at the between level for the binary outcome, with a value of .086. How is this value interpreted? Can it be transformed into a more easily interpreted value (e.g., as logits can be transformed into odds ratios)? Thanks in advance! 


It is in a logit scale. See the section in Chapter 13, Calculating Probability Coefficients From Logistic Regression Coefficients, to see how the threshold is used. In the example, intercepts are used rather than thresholds. Note that an intercept is the threshold with the opposite sign. 
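A minimal Python sketch of the sign flip described above: the intercept is minus the threshold, and the logistic function turns it into a probability. The function name is hypothetical. In a two-level model this is the probability for a cluster whose random intercept is zero (and, if covariates are present, with all covariates at zero), not the overall proportion.

```python
import math

def prob_from_logit_threshold(threshold: float) -> float:
    """P(u = 1) implied by a logit-scale threshold for a binary u.
    intercept = -threshold, so P(u = 1) = 1 / (1 + exp(-intercept)) = 1 / (1 + exp(threshold))."""
    intercept = -threshold
    return 1.0 / (1.0 + math.exp(-intercept))

print(round(prob_from_logit_threshold(0.086), 3))  # about 0.48
```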


In a two-level multinomial logistic regression, Mplus v6 gives odds ratios only for the within part but not for the between part of the model. Why is that? 


The variable in the between part of the model is a continuous random intercept. 

Susan Haws posted on Wednesday, November 30, 2011  4:35 pm



I am estimating multilevel models using type=complex twolevel, with binary outcome smoking (0/1). Slope coefficients are fixed. Are the R-squared within and R-squared between estimates provided in the Mplus output legitimate ways to convey the proportion of variance explained at each level? Secondly, is it safe to compare them as one builds models in steps, for example, comparing the between-level R-squared before and after adding level-2 predictors? 


Q1. These R-squares have been proposed in, for example, the multilevel book by Snijders & Bosker. Q2. R-square can change in unexpected ways due to the two levels interacting. See the multilevel literature, such as the Raudenbush & Bryk book. 
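A hedged Python sketch of the latent-response idea behind such R-squares for a binary logit outcome: the within-level residual variance of the latent response is taken to be pi^2/3 (the variance of the standard logistic distribution), and R-square is explained variance over total latent variance. The function names and the example slope and covariate variance are hypothetical; a probit model would use 1 instead of pi^2/3.

```python
import math

LOGIT_RESIDUAL_VAR = math.pi ** 2 / 3  # variance of the standard logistic distribution, about 3.29

def latent_r2(explained_var: float, residual_var: float = LOGIT_RESIDUAL_VAR) -> float:
    """Share of latent-response variance explained at one level."""
    return explained_var / (explained_var + residual_var)

def latent_icc(between_var: float, within_var: float = LOGIT_RESIDUAL_VAR) -> float:
    """Intraclass correlation of the latent response in a random-intercept logit model."""
    return between_var / (between_var + within_var)

# hypothetical: one within-level covariate with slope 0.8 and variance 1.5
explained_within = 0.8 ** 2 * 1.5
print(round(latent_r2(explained_within), 3))
```

At the between level the residual variance is an estimated parameter, so it would be passed in explicitly, e.g. `latent_r2(explained_between, residual_var=estimated_between_residual)`.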


Dear Dr. Muthen, I have the following problem: I want to transform the results of a two-level logit model into probabilities, but apparently I am making a mistake, because I get values that don't seem correct. Now I tried a two-level model using only the binary outcome Y (fixed intercept). According to the univariate proportions, Y = 1 accounts for 36%. The threshold is 0.581. I calculated p(Y=1) as 1/(1+exp(-0.581)) = 0.64, which is actually p(Y=0). If I treat Y as continuous, the intercept = 0.36. Shouldn't p also be 0.36? Some more information: I use within weights and WTSCALE IS UNSCALED; Thanks, Christoph Weber 


Sorry, I already found the mistake: p(Y=1) = 1/(1+exp(-a)), and a = -t. Isn't it? 


Getting this probability requires taking into account the distribution of the random effect which can be done only using numerical integration. 
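A minimal Python sketch of that integration for a random-intercept logit model, assuming a normally distributed random intercept u with variance between_var and a threshold t, so the cluster-specific probability is 1/(1+exp(t - u)). The function names and the between-level variance of 1.0 are hypothetical; the threshold 0.769 is taken from the question. A simple trapezoidal rule over the normal density stands in for whatever quadrature a program would use.

```python
import math
from statistics import NormalDist

def expit(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def marginal_prob(threshold: float, between_var: float,
                  steps: int = 2000, width: float = 8.0) -> float:
    """Population-average P(Y = 1) in a random-intercept logit model:
    average expit(-threshold + u) over u ~ Normal(0, between_var),
    using the trapezoidal rule on [-width, width] standard deviations."""
    if between_var <= 0:
        return expit(-threshold)
    sd = math.sqrt(between_var)
    std_normal = NormalDist()
    h = 2.0 * width / steps
    total = 0.0
    for i in range(steps + 1):
        z = -width + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * std_normal.pdf(z) * expit(-threshold + sd * z)
    return total * h

print(round(expit(-0.769), 3))              # conditional: random intercept fixed at 0
print(round(marginal_prob(0.769, 1.0), 3))  # marginal: averaged over clusters
```

The averaged probability is pulled toward 0.5 relative to the conditional one, which is one reason a probability computed from the threshold alone need not match the observed proportion.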


Could you explain the required steps, or give some references? I use the following input (now with a random intercept). Is this correct?

weight = wgtstud;
WTSCALE IS UNSCALED;
cluster = idclass;
USEVARIABLES = ahs;
MISSING = all (99);
categorical = ahs;
Analysis: type = twolevel; algorithm = integration;
MODEL: %within% %between% ahs;

I get a threshold = 0.769, thus p = 0.32? Thanks 


See Larsen & Merlo (2005). Appropriate assessment of neighborhood effects on individual health: Integrating random and fixed effects in multilevel logistic regression. American Journal of Epidemiology, 161, 81-88. See also our teaching on slides 60-66 of the Topic 7 handout and video on our web site. 
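One quantity Larsen & Merlo propose for expressing the between-cluster variance of a random-intercept logit model on the odds-ratio scale is the median odds ratio (MOR). A minimal Python sketch, with a hypothetical function name and an illustrative variance:

```python
import math
from statistics import NormalDist

def median_odds_ratio(between_var: float) -> float:
    """MOR = exp( sqrt(2 * between_var) * z_0.75 ), where z_0.75 is the
    75th percentile of the standard normal distribution (about 0.6745)."""
    z75 = NormalDist().inv_cdf(0.75)
    return math.exp(math.sqrt(2.0 * between_var) * z75)

print(round(median_odds_ratio(0.5), 2))  # hypothetical between-level variance
```

An MOR of 1 means no between-cluster variation; larger values mean that moving a person to a cluster with a higher random intercept changes the odds more.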

Jan Super posted on Friday, September 28, 2012  8:56 am



Hello, We are trying to run a multilevel analysis with a 2-2-1-2 model. Our primary IV is categorical (paytype) and our DV is categorical (ranking). Mplus won't allow us to use a within-level predictor for a between-level DV that has been declared categorical. We're getting error messages such as the following:

*** ERROR in MODEL command
Variances for between-only categorical variables are not currently defined on the BETWEEN level. Variance given for: GP_E_RNK

A sample of our code is as follows:

MISSING ARE ALL (999999);
CLUSTER IS Grp_N;
CATEGORICAL = pay_t gp_E_rnk;
BETWEEN ARE pay_t disc_t gp_E_rnk;
ANALYSIS: TYPE IS TWOLEVEL;
MODEL:
%WITHIN%
tot_Ed;
%BETWEEN%
pay_t disc_t gp_E_rnk;
disc_t on pay_t (a1);
tot_Ed on disc_t (b1);
gp_E_rnk on tot_Ed (c1);
gp_E_rnk on disc_t (d1);
tot_Ed on pay_t;

Thanks in advance for your help. 


You mention the variances of categorical variables. Their variances are not model parameters. You should remove the variances. Also, observed exogenous variables should not be on the CATEGORICAL list. The CATEGORICAL list is for dependent variables only. 

Jan Super posted on Friday, September 28, 2012  2:36 pm



Thank you for your reply. I'd like to ask a follow-up question. Could you elaborate on removing variances for dependent categorical variables? We aren't sure how to go about that. Thanks, Jan 


Remove pay_t from the CATEGORICAL list; I find it only on the right-hand side of ON. Remove the variances: pay_t disc_t gp_E_rnk; gp_E_rnk is categorical, so no variance is estimated. You should not mention the variance of pay_t because it is an observed exogenous variable. The other residual variance is estimated as the default. If you run into other errors, send the full output and your license number to support@statmodel.com. 

Kate Song posted on Friday, April 12, 2013  1:43 pm



Hi Linda. I am using a 2-level ordered logit model. In Mplus, the code I use is as follows:

VARIABLE: NAMES ARE a3_le a3_eff id_yr health a3_p;
CATEGORICAL = health;
BETWEEN = a3_le a3_p a3_eff;
CLUSTER IS id_yr;
ANALYSIS: TYPE = TWOLEVEL; ESTIMATOR = WLSM;
MODEL: %BETWEEN% health on a3_p a3_le a3_eff;

Health is the categorical variable and the only within variable. However, I've compared the results with a Stata ordered logit model, and they look very different. Could you please explain the reason? FYI, I used the following Stata code for the same model:

gllamm health a3_p a3_le a3_eff, i(id_yr) link(ologit)

Thank you. 


You would need to send the Mplus and STATA outputs and your license number to support@statmodel.com. Be sure you are using the same estimator in both analyses and that your sample sizes are the same. 


Dear Drs. Muthén, we are trying to analyze diary data with Mplus. We use emotion as the independent variable and a dichotomous dependent variable (declared as categorical in the syntax) on level 1. Since daily emotion depends on the individual (level 2), we are trying to model emotion as a random effect. However, Mplus can't compute the model and shows the following error message: Unrestricted x-variables for analysis with TYPE=TWOLEVEL and ALGORITHM=INTEGRATION must be specified as either a WITHIN or BETWEEN variable. So my question is: Is there a way to compute a multilevel analysis with a categorical dependent variable and a random effect? And if so, how can I perform the analysis? Thank you in advance. Andreas 


Please send the output with the error and your license number to support@statmodel.com so I can see the full context. 

Adel Pwell posted on Monday, August 26, 2013  9:09 am



When modeling multilevel categorical data with sample size 20, I am getting the error message:

THE STANDARD ERRORS OF THE MODEL PARAMETER ESTIMATES MAY NOT BE TRUSTWORTHY FOR SOME PARAMETERS DUE TO A NON-POSITIVE DEFINITE FIRST-ORDER DERIVATIVE PRODUCT MATRIX. THIS MAY BE DUE TO THE STARTING VALUES BUT MAY ALSO BE AN INDICATION OF MODEL NONIDENTIFICATION. THE CONDITION NUMBER IS 0.646D-17. PROBLEM INVOLVING PARAMETER 20.

Parameters 15-20 are under Tau, and I supplied the cut points when I generated the data. So what could this error point to as the reason the standard errors could not be estimated? 


Please send the output and your license number to support@statmodel.com. 


I have a hierarchical dataset (people nested within states), a dichotomous outcome, a number of within-level covariates, and a single between-level dichotomous predictor. Here is some of my code:

CATEGORICAL = drunkdr;
BETWEEN = spsobch;
CLUSTER = state;
STRATIFICATION = strata;
WEIGHT = weight;
WTSCALE = ECLUSTER;
Analysis: TYPE = TWOLEVEL COMPLEX;
Model: %BETWEEN% drunkdr ON spsobch;

I want to interpret the effect of spsobch (a dichotomous L2 predictor) on my dichotomous outcome (drunkdr). I understand that the between part of the model is regressing the L2 random intercept on the predictor. Here is the output for this part of the model:

MODEL RESULTS
Between Level
 DRUNKDR ON
  SPSOBCH       -0.280   0.058   -4.803   0.000
 Thresholds
  DRUNKDR$1      6.028   0.088   68.774   0.000
 Residual Variances
  DRUNKDR        0.134   0.028    4.744   0.000

I'm struggling with how to interpret the estimate of -0.28. Given that my outcome variable is dichotomous, does this mean clusters with a 1 on the predictor have an average proportion .28 lower than those with a 0? Is there a way to output the random intercepts for each of the clusters? Thanks, Darin 


The variable drunkdr is a random intercept on the between level. -0.28 is a linear regression coefficient. 


I am trying to calculate predicted probabilities for certain values of a higher-level covariate using a multilevel model with a 4-category ordinal dependent variable and a cross-level interaction term. From what I've read on the forum, I must calculate this by hand. I have tried to write out the equation so I can input covariate values, but I cannot visualize how the pieces fit together in a single equation. The inclusion of the cross-level interaction term is confusing me. Any help you can offer in interpreting the output would be greatly appreciated.

MODEL:
%WITHIN%
worry ON jobsec easefind union private univ abovehi higher age spwrkft;
%BETWEEN%
worry ON plmp unemp; 


Apologies for posting the wrong model. The correct model is below:

MODEL:
%WITHIN%
worry ON jobsec easefind union private univ abovehi higher;
%BETWEEN%
easefind worry ON almp;
worry ON unemp; 


I assume "worry" is your categorical DV. With categorical DVs, the multilevel model involves numerical integration to get the probabilities. Also, I don't see a cross-level interaction term. That would appear as, e.g., %within% s | worry on jobsec; %between% s on w; so that w*jobsec is the interaction. 
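Writing the two levels out as equations shows where that interaction term comes from. In this sketch worry* denotes the latent response underlying the categorical DV, and the symbols eta, s, gamma, zeta and epsilon are notation assumed here, not Mplus output labels:

```latex
\begin{aligned}
\text{Within:}\quad & \text{worry}^*_{ij} = \eta_j + s_j\,\text{jobsec}_{ij} + \varepsilon_{ij} \\
\text{Between:}\quad & s_j = \gamma_0 + \gamma_1 w_j + \zeta_j \\
\text{Combined:}\quad & \text{worry}^*_{ij} = \eta_j + \gamma_0\,\text{jobsec}_{ij}
  + \gamma_1\,(w_j \times \text{jobsec}_{ij}) + \zeta_j\,\text{jobsec}_{ij} + \varepsilon_{ij}
\end{aligned}
```

Substituting the between-level equation for the random slope into the within-level equation produces the product term, so the coefficient of "s on w" is the coefficient of the cross-level interaction w*jobsec.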


Perhaps I used the wrong terminology (maybe the term is cross-level effect rather than interaction?). I only want random intercepts. Here is what I am trying to achieve simultaneously: (1) Categorical DV worry is regressed on individual-level IVs (jobsec easefind union private univ abovehi higher). (2) DV worry is regressed on country-level IVs (almp unemp). (3) easefind, an IV at the individual level, becomes a DV at the country level and is regressed on a country-level IV (almp). In short, I'm trying to create a multilevel mediation model where X is measured at a higher level (country) than the mediator or Y, both of which are measured at the individual level. It sounds like calculating predicted probabilities for such a model might not be possible, at least not by hand, if their computation requires numerical integration. If not, how can it be done? 


See the multilevel teaching we do on categorical DVs in the Topic 7 handout and video from our Johns Hopkins short course on our website. 


Hi, I am trying to fit a two-level model with ordered categorical dependent variables and two measurement time points. I tried to impose measurement invariance by constraining the thresholds to be equal over time. Doing this, I get the error message Internal Error Code: PARAMETER EQUALITIES ARE NOT SUPPORTED. Is it not possible to constrain thresholds in this kind of model? Even if I just try to assign labels to the threshold parameters, I get the same error message... Thanks! 


Please send your output and license number to support@statmodel.com. 

Hoda Vaziri posted on Thursday, September 18, 2014  1:12 pm



Does the two-level multinomial regression provide any chi-square test of fit? It does give me the loglikelihood and AIC, but I can't run a null model to get the loglikelihood for the null model to compare. What about R^2? How can I compute that? 


Q1. No. You can run a null model by fixing the slopes at zero. Q2. Which R-square reference do you have in mind? 

Hoda Vaziri posted on Friday, September 19, 2014  7:34 am



Thank you! I mean Cox and Snell's R-square, Nagelkerke's R-square, or Hosmer & Lemeshow's R-square. 


Those can be obtained by getting the likelihood for the full model and for the model where all covariates have slopes fixed at zero. But I thought those measures were used more for binary responses than for multinomial ones. And with two-level modeling they would be relevant mostly for level 1, I presume, whereas R-square for level 2 is the usual one since the DV there (the random intercept) is continuous. 
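A hedged Python sketch of those three likelihood-based R-squares, computed from the loglikelihood of the null model (all slopes fixed at zero) and of the full model. The function name and example numbers are hypothetical, and which n to use in a two-level model is a judgment call; here n is assumed to be the number of level-1 observations.

```python
import math

def pseudo_r2(ll_null: float, ll_full: float, n: int) -> dict:
    """Likelihood-based R-squares from two natural-log loglikelihoods."""
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_full) / n)
    cox_snell_max = 1.0 - math.exp(2.0 * ll_null / n)  # largest value Cox & Snell can reach
    return {
        "cox_snell": cox_snell,
        "nagelkerke": cox_snell / cox_snell_max,
        # (-2LL_null - (-2LL_full)) / (-2LL_null)
        "hosmer_lemeshow": 1.0 - ll_full / ll_null,
    }

# hypothetical loglikelihoods for the null and full models
print(pseudo_r2(ll_null=-1250.0, ll_full=-1180.0, n=2000))
```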

Hoda Vaziri posted on Thursday, September 25, 2014  12:17 pm



Thanks Bengt. I also have 2 other questions. 1. Does the two-level multinomial model give a test for each of the predictor variables, so that I can see whether its inclusion significantly improves the model? If not, is there any way to calculate those using the output provided? 2. The output only gives me the results relative to the last category. Is there any way to get the results with other categories as the reference automatically, or should I change the reference category using the DEFINE command? Thanks 


1. You get a z-test for the significance of each predictor, and you can check how BIC changes. 2. You have to use DEFINE in this model. 
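A minimal Python sketch of both checks: the two-sided p-value for a z-test (estimate divided by its standard error) and the change in BIC = -2 logL + p ln(n) when a predictor is added. The function names and all numbers are hypothetical, and the choice of n is assumed to be the number of observations.

```python
import math
from statistics import NormalDist

def z_test(estimate: float, se: float) -> tuple:
    """Wald z statistic and two-sided p-value for one coefficient."""
    z = estimate / se
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p

def bic(loglik: float, n_params: int, n: int) -> float:
    """BIC = -2 logL + (number of free parameters) * ln(n); smaller is better."""
    return -2.0 * loglik + n_params * math.log(n)

# hypothetical: adding one predictor to a model with 3 outcome categories
# adds 2 slopes (one per non-reference category)
without_x = bic(loglik=-1500.0, n_params=6, n=1200)
with_x = bic(loglik=-1490.0, n_params=8, n=1200)
print(with_x - without_x)  # negative values favor the model that includes x
```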


I am running a two-level multinomial model with random intercepts and fixed slopes. After reviewing the discussion board, User's Guide, and training handout for Topic 7, I still have a question about how, or whether, Mplus (version 7.0) provides output for the estimated variance components for the level-1 and level-2 models in multinomial models. It seems like a variance estimate is provided for two-level logistic models, but not for multinomial models? 


Both give variance estimates on level 2. You may have to mention the variances on level 2 to activate them. 

Yoon Oh posted on Wednesday, June 10, 2015  10:17 pm



I was running a multilevel analysis of count outcome variables. The output contains both "unstandardized model results" and "standardized model results". I found that the p-values for each predictor variable differed between the standardized and unstandardized model results. I wonder why the p-values differ and which version should be used. Your help would be greatly appreciated. 


Please send input, output and data to support so we can tell. 

Andrew Percy posted on Thursday, December 10, 2015  10:02 am



I'm currently analysing data from a clustered RCT (pupils in schools) with a binary primary outcome (binge drinking). The model I'm using is a two-level logistic model with a random intercept. I have been asked (by Stata users) to calculate an effect size for the treatment effect (a binary between-level covariate), preferably as an odds ratio (along with other estimates that they would be familiar with). The Heck and Thomas book indicates that the relevant parameter is a linear regression coefficient for regression of the covariate on the underlying latent response variable. However, I'm struggling to understand the scale of the parameter. Is it on a log odds scale? 


This type of general statistical question should be posted on a general discussion forum like SEMNET or MultilevelNet. 


If you use two-level analysis with a binary outcome and ML, you get the same results as in Stata, so ask your Stata colleagues what kinds of effect size measures they use; then you can create the same ones using MODEL CONSTRAINT. On the other hand, if you are talking about the effect of a binary between-level covariate, then the DV is a continuous variable and regular linear regression effect estimates apply (the DV is continuous because you consider the random intercept of the binary outcome). 
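One effect size Stata users commonly expect from a random-intercept logistic model fit by ML is exp(b) with a confidence interval. A hedged Python sketch, assuming a logit link: because the random intercept is on the logit scale, exp(b) for a between-level covariate is read as a cluster-specific (conditional) odds ratio, not a population-average one. The function name and the coefficient and standard error are hypothetical.

```python
import math

def odds_ratio_ci(b: float, se: float, z: float = 1.959964) -> tuple:
    """exp(b) with a Wald CI computed on the log-odds scale, then exponentiated."""
    return math.exp(b), math.exp(b - z * se), math.exp(b + z * se)

# hypothetical between-level treatment coefficient and its standard error
or_, lower, upper = odds_ratio_ci(-0.35, 0.12)
print(f"OR = {or_:.2f}, 95% CI [{lower:.2f}, {upper:.2f}]")
```

Exponentiating the interval endpoints, rather than adding plus or minus 1.96 SE to the odds ratio itself, keeps the interval consistent with the z-test on b.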


I am running a 2-level model (students in schools) using 12 student-report ordinal categorical indicators (scale 1-4) that are non-normal to create 3 latent outcome variables (within) and the same 3 latent outcome variables (between). My within-level independent variables are categorical (gender, SES, grade level), and my between-level independent variables are continuous (3) and binary (1). I declared the 12 categorical indicators using CATEGORICAL ARE. I specified in ANALYSIS: TYPE = TWOLEVEL; ESTIMATOR = WLSMV; and requested STANDARDIZED in OUTPUT. Which output should I use to report standardized results for within and between: STD for both? Or STDYX for continuous and STDY for categorical? Thanks for your guidance! 


For factor loadings use StdY. For binary covariates, use StdY. For continuous covariates, use StdYX. 
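A small Python sketch of the difference between the two covariate standardizations: StdY divides a slope by the standard deviation of the dependent variable only, so a binary covariate keeps its 0-to-1 meaning, while StdYX also multiplies by the covariate's standard deviation. The function names and numbers are hypothetical; for a latent or categorical dependent variable, sd_y would be the model-implied SD of the latent variable or latent response.

```python
def std_y(b: float, sd_y: float) -> float:
    """StdY: change in y (in SD units of y) for a one-unit change in x."""
    return b / sd_y

def std_yx(b: float, sd_x: float, sd_y: float) -> float:
    """StdYX: change in y (in SD units of y) for a one-SD change in x."""
    return b * sd_x / sd_y

# hypothetical slope of 0.5 for a binary covariate (SD 0.5) and a continuous one (SD 2.0)
print(std_y(0.5, sd_y=4.0), std_yx(0.5, sd_x=2.0, sd_y=4.0))
```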
