
Anonymous posted on Wednesday, April 27, 2005  6:58 pm



Hi everyone, I am a newbie to Mplus. I have a question about the basic logic behind path analysis with categorical outcomes. Why is probit regression the default model for estimating the path coefficients? Why not use logit regression, since it is preferred in population studies? I checked the user's guide, but it does not explain the reason. Has anyone come across a paper that explains why? Thanks in advance!


The probit model is the default in Mplus because it is a more general model than the logit model for multivariate dependent variables. The logit model is also available in Mplus. There are general statistical references describing multivariate logistic distributions and their restrictions related to correlations. 
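For readers comparing the two links, here is a minimal sketch of how each might be requested. The file name and variable names are hypothetical, and the LINK option is assumed to be available in the installed version; check the user's guide for your release.

```mplus
! Run 1: probit regression with weighted least squares
DATA:      FILE = example.dat;      ! hypothetical file name
VARIABLE:  NAMES ARE u x1 x2;       ! hypothetical variables, u is binary
           CATEGORICAL IS u;
ANALYSIS:  ESTIMATOR = WLSMV;       ! probit coefficients
MODEL:     u ON x1 x2;

! Run 2: same input, but with the ANALYSIS command replaced by
! ANALYSIS:  ESTIMATOR = ML;        ! maximum likelihood
!            LINK = LOGIT;          ! logistic coefficients; LINK = PROBIT gives probit with ML
```

The second run is shown as comments so that the block remains a single valid input file.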


This is my first time using Mplus as well as doing path analysis. I have dichotomous outcomes and independent variables that are either dichotomous or nominal (e.g., race). Here is the setup: u1, u2 and u3 are the dichotomous dependent variables and x1 and x2 are independent variables, with x2 representing race categories.
    VARIABLE: NAMES ARE u1 u2 u3 x1 x2;
              CATEGORICAL IS u1 u2 u3;
    ANALYSIS: ESTIMATOR = ML;
    MODEL:    u1 ON u2 u3 x1 x2;
              u3 ON x3;
              u2 ON u3;
A few questions: a) Is this a correct setup? b) How do I specify that x2 (race) is a nominal variable? c) Can the coefficients provided be interpreted as path coefficients? Or how do I identify the direct and indirect effects? My goal is to have path coefficients for the relationships and a corresponding significance test. I appreciate your help. Thank you.


If race is a nominal variable, you need to create a set of dummy variables to use as covariates. With three categories, two dummy variables are needed. You can use DEFINE to create these dummy variables. See a regression textbook if you do not know how to do this. The type of regression coefficient in a path analysis is determined by the scale of the dependent variable and the estimator that is used. With ML and categorical dependent variables, logistic regression coefficients are estimated. Examples 3.11 through 3.17 show various path analyses. Example 3.16 shows how to use MODEL INDIRECT to obtain indirect effects.
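The dummy coding described above could be sketched as follows. This is only an illustration: the file name, the variable name race, its 1/2/3 coding, and the reduced one-outcome model are all assumptions, not the poster's actual data.

```mplus
DATA:      FILE = example.dat;          ! hypothetical file name
VARIABLE:  NAMES ARE u1 x1 race;        ! race assumed to be coded 1, 2, 3
           USEVARIABLES ARE u1 x1 race2 race3;  ! variables created in DEFINE go last
           CATEGORICAL IS u1;
DEFINE:    race2 = 0;                   ! category 1 serves as the reference group
           race3 = 0;
           IF (race EQ 2) THEN race2 = 1;
           IF (race EQ 3) THEN race3 = 1;
           ! cases with a missing value on race would need separate handling
ANALYSIS:  ESTIMATOR = ML;              ! logistic regression coefficients
MODEL:     u1 ON x1 race2 race3;
```

Each dummy coefficient is then the difference in the log odds of u1 between that race category and the reference category, holding x1 constant.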


Thank you, Linda, for your quick response. I have been able to run the model after creating the dummy variables. Now I have two questions. 1) One is regarding indirect effects. I followed Example 3.16 and specified the indirect effects as follows:
    VARIABLE: NAMES ARE u1 u2 u3 x1 x2 x3 x4 x5 x6;
              USEVARIABLES ARE u1 u2 u3 x1 x2 x4 x5 x6;
              CATEGORICAL IS u1 u2 u3;
    ANALYSIS: ESTIMATOR = MLR;
    MODEL:    u1 ON u2 u3 x1 x2 x4 x5 x6;
              u2 ON u3;
              u3 ON x2;
    MODEL INDIRECT:
              u1 IND u2 u3 x2;
              u1 IND u3 x2;
              u1 IND u2 u3;
I get this error message. Is there anything I need to do differently?
    *** ERROR MODEL INDIRECT is not available for analysis with ALGORITHM=INTEGRATION.
2) The data come from a sample survey. I analyzed them separately without the indirect effects, and the number of observations used (n=1,261) is different from the number I expect to see (n=1,643). Is there anything that may cause the drop in the number of observations? Thanks again for the help.


MODEL CONSTRAINT can be used to compute indirect effects when MODEL INDIRECT is not available. However, in Mplus with maximum likelihood and categorical mediators, this cannot be done. With categorical mediators, indirect effects should be computed using probit regression and weighted least squares estimation. If cases are not used, you should get a message to that effect, for example, that observations with missing values on observed exogenous variables were deleted. If this is not the case, you may be reading the data incorrectly and should send your files and license number to support@statmodel.com.
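As a rough sketch of the probit and weighted least squares route suggested here, the poster's input could be changed as shown below. The file name is hypothetical, and the survey weights mentioned in the thread are not shown.

```mplus
DATA:      FILE = example.dat;      ! hypothetical file name
VARIABLE:  NAMES ARE u1 u2 u3 x1 x2 x3 x4 x5 x6;
           USEVARIABLES ARE u1 u2 u3 x1 x2 x4 x5 x6;
           CATEGORICAL ARE u1 u2 u3;
ANALYSIS:  ESTIMATOR = WLSMV;       ! probit regressions, no numerical integration
MODEL:     u1 ON u2 u3 x1 x2 x4 x5 x6;
           u2 ON u3;
           u3 ON x2;
MODEL INDIRECT:
           u1 IND u2 u3 x2;         ! x2 -> u3 -> u2 -> u1
           u1 IND u3 x2;            ! x2 -> u3 -> u1
           u1 IND u2 u3;            ! u3 -> u2 -> u1
```

With this estimator the categorical mediators enter as latent response variables both when they are predicted and when they predict, which is the consistency the reply asks for.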


Thanks again, Linda, for your help. I did the analysis with MODEL CONSTRAINT and it works fine. I created the constraints as products of the path coefficients. I have a couple of questions. a) Would that be correct even in the case of logit models? b) Can I exponentiate the indirect effect and interpret it as an odds ratio? If not, how would I go about describing the extent of the indirect effect? I still have problems with the sample size in the weighted vs. unweighted analysis and have sent the data and the Mplus code. Thanks for your help.


You cannot have an indirect effect with a categorical mediator with maximum likelihood estimation. If the mediator is continuous, that is okay. In that case, if the final dependent variable is categorical, you can exponentiate the indirect effect to obtain an odds ratio.
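The continuous-mediator case described here might look like the following sketch. The file name, the variable names (u, m, x) and the parameter names (a, b, ind, indor) are hypothetical and chosen only for illustration.

```mplus
DATA:      FILE = example.dat;      ! hypothetical file name
VARIABLE:  NAMES ARE u m x;         ! u binary outcome, m continuous mediator, x covariate
           CATEGORICAL IS u;
ANALYSIS:  ESTIMATOR = ML;          ! logistic regression for u, linear regression for m
MODEL:     m ON x (a);              ! path from x to the mediator
           u ON m (b);              ! path from the mediator to the outcome (log odds)
           u ON x;                  ! direct effect
MODEL CONSTRAINT:
           NEW(ind indor);
           ind = a*b;               ! indirect effect on the log-odds scale
           indor = EXP(ind);        ! exponentiated indirect effect as an odds ratio
```

The new parameters are reported with standard errors in the output, which gives a significance test for the product.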


Linda, even though I cannot have an indirect effect with a categorical mediator under ML estimation, in my setup it is acceptable to exponentiate the indirect effects obtained by multiplying the path coefficients to obtain an odds ratio. Am I understanding your comment correctly? If possible, I would appreciate suggestions of any references that address this issue. Thank you very much for all your help.


This would not be correct for the model you show above because the mediator is categorical. The reason is that when the mediator is a dependent variable, it is treated as a latent response variable, whereas when it is an independent variable, it is treated as a continuous variable. I know of no reference for this. The mediator must be treated in the same way both when it is a dependent variable and when it is an independent variable in order to compute an indirect effect.


You mentioned a few posts ago that with categorical mediators one should use probit regression with WLS to get indirect effects. a) Would that approach work in my case? b) Can indirect effects be obtained by multiplying the path coefficients? Thanks again for all your help. 


Yes. Yes. 

Cecily Na posted on Sunday, December 12, 2010  2:22 pm



Dear Linda, I ran an SEM model with categorical variables. The output has means/intercepts/thresholds for each level of every categorical variable. Where can I find the overall intercept for the probit structural equation? What syntax command should I use? For example, STD [z score] = intercept + b1*crime + b2*drug, where crime and drug are both categorical variables. Thanks a lot!


The intercept for a probit regression is under the heading Threshold. 
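As a worked equation, here is a sketch of how the printed threshold relates to the intercept in the poster's equation. It assumes a single binary outcome u regressed on observed covariates, with a standard normal residual; the symbols u*, τ and ε are introduced here for illustration.

```latex
u^{*} = b_1\,\mathrm{crime} + b_2\,\mathrm{drug} + \varepsilon,
\qquad \varepsilon \sim N(0, 1),
\qquad u = 1 \iff u^{*} > \tau

P(u = 1 \mid \mathrm{crime}, \mathrm{drug})
  = \Phi\!\left(-\tau + b_1\,\mathrm{crime} + b_2\,\mathrm{drug}\right)
```

Under these assumptions the intercept in the z-score equation is the threshold with its sign reversed, that is, intercept = -τ.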


Hello, I am trying to run a path analysis with a binary dependent variable. I keep getting the following message: **Categorical variable TREAT contains noninteger values.** TREAT is my dependent variable. Any thoughts? Thank you. ***Melissa 


Sorry, just to add, all of my values for TREAT are either 0 or 1, so I am not clear on why I am having a problem. Do I need to change it from 0 & 1 to 1 & 2? Thanks


You are reading the data incorrectly. You probably have blanks in the data set which are not allowed with free format data. If you can't see the problem, send the input, data, output and your license number to support@statmodel.com. 
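One way this can arise: in a free-format file a blank is only a delimiter, so an empty field shifts every later value one column to the left, and a noninteger value from a neighboring variable can end up being read as TREAT. A minimal sketch of one remedy follows, assuming the blanks have been recoded to a numeric flag in the data file; the file name, the covariate names and the flag value -999 are hypothetical.

```mplus
DATA:      FILE = treat.dat;        ! hypothetical file name, free format
VARIABLE:  NAMES ARE treat x1 x2;   ! hypothetical covariates x1 and x2
           CATEGORICAL IS treat;
           MISSING ARE ALL (-999);  ! assumes blanks were replaced by -999 in the file
MODEL:     treat ON x1 x2;
```

Checking that the sample statistics in the output match those from the original software is a quick way to confirm the data are being read as intended.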


Hello, I'm running a path model (N=10.000) with a binary (75/25 split) dependent variable, 8 independent variables and indirect effects. When I specify the MODEL INDIRECT command I get the message...
    * The chisquare value for MLM, MLMV, MLR, ULSMV, WLSM and WLSMV cannot be used for chisquare difference testing in the regular way. MLM, MLR and WLSM chisquare difference testing is described on the Mplus website. MLMV, WLSMV, and ULSMV difference testing is done using the DIFFTEST option.
...and when using MODEL CONSTRAINT the output didn't provide any fit indices. Can I ignore this message or what exactly does it mean? Thanks, Mario


The message is telling you what to do if you are doing difference testing of two nested models. It does not sound like you are, so you can ignore this message.


Hi, I'm doing a path analysis with two continuous DVs and one binary DV with MLR estimation. All IVs are continuous. For the logit part, I get "reasonable" parameter estimates and their SEs, but the odds ratio estimates show the following:
    LOGISTIC REGRESSION ODDS RATIO RESULTS
    GYNTULEH ON
        VDR     0.000
        ODR     *********
What's wrong with the odds ratio estimates? For the binary DV, ca. 60 cases have a value of 1 and ca. 170 cases have a value of 0. Thanks a lot! Samuli


Please send the output and your license number to support@statmodel.com. 

Leslie Roos posted on Thursday, June 27, 2013  7:18 am



Hello! I'm running a mediation model with 2 binary mediators and a nominal outcome. I'm interested in determining whether there is significant mediation through each of the mediators for each of 2 levels of the nominal outcome variable vs. the nominal reference group. My understanding is that with nominal outcome variables and MODEL CONSTRAINT, it is not possible to obtain indirect odds ratios; however, it is possible to determine whether the indirect effect is significant. What would be the best way to determine the significance of the indirect effect of the mediator at each level of the nominal outcome? Below is the relevant part of the model. I was attempting to constrain the nominal outcome (Jail2L) at the first 2 levels.
    MODEL: aax12Nas ON cdum1; (a1);
           asub ON cdum1; (a2);
           Jail2L@1 ON aax12Nas; (b1);
           Jail2L@1 ON asub; (b2);
           Jail2L@2 ON aax12Nas; (b10);
           Jail2L@2 ON asub; (b20);
    MODEL CONSTRAINT:
           NEW(eff1 eff2 eff10 eff20);
           eff1=a1*b1;
           eff2=a2*b2;
           eff10=a1*b10;
           eff20=a2*b20;
Thank you! Leslie

Leslie Roos posted on Thursday, June 27, 2013  7:36 am



Hi, I am now trying to use the '#' as a constraint and it seems to have worked. I wanted to check that I am not violating any assumptions of categorical/nominal variable modelling. Thank you again! Leslie
    aax12Nas ON cdum1 (a1);
    asub ON cdum1 (a2);
    Jail2L#1 ON aax12Nas (b1);
    Jail2L#2 ON aax12Nas (b10);
    Jail2L#1 ON asub (b2);
    Jail2L#2 ON asub (b20);
    MODEL CONSTRAINT:
    NEW(eff1 eff2 eff10 eff20);
    eff1=a1*b1;
    eff2=a2*b2;
    eff10=a1*b10;
    eff20=a2*b20;


Indirect effects cannot be computed as products in your situation. See the following paper which is available on the website: Muthén, B. (2011). Applications of causally defined direct and indirect effects in mediation analysis using SEM in Mplus 


Hi! I’m examining a multiple mediator model through path analysis. I have variables at four time points, and the model includes three IVs (at t1), three DVs at t2 (1 binary and 2 continuous), three DVs at t3 (1 binary and 2 continuous) and one final continuous DV at t4. I have used WLSMV with THETA and examined indirect effects using MODEL INDIRECT together with the BOOTSTRAP analysis command. A reviewer has suggested that I should use Monte Carlo integration to obtain a logit rather than a probit model, since “in a multivariate case these are superior”. I’m not sure in what ways this would be beneficial or a more correct way to examine my model. From what I have read on this forum, using WLSMV rather than MLR with INTEGRATION=MONTECARLO and MODEL CONSTRAINT is the better option if you have categorical mediators. Is this correct? If yes, can you point me to any references supporting this? Also, if I were to do as the reviewer suggests, I would not be able to let the residuals of my mediators be correlated (with the risk of having a misspecified model). Is this an argument I can use to support the WLSMV approach? Thank you for your time! /Frida


When you have a mediation model, ML becomes problematic if you have categorical mediators, in that you would have to combine linear and nonlinear regressions. WLSMV instead works with a latent response variable mediator. I don't know what the reviewer means by logit being superior for the multivariate case. I would say the opposite, because probit with WLSMV is more flexible in the multivariate case, for example in allowing correlated residuals, as you say. Missing data can be an issue with WLSMV, though. See also our FAQ on estimator choices with categorical variables.

Mark Thomas posted on Thursday, September 08, 2016  6:40 am



Hi Dr. Muthen, I am trying to run a path analysis with logistic regression with subgroup analyses (male vs. female). I have successfully run separate models for the binary outcome with the whole sample, and for the continuous outcome in the subgroup analyses. However, I encounter the following issue when I try to do logit with dichotomous variables: "ALGORITHM=INTEGRATION is not available for multiple group analysis. Try using the KNOWNCLASS option for TYPE=MIXTURE." My code:
    VARIABLE: NAMES ARE SEX AGE RACE EDU BMI SMK ALCHL KILO BPhy BVerb BAng BHost CMH BPAQ MAP EPI NOR SLOPE AUC ZMS DIMS;
              USEVARIABLES ARE BAng epi nor slope auc map diMS SEX AGE RACE EDU;
              MISSING ARE .;
              CATEGORICAL IS DIMS;
              GROUPING IS SEX (0=Male, 1=Female);
    ANALYSIS: ESTIMATOR = ML;
              PROCESS = 2;
              BOOTSTRAP = 5000;
              INTEGRATION = MONTECARLO;
Thank you.


With ML and categorical outcomes, you must use KNOWNCLASS instead of GROUPING. When classes are known, this is the same as multiple group analysis. See Example 5.33 where this is shown for Bayes. Just use ML instead of Bayes. 
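A minimal sketch of the KNOWNCLASS setup described here follows. The file name, the shortened variable list, the class label g and the small regression model are hypothetical, since the poster's MODEL command was not shown.

```mplus
DATA:      FILE = example.dat;              ! hypothetical file name
VARIABLE:  NAMES ARE sex age bang dims;     ! shortened, hypothetical variable list
           USEVARIABLES ARE dims bang age;
           CATEGORICAL IS dims;
           MISSING ARE .;
           CLASSES = g (2);
           KNOWNCLASS = g (sex = 0 sex = 1);  ! class 1 = male, class 2 = female
ANALYSIS:  TYPE = MIXTURE;
           ESTIMATOR = ML;
MODEL:
  %OVERALL%
           dims ON bang age;                ! logistic regression common to both groups
  %g#2%
           dims ON bang;                    ! mentioned here so the slope can differ for females
```

Because class membership is fully determined by sex, the classes play the role of the groups in a multiple group analysis.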
