Path analysis with a mix of categoric...
Mplus Discussion > Structural Equation Modeling >
 Daniel Rodriguez posted on Thursday, January 08, 2004 - 10:02 am
I am analyzing a path analysis model with one endogenous variable (the primary outcome) being categorical (binary) and the other (a potential mediator) being continuous. Is the aim in my path analysis the same as it is when all outcomes are continuous, to reproduce as closely as possible the variance-covariance matrix?
 bmuthen posted on Thursday, January 08, 2004 - 10:09 am
In this case you are using a limited-information weighted least-squares estimator in which the model is fitted to sample-estimated multivariate probit thresholds, slopes, and residual correlations. The same number of quantities is used for the fitting, but they are different from the means, variances, and covariances used in the continuous case.
 Daniel Rodriguez posted on Thursday, January 08, 2004 - 10:14 am
Thank you. I am planning to submit my paper for publication. What would you recommend presenting in the results section? A correlation matrix does not seem appropriate given the categorical outcome. Or are the fit indices plus a figure with standardized path coefficients sufficient?
 bmuthen posted on Thursday, January 08, 2004 - 10:17 am
I think what you suggest is good, but also add unstandardized coefficients and their SEs.
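A minimal sketch of the kind of setup discussed above, in the spirit of User's Guide example 3.12 mentioned later in this thread. The file name and the variable names (u, m, x1, x2) are hypothetical: u stands for the binary outcome and m for the continuous mediator.

```mplus
TITLE:    Path analysis with a binary outcome and a continuous mediator
DATA:     FILE IS mydata.dat;          ! hypothetical file name
VARIABLE: NAMES ARE u m x1 x2;         ! hypothetical variable names
          CATEGORICAL ARE u;           ! only the binary dependent variable
ANALYSIS: ESTIMATOR = WLSMV;           ! limited-information WLS, probit
MODEL:    m ON x1 x2;                  ! linear regression for the mediator
          u ON m x1;                   ! probit regression for the outcome
OUTPUT:   STANDARDIZED;                ! standardized path coefficients
```

The unstandardized coefficients and their SEs are printed under MODEL RESULTS, so both sets of estimates recommended above should be available from a run like this.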
 Anonymous posted on Monday, December 13, 2004 - 7:26 am
I am experimenting with the examples in the Mplus guide (ex 3.12) that use binary outcomes as dependent/endogenous measures. I was wondering if there are any peer-reviewed publications that have used this technique. I would like to use this in a paper that I am planning to send out for publication, and I would like to cite some studies that do this. Thank you.
 bmuthen posted on Monday, December 13, 2004 - 8:28 am
You may want to look at the article by Xie on the Mplus web site, Reference section.
 Anonymous posted on Saturday, May 21, 2005 - 2:32 pm
Hi, there is an article regarding SEM of discrete data by Winship and Mare (1983) in the American Journal of Sociology ("Structural Equations and Path Analysis for Discrete Data"). If you are aware of it, could you please let me know the similarities/differences between their estimation and the Mplus estimation for discrete variables? Thanks!

 bmuthen posted on Saturday, May 21, 2005 - 4:19 pm
Yes, I am aware of this article, but I don't recall the content other than that I think a different tack was taken than in my work from the late 70's and early 80's. I should take another look at the article - do you have a pdf handy so I can refresh my memory of it?
 bmuthen posted on Sunday, May 22, 2005 - 2:47 pm
Thanks for sending the Winship-Mare (1983) article. Its relationship to the categorical variable modeling in Mplus is clear from examples like the one they start off with - social background (x, say) influencing the mediator of getting a college degree (y, say), influencing unemployment (z, say). As in the econometric literature, they make it clear that while x->y is naturally a logit or probit regression, the specification of the influence of the mediator on z may either use the 0/1 dummy variable y itself or an underlying continuous latent response variable y* (that gives rise to y=1 instead of y=0 when exceeding a threshold). They discuss ML estimation using probit for some simple special cases. The literature indicates that ML using y* gives a different loglikelihood than ML using y, and as pointed out in the article different applications naturally call for different choices.

Using the simple example above, Mplus makes the y* choice when using probit modeling and (limited-information) WLS type estimators that I developed in my 1978-1984 work on LISCOMP. More recently, the y choice has also been added to Mplus using logit with ML. We have plans to add the other combinations of using y/y*, probit/logit, and ML/WLS.

When all observed dependent variables are categorical, Mplus also offers the Fienberg-inspired "causal" modeling mentioned in the article. This is with multiple latent class variables as mentioned in Chapter 13 of the Version 3 User's Guide, where a latent class variable can be specified to be perfectly measured by an observed categorical variable (this feature also enables loglinear modeling of frequency table data, using the alternative loglinear parameterization).

Hope this answers your question.
 Anonymous posted on Wednesday, June 08, 2005 - 2:36 pm
I have two endogenous variables, Y1 is binary (0/1), and the other (Y2) has three categories, which are coded as 0/1/2:

Y1 ON Y2 X1 X2
Y2#1 Y2#2 ON X1 X3

In the second equation (Y2 is the dependent variable), I use Y2=0 as the comparison baseline, but how can I make Y2=0 as the comparison baseline in the first equation when Y2 is an independent variable?
 Linda K. Muthen posted on Wednesday, June 08, 2005 - 5:19 pm
A nominal variable cannot be a mediating variable in Mplus. It is unclear how to deal with this situation because the variable would be treated as nominal when it is a dependent variable but would need to be treated as a set of dummy variables when it is an independent variable.
 Anonymous posted on Wednesday, June 08, 2005 - 6:43 pm
Thanks a lot. If there is an order between the three categories of Y2 (0-->1-->2), can I treat Y2 as ordered categorical (ordinal)? In other words, can an ordinal variable (ordered categorical) be a mediating variable in Mplus?
Again, I want one of the three categories (e.g., "Y2=0") as the comparison baseline in both equations (i.e., in the equation when Y2 is the dependent variable and in the equation when Y2 is an independent variable). Thanks again!
 Linda K. Muthen posted on Thursday, June 09, 2005 - 8:18 am
Yes, ordinal variables can be used as mediating variables. With weighted least squares estimation, probit regression is obtained. With maximum likelihood estimation, logistic regression is obtained.

There is no reference category with ordinal dependent variables.
 Anonymous posted on Thursday, June 09, 2005 - 9:28 am
Thanks a lot. If there is no reference category with ordinal dependent variables, how can I interpret the results (for both cases: when the ordinal variable is an independent variable and when it is a dependent variable)? If an ordinal variable only has three categories, it may not be appropriate to see it as continuous.
 Anonymous posted on Thursday, June 09, 2005 - 4:19 pm
Please allow me to rephrase the above question. I now know how to interpret ordinal logit regression. And my three questions are:

(1) In the example above, if Y2 is ordinal, it is fine for it to be a dependent variable in the second equation (Y2 ON X1 X3), but how to treat it when it is an independent variable in the first equation (Y1 ON Y2 X1 X2)? Y2 has only three categories (0/1/2) so that I doubt we can treat it as a continuous independent variable, and I wonder if it is appropriate to generate two dummy variables corresponding to Y2=1 and Y2=2 to replace Y2 in the first equation (Y1 ON Y2 X1 X2).

Alternatively, can I do this in the following way? First, generate three dummy variables Y20, Y21, and Y22, corresponding to Y2=0, Y2=1, and Y2=2. Then, run equations:

Y1 ON Y21 Y22 X1 X2
Y21 Y22 ON X1 X3

The following questions are:
(2) Should I specify Y20 as CATEGORICAL as well?
(3) Should I add Y20 as a dependent variable in the second equation? If I add Y20 (or don’t add it), is there any technical problem?
 bmuthen posted on Thursday, June 09, 2005 - 6:28 pm
(1) The default option in Mplus is to use WLSMV, which takes a probit approach. In that case, in the equation where Y2 is an independent variable, an underlying continuous latent response variable Y2* is used to predict Y1. For a discussion of related matters see the Xie article on the Mplus web site and also the Winship-Mare 1983 article discussed earlier in this forum.

Don't create dummy Y2 variables if you want to treat it as ordinal. Stay with the approach I just mentioned.

(2)-(3) Not relevant given my answers above.
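The approach recommended above might be set up roughly as follows. This is only a sketch: the file name is hypothetical, and the variable names are taken from the question (Y1 binary, Y2 ordinal with categories 0/1/2).

```mplus
DATA:     FILE IS mydata.dat;          ! hypothetical file name
VARIABLE: NAMES ARE y1 y2 x1 x2 x3;
          CATEGORICAL ARE y1 y2;       ! y1 binary, y2 ordinal (0/1/2)
ANALYSIS: ESTIMATOR = WLSMV;           ! the default with categorical DVs
MODEL:    y1 ON y2 x1 x2;              ! y2* (latent response) predicts y1
          y2 ON x1 x3;                 ! ordinal probit for y2
```

No dummy variables for Y2 are created; Y2 enters the first equation through its latent response variable Y2*.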
 Anonymous posted on Thursday, June 09, 2005 - 8:20 pm
Thanks a lot! Two more questions regarding interpretation:

(1) When the ordered categorical (ordinal) variable is an independent variable, since it is treated as a latent continuous variable, can I interpret its coefficient like interpreting a continuous variable?
(2) When the ordered categorical (ordinal) variable is a dependent variable, how to interpret the coefficients of independent variables in a meaningful way? (now the dependent variable is not a binary variable but has more than two categories)
 bmuthen posted on Friday, June 10, 2005 - 6:16 am
(1) Yes

(2) The slope can be interpreted the same as with a continuous dependent variable given that an ordinal dependent variable can be seen as a continuous latent response variable that exceeds thresholds to give the various outcome categories. There is therefore only one slope. See also Appendix 1 in the Technical Appendices on the Mplus web site.
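As a worked illustration of the latent response formulation described in (2), here is the three-category case written out. The symbols (gamma for slopes, tau for thresholds, epsilon for the residual) are notation chosen for this sketch, not taken from the Technical Appendices.

```latex
y^{*} = \gamma_1 x_1 + \gamma_2 x_2 + \varepsilon ,
\qquad
y =
\begin{cases}
0 & \text{if } y^{*} \le \tau_1 , \\
1 & \text{if } \tau_1 < y^{*} \le \tau_2 , \\
2 & \text{if } y^{*} > \tau_2 .
\end{cases}
```

Each covariate has a single slope because the categories differ only in which thresholds y* exceeds, which is why no reference category is involved.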
 liesbeth mercken posted on Tuesday, March 28, 2006 - 1:22 am
I have several questions concerning a SEM analysis I am trying to run.
I have some categorical variables and some other variables with extreme missing values. I do not have any latent variables in my model.
The analysis always terminates normally but I don’t know if I can trust the results,
although they are very stable.

Type= missing;
Estimator= mlr;
Integration= montecarlo;
(mcseed I tried because the error said something about a starting value.. but it makes no difference)

In the output I see 2 warnings / errors:


190 603 119 134 611
Error B only appeared when I included 17 dummies in my model (for the 18 schools-this was originally a categorical variable)

After these errors the output states that the model estimation terminated normally.

1.What do these errors mean?
2.How can I correct for them?
3.Can I get fit indices like TLI CFI by putting H1 in the type command?


PS: I will attend your course in Utrecht, although it is not specifically about what I am doing. Can I also bring my output to the course?
 Linda K. Muthen posted on Tuesday, March 28, 2006 - 8:53 am
This error message is serious. Your model has a lot of parameters. I wonder how large your sample is. You need more observations than parameters. I can't say much more without more information. Please send your input, data, output, and license number to

I don't think H1 is available with the estimator you are using but I would need to see your entire model to answer that.

It is unlikely that we will have time to look at output at the Utrecht meeting. You should send your questions to support.
 Wendong Li posted on Sunday, February 21, 2010 - 3:51 am

I am running a model with all continuous variables but one categorical variable as a control. I am not interested in whether there is any difference among different categories of this variable and just want to partial out its impact. Can I just put it in the model, tell the program it is categorical and then regress the outcome variable on it?

Thanks in advance!
 Linda K. Muthen posted on Sunday, February 21, 2010 - 6:38 am
You can include it and regress the outcome variable on it but you should not put it on the CATEGORICAL list. This list is for dependent variables only. The model is estimated conditioned on the observed exogenous variables and their scale is not relevant.
 Wendong Li posted on Sunday, February 21, 2010 - 10:40 pm
Thanks a lot for your input, Linda!
 haxha posted on Monday, November 08, 2010 - 10:18 pm
Hi Linda, on page 496 of the version 5 manual, you indicate that bootstrap is not available for MLR because the parameter estimates and bootstrap standard errors for these estimators do not differ from those of ML. I am not quite sure I understand this. Would this mean that there is no need to use bootstrap with MLR because the results are identical? Sorry, I'm a novice in path analysis and would like to use some type of simulation to ensure that my results are not biased. Any suggestions other than cross-validation and the holdout sampling method?
 Linda K. Muthen posted on Tuesday, November 09, 2010 - 5:57 am
Yes, the parameter estimates are the same for ML, MLR, and MLF. So if you used bootstrapping with any one of these to obtain bootstrapped standard errors, the results would be identical. For simplicity we make bootstrap available only with ML.
 Regan posted on Saturday, November 20, 2010 - 12:07 am

Two questions:
1) After reading a response on the site, I wanted to clarify whether a binary 0/1 variable that is included as a mediator has to be declared as categorical in the "CATEGORICAL ARE" command. (I ask because I know this must be done for dependent outcome variables; however, in an SEM class, it was said that all variables with arrows going into them are considered dependent.)

2) Are p-values available for correlations obtained in Mplus? I see that users publish correlations with p-values, but these do not seem to be part of the Mplus output.
 Linda K. Muthen posted on Saturday, November 20, 2010 - 7:49 am
1. A mediator is a dependent variable and if it is binary it should be placed on the CATEGORICAL list.

2. Not for sample correlations. You can estimate a model of correlations or covariances and obtain p-values in this way.
 Stacey Conchie posted on Thursday, February 24, 2011 - 2:15 am

This is a very basic question so apologies in advance.

I have a mediation model comprising 5 continuous latent constructs (1xIV; 1xM; 3xDV) and three control measures (the latter are directly observed). One of my control measures is binary (0=no/1=yes). Would I be correct in thinking that I don't enter the binary measure on the line 'CATEGORICAL ARE' as it's not the DV? When I don't specify this measure as categorical the model runs fine. However, when I do specify it as categorical I receive a message that I must use Monte Carlo integration and that I cannot estimate indirect effects. Using this integration technique runs the data, but I don't get any fit indices for the model.


 Linda K. Muthen posted on Thursday, February 24, 2011 - 6:14 am
Only dependent variables should be placed on the CATEGORICAL list. Observed exogenous variables should not be put on this list.
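A small sketch of the rule stated above, using hypothetical names: y1-y6 are continuous factor indicators, and sex is a 0/1 control variable that is only ever a predictor. The file name is also hypothetical.

```mplus
DATA:     FILE IS mydata.dat;          ! hypothetical file name
VARIABLE: NAMES ARE y1-y6 sex age;
          USEVARIABLES ARE y1-y6 sex age;
          ! No CATEGORICAL statement: sex is an observed exogenous
          ! variable, and all dependent variables are continuous.
MODEL:    f1 BY y1-y3;
          f2 BY y4-y6;
          f1 ON sex age;
          f2 ON f1 sex age;
```

Because the binary control is left off the CATEGORICAL list, the model is estimated conditional on it, as with any other covariate.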
 Mohamed Abou-Shouk posted on Sunday, April 24, 2011 - 6:46 am
I have a binary outcome variable (1=adopt, 0=nonadopt) and I want to measure the impact of the factors affecting adoption. These factors are continuous latent variables, grouped into four main factors: f1, f2, f3, and f4.
My first question is: do I have to run a path analysis with a categorical dependent variable as in example 3.12?
If yes, I have used ML to get logistic regression results. My second question is: do these results refer to the adopt category or the nonadopt category? In other words, if I want to say that f1 positively affects u1, is u1 the adopt or the nonadopt category?
I hope you get my meaning.

 Linda K. Muthen posted on Sunday, April 24, 2011 - 10:44 am
Example 3.12 uses weighted least squares, not maximum likelihood. This yields probit regression coefficients. For maximum likelihood and logistic regression, you must add ESTIMATOR= ML; to the ANALYSIS command. Note that if your factors have categorical factor indicators, this would result in four dimensions of integration.

U1 is adopt. It is the highest category.
 Arina Gertseva posted on Thursday, May 12, 2011 - 11:07 am
I am testing the extent to which delinquent peers and depression condition the relationship between victimization and offending (offending is a binary outcome: 1=offending, 0=no offending). I conducted the analysis on the overall sample as well as on each gender group. The results are very similar, with the exception of a few parameter estimates. My question is: how appropriate is it to conduct a coefficient comparison test across gender groups without running a multiple group analysis?

If yes, which formula can I use? Thank you.
 Bengt O. Muthen posted on Thursday, May 12, 2011 - 6:36 pm
You get a statistical test of gender equality if you do a multiple-group analysis. I think that is highly desirable.
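One way such a multiple-group test might be set up is sketched below. All variable names, the file name, the group coding, and the parameter labels (bm, bf) are hypothetical, and the default WLSMV estimator is assumed for the binary outcome.

```mplus
DATA:     FILE IS mydata.dat;          ! hypothetical file name
VARIABLE: NAMES ARE offend victim peers depress gender;
          CATEGORICAL ARE offend;      ! binary dependent variable
          GROUPING IS gender (0 = male 1 = female);
MODEL:    offend ON victim peers depress;
MODEL male:
          offend ON victim (bm);       ! slope for males
MODEL female:
          offend ON victim (bf);       ! slope for females
MODEL TEST:
          0 = bm - bf;                 ! Wald test of equal slopes
```

With a categorical outcome, the comparison of probit slopes across groups also depends on how the group-specific scale factors or residual variances are treated, so the default settings are worth checking in the output before interpreting the test.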
 Owis Eilayyan posted on Monday, February 27, 2012 - 11:29 am

I am using Mplus 5 to run an ordinal regression and path analysis model. I have missing values, but less than 10%. The program excludes all observations that have missing values from the analysis. Could you please tell me how I can include all of my observations in the analysis?

 Linda K. Muthen posted on Monday, February 27, 2012 - 11:34 am
Version 5 has TYPE=MISSING as the default. Cases that have missing values on observed exogenous x variables must be excluded because missing data theory does not apply to these variables. If you mention their means, variances, or covariances in the MODEL command, they will be treated as dependent variables and distributional assumptions will be made about them. The subjects will then not be excluded from the analysis.
 Owis Eilayyan posted on Monday, February 27, 2012 - 11:58 am
Thanks for your response, but how can I mention their means? Do you mean put them in brackets?
 Owis Eilayyan posted on Monday, February 27, 2012 - 12:44 pm
This is my command. could you please tell me how to mention the mean or the variance in the MODEL command.


TITLE: this is an example of a logistic
regression for a categorical observed
dependent variable with two covariates
DATA: FILE IS C:\Users\Admin\Desktop\MGH\path3.dat;
VARIABLE: NAMES ARE smo em sym pf ef sf re bmi age fev gender act5 actc;
USEVARIABLES ARE smo em sym pf ef sf re bmi age fev gender act5;
missing = .;
MODEL: act5 ON smo em sym pf ef sf re bmi age fev gender;
 Linda K. Muthen posted on Monday, February 27, 2012 - 1:31 pm

MODEL: act5 ON smo em sym pf ef sf re bmi age fev gender;

smo em sym pf ef sf re bmi age fev gender;
 Owis Eilayyan posted on Monday, February 27, 2012 - 1:45 pm
Thanks a lot.

Usually I use SAS for statistical analysis. In SAS we can define the reference group of a categorical variable. Does Mplus do that?

 Linda K. Muthen posted on Monday, February 27, 2012 - 6:16 pm
You can use the DEFINE command to create a set of dummy variables for a categorical covariate.
 Owis Eilayyan posted on Monday, February 27, 2012 - 7:02 pm
Actually, I have the categories already. But in interpreting the results, we need a reference group for categorical variables. For example, "males are more likely to have the disease": here the reference group is male and the interpretation is for males, not females.

 Linda K. Muthen posted on Tuesday, February 28, 2012 - 10:14 am
With a dummy variable, 0 is the reference group and 1 is the group for which the effect is relevant.
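A sketch of how the DEFINE command mentioned above could be used to build dummy variables with a chosen reference group. The variable names (y, x1, region, reg2, reg3) and the file name are hypothetical; region is assumed to be coded 1, 2, 3.

```mplus
DATA:     FILE IS mydata.dat;          ! hypothetical file name
VARIABLE: NAMES ARE y x1 region;       ! region coded 1, 2, 3
          USEVARIABLES ARE y x1 reg2 reg3;  ! new variables listed last
DEFINE:   reg2 = 0;
          reg3 = 0;
          IF (region EQ 2) THEN reg2 = 1;
          IF (region EQ 3) THEN reg3 = 1;
MODEL:    y ON x1 reg2 reg3;           ! region 1 is the reference group
```

The category that gets no dummy variable (region 1 here) is the reference group, and each slope compares its category with that group. How cases with a missing value on region end up coded should be checked before using a setup like this.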
 Owis Eilayyan posted on Tuesday, February 28, 2012 - 6:31 pm
Thanks a lot Linda
 Owis Eilayyan posted on Wednesday, February 29, 2012 - 7:08 am
Hi Linda again,
I did what you told me last time about analysing all available values. You told me to mention their means, variances, or covariances in the MODEL command. You gave me the command, but it is for the covariances, and there are no covariances between these exogenous variables. Could you please tell me how I can mention just their means or variances?

 Linda K. Muthen posted on Wednesday, February 29, 2012 - 9:28 am
I gave you the command for the variances not the covariances. If you have further problems with this, please send your output and license number to
 chris pp posted on Thursday, April 05, 2012 - 9:59 pm

I'm running a SEM where one of the variables ('nage') is categorical (dichotomous) and is modelled as a predictor of one continuous variable and as an outcome of another continuous variable.

There seems to be less output on the standardized results when this variable
is set as categorical (only showing StdYA under the STANDARDIZED MODEL RESULTS). Are there additional commands required to generate more output in this case?

The syntax is:

nage social selfr pa pop;



MODEL: sr BY selfr@0.78;


soc BY social@.72;


sr ON edu cfit;

soc ON sr pop cfit;

pa ON pop soc;

nage ON edu;

cfit ON edu;

pop ON nage edu;
 Linda K. Muthen posted on Friday, April 06, 2012 - 7:02 am
With the default WLSMV estimator, standard errors for the standardized solution are not available when the model has covariates.
 Aylin posted on Monday, June 25, 2012 - 8:39 am
Dear Professors,
I am running the following model: (using WLSMV estimator)

Usevariables are
vic IQ daw emot PL12 PL18 dep sex;
Categorical are
vic daw PL12 PL18 dep;

Parameterization = theta;

IQ with daw;
vic with emot;
vic on IQ daw;
emot on IQ daw;
dept with PL12 ;
vic emot daw IQ on sex;
dept PL12 on sex vic emot IQ daw;
PL18 on sex dept PL12 vic emot IQ daw;

So my questions are:
1) As I am using both categorical and continuous variables, does that mean my path analysis consists of both linear and probit regressions? So "emot (continuous) on daw (categorical) = linear regression" and "vic (categorical) on IQ (continuous) = probit regression". Is this correct?

2) Then what are the correlations? For example, is "IQ (continuous) with daw (categorical)" a covariance or a tetrachoric correlation?

Thank you very much.
 Linda K. Muthen posted on Monday, June 25, 2012 - 10:27 am
1. If the dependent variable is continuous, the regression coefficient is linear. If the dependent variable is categorical, the regression coefficient is probit.

2. Continuous with categorical is a biserial or polyserial correlation. Binary with binary is a tetrachoric correlation.
 Aylin posted on Monday, June 25, 2012 - 10:28 am
Thank you very much
 Ellen posted on Thursday, April 10, 2014 - 8:08 am
I am doing a path analysis that includes a dichotomous endogenous variable (Y1) and censored endogenous variables (Y2, Y3).
I want to test the relationship between Y2 and Y3 (non-recursive model).

X1, X2 -> Y1(dichotomous)
Y1 -> Y2(censored)
Y1 -> Y3(censored)
Y2 -> Y3
Y3 -> Y2

But I get an error message that the latent variable covariance matrix (psi) is not positive definite. PROBLEM INVOLVING VARIABLE Y3.
I checked TECH 4, but I don't know what is wrong.
Could you help?
 Linda K. Muthen posted on Thursday, April 10, 2014 - 9:42 am
Please send the output and your license number to
 Marie Alice posted on Thursday, March 19, 2015 - 12:28 am
I am doing path analysis with a continuous outcome variable. My understanding is that when an independent variable is categorical, it is not an issue.

1. But when I have a mediator (parental aspiration: university and above=1) that is categorical, should I specify so using the command "categorical="?

2. When I have an independent variable, that is controlled by another variable, should I also specify it as categorical? (For example, in modelling single parent->income->child's achievement, I also model parent's education->single parent to control for selection bias. In such case, should I specify single parent as categorical?)

I ask, as I see that the coefficients change dramatically whether I specify the variable as categorical or not.
 Linda K. Muthen posted on Thursday, March 19, 2015 - 6:02 am
You should specify the scale of any endogenous dependent variable that you want to treat other than as continuous. The scale of observed exogenous variables should not be specified.
 Marie Alice posted on Thursday, March 19, 2015 - 6:39 pm
Thank you for your response.
So for that path (pointing into a dichotomous variable), I should interpret the coefficient as a logit coefficient, whereas all other variables (pointing into continuous variables) are regression coefficients?
 Linda K. Muthen posted on Friday, March 20, 2015 - 8:28 am
With WLSMV, you obtain a probit regression coefficient for a categorical dependent variable. With ML, you obtain a logistic regression coefficient for a categorical dependent variable as the default. You can request a probit regression coefficient using the LINK option.

Linear regression coefficients are obtained for continuous dependent variables.
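The LINK option mentioned above goes in the ANALYSIS command. A minimal sketch with hypothetical names (u is a binary outcome, m a continuous mediator, x a covariate; the file name is also hypothetical):

```mplus
DATA:     FILE IS mydata.dat;          ! hypothetical file name
VARIABLE: NAMES ARE u m x;
          CATEGORICAL ARE u;           ! binary dependent variable
ANALYSIS: ESTIMATOR = ML;
          LINK = PROBIT;               ! the default link with ML is LOGIT
MODEL:    m ON x;                      ! linear regression
          u ON m x;                    ! probit regression under ML
```

Dropping the LINK line returns to the default logistic regression for u.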
 thanoon younis posted on Monday, March 23, 2015 - 2:14 am
Hi Dr. Muthen
I want to ask you about path analysis with ordered categorical variables. I have 6 independent variables (ordered categorical), treated as exogenous variables, and 2 dependent (endogenous) variables, one of which is a mediator. My question is how I can analyze this type of path model when the dependent variables are latent variables and the independent variables are ordered categorical. Can I use probit regression on the independent variables and put zero values in for the latent variables?
many thanks in advance
 Bengt O. Muthen posted on Monday, March 23, 2015 - 1:55 pm
Don't put IVs on your categorical list; that should be done only to DVs.

Apart from that, your setting is standard SEM.
 thanoon younis posted on Monday, March 23, 2015 - 7:16 pm
Many thanks for your response. I want to ask about the latent variables: they are now dependent variables, so can I put zero values for them?

 Bengt O. Muthen posted on Monday, March 23, 2015 - 8:38 pm
With latent variables as dependent variables you use regular linear regression even though the DV is latent. I don't know what you mean by putting a zero value for that.
 thanoon younis posted on Monday, March 23, 2015 - 9:42 pm
Because, as you know, the latent variables are unobserved, there are no values for these variables. Can I put a matrix of zeros for these variables?

 Bengt O. Muthen posted on Tuesday, March 24, 2015 - 8:49 am
What is your aim - what is it that you are trying to do?
 thanoon younis posted on Tuesday, March 24, 2015 - 8:48 pm
Thank you for your response
I want to estimate the parameters of the path analysis using Bayesian inference, but I cannot leave the latent variables without any values. Can I use a matrix of zeros for that?

 Bengt O. Muthen posted on Wednesday, March 25, 2015 - 7:34 am
I still don't understand what you are doing. If you want to estimate a model with latent variables you don't need to give the latent variables values.
 sh wong posted on Thursday, May 14, 2015 - 9:03 am
Hi, I'm trying to run path analysis with one dichotomous dependent variable and few continuous dependent variables similar to example 3.14.

1. May I know which part of output should I refer to for the path coefficients?

2. What kind of R-square is it and how can I interpret it?

3. Can I have a equation similar to running a multiple linear regression of continuous variables, such as u1 = coeff1(y1) + coeff2(y2) + coeff3 (x1) + coeff4 (x2) + coeff5 (x3) ?

Thank you very much.
 Bengt O. Muthen posted on Thursday, May 14, 2015 - 10:56 am
You can learn about these matters in our Topic 2 short course video and handout on our website. Short answers follow (but these Q & A's are not the way to learn about this topic).

1. The ON statements.

2. R-square for the continuous latent response variable u1* for the DV u1.

3. Yes for u1*.
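Written out, the equation in answer 3 holds for the latent response variable rather than for the observed 0/1 variable. In the sketch below, the beta coefficients follow the question's coeff1 to coeff5, while tau (threshold), epsilon (residual), and theta (residual variance) are notation assumed for this illustration; the probability expression assumes a normally distributed residual, as in probit.

```latex
u_1^{*} = \beta_1 y_1 + \beta_2 y_2 + \beta_3 x_1 + \beta_4 x_2 + \beta_5 x_3 + \varepsilon ,
\qquad
u_1 = \begin{cases} 1 & \text{if } u_1^{*} > \tau , \\ 0 & \text{otherwise,} \end{cases}

P(u_1 = 1 \mid y_1, y_2, x_1, x_2, x_3)
  = \Phi\!\left( \frac{-\tau + \beta_1 y_1 + \beta_2 y_2 + \beta_3 x_1 + \beta_4 x_2 + \beta_5 x_3}{\sqrt{\theta}} \right),
\qquad \varepsilon \sim N(0, \theta) .
```

The R-square in answer 2 refers to the variance of u1* explained by the predictors, not to the observed binary u1.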
 sh wong posted on Friday, May 15, 2015 - 8:44 am
Thanks very much Dr. Muthen. May I ask further for I'm a beginner in statistical modelling.

1. Should I use the StdYX Estimate column under the standardized model results section for the path coefficients?

2. Base on the standardized model results, I can generate the equation with the StdYX Estimate:
u1* = coeff1(y1) + coeff2(y2) + coeff4 (x2).

But how can I find the corresponding coefficients when all the variables are included, i.e.

u1* = coeff1(y1) + coeff2(y2) + coeff3 (x1) + coeff4 (x2) + coeff5 (x3)

Which output setting should I make to generate the results? Please also correct me if my understanding on the above is wrong.

Thank you very much.
 Bengt O. Muthen posted on Friday, May 15, 2015 - 5:43 pm
1. Not for predicting u1*.

2. Mplus does not provide individual estimated u* values. Typically one doesn't look at them.

You may want to ask these types of questions on SEMNET.
 sh wong posted on Friday, May 15, 2015 - 7:47 pm
Thank you. I would post the questions on SEMNET then. Thanks.
 Bengt O. Muthen posted on Saturday, May 16, 2015 - 8:07 am
And you really want to study our Topic 2 course on our website because it gives many of the answers.
 sh wong posted on Sunday, May 17, 2015 - 9:39 am
Thanks Dr. Muthen. I've studied the Topic 2 course and that helped clear up some of my questions. In Mplus, I understand that WLSMV is the default estimator. However, when I try using ML estimation, which uses logit regression, no model fit indices such as chi-square, RMSEA, CFI, or SRMR can be found. Since I want to do a path analysis and test the model fit, is it a must to use WLSMV? If so, how should I interpret WRMR, as some posts mention that WRMR is experimental and not good for reporting results?

Also, I'm not sure if appropriate to ask here, but when I try using LISREL with ML estimation (with u1 marked as categorical), it can still give model fit indices of Chi-2, RMSEA, CFI, NNFI, SRMR, etc. Do you have any idea on the difference between Mplus and LISREL regarding the ML estimation in path analysis?

Thank you very much.
 Bengt O. Muthen posted on Monday, May 18, 2015 - 12:37 pm
With ML for categorical outcomes there are no generally accepted overall tests of model fit available because the model is not a covariance structure model as with continuous outcomes. Instead of fitting a covariance matrix, you work with raw data. But you can check TECH10 for testing of pairs of variables, and you can also test neighboring, nested models. And you can specify a saturated model and then use Model Test to check if some coefficients are zero.
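The last suggestion above, a saturated model combined with Model Test, could look roughly like this. The variable names, the file name, and the labels d1 and d2 are hypothetical; the direct paths from x1 and x2 to u are the ones a more restrictive mediation model would fix at zero.

```mplus
DATA:     FILE IS mydata.dat;          ! hypothetical file name
VARIABLE: NAMES ARE u m x1 x2;
          CATEGORICAL ARE u;           ! binary dependent variable
ANALYSIS: ESTIMATOR = ML;              ! logistic regression for u
MODEL:    m ON x1 x2;
          u ON m;
          u ON x1 (d1);                ! direct paths included so that
          u ON x2 (d2);                ! all possible paths are estimated
MODEL TEST:
          0 = d1;                      ! joint Wald test that both
          0 = d2;                      ! direct effects are zero
```

A non-significant Wald test would suggest that the restricted model without the direct paths does not fit appreciably worse than the saturated one.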

The difference between LISREL and Mplus in this regard is explained in Web Note 4, where I argue against the LISREL approach.
 Alissa Mahler posted on Tuesday, April 12, 2016 - 5:29 pm

I am running a cross-lag model with both continuous and dichotomous observed variables, using the WLSMV estimator.

I am having some trouble deciding what coefficients to use and how to interpret the findings. For example:

R1 and R2 = dichotomous
E1 and E2 = continuous

So, my syntax would be something like:

R2 on R1 E1;
E2 on R1 E1;

I am assuming I can interpret the continuous regression as I would any standardized regression coefficient, but I am unclear how to interpret the regression with the dichotomous DV (R2 ON R1 E1). With a dichotomous DV I would typically use an odds ratio, so I do not really understand how to interpret the stdyx coefficients or what would be appropriate to report. Would it be possible to get an odds ratio instead? Any assistance would be greatly appreciated!
 Bengt O. Muthen posted on Wednesday, April 13, 2016 - 4:21 pm
WLSMV uses probit so odds ratios are essentially out of the picture.

ML uses logit so you can work with odds ratios. But you may get slower computations due to many dimensions of integration - try it.

With WLSMV you can use the standardized results and for binary outcomes they refer to a continuous latent response variable behind the binary observed.
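For the cross-lag example in the question, the ML alternative described above might be sketched as follows. The file name is hypothetical; the variable names follow the question. R1 is only a predictor here, so it is not placed on the CATEGORICAL list.

```mplus
DATA:     FILE IS mydata.dat;          ! hypothetical file name
VARIABLE: NAMES ARE r1 e1 r2 e2;
          CATEGORICAL ARE r2;          ! r1 is exogenous, so not listed
ANALYSIS: ESTIMATOR = ML;              ! logit link by default
MODEL:    r2 ON r1 e1;                 ! logistic regression
          e2 ON r1 e1;                 ! linear regression
```

The odds ratio for a predictor of R2 is the exponentiated logit slope. A residual covariance between R2 and E2 is not part of this sketch and would likely add a dimension of integration if requested.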
 Lauren L posted on Thursday, June 02, 2016 - 2:06 pm
I am testing multiple mediation (estimating significance of indirect effects through mediators using bootstrapped effects) with one IV (continuous), 8 MV (7 continuous, 1 binary), one DV (continuous), and 3 covariates.

What is the most appropriate estimator, given the combination of continuous and binary MV's? There is missing data for one MV so I am hesitant to use WLSMV. When trying WLSMV though, I receive an error stating that I should use PARAMETERIZATION =THETA. Including this only gives me WRMR for model fit info.

Thank you.
 Linda K. Muthen posted on Thursday, June 02, 2016 - 2:12 pm
I would use the MLR estimator if you are concerned with missing data handling.
 Lauren L posted on Thursday, June 02, 2016 - 2:23 pm
Thank you for the quick response! When I specify MLR and bootstrap in analysis, I receive the following message: BOOTSTRAP is not allowed with ALGORITHM=INTEGRATION. I'm unsure what I'm missing.
 Anna MacKinnon posted on Friday, June 03, 2016 - 2:29 pm
I would like to use path analysis to test whether a multicategorical variable (3 groups) moderates the relationship between several continuous (X1, X2, X3) and dichotomous (X4) predictors and one continuous outcome (Y).

Would this be the proper code to use to test this model and subsequent simple slopes to find out for which group the relationship between the Xs and Y is significant?

[Y] (b0);
Y ON X1 (b1);
Y ON X2 (b2);
Y ON X3 (b3);
Y ON X4 (b4);
Y ON WD1 (b5);
Y ON WD2 (b6);
Y ON X1WD1 (b7);
Y ON X1WD2 (b8);
Y ON X2WD1 (b9);
Y ON X2WD2 (b10);
Y ON X3WD1 (b11);
Y ON X3WD2 (b12);
Y ON X4WD1 (b13);
Y ON X4WD2 (b14);

SIMP_W1X1 = b1 + b7;
SIMP_W1X2 = b2 + b9;
SIMP_W1X3 = b3 + b11;
SIMP_W1X4 = b4 + b13;
SIMP_W2X1 = b1 + b8;
SIMP_W2X2 = b2 + b10;
SIMP_W2X3 = b3 + b12;
SIMP_W2X4 = b4 + b14;
SIMP_W3X1 = b1;
SIMP_W3X2 = b2;
SIMP_W3X3 = b3;
SIMP_W3X4 = b4;

Or should I use multiple group analysis? If yes, could you provide an example of the code that should be used?

Thank you kindly
 Bengt O. Muthen posted on Friday, June 03, 2016 - 5:32 pm
Lauren L:

Version 7.4 allows bootstrap and integration.

You should use Estimator = ML when you do bootstrapping.
 Bengt O. Muthen posted on Friday, June 03, 2016 - 5:35 pm
Anna MacKinnon:

It sounds like you have a nominal mediator. This is an advanced mediation model discussed in Muthen (2011). See the Mediation page on our website.
 Lauren L posted on Monday, June 06, 2016 - 7:38 pm
Thank you for your help! I have another follow-up question. Again, I am running multiple mediation, and one of my mediators is categorical (using drug alone or with others). Do I need to specify that it is categorical? I am now using the ML estimator (as per your suggestion) with bootstrapping to measure the indirect effects. When I do specify that it is categorical, I receive an error message. My IV, DV, and other MVs are continuous.
 Anna MacKinnon posted on Monday, June 06, 2016 - 8:05 pm
Thank you for your response. I was actually hoping to test a moderation model, that is whether the relationship between the predictors and the outcome variable differs between 3 naturally occurring birth groups of participants (multi categorical moderator). I found this sample code online but was not sure if it was correct or whether multiple group analysis would be more appropriate? I was not able to find sample code for multiple group analysis for path analysis with observed variables. Any more direction would be greatly appreciated!
 Bengt O. Muthen posted on Tuesday, June 07, 2016 - 2:54 pm
Lauren L:

You should specify that the mediator is categorical. You can use Bootstrapping also with WLSMV. The optimal way to compute indirect and direct effects is using a counterfactual approach but it sounds like you have several mediators for which this is not automated in Mplus yet. If you argue that it is the latent response variable behind the categorical mediator that is the relevant mediator you can use WLSMV. Otherwise, see

Nguyen, T.Q., Webb-Vargas, Y., Koning, I.K. & Stuart, E.A. (2016). Causal mediation analysis with a binary outcome and multiple continuous or ordinal mediators: Simulations and application to an alcohol intervention. Structural Equation Modeling: A Multidisciplinary Journal, 23(3), 368-383. DOI: 10.1080/10705511.2015.1062730
 Bengt O. Muthen posted on Tuesday, June 07, 2016 - 2:57 pm

Ok, so you have a regression model rather than a mediation model. It is usually sufficient to simply create products of variables to represent interactions. See UG example 3.18.
 Carillon J Skrzynski posted on Monday, June 20, 2016 - 11:11 am

My understanding is that models with count variables can't generate model fit indices (CFI/TLI, RMSEA, SRMR)—given this, how does one establish adequate model fit (especially if one wants to compare XWITH moderation models)? I was told one could use R2, but I can’t find papers corroborating this or that identify what R2 value designates acceptable fit.

Somewhat related, is it acceptable to change count outcomes to binary outcomes and then run the model with different estimators to generate all desired information? That is, could one establish model fit using a WLSMV estimator and then get AIC/BIC values using an MLR estimator to compare the count outcome and binary outcome models? (I.e., if the count outcome model has smaller AIC/BIC values than a well-fitted binary outcome model, could one say the count outcome model had good fit?)

In the case that it doesn’t make sense to compare across models with different outcomes, is it acceptable to do this within the same outcome model? That is, establish model fit of the binary outcome baseline model with WLSMV output, and then use the AIC/BIC values from MLR output to compare all binary outcome models?

Please let me know if these questions aren't clear. Otherwise, any insight you can provide or suggestions for non-technical papers which might shed light on this would be incredibly helpful.

Thank you,
 Carillon J Skrzynski posted on Monday, June 20, 2016 - 11:15 am
Also, my apologies if this was the wrong place to post these questions.
 Bengt O. Muthen posted on Monday, June 20, 2016 - 4:34 pm
It is pretty much a research question how to assess overall model fit with count outcomes - at least when latent variables are present. There is not an underlying structure for a continuous latent response variable vector that can be assessed like WLSMV does. There is TECH10 for counts but it is not always very useful given the many count categories. Checking fit for count = 0 vs > 0 is a good idea even if in the approximate way that you describe.

A general answer is to compare the logL for neighboring models and use that to compute chi-square tests. That is, add a slightly less restricted model.
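
A minimal Python sketch of that arithmetic, under stated assumptions: the loglikelihoods, parameter counts, and scaling correction factors below are hypothetical, and the function name `lr_chisq` is made up for illustration. With plain ML the statistic is simply twice the loglikelihood difference; with MLR the difference is divided by a correction term built from the scaling correction factors reported with each loglikelihood.

```python
def lr_chisq(logl_restricted, logl_general, n_par_restricted, n_par_general,
             scf_restricted=1.0, scf_general=1.0):
    """Likelihood-ratio chi-square for two nested (neighboring) models.

    With plain ML, leave both scaling correction factors at 1.0.
    With MLR, pass the scaling correction factor of each model; the
    result is then the scaled difference test.
    Returns (chi_square, degrees_of_freedom).
    """
    df = n_par_general - n_par_restricted
    if df <= 0:
        raise ValueError("the general model must have more free parameters")
    # Difference-test scaling correction; equals 1.0 when both factors are 1.0.
    cd = (n_par_restricted * scf_restricted - n_par_general * scf_general) / (
        n_par_restricted - n_par_general)
    chi_square = -2.0 * (logl_restricted - logl_general) / cd
    return chi_square, df

# Hypothetical values, for illustration only.
chi_ml, df_ml = lr_chisq(-1520.4, -1515.1, 10, 12)
chi_mlr, df_mlr = lr_chisq(-1520.4, -1515.1, 10, 12,
                           scf_restricted=1.10, scf_general=1.15)
print(chi_ml, df_ml)
print(chi_mlr, df_mlr)
```

The resulting statistic is referred to a chi-square distribution with `df` degrees of freedom; a significant value suggests the restricted model fits worse than its less restricted neighbor.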
 Carillon J Skrzynski posted on Tuesday, June 21, 2016 - 10:40 am
Thank you for your prompt reply, but unfortunately I'm not sure I fully understand you.

Are you saying my proposed method for comparing the different outcome models is feasible (get model fit indices for a binary model then use AIC/BIC values to compare to the count model)? Or using MLR to obtain the AIC/BIC values doesn't make sense (even if the model runs and generates the AIC/BIC values)? In which case, is that what you meant by using the chi-square test? (Do the test comparing the binary model to the count model?)

Additionally, say that in the end the count model is not as good a fit as the binary model--I'm still left wondering how I can generate both model fit indices and AIC/BIC values to assess how good my baseline binary model is as well as compare across non-nested models (which I believe the mediation model and the moderation model would be).

Again, I appreciate any help you can provide.
 Carillon J Skrzynski posted on Tuesday, June 21, 2016 - 10:46 am
*I realize I didn't specify this earlier, but I'm looking to compare across a baseline, mediation, and moderation model all using the same variables.
 Bengt O. Muthen posted on Tuesday, June 21, 2016 - 5:59 pm
I am just saying that checking the fit of a model that dichotomizes the count outcome tells you a part of the story of how well the full model (the model for the count outcome) fits. The BIC values are not comparable to those for models of the count outcome.

I mentioned the likelihood-ratio chi-square test for neighboring models as a way to check the appropriateness of your count model.
 Carillon J Skrzynski posted on Tuesday, June 21, 2016 - 7:03 pm
Okay, got it--thank you.

A last question then (sorry to be repetitive!): if I stick with just using the binary outcome model (dropping the count outcome model altogether since I cannot figure out how to assess model fit in this case), is it appropriate to use the two estimators to assess fit of a baseline model as well as get BIC/AIC values to then compare against (binary) mediation and moderation models?

Based on a few papers / some prior posts on this site, I'm leaning more and more towards believing this is reasonable, but I would be really grateful if you could confirm this.
 Bengt O. Muthen posted on Wednesday, June 22, 2016 - 5:11 pm
I see nothing wrong with reporting results from two estimators.
 Ivan Lim posted on Monday, August 22, 2016 - 2:34 am

I am a Stata user but started learning Mplus this summer. My apologies if this is the wrong place to post these questions.

In the data, the variables are defined as:

Y (categorical variable with 3 alternatives)
Y1 (first alternative of Y, dummy variable)
Y2 (second alternative of Y, dummy variable)
M1 (ordinal)
M2 (ordinal)
X1 (categorical: binary)
X3 (categorical: binary)

My model is shown below:
(Y <- M1 M2 X1 X2 X3 X4 L)
(M1 <- X1 X2 L)
(M2<- X1 X2 L)

L is a shared latent variable for Y, M1, and M2.

The syntax I used in Stata is:
gsem (M1 <- i.X1 X2 L, ologit) ///
(M2 <- i.X1 X2 L, ologit) ///
(y1 <- ib1.M1 M2 ib1.M3 M4 L@1) ///
(y2 <- ib1.M1 M2 ib1.M3 M4 L) if nomiss==1 , var(L@1) mlogit

My syntax in Mplus is:

M1 on X1 X2 ;
M2 on X1 X2 ;
Y1 on M1 M2 M3 M4 ;
Y2 on M1 M2 M3 M4;

How can I include a shared latent variable (an unobserved variable) in Mplus?
And how do I fix its variance, as in var(L@1)?

I am a new Mplus user, so I apologize for my beginner-level questions.
 Bengt O. Muthen posted on Monday, August 22, 2016 - 12:59 pm

L BY y1-m2*1;
 Ivan Lim posted on Monday, August 22, 2016 - 1:09 pm
Thanks a lot. So my syntax should be:

M1 on X1 X2 ;
M2 on X1 X2 ;
Y1 on M1 M2 M3 M4 ;
Y2 on M1 M2 M3 M4;
L BY y1-m2*1;

1. Am I right?
2. What is the meaning of *1?
3. Should I include indirect-effect syntax, since this is a mediation model?

Again, I apologize for my beginner-level questions and for not fully understanding.
 Bengt O. Muthen posted on Monday, August 22, 2016 - 3:02 pm
1. Right. Make sure your variables are in the order y1, m1, m2 for y1-m2 to work as intended.

2. *1 frees the first loading, since you set the metric through the factor variance: L@1.

3. If you want to.

If you are a beginner you might benefit from studying the User's Guide, our short courses on our website, as well as our new book (which goes from introductory to advanced).
 Ivan Lim posted on Tuesday, August 23, 2016 - 6:03 am
Dear Dr. Muthen,

Thank you for your prompt reply and suggestion. Mplus is really efficient compared to Stata.

One more question (it should be the last): is Mplus able to estimate the average marginal effect of an IV?
 Bengt O. Muthen posted on Tuesday, August 23, 2016 - 11:11 am
Do you mean in a regression or do you mean for an indirect/direct effect in a mediation model?

Also, is your Y variable nominal or ordinal and why do you split it into 2 binary variables Y1 and Y2?
 Ivan Lim posted on Tuesday, August 23, 2016 - 11:43 am
1. For a direct effect in a mediation model.

2. I just realized my mistake: Y is a categorical variable with three alternatives.
I should declare "nominal is Y" and modify my equation to Y#1 Y#2 on M1 M2 M3 M4
and L by Y#1-m2*1.
Is this right?
 Bengt O. Muthen posted on Tuesday, August 23, 2016 - 2:02 pm
1. Do you mean average effect in the sense of Imai's ADE(treatment) and ADE(control)?

2. Just say Y ON...
and Mplus will give you all the intercepts and slopes for the different categories of Y and the different predictors. See UG ex 3.6.

Same for L: L BY Y etc.
 Ivan Lim posted on Wednesday, August 24, 2016 - 12:26 am
1. Yes, and I think I got some ideas from your research with Dr. Asparouhov (Causal Effects in Mediation Modeling: An Introduction With Applications to Latent Variables).

By the way, can Mplus estimate average marginal effects for a regression?
 Bengt O. Muthen posted on Wednesday, August 24, 2016 - 3:16 pm
Imai's average effects don't come out automatically in Mplus. Instead, we give the effects conditional on specific values of the control variables. But you can use that to get the average effects. This was for instance done in the recent SEM article by Nguyen et al: Causal mediation with a binary outcome. She used Mplus.

By average marginal effects for regression, perhaps you are thinking of the expressions for the derivative of say the probability of the binary outcome. I have not found that particularly useful and it is not included in Mplus.
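
For readers who want that derivative-based quantity anyway, it can be computed by hand from the estimates. The following is a minimal Python sketch for a one-predictor logit model, not an Mplus feature; the intercept, slope, and data values are hypothetical, and `average_marginal_effect` is a name made up for illustration.

```python
import math

def logistic(z):
    """Inverse logit link."""
    return 1.0 / (1.0 + math.exp(-z))

def average_marginal_effect(intercept, slope, x_values):
    """Sample average of dP(y=1|x)/dx for a one-predictor logit model."""
    effects = []
    for x in x_values:
        p = logistic(intercept + slope * x)
        # Derivative of the logistic curve with respect to x.
        effects.append(slope * p * (1.0 - p))
    return sum(effects) / len(effects)

# Hypothetical estimates and data, for illustration only.
ame = average_marginal_effect(intercept=-0.5, slope=0.4,
                              x_values=[0.0, 1.0, 2.0, 3.0])
print(round(ame, 4))
```

Because the derivative depends on the predicted probability, the average can never exceed slope/4 in absolute value, which is reached only when every predicted probability is 0.5.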
 Ivan Lim posted on Thursday, August 25, 2016 - 8:05 am
Dr Muthen,

1. Thank you, this is really useful.

2. In the example I showed above, I found that the results change depending on the estimator (ML/MLR/MLF). Is there a recommended estimator for this example?
 Bengt O. Muthen posted on Thursday, August 25, 2016 - 4:05 pm
I would go with MLR.