Categorical mediator
 Christian Niyonkuru posted on Friday, March 21, 2008 - 9:17 am
I'm trying to fit two different models. In the first model I have an ordinal variable (3 categories) as the mediator, and the response is binary. Is it right that the mediating variable should be (or is treated as) continuous? In my second model, both the mediating and response variables are binary. Please let me know.

 Linda K. Muthen posted on Saturday, March 22, 2008 - 9:47 am
In Mplus, with logistic regression a categorical mediator is treated as a continuous variable. In probit regression, a categorical mediator is treated as an underlying latent response variable.
 Christian Niyonkuru posted on Tuesday, March 25, 2008 - 7:03 am
Thank you for your quick reply. If I specify the mediator as categorical using the categorical command, it will be treated as categorical (ordinal). Am I right about this?
 Linda K. Muthen posted on Tuesday, March 25, 2008 - 9:20 am
No, in Mplus, with logistic regression and maximum likelihood estimation, a categorical mediator is treated as a continuous variable even though it is declared categorical. In probit regression with weighted least squares estimation, a categorical mediator is treated as an underlying latent response variable.
 Christian Niyonkuru posted on Monday, April 14, 2008 - 8:55 am
Thanks for your response. Another question: I'm trying to use the PLOT command to get the estimated probabilities, but all I get is scatterplots and histograms. Could you give me an example, say when the dependent variable (u1) is binary and the two independent variables (x1 and x2) are continuous? Thanks.
 Bengt O. Muthen posted on Monday, April 14, 2008 - 9:03 am
In regression with a binary dependent variable, Mplus plot command does not give you estimated probabilities for each individual.
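
Estimated probabilities can still be computed by hand from the threshold and slope estimates. The following is a minimal Python sketch, assuming a logit link written in threshold form (logit = -threshold + b1*x1 + b2*x2). The function name and all numeric values are hypothetical and are not taken from any real output.

```python
import math

def logistic_probability(threshold, slopes, x_values):
    """P(u = 1 | x) for a binary outcome under a logit link.

    Threshold form: logit = -threshold + sum(b_j * x_j).
    """
    linear_part = -threshold + sum(b * x for b, x in zip(slopes, x_values))
    return 1.0 / (1.0 + math.exp(-linear_part))

# Hypothetical estimates for u1 regressed on x1 and x2.
tau = 0.4            # threshold for u1
b = [0.8, -0.5]      # slopes for x1 and x2

# Probability curve over x1 with x2 held at 0.
for x1 in (-1.0, 0.0, 1.0):
    p = logistic_probability(tau, b, [x1, 0.0])
    print(f"x1 = {x1:+.1f}, x2 = 0.0 -> P(u1 = 1) = {p:.3f}")
```

Evaluating the function over a grid of x1 values, with x2 fixed at a chosen value, gives the points needed to draw a probability curve in any plotting tool.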
 Angela Wolff posted on Tuesday, June 10, 2008 - 3:00 pm
I am looking at a mediator variable and receive this warning:

There is an indirect effect involving a path between the following
variables, but no indirect or direct path exists in the model.
Indirect effect: EE IND INDCON

Did I specify something incorrectly? The model has a categorical DV.

 Linda K. Muthen posted on Tuesday, June 10, 2008 - 3:05 pm
It seems you are specifying an indirect effect that does not exist in the model. If you can't see the problem, send the full output and your license number to
 Angela Wolff posted on Wednesday, July 23, 2008 - 6:55 am
Thanks, I managed to identify my misspecification.
 james rosenthal posted on Saturday, October 25, 2008 - 12:15 pm

I am new to Mplus.

I have a path model with two ordered categorical variables, so defined in Mplus. One is purely independent and the other has a mediating role in the model (is both dependent and independent).

Where the ordered categorical variable is purely independent, does Mplus treat it as indicative of any underlying latent variable or does it treat it as, in effect, a numeric variable? Suppose, for instance, the variable is coded: 0,1,2,3,4. Is the coefficient for its effect on a dependent simply that of a similarly coded numeric variable, or is it something different?

Now, suppose the variable is a mediating variable. Regarding its effects on other variables my questions are the same as those in the prior paragraph. In other words, is there any difference from the purely independent situation?

Second, consider the (mediating) variable in its role as a dependent variable. Is it simply a numeric variable, or have we now shifted to a probit model, so that interpretation shifts accordingly (for instance, variability would be determined by the conventions of probit analysis)?

Thank-you for your help.

Jim Rosenthal
 Bengt O. Muthen posted on Saturday, October 25, 2008 - 12:24 pm
For both the IV and DV situation you have a choice between (1) working with the observed variable, treating the scores as continuous, or (2) working with an underlying continuous latent response variable that relates to the observed via thresholds.

For the IV case, if you don't put the variable on the categorical list, (1) will be used, and if you put it on the categ list, (2) will be used. I tend to prefer (1) if I think this results in approximately linear regressions.

For the DV case, (1) will be used when applying ML. When applying WLSMV, (2) will be used. The choice doesn't matter when considering the DV regressed on its predictors because then (1) and (2) are equivalent. The choice does matter a bit when considering the equation where the DV is a predictor. The choice depends on the substance of the application.
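
As a rough illustration of option (2), the sketch below maps a continuous latent response variable to observed categories through thresholds and computes the implied category probabilities. It assumes a probit-type model with a standard normal residual. The threshold values and function names are hypothetical.

```python
import math

def normal_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def category_from_latent(y_star, thresholds):
    """Observed category = number of thresholds that y* exceeds."""
    return sum(y_star > t for t in thresholds)

def category_probabilities(linear_predictor, thresholds):
    """P(y = k) when y* = linear_predictor + e with e ~ N(0, 1)."""
    cumulative = [normal_cdf(t - linear_predictor) for t in thresholds]
    bounds = [0.0] + cumulative + [1.0]
    return [bounds[k + 1] - bounds[k] for k in range(len(bounds) - 1)]

# Hypothetical thresholds for a variable coded 0, 1, 2, 3, 4.
taus = [-1.0, -0.3, 0.4, 1.2]

print(category_from_latent(0.1, taus))   # category for one latent score
for k, p in enumerate(category_probabilities(0.5, taus)):
    print(f"P(y = {k}) = {p:.3f}")
```

Under option (1) the codes 0 to 4 enter the regression directly as numbers, whereas under option (2) the spacing between categories is determined by the thresholds rather than by the codes.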
 yang posted on Tuesday, November 11, 2008 - 7:10 am
Dear Linda,

I have y,m1,m2,m3 and x all binary. I also have covariates z. I want to estimate the indirect effect of x on y through m1,m2 and m3. I would like the indirect effect to have odds ratio interpretation. What is the code for doing this? I was trying the following. I appreciate your help.


y on m1 z;
y on m2 z;
y on m3 z;
m1 m2 m3 on x z;

model indirect:
y ind m1 x;
y ind m2 x;
y ind m3 x;
 Matthew McBee posted on Tuesday, November 11, 2008 - 1:22 pm
I'd like to follow up to Linda's post on March 25, 2008:

"No, in Mplus, with logistic regression and maximum likelihood estimation, a categorical mediator is treated as a continuous variable even though it is declared categorical. In probit regession with weighted least squares estimation, a categorical mediator is treated as an underlying latent response variable."

I am struggling with this. Imagine a system where X -> Y -> Z and Y is categorical. If the intervening variable is categorical, then the indirect effect can only be transmitted from X to Z if Y changes state in response to X. Otherwise no indirect effect can be transmitted from X to Z. It would seem like any calculation of the indirect effect from X to Z through Y would have to be weighted by the probability of Y changing state.
 Linda K. Muthen posted on Wednesday, November 12, 2008 - 8:31 am
Yang: I think this only makes sense in the probit case with the weighted least squares estimator. With probit regression, there is not a constant odds ratio.

Matthew: The quote refers to how the mediator is treated in its role as a covariate.

When a mediator is categorical, I believe only the weighted least squares approach yields meaningful results.
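
The remark that probit regression has no constant odds ratio can be checked numerically. The sketch below compares the odds ratio for a one-unit increase in x at two starting values under a logit and a probit link. The intercept and slope are hypothetical values chosen only for illustration.

```python
import math

def normal_cdf(z):
    """Standard normal distribution function (probit link inverse)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def logistic_cdf(z):
    """Logistic distribution function (logit link inverse)."""
    return 1.0 / (1.0 + math.exp(-z))

def odds_ratio(cdf, intercept, slope, x):
    """Odds ratio for moving from x to x + 1 under the given link."""
    p0 = cdf(intercept + slope * x)
    p1 = cdf(intercept + slope * (x + 1.0))
    return (p1 / (1.0 - p1)) / (p0 / (1.0 - p0))

intercept, slope = 0.0, 0.5   # hypothetical coefficients

for x in (0.0, 2.0):
    or_logit = odds_ratio(logistic_cdf, intercept, slope, x)
    or_probit = odds_ratio(normal_cdf, intercept, slope, x)
    print(f"x = {x}: logit OR = {or_logit:.3f}, probit OR = {or_probit:.3f}")
```

Under the logit link the odds ratio equals exp(slope) wherever x starts, while under the probit link it changes with the starting value of x.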
 yang posted on Thursday, November 13, 2008 - 1:27 pm
Thank you Linda. I have a follow up question.

If I would like to create a latent variable that summarizes m1,m2 and m3

h by m1 m2 m3

1) what is the estimation technique used for
h on x z
y on h z

I am not sure about the first one. In the second, I would think it will be logit or probit depending on my choice of estimator.

2) what is the parameter being estimated using the following code (which is the indirect effect of x through h)
y ind h x

 Linda K. Muthen posted on Thursday, November 13, 2008 - 2:36 pm
1. h is a continuous latent variable so a linear regression is estimated for h ON x z. If y is binary, a probit regression is estimated if the weighted least squares estimator is used. If the maximum likelihood estimator is used, a logistic regression coefficient is estimated as the default. A probit link is also available.

2. I would only do this with weighted least squares. This product is a linear regression coefficient of the indirect effect x on y* where y* is the continuous latent response variable underlying y. See the MacKinnon et al. article in Clinical Trials for further information which is on our website.
 Oliver Arranz-Becker posted on Monday, November 17, 2008 - 7:34 am
We have a model with a binary exogenous predictor (X), several mediators (M, some binary, some continuous), and a binary DV (Y). We used ML estimation, declaring all binary mediators and the DV as categorical. Now we have two questions:

1. We are interested in the residual covariances between all mediators. However, we get an error message ("Covariances for categorical, censored, count or nominal variables with other observed variables are not defined.") when using the corresponding WITH syntax. What can we do about this?

2. We compared two models: (a) X and M are predictors of Y (like in standard logistic regression), and (b) a "causal chain" X -> M -> Y (with an additional direct effect of X on Y). We observed that the effects of M on Y are identical in both models. Could you explain the reason for this?

Thanks in advance for your help,
 Linda K. Muthen posted on Monday, November 17, 2008 - 8:48 am
1. See Example 7.16 for a way to specify residual covariances with categorical indicators and maximum likelihood estimation. Note that in this case, each residual covariance requires one dimension of integration. If you want a lot of residual covariances, you should use weighted least squares estimation.

2. Please send the two outputs and your license number to
 james rosenthal posted on Monday, May 17, 2010 - 8:42 am
Kindly consider this path model :

Use variables are:
catTime01 catTime12
numTime1 numTime2;
categorical are catTime12;
numTime1 on catTime01;
numTime2 on numTime1 catTime12;
catTime12 on numTime1 catTime01;

catTime01 and catTime12 take on the values 0, 1, or 2. numTime1 and numTime2 are quantitative variables. This analysis (or one quite similar) was carried out using the MLR estimator.

Consider the regression on numTime2. In its role as a predictor in this regression, is catTime12 treated as a numeric variable – in other words, is it treated in the same way as catTime01 is treated in the regression on numTime1. Or, on the other hand, is it treated as some kind of latent variable (say as a logistically distributed variable with its variance defined in some particular way)? This comes down to how I interpret the coefficient for catTime12 – if catTime12 is treated as numeric then the interpretation is: as catTime12 increases by 1 unit – either from 0 to 1 or from 1 to 2 – then numTime2 changes by X units. But if it is treated as a latent variable then I need assistance with interpretation.

I apologize for my confusion. I posted a similar question under the categorical mediator thread on October 25, 2008 and received a response but thought I should double check this. See also March 25, 2008 responses on this same thread.


Jim Rosenthal
 Linda K. Muthen posted on Tuesday, May 18, 2010 - 9:07 am
I see no regression where numTime2 is a covariate. Please restate your question.
 james rosenthal posted on Tuesday, May 18, 2010 - 3:27 pm

Sorry if I am not being clear.

Suppose the following:

One has several regressions in their model.

One uses the MLR estimator.

Variable Z takes on the values of 0 or 1.

Z is defined to be categorical.

In one regression, Y is regressed on Z.

In this regression, the regression coefficient for Z is .5

Would the interpretation of this coefficient be: when Z has the value of 1, the predicted value of Y is .5 units higher than when Z has the value of 0? (In other words, in this regression, is Z treated as a numeric variable?)


 Linda K. Muthen posted on Wednesday, May 19, 2010 - 11:15 am
Is z a mediator or an exogenous variable? I think it is a mediator because you say it is defined as categorical but I want to be certain.
 james rosenthal posted on Friday, May 21, 2010 - 11:27 am

I am interested both when Z is purely exogenous and when it is a mediator.

In my actual model, which models several waves of data, Z takes on both of these roles. Even though Z is exogenous in the first wave of data, I nevertheless defined it as categorical. (I didn't think it would do any harm to define a purely exogenous variable as categorical, but tell me if I am wrong.) My actual model is much like the following, where Z1 represents Z at time 1 and Z2 does so at time 2 (and Q1 and Q2 are continuous variables at times 1 and 2):

Use variables are:
Z1 Z2
Q1 Q2;
categorical are Z1 Z2;
Q1 on Z1;
Q2 on Q1 Z2;
Z2 on Q1 Z1;

Thank-you again for your help.

 Linda K. Muthen posted on Saturday, May 22, 2010 - 8:54 am
In regression, covariates can be binary or continuous. In both cases, they are treated as continuous. The binary variable z1 is exogenous. The regression coefficient is the difference in going from category 0 to category 1. Note that z1 is not a mediator and should not be on the categorical list. The binary covariate z2 is a mediator. It is treated as a continuous observed variable using maximum likelihood estimation and as a continuous latent response variable using weighted least squares estimation.
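
The statement that the coefficient of a binary covariate is the difference in going from category 0 to category 1 can be verified with a small least-squares example. This is a Python sketch with made-up data. The variable and function names are hypothetical.

```python
def ols_slope(x, y):
    """Slope from a simple least-squares regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx

def mean_difference(z, y):
    """Mean of y where z == 1 minus mean of y where z == 0."""
    ones = [yi for zi, yi in zip(z, y) if zi == 1]
    zeros = [yi for zi, yi in zip(z, y) if zi == 0]
    return sum(ones) / len(ones) - sum(zeros) / len(zeros)

# Made-up data: z is a 0/1 covariate treated as a number, q is continuous.
z = [0, 0, 0, 1, 1, 1, 1]
q = [2.1, 1.8, 2.4, 2.9, 3.1, 2.6, 3.3]

print(ols_slope(z, q))
print(mean_difference(z, q))
```

The two printed values agree up to rounding error, which is why treating a 0/1 covariate as continuous does not change the interpretation of its slope.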
 gibbon lab posted on Tuesday, April 12, 2011 - 11:07 am
Hi Dr. Muthen,

If I have such a path analysis:
z on y;
y on x;
where z and x are continuous and y is a binary variable (0/1). I declared that y is categorical and used the weighted least squares estimation under THETA parameterization. I am not sure how I should interpret the indirect effect from x to z. Suppose the Mplus output provides a beta coefficient=0.05 for the indirect path x->y->z. Is the 0.05 interpreted as the amount of increase in y if x increases by 1 unit? Thanks.
 gibbon lab posted on Tuesday, April 12, 2011 - 11:10 am
Sorry. There is a typo in my last sentence in the above post. It should be "Is the 0.05 interpreted as the amount of increase in z if x increases by 1 unit?" Thanks.
 Linda K. Muthen posted on Tuesday, April 12, 2011 - 1:39 pm
This sounds correct.
 gibbon lab posted on Thursday, April 21, 2011 - 11:22 am
Dr. Muthen,

Can you help me understand the interpretation of the indirect effect thru the two direct paths? Here is my thought, if the beta coef=0.5 for the first path x->y. Under THETA parameterization, the link is probit. So the interpretation is that the probability of y=1 increases by 0.5 under probit scale when x increases by 1 unit.

Now if the coef=0.1 for the second path y->z, the interpretation would be that z increases by 0.1 if y changes from 0 to 1 since this is just regular linear regression.

Do we get the beta coef for the indirect path as 0.5*0.1=0.05? It looks strange to me because x only affects the probability of y=1, and the effect of y on z (0.1) has nothing to do with that probability. Thanks.
 Bengt O. Muthen posted on Thursday, April 21, 2011 - 1:09 pm
I think y is your mediator, and it is binary. With WLSMV you then have a probit regression of y on x. This is a linear regression of y* (the continuous latent response variable) on x. Then you have y->z, where z is continuous, which is translated to z on y* in Mplus, also a linear regression. Note that it is not z on y.

To answer your first paragraph, it is not the probability that increases 0.5 but the probit, that is, y*. That then has to be translated into a probability change.

To answer your second paragraph, it is not a matter of y changing from 0 to 1, but y* changing.

The indirect effect is 0.5*0.1 but interpreted as above.

You should also read:

MacKinnon, D.P., Lockwood, C.M., Brown, C.H., Wang, W., & Hoffman, J.M. (2007). The intermediate endpoint effect in logistic and probit regression. Clinical Trials, 4, 499-513.

which is on our web site under Papers, Mediational Modeling.
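
The arithmetic in this reply can be sketched in Python. The slopes 0.5 and 0.1 are the ones from the question. The threshold value is hypothetical, and the residual of y* is assumed to be standard normal, so the example only illustrates how a probit change translates into a probability change.

```python
import math

def normal_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def prob_y_is_1(x, a, threshold):
    """P(y = 1 | x) when y* = a * x + e, e ~ N(0, 1), and y = 1 if y* > threshold."""
    return normal_cdf(a * x - threshold)

a = 0.5          # slope of y* on x (probit scale), from the question
b = 0.1          # slope of z on y* (linear), from the question
threshold = 0.0  # hypothetical threshold for y

# One more unit of x raises y* by a, and hence z by a * b.
indirect = a * b
print(f"indirect effect of x on z through y*: {indirect:.2f}")

# The same shift of 0.5 in y* gives different probability changes.
for x in (0.0, 2.0):
    change = prob_y_is_1(x + 1.0, a, threshold) - prob_y_is_1(x, a, threshold)
    print(f"x from {x} to {x + 1.0}: change in P(y = 1) = {change:.3f}")
```

The product a*b is constant in x because both regressions are linear in y*, whereas the change in P(y = 1) depends on where x starts.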
 Xu, Man posted on Monday, January 16, 2012 - 10:45 am
Dear Dr. Muthen,

Thank you so much for your recent help on TSCORE and missing data. It has been very helpful to me! I have now resolved my analysis regarding these issues. However, a new dilemma arises when I try to look at the mediation effect of an ordinal variable on the intercept and slope parameters of the second-order growth curve model. This growth curve was, as you know, already fitted using the TSCORE function. Now I have started looking at how predictors earlier in life can predict cognitive decline as modelled with the growth curve. This involves mediating variables, and one of them is ordinal (educational attainment).

I noticed that in the model output that regression paths of the earliest variables to this ordinal mediator had logit links, and there were linear regression paths from this ordinal mediator to the later outcomes (the slope and the intercept of the growth curve).

I don't get the mediation effect from the output as it was noted that MODEL INDIRECT is not possible in the case of TYPE=RANDOM.

I was just wondering, how do you think I should specify the model in order to calculate the mediation effects here?

Here are some estimation specifications that the program reported:

The estimator shown in the output was MLR. The link was logit.

Thanks a lot!

Best wishes,

 Linda K. Muthen posted on Wednesday, January 18, 2012 - 3:55 pm
You can treat your mediator as continuous and use MODEL CONSTRAINT to create the indirect effect as the product of the two linear regression coefficients. For another option, see the following paper which is available on the website:

Muthén, B. (2011). Applications of causally defined direct and indirect effects in mediation analysis using SEM in Mplus.
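
The product-of-coefficients calculation can be sketched outside Mplus as follows. The estimates and standard errors are hypothetical. The standard error uses the first-order delta method (Sobel) formula and, as a simplifying assumption, ignores any covariance between the two estimates.

```python
import math

def indirect_effect(a, b):
    """Indirect effect as the product of the two regression slopes."""
    return a * b

def delta_method_se(a, se_a, b, se_b):
    """First-order delta-method (Sobel) standard error of a * b.

    Assumes the estimates of a and b are uncorrelated.
    """
    return math.sqrt(b ** 2 * se_a ** 2 + a ** 2 * se_b ** 2)

# Hypothetical estimates: a = predictor -> mediator, b = mediator -> growth factor.
a, se_a = 0.50, 0.10
b, se_b = 0.20, 0.05

ab = indirect_effect(a, b)
se_ab = delta_method_se(a, se_a, b, se_b)
print(f"indirect = {ab:.3f}, SE = {se_ab:.4f}, z = {ab / se_ab:.2f}")
```

Because the sampling distribution of a product is usually not normal, a z-ratio from this formula is only a rough guide.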
 Xu, Man posted on Saturday, January 21, 2012 - 7:08 am
Thank you! I will first try to have a read at this paper.


 Jan Zirk posted on Monday, July 16, 2012 - 8:26 am
Dear Linda or Bengt,
I have two competing models:
M1) P on S S on G G on F W on F
M2) P on S S on G F on G W on F
G is a binary group variable.
First I estimated the two models with MLR and found that model 1, in which G is defined as categorical, performs better on AIC than model 2, in which G is not defined as a categorical variable (because there G is only an independent binary variable).
Next I estimated them with the Bayesian approach, and again model 1 fits better (M1: PPP=.125 vs. M2: PPP=.000), but the two models differ in the number of free parameters (M1: 11, M2: 12).
Because of the categorical mediator in M1, DIC is not available for this model.
Is a comparison of these two models based only on 95%CI_chi2 and PPP correct, even though they have different numbers of free parameters?
 Linda K. Muthen posted on Tuesday, July 17, 2012 - 11:17 am
The variable g is dependent in one model and independent in the other. As a result the scales of the AIC's are different.
 Jan Zirk posted on Tuesday, July 17, 2012 - 11:55 am
Actually, I meant to report only the Bayesian analyses in the paper because of the small n. Hence my question: is a comparison of PPP enough to prefer model 1 over model 2?
 Linda K. Muthen posted on Tuesday, July 17, 2012 - 6:06 pm
You can compare two non-nested models using PPP. However, there is no way to test if the two PPP's are significantly different from each other.
 Jan Zirk posted on Tuesday, July 17, 2012 - 6:23 pm
Ooh, I see; that is great :-) Thanks very much, this is what I needed.

All the best,
 Sarah Whittle posted on Saturday, January 05, 2013 - 3:56 pm

I apologise if this is a silly question, but I am fairly new to MPlus.

I am running a mediation model with a binary mediator variable (and 3 covariates, C1-C3).

My syntax is as follows:

bootstrap = 5000;
estimator = WLSMV;
M on X C1 C2 C3;
Y on X M C1 C2 C3;

Running this, the M on X effect is highly non-significant.

However, I know that M and X are significantly related. That is, if I just run the regression:

M on X C1 C2 C3;

the M on X effect is highly significant.

How is this possible??

 Bengt O. Muthen posted on Saturday, January 05, 2013 - 4:33 pm
Perhaps you get different sample sizes in the two analyses. You should get the same result running the model with M as the only DV or with M and Y since the latter model is just-identified.
 Sarah Whittle posted on Saturday, January 05, 2013 - 6:19 pm
Thanks for your quick reply!

Both models have exactly the same sample size.

I think I may have figured out the cause of the problem (maybe). My x variable is highly skewed. When I converted it to a binary variable and re-ran both analyses (above) I get the same results (i.e., M on X significant in both cases).

Do you think skewness was the problem here?

 Bengt O. Muthen posted on Saturday, January 05, 2013 - 9:31 pm
To see what's happening, you need to send both of your original 2 runs to
 Sarah Whittle posted on Saturday, January 05, 2013 - 10:28 pm
Will do. Thanks.
 Selahadin Ibrahim posted on Tuesday, January 08, 2013 - 1:47 pm
Hi Dr. Muthen,

I have a question about some unusual results I observed when modeling multiple mediators.

My exposure is a five-category ordinal variable (age), my outcome is a binary variable (claim status), and my two mediators are binary (diabetes and depression).

I get the exact same indirect estimates for each mediator when I include them in one model (multiple mediator model) as I do when I put each mediator in a separate model (two single mediator models).

I would have expected the estimates to be different. Do you find this result unusual? If not, could you help me understand why these indirect estimates are consistent across these two models?

Thanks for considering this question,
 Bengt O. Muthen posted on Tuesday, January 08, 2013 - 3:13 pm
Is your two-mediator model unrestricted (zero df); are your mediators uncorrelated?
 Selahadin Ibrahim posted on Wednesday, January 09, 2013 - 5:23 am
Thanks Bengt. The df of the multiple mediator model is one. we did not correlate the errors of the mediators.
 Bengt O. Muthen posted on Wednesday, January 09, 2013 - 2:30 pm
How does the comparison come out when you correlate the errors?
 Selahadin Ibrahim posted on Thursday, January 10, 2013 - 6:29 am
Thanks Bengt. The results are identical when I correlate the errors.
 David posted on Saturday, March 29, 2014 - 3:33 am
Dear Dr. Muthen,

I have a model with two binary response variables, one of them acting as mediator. Suppose we have

Y1 on Y2 X
Y2 on X

where X is centered at its mean.

Since I have used WLSMV with THETA parameterization, the coefficients are on the probit scale, but I would like to calculate and compare the probability of Y1 when Y2 (instead of Y2*) goes from 0 to 1. Is there a method for obtaining such probabilities? I have thought of averaging Y1 probabilities over Y2* < threshold for Y2=0 and over Y2* > threshold for Y2=1. Then such averaged probabilities could be compared. Is this OK, or should I do it in a different manner?

Thank you,
 Bengt O. Muthen posted on Sunday, March 30, 2014 - 4:53 pm
This would be calculated using a bivariate normal distribution function, and it is not directly available via Mplus output. I don't see how you would get numbers to be able to average over Y2* being lower/higher than the threshold. The reason for the complication is that your modeling does not center on Y2 but Y2*. For probability-oriented mediation modeling, see the case of a binary mediator and a binary outcome discussed in the paper on our website:

Muthén, B. (2011). Applications of causally defined direct and indirect effects in mediation analysis using SEM in Mplus.
 David posted on Wednesday, April 02, 2014 - 12:58 am
Thank you for your kind response,

I decided to use the bivariate normal distribution function obtained as:

f(y1*|y2*)f(y2*) = f(y1*,y2*)

My model can be represented as:

y1* = k1 + by2* + e1
y2* = k2 + e2

Where e1 and e2 are both Normal(0,1) and uncorrelated. Constants k1, k2 are just the linear predictors (excluding the endogenous variable y2* in the first equation) evaluated at the mean of exogenous covariates.

Once I got the joint distribution f(y1*,y2*) I integrated it from threshold1-k1-by2* to infinity with respect to y1* and then with respect to y2* from threshold-k2 to infinity for obtaining P(Y1=1,Y2=1)

Does this make sense to you?

I have also looked at the 2011 paper which I find interesting but would need more time to study it more in depth.

Thank you very much,
 Bengt O. Muthen posted on Wednesday, April 02, 2014 - 5:03 pm
This makes sense. It sounds like you have the technical background to do this.
 David posted on Wednesday, April 02, 2014 - 8:57 pm
Dear Dr. Muthen,

Thanks for the support, this discussion board has been of great help.
 David posted on Sunday, April 06, 2014 - 8:53 pm
Dear Dr. Muthen,

I have noticed that if f(y1*,y2*) is used for obtaining probabilities, the limits of integration should include the thresholds but not the linear predictor part (i.e. k1+by2* in the y1* equation, k2 in the y2* equation).

It seems to me that this is so because y1* and y2* when transformed (from the bivariate vector e1,e2 to y1*,y2*) have been already centered at k1+by2* and at k2, respectively. If the standard bivariate normal distribution were used then the integration limits that included the linear predictor part gave the same computed probabilities.
 Bengt O. Muthen posted on Monday, April 07, 2014 - 4:35 pm
If you are working with f(y1*,y2*) you just get the means, variances, and covariance for these 2 variables as estimated by the model (taking X into account) and get the bivariate probabilities using these 4 quantities together with the 2 thresholds.
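
The calculation described here can be sketched in Python with simple numerical integration. The function takes the two means, two variances, the covariance, and the two thresholds. The moments are filled in from the model discussed above (y1* = k1 + b*y2* + e1, y2* = k2 + e2, with uncorrelated N(0,1) errors), and the numeric values of k1, k2, b, and the thresholds are hypothetical.

```python
import math

def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def prob_both_above(mean1, mean2, var1, var2, cov12, thr1, thr2, steps=4000):
    """P(y1* > thr1, y2* > thr2) for a bivariate normal (y1*, y2*).

    Integrates over y2* with the midpoint rule, using the conditional
    normal distribution of y1* given y2*.
    """
    sd2 = math.sqrt(var2)
    slope = cov12 / var2
    cond_sd = math.sqrt(var1 - cov12 ** 2 / var2)
    upper = mean2 + 10.0 * sd2          # stands in for infinity
    if thr2 >= upper:
        return 0.0
    width = (upper - thr2) / steps
    total = 0.0
    for i in range(steps):
        y2 = thr2 + (i + 0.5) * width
        density = normal_pdf((y2 - mean2) / sd2) / sd2
        cond_mean = mean1 + slope * (y2 - mean2)
        tail = 1.0 - normal_cdf((thr1 - cond_mean) / cond_sd)
        total += density * tail * width
    return total

# Hypothetical values for the linear predictors, slope, and thresholds.
k1, k2, b = 0.2, -0.1, 0.6
thr1, thr2 = 0.3, -0.1

p11 = prob_both_above(mean1=k1 + b * k2, mean2=k2,
                      var1=b ** 2 + 1.0, var2=1.0, cov12=b,
                      thr1=thr1, thr2=thr2)
print(f"P(Y1 = 1, Y2 = 1) = {p11:.4f}")
```

Because the means already carry the linear predictors, the integration limits are the thresholds themselves, which matches the observation in the preceding post.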
 David posted on Thursday, April 10, 2014 - 3:33 pm
Dear Dr Muthen,

I randomly generated two correlated errors (r=-0.5). Then I also generated an extra uniform variable "u". The latent variables y1*, y2* were generated as:

y1* = 0.6y2* + e1
y2* = 0.8u + e2

I also generated the binary variables with arbitrary thresholds:

y1 = 1 when y1*>0.3 and zero otherwise
y2 = 1 when y2*>-0.1 and zero otherwise

Then I ran the model in Mplus:

y1 y2;
y1 ON y2;
y2 ON u;
y1 WITH y2;

I was able to reproduce the theoretical model very closely, as indicated by the estimated equations. Also, the calculated probabilities reproduced the raw probabilities in the data remarkably well.

The question I have relates to reporting probabilities without the endogeneity induced by the correlated errors. When errors in this model are negatively correlated, not specifying correlated errors results in a dramatically biased coefficient of y2. In some instances, the bias can yield a significant coefficient with the opposite sign.

Do you have any suggestion for reporting the effect of y2 on y1 without the endogeneity problem but using probabilities instead of the probit scaled coefficient from the structural equations?
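
The bias described above can be reproduced with a small simulation. This Python sketch follows the data-generating steps in the post and assumes u is uniform on [0, 1), which the post does not specify. It regresses y1* on y2* by ordinary least squares on the latent scores, which is not how Mplus estimates the model and serves only to show the direction and size of the bias when the error correlation is ignored.

```python
import math
import random

def simulate_latent(n, rho=-0.5, seed=1):
    """Draw (y1*, y2*) pairs: y2* = 0.8u + e2, y1* = 0.6 y2* + e1."""
    rng = random.Random(seed)
    y1_star, y2_star = [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        e2 = z2
        e1 = rho * z2 + math.sqrt(1.0 - rho ** 2) * z1  # corr(e1, e2) = rho
        u = rng.random()                # assumption: uniform on [0, 1)
        s2 = 0.8 * u + e2
        y2_star.append(s2)
        y1_star.append(0.6 * s2 + e1)
    return y1_star, y2_star

def ols_slope(x, y):
    """Slope from a simple least-squares regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def dichotomize(values, threshold):
    """Binary variable: 1 when the latent score exceeds the threshold."""
    return [1 if v > threshold else 0 for v in values]

y1_star, y2_star = simulate_latent(20000)
y1 = dichotomize(y1_star, 0.3)
y2 = dichotomize(y2_star, -0.1)

# With Var(u) = 1/12, the population value of the naive slope is
# 0.6 + rho / (1 + 0.64 / 12), far below the true 0.6 when rho = -0.5.
print(f"naive slope of y1* on y2*: {ols_slope(y2_star, y1_star):.3f}")
print(f"share with y1 = 1: {sum(y1) / len(y1):.3f}")
print(f"share with y2 = 1: {sum(y2) / len(y2):.3f}")
```

Setting rho to 0 in the same function brings the naive slope back near 0.6, which isolates the correlated errors as the source of the bias.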

 Bengt O. Muthen posted on Friday, April 11, 2014 - 11:24 am
Are you saying that you don't get any identification warning when you say:

y1 y2;
y1 ON y2;
y2 ON u;
y1 WITH y2;

In a typical situation where y1 is also regressed on u, the residual covariance y1 WITH Y2 makes the model non-identified. Instrumental variable estimation could be tried instead to avoid bias.

Regarding reporting the effect of y2 on y1 using probabilities, I would recommend what I write about in my 2011 paper.
 David posted on Tuesday, April 15, 2014 - 9:23 pm
Dear Dr. Muthen,

The model is just identified. I was using "u" as an instrument, so I did not include it in the y1 equation.

I will get into the 2011 paper,
 Leakhena Heng posted on Monday, February 23, 2015 - 8:25 pm
Hello all,

I am new to MPLUS and need some help figuring out what type of analyses/model to run in MPLUS.

I have three waves of data.

w2ipvg2, categorical variable with two level (IPV victimization experiences wave 2: yes and no)
w3_ipvg2, categorical variable with four levels (IPV experiences wave 3: none, victimization, perpetration, reciprocal)
depw3, continuous variable (depression wave 3)
subtuse, continuous variable (substance use wave 3)
edupath, ordinal variable with six levels (educational paths wave 4: no high school, high school, vocational training, some college, bachelor's degree, graduate degree)
empp1, ordinal variable with three levels (employment paths wave 4: unemployed, part time employed, full time employed)

The wave 3 variables: w3_ipvg2, depw3, and subtuse are mediators. Some of my categorical variables are ordinal and some are not. What type of analyses/model should I use, and where can I find the syntax for it. Is it possible to run something like this in MPLUS? Are there articles I can read to help me understand this type of modeling?

Thanks for any help.
 Bengt O. Muthen posted on Tuesday, February 24, 2015 - 7:44 am
Mplus can analyze longitudinal data on repeated measures in many different ways and Mplus can do mediation modeling. But it is not clear how you intend to relate your repeated measures over time, or how the mediation relates to an ultimate outcome. You may want to ask this type of general research strategy question on SEMNET.
 Leakhena Heng posted on Tuesday, February 24, 2015 - 9:44 am
Thank you, Dr. Muthen. I will search/ask on SEMNET.
 julie schiro posted on Sunday, December 06, 2015 - 4:37 pm
I'm running a mediation model where X and Y are continuous and M is dichotomous (-1,1).

m on x; y on x m;
y via m x;

I ran this model twice, once without specifying that M is categorical and once specifying: CATEGORICAL ARE m. This change results in different p-values for parameters in the model Y on X M. Why? Isn't Y on X M equivalent to an ANOVA where x is continuous and m is categorical, i.e., y = b0 + b1(x) + b2(m)? If so, then why are the p-values for X and M different depending on the "categorical" specification?

M treated as continuous:

M ON X
Estimate   S.E.    Est./S.E.   P-Value
0.229      0.041   5.646       0.000

Y ON X M
Estimate   S.E.    Est./S.E.   P-Value
0.132      0.075   1.760       0.078
0.285      0.103   2.781       0.005

M treated as dichotomous:

M ON X (makes sense)
Estimate   S.E.    Est./S.E.   P-Value
0.332      0.075   4.398       0.000

Y ON X M (does not make sense)
Estimate   S.E.    Est./S.E.   P-Value
0.084      0.085   0.987       0.324
0.340      0.123   2.771       0.006
 julie schiro posted on Sunday, December 06, 2015 - 5:26 pm
RE my last post, I think I figured it out, but would you confirm? When m is treated as continuous, the estimator is ML. When m is treated as categorical, the estimator is WLSMV. When I tried to constrain this model to ML, it would not estimate the indirect path:

bootstrap = 5000;
m on x; y on x m;
y via m x;

Hence, I'm assuming that I need to use WLSMV to estimate an indirect path where m is categorical. This also explains why a, b, and c' are different.

A follow-up question, then: when I report the parameters in my paper, do I need to report the WLSMV estimates for the a, b, and c' paths, or can I report the ML estimates which will be easier for me to explain? The ab path would still use WLSMV. Is this proper to do?
 Bengt O. Muthen posted on Sunday, December 06, 2015 - 6:12 pm
WLSMV treats the mediator as the continuous latent response variable M* behind the observed binary mediator M. That affects the regression of the outcome on the mediator in that WLSMV regresses Y ON M*, whereas the ML run you did regresses Y ON M.

Although WLSMV produces defensible effects, a better way is to use the new ML features referred to as counterfactually-defined causal effects. When you use ML together with

Model Indirect:


and declare M as categorical you will get these causal effects. The indirect effect is not as simple as the usual product of a and b. The description of this, including how to use Mplus, is given in the paper on our website:

Muthén, B. & Asparouhov, T. (2015). Causal effects in mediation modeling: An introduction with applications to latent variables. Structural Equation Modeling: A Multidisciplinary Journal, 22(1), 12-23. DOI:10.1080/10705511.2014.935843
 julie schiro posted on Sunday, December 06, 2015 - 6:39 pm
Thank you for the prompt reply!! I look forward to reading the paper. Thank you.
 Bengt O. Muthen posted on Monday, December 07, 2015 - 12:47 pm
I should have said that the causal effects are produced when saying

 Anonymous posted on Monday, February 15, 2016 - 1:26 am
Dear Profs. Muthén,
I specified a 2-1-1 MSEM with a L2(YEAR) predictor (continuous), a L1 mediator (ordinal) and a L1 outcome (continuous). I used cross-classified as the data are clustered within persons and years.
Problems I ran into:
(a) When using M as cont. and L1 controls only for Y, PPC was sig. (bad model fit).
(b) When using M as cont. and L1 controls for Y AND M the PPC value was 0.158, so the model fit is ok. A strange result, because as a predictor on L1 M doesn't need controls.
(c) When using M as cat. without L1 controls, Mplus used M as DV and PPC was sig. (bad model fit).
All models yield quite similar parameter est., only model fit varies.
My questions:
(1) Is it correct that I have a latent mediator, because I analyze the mediator effect only on L2(YEAR)? This should allow for unbiased indirect/direct effects, even though M is categorical on L1?
(2) Cross-classified works with Bayes. Is it ok (in general) to use a categorical mediator and to define it as such? Is there something special to consider because of Bayes?
(3) I'm confused about the excellent model fit when I use the controls for M on L1 (which makes M an IV and a DV) despite treating M as continuous. Actually, M is ordinal...?!
(4) Do you see any problems regarding the model?
Thank you very much in advance!
 Anonymous posted on Monday, February 15, 2016 - 1:28 am
My model:
variable: usevar = Y ! outcome
C1 C2 C3 ! controls L1
M ! mediator
C4 ! control L2(YEAR)
C5 ! control L2(PERSON)
X; ! predictor L2(YEAR)
categorical = M;
cluster = PERSON YEAR;
within = C1 C2 C3;
between = (YEAR) C4 X;
between = (PERSON) C5;


analysis: type = crossclassified;
estimator = Bayes;
biterations = (2000);

model: %within%
Y M on C1 C2 C3;
Y M;
Y on M; ! 2-1-1 MSEM

%between PERSON%
Y M;
Y M on C5;
Y on M; ! 2-1-1 MSEM

%between YEAR%
Y M X;
Y M on C4;
M on X (a); ! 2-1-1 MSEM
Y on M (b) X; ! 2-1-1 MSEM

 Linda K. Muthen posted on Monday, February 15, 2016 - 10:08 am
We ask that posts on Mplus Discussion be limited to one window. Please send your question and related materials along with your license number to
 Mihaela Johnson posted on Monday, August 28, 2017 - 8:41 am
Hi Dr. Muthén,
I read this thread and had a question about how to model a categorical mediator.

I am trying to run a model with a categorical independent variable (age, 4 categories), two mediators (a 4-category ordinal variable and a continuous/interval variable), and a continuous/interval dependent variable, along with a couple of control variables.

Does SEM accommodate this type of model? And what assumptions are made about the individual paths (e.g. does it assume that the categorical mediator is an indicator of an underlying continuous latent variable?)

Thank you!
 Bengt O. Muthen posted on Monday, August 28, 2017 - 4:40 pm
Yes, SEM accommodates this. Use WLSMV (or Bayes), which, as you say, considers linear relations between underlying continuous latent response variables, so that the regular indirect effect formulas are used.
 Zsófia Csajbók posted on Wednesday, November 14, 2018 - 7:26 am
Dear Dr. Muthen,

I have a continuous independent and a continuous dependent variable, with lots of continuous and binary mediators.

x -> b1-b4 c1-c4 -> y

where b1-b4 are binary; x, y, c1-c4 are continuous

I understood based on the User's Guide that I could choose between ML and WLSMV. But if I specify ML, I get a message that I should 'specify PARAMETERIZATION=RESCOV' then 'type=mixed'. Interestingly, if I don't specify anything with 'estimator=ML', it runs. I thought the baseline estimator is ML, so what is it if not ML? If I choose WLSMV it runs, but then I don't know if it differentiates between the binary and continuous variables.

So my questions are these:
1) When I have both binary and continuous mediators, do I really have to choose WLSMV? Can it differentiate between the categorical and other mediators and perform different regressions? Why can't I choose logistic regression for the binary variables? (I don't think I should assume continuous latent responses underlying my variables, e.g. sex, so logistic sounds more reasonable.)

2) What does it do when I don't specify the estimator and it runs?

3) Why do the non-standardized model results give higher than 1 correlations between the mediators?

Thank you!
 Bengt O. Muthen posted on Wednesday, November 14, 2018 - 5:01 pm
1) With a categorical M, WLSMV and Bayes consider a continuous latent response variable M*, not M, as the mediator. In this way, all regressions are linear and regular indirect effects can be computed because M* is the predictor of Y. Using logistic regression instead of probit does not make a difference; the mediator is still M*. With ML, M is the predictor of Y. Choosing M or M* as the predictor is a substantive choice.

2) See which estimator is used in the Summary section of the output. Typically with some categorical variables, Mplus defaults to WLSMV.

3) That suggests a misspecified model. If you like, you can send your output and data to Support along with your license number so we can diagnose.
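
The point in 1) that Y ON M* and Y ON M are different regressions can be checked with a small simulation. This is a minimal Python sketch using only the standard library, not Mplus code. The data-generating model, sample size, seed, and true slope are all assumptions for illustration, and M* is available here only because the data are simulated.

```python
import random


def ols_slope(xs, ys):
    """Slope of the simple least-squares regression of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


rng = random.Random(12345)  # fixed seed so the run is repeatable
TRUE_B = 1.0                # assumed slope of Y on the latent M*
N = 20000

# Latent response M*, the observed binary M (threshold at 0), and outcome Y.
m_star = [rng.gauss(0.0, 1.0) for _ in range(N)]
m = [1.0 if value > 0.0 else 0.0 for value in m_star]
y = [TRUE_B * value + rng.gauss(0.0, 1.0) for value in m_star]

slope_on_mstar = ols_slope(m_star, y)  # analogue of regressing Y ON M*
slope_on_m = ols_slope(m, y)           # analogue of regressing Y ON M

print(f"slope of Y on M*: {slope_on_mstar:.2f}")
print(f"slope of Y on M:  {slope_on_m:.2f}")
```

The slope on M* recovers the assumed value of 1. The slope on the binary M is the difference in mean Y between the two groups, which in this setup is roughly 1.6. Neither number is wrong; they answer different questions, which is why the choice between M and M* is a substantive one.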