Two-part modeling and SEM
Mplus Discussion > Structural Equation Modeling
 Sanjoy Bhattacharjee posted on Wednesday, October 24, 2007 - 2:59 pm
Dear Prof. Muthen,

I have a cross-sectional dataset and am fitting the following standard SEM, where Y1 is binary, Y2-Y7 are ordered categorical indicators, the X's are strictly exogenous variables, and the etas are continuous latent variables.

Y1 = g(eta1)
Y2, Y3, Y4 = g(eta2)
Y5, Y6, Y7 = g(eta3)
Eta1 = f(eta2, eta3, X1);
Eta2 = f(X2);
Eta3 = f(X3);

I am using Mplus 4.0 and the estimation results look fine. However, I want to modify the model by introducing a two-part rationale of the Duan-Manning type. In particular, we have the following situation:

Y0, a binary variable, indicates whether or not an individual participates in a program. Only when an individual says yes to the participation question do we ask the question that generates Y1.
It is likely that the latent variables eta2 and eta3 influence Y0 as well.

How could I do this in Mplus? Is there a theoretical paper that introduces a factor-analytic/SEM rationale into a Duan-Manning type setup?

Thanks and regards
 Bengt O. Muthen posted on Thursday, October 25, 2007 - 9:17 am
I think Duan-Manning worked in a regression setting, and there I think the 2 parts of the model can be estimated separately. The reason is that you can't identify a random effect correlation between them the way you can in a hierarchical data setting, such as the longitudinal data analysis considered in Olsen & Schafer's JASA article on 2-part growth modeling.

Thinking out loud, it seems the question then is if the SEM setting changes the independence of the 2 parts. I guess if Y0 is influenced by the same etas as Y1, then their factor loadings should be held equal across the 2 parts and therefore a simultaneous analysis of the 2 parts should be done. I guess this calls for doing 2-part modeling - see the Mplus DATA TWOPART option. I haven't seen such an application. We have a Kim & Muthen paper on our web site which does 2-part mixture factor analysis but there again you have hierarchical data in that you have multiple indicators - you have only one Y1 - so that is different.
 Sanjoy Bhattacharjee posted on Thursday, October 25, 2007 - 12:31 pm
Thank you Prof. Muthen. I just read your working paper on “two-part factor mixture model”. It is really helpful and I believe this one should help to estimate my model. Besides, my model is simpler than yours. I am not dealing with the mixture portion.

Following is my model statement, where Y0 and Y1 are binary, Y2-Y7 are ordered categorical, and X0, X1, X2 and X3 are vectors of exogenous variables. Each of the X's has the unique elements necessary for the model to be estimated.


ETA2 BY Y2 Y3 Y4;
ETA3 BY Y5 Y6 Y7;

(Now adding the two part rationale where Y0 indicates response to the participation question: 1=yes and 0 otherwise and Y1 is valid only if Y0 is 1)


Q1: What should I write in the model section now to make Y1 "conditioned on" Y0? On page 11 of your article you wrote "correlated". I might be confused, but I don't think we are doing simple correlation.

Q2: A reviewer will ask for the likelihood function, or the final set of equations that Mplus uses for the model, when we combine a regular SEM with Y0 in a Duan-Manning type framework. Is there any reference for that?

Thanks and regards
 Bengt O. Muthen posted on Friday, October 26, 2007 - 10:03 am
Q1. You need to create the data in line with "DATA TWOPART" described in the User's Guide. This specifies missing on y1 when y0=0. There is no correlation that can be estimated here.

Q2. A related likelihood function is given in the Olsen-Schafer article.
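
For readers following along, the data arrangement described in Q1 can be sketched outside Mplus. The Python below is only an illustration under stated assumptions, not Mplus syntax or the Mplus implementation: the function names, the use of None as the missing flag, and the example values are all hypothetical. The first function mimics the usual two-part split of a single semicontinuous variable (binary part plus a log-transformed positive part); the second covers the case in this thread, where Y1 is simply set to missing whenever Y0 = 0.

```python
import math

MISSING = None  # stand-in for a missing-value flag; the flag itself is an assumption


def two_part_split(y, cutpoint=0.0, log_transform=True):
    """Split one semicontinuous value into (binary part, continuous part)."""
    if y is None:
        return MISSING, MISSING      # missing on y stays missing on both parts
    if y <= cutpoint:
        return 0, MISSING            # at or below the cutpoint: continuous part is missing
    return 1, (math.log(y) if log_transform else y)


def condition_on_participation(y0, y1):
    """Set Y1 to missing whenever the participation indicator Y0 is 0."""
    return y0, (y1 if y0 == 1 else MISSING)


# Hypothetical raw values: two zeros and two positive amounts.
rows = [0.0, 2.5, 0.0, 10.0]
split = [two_part_split(y) for y in rows]
```

The point of the arrangement is that no correlation between the two parts is introduced by the data step itself; the second part is just absent for cases where the first part is zero.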
 Sanjoy Bhattacharjee posted on Monday, October 29, 2007 - 9:18 am
Thank you professor. Regards, Sanjoy
 Moira Haller posted on Saturday, October 15, 2011 - 3:53 pm
I want to simultaneously examine trauma exposure (binary) and PTSD symptoms as mediators of several covariate effects on an outcome variable. Rather than model trauma and PTSD symptoms separately, I want to examine them within the same model so that I can identify the effect of PTSD symptoms on the outcome, over and above the influence of trauma exposure. However, PTSD symptoms are conditioned on trauma exposure (PTSD symptoms are only valid if trauma=1). Is it sufficient to code PTSD symptoms as missing for cases with trauma=0? Also, because trauma exposure and PTSD symptoms may share predictors other than the covariates, I believe I need to specify the residual covariance between them. Is the syntax that I’ve used below an appropriate way to do this? I was also considering specifying the model using the TWOPART feature as you described above. Any advice on how to handle this issue would be much appreciated!

Categorical is trauma;
Missing are ALL (-99);

ptsdsx on gen advers ethinc pathol;
trauma on gen advers ethinc pathol;
y on trauma ptsdsx gen advers pathol ;

f1 BY ptsd trauma;
f1@1; [f1@0];
 Bengt O. Muthen posted on Saturday, October 15, 2011 - 6:07 pm
I think a Twopart approach is a bit more transparent than working with missing data approaches in this case. The two parts can have different predictors and different effects on the outcome.

I think a residual covariance can be specified as you do it here. But I don't think the residual covariance is identifiable in this setting, unlike in growth models, where it is. Perhaps a sensitivity analysis can be carried out, fixing it to different values to capture the effects of potential left-out covariates.
 Moira Haller posted on Sunday, October 16, 2011 - 1:00 pm
Thanks so much for your help.
When you recommend a sensitivity analysis, do you mean setting the covariance to different values and seeing which has the best fit (e.g., by inspecting the loglikelihood)? When I allow the model to estimate the covariance using the above syntax, the covariance is negative and most of the covariate effects on trauma exposure and PTSD symptoms are no longer significant (they are when the covariance is not estimated). Substantively, I would expect the covariance to be positive. Perhaps this means the model estimates are unreliable and I am better off constraining the covariance to a certain value as you suggested?

By the way, there are two reasons I thought it might be best to stay away from the two-part approach. First, I thought that I should try specifying PTSD symptoms as a count rather than a continuous variable. Second, from the User's Guide, it looks like when setting up the data for two-part modeling, cases with 0 PTSD symptoms would be coded as 0 on the binary part (no trauma exposure). However, some cases with trauma have no PTSD symptoms.
 Bengt O. Muthen posted on Sunday, October 16, 2011 - 8:43 pm
You can do two-part modeling with count outcomes - this is called hurdle modeling. The "continuous" part is a zero-truncated Poisson variable.

I didn't think the covariance you estimated was identified, so please send your output to Support.
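
As a side note on the hurdle model mentioned above, its probability function can be written down in a few lines. This Python sketch assumes a plain Poisson for the positive counts and a single fixed probability of a positive count; the function names and numbers are hypothetical, and it is not how Mplus parameterizes or estimates the model.

```python
import math


def poisson_pmf(k, lam):
    """Ordinary Poisson probability P(Y = k)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def zero_truncated_poisson_pmf(k, lam):
    """P(Y = k | Y > 0) for k = 1, 2, ...; zero has no probability mass."""
    if k < 1:
        return 0.0
    return poisson_pmf(k, lam) / (1.0 - math.exp(-lam))


def hurdle_pmf(k, p_positive, lam):
    """Binary part decides zero vs. positive; positive counts are zero-truncated Poisson."""
    if k == 0:
        return 1.0 - p_positive
    return p_positive * zero_truncated_poisson_pmf(k, lam)


# Hypothetical values: 30% of cases clear the hurdle, Poisson rate 2.
total = sum(hurdle_pmf(k, p_positive=0.3, lam=2.0) for k in range(0, 60))
```

In this sketch every zero comes from the binary part, because the truncated Poisson cannot produce a zero.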
 Alexander Kapeller posted on Thursday, March 01, 2012 - 10:16 am

In my two-part model output there is the line:

Chi-Square Test for MCAR under the Unrestricted Latent Class Indicator Model

Could you please explain what the results tell me:
Value 100.355
Degrees of Freedom 130
p-Value 0.9749

Is this the result for the continuous part?

 Linda K. Muthen posted on Thursday, March 01, 2012 - 12:52 pm
This is a test of whether the data are missing completely at random. This cannot be rejected with a p-value of .9749. For further information about this test, see the Little and Rubin book.
 Alexander Kapeller posted on Saturday, March 03, 2012 - 11:02 am
Thanks Linda,

So is it right that I have no test statistic or descriptive measure of fit for the two-part model at all?
 Alexander Kapeller posted on Saturday, March 03, 2012 - 11:06 am
Hi Linda,

I have a special question related to the Mplus two-part procedure with mediation.

With mediation on a binary Y that results from a two-part data set, is the default a probit or a logit model? Concerning the standardization procedure to compare a*b and c-c': there is already a STDY section in the output. Can't this be used, or do I have to hand-calculate the standardized effects via pi squared/3 and the variances?
Next: after standardizing via the variance to compare a*b with c-c' and testing the significance of the effect, do I have to standardize the S.E.s of the parameters as well to do the Sobel test? Is the formula for standardizing variance(ab) the same as for b(ab)?

Thanks in advance.
 Bengt O. Muthen posted on Saturday, March 03, 2012 - 4:47 pm
The default is a logit model. Note that the literature on indirect effects with a binary outcome y says that a*b will be different from c-c' and that c-c' is the wrong quantity to use.

You don't want to use a_stand * b_stand, but instead compute a*b and then standardize it by dividing by the estimated y SD and multiplying it by the x SD. That calculation can be done in Model Constraint using parameter labels given in Model. Model Constraint then also gives the significance, with SE calculated automatically by the Delta method (of which the Sobel formula is a special case).
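
The calculation described above can be sketched numerically. The Python below is a hedged illustration only: the function names and all numbers are hypothetical, and, as a simplification, the two SDs are treated as fixed constants rather than as functions of estimated parameters. It computes a*b, its first-order Delta-method SE, and the rescaling by SD(x)/SD(y). With the covariance between a and b set to zero, the SE reduces to the Sobel formula.

```python
import math


def indirect_effect(a, b, var_a, var_b, cov_ab=0.0):
    """Return a*b and its first-order delta-method standard error."""
    est = a * b
    # The gradient of a*b with respect to (a, b) is (b, a).
    var = b ** 2 * var_a + a ** 2 * var_b + 2.0 * a * b * cov_ab
    return est, math.sqrt(var)


def standardize(est, se, sd_x, sd_y):
    """Rescale by SD(x)/SD(y), treating both SDs as fixed constants (a simplification)."""
    scale = sd_x / sd_y
    return est * scale, se * scale


# Hypothetical estimates, for illustration only.
est, se = indirect_effect(a=0.5, b=0.4, var_a=0.01, var_b=0.04)
std_est, std_se = standardize(est, se, sd_x=2.0, sd_y=4.0)
z = std_est / std_se
```

Because both the estimate and its SE are multiplied by the same constant here, the z ratio is unchanged by the rescaling; that would not hold exactly if the SDs were themselves expressed in terms of model parameters.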
 Alexander Kapeller posted on Sunday, March 04, 2012 - 2:46 pm
Thanks Bengt, this all puzzles me seriously.

1) I am aware of the a*b and c-c' problem. But I thought that when both are standardized to the same scale they would be equal again (MacKinnon & Dwyer, 1993)? Is this no longer true? Could you please point me to some literature?

2) For the calculation: I am reading your 2011 paper "Applications of causally defined direct ...". At the bottom of page 25 you state that "a latent mediator approach using logistic regression is not yet available in Mplus". My X and M are latent constructs, and y is a single binary variable. Does this affect my calculations, so that I have to switch to probit?

3) Could you please give a reference for the standardization you describe?

Your help is appreciated.
 Alexander Kapeller posted on Sunday, March 04, 2012 - 3:46 pm
hi Bengt,

I also tried to get bcbootstrap CI. Mplus tells me:

*** ERROR in ANALYSIS command

even after I switched to ema as algorithm.
 Bengt O. Muthen posted on Sunday, March 04, 2012 - 6:07 pm
1) Look at the more recent paper

MacKinnon, D.P., Lockwood, C.M., Brown, C.H., Wang, W., & Hoffman, J.M. (2007). The intermediate endpoint effect in logistic and probit regression. Clinical Trials, 4, 499-513.

which is on our web site. Also look at the Imai and Vanderweele references in my paper.

2) I was not talking about a latent variable construct, but the case of an observed categorical variable m. The question was if the observed categorical m or the latent response variable m* behind m was the mediator of interest.

3) I don't know about a reference - it simply uses first principles for standardization, where you always divide by the DV's SD and multiply by the IV's SD. The DV is y and the IV is x in an x-m-y mediation model.

Mplus does not yet offer bootstrap when numerical integration is needed. You can use Bayes if you are worried about non-normality of the a*b product.
 Alexander Kapeller posted on Tuesday, March 06, 2012 - 8:56 am
thanks for your advice.

best alex
 Alexander Kapeller posted on Monday, April 09, 2012 - 11:21 am

I am using a two-part model in Mplus because my data show a high number of zeros.

Focusing on the quantitative part: the zeros in the binary part are missing values in the quantitative part. In this quantitative part I specified a latent factor. As I use FIML, the missing-data handling will draw on the information in the distributions of the observed data.

My concern is that these missing values are not like missing data from a questionnaire; they have a meaning. So is FIML handling them the right way? On the other hand, deleting the cases with missing values would reduce the dataset to zero cases.

Could you please comment on this issue? Thanks a lot.
 Bengt O. Muthen posted on Tuesday, April 10, 2012 - 8:37 am
The missing on the continuous part due to the binary part being zero is not missingness of the regular kind, but simply a way to arrange the data in order to obtain the likelihood of the model in line with what is described in the Olsen-Schafer (2001) JASA paper that originated two-part growth modeling. They don't use the data arrangement that Mplus uses, but the two approaches have the same likelihood.

On top of this "structural" missingness you can have missing on the continuous variables even when the binary part is one. That kind of missingness is handled via "FIML" under MAR as usual.

Note that two-part modeling is different from ZIP modeling. The latter is a special 2-class model, whereas the former is not. Note also that two-part modeling with counts also has the name hurdle modeling often used in the literature. Mplus can handle ZIP and hurdle models.
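
To make the point about "structural" missingness concrete, here is a minimal Python sketch of a two-part log-likelihood contribution for a single case. It is a simplification under stated assumptions: no covariates, no latent variables, a normal density for the (for example log-transformed) positive part, and hypothetical function and parameter names. It is not the Olsen-Schafer likelihood or the Mplus implementation, only the basic structure.

```python
import math


def two_part_loglik(u, y, p_positive, mu, sigma):
    """Log-likelihood of one binary/continuous pair.

    u is 0 or 1; y is the positive-part value, or None when it is missing.
    """
    # Binary part: every case contributes.
    ll = math.log(p_positive) if u == 1 else math.log(1.0 - p_positive)
    # Continuous part: only cases with u == 1 and an observed y contribute.
    if u == 1 and y is not None:
        z = (y - mu) / sigma
        ll += -0.5 * math.log(2.0 * math.pi) - math.log(sigma) - 0.5 * z * z
    return ll
```

A case with u = 0 adds nothing to the continuous part, so its contribution does not depend on mu or sigma at all. A case with u = 1 but y missing also contributes only through the binary part in this sketch.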
 Bengt O. Muthen posted on Tuesday, April 10, 2012 - 8:44 am

Note also the interesting discussion in Olsen-Schafer (2001) at the top of the right-most column of page 742. This concerns counterfactuals and the expected response on the continuous variable with different values of the binary variable.
 Alexander Kapeller posted on Tuesday, April 10, 2012 - 9:40 am
Hi Bengt,

This is hard to understand. Phew.

So my interpretation is: there is no bias from the structural zeros in the continuous part, because the structural zero information is left out when calculating the likelihood.

Put into practice: it is not the same as estimating the model without the cases with structural zeros.

If I have 413 cases in total, with around 70% structural zeros spread across six y indicators of a latent variable eta, then the relationship beta between ksi and eta is based on the remaining 30% of the information in y, connected to
a) all information from the 413 cases in ksi
b) only those ksi cases that have a corresponding y (this I think is not the case)
c) ... - and there my understanding stops.

Is it also correct to say that there is not much information left to calculate that beta?

 Alexander Kapeller posted on Tuesday, April 10, 2012 - 9:58 am
Hi Bengt, continuing ...

Concerning ZIP: when I model an interaction effect with my continuous part, Mplus really has trouble with ZIP or ZINB. I guess this comes from the two classes within the ZIP.

The two-part model is much more stable.

Considering a two-part model with a Poisson truncated at zero: how do I tell Mplus that I have a truncated Poisson in the continuous part?

Isn't that a 2-class model then?

As I understand it, truncation is different from censoring. So the censoring (bi) specification should not be adequate, as my y are positive continuous data.

 Bengt O. Muthen posted on Tuesday, April 10, 2012 - 6:38 pm
Yes, 70% who have the binary indicator=0 is a large percentage. This reduces your power and also makes the analysis rely more heavily on model assumptions.

It is not clear to me if your continuous part corresponds to a continuous variable or a count variable. Only for count variables would ZIP, ZINB, or truncated Poisson be relevant.

A two-part model for a count outcome can be specified using the negative binomial hurdle model specified in Mplus using Count = u(nbh). See page 493 in the V6 UG.
 rongqin posted on Wednesday, March 02, 2016 - 8:42 am
In the two-part model, are the two parts totally INDEPENDENT of each other? For instance, suppose I want to include a correlation between X (indicated by Binary X and Continuous X) and Y.
Do I have to specify both of the following paths?

Binary X with Y;
Continuous X with Y;

What I am wondering is whether I DOUBLE-estimate the correlations in the model by specifying both.
 Bengt O. Muthen posted on Wednesday, March 02, 2016 - 6:44 pm
Are you regressing Y on X and you want to treat Y as two-part?

It sounds like X is two-part which is strange if Y is the DV.
 Manuel Heinrich posted on Tuesday, October 17, 2017 - 9:13 am
I have several categorical variables (4-point rating scale) with strong pile-ups at zero. I'd like to fit a two-part model. However, the "continuous" part has only three categories. Is it appropriate to treat the continuous part as ordered categorical and use the WLSMV estimator? If not, which estimator should I use instead? Many thanks!
 Bengt O. Muthen posted on Tuesday, October 17, 2017 - 6:20 pm
I would not treat the positive tail as continuous with only 3 scale points supporting it. In our Topic 11 short course we discuss this matter and how to handle inflated-ordinal modeling at the end of Part 1. See also our book Regression and Mediation Analysis using Mplus.
 Manuel Heinrich posted on Wednesday, October 18, 2017 - 4:31 am
Many thanks for your immediate response.

As suggested, I declared the continuous part as categorical in the variable command and did not transform the continuous part when creating the two-part data of my two-part factor model.

Since I have several traits/factors (different kinds of behaviors assessed with several items, all traits as two-part factor models with a binary and an ordered-categorical part), numerical integration becomes computationally demanding. With the WLSMV estimator, the model converges quickly. However, I was wondering whether WLSMV is appropriate, since all the examples I have seen used numerical integration with ML(R) (I think that has something to do with how missing values are handled). Is it okay to change the estimator to WLSMV in two-part factor modeling, or do I have to stay with numerical integration?
 Bengt O. Muthen posted on Wednesday, October 18, 2017 - 2:39 pm
WLSMV does not handle missing data in a way that two-part needs. You can instead use Estimator = Bayes. See for instance my paper on our website:

Muthén, B. (2010). Bayesian analysis in Mplus: A brief introduction. Technical Report. Version 3.

and for Bayes in general, also Topic 11 at
 Oliver Buss posted on Wednesday, January 10, 2018 - 5:36 am
Dear Mr Muthen,

I am trying to model a two-part model with proportional data (A) with a high percentage at zero as a part of a dual change score model (A and B).

The second continuous variable (B) is normally distributed. My approach is to extend the continuous part of the two-part model (A) with latent difference scores.

I would really appreciate any recommendations, be it applications and/or theoretical input/studies, and especially any known applications in Mplus.

Thanks and regards.
 Bengt O. Muthen posted on Wednesday, January 10, 2018 - 10:17 am
I don't know of latent difference score modeling in a two-part context but since it refers to the continuous part it seems like it would not be different from regular modeling. You may want to ask on SEMNET.
 Johanna Ziemes posted on Thursday, June 04, 2020 - 5:52 am
Dear Mplus Team,
I am combining two-part modeling with multilevel analysis. The binary variable is the predicted variable. When I try to use latent aggregation to predict the binary variable at level 2, I get this error:

*** ERROR in MODEL command
An x variable is not declared as within or between only. Latent variable decomposition of an x
variable is not available using ALGORITHM=INTEGRATION unless the variable is turned into a y variable by
mentioning its variance. Problem with the following variable: S_OPDISC

NAMES = Vic_sum;

VIC_BIN on a b c d;

 Bengt O. Muthen posted on Friday, June 05, 2020 - 6:24 pm
We need to see your full output to diagnose this - send to Support along with your license number.