

Dear Prof. Muthen, I have a cross-sectional dataset and am fitting the following standard SEM, where Y1 is binary, Y2-Y7 are ordered categorical indicators, the X's are strictly exogenous variables, and the etas are continuous latent variables.

Y1 = g(eta1)
Y2, Y3, Y4 = g(eta2)
Y5, Y6, Y7 = g(eta3)
Eta1 = f(eta2, eta3, X1);
Eta2 = f(X2);
Eta3 = f(X3);

I am using Mplus 4.0 and everything is fine with the estimation results. However, I want to modify the model by introducing a two-part rationale of the Duan-Manning type. In particular, we have the following situation: Y0, a binary variable, indicates whether or not an individual participates in a program. When an individual says yes to the participation question, we ask the question that generates Y1. It is likely that the latent variables eta2 and eta3 influence Y0 as well. How could I do this in Mplus? Is there any theoretical paper that introduces a factor-analytic/SEM rationale in a Duan-Manning type setup? Thanks and regards, Sanjoy


I think Duan-Manning worked in a regression setting, and there I think the 2 parts of the model can be estimated separately because you can't identify a random effect correlation between them as you can in a hierarchical data setting such as the longitudinal data analysis considered in Olsen & Schafer's JASA article on 2-part growth modeling. Thinking out loud, it seems the question then is whether the SEM setting changes the independence of the 2 parts. I guess if Y0 is influenced by the same etas as Y1, then their factor loadings should be held equal across the 2 parts and therefore a simultaneous analysis of the 2 parts should be done. I guess this calls for doing 2-part modeling; see the Mplus DATA TWOPART option. I haven't seen such an application. We have a Kim & Muthen paper on our web site which does 2-part mixture factor analysis, but there again you have hierarchical data in that you have multiple indicators, whereas you have only one Y1, so that is different.


Thank you Prof. Muthen. I just read your working paper on the "two-part factor mixture model". It is really helpful and I believe it should help me estimate my model. Besides, my model is simpler than yours, since I am not dealing with the mixture portion. Following is my model statement, where Y0 and Y1 are binary, Y2-Y7 are ordered categorical, and X0, X1, X2 and X3 are vectors of exogenous variables. Each of the X's has the unique elements necessary for the model to be estimated.

MODEL:
ETA2 BY Y2 Y3 Y4;
ETA3 BY Y5 Y6 Y7;
ETA2 ON X2;
ETA3 ON X3;
Y1 ON ETA2 ETA3 X1;

(Now adding the two-part rationale, where Y0 indicates the response to the participation question: 1 = yes and 0 otherwise, and Y1 is valid only if Y0 is 1.)

Y0 ON ETA2 ETA3 X0;

Q1: What should I write in the MODEL section now to make Y1 "conditioned on" Y0? On page 11 of your article you wrote "correlated". I might be confused, but I don't think we are doing a simple correlation.

Q2: A reviewer will ask for the likelihood function, or the final set of equations that Mplus computes for the model, when we combine a regular SEM with Y0 in a Duan-Manning type framework. Is there any reference for that?

Thanks and regards, Sanjoy


Q1. You need to create the data in line with "DATA TWOPART" described in the User's Guide. This specifies missing on y1 when y0=0. There is no correlation that can be estimated here. Q2. A related likelihood function is given in the Olsen-Schafer article.
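
The data arrangement mentioned in this reply can be illustrated outside Mplus. The Python sketch below is not Mplus code and not the DATA TWOPART implementation; the function name `make_two_part` and its arguments are hypothetical. It assumes a cutpoint of 0 and an optional natural-log transform of the positive values, and it returns a binary indicator plus a second part that is set to missing (`None`) whenever the binary part is 0.

```python
import math
from typing import List, Optional, Sequence, Tuple


def make_two_part(
    values: Sequence[Optional[float]],
    cutpoint: float = 0.0,
    log_transform: bool = True,
) -> Tuple[List[Optional[int]], List[Optional[float]]]:
    """Split one semicontinuous variable into a binary and a continuous part.

    Illustrative assumptions (not taken from the thread):
    - values at or below `cutpoint` count as "zero" cases,
    - positive values are optionally log-transformed,
    - None marks a missing value.
    """
    binary: List[Optional[int]] = []
    continuous: List[Optional[float]] = []
    for v in values:
        if v is None:
            # Ordinary missingness: both parts are missing.
            binary.append(None)
            continuous.append(None)
        elif v > cutpoint:
            binary.append(1)
            continuous.append(math.log(v) if log_transform else v)
        else:
            # Structural missingness: the second part is undefined here.
            binary.append(0)
            continuous.append(None)
    return binary, continuous


if __name__ == "__main__":
    u, y = make_two_part([0.0, 2.5, None, 1.0])
    print(u)  # binary part
    print(y)  # continuous part, missing where the binary part is 0
```

When the two parts are observed as separate variables, as with y0 and y1 in this thread, the same rule reduces to setting y1 to missing wherever y0=0.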


Thank you professor. Regards, Sanjoy 


I want to simultaneously examine trauma exposure (binary) and PTSD symptoms as mediators of several covariate effects on an outcome variable. Rather than model trauma and PTSD symptoms separately, I want to examine them within the same model so that I can identify the effect of PTSD symptoms on the outcome, over and above the influence of trauma exposure. However, PTSD symptoms are conditioned on trauma exposure (PTSD symptoms are only valid if trauma=1). Is it sufficient to code PTSD symptoms as missing for cases with trauma=0? Also, because trauma exposure and PTSD symptoms may share predictors other than the covariates, I believe I need to specify the residual covariance between them. Is the syntax that I've used below an appropriate way to do this? I was also considering specifying the model using the TWOPART feature as you described above. Any advice on how to handle this issue would be much appreciated!

Categorical is trauma;
Missing are ALL (99);
MODEL:
ptsdsx on gen advers ethinc pathol;
trauma on gen advers ethinc pathol;
y on trauma ptsdsx gen advers pathol ;
f1 BY ptsd trauma;
f1@1;
[f1@0];


I think a two-part approach is a bit more transparent than working with missing data approaches in this case. The two parts can have different predictors and different effects on the outcome. I think a residual covariance can be specified as you do it here. But I don't think the residual covariance is identifiable here, as it is in growth models. Perhaps a sensitivity analysis can be carried out, fixing it to different values to capture the effects of potential left-out covariates.


Thanks so much for your help. When you recommend a sensitivity analysis, do you mean setting the covariance to different values and seeing which has the best fit (e.g., by inspecting the log-likelihood)? When I allow the model to estimate the covariance using the above syntax, the covariance is negative and most of the covariate effects on trauma exposure and PTSD symptoms are no longer significant (they are when the covariance is not estimated). Substantively, I would expect the covariance to be positive. Perhaps this means the model estimates are unreliable and I am better off constraining the covariance to a certain value as you suggested? By the way, there are two reasons I thought it may be best to stay away from the two-part approach. First, I thought that I should try specifying PTSD symptoms as a count rather than a continuous variable. Second, from the User's Guide, it looks like when setting up the data to use two-part, cases with 0 PTSD symptoms would be coded as 0 on the binary part (no trauma exposure). However, some cases with trauma have no PTSD symptoms.


You can do two-part modeling with count outcomes; this is called hurdle modeling. The "continuous" part is a zero-truncated Poisson variable. I didn't think the covariance you estimated was identified, so please send your output to Support.
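
To make the hurdle idea concrete, here is a minimal Python sketch of the probability mass function of a Poisson hurdle model. It is an illustration only, not Mplus output or Mplus code; the function names and the parameter values (`p_positive`, `lam`) are hypothetical. A binary part decides whether the count is zero or positive, and the positive counts follow a zero-truncated Poisson distribution.

```python
import math


def zero_truncated_poisson_pmf(k: int, lam: float) -> float:
    """P(K = k | K >= 1) for a Poisson(lam) variable; zero for k < 1."""
    if k < 1:
        return 0.0
    poisson_k = math.exp(-lam) * lam ** k / math.factorial(k)
    # Renormalize by the probability of a non-zero Poisson draw.
    return poisson_k / (1.0 - math.exp(-lam))


def hurdle_pmf(k: int, p_positive: float, lam: float) -> float:
    """P(Y = k) under a Poisson hurdle model.

    p_positive: probability of crossing the hurdle (binary part equals 1).
    lam: rate of the zero-truncated Poisson part for the positive counts.
    """
    if k == 0:
        return 1.0 - p_positive
    return p_positive * zero_truncated_poisson_pmf(k, lam)


if __name__ == "__main__":
    # Hypothetical values chosen only for illustration.
    probs = [hurdle_pmf(k, p_positive=0.3, lam=2.0) for k in range(60)]
    print(probs[:4])
    print(sum(probs))  # should be very close to 1
```

In this formulation every zero comes from the binary part, which is what distinguishes a hurdle model from a zero-inflated model, where zeros can also arise from the count part.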


Hello. In my two-part model output there is the line:

Chi-Square Test for MCAR under the Unrestricted Latent Class Indicator Model

Could you please explain what the results tell me?

Value 100.355
Degrees of Freedom 130
P-Value 0.9749

Is this the result for the continuous part? Thanks


This is a test of whether the data are missing completely at random. This cannot be rejected with a p-value of .9749. For further information about this test, see the Little and Rubin book.


Thanks Linda, so is it right that I have no test statistic or descriptive measure of fit within the two-part model at all?


Hi Linda, I have a special question related to the Mplus two-part procedure with mediation. With a mediation model for a binary Y that results from a two-part data set, is the default a probit or a logit model? Concerning the standardization procedure to compare a*b and c-c': there is already a STDY section in the output. Can this be used, or do I have to hand-calculate the standardized effects via pi squared / 3 and the variances? Next, after standardizing via the variance to compare a*b with c-c' and testing the significance of the effect: do I also have to standardize the s.e. of the parameters to do the Sobel test? Is the formula for standardization of variance(ab) the same as for b(ab)? Thanks in advance.


The default is a logit model. Note that the literature on indirect effects with a binary outcome y says that a*b will be different from c-c' and that c-c' is the wrong quantity to use. You don't want to use a_stand * b_stand, but instead compute a*b and then standardize it by dividing by the estimated y SD and multiplying it by the x SD. That calculation can be done in Model Constraint using parameter labels given in Model. Model Constraint then also gives the significance, with the SE calculated automatically by the Delta method (of which the Sobel formula is a special case).
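
The arithmetic in this reply can be sketched in a few lines of Python. This is an illustration only, not what Mplus computes internally; the function name `standardized_indirect_effect` and all numeric inputs are hypothetical. As a simplifying assumption, the SDs of x and y are treated here as fixed constants, whereas a Model Constraint that expresses them through model parameters would also carry their sampling variability into the Delta-method SE.

```python
import math
from typing import Tuple


def standardized_indirect_effect(
    a: float,
    b: float,
    se_a: float,
    se_b: float,
    cov_ab: float = 0.0,
    sd_x: float = 1.0,
    sd_y: float = 1.0,
) -> Tuple[float, float, float]:
    """Return (estimate, SE, z) for a*b rescaled by sd_x / sd_y.

    First-order Delta method for the product a*b:
        var(ab) ~ b^2 var(a) + a^2 var(b) + 2 a b cov(a, b)
    With cov_ab = 0 this reduces to the Sobel formula.
    Assumption: sd_x and sd_y are treated as known constants.
    """
    ab = a * b
    var_ab = b ** 2 * se_a ** 2 + a ** 2 * se_b ** 2 + 2.0 * a * b * cov_ab
    scale = sd_x / sd_y  # multiply by the x SD, divide by the y SD
    estimate = ab * scale
    se = math.sqrt(var_ab) * abs(scale)
    return estimate, se, estimate / se


if __name__ == "__main__":
    # Hypothetical estimates chosen only for illustration.
    est, se, z = standardized_indirect_effect(
        a=0.5, b=0.4, se_a=0.1, se_b=0.1, sd_x=2.0, sd_y=4.0
    )
    print(est, se, z)
```

Because the rescaling is a constant multiple under this assumption, the z statistic is the same for the raw and the standardized a*b.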


Thanks Bengt, this all puzzles me seriously. 1) I am aware of the a*b and c-c' problem. But I thought that when standardizing both to the same scale they would be equal again (MacKinnon & Dwyer, 1993)? Is this no longer true? Could you please give a literature hint? 2) For the calculation: I am reading your 2011 paper "Applications of causally defined direct ...". At the bottom of page 25 you state that "a latent mediator approach using logistic regression is not yet available in Mplus". My X and M are latent constructs and y is a single binary variable. Does this affect my calculations so that I have to switch to probit? 3) Could you please give a reference for the standardization you describe? Your help is appreciated.


Hi Bengt, I also tried to get bc-bootstrap CIs. Mplus tells me: *** ERROR in ANALYSIS command BOOTSTRAP is not allowed with ALGORITHM=INTEGRATION. This happens even after I switched to EMA as the algorithm.


1) Look at the more recent paper MacKinnon, D.P., Lockwood, C.M., Brown, C.H., Wang, W., & Hoffman, J.M. (2007). The intermediate endpoint effect in logistic and probit regression. Clinical Trials, 4, 499-513, which is on our web site. Also look at the Imai and Vanderweele references in my paper. 2) I was not talking about a latent variable construct, but the case of an observed categorical variable m. The question was whether the observed categorical m or the latent response variable m* behind m was the mediator of interest. 3) I don't know of a reference; it simply uses first principles for standardization, where you always divide by the DV's SD and multiply by the IV's SD. The DV is y and the IV is x in an x-m-y mediation model. Mplus does not yet offer bootstrap when numerical integration is needed. You can use Bayes if you are worried about non-normality of the a*b product.


thanks for your advice. best alex 


Hello, I am using a two-part model in Mplus as my data show a high number of zeros. Focusing on the quantitative part: the zeros in the binary part are missing values in the quantitative part. In this quantitative part I specified a latent factor. As I use FIML, the missing-data handling will draw on the distributional information in the data. I am concerned that these missing values are not like missing data from a questionnaire; they have a meaning. So is FIML imputing the right way? On the other hand, deleting the cases with missing values would reduce the dataset to zero cases. Can you please comment on this issue? Thanks a lot


The missing on the continuous part due to the binary part being zero is not missingness of the regular kind, but simply a way to arrange the data in order to obtain the likelihood of the model in line with what is described in the Olsen-Schafer (2001) JASA paper that originated two-part growth modeling. They don't use the data arrangement that Mplus uses, but the two approaches have the same likelihood. On top of this "structural" missingness you can have missing on the continuous variables even when the binary part is one. That kind of missingness is handled via "FIML" under MAR as usual. Note that two-part modeling is different from ZIP modeling. The latter is a special 2-class model, whereas the former is not. Note also that two-part modeling with counts also has the name hurdle modeling often used in the literature. Mplus can handle ZIP and hurdle models.
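
A small Python sketch may help show why the "structural" missingness is only a data arrangement. It is a deliberately simplified, single-observation version with no latent variables or random effects, so it is not the Olsen-Schafer likelihood itself nor what Mplus computes; the function name and the parameters (`p`, `mu`, `sigma`) are hypothetical. A case with the binary part equal to 0 contributes only the binary term, and a case with the binary part equal to 1 adds the density of its continuous value when that value is observed.

```python
import math
from typing import Optional


def two_part_loglik(
    u: int, y: Optional[float], p: float, mu: float, sigma: float
) -> float:
    """Log-likelihood contribution of one case in a simplified two-part model.

    u: binary part (1 = positive outcome, 0 = zero outcome).
    y: continuous part, None when missing.
    p: model-implied probability that u = 1.
    mu, sigma: mean and SD of the normal model for y given u = 1.
    """
    if u == 0:
        # Structural missingness: only the binary part enters the likelihood.
        return math.log(1.0 - p)
    loglik = math.log(p)
    if y is not None:
        # Normal log-density for the observed continuous part.
        z = (y - mu) / sigma
        loglik += -0.5 * math.log(2.0 * math.pi) - math.log(sigma) - 0.5 * z * z
    # If y is missing although u = 1, only the binary term remains in this
    # one-variable sketch, which is how ordinary missingness appears here.
    return loglik


if __name__ == "__main__":
    # Hypothetical parameter values chosen only for illustration.
    print(two_part_loglik(0, None, p=0.3, mu=0.0, sigma=1.0))
    print(two_part_loglik(1, 0.5, p=0.3, mu=0.0, sigma=1.0))
```

Nothing is imputed for the cases with a zero binary part in this sketch; they inform the binary parameters only.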


P.S. Note also the interesting discussion in Olsen-Schafer (2001) at the top of the rightmost column of page 742. This concerns counterfactuals and the expected response on the continuous variable with different values of the binary variable.


Hi Bengt, this is hard to understand. Phew. So my interpretation is: there is no bias from the structural zeros in the continuous part, because the structural-zero information is left out in calculating the likelihood. To put this into practice: it is not like estimating the model just without the cases with the structural zeros. If I have 413 cases in total, with around 70% structural zeros spread across six y indicators of a latent variable eta, then the relationship beta between ksi and eta is based on the 30% remaining information in y connected to a) all information of the 413 cases in ksi, b) only those ksi cases which have a corresponding y (this I think is not the case), c) ... and there my understanding stops. Is it also correct to say that there is not much information left to calculate that beta? Thanks


Hi Bengt, going on, concerning the ZIP: I want to model an interaction effect with my continuous part, and then Mplus really has trouble with ZIP or ZINB. I guess this comes from the 2 classes within the ZIP. The two-part model is much more stable. Considering a two-part model with a Poisson truncated at zero: how do I tell Mplus that I have a truncated Poisson in the continuous part? Isn't that a 2-class model then? As I understand it, truncation is different from censoring. So the censoring (bi) specification should not be adequate, as my y are positive continuous data. Best, Alex


Yes, 70% who have the binary indicator = 0 is a large percentage. This reduces your power and also makes the analysis rely more heavily on model assumptions. It is not clear to me if your continuous part corresponds to a continuous variable or a count variable. Only for count variables would ZIP, ZINB, or truncated Poisson be relevant. A two-part model for a count outcome can be specified using the negative binomial hurdle model, specified in Mplus using Count = u(nbh). See page 493 in the V6 UG.

rongqin posted on Wednesday, March 02, 2016 - 8:42 am



In the two-part model, are the two parts totally INDEPENDENT of each other? For instance, suppose I want to include a correlation between X (represented by Binary X and Continuous X) and Y. Do I have to specify both of the following paths?

Binary X with Y;
Continuous X with Y;

What I am wondering is whether I am DOUBLE estimating correlations in the model by specifying both.


Are you regressing Y on X and you want to treat Y as two-part? It sounds like X is two-part, which is strange if Y is the DV.


I have several categorical variables (4-point rating scale) with strong ceilings at zero. I'd like to fit a two-part model. However, the "continuous" part has only three categories. Is it appropriate to treat the continuous part as ordered categorical and use the WLSMV estimator? If not, which estimator should I use instead? Many thanks!


I would not treat the positive tail as continuous with only 3 scale points supporting it. In our Topic 11 short course we discuss this matter and how to handle inflated-ordinal modeling at the end of Part 1. See also our book Regression and Mediation Analysis using Mplus.


Many thanks for your immediate response. As suggested, I declared the continuous part as categorical in the VARIABLE command and did not transform the continuous part when creating the two-part data for my two-part factor model. Since I have several traits/factors (different kinds of behaviors assessed with several items, all traits as two-part factor models, with a binary and an ordered-categorical part), it seems that numerical integration becomes computationally demanding. Using the WLSMV estimator, the model converges quickly. However, I was wondering if using WLSMV is appropriate, since all examples I have seen used numerical integration with ML(R) (I think that has something to do with the handling of missing values). Is it okay to change the estimator to WLSMV in two-part factor modeling, or do I have to stay with numerical integration?


WLSMV does not handle missing data in the way that two-part modeling needs. You can instead use Estimator = Bayes. See for instance my paper on our website: Muthén, B. (2010). Bayesian analysis in Mplus: A brief introduction. Technical Report. Version 3. For Bayes in general, see also Topic 11 at http://www.statmodel.com/course_materials.shtml

Oliver Buss posted on Wednesday, January 10, 2018 - 5:36 am



Dear Mr Muthen, I am trying to fit a two-part model for proportional data (A) with a high percentage at zero, as part of a dual change score model (A and B). The second continuous variable (B) is normally distributed. My approach is to extend the continuous part of the two-part model (A) with latent difference scores. I would really appreciate any recommendations, be they applications and/or theoretical input/studies, and especially whether there are known applications in Mplus. Thanks and regards.


I don't know of latent difference score modeling in a two-part context, but since it refers to the continuous part it seems like it would not be different from regular modeling. You may want to ask on SEMNET.
