I have a cross-sectional dataset and am fitting the following standard SEM, where Y1 is binary, Y2-Y7 are ordered categorical indicators, the X’s are strictly exogenous variables, and the eta’s are continuous latent variables.
I am using Mplus 4.0 and the estimation results look fine. However, I want to modify the model by introducing a two-part rationale of the Duan-Manning type. In particular, we have the following situation:
Y0, a binary variable, indicates whether or not an individual participates in a program. When an individual says yes to the participation question, we ask the question that generates Y1. It is likely that the latent variables eta2 and eta3 influence Y0 as well.
How could I do this in Mplus? Is there any theoretical paper that introduces a factor-analytic/SEM rationale in a Duan-Manning type setup?
I think Duan-Manning worked in a regression setting, where the 2 parts of the model can be estimated separately because a random effect correlation between them cannot be identified. That is unlike a hierarchical data setting, such as the longitudinal data analysis considered in Olsen & Schafer's JASA article on 2-part growth modeling, where such a correlation can be identified.
Thinking out loud, it seems the question then is if the SEM setting changes the independence of the 2 parts. I guess if Y0 is influenced by the same etas as Y1, then their factor loadings should be held equal across the 2 parts and therefore a simultaneous analysis of the 2 parts should be done. I guess this calls for doing 2-part modeling - see the Mplus DATA TWOPART option. I haven't seen such an application. We have a Kim & Muthen paper on our web site which does 2-part mixture factor analysis but there again you have hierarchical data in that you have multiple indicators - you have only one Y1 - so that is different.
Thank you Prof. Muthen. I just read your working paper on “two-part factor mixture model”. It is really helpful and I believe this one should help to estimate my model. Besides, my model is simpler than yours. I am not dealing with the mixture portion.
My model statement follows, where Y0 and Y1 are binary, Y2-Y7 are ordered categorical, and X0, X1, X2 and X3 are vectors of exogenous variables. Each of the X’s has unique elements necessary for the model to be estimated.
ETA2 BY Y2 Y3 Y4;
ETA3 BY Y5 Y6 Y7;
ETA2 ON X2;
ETA3 ON X3;
Y1 ON ETA2 ETA3 X1;
(Now adding the two-part rationale, where Y0 indicates the response to the participation question, 1 = yes and 0 otherwise, and Y1 is valid only if Y0 is 1.)
Y0 ON ETA2 ETA3 X0;
Q1: What should I write in the model section NOW to make Y1 “conditioned on” Y0? On page 11 of your article you wrote “correlated”. I might be confused but I don’t think we are doing simple correlation.
Q2: A reviewer will ask for the likelihood function, or the final set of equations that Mplus computes for the model, when we combine a regular SEM with Y0 in a Duan-Manning type framework. Is there any reference for that?
I want to simultaneously examine trauma exposure (binary) and PTSD symptoms as mediators of several covariate effects on an outcome variable. Rather than model trauma and PTSD symptoms separately, I want to examine them within the same model so that I can identify the effect of PTSD symptoms on the outcome, over and above the influence of trauma exposure. However, PTSD symptoms are conditioned on trauma exposure (PTSD symptoms are only valid if trauma=1). Is it sufficient to code PTSD symptoms as missing for cases with trauma=0? Also, because trauma exposure and PTSD symptoms may share predictors other than the covariates, I believe I need to specify the residual covariance between them. Is the syntax that I’ve used below an appropriate way to do this? I was also considering specifying the model using the TWOPART feature as you described above. Any advice on how to handle this issue would be much appreciated!
Categorical is trauma;
Missing are ALL (-99);
MODEL:
ptsdsx on gen advers ethinc pathol;
trauma on gen advers ethinc pathol;
y on trauma ptsdsx gen advers pathol;
I think a Twopart approach is a bit more transparent than working with missing data approaches in this case. The two parts can have different predictors and different effects on the outcome.
I think a residual covariance can be specified as you do it here. But I don't think the residual covariance is identifiable, as it is in growth models. Perhaps a sensitivity analysis can be carried out, fixing it to different values to capture the effects of potential left-out covariates.
Thanks so much for your help. When you recommend a sensitivity analysis, do you mean setting the covariance to different values and seeing which has the best fit (e.g., by inspecting the loglikelihood)? When I allow the model to estimate the covariance using the above syntax, the covariance is negative and most of the covariate effects on trauma exposure and PTSD symptoms are no longer significant (they are when the covariance is not estimated). Substantively, I would expect the covariance to be positive. Perhaps this means the model estimates are unreliable and I am better off constraining the covariance to a certain value as you suggested?
By the way, there are two reasons I thought it might be best to stay away from the twopart approach. First, I thought that I should try specifying PTSD symptoms as a count rather than a continuous variable. Second, from the User's Guide, it looks like when setting up the data to use twopart, cases with 0 PTSD symptoms would be coded as 0 on the binary part (no trauma exposure). However, some cases with trauma have no PTSD symptoms.
I have a specific question about the Mplus two-part procedure with mediation.
With mediation on a binary Y that results from a two-part data set, is the default a probit or a logit model? Concerning the standardization needed to compare a*b and c-c': there is already a stdy section in the output. Can this be used, or do I have to hand-calculate the standardized effects via pi squared/3 and the variances? Next, after standardizing via the variance to compare a*b with c-c' and testing the significance of the effect, do I also have to standardize the s.e. of the parameters to do the Sobel test? Is the formula for standardizing variance(ab) the same as for b(ab)?
The default is a logit model. Note that the literature on indirect effects with a binary outcome y says that a*b will be different from c-c' and that c-c' is the wrong quantity to use.
You don't want to use a_stand * b_stand, but instead compute a*b and then standardize it by dividing by the estimated y SD and multiplying it by the x SD. That calculation can be done in Model Constraint using parameter labels given in Model. Model Constraint then also gives the significance, with SE calculated automatically by the Delta method (of which the Sobel formula is a special case).
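The arithmetic described above can be sketched outside Mplus as follows. This is only an illustration under stated assumptions: the function name and all numeric values are hypothetical, the SDs of x and y are treated as fixed constants (a simplification; a full Delta-method calculation would also carry their sampling variability), and for a logit y the caller would have to supply whatever SD of y is appropriate for the scale being standardized.

```python
import math

def standardized_indirect(a, b, se_a, se_b, sd_x, sd_y, cov_ab=0.0):
    """Return (ab, se_ab, ab_std, se_ab_std) for an x -> m -> y chain.

    a, b     : unstandardized slopes (m on x, y on m)
    se_a/b   : their standard errors
    cov_ab   : sampling covariance of the two estimates (0 if unknown)
    sd_x/y   : SDs used for standardization, treated as fixed (assumption)
    """
    ab = a * b
    # First-order Delta method: the gradient of a*b with respect to (a, b) is (b, a).
    # With cov_ab = 0 this reduces to the Sobel formula.
    var_ab = b**2 * se_a**2 + a**2 * se_b**2 + 2 * a * b * cov_ab
    se_ab = math.sqrt(var_ab)
    # Standardize the product: multiply by the IV's SD, divide by the DV's SD.
    scale = sd_x / sd_y
    return ab, se_ab, ab * scale, se_ab * scale

# Hypothetical estimates, for illustration only.
ab, se_ab, ab_std, se_ab_std = standardized_indirect(
    a=0.5, b=0.4, se_a=0.1, se_b=0.1, sd_x=2.0, sd_y=4.0
)
print("a*b =", ab, "SE =", se_ab)
print("standardized a*b =", ab_std, "SE =", se_ab_std)
print("z =", ab / se_ab)
```

Because the scale factor is held fixed in this sketch, the z statistic is the same for the raw and the standardized product; that would not hold exactly once the SDs are themselves treated as estimated quantities.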
1) I am aware of the a*b versus c-c' problem. But I thought that when both are standardized to the same scale they would be equal again (MacKinnon / Dwyer 1993)? Is this no longer true? Could you please point me to a reference?
2) For the calculation: I am reading your 2011 paper "Applications of causally defined direct ..." . At the bottom of page 25 you state that " a latent mediator approach using logistic regression is not yet available in Mplus" <--> My X and M are latent constructs, y is a single binary variable --> Does this affect my calculations so that I have to switch to probit?
3) Could you please give a reference for the standardization you describe?
MacKinnon, D.P., Lockwood, C.M., Brown, C.H., Wang, W., & Hoffman, J.M. (2007). The intermediate endpoint effect in logistic and probit regression. Clinical Trials, 4, 499-513.
which is on our web site. Also look at the Imai and Vanderweele references in my paper.
2) I was not talking about a latent variable construct, but the case of an observed categorical variable m. The question was if the observed categorical m or the latent response variable m* behind m was the mediator of interest.
3) I don't know of a reference. It simply uses first principles for standardization, where you always divide by the DV's SD and multiply by the IV's SD. The DV is y and the IV is x in an x-m-y mediation model.
Mplus does not yet offer bootstrap when numerical integration is needed. You can use Bayes if you are worried about non-normality of the a*b product.
I am using a two-part model in Mplus as my data show a high number of zeros.
Focusing on the quantitative part: the zeros in the binary part are missing values in the quantitative part. In this quantitative part I specified a latent factor. As I use FIML, the missing-data handling will draw on the information in the distributions of the observed data.
I am concerned because these missing values are not like missing data from a questionnaire; they have a meaning. So is FIML handling them the right way? On the other hand, deleting the cases with missing values would reduce the dataset to zero cases.
Can you please comment on this issue? Thanks a lot.
The missing on the continuous part due to the binary part being zero is not missingness of the regular kind, but simply a way to arrange the data in order to obtain the likelihood of the model in line with what is described in the Olsen-Schafer (2001) JASA paper that originated two-part growth modeling. They don't use the data arrangement that Mplus uses, but the two approaches have the same likelihood.
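As a rough sketch of the likelihood being referred to, assuming a single binary indicator u_i, a single continuous outcome y_i observed only when u_i = 1, covariates x_i, and a latent variable η shared by the two parts (this notation is an assumption for illustration, not taken from the Olsen-Schafer paper or the Mplus documentation), the contribution of individual i can be written as:

```latex
L_i = \int
  \left[ P(u_i = 0 \mid \eta, x_i) \right]^{1 - u_i}
  \left[ P(u_i = 1 \mid \eta, x_i)\, f(y_i \mid \eta, x_i) \right]^{u_i}
  \phi(\eta \mid x_i)\, d\eta
```

When u_i = 0 the density f(y_i | ·) drops out of the expression, which is the same contribution obtained by arranging the data with y_i coded as missing for that case.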
On top of this "structural" missingness you can have missing on the continuous variables even when the binary part is one. That kind of missingness is handled via "FIML" under MAR as usual.
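The data arrangement described above, with both kinds of missingness, can be sketched as follows. This is a hedged illustration, not the Mplus implementation: the function name is hypothetical, `None` stands in for a missing-value code, and the cutpoint of zero and the log transform of the positive part are assumptions chosen for the example.

```python
import math

def two_part_split(y, cutpoint=0.0, transform=math.log):
    """Split one semicontinuous value into (binary part, continuous part).

    None stands for a missing value.
    """
    if y is None:
        # Ordinary missingness: nothing was observed, so both parts are missing.
        return None, None
    if y <= cutpoint:
        # "Structural" missingness: binary part is 0, continuous part is not defined.
        return 0, None
    # Positive response: binary part is 1, continuous part is the transformed value.
    return 1, transform(y)

raw = [0.0, 2.5, None, 0.0, 7.0]  # hypothetical data
binary, continuous = zip(*(two_part_split(v) for v in raw))
print(binary)
print(continuous)
```

A case can also have binary part 1 with a missing continuous value in real data; that situation would need a separate input for the indicator and is left out of this sketch.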
Note that two-part modeling is different from ZIP modeling. The latter is a special 2-class model, whereas the former is not. Note also that two-part modeling with counts also has the name hurdle modeling often used in the literature. Mplus can handle ZIP and hurdle models.
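The distinction between ZIP and hurdle modeling can be made concrete with the two probability mass functions. The sketch below uses a plain Poisson count part for simplicity (the negative binomial variants follow the same pattern); function names and parameter values are hypothetical.

```python
import math

def poisson_pmf(k, lam):
    return math.exp(-lam) * lam**k / math.factorial(k)

def zip_pmf(k, pi_zero, lam):
    """Zero-inflated Poisson: a 2-class mixture of an always-zero class
    (probability pi_zero) and an ordinary Poisson class, so zeros can
    come from either class."""
    base = (1.0 - pi_zero) * poisson_pmf(k, lam)
    return pi_zero + base if k == 0 else base

def hurdle_pmf(k, p_zero, lam):
    """Poisson hurdle (two-part): a binary part decides zero vs positive,
    and positives follow a zero-truncated Poisson, so all zeros come
    from the binary part."""
    if k == 0:
        return p_zero
    return (1.0 - p_zero) * poisson_pmf(k, lam) / (1.0 - math.exp(-lam))

for name, pmf in (("ZIP", zip_pmf), ("hurdle", hurdle_pmf)):
    total = sum(pmf(k, 0.3, 2.0) for k in range(60))
    print(name, "P(0) =", round(pmf(0, 0.3, 2.0), 4), "sum =", round(total, 6))
```

With the same inputs, the ZIP probability of zero exceeds the inflation probability because the Poisson class contributes zeros of its own, whereas in the hurdle model the probability of zero is exactly the binary-part probability.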
Note also the interesting discussion in Olsen-Schafer (2001) at the top of the right-most column of page 742. This concerns counterfactuals and the expected response on the continuous variable with different values of the binary variable.
So my interpretation is: there is no bias from the structural zeros in the continuous part, because the structural zero information is left out in calculating the likelihood.
In practice, though, it is not like estimating the model simply without the cases with the structural zeros.
If I have 413 cases in total, with around 70% structural zeros spread across six y indicators of a latent variable eta, then the relationship beta between ksi and eta is based on the remaining 30% of the information in y, connected to a) all the information from the 413 cases in ksi, b) only those ksi cases that have a corresponding y (this I think is not the case), c) ... and there my understanding stops.
Is it also correct to say that there is not much information left to calculate that beta?
Yes, 70% who have the binary indicator=0 is a large percentage. This reduces your power and also makes the analysis rely more heavily on model assumptions.
It is not clear to me if your continuous part corresponds to a continuous variable or a count variable. Only for count variables would ZIP, ZINB, or truncated Poisson be relevant.
A two-part model for a count outcome can be specified using the negative binomial hurdle model specified in Mplus using Count = u(nbh). See page 493 in the V6 UG.
rongqin posted on Wednesday, March 02, 2016 - 8:42 am
In the two-part model, are the two parts totally INDEPENDENT of each other? For instance, if I want to include a correlation between X (indicated by Binary X and Continuous X) and Y, do I have to specify both of the following paths?
Binary X with Y; Continuous X with Y;
What I am wondering is whether I DOUBLE estimated correlations in the model by specifying both.
I have several categorical variables (4-point rating scale) with strong ceilings at zero. I'd like to fit a two-part model. However, the "continuous" part has only three categories. Is it appropriate to treat the continuous part as ordered categorical and use the WLSMV estimator? If not, which estimator should I use instead? Many thanks!
I would not treat the positive tail as continuous with only 3 scale points supporting it. In our Topic 11 short course we discuss this matter and how to handle inflated-ordinal modeling at the end of Part 1. See also our book Regression and Mediation Analysis using Mplus.
As suggested, I declared the continuous part as categorical in the variable command and did not transform the continuous part when creating the two-part data of my two-part factor model.
Since I have several traits/factors (different kinds of behaviors assessed with several items, all traits as two-part factor models with a binary and an ordered-categorical part), numerical integration becomes computationally demanding. Using the WLSMV estimator, the model converges quickly. However, I was wondering whether using WLSMV is appropriate, since all examples I have seen used numerical integration with ML(R) (I think that has something to do with the handling of missing values). Is it okay to change the estimator to WLSMV in two-part factor modeling, or do I have to stay with numerical integration?
WLSMV does not handle missing data in a way that two-part needs. You can instead use Estimator = Bayes. See for instance my paper on our website:
Muthén, B. (2010). Bayesian analysis in Mplus: A brief introduction. Technical Report. Version 3.