Yes, you get correlations between categorical variables by simply requesting type = BASIC. I assume that by "correlations between each of the error terms", you simply mean the correlations? There are no error terms unless you specify a model, such as a 1-factor model.
Anonymous posted on Saturday, July 16, 2005 - 8:41 am
I'm not sure that I was clear with my question.
I have a situation in which selection occurs and I want to deal with that selection by modeling the outcomes simultaneously (a Heckman selection model of sorts, though with more than two outcomes). I have a series of independent variables that I am using to predict four dichotomous outcomes (college attendance, college type, transfer, and college completion). Is it possible to model these outcomes simultaneously while also finding the correlations between the residuals of each outcome, accounting for selection into each population?
bmuthen posted on Saturday, July 16, 2005 - 11:08 am
Yes, if you have a regression of 4 binary dependent variables on covariates, the residuals of the dependent variables can be correlated. The default Mplus estimator is limited-information weighted least squares (WLSMV) using probit regressions, so the residual correlations concern residuals for underlying continuous latent response variables that are multivariate normal conditional on the covariates. I believe the default setting is that the residuals are correlated (check in Tech1).
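To make the latent response variable idea concrete, here is a minimal simulation sketch in Python (not Mplus syntax). It assumes two binary outcomes with no covariates, thresholds at zero, and a residual correlation of 0.5; all names and values are illustrative. It checks that the joint probability of both outcomes being 1 is determined by the correlation between the underlying normal residuals.

```python
import math
import random


def joint_probability_both_one(rho, n=200_000, seed=0):
    """Share of cases where both binary outcomes equal 1.

    Each binary outcome is 1 when its latent response variable exceeds a
    threshold of zero (an assumption for this sketch). The two latent
    residuals are standard normal with correlation rho.
    """
    rng = random.Random(seed)
    both = 0
    for _ in range(n):
        e1 = rng.gauss(0.0, 1.0)
        # Build e2 so that corr(e1, e2) = rho and var(e2) = 1.
        e2 = rho * e1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        if e1 > 0.0 and e2 > 0.0:
            both += 1
    return both / n


rho = 0.5
simulated = joint_probability_both_one(rho)
# Known result for a bivariate normal with zero thresholds.
closed_form = 0.25 + math.asin(rho) / (2.0 * math.pi)
print(simulated, closed_form)
```

With uncorrelated residuals the joint probability would be 0.25, so the excess over 0.25 is what carries the information about the residual correlation.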
Anonymous posted on Friday, July 22, 2005 - 5:12 pm
Dear Bengt and Linda,
This is a follow-up to the Heckman modeling question. We have run a version of this model using aML with one selection equation (college attendance) and 2 binary outcomes observed only for the population attending college (junior college attendance and transfer to 4-yr college). This appears to work. Our problem is that we have a third binary outcome still to be included, 4-year college attendance, but aML complained about estimating a 4x4 non-diagonal covariance matrix so we are now looking for alternative software.
I'm wondering how MPlus handles this model, in particular the selection, and if it handles it in a similar way as aML or even Stata. (Stata is capable of running a simplified model with 1 selection equation and 1 binary outcome observed for the selected population.) Do you know? Are there examples in MPlus documentation that discuss this type of model, how it is specified, and/or how it is estimated?
Ok, so you are aiming for the classic Heckman model. ML estimation of a Heckman selection model builds on probit/tobit modeling. I looked over the old Muthen-Joreskog (1983) article in Evaluation Review. Eqns 9 - 11 describe 2 regressions. One is a probit regression for the binary outcome u determining if an outcome y is observed or not (a missing data indicator in essence) and the other is a regression for the outcome y. The residuals of the two continuous latent response variables of the two regressions are correlated. You have missing data on y for one of the two u outcomes. I wonder if this could be tricked into a 2-class mixture model in Mplus where the classes are the same as the observed missing data indicator (perfect measurement) and where one class does not have a y regression (coefficients zero) and y is scored as missing in that class. Mplus ML works with logit and not probit, but that may be ok - the residual correlation (which is zero by default with ML logit) could perhaps be orchestrated by using a factor that influences both outcomes. The y variable can be continuous or categorical and there can be several y's. Sounds like this might be possible but would take some thinking and doing.
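For readers without the article at hand, the two-regression setup described above can be written out as follows. This is a sketch in generic textbook notation; the symbols (z, x, gamma, beta, delta, epsilon, rho, sigma) are assumptions for illustration and are not copied from Eqns 9 - 11 of the article.

```latex
\begin{align}
u^* &= \gamma' z + \delta, &
u &= \begin{cases} 1 & \text{if } u^* > 0 \\ 0 & \text{otherwise} \end{cases} \\
y^* &= \beta' x + \epsilon, &
y &= \begin{cases} y^* & \text{if } u = 1 \\ \text{missing} & \text{if } u = 0 \end{cases} \\
\begin{pmatrix} \delta \\ \epsilon \end{pmatrix}
&\sim N\!\left(
\begin{pmatrix} 0 \\ 0 \end{pmatrix},
\begin{pmatrix} 1 & \rho\sigma \\ \rho\sigma & \sigma^2 \end{pmatrix}
\right)
\end{align}
```

Under these assumptions the regression of y in the selected group is

```latex
E(y \mid x, z, u = 1) = \beta' x + \rho\,\sigma\,\frac{\phi(\gamma' z)}{\Phi(\gamma' z)},
```

so ignoring the selection equation is harmless only when the residual correlation rho is zero.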
Anonymous posted on Tuesday, July 26, 2005 - 8:56 pm
Thanks for the response. I've been looking into various other packages, but without much success so far. We might end up back trying Mplus. I guess we could compare estimates with aML results to see if things are working ok.
Thanks so much for your input. Laura
bmuthen posted on Wednesday, July 27, 2005 - 6:40 pm
Let us know if you resurrect this analysis interest in Mplus.
Are you aware of any papers using Mplus to estimate a Heckman model where the selection model has a binary DV and the outcome model has a continuous DV?
I've been trying to think of a way to do this in Mplus, thereby allowing for the inclusion of mediation after the outcome model. I thought perhaps using knownclasses (0,1) for the DV of the selection model and then only estimating the outcome model when knownclass = 1. But can't seem to estimate the effects for the outcome model separate from the selection model.
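As a side illustration of why fitting the outcome model only in the selected group (knownclass = 1) is not enough on its own, here is a minimal Python sketch (not Mplus syntax). It assumes an intercept-only selection equation with intercept a, unit residual variances, and a residual correlation rho; these names and values are hypothetical. It compares the simulated mean of the outcome residual among selected cases with the inverse Mills ratio term that the Heckman model adds.

```python
import math
import random


def inverse_mills(a):
    """phi(a) / Phi(a) for the standard normal distribution."""
    pdf = math.exp(-0.5 * a * a) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(a / math.sqrt(2.0)))
    return pdf / cdf


def selected_residual_mean(rho, a, n=200_000, seed=0):
    """Mean of the outcome residual among cases with u* = a + delta > 0."""
    rng = random.Random(seed)
    total, count = 0.0, 0
    for _ in range(n):
        delta = rng.gauss(0.0, 1.0)  # selection residual
        # Outcome residual with corr(delta, eps) = rho and var(eps) = 1.
        eps = rho * delta + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        if a + delta > 0.0:  # case is selected, so y is observed
            total += eps
            count += 1
    return total / count


rho, a = 0.5, 0.0
simulated = selected_residual_mean(rho, a)
expected = rho * inverse_mills(a)
print(simulated, expected)
```

The residual mean in the selected group is not zero unless rho is zero, which is why the selection and outcome equations need to be linked through their residual correlation rather than estimated separately.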
Actually, it will be posted tomorrow, or possibly late tonight.
Alice posted on Tuesday, September 05, 2017 - 4:36 am
Dear Professor Muthen,
I just read the paper you refer to above as "the old Muthen-Joreskog (1983) article." It was very interesting to me to see the discussion of the selection model with latent exogenous variables in section 6.2.
In the model, the measurement model (equation 55) is assumed to have group-invariant thresholds and factor loadings. I wonder if this is the standard/default assumption; i.e., the one one should make when estimating a selection model with exogenous latent variables? That seems to be a very crucial assumption.
The section ends by proposing "a simple ad hoc estimator" in which the factor scores are estimated in the first step and the structural model (the selection equation and the equation of interest) is estimated in the second step. I wonder if this is the way MPLUS also estimates a Heckman selection model with latent exogenous variables - or whether MPLUS estimates all three equations simultaneously? I would have thought the latter, but given the paper, I am now in doubt.
Alice posted on Tuesday, September 05, 2017 - 4:38 am
Does the strategy suggested in the "simple ad hoc estimator" also hold for the Heckman two-stage estimator (discussed in the paper following equation 13) with latent exogenous variables?
What I mean is whether the estimation in this case should be carried out in three steps. Step 1: estimate the factor scores for the full sample and use these in the following steps. Step 2: estimate the selection equation for the full sample. Step 3: estimate the equation of interest on the selected sample.
I would have thought that one should estimate this model in two steps. Step 1: estimate the selection equation and the latent variables in that equation simultaneously for the full sample. Step 2: estimate the equation of interest and the latent variables in that equation simultaneously. However, given the paper, I am now in doubt whether this is considered to be incorrect. This also goes back to my question above on the group-invariant thresholds and loadings.
Alice posted on Tuesday, September 05, 2017 - 4:42 am
To clarify: In the last paragraph, I mean "Step 2: estimate the equation of interest and the latent variables in that equation simultaneously for the sample selected."
The intermediate factor score step is not needed in the current Mplus. It can estimate Heckman models using 1-step ML also with a factor model for the exogenous part. How to do Heckman regression modeling in Mplus is discussed in our new RMA book - you can see the setup at this link:
Hello, I'm estimating change in a count outcome two years post baseline, adjusting for baseline values and covariates. Missingness in y appears NMAR. My plan is to jointly estimate a count model of the primary variable of interest and a logit model of attrition in y. I'm using selection modeling as described in approach 16 (p. 475-476) of the RMAM book.
a) With a count outcome, are there additional issues I am not accounting for?
b) Is the missing data model adjusted for the Poisson distribution of y when missing is regressed on y?
c) Would including both y and y(t-1) in the logit predicting missingness be over-adjustment?
y - count outcome
y(t-1) - count outcome measured at baseline
x - covariates
missing - indicator of missingness in y

count is y; categorical is missing;
Following chap10 ex10.25_A6, the suggested model would look as follows. My research question doesn't include mediation. Does excluding the MODEL INDIRECT statement negate the benefit of using a selection model in this scenario?
I've read that a Heckman selection model is commonly used to address non-random missingness in a follow-up outcome measure. This type of model is described as a way to address censoring in the RMAM text, but could it also be adapted to my needs?
model:
y on y(t-1) x;
y(t-1) on x;
missing on y(t-1);
Let's assume that x and y_t-1 have no missing data and that y_t does have missing data. For the model you specify, I think you will get the same estimates if you include
missing on y_t-1;
or not. I say that because y_t-1 is a missing data predictor and y_t-1 is already included in the model due to your first 2 model statements. So you get FIML, that is, ML under MAR. It is just like RMA Fig 10.11 with m playing the role of y_t-1. You can try and see if you get the same values in those 2 runs.
If instead you change to
missing on y_t;
perhaps you would get NMAR selection modeling because y_t is partly latent due to missingness and therefore MAR does not hold.
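The difference between the two specifications can be illustrated with a small simulation, sketched here in Python rather than Mplus. It makes several simplifying assumptions that are not in the original question: a continuous rather than count outcome, no x covariate, a true slope of 0.5, and a logistic missingness model with coefficient 2. It also relies on the fact that, with missingness only on y_t, ML under MAR for the regression of y_t on y_t-1 matches the complete-case regression. All function and variable names are hypothetical.

```python
import math
import random


def ols_slope(pairs):
    """Ordinary least squares slope of y on x for a list of (x, y) pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    return sxy / sxx


def complete_case_slope(depends_on, n=100_000, beta=0.5, seed=1):
    """Slope of y_now on y_prev using only cases where y_now is observed.

    depends_on = "y_prev": missingness is driven by the baseline value (MAR).
    depends_on = "y_now":  missingness is driven by the outcome itself (NMAR).
    """
    rng = random.Random(seed)
    kept = []
    for _ in range(n):
        y_prev = rng.gauss(0.0, 1.0)
        y_now = beta * y_prev + rng.gauss(0.0, 1.0)
        driver = y_prev if depends_on == "y_prev" else y_now
        p_missing = 1.0 / (1.0 + math.exp(-2.0 * driver))
        if rng.random() >= p_missing:  # y_now is observed for this case
            kept.append((y_prev, y_now))
    return ols_slope(kept)


slope_mar = complete_case_slope("y_prev")
slope_nmar = complete_case_slope("y_now")
print(slope_mar, slope_nmar)
```

Under these assumptions the slope is recovered when missingness depends on y_t-1 and is attenuated when missingness depends on y_t, which is the situation where a selection model with "missing on y_t" could help.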