
Anonymous posted on Friday, July 15, 2005  10:08 am



I am not very familiar with Mplus. I was wondering if it would jointly estimate 4 dichotomous outcomes, estimating the correlations between each of the error terms. I'm sorry if this is the wrong topic to post under, I wasn't sure where the question fit. Thanks for your help 

bmuthen posted on Friday, July 15, 2005  6:15 pm



Yes, you get correlations between categorical variables by simply requesting TYPE = BASIC. I assume that by "correlations between each of the error terms", you simply mean the correlations? There are no error terms unless you specify a model, such as a 1-factor model. 
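As a rough illustration of the TYPE = BASIC suggestion, a minimal input sketch might look like the following. The file name and variable names (outcomes.dat, u1-u4) are hypothetical and do not come from the thread.

```mplus
TITLE:    Sketch only - sample statistics for four binary outcomes
DATA:     FILE = outcomes.dat;      ! hypothetical file name
VARIABLE: NAMES = u1-u4;            ! hypothetical variable names
          CATEGORICAL = u1-u4;      ! declare the outcomes as categorical
ANALYSIS: TYPE = BASIC;             ! sample statistics only, no model
```

With the outcomes declared as CATEGORICAL, the correlations reported refer to the underlying continuous latent response variables (tetrachoric correlations for binary items), not to the observed 0/1 scores.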

Anonymous posted on Saturday, July 16, 2005  8:41 am



I'm not sure that I was clear with my question. I have a situation in which selection occurs and I want to deal with that selection by modeling the outcomes simultaneously (a Heckman selection model of sorts, though with more than two outcomes). I have a series of independent variables that I am using to predict four dichotomous outcomes (college attendance, college type, transfer, and college completion). Is it possible to model these outcomes simultaneously, also finding the correlations between the residuals of each outcome and accounting for selection into each population? Thanks again! 

bmuthen posted on Saturday, July 16, 2005  11:08 am



Yes, if you have a regression of 4 binary dependent variables on covariates, the residuals of the dependent variables can be correlated. The default Mplus estimator is limited-information weighted least squares (WLSMV) using probit regressions, so the residual correlations concern residuals for underlying continuous latent response variables that are multivariate normal conditional on the covariates. I believe the default setting is that the residuals are correlated (check in Tech1). 
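A minimal input sketch of such a regression is given below, assuming four binary outcomes and three covariates. The file name and variable names are hypothetical. The WITH statements request the residual correlations explicitly, so the sketch does not rely on the default, and TECH1 is requested so that the free parameters can be checked.

```mplus
TITLE:    Sketch only - four binary outcomes, correlated residuals
DATA:     FILE = college.dat;                ! hypothetical file name
VARIABLE: NAMES = attend ctype transfer complete x1 x2 x3;  ! hypothetical
          CATEGORICAL = attend ctype transfer complete;
ANALYSIS: ESTIMATOR = WLSMV;                 ! probit regressions
MODEL:    attend ctype transfer complete ON x1 x2 x3;
          ! residual correlations among the latent response variables
          attend WITH ctype transfer complete;
          ctype WITH transfer complete;
          transfer WITH complete;
OUTPUT:   TECH1;                             ! shows which parameters are free
```

Note that this sketch only sets up the joint probit regressions. It does not by itself address the selection issue (outcomes observed only for those who attend college) raised in the question.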

Anonymous posted on Friday, July 22, 2005  5:12 pm



Dear Bengt and Linda, This is a follow-up to the Heckman modeling question. We have run a version of this model using aML with one selection equation (college attendance) and 2 binary outcomes observed only for the population attending college (junior college attendance and transfer to 4-yr college). This appears to work. Our problem is that we have a third binary outcome still to be included, 4-year college attendance, but aML complained about estimating a 4x4 non-diagonal covariance matrix, so we are now looking for alternative software. I'm wondering how Mplus handles this model, in particular the selection, and whether it handles it in a similar way to aML or even Stata. (Stata is capable of running a simplified model with 1 selection equation and 1 binary outcome observed for the selected population.) Do you know? Are there examples in the Mplus documentation that discuss this type of model, how it is specified, and/or how it is estimated? Thanks so much, Laura 

bmuthen posted on Friday, July 22, 2005  5:58 pm



Ok, so you are aiming for the classic Heckman model. ML estimation of a Heckman selection model builds on probit/tobit modeling. I looked over the old Muthen-Joreskog (1983) article in Evaluation Review. Eqns 9 - 11 describe 2 regressions. One is a probit regression for the binary outcome u determining if an outcome y is observed or not (a missing data indicator in essence) and the other is a regression for the outcome y. The residuals of the two continuous latent response variables of the two regressions are correlated. You have missing data on y for one of the two u outcomes. I wonder if this could be tricked into a 2-class mixture model in Mplus where the classes are the same as the observed missing data indicator (perfect measurement) and where one class does not have a y regression (coefficients zero) and y is scored as missing in that class. Mplus ML works with logit and not probit, but that may be ok - the residual correlation (which is zero by default with ML logit) could perhaps be orchestrated by using a factor that influences both outcomes. The y variable can be continuous or categorical and there can be several y's. Sounds like this might be possible but would take some thinking and doing. 
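For reference, the two-regression setup described here can be written out for a continuous y as follows. The notation (z, x, gamma, beta, sigma, rho) is chosen for this sketch and is not necessarily the notation of Eqns 9 - 11 in the article.

```latex
% Selection (probit) equation and outcome equation
u_i^* = \gamma' z_i + \delta_i, \qquad u_i = 1 \text{ if } u_i^* > 0, \quad u_i = 0 \text{ otherwise}
y_i = \beta' x_i + \epsilon_i, \qquad y_i \text{ observed only if } u_i = 1

% Residuals: bivariate normal with correlation \rho
(\delta_i, \epsilon_i) \sim N\!\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix},
  \begin{pmatrix} 1 & \rho\sigma \\ \rho\sigma & \sigma^2 \end{pmatrix} \right)

% Log-likelihood, with e_i = (y_i - \beta' x_i)/\sigma
\log L = \sum_{i:\, u_i = 0} \log \Phi(-\gamma' z_i)
       + \sum_{i:\, u_i = 1} \left[ \log \frac{1}{\sigma}\,\phi(e_i)
       + \log \Phi\!\left( \frac{\gamma' z_i + \rho\, e_i}{\sqrt{1 - \rho^2}} \right) \right]
```

Here phi and Phi are the standard normal density and distribution function. Setting rho = 0 separates the likelihood into an ordinary probit for u and an ordinary regression for y on the selected cases, which is why the residual correlation is the parameter that carries the selection effect.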

Anonymous posted on Tuesday, July 26, 2005  8:56 pm



Dear Bengt, Thanks for the response. I've been looking into various other packages, but without much success so far. We might end up back trying Mplus. I guess we could compare estimates with aML results to see if things are working ok. Thanks so much for your input. Laura 

bmuthen posted on Wednesday, July 27, 2005  6:40 pm



Let us know if you resurrect this analysis interest in Mplus. 


Dear Dr. Muthen, I am trying to estimate a Type II tobit. The only deviation from the standard model is that I have a survival regression in the second stage with right censoring. So far I have used simulation in LIMDEP to estimate my parameters from the joint likelihood. Will Mplus be able to estimate the same model? 


Isn't Type II Tobit (Maddala term?) where you estimate the censoring point as well? I don't think Mplus handles that. So when you talk about "stages", do you mean: stage 1: do you engage in the activity or not? stage 2: if you engage in the activity, when do you stop? If so, it sounds like a "2-part (semicontinuous)" model, but where the second part is a survival model. Perhaps that is doable in Mplus (with fixed censoring point). 
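One way to write the likelihood contribution of such a two-part model with a survival second part is sketched below. This is an illustration under stated assumptions, not the specification from the thread: the symbols (pi_i, f, S, c, d_i) are hypothetical notation, and the two parts are assumed independent given the covariates. A Type II tobit would additionally allow the residuals of the two parts to correlate.

```latex
% u_i = 1 if person i engages in the activity, 0 otherwise
% t_i = observed time to stopping, c = fixed right-censoring point
% d_i = 1 if stopping is observed before c, 0 if censored at c
\pi_i = P(u_i = 1 \mid x_i)

L_i = (1 - \pi_i)^{\,1 - u_i}
      \left[ \pi_i \, f(t_i \mid x_i)^{\,d_i} \, S(c \mid x_i)^{\,1 - d_i} \right]^{u_i}
```

Here f is the density and S the survivor function of the stopping time, so a person censored at c contributes only the probability of lasting beyond c.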


Dr. Muthen, Yes, that is what I mean by stages. I have a fixed censoring point across all my observations. Can you give me any references to papers, or direct me to any examples with Mplus code, that can do this? Girish 


See the Mplus Version 4.1 User's Guide example 6.16 which also has a reference. 


Are you aware of any papers using Mplus to estimate a Heckman model where the selection model has a binary DV and the outcome model has a continuous DV? I've been trying to think of a way to do this in Mplus, thereby allowing for the inclusion of mediation after the outcome model. I thought perhaps of using knownclasses (0,1) for the DV of the selection model and then only estimating the outcome model when knownclass = 1. But I can't seem to estimate the effects for the outcome model separately from the selection model. Any thoughts would be greatly appreciated. Scott 


I have a setup for doing Heckman modeling using ML. I will send it to you. 


Fantastic. Thank you. 


Done. 


Dear Professor Muthen, Is it possible you could send your Heckman setup in ML to me too? I would greatly appreciate it! Thank you, Tom Eagle 


It will be posted tomorrow under FAQs. It is set up as a Monte Carlo simulation, but the real-data specification is found in the MODEL command. 


Thank you! Tom 


Actually, it will be posted tomorrow, or possibly late tonight. 

Alice posted on Tuesday, September 05, 2017  4:36 am



Dear Professor Muthen, I just read the paper you refer to above as "the old Muthen-Joreskog (1983) article." It was very interesting to me to see the discussion of the selection model with latent exogenous variables in section 6.2. In the model, the measurement model (equation 55) is assumed to have group-invariant thresholds and factor loadings. I wonder if this is the standard/default assumption; i.e., the one that one should make when estimating a selection model with exogenous latent variables? That seems to be a very crucial assumption. The section ends by proposing "a simple ad hoc estimator" in which the factor scores are estimated in the first step and the structural model (the selection equation and the equation of interest) is estimated in the second step. I wonder if this is the way Mplus also estimates a Heckman selection model with latent exogenous variables - or whether Mplus estimates all three equations simultaneously? I would have thought the latter, but given the paper, I am now in doubt. 

Alice posted on Tuesday, September 05, 2017  4:38 am



To continue: Does the strategy suggested in the "simple ad hoc estimator" also hold for the Heckman two-stage estimator (discussed in the paper following equation 13) with latent exogenous variables? What I mean is whether the estimation in this case should be carried out in three steps. Step 1: estimate the factor scores for the full sample and use these in the following steps. Step 2: estimate the selection equation for the full sample. Step 3: estimate the equation of interest on the selected sample. I would have thought that one should estimate this model in two steps. Step 1: estimate the selection equation and the latent variables in that equation simultaneously for the full sample. Step 2: estimate the equation of interest and the latent variables in that equation simultaneously. However, given the paper, I am now in doubt whether this is considered to be incorrect. This also goes back to my question above on the group-invariant thresholds and loadings. 

Alice posted on Tuesday, September 05, 2017  4:42 am



To clarify: In the last paragraph, I mean "Step 2: estimate the equation of interest and the latent variables in that equation simultaneously for the sample selected." 


The intermediate factor score step is not needed in the current Mplus. It can estimate Heckman models using 1-step ML also with a factor model for the exogenous part. How to do Heckman regression modeling in Mplus is discussed in our new RMA book - you can see the setup at this link: http://www.statmodel.com/Mplus_Book_Tables.shtml See the Table 7.5 run. 


Hello, I'm estimating change in a count outcome two years post baseline, adjusting for baseline values and covariates. Missingness in y appears NMAR. My plan is to jointly estimate a count model of the primary variable of interest and a logit model of attrition in y. I'm using selection modeling as described in approach 16 (p. 475-476) of the RMAM book. 
a) With a count outcome, are there additional issues I am not accounting for? 
b) Is the missing data model adjusted for the Poisson distribution of y when missing is regressed on y? 
c) Would including both y and y(t1) in the logit predicting missingness be overadjustment? 
Thanks, Nicholas 
___ 
y - count outcome 
y(t1) - count outcome measured at baseline 
x - covariates 
missing - indicator of missingness in y 
___ 
count is y; 
categorical is missing; 
ANALYSIS: estimator = mlr; integration=montecarlo; 
model: y ON y(t1) x; 
missing ON y y(t1) x; 


I think you need 2 DVs, not only 1, to benefit from missing data modeling. Perhaps y at the two time points. a) No, except see b) below. b) No, y_t1 could only be treated as a continuous or categorical predictor. c) ? 


Thank you for your reply. Following chap10 ex10.25_A6, the suggested model would look as shown at the end of this post. My research question doesn't include mediation. Does excluding the MODEL INDIRECT statement negate the benefit of using a selection model in this scenario? I've read that a Heckman selection model is commonly used to address nonrandom missingness in a follow-up outcome measure. This type of model is described as a way to address censoring in the RMAM text, but could it also be adapted to my needs? 
model: 
y on y(t1) x; 
y(t1) on x; 
missing on y(t1); 


Let's assume that x and y_t1 have no missing data and that y_t does have missing data. For the model you specify, I think you will get the same estimates if you include missing on y_t1; or not. I say that because y_t1 is a missing data predictor and y_t1 is already included in the model due to your first 2 model statements. So you get FIML, that is, ML under MAR. It is just like RMA Fig 10.11 with m playing the role of y_t1. You can try and see if you get the same values in those 2 runs. If instead you change to missing on y_t; perhaps you would get NMAR selection modeling because y_t is partly latent due to missingness and therefore MAR does not hold. 
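The distinction drawn here between the MAR and NMAR versions can be written out as a selection model. The notation is chosen for this sketch only: y_{1i} is the baseline count, y_{2i} the follow-up count, m_i = 1 when y_{2i} is missing, and the alphas are hypothetical logit coefficients. How Mplus evaluates this likelihood internally is not stated in the thread.

```latex
% Logit model for missingness of the follow-up outcome
\operatorname{logit} P(m_i = 1 \mid y_{2i}, y_{1i}, x_i)
  = \alpha_0 + \alpha_y\, y_{2i} + \alpha_1\, y_{1i} + \alpha_x' x_i

% Likelihood contribution, with f the count (e.g. Poisson) model for y_2
% Observed y_2 (m_i = 0):
L_i = f(y_{2i} \mid y_{1i}, x_i)\,\bigl[ 1 - P(m_i = 1 \mid y_{2i}, y_{1i}, x_i) \bigr]
% Missing y_2 (m_i = 1): sum over the unobserved count
L_i = \sum_{k=0}^{\infty} f(k \mid y_{1i}, x_i)\, P(m_i = 1 \mid k, y_{1i}, x_i)
```

When alpha_y = 0 the missingness probability no longer depends on the unobserved count, the sum for the missing cases reduces to the missingness probability alone, and the parameters of f are estimated as under ML with MAR. A nonzero alpha_y is what makes the model NMAR.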
