

Sample selection, censoring and trunc... 



I have been looking at Examples 3.2 and 3.2 in the MPLUS Version 3 manual in relation to two analysis issues in a large complex (cross-sectional) sample.

1) Analysis of a measure of hazardous use of alcohol in the past 12 months (AUDIT). Those who did not drink in the last twelve months are assigned a score of zero. I think this requires a censored-inflated regression (Example 3.2), since they could not have experienced problems if they did not drink, so y=0.

2) Analysis of either alcohol dependence or alcohol dependence symptom counts (lifetime). Not everyone was asked these questions; whether they were asked depended on consumption and other questions. Consequently it cannot be assumed that there was no dependence, so when y is missing it cannot be set to 0. From my reading, I think this is a sample selection problem, not a censoring or truncation problem, and as such I think that MPLUS does not have a way of analysing it.

I would be very grateful for comments. Thank you. Elisabeth Wells

bmuthen posted on Saturday, October 08, 2005  2:27 pm



1) Both a censored and a censored-inflated analysis could be considered here, since both acknowledge the y=0 situation. There is a large literature on modeling with zeros, particularly in the health literature.

2) If consumption and other variables determine whether the symptom questions are asked, you might consider the missingness of y to be a function of these variables, which puts you in the "MAR" case of ML estimation. This would imply that those who weren't asked are included with y = 999 in a Type = Missing analysis.
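To make the difference between the two options in point 1 concrete, here is a minimal Python sketch of the per-observation log-likelihoods. It is not Mplus syntax and not necessarily the exact parameterization Mplus uses; it assumes a normal outcome censored from below at 0, and the names `mu`, `sigma` and `pi_zero` (the probability of belonging to an "always zero" class, e.g. non-drinkers) are illustrative.

```python
import math

def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def censored_loglik(y, mu, sigma):
    """Log-likelihood of one observation censored from below at 0.

    A zero contributes P(latent y* <= 0); a positive value contributes
    the normal density at y.
    """
    if y <= 0:
        return math.log(norm_cdf(-mu / sigma))
    return math.log(norm_pdf((y - mu) / sigma) / sigma)

def censored_inflated_loglik(y, mu, sigma, pi_zero):
    """Censored model mixed with an extra 'always zero' class.

    pi_zero is the (assumed) probability of the always-zero class.
    """
    if y <= 0:
        # A zero arises either from the always-zero class or from censoring.
        return math.log(pi_zero + (1.0 - pi_zero) * norm_cdf(-mu / sigma))
    # A positive value can only come from the censored-normal part.
    return math.log(1.0 - pi_zero) + censored_loglik(y, mu, sigma)
```

With `pi_zero = 0` the inflated version reduces to the plain censored model, so the practical question is whether the data support a separate source of zeros.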


Does the new Mplus 5.1 have any other means of coping with sample selection in the case of count data? One of my variables is visits to the doctor, and the other is how many of these visits are work-related (asked only if the first is not 0). By the way, both variables are also right-censored (0, 1, 2, 3, 4 or more), with over 40% in the 0 count. Is there any way to cope with this type of censoring? Many thanks in advance, Fernando.


I forgot to add that my sample size is 5236 and the count frequencies (in %) are:

count        v1     v2
0            40.2   81.8
1            21.1    8.7
2            16.6    3.2
3             7.8    1.2
4 or more    13.7    2.7
missing       0.6    2.6
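For the "4 or more" category, a minimal Python sketch of how right censoring can enter a count likelihood is given here. It assumes a Poisson model with mean `lam`; the function names and the `cap` argument are illustrative, and this is not Mplus syntax or a statement of what Mplus 5.1 supports.

```python
import math

def poisson_pmf(k, lam):
    """Poisson probability of observing exactly k events."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def right_censored_poisson_loglik(y, lam, cap=4):
    """Log-likelihood of one count recorded as 0, 1, ..., cap-1, or 'cap or more'.

    Values below the cap contribute the usual Poisson probability; a value
    recorded at the cap contributes the tail probability P(Y >= cap).
    """
    if y < cap:
        return math.log(poisson_pmf(y, lam))
    tail = 1.0 - sum(poisson_pmf(k, lam) for k in range(cap))
    return math.log(tail)
```

The five recorded categories (0, 1, 2, 3, 4 or more) then have probabilities that sum to one, which is what distinguishes this from treating "4 or more" as an exact count of 4.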



Seems like you could formulate your dependent variable as "How many of your visits to the doctor (in the last x months) are work-related?" And then you could use a suitable count model; see the Mplus Web Talk on count modeling in 5.1 (see home page).


So then only positive values of v1 should be considered for v2? The 0 frequency of v2 then drops to 42.9%, but with 41.4% missing values (which could be modeled).


No, I was thinking you would include the zeros, since not all visits are work-related. This way you could also have separate predictions of zeros and nonzeros (see my web talk).
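One common way to get separate predictions of zeros and nonzeros is a zero-inflated Poisson model. The Python sketch below is only an illustration of that idea under assumed names (`lam` for the Poisson mean, `pi_zero` for the probability of a structural zero); it is not Mplus syntax and not necessarily the model covered in the web talk.

```python
import math

def poisson_pmf(k, lam):
    """Poisson probability of observing exactly k events."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def zip_pmf(k, lam, pi_zero):
    """Zero-inflated Poisson probability of observing k events.

    With probability pi_zero the count is a structural zero; otherwise
    it is drawn from a Poisson distribution with mean lam.
    """
    if k == 0:
        return pi_zero + (1.0 - pi_zero) * math.exp(-lam)
    return (1.0 - pi_zero) * poisson_pmf(k, lam)

def zip_loglik(counts, lam, pi_zero):
    """Log-likelihood of a list of observed counts under the ZIP model."""
    return sum(math.log(zip_pmf(k, lam, pi_zero)) for k in counts)
```

In a regression setting, `lam` and `pi_zero` would each depend on covariates (typically through a log link and a logit link respectively), so the predictors of "any work-related visit" can differ from the predictors of how many there are.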


