

Zero-inflated gamma regression



Hi! As a first step I ran an LCA, in which a six-cluster solution gave the best fit to the data. Next, I want to analyze the relation between these clusters and sickness absence measured in days. Sickness absence has a preponderance of zero values, and the distribution of values above zero is skewed to the right. I decided to use DATA TWOPART in Mplus to run a zero-inflated gamma regression, but the beta coefficients for my clusters in the logistic regression part make no sense. For example, according to the data one cluster has a larger proportion of zeros than the reference cluster, but Mplus gives the opposite. I found a reference showing how a zero-inflated gamma regression is carried out in SAS; the syntax was written by Dale McLerran and has been cited by others. Applying that code in SAS gives beta coefficients that fit my data better. Specifically, the logistic-regression coefficients from Mplus and SAS differ only in sign (+ or –); the magnitude is exactly the same for every cluster. The gamma-regression coefficients, by contrast, resemble each other but differ in magnitude: the Mplus intercept and cluster coefficients were 113%, 67%, 66%, 71%, 64% and 57%, respectively, of those obtained from SAS. Why do the beta coefficients differ in magnitude or sign between the two programs? Magnus


My Mplus syntax:

TITLE: Distribution drivers
DATA TWOPART:
  NAMES = sjldag;
  BINARY = bin1;
  CONTINUOUS = cont1;
DATA: File is sjuklonedagar.txt;
VARIABLE: Names are personnummer1 sjldag kluster;
  IdVariable is personnummer1;
  Usevariables are sjldag kluster1 kluster2 kluster3 kluster4 kluster5 bin1 cont1;
  CATEGORICAL is bin1;
DEFINE:
  kluster1 = kluster == 1;
  kluster2 = kluster == 2;
  kluster3 = kluster == 3;
  kluster4 = kluster == 4;
  kluster5 = kluster == 5;
ANALYSIS: Estimator = ML;
  STARTS = 400 100;
  LRTSTARTS = 0 0 200 40;
MODEL:
  cont1 on kluster1 kluster2 kluster3 kluster4 kluster5;
  bin1 on kluster1 kluster2 kluster3 kluster4 kluster5;
OUTPUT: sampstat; tech4; tech7; tech10; tech11; tech14; tech15;
PLOT: type=plot2;
SAVEDATA: File is item_prob_plot_sjuklonedagar;
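As a rough illustration of what the DATA TWOPART step feeds into the model, here is a minimal Python sketch. It assumes Mplus's default settings (a cutpoint of 0 and a log transform of the continuous part), which should be checked against the User's Guide for the version in use; the function name `two_part_split` is hypothetical. Under that assumption, bin1 is 1 when sjldag is above the cutpoint and 0 otherwise, and cont1 holds log(sjldag) for the positive values and is missing for the zeros. Because cont1 is declared continuous, the positive part would then be a normal regression of log(days), that is, a lognormal rather than a gamma model, which on its own could shift the magnitudes of the positive-part coefficients relative to a gamma fit.

```python
import math
from typing import List, Optional, Tuple


def two_part_split(y: List[float], cutpoint: float = 0.0
                   ) -> Tuple[List[int], List[Optional[float]]]:
    """Hypothetical sketch of a two-part split of a semicontinuous outcome.

    Assumes nonnegative y and a log-transformed continuous part:
      binary     = 1 if value > cutpoint else 0
      continuous = log(value) if value > cutpoint else None (missing)
    """
    binary: List[int] = []
    continuous: List[Optional[float]] = []
    for value in y:
        if value > cutpoint:
            binary.append(1)                   # "not at zero"
            continuous.append(math.log(value))  # assumed default log transform
        else:
            binary.append(0)                   # "at zero"
            continuous.append(None)            # continuous part is missing
    return binary, continuous


if __name__ == "__main__":
    days = [0, 3, 0, 10]
    print(two_part_split(days))
```

Note that under this coding bin1 = 1 means "not at zero", so its regression coefficients describe the probability of having some absence.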


The SAS syntax:

proc nlmixed data=sjldagar;
  parms b0_f=0 b1_f=0 b2_f=0 b3_f=0 b4_f=0 b5_f=0 b6_f=0
        b0_h=0 b1_h=0 b2_h=0 b3_h=0 b4_h=0 b5_h=0 b6_h=0
        log_theta=0;
  eta_f = b0_f + b1_f*kluster1 + b2_f*kluster2 + b3_f*kluster3 + b4_f*kluster4 + b5_f*kluster5 + b6_f*kluster6;
  p_yEQ0 = 1/(1+exp(eta_f));
  eta_h = b0_h + b1_h*kluster1 + b2_h*kluster2 + b3_h*kluster3 + b4_h*kluster4 + b5_h*kluster5 + b6_h*kluster6;
  mu = exp(eta_h);
  theta = exp(log_theta);
  r = mu/theta;
  if sjldag=0 then ll = log(p_yEQ0);
  else ll = log(1-p_yEQ0) - lgamma(theta) + (theta-1)*log(sjldag) - theta*log(r) - sjldag/r;
  model sjldag ~ general(ll);
  predict (1-p_yEQ0)*mu out=expect_zig;
  predict r out=shape;
  estimate "scale" theta;
run;
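For readers without SAS, the per-observation log-likelihood that the PROC NLMIXED code maximizes can be written out in Python as below. This is a sketch for checking the formula, not a fitting routine; `zig_loglik` is a hypothetical name, and summing it over observations and maximizing over the b parameters and log_theta would correspond to what NLMIXED does. Two things follow directly from the formula: because p_yEQ0 = 1/(1+exp(eta_f)), the linear predictor eta_f is the logit of P(sjldag > 0); and in this parameterization theta plays the role of the gamma shape and r = mu/theta the gamma scale, so the mean of the positive part is mu.

```python
import math


def zig_loglik(y: float, eta_f: float, eta_h: float, log_theta: float) -> float:
    """Log-likelihood contribution of one observation, mirroring the SAS ll.

    eta_f     : linear predictor of the binary part (b0_f + b1_f*kluster1 + ...)
    eta_h     : linear predictor of log(mean) of the positive part
    log_theta : log of the gamma shape parameter
    """
    p_zero = 1.0 / (1.0 + math.exp(eta_f))  # P(y = 0); eta_f is logit of P(y > 0)
    mu = math.exp(eta_h)                     # mean of the positive part
    theta = math.exp(log_theta)              # gamma shape
    r = mu / theta                           # gamma scale, so theta * r = mu
    if y == 0:
        # The gamma density puts no mass on exactly 0, so zeros
        # enter the likelihood only through p_zero.
        return math.log(p_zero)
    return (math.log(1.0 - p_zero)
            - math.lgamma(theta)
            + (theta - 1.0) * math.log(y)
            - theta * math.log(r)
            - y / r)


if __name__ == "__main__":
    # Shape 1 reduces the gamma to an exponential with mean mu = 2.
    print(zig_loglik(3.0, 0.0, math.log(2.0), 0.0))
```

With theta = 1 the positive part reduces to an exponential density, which gives an easy hand check of the formula.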


Note that two-part modeling is not the same as zero-inflated modeling; the latter is a mixture model and the former is not. So the results should not be expected to be the same, only similar. This is described in Chapter 7 of our new book (http://www.statmodel.com/Mplus_Book.shtml). Note also that the binary part of two-part modeling in Mplus describes the probability of not being at zero, whereas zero-inflated modeling typically describes the probability of being in the zero class. The latter is the case, for example, with the censored-inflated model in Mplus. Also, we request that postings be limited to one window. Longer questions should be sent to Mplus Support.
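The pure sign flip with identical magnitudes is what one would expect when two binary parts model opposite events (zero versus not zero). A minimal Python sketch with hypothetical counts, not Magnus's data: in a logistic regression whose only predictors are the cluster dummies, each coefficient equals the log odds ratio between that cluster and the reference cluster. Swapping the modeled event inverts every odds ratio, so each coefficient keeps its magnitude and changes sign. The function name `log_odds_ratio` is illustrative.

```python
import math


def log_odds_ratio(events_k: int, n_k: int, events_ref: int, n_ref: int) -> float:
    """Log odds ratio of an event in cluster k versus the reference cluster.

    In a logistic regression with only dummy-coded cluster predictors,
    this equals the ML coefficient for cluster k's dummy.
    """
    odds_k = events_k / (n_k - events_k)
    odds_ref = events_ref / (n_ref - events_ref)
    return math.log(odds_k / odds_ref)


# Hypothetical counts, for illustration only
n_k, zeros_k = 200, 120      # cluster k: 60% zeros
n_ref, zeros_ref = 300, 150  # reference cluster: 50% zeros

# Binary part modeling P(y = 0), as in a typical zero-inflation part
beta_zero = log_odds_ratio(zeros_k, n_k, zeros_ref, n_ref)
# Binary part modeling P(y > 0), as in the Mplus two-part binary part
beta_nonzero = log_odds_ratio(n_k - zeros_k, n_k, n_ref - zeros_ref, n_ref)

print(round(beta_zero, 4), round(beta_nonzero, 4))  # prints: 0.4055 -0.4055
```

A cluster with more zeros than the reference thus gets a positive coefficient when P(y = 0) is modeled and a negative one when P(y > 0) is modeled. This exact identity holds for the two-part parameterization; in a genuine zero-inflated mixture the binary part refers to latent class membership, so agreement would only be approximate.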


