

Regarding skewness in my sample



Hi, I have a 2-level multilevel model I am trying to estimate. I have 46 units that provide a total of 3200 observations of y and x. The problem is that 45 units contribute a total of 400 observations and the remaining 2800 come from one unit, which makes this a 2-level model with a very skewed distribution of observations across units. I don't want to throw out the last unit, since it contributes 2800 data points, but at the same time I don't want to introduce severe skewness into my sample, so I employed the following procedure:

a) take the 400 observations from the 45 units and add 100 more drawn randomly from the 2800 observations that the 46th unit provides
b) get the estimates from the 2-level model of the effect of the x vector on the DV y, with a sample size of 400 + 100 = 500
c) repeat a and b 2000 times
d) obtain the mean value of the estimate and the mean value of the standard error, and divide the former by the latter to get the mean t-value

I know this is somewhat like bootstrapping, but it is different because my sample is drawn without replacement (at least a large part of it). I have 2 questions:

Does this seem like a reasonable approach? What would you do if you were me?

What would be a good way to infer whether the estimates are statistically significant? Right now, I am using the mean value of the estimate and the mean value of the std err (from the repetitions) to get the mean t-values.

Thanks! hari
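Steps a to c above can be sketched in Python as follows. This is only an illustration of the resampling loop, not the code used in the post: `fit_model` is a hypothetical callable standing in for whatever software estimates the 2-level model, and it is assumed to return an (estimate, standard error) pair for the coefficient of interest.

```python
import random


def repeated_subsample_fits(small_unit_rows, big_unit_rows, fit_model,
                            n_draw=100, n_reps=2000, seed=0):
    """Refit the model on all small-unit rows plus a random draw from the big unit.

    fit_model is a hypothetical placeholder: it takes a list of rows and
    returns (estimate, std_error) for one coefficient.
    """
    rng = random.Random(seed)
    results = []
    for _ in range(n_reps):
        # Step a: draw n_draw rows from the large unit WITHOUT replacement.
        drawn = rng.sample(big_unit_rows, n_draw)
        # Step b: fit the model on the combined sample (400 + 100 = 500 rows in the post).
        results.append(fit_model(list(small_unit_rows) + drawn))
    # Step c is the loop itself; the caller pools the (estimate, std_error) pairs.
    return results
```

Each replication keeps the same 400 rows from the 45 small units, so only the 100 rows from the large unit vary between fits.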


The approach sounds reasonable but I would use the IMPUTATION option of the DATA command for the analysis to get correct standard errors. 


I will definitely do the same, but just out of curiosity, what would the imputation option be correcting for? thanks! hari 


Instead of only using the average SE, you add the between-imputation parameter estimate variation. So it is sort of within + between imputation variance, in line with the literature on multiple imputation.
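The within + between combination can be sketched with the standard multiple-imputation pooling formulas (Rubin's rules). This is a minimal sketch and not Mplus's own code; the function name and the treatment of each replication as one "imputation" are assumptions made for illustration.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, std_errors):
    """Pool m estimates and their standard errors using Rubin's rules."""
    m = len(estimates)
    if m < 2 or m != len(std_errors):
        raise ValueError("need at least two estimates and matching standard errors")
    pooled_estimate = mean(estimates)
    # Within variance: average of the squared standard errors.
    within = mean([se ** 2 for se in std_errors])
    # Between variance: sample variance of the estimates (m - 1 denominator).
    between = variance(estimates)
    # Total variance adds the between part, inflated by (1 + 1/m).
    total = within + (1 + 1 / m) * between
    pooled_se = math.sqrt(total)
    return pooled_estimate, pooled_se, pooled_estimate / pooled_se
```

Because the between-replication variance is added, the pooled SE is never smaller than the square root of the average squared SE, so the resulting ratio is more conservative than a mean t-value based on the average SE alone.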


