Phil Wood posted on Wednesday, August 03, 2011 - 8:37 am
This is a simple syntax question. I have a data set with five variables, continuously measured. Based on leverage statistics, I want to delete a particular observation, say #335. Writing a useobservations statement is cumbersome, given that the variables are all continuously measured. Is there any way to easily delete a given observation in the file without listing its particular values? This would make it easier to use the influence plots and rerun the analysis without influential observations. If it's not current syntax, it might be a useful thing to add. Just a thought. Thanks!
If you have an id variable, you can delete using the number the person has on the id variable.
Phil Wood posted on Wednesday, August 03, 2011 - 11:32 am
That's true, but in this case the file contains just the analysis variables, with no id. I think the best answer at this point is to use savedata to save the influence diagnostics and then write a useobservations statement based on those, with usevariables listing only the analysis variables. It's a little cumbersome for large datasets. That's why I suggested that maybe one could consider a reserved variable for observation number (analogous to the _N_ variable in SAS) which would index the observation number. Thanks.
It might be worthwhile to add an id variable to your data set. Then if you identify it using the IDVARIABLE option, you will see the ID number when you hold the cursor on the outlier in the plot and you can easily exclude the observations.
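As a rough sketch of that suggestion, the Python snippet below prepends a sequential observation number to each record of a data file, so that the new first column can serve as the id variable. It assumes a free-format, whitespace-delimited file with one observation per line and no header row. The function names and the file names in the usage note are hypothetical, not part of Mplus.

```python
def add_row_ids(lines, start=1):
    """Prepend a sequential observation number to each non-blank record.

    Assumes one observation per line; blank lines are skipped so they
    do not consume an id.
    """
    numbered = []
    n = start
    for line in lines:
        record = line.strip()
        if not record:
            continue
        numbered.append(f"{n} {record}")
        n += 1
    return numbered


def add_ids_to_file(src_path, dst_path):
    """Read src_path, write a copy with an id column added to dst_path."""
    with open(src_path) as src:
        numbered = add_row_ids(src)
    with open(dst_path, "w") as dst:
        dst.write("\n".join(numbered) + "\n")
```

A call such as add_ids_to_file("mydata.dat", "mydata_id.dat") would write the numbered copy. The new first column then has to be added at the front of the variable names list in the input file and named as the id variable.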
Phil Wood posted on Thursday, October 20, 2016 - 10:06 am
OK, so I tried the following approach. I defined an ID variable, scrid:
    idvariable is scrid;
and then added the following:
    define: if (scrid eq 2758.0) then delete;
I verified that this number appears in the file (it's actually 2758.00000000), but I'm dismayed to see that this observation still appears in the influence plots for the analysis. I tried moving it before and after the model statement, but nothing seems to affect it. Any ideas?