I would appreciate some help with a clustering problem. I am trying to learn the best method for classifying data when we are using logistic regression.
What I mean is: when we use linear regression, it is simple to divide the data set into four groups of equal size. Scores from 0 to 0.25 become the first group, 0.25 to 0.5 the second, and so on. But when we use logistic regression, the shape of the curve gets in the way of this approach. I don't know how to divide the data set so that we get appropriate clusters. Can you help me with this?
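To make the contrast concrete, here is a minimal sketch in Python of one possible reading of the problem. It assumes synthetic scores (not the data discussed in this thread) generated by passing a normal linear predictor through the logistic function, and it compares fixed equal-width cut points with cut points taken at the empirical quartiles. The helper names `assign_groups` and `group_sizes` are made up for illustration.

```python
import math
import random
from bisect import bisect_right
from statistics import quantiles

random.seed(0)

# Synthetic stand-in for 800 scorecard outputs (an assumption, not real data):
# the logistic transform of a wide normal predictor piles scores up near 0 and 1.
scores = [1 / (1 + math.exp(-random.gauss(0, 3))) for _ in range(800)]


def assign_groups(values, cuts):
    """Return a group index in 0..len(cuts) for each value, given sorted cut points."""
    return [bisect_right(cuts, v) for v in values]


def group_sizes(groups, n_groups):
    """Count how many values landed in each group."""
    return [groups.count(g) for g in range(n_groups)]


# Equal-width cut points, as in the linear-regression case described above.
width_cuts = [0.25, 0.5, 0.75]
width_sizes = group_sizes(assign_groups(scores, width_cuts), 4)

# Equal-count cut points: the empirical quartiles of the scores themselves.
quartile_cuts = quantiles(scores, n=4)
quartile_sizes = group_sizes(assign_groups(scores, quartile_cuts), 4)

print("equal-width group sizes:", width_sizes)
print("quartile group sizes:   ", quartile_sizes)
```

Under these assumptions the equal-width groups come out unbalanced, while the quartile cut points give groups of nearly equal size regardless of the shape of the curve.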
OK, I will try to explain my question with a simple example.
We have scores produced by a scorecard that uses a logistic regression model. For example, these are 20 observations from our data set, and they are the scores produced by our model. In total, there are 800 observations in our set, with scores ranging from 0 to 1.
I get two associations from your message and perhaps you can look at the corresponding literature to see what fits your interests.
One is finite mixture logistic regression as described in the 1989 J Amer Stat Assoc article by Follmann & Lambert. Here you find clusters based on subpopulations with different logistic regression intercepts. This can be done in Mplus.
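As a rough illustration of the idea of subpopulations with different intercepts, here is a minimal Python sketch. It is a deliberately simplified, intercept-only, two-class version fitted by EM on synthetic binomial data. It is not the estimator from the article and not what Mplus does internally. The number of trials `M`, the starting values, and all variable names are assumptions made for this example.

```python
import math
import random

random.seed(2)

M = 10  # binary trials observed per unit (assumption)


def simulate_unit():
    """Draw the latent class once per unit, then count successes in M trials."""
    p = 0.2 if random.random() < 0.6 else 0.7
    return sum(random.random() < p for _ in range(M))


successes = [simulate_unit() for _ in range(500)]


def logit(p):
    return math.log(p / (1 - p))


def binom_kernel(y, p):
    """Binomial likelihood without the constant C(M, y), which cancels in the E-step."""
    return p ** y * (1 - p) ** (M - y)


weight, p_low, p_high = 0.5, 0.3, 0.6  # arbitrary starting values
for _ in range(200):
    # E-step: posterior probability that each unit belongs to the low class.
    resp = []
    for y in successes:
        a = weight * binom_kernel(y, p_low)
        b = (1 - weight) * binom_kernel(y, p_high)
        resp.append(a / (a + b))
    # M-step: update the mixing weight and the class-specific probabilities.
    weight = sum(resp) / len(resp)
    p_low = sum(r * y for r, y in zip(resp, successes)) / (M * sum(resp))
    p_high = sum((1 - r) * y for r, y in zip(resp, successes)) / (
        M * sum(1 - r for r in resp)
    )

# Each class has its own logistic intercept; units are clustered by posterior.
intercepts = (logit(p_low), logit(p_high))
clusters = [0 if r >= 0.5 else 1 for r in resp]

print("mixing weight of low class:", round(weight, 3))
print("class intercepts (logit scale):", [round(a, 3) for a in intercepts])
```

With covariates, the M-step would become a weighted logistic regression with a shared slope and class-specific intercepts. That step is left out here to keep the sketch short.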
Another is propensity score techniques. This uses the estimated probability from logistic regression to form strata in which groups of individuals (such as treatment and control) are compared. You will find many papers on this.
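The stratification idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data are synthetic, the true treatment probability stands in for the probability that a fitted logistic regression would estimate, and the choice of five strata and the helper `diff_in_means` are made up for this example.

```python
import math
import random
from bisect import bisect_right
from statistics import mean, quantiles

random.seed(1)

TRUE_EFFECT = 2.0  # treatment effect built into the synthetic data (assumption)

# Synthetic data: one covariate x drives both treatment assignment and outcome.
rows = []
for _ in range(4000):
    x = random.gauss(0, 1)
    p = 1 / (1 + math.exp(-1.5 * x))  # stands in for the fitted logistic probability
    treated = random.random() < p
    y = 3.0 * x + (TRUE_EFFECT if treated else 0.0) + random.gauss(0, 1)
    rows.append((p, treated, y))


def diff_in_means(sample):
    """Mean outcome of treated units minus mean outcome of control units."""
    treated_y = [y for _, is_treated, y in sample if is_treated]
    control_y = [y for _, is_treated, y in sample if not is_treated]
    return mean(treated_y) - mean(control_y)


# Naive comparison: ignores that treated units tend to have larger x.
naive = diff_in_means(rows)

# Form five strata at the quintiles of the propensity score.
cuts = quantiles([p for p, _, _ in rows], n=5)
strata = [[] for _ in range(5)]
for row in rows:
    strata[bisect_right(cuts, row[0])].append(row)

# Compare treated and control within each stratum, then weight by stratum size.
stratified = sum(len(s) * diff_in_means(s) for s in strata) / len(rows)

print("naive difference:     ", round(naive, 3))
print("stratified difference:", round(stratified, 3))
```

In this setup the stratified estimate should land closer to the built-in effect than the naive one, because units within a stratum have similar estimated probabilities of treatment.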