How to classify by logistic regression
 evrim gunduz posted on Thursday, June 01, 2006 - 7:56 am
I would appreciate your help with a clustering problem. I am trying to learn the best method for classifying data when using logistic regression.

What I mean is that with linear regression it is simple to divide the data set into four equal groups: 0 to 0,25 becomes the first group, 0,25 to 0,5 the second, and so on. With logistic regression, though, the shape of the curve gets in the way of this approach. I don't know how to divide the data set into appropriate clusters. Can you help me with this?
 Linda K. Muthen posted on Friday, June 02, 2006 - 8:38 am
I'm afraid I don't understand the division you are referring to for linear regression. Can you please explain?
 evrim gunduz posted on Tuesday, June 06, 2006 - 12:15 am
OK, I will try to explain my question with a simple example.

We have scores produced by a scorecard that uses a logistic regression model. For example, here are about 20 of the scores produced by our model. In total, there are 800 records in our data set, with scores ranging from 0 to 1.

0,750634658
0,769944881
0,782449776
0,795922176
0,80486768
0,80736832
0,814119021
0,815928131
0,836169639
0,845795766
0,863892109
0,872138434
0,886250401
0,893309406
0,899257349
0,923012521
0,927975567
0,940082555
0,944011921

What I want to do is cluster these records into groups. For example, I want to say:
group 1 from 0,750634658 to 0,886250401,
group 2 from 0,893309406 to 0,944011921, etc.

I cannot use K-means clustering because I don't have enough variables. I have just the scores.

Is it clearer now?
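
To make the grouping question concrete, here is a minimal Python sketch using only the scores listed above (with decimal commas written as points). It contrasts the fixed equal-width cut points 0.25, 0.5, 0.75 with cut points taken from the sample quartiles of the scores. The helper names `assign_groups` and `group_sizes` are hypothetical, and quartile cut points are only one possible choice, not a method proposed in the thread.

```python
from bisect import bisect_right
from statistics import quantiles

# The scores listed in the post (decimal commas written as points).
scores = [
    0.750634658, 0.769944881, 0.782449776, 0.795922176, 0.80486768,
    0.80736832, 0.814119021, 0.815928131, 0.836169639, 0.845795766,
    0.863892109, 0.872138434, 0.886250401, 0.893309406, 0.899257349,
    0.923012521, 0.927975567, 0.940082555, 0.944011921,
]


def assign_groups(values, cuts):
    """Return a 1-based group index for each value, given sorted cut points."""
    return [bisect_right(cuts, v) + 1 for v in values]


def group_sizes(groups, n_groups=4):
    """Count how many values fall into each of the n_groups groups."""
    return [groups.count(g) for g in range(1, n_groups + 1)]


# Equal-width cut points on [0, 1], as in the linear-regression description.
equal_width_cuts = [0.25, 0.5, 0.75]

# Equal-count cut points: the sample quartiles of the scores themselves.
quartile_cuts = quantiles(scores, n=4)

width_groups = assign_groups(scores, equal_width_cuts)
count_groups = assign_groups(scores, quartile_cuts)

print("equal-width group sizes:", group_sizes(width_groups))
print("quartile group sizes:   ", group_sizes(count_groups))
```

Because every listed score lies above 0.75, the equal-width cut points place all of them in the fourth group, while quartile cut points spread them across four groups of nearly equal size.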
 Bengt O. Muthen posted on Tuesday, June 06, 2006 - 10:05 am
I get two associations from your message and perhaps you can look at the corresponding literature to see what fits your interests.

One is finite mixture logistic regression as described in the 1989 J Amer Stat Assoc article by Follmann & Lambert. Here you find clusters based on subpopulations with different logistic regression intercepts. This can be done in Mplus.

Another is propensity score techniques. This uses the estimated probability from logistic regression to form strata in which groups of individuals (such as treatment and control) are compared. You will find many papers on this.
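
As a rough illustration of the stratification idea, the Python sketch below forms strata from estimated probabilities and compares treated and control outcomes within each stratum. Everything in it is an assumption made for illustration: the data are simulated rather than taken from the thread, the probabilities are random stand-ins for fitted logistic regression values, and five quantile-based strata is simply a common choice, not something specified above.

```python
import random
from bisect import bisect_right
from statistics import mean, quantiles

random.seed(0)  # hypothetical, simulated data for illustration only

# Each record: (estimated propensity, treated flag, outcome).
records = []
for _ in range(400):
    p = random.uniform(0.05, 0.95)   # stand-in for a fitted probability
    treated = random.random() < p    # treatment is more likely when p is high
    outcome = 2.0 * p + (1.0 if treated else 0.0) + random.gauss(0, 0.5)
    records.append((p, treated, outcome))

# Quintile cut points of the estimated propensity define five strata.
cuts = quantiles([p for p, _, _ in records], n=5)

# Collect outcomes per stratum, separately for treated and control records.
strata = {}
for p, treated, outcome in records:
    s = bisect_right(cuts, p)        # stratum index 0..4
    strata.setdefault(s, {True: [], False: []})[treated].append(outcome)

# Compare treated and control mean outcomes within each stratum.
effects = []
for s in sorted(strata):
    t, c = strata[s][True], strata[s][False]
    if t and c:                      # skip strata that lack one of the groups
        effects.append((s, len(t), len(c), mean(t) - mean(c)))

for s, n_t, n_c, diff in effects:
    print(f"stratum {s}: treated={n_t} control={n_c} difference={diff:.2f}")
```

Within a stratum the individuals have similar estimated probabilities, so the treated and control groups are compared among records that the logistic model scores alike.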