- Newsgroups: bit.listserv.spssx-l
- Path: sparky!uunet!pipex!ibmpcug!eff!news.oc.com!spssig.spss.com!nichols
- From: nichols@spss.com (David Nichols)
- Subject: Response to Sue Anderson on CLUSTER
- Message-ID: <Jul21.215331.37123@spss.com>
- Date: Tue, 21 Jul 1992 21:50:11 CUT
- Nntp-Posting-Host: spssrs2.spss.com
- Organization: SPSS Inc.
- Lines: 62
-
- Sue Anderson asks:
-
- I am using the CLUSTER procedure with a sample of 316 cases. In
- attempting to compare the results of using the WAVERAGE vs WARD
- methods, I noticed that the values of the coefficients printed in the
- agglomeration schedules vary quite a bit from one method to the other.
-
- For example with WAVERAGE, I get:
- Stage  Cluster 1  Cluster 2  Coefficient
-   310          2          3     5.392510
-   311          1         13     5.749814
-   312          5         14     6.314435
-   313          2          5     7.502407
-   314          1          2     8.621726
-   315          1          6     9.946472
-
- Whereas, with WARD, I get:
- Stage  Cluster 1  Cluster 2  Coefficient
-   310          3         19   639.756470
-   311          5         10   707.005676
-   312          1         14   780.497070
-   313          3          5   958.662842
-   314          1          2  1190.600586
-   315          1          3  1566.569336
-
- The SPSS-X Advanced Statistics Guide explains that the actual value of
- these coefficients depends on the clustering method and the distance
- measure used. I assume that means that it is NOT worthwhile to compare
- these values from one method to the next. Is that correct?
-
- Also, I find the size of the coefficients obtained from WARD to be
- alarming. The Advanced Guide explains that small coefficients indicate
- that fairly homogeneous clusters are being merged and large
- coefficients indicate that clusters containing dissimilar members are
- being combined. My question is... for each method/distance measure,
- what should be considered "small" and what is "large?"
-
- --------------------------------------------------------------------------
-
- Your deduction is correct: the coefficients are generally not comparable
- across measures and methods. The same data will give different
- coefficients with different distance measures, and may give very different
- results with the same measure but different clustering methods.
-
- The coefficients given for the Ward method differ from those of all the
- other methods: rather than being the distance at which two clusters were
- joined, they represent the within-cluster sum of squares after joining
- the two clusters.
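-
- As a rough illustration of why a sum of squares can be so much larger
- than a distance, here is a minimal Python sketch. The toy data and the
- function names are made up for this example, and the mean pairwise
- squared distance is only a stand-in for a distance-type coefficient; it
- is not the SPSS WAVERAGE computation. The same two clusters are
- evaluated once with 4 cases and once replicated to 200 cases.

```python
def centroid(points):
    """Coordinate-wise mean of a list of equal-length points."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]


def within_ss(points):
    """Sum of squared Euclidean distances from each point to its centroid."""
    c = centroid(points)
    return sum(sum((x - m) ** 2 for x, m in zip(p, c)) for p in points)


def mean_pairwise_sq_dist(a, b):
    """Mean squared Euclidean distance over all pairs, one point per cluster."""
    total = sum(sum((x - y) ** 2 for x, y in zip(p, q)) for p in a for q in b)
    return total / (len(a) * len(b))


def compare(copies):
    """Return (within SS after merging, mean pairwise squared distance)
    for two hypothetical toy clusters, each replicated `copies` times."""
    cluster_a = [[0.0, 0.0], [2.0, 0.0]] * copies
    cluster_b = [[10.0, 0.0], [12.0, 0.0]] * copies
    merged_ss = within_ss(cluster_a + cluster_b)
    return merged_ss, mean_pairwise_sq_dist(cluster_a, cluster_b)


for copies in (1, 50):
    ss, dist = compare(copies)
    print(f"cases={4 * copies:4d}  within SS after merge={ss:10.1f}  "
          f"mean squared distance={dist:8.1f}")
```

- In this sketch the sum of squares grows with the number of cases while
- the averaged distance does not, so the magnitude of a Ward coefficient
- by itself says little about how dissimilar the merged clusters are.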
-
- I don't think any standard rules can be given for what is large and what
- is small; this depends on the scale and other characteristics of the data.
- The general guideline is to use these coefficients to look for breaks:
- stages where joining two clusters either produces a much larger
- within-cluster sum of squares (for Ward's method) or, for the other
- methods, requires joining clusters that are much farther apart than
- those last joined.
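-
- One way to look for such breaks is to tabulate the change in the
- coefficient from each stage to the next. The Python sketch below does
- this for the six stages quoted in the question. The function names are
- made up for this example, and using the stage-to-stage ratio as the
- yardstick is an assumption of the sketch, not a rule from the SPSS
- documentation.

```python
# Last six stages as printed in the question: (stage, coefficient).
WAVERAGE = [(310, 5.392510), (311, 5.749814), (312, 6.314435),
            (313, 7.502407), (314, 8.621726), (315, 9.946472)]
WARD = [(310, 639.756470), (311, 707.005676), (312, 780.497070),
        (313, 958.662842), (314, 1190.600586), (315, 1566.569336)]


def jumps(schedule):
    """For each stage after the first, return (stage, increase, ratio)
    relative to the previous stage's coefficient."""
    out = []
    for (_, prev), (stage, coef) in zip(schedule, schedule[1:]):
        out.append((stage, coef - prev, coef / prev))
    return out


def largest_relative_jump(schedule):
    """Stage whose coefficient grew most relative to the previous stage."""
    return max(jumps(schedule), key=lambda row: row[2])[0]


for name, schedule in (("WAVERAGE", WAVERAGE), ("WARD", WARD)):
    print(name)
    for stage, increase, ratio in jumps(schedule):
        print(f"  stage {stage}: increase {increase:10.4f}  ratio {ratio:.3f}")
    print(f"  largest relative jump at stage {largest_relative_jump(schedule)}")
```

- Because the ratios are unit-free, they can be inspected within each
- method without comparing raw coefficients across methods. Only the last
- six of the 315 stages are shown here, so a real search for breaks would
- use the full agglomeration schedule.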
-
- --
- David Nichols Statistical Support Specialist SPSS, Inc.
- Phone: (312) 329-3684 Internet: nichols@spss.com Fax: (312) 329-3657
- *******************************************************************************
- Any correlation between my views and those of SPSS is strictly due to chance.
-