home *** CD-ROM | disk | FTP | other *** search
- Newsgroups: comp.programming
- Path: sparky!uunet!gatech!rpi!uwm.edu!csd4.csd.uwm.edu!markh
- From: markh@csd4.csd.uwm.edu (Mark)
- Subject: Re: clustering algorithm wanted
- Message-ID: <1992Aug27.164656.10905@uwm.edu>
- Sender: news@uwm.edu (USENET News System)
- Organization: Computing Services Division, University of Wisconsin - Milwaukee
- References: <1992Aug27.103332.16604@nuscc.nus.sg>
- Date: Thu, 27 Aug 1992 16:46:56 GMT
- Lines: 25
-
- In article <1992Aug27.103332.16604@nuscc.nus.sg> tmp16@nuscc.nus.sg (temporary account for courses) writes:
- >I am working on some numerical data and I wish to obtain an algorithm
- >for clustering these data into discrete groups ( i.e a pattern ). I've
- >considered using multivariate analysis but can't get a workable method.
- >So if anyone knows of such an algorithm or has a reference, I would
- >appreciate it if one could post it to me.
-
- Well, here's an idea. Suppose you want to cluster the objects into a pre-
- determined number of clusters (N). For a given clustering, define the
- cluster error to be the sum of the square deviations of each item from it's
- cluster's average. The task is to minimize this -- which defines the optimum
- clustering.
-
- So start with all objects in one cluster, and with N - 1 empty clusters. At
- each step choose a random object and random cluster. Move the object to this
- cluster if and only if the error value mentioned above decreases as a result.
-
- Stop when the error value become stationary long enough. The result should be
- a good approximation of the optimal clustering.
-
- Do obvious things to minimize the computation, such as maintaining cluster
- sums and sum-of-squares for each cluster. Then when an object is moved,
- you only need to update 4 items and you only need to perform a test to see if
- the total square deviation of those two clusters decreases, instead of
- computing the square deviation for all the clusters.
-