- Path: sparky!uunet!decwrl!purdue!mentor.cc.purdue.edu!rain!gpetty
- From: gpetty@rain.atms.purdue.edu (Grant W. Petty)
- Newsgroups: sci.math.stat
- Subject: Analyzing highly non-Gaussian, n-variate data
- Message-ID: <56884@mentor.cc.purdue.edu>
- Date: 14 Aug 92 23:41:27 GMT
- Sender: news@mentor.cc.purdue.edu
- Organization: Earth & Atmospheric Sciences, Purdue University
- Lines: 80
-
- I'm not sure if the subject line makes sense, but here are two
- questions which have been plaguing me (a non-statistician) for
- a long time:
-
- 1. You have a large set of measurements consisting of N-dimensional
- vectors, where N > 3 (in my case, N=7). Elements in the vectors are
- correlated, albeit in a nonlinear, non-Gaussian fashion. You have no
- a priori knowledge of the precise functional form of the physical
- relationship between the elements, though you are certain one exists.
- Under these circumstances, what can you do to
-
- a) determine the effective dimensionality of the data (i.e., the
- minimum number of independent parameters which are capable of
- explaining most of the "volume" of the cloud of points in
- N-dimensional space)?
-
- b) determine a segmented curve, surface, or hypersurface (depending on
- how many parameters you choose to specify) which passes "optimally"
- through the cloud of points?
-
- If the data exhibited something like a multivariate Gaussian pdf, then
- it would make sense to just compute the eigenvalues/eigenvectors of
- the N x N covariance matrix; the effective dimensionality would then
- just be the number of eigenvectors which are required to explain the
- bulk of the total variance. However, this approach gives meaningless
- results if, say, your data all fall exactly on a single wildly
- contorted curve in N-space: the true dimensionality in this case would
- be only one, but PCA looks for something like the principal axes of an
- ellipsoidal volume containing the points and therefore would find
- several significant basis vectors.
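
To illustrate the failure mode just described, here is a minimal sketch, assuming NumPy is available. The test curve (seven sinusoids driven by one parameter `t`) and the 5% significance cutoff are hypothetical choices made for this example only, not taken from any real data set.

```python
import numpy as np

def pca_variance_fractions(points):
    """Fraction of total variance along each principal axis, largest first."""
    cov = np.cov(points, rowvar=False)        # N x N covariance matrix
    eigvals = np.linalg.eigvalsh(cov)[::-1]   # eigvalsh sorts ascending; reverse it
    return eigvals / eigvals.sum()

# Hypothetical data: every point lies on ONE curve in 7-D, parameterised by t,
# so the true dimensionality is 1.
t = np.linspace(0.0, 2.0 * np.pi, 2000)
curve = np.column_stack([np.sin(k * t) for k in range(1, 8)])

fractions = pca_variance_fractions(curve)
# Assumed cutoff: call an axis "significant" if it carries more than 5% of the variance.
n_significant = int(np.sum(fractions > 0.05))
print(np.round(fractions, 3), n_significant)
```

For this particular curve the seven sinusoids are mutually uncorrelated with equal variance, so each principal axis carries roughly 1/7 of the variance and all seven look significant, even though a single parameter generates every point.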
-
- If you could somehow calculate the effective N-D "volume" of the cloud
- of points for a subset of the elements of your ensemble, and see how
- that volume changes as you increase the number of variables
- considered, it seems to me that that could give you a good clue. For
- example, if the cloud of points were truly one-dimensional (in some
- unknown non-linear transformation of your coordinate system), then the
- cloud of points should, in some sense, occupy a very small volume
- which is almost independent of the number of dimensions of the
- subspace onto which you are projecting the cloud of points. That is, a
- projection of the points onto a 2-D surface would follow a simple 2-D
- curve with zero volume; a projection of the points onto a 3-D subspace
- would also yield a curve with zero volume, etc. Whereas if the
- data were intrinsically 2-D, then projection onto a 2-D surface would
- yield a 2-D cloud with finite area; projection onto a 3-D space would
- yield a surface with finite area but zero volume, and so on for higher
- dimensional subspaces.
-
- Does this make sense? And if so, do algorithms exist for looking into
- this behavior, given a sufficiently large set of multivariate data?
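
One existing technique in the spirit of the volume-scaling idea above is the correlation dimension (Grassberger-Procaccia estimate): count the fraction C(r) of point pairs closer than r, and see how C(r) grows with r. For points on a curve C(r) grows roughly like r, for a surface roughly like r^2, regardless of the embedding dimension. This is a swapped-in method, not something proposed in the post itself. The sketch below uses only the Python standard library; the two test manifolds, the radii, and the sample size are hypothetical choices for illustration.

```python
import math
import random

def correlation_sum(points, r):
    """Fraction of distinct point pairs separated by less than r."""
    n = len(points)
    close = 0
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(points[i], points[j]) < r:
                close += 1
    return close / (n * (n - 1) / 2)

def correlation_dimension(points, r_small, r_large):
    """Slope of log C(r) versus log r between two radii (a crude two-point fit)."""
    c_small = correlation_sum(points, r_small)
    c_large = correlation_sum(points, r_large)
    return math.log(c_large / c_small) / math.log(r_large / r_small)

rng = random.Random(0)  # fixed seed so the example is repeatable

def curve_point(t):
    # Hypothetical 1-D manifold embedded in 7-D.
    return (t, t * t, math.sin(3 * t), math.cos(3 * t), t ** 3, 0.5 * t, 1.0 - t)

def surface_point(u, v):
    # Hypothetical 2-D manifold embedded in 7-D.
    return (u, v, u * v, math.sin(2 * u), math.cos(2 * v), u * u, v * v)

curve = [curve_point(rng.random()) for _ in range(500)]
surface = [surface_point(rng.random(), rng.random()) for _ in range(500)]

dim_curve = correlation_dimension(curve, 0.1, 0.3)
dim_surface = correlation_dimension(surface, 0.1, 0.3)
print(round(dim_curve, 2), round(dim_surface, 2))
```

The estimate depends on the radii chosen: they must be large compared with the spacing between neighbouring points and small compared with the overall extent and curvature of the cloud. Edge effects bias the slope somewhat low, so in practice one would inspect log C(r) over a range of r rather than rely on two values. The pairwise loop is O(n^2), which is acceptable for a few thousand points.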
-
-
- 2. The second question is related: Given two N-dimensional clouds of
- points of the type described above, can one quantify the "overlap"
- between the volumes occupied by the two clouds and thus say something
- about whether the populations from which the points were taken appear
- to occupy the same or different regions in N-dimensional space?
- Obviously, one could compute multi-dimensional histograms and then
- compare the number of boxes which contain points from one, both, or
- neither of the clouds, but 7-dimensional histograms can get pretty
- bulky for any reasonable box size.
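
The bulk of a 7-dimensional histogram can be sidestepped by storing only the boxes that actually contain points, so memory grows with the number of points rather than with the number of boxes. The following is a minimal sketch of that idea using only the Python standard library. The overlap score (shared boxes divided by boxes occupied by either cloud, i.e. a Jaccard index), the box size, and the test clouds are assumptions made for this example.

```python
import math
import random

def occupied_boxes(points, box_size):
    """Set of integer box indices that contain at least one point."""
    return {tuple(math.floor(x / box_size) for x in p) for p in points}

def box_overlap(cloud_a, cloud_b, box_size):
    """Boxes occupied by both clouds divided by boxes occupied by either."""
    boxes_a = occupied_boxes(cloud_a, box_size)
    boxes_b = occupied_boxes(cloud_b, box_size)
    union = boxes_a | boxes_b
    return len(boxes_a & boxes_b) / len(union) if union else 0.0

rng = random.Random(1)  # fixed seed so the example is repeatable

def curve_point(t, shift=0.0):
    # Hypothetical 1-D population embedded in 7-D; `shift` displaces the first coordinate.
    return (t + shift, t * t, math.sin(3 * t), math.cos(3 * t), t ** 3, 0.5 * t, 1.0 - t)

cloud_a = [curve_point(rng.random()) for _ in range(2000)]
cloud_b = [curve_point(rng.random()) for _ in range(2000)]         # same population
cloud_c = [curve_point(rng.random(), 10.0) for _ in range(2000)]   # displaced population

same = box_overlap(cloud_a, cloud_b, box_size=0.25)
different = box_overlap(cloud_a, cloud_c, box_size=0.25)
print(round(same, 2), different)
```

The score is sensitive to both box size and sample size: with boxes that are too small, two samples from the same population will rarely share a box, so the comparison only means something when most occupied boxes hold several points.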
-
- Are there any applications-oriented (rather than theoretical)
- textbooks which address these sorts of issues?
-
- E-mail replies welcome
-
- --
- Grant W. Petty gpetty@rain.atms.purdue.edu
- Assistant Prof. of Atmospheric Science (317) 494-2544
- Dept. of Earth & Atmospheric Sciences "All standard disclaimers apply"
- Purdue University, West Lafayette IN 47907-1397
-