- Comments: Gated by NETNEWS@AUVM.AMERICAN.EDU
- Path: sparky!uunet!paladin.american.edu!auvm!GSF.DE!BENZ
- X-Mailer: ELM [version 2.3 PL11]
- Message-ID: <9301060926.AA24213@cony.gsf.de>
- Newsgroups: bit.listserv.stat-l
- Date: Wed, 6 Jan 1993 10:25:04 MET
- Sender: STATISTICAL CONSULTING <STAT-L@MCGILL1.BITNET>
- From: Joachim Benz <benz@GSF.DE>
- Subject: elimination of correlation (fwd)
- Lines: 99
-
- On behalf of a colleague (Anja Oetmann, email: oetmann@gsf.de), I am forwarding
- this request:
- >---------------------------------------------------------------------
- > Our project aims to measure and describe the present phenotypic
- > variability of indigenous German populations of a widespread grass
- > species (population in the biological sense). We are interested in variation
- > within and between populations as well as in the total variability of each
- > character.
- >! The objective is to find the minimal number of sites that covers the
- >! overall variance, and then to preserve this variance through IN SITU
- >! conservation.
- > The problem is how to account for, or eliminate, the existing correlations
- > between several characters before starting the mathematical analysis.
- >
- >Data:
- >=====
- >
- > * 100 sites, each divided into 3 subpopulations covering
- >   the spatial heterogeneity of the site
- > * total number of plants: 18 000
- > * 13 variables were observed for each plant:
- >     5 measures: 30 values/subpopulation
- >     1 measure: 60 values/subpopulation
- >     7 estimations (in German: Bonituren): 60 scores/subpopulation
- > * data from 1 year and 1 experimental site (spaced plants)
- >
- >
- >STRATEGY TO SOLVE THE PROBLEM:
- >=============================
- > 1. Estimation of the univariate variability of one subpopulation
- >    We define the variability of a specific variable as the interval:
- >
- >      VB_{i,j,l} = [ VBL_{i,j,l} .... VBH_{i,j,l} ]
- >
- >    with: VBL_{i,j,l} = MEAN_{i,j,l} - STDEV_{i,j,l}
- >          VBH_{i,j,l} = MEAN_{i,j,l} + STDEV_{i,j,l}
- >    (assumption: the distributions are symmetric)
- >    i = index of variables [1, ..., 13]
- >    j = index of subpopulations [1, 2, 3]
- >    l = index of sites [1, ..., 100]
- >
- > 2. Estimation of the univariate variability at one site
- >
- >      VBS_{i,l} = [ VBMIN_{i,l} .... VBMAX_{i,l} ]
- >
- >    with: VBMIN_{i,l} = MIN(VBL_{i,1,l}, VBL_{i,2,l}, VBL_{i,3,l})
- >          VBMAX_{i,l} = MAX(VBH_{i,1,l}, VBH_{i,2,l}, VBH_{i,3,l})
- >
- > 3. Estimation of the univariate variability over all sites
- >
- >      VBG_i = [ VBGMIN_i .... VBGMAX_i ]
- >
- >    with: VBGMIN_i = MIN(VBMIN_{i,l}, l=1,...,100)
- >          VBGMAX_i = MAX(VBMAX_{i,l}, l=1,...,100)
- >
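Steps 1 to 3 above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the data for ONE variable i are assumed to be stored in an array of shape (sites, subpopulations, plants), STDEV is taken to be the sample standard deviation (ddof=1), and the function name `variability_intervals` is hypothetical. The post itself does not prescribe any of these choices.

```python
import numpy as np

def variability_intervals(values, ddof=1):
    """Steps 1-3 for one variable.

    values: array of shape (n_sites, n_subpops, n_plants).
    ddof=1 (sample standard deviation) is an assumption, not from the post.
    """
    mean = values.mean(axis=2)
    std = values.std(axis=2, ddof=ddof)
    vbl = mean - std            # VBL_{i,j,l}, shape (n_sites, n_subpops)
    vbh = mean + std            # VBH_{i,j,l}
    vbmin = vbl.min(axis=1)     # VBMIN_{i,l}, shape (n_sites,)
    vbmax = vbh.max(axis=1)     # VBMAX_{i,l}
    vbgmin = vbmin.min()        # VBGMIN_i
    vbgmax = vbmax.max()        # VBGMAX_i
    return vbl, vbh, vbmin, vbmax, vbgmin, vbgmax

# Simulated data standing in for one measured variable:
# 100 sites, 3 subpopulations, 30 plants each.
rng = np.random.default_rng(0)
demo = rng.normal(loc=10.0, scale=2.0, size=(100, 3, 30))
vbl, vbh, vbmin, vbmax, vbgmin, vbgmax = variability_intervals(demo)
```

Running the function once per variable and stacking the `vbmin` and `vbmax` results gives two arrays of shape (13, 100), which is a convenient input for the site selection in step 4.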
- > 4. Selection of sites
- >    Select a minimal number of sites such that the selected sites cover
- >    as much as possible of the total variability in all variables.
- >
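The post does not say how "cover" is to be measured or how the sites are to be chosen, so the following is only one possible reading of step 4. It assumes that the coverage of variable i by a set of sites is the length of the union of their intervals [VBMIN_{i,l}, VBMAX_{i,l}] divided by the length of [VBGMIN_i, VBGMAX_i], that the overall score is the plain mean over variables, and that sites are added greedily. Greedy selection is a heuristic and is not guaranteed to find the true minimum. All function names and the `target` parameter are hypothetical.

```python
import numpy as np

def union_length(intervals):
    """Total length of the union of closed intervals [(lo, hi), ...]."""
    total = 0.0
    cur_lo = cur_hi = None
    for lo, hi in sorted(intervals):
        if cur_hi is None or lo > cur_hi:
            if cur_hi is not None:
                total += cur_hi - cur_lo   # close the previous merged run
            cur_lo, cur_hi = lo, hi
        else:
            cur_hi = max(cur_hi, hi)       # overlapping: extend the run
    if cur_hi is not None:
        total += cur_hi - cur_lo
    return total

def coverage(selected, vbmin, vbmax):
    """Mean over variables of the covered fraction of [VBGMIN_i, VBGMAX_i].

    vbmin, vbmax: arrays of shape (n_vars, n_sites).
    Assumes every variable has a non-degenerate overall range.
    """
    sel = list(selected)
    if not sel:
        return 0.0
    fractions = []
    for i in range(vbmin.shape[0]):
        total_range = vbmax[i].max() - vbmin[i].min()
        covered = union_length(list(zip(vbmin[i, sel], vbmax[i, sel])))
        fractions.append(covered / total_range)
    return float(np.mean(fractions))

def greedy_select(vbmin, vbmax, target=0.95):
    """Add sites one by one until `target` of the achievable coverage is reached."""
    n_sites = vbmin.shape[1]
    full = coverage(range(n_sites), vbmin, vbmax)  # coverage using all sites
    selected, remaining = [], set(range(n_sites))
    while remaining and coverage(selected, vbmin, vbmax) < target * full:
        best = max(sorted(remaining),
                   key=lambda s: coverage(selected + [s], vbmin, vbmax))
        selected.append(best)
        remaining.remove(best)
    return selected

# Tiny example: one variable, three sites; the third site spans everything.
vbmin_demo = np.array([[0.0, 5.0, 0.0]])
vbmax_demo = np.array([[5.0, 10.0, 10.0]])
chosen = greedy_select(vbmin_demo, vbmax_demo, target=1.0)
```

Because this score weights every variable equally, a group of highly correlated variables is effectively counted several times, which is the concern raised in the paragraph that follows.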
- >But there is a serious problem for which we have no solution at the moment.
- >The covariances between the variables are not 0. We cannot use PCA as an
- >intermediate step to obtain uncorrelated variables, because the assumption
- >of ONE density function does not hold. On the other hand, we assume that if
- >we carry out the optimization with the original variables, highly correlated
- >variables will have too strong an influence on the results.
- >
- >Any suggestions and discussion are appreciated.
- >
- >
- >Thanks in advance ....
- >
-
- --
-
- Sincerely,
-
- Joachim Benz
-
- University of Kassel
- Faculty of Agriculture (FB 20)
- Nordbahnhofstr. 1a
- D-3430 Witzenhausen
- (FRG)
-
- Phone: (+49)-5542-503-560
- Fax: (+49)-5542-503-588
- email: benz@gsf.de
- C=de; A= ; P=gsf; S=benz (X.400)
-