- 3 Sample Pages from Langley's "UNDERSTANDING EASY-STATS".
- Best print 12 characters/inch (Elite) with 1 inch left margin.
- -------------------------------------------------------------------------------
- ANOVA 1-Way
- But how could "small" or "big" be assessed objectively? Fisher saw that since
- the variance of means is defined by Var(XBAR) = Var(X) / n, multiplying
- Var(XBAR) by n gives another statistic which, when H0 is true, is an UNBIASED
- ESTIMATOR OF THE POPULATION VARIANCE. He called this the BETWEEN-SAMPLES MEAN
- SQUARE ---
-
- MS_between = SS_between / df_between
-
- where SS_between = n * Σ (XBAR_i - XDOUBLEBAR)²   (n = size of each sample),
- & df_between = g - 1   (g = number of samples).
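
A minimal Python sketch of the Between-Samples Mean Square may help here. The
data are hypothetical (3 samples of 4 measurements each), and the variable
names are illustrative only. It assumes equal sample sizes, as the formula
above does, and shows that MS_between is simply n times the ordinary sample
variance of the sample means.

```python
from statistics import mean, variance

# Hypothetical data: g = 3 samples, each of n = 4 measurements.
samples = [
    [4, 6, 5, 5],
    [7, 9, 8, 8],
    [5, 7, 6, 6],
]
n = len(samples[0])   # size of each sample (all equal here)
g = len(samples)      # number of samples

sample_means = [mean(s) for s in samples]   # XBAR_i
grand_mean = mean(sample_means)             # XDOUBLEBAR (valid because n is equal)

ss_between = n * sum((m - grand_mean) ** 2 for m in sample_means)
df_between = g - 1
ms_between = ss_between / df_between

# Fisher's view of the same quantity: n times the variance of the means.
ms_between_alt = n * variance(sample_means)

print(ms_between, ms_between_alt)
```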
-
-
- F TEST
- ------
- It is important to realize that Between and Within MS's are INDEPENDENT of one
- another. The spread between means needn't affect the spread within samples,
- and vice versa. This fact permitted the development of the F TEST to compare
- these 2 estimates of some σ² (common to the populations from which the samples
- have been drawn), using the famous formula ---
-
- F(df_betw, df_within) = MS_betw / MS_within
-
- which tests ---
- H0: σ²_between = σ²_within
- H1: σ²_between > σ²_within (1-tail)
-
- Note (1): These hypotheses are exactly equivalent to those expressed in terms
- of µ_i, at the top of this section.
- Note (2): This is a 1-tail test because σ²_between must be abnormally large if
- there is a real difference between the population means. σ²_between
- can only become very small if the means are very close together.
- Note (3): F Tables for interpreting the variance ratio values only show the
- right hand tail (F values > 1), since ANOVA is their main use.
- [See VARIANCE RATIO TEST for their use as 2-tailed tests.]
- Note (4): Big values of F are produced by much spread between sample means,
- and will reject H0.
- Note (5): Values of F < 1 occur if the sample means are closer together than
- expected with random sampling. This will happen sometimes by chance
- when H0 is true. But don't guess. See the "F TABLES" notes herein
- for how to find its probability --- if P > 5% accept H0, otherwise
- suspect some violation of assumptions such as non-random sampling or
- unequal population variances. [Ref: Bennett & Franklin 7.25]
- Note (6): See ANOVA ASSUMPTIONS, which tells when you can trust this F Test.
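
The variance ratio described in this section can be sketched in Python as
follows. This is an illustration only, not the EASY-STATS code. It assumes the
usual pooled definition of the Within MS (sum of squared deviations of each
value from its own sample mean, divided by N - g), which is not spelled out on
this page, and it weights each sample mean by its own n_i so that unequal
sample sizes are handled. The function name and both data sets are
hypothetical.

```python
def one_way_f(samples):
    """Return (F, df_between, df_within) for a list of samples."""
    g = len(samples)
    big_n = sum(len(s) for s in samples)
    grand_mean = sum(sum(s) for s in samples) / big_n
    means = [sum(s) / len(s) for s in samples]

    # Between: spread of the sample means around the grand mean.
    ss_between = sum(len(s) * (m - grand_mean) ** 2
                     for s, m in zip(samples, means))
    # Within (assumed definition): spread of values around their own mean.
    ss_within = sum((x - m) ** 2
                    for s, m in zip(samples, means) for x in s)

    df_between = g - 1
    df_within = big_n - g
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Hypothetical samples whose means are well spread out ...
spread_means = [[4, 6, 5, 5], [7, 9, 8, 8], [5, 7, 6, 6]]
# ... and samples whose means are almost identical.
close_means = [[4, 6, 5, 5], [5, 7, 4, 4], [6, 4, 5, 6]]

f_big, df_b, df_w = one_way_f(spread_means)
f_small, _, _ = one_way_f(close_means)
print(f_big, df_b, df_w)
print(f_small)
```

The first data set gives a large F (Note 4) and the second gives F < 1
(Note 5). Finding the probability of either value still needs F tables or an
F-distribution routine, which this sketch leaves out.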
-
-
- TOTAL MEAN SQUARE
- -----------------
- A third variance can also be computed from multiple samples, namely, the TOTAL
- MEAN SQUARE. This is a measure of the spread of all the sampled measurements
- around their grand mean ---
-
- MS_total = SS_total / df_total
-
- where SS_total = Σ (X - XDOUBLEBAR)²,
- & df_total = N - 1,
- & N = Σn_i = total number of measurements in all g samples.
-
- All these SS's, df's, and MS's are displayed in an ANOVA TABLE, together with
- the F Test. The MS_total is not independent of the other two MS's, so isn't
- used for testing these hypotheses.
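
One way to see why MS_total is not independent of the other two is that the
sums of squares and the degrees of freedom both add up exactly. The sketch
below checks this on hypothetical data and prints a bare-bones ANOVA table.
The Within SS again uses the usual pooled definition, which is an assumption
here, and the table layout is illustrative rather than the EASY-STATS display.

```python
# Hypothetical data: g = 3 samples.
samples = [[4, 6, 5, 5], [7, 9, 8, 8], [5, 7, 6, 6]]

values = [x for s in samples for x in s]
big_n = len(values)                    # N = total number of measurements
g = len(samples)
grand_mean = sum(values) / big_n       # XDOUBLEBAR

ss_total = sum((x - grand_mean) ** 2 for x in values)
df_total = big_n - 1
ms_total = ss_total / df_total

means = [sum(s) / len(s) for s in samples]
ss_between = sum(len(s) * (m - grand_mean) ** 2 for s, m in zip(samples, means))
ss_within = sum((x - m) ** 2 for s, m in zip(samples, means) for x in s)

rows = [
    ("Between", ss_between, g - 1),
    ("Within", ss_within, big_n - g),
    ("Total", ss_total, df_total),
]
print(f"{'Source':<10}{'SS':>10}{'df':>5}{'MS':>10}")
for source, ss, df in rows:
    print(f"{source:<10}{ss:>10.3f}{df:>5}{ss / df:>10.3f}")
```

Note that the SS and df columns add up to the Total row, but the MS column
does not.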
- -------------------------------------------------------------------------------
-
-
- UNDERSTANDING EASY-STATS CORRELATION, Grouped Data
-
- A scattergram of these figures would be like this ---
-
- Aggression Score 50+
- Y                  |                               o
-                  40+               o
-                    |                               o
-                  30+               2
-                    |
-                  20+       o
-                    |       o
-                  10+-------+-------+-------+-------+
-                    0       1       2       3       4
-                             Birth Order, X
-
- To get Pearson's r, you could enter these pairs into CORRELATIONS (VARIOUS),
- but it will be quicker, and you'll get a LINEARITY TEST of the relationship,
- if you use our REGROUP program to regroup the pairs by the X-variable, then
- look on X as a sample ID, and enter the Y-values into the 1-WAY ANOVA program,
- thus ---
-                 Sample #     Scores
-                    1         20  15
-                    2         39  28  29
-                    4         46  37
-
- Choose a Weighted Means Analysis, and when asked ---
- "Are the levels of Factor `A' Quantitative?" - answer Y for yes, then
- "Enter `E' if Equally spaced, otherwise enter their 3 values in free format:"
- - enter 1 2 4 to suit the present case.
-
- For a more detailed analysis of relationships with repeated X's, use our
- REGRESSION program.
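
For readers who want to check the arithmetic by hand, here is a Python sketch
using the 7 pairs above. It computes Pearson's r directly, then the 1-way
ANOVA sums of squares with X as the sample ID. The linearity test shown is an
assumption: it uses one common formulation (splitting SS_between into a
straight-line part and a deviation-from-linearity part, then testing the
latter against the Within MS), which may differ in detail from what the 1-WAY
ANOVA program reports.

```python
from math import sqrt

# The regrouped data from the text: birth order X -> aggression scores Y.
groups = {1: [20, 15], 2: [39, 28, 29], 4: [46, 37]}

pairs = [(x, y) for x, ys in groups.items() for y in ys]
big_n = len(pairs)
g = len(groups)
mean_x = sum(x for x, _ in pairs) / big_n
mean_y = sum(y for _, y in pairs) / big_n

sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
syy = sum((y - mean_y) ** 2 for _, y in pairs)      # this is SS_total for Y
sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
r = sxy / sqrt(sxx * syy)                           # Pearson's r

# 1-way ANOVA pieces, treating X as the sample ID.
ss_between = sum(len(ys) * (sum(ys) / len(ys) - mean_y) ** 2
                 for ys in groups.values())
ss_within = syy - ss_between

# Assumed linearity test: straight-line part of SS_between, and the remainder.
ss_linear = sxy ** 2 / sxx
ss_deviation = ss_between - ss_linear
f_deviation = (ss_deviation / (g - 2)) / (ss_within / (big_n - g))

print(f"r = {r:.3f}")
print(f"F for deviation from linearity = {f_deviation:.3f} "
      f"on {g - 2} and {big_n - g} df")
```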
-
-
-
- CORRELATION, PARTIAL
- ----------------------
-
- "Partialling" was introduced by Yule (1897) to correct an observed correlation
- between 2 variables for the disturbing influence of other variables (which are
- then said to be "partialled out" of the main correlation).
-
- E.g. the correlation between reading and writing computed from a random sample
- of children of various ages could be wrong because the relationship may depend
- in part on age. Instead of restricting the sample to children of the same age
- ("experimental control"), we can statistically "partial out" the effect of age
- on the reading and writing scores. This could be done by using each child's
- reading and writing DEVIATE SCORE from the mean of his/her age group. The
- unadulterated correlation could then be obtained by correlating these deviate
- scores, from which the influence of age has been purged. In practice,
- alternative formulations, based on the correlations between all possible pairs
- of variables, are used. The net result will be AS THOUGH the children's ages
- had been constant in the sample.
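
The standard first-order formula based on the three pairwise correlations can
be sketched in Python like this. The function name and the three r values are
hypothetical, chosen only to show the arithmetic. Note that this formula
removes the LINEAR effect of the third variable, so it need not agree exactly
with the age-group deviate-score method described above.

```python
from math import sqrt


def partial_r(r_xy, r_xz, r_yz):
    """Correlation of X and Y with Z partialled out (first-order formula)."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical correlations: X = reading, Y = writing, Z = age.
r_reading_writing = 0.70
r_reading_age = 0.60
r_writing_age = 0.50

r_partial = partial_r(r_reading_writing, r_reading_age, r_writing_age)
print(f"observed r = {r_reading_writing:.2f}, age partialled out = {r_partial:.2f}")
```

With these made-up figures the partial correlation is smaller than the
observed one, because part of the observed relationship was due to age.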
- -------------------------------------------------------------------------------
-
-
- UNDERSTANDING EASY-STATS NON-PARAMETRIC TESTS
-
- It must be stressed that parametric tests (e.g. Student's t) have been
- formulated to apply to random samples from populations with certain
- characteristics (e.g. Normal Distribution). You must not expect them to give
- true answers if applied to data from populations which don't conform to such
- specifications (e.g. if the population is Lognormal when a test assumes a
- symmetrical distribution). Don't take this too lightly --- it is my
- experience that about 50% of biological measurements are Lognormal.
-
- Non-parametric tests are generally safe to use when you're analysing
- measurements and aren't sure about their scale &/or population features.
- Accordingly, they have much to recommend them to novices. But let's face it,
- if the assumptions for a parametric test are met, the parametric test will
- usually be somewhat more powerful (i.e. give smaller P-values when H0 is
- false) than a non-parametric alternative. Furthermore, the mathematical
- restrictions of ranks and counts mean that non-parametric tests cannot be
- used for sophisticated analyses like ANCOVA or multiple regression.
- [Ref: Bradley Chap 2]
-
-
-
- NORMALITY TESTS
- -----------------
-
- The EASY-STATS Descriptive Statistics program provides the following tests to
- assess whether your sample measurements are likely to derive from a Normally
- Distributed population or not ---
- HISTOGRAM of Z SCORES.
- THOMPSON & GRUBBS' TEST [see OUTLIERS].
- SKEWNESS COEFFICIENT & KURTOSIS COEFFICIENT.
- RANGE/SD RATIO [see OUTLIERS].
- Other programs in this package also use these tests when appropriate.
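
As a rough illustration of three of the statistics listed above, the Python
sketch below computes a skewness coefficient, a kurtosis coefficient and the
Range/SD ratio. The definitions are assumptions: it uses the common
moment-based coefficients (skewness 0 and kurtosis 3 for a Normal
Distribution) and the n - 1 form of the SD, which may not match the exact
formulas in the package. The function name and both data sets are
hypothetical.

```python
from math import sqrt


def shape_stats(values):
    """Return (skewness, kurtosis, range/SD) using moment-based definitions."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((x - m) ** 2 for x in values) / n   # 2nd central moment
    m3 = sum((x - m) ** 3 for x in values) / n   # 3rd central moment
    m4 = sum((x - m) ** 4 for x in values) / n   # 4th central moment

    skewness = m3 / m2 ** 1.5    # 0 for any symmetrical sample
    kurtosis = m4 / m2 ** 2      # about 3 for Normal data
    sd = sqrt(sum((x - m) ** 2 for x in values) / (n - 1))
    range_sd = (max(values) - min(values)) / sd
    return skewness, kurtosis, range_sd


symmetrical = [2, 4, 4, 5, 5, 5, 6, 6, 8]      # hypothetical
right_skewed = [1, 1, 2, 2, 3, 4, 6, 9, 17]    # hypothetical, long right tail

skew_sym, kurt_sym, ratio_sym = shape_stats(symmetrical)
skew_right, kurt_right, ratio_right = shape_stats(right_skewed)
print(skew_sym, kurt_sym, ratio_sym)
print(skew_right, kurt_right, ratio_right)
```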
-
-
-
- ODDS RATIO
- ------------
- See ASSOCIATION, STRENGTH OF.
-
-
-
- OUTLIERS
- ----------
-
- Outliers are measurements which differ considerably from the rest of the
- values in your sample. Outliers may be extreme-but-valid members of the
- parent population (in which case discarding them would bias results), or they
- may be truly illegal values (in which case results will be biased unless you
- do discard them).
-
- If the smallest or largest value in the sample can be traced to a clerical or
- instrumental error, discard it and re-test the remaining values. If the
- parent population is expected to have a Normal Distribution, outliers should
- be detected by any NORMALITY TEST (e.g. below), though these tests vary in the
- features to which they are most sensitive.
-
- However, if you are unsure about the distribution of the parent population, you
- should analyse the data WITH and WITHOUT the suspect outlier (and discard the
- whole sample and start afresh if the outcomes differ importantly --- you
- mustn't trust a conclusion hanging on 1 suspicious value). [Ref: Kruskal 1960]
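
The WITH and WITHOUT comparison can be sketched in Python as below. It only
compares the mean and SD, whereas in practice you would rerun whatever test
you are actually using. The function name and the data are hypothetical.

```python
from statistics import mean, stdev


def with_and_without(values, suspect):
    """Summarise a sample with and without one suspect value."""
    kept = list(values)      # copy, so the caller's data are left untouched
    kept.remove(suspect)     # drops a single occurrence of the suspect value
    return {
        "with": (mean(values), stdev(values)),
        "without": (mean(kept), stdev(kept)),
    }


# Hypothetical sample in which 41 looks suspiciously large.
data = [12, 14, 13, 15, 14, 13, 41]
result = with_and_without(data, 41)

for label, (m, sd) in result.items():
    print(f"{label:>8}: mean = {m:.2f}, SD = {sd:.2f}")
```

If the two lines lead to importantly different conclusions, the advice above
applies.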