ANOVA - Two Factor with Replication


Two way analysis of variance with equal replication (ANOVA)

The analysis of variance (often termed ANOVA or AOV) is a technique used to test multi-sample hypotheses, in which the means of a variable measured in three or more samples are compared.

In cases where multiple samples are to be compared, a series of two-sample tests (e.g., t-tests) should not be employed. Aside from the fact that such an approach would be very time consuming, it may also be statistically invalid, because it inflates the chance of committing a Type I error (i.e., one in which the null hypothesis is rejected when it is actually true). For example, testing at the 0.05 level of significance, if three sample means are compared two at a time using the two-sample t-test, then there is a 13% chance of committing at least one Type I error. As the number of samples increases, the chance becomes greater (e.g., 63% when comparing ten sample means and 92% when comparing twenty sample means).

The two-factor (or two-way) analysis of variance is a means of conducting simultaneous analysis of the effect of two factors on a population variable. For example, an investigation may need to determine whether fertilizer and temperature differences result in different heights of a plant. In this case the first factor is fertilizer application, the second factor is ambient temperature and the variable is plant height. The particular conditions that a factor takes in an experiment are known as the levels of that factor: here, the application or non-application of fertilizer are the levels of the first factor, and the temperatures under which different samples of plants are grown are the levels of the second.

The advantage of factorial analysis of variance (i.e., analysis of the effect of more than one factor on population means) is that it is more economical. Use of this form of analysis makes it unnecessary to conduct a one-way analysis of variance for each factor in an investigation. It also enables testing for interactions between factors.

In an ANOVA test with equal replication there are equal numbers of measurements of the variable in question in each factorial level (i.e., in each combination of one level of the first factor with one level of the second). This parametric analysis of variance technique makes certain assumptions about the sample data used. The sample data should follow a normal distribution and exhibit homogeneity of variances.

Script operation

This tool operates in a slightly different way to the others. Enter the sample data in the layout shown below. After the data have been entered into the input requester, a further requester asks for the number of rows in each sample. In other words, the script asks for the number of replicate results in each factorial level. For the example data below this would be 5.


Raw data:

 Hormone treatmnt:
        FEMALE   MALE
 NO       16.5   14.5
          18.4     11
          12.7   10.8
            14   14.3
          12.8     10
 YES      39.1     32
          26.2   23.8
          21.3   28.8
          35.8     25
          40.2   29.3

Arexx output:

 ANOVA - Two Way: With Replication

 Summary Statistics

                 FEMALE      MALE     Total

 NO
 Count:               5         5        10
 Sum:              74.4      60.6       135
 Mean:            14.88     12.12      13.5
 Variance:        6.217     4.477    6.8689

 YES
 Count:               5         5        10
 Sum:             162.6     138.9     301.5
 Mean:            32.52     27.78     30.15
 Variance:       69.717    11.182   42.1961

 Col. Total
 Count:              10        10
 Sum:               237     199.5
 Mean:             23.7     19.95
 Variance:     120.1844   75.0806

 ANOVA

 Source of        Sum of  Degrees of   Variance  F Ratio
 Variation       Squares     Freedom   Estimate
 Sample:       1386.1125           1  1386.1125  60.5336
 Column:         70.3125           1    70.3125   3.0706
 Interaction:     4.9005           1     4.9005    0.214
 Within Cells:   366.372          16    22.8983
 Total:        1827.6976          19

 P(F Sample <=f) one-tail:        0.000001
 F-Critical (95%):                   4.494
 F-Critical (99%):                  8.5309
 P(F Column <=f) one-tail:        0.098852
 F-Critical (95%):                   4.494
 F-Critical (99%):                  8.5309
 P(F Interaction <=f) one-tail:    0.64987
 F-Critical (95%):                   4.494
 F-Critical (99%):                  8.5309

Interpretation

In the example detailed above, the variable being examined may be considered to be blood calcium concentration, and the two factors thought to be influencing it are sex and an unspecified hormone treatment. Note that both factors have two levels (i.e., male/female and treatment/no treatment) and that there are 5 replicate results (i.e., blood calcium concentration was measured in 5 animals in each factorial level).

In this example case the null hypotheses may take the following form:

  1. Hormone treatment has no effect on the mean blood calcium concentration.
  2. There is no difference in the mean blood calcium concentration between males and females.
  3. There is no interaction between hormone treatment and sex in their effect on the mean blood calcium concentration.

The output produced by the script has two components. In the first block, above the title 'ANOVA', summary statistics are calculated for each factorial level, and then for each row as a whole (the 'Total' column) and each column as a whole (the 'Col. Total' rows). The second block contains the ANOVA sources of variation.
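The summary statistics block can be reproduced with a few lines of Python. This is only an illustrative sketch, not the Arexx script itself: the names `data`, `summarize` and `pooled` are invented for the example, and the variance is assumed to be the sample variance (n - 1 denominator), which agrees with the figures in the output above.

```python
from statistics import mean, variance

# Example data from this page: 5 replicates in each factorial level.
data = {
    ("NO", "FEMALE"): [16.5, 18.4, 12.7, 14.0, 12.8],
    ("NO", "MALE"): [14.5, 11.0, 10.8, 14.3, 10.0],
    ("YES", "FEMALE"): [39.1, 26.2, 21.3, 35.8, 40.2],
    ("YES", "MALE"): [32.0, 23.8, 28.8, 25.0, 29.3],
}


def summarize(values):
    """Return count, sum, mean and sample variance (n - 1 denominator)."""
    return {
        "count": len(values),
        "sum": sum(values),
        "mean": mean(values),
        "variance": variance(values),
    }


def pooled(row=None, col=None):
    """Collect every value whose cell matches the given row and/or column."""
    return [
        v
        for (r, c), vals in data.items()
        if (row is None or r == row) and (col is None or c == col)
        for v in vals
    ]


cells = {key: summarize(vals) for key, vals in data.items()}
row_totals = {r: summarize(pooled(row=r)) for r in ("NO", "YES")}
col_totals = {c: summarize(pooled(col=c)) for c in ("FEMALE", "MALE")}

for label, stats in {**cells, **row_totals, **col_totals}.items():
    print(label, {k: round(v, 4) for k, v in stats.items()})
```

Note that the row and column variances are calculated from the pooled raw values (10 per row or column), not by averaging the cell variances.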

The 'Sample' (representing the hormone factor) sum of squares (or SS), the 'Column' (representing the sex factor) SS, and the 'Interaction' (representing the combined factors) SS are each divided by their respective degrees of freedom to produce three variance estimates known as MS (mean square, an abbreviation of mean squared deviation from the mean). A further MS value is calculated in the same way, known as the 'Within Cells' MS, or error MS.
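As a worked illustration, the sums of squares, degrees of freedom and MS values in the ANOVA table can be recomputed from the raw data. The sketch below uses the usual computational formulas for a balanced two-factor design; it is an assumption that the script works this way, and the names `level_total`, `ss`, `df` and `ms` are invented for the example.

```python
# Example data from this page: 5 replicates in each factorial level.
data = {
    ("NO", "FEMALE"): [16.5, 18.4, 12.7, 14.0, 12.8],
    ("NO", "MALE"): [14.5, 11.0, 10.8, 14.3, 10.0],
    ("YES", "FEMALE"): [39.1, 26.2, 21.3, 35.8, 40.2],
    ("YES", "MALE"): [32.0, 23.8, 28.8, 25.0, 29.3],
}
rows = ["NO", "YES"]        # levels of the 'Sample' factor (hormone)
cols = ["FEMALE", "MALE"]   # levels of the 'Column' factor (sex)
n = 5                       # replicates per factorial level

all_values = [v for vals in data.values() for v in vals]
correction = sum(all_values) ** 2 / len(all_values)  # (grand total)^2 / N


def level_total(row=None, col=None):
    """Sum of all values in the given row and/or column."""
    return sum(
        sum(vals)
        for (r, c), vals in data.items()
        if (row is None or r == row) and (col is None or c == col)
    )


ss_cells = sum(sum(vals) ** 2 for vals in data.values()) / n - correction
ss = {
    "Sample": sum(level_total(row=r) ** 2 for r in rows) / (n * len(cols)) - correction,
    "Column": sum(level_total(col=c) ** 2 for c in cols) / (n * len(rows)) - correction,
    "Total": sum(v * v for v in all_values) - correction,
}
ss["Interaction"] = ss_cells - ss["Sample"] - ss["Column"]
ss["Within Cells"] = ss["Total"] - ss_cells

df = {
    "Sample": len(rows) - 1,
    "Column": len(cols) - 1,
    "Interaction": (len(rows) - 1) * (len(cols) - 1),
    "Within Cells": len(rows) * len(cols) * (n - 1),
    "Total": len(all_values) - 1,
}

# Each variance estimate (MS) is the SS divided by its degrees of freedom.
ms = {source: ss[source] / df[source] for source in df if source != "Total"}

for source in ("Sample", "Column", "Interaction", "Within Cells"):
    print(f"{source:13s} SS={ss[source]:10.4f} df={df[source]:2d} MS={ms[source]:10.4f}")
```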

If a null hypothesis is false (i.e., the relevant means are not equal), the corresponding MS value is expected to be greater than the within cells MS. This can be tested by a simple one-tailed variance ratio test:

Hypothesis 1: F = Sample MS / Within cells MS

In this case:

F = 1386.1125 / 22.8983 = 60.5336
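The same ratio is formed for hypotheses 2 and 3, using the 'Column' MS and the 'Interaction' MS respectively. A minimal sketch, taking the MS values as printed in the output above (the variable names are illustrative only):

```python
# Variance estimates (MS) as printed in the ANOVA table above.
ms_within = 22.8983
ms = {
    "Sample": 1386.1125,      # hypothesis 1: hormone treatment
    "Column": 70.3125,        # hypothesis 2: sex
    "Interaction": 4.9005,    # hypothesis 3: treatment x sex
}

# Each F ratio is the factor MS divided by the within cells (error) MS.
f_ratios = {source: value / ms_within for source, value in ms.items()}

for source, f in f_ratios.items():
    print(f"F({source}) = {ms[source]} / {ms_within} = {f:.3f}")
```

Because the printed within cells MS is itself rounded, the last decimal place may differ slightly from the F ratios shown in the script output.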

In order to test the significance of the calculated values of F, the probability of obtaining an F value at least this large if the null hypothesis is true, and the critical values at the 0.05 and 0.01 levels of significance, are provided in the output.
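The probabilities in the output can be checked independently. The sketch below is not the method used by the script; it relies on the fact that, when the numerator has 1 degree of freedom (as for all three tests in this example), F is the square of a t statistic with the denominator degrees of freedom, so the upper-tail probability of F equals the two-tailed probability of t. The t density is integrated numerically with Simpson's rule; the function names and the number of steps are arbitrary choices for the example.

```python
import math


def t_pdf(t, df):
    """Probability density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)


def f_upper_tail_df1(f, df2, steps=2000):
    """P(F >= f) for F with (1, df2) degrees of freedom.

    Only valid when the numerator has 1 degree of freedom.
    Uses Simpson's rule on the t density between 0 and sqrt(f);
    steps must be even.
    """
    upper = math.sqrt(f)
    h = upper / steps
    total = t_pdf(0.0, df2) + t_pdf(upper, df2)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df2)
    central_half = total * h / 3          # area between 0 and sqrt(f)
    return 1 - 2 * central_half           # both tails of t = upper tail of F


# F ratios from the ANOVA table above; error degrees of freedom = 16.
f_ratios = {"Sample": 60.5336, "Column": 3.0706, "Interaction": 0.214}
p_values = {source: f_upper_tail_df1(f, 16) for source, f in f_ratios.items()}

for source, p in p_values.items():
    print(f"P(F {source}) = {p:.6f}")
```

For designs in which a factor has more than two levels, the numerator degrees of freedom exceed 1 and this shortcut no longer applies; a full F distribution function would be needed instead.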

In the case of hypothesis No.1, the critical value at the 0.05 level of significance may be seen to be 4.494 and this is exceeded by the F value of 60.5336. In other words, the null hypothesis must be rejected and it can be stated that there is a statistically significant difference in mean blood calcium concentration between animals that received the hormone treatment and those that did not.

If this process is continued for the remaining two hypotheses, it can be seen that both are retained at the 0.05 level of significance. The overall conclusion would be that hormone treatment has a highly significant effect on blood calcium concentration, but that there is no significant difference in blood calcium concentration between males and females. Furthermore, in relation to hypothesis No.3, there is no significant interaction between hormone treatment and sex: the effect of the hormone treatment on blood calcium concentration is no different in males and females.
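The decision rule applied to all three hypotheses can be summarised in a short sketch. The F ratios and critical values are copied from the output above; the function name `verdict` and the wording of the results are illustrative only.

```python
# F ratios and critical values as printed in the script output above.
f_ratios = {"Sample": 60.5336, "Column": 3.0706, "Interaction": 0.214}
f_critical_95 = 4.494    # 0.05 level of significance, (1, 16) degrees of freedom
f_critical_99 = 8.5309   # 0.01 level of significance, (1, 16) degrees of freedom


def verdict(f):
    """Compare an F ratio with the critical values, strictest level first."""
    if f > f_critical_99:
        return "reject at 0.01"
    if f > f_critical_95:
        return "reject at 0.05"
    return "retain"


decisions = {source: verdict(f) for source, f in f_ratios.items()}

for source, decision in decisions.items():
    print(f"{source:12s} F = {f_ratios[source]:8.4f} -> {decision}")
```

The same critical values serve all three tests here only because each source of variation has 1 degree of freedom; in general each test has its own critical value.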


