Fisher Exact Test for 2x2 contingency table data
In experiments where data are collected simultaneously for two variables, each with two categories, a hypothesis may be formulated to test whether the frequencies of occurrence in the categories of one variable are independent of those in the other. When the raw data are set out in a contingency table, the table is described as rows (r) x columns (c), and each row and column intersection forms a 'cell' of the table. A contingency table may have a configuration other than 2x2, although this test currently supports only the 2x2 case.
The usual method of analyzing contingency table data is to employ a chi-square (χ²) test to detect any significant difference between observed and expected frequencies. To do this, a null hypothesis must be constructed of the form 'frequencies of observed row values are independent of the frequencies of observed column values', or vice versa. The use of chi-square analysis to test such a hypothesis is described elsewhere:
Related tool:
Chi-square test for independence of sample data.
The Fisher Exact Test is sometimes referred to as the Fisher-Irwin test or the Fisher-Yates test. Its output takes the form of exact probabilities, based on the hypergeometric distribution, of the observed frequencies arising at random given the row and column totals. In effect, the calculation determines the probability of the whole contingency table occurring by random chance. This probability is calculated once for the observed table and then for each corresponding table with more extreme data values. A one-tail or two-tail hypothesis is tested for significance using the total probability of all the tables considered.
The following is the probability formula used by this test:
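In the usual notation for a 2x2 table, with cell frequencies a and b in the first row, c and d in the second row, and n = a + b + c + d, the standard Fisher probability of a single table with fixed row and column totals can be written as follows. The labels a to d and n are notation introduced here for illustration; they do not appear in the spreadsheet output.

```latex
P = \frac{(a+b)!\,(c+d)!\,(a+c)!\,(b+d)!}{n!\;a!\;b!\;c!\;d!}
```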
Script operation
This tool operates in much the same way as most of the others, with one departure from the usual method: row and column titles for the 2x2 contingency table should not be included in the data input range.
See the information about general script usage for further details.
Raw sample data are entered as a 2x2 contingency table containing the observed frequencies for each category of the two variables. See the example raw data input range below for further details:
Raw data (2x2 cont. table):

                Beetles    Snails
Upper leaf         12         7
Lower leaf          2         8

Spreadsheet output:

Fisher Exact Test for Independence
Prob. for Obs. Freq.:             0.0292
Prob. Extreme Freq. (RH Tail):    0.0015
Total Prob. (RH Tail):            0.0308
Prob. Extreme Freq. (LH Tail):    0.0036
Total Prob. (LH Tail):            0.0329
Total of Observations:            29
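As a rough check on the first output value, the probability of the observed table can be computed directly from the standard factorial form of the Fisher probability. This is a minimal sketch, not the tool's own code: the function name table_probability and the cell labels a, b, c, d are assumptions made for illustration.

```python
from math import factorial

def table_probability(a, b, c, d):
    """Probability of one 2x2 table [[a, b], [c, d]] given its row and column totals."""
    n = a + b + c + d
    # Product of the factorials of the two row totals and two column totals.
    numerator = (factorial(a + b) * factorial(c + d)
                 * factorial(a + c) * factorial(b + d))
    # Factorial of the grand total times the factorials of the four cells.
    denominator = (factorial(n) * factorial(a) * factorial(b)
                   * factorial(c) * factorial(d))
    return numerator / denominator

# Example data: rows are upper/lower leaf, columns are beetles/snails.
p_observed = table_probability(12, 7, 2, 8)
print(f"{p_observed:.4f}")  # 0.0292
```

The result agrees with the 'Prob. for Obs. Freq.' value in the example output.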
Interpretation
In the example data input, the numbers of beetles and snails were counted on the upper and lower surfaces of leaves. We may suspect that the proportions of the two animals differ between the two leaf surfaces, and we may set up appropriate null hypotheses.
For example, the following one-tail (i.e., directional) null hypothesis may be proposed (amongst others):
H0: The proportion of beetles is less than, or equal to, the proportion of snails on the upper surface of the leaf.
Note that a two-tail null hypothesis can also be tested using the Fisher Exact Test, but this is not covered further here, and the spreadsheet output itself provides no direct test results for it.
Although a two-tail Fisher test may be useful when sample sizes or expected values are small, a two-tail hypothesis can otherwise be tested using the usual chi-square test for independence, which determines whether the row and column data are independent.
In the case of H0 above, and with reference to the spreadsheet output, the analysis tool first determines the probability of obtaining the observed values by random chance (P=0.0292). The nature of the null hypothesis means that we are then interested in the probabilities associated with the left-hand distribution tail. The total probability for contingency tables more extreme than the observed data is computed to be P=0.0036. This is added to the probability of the observed table to give a total probability (P=0.0329) of obtaining the observed table, or one more extreme, by chance if H0 is true. As P=0.0329 is less than 0.05 (if testing at this level of significance), the null hypothesis (H0) should be rejected.
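The steps in this interpretation can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the tool's implementation: the function names are hypothetical, 'more extreme' is taken to mean tables with a larger top-left cell (more beetles on the upper leaf) with row and column totals held fixed, and the 0.05 level is the one mentioned in the text.

```python
from math import comb

def table_probability(a, b, c, d):
    """Hypergeometric probability of the table [[a, b], [c, d]] with totals fixed."""
    return comb(a + b, a) * comb(c + d, c) / comb(a + b + c + d, a + c)

def directional_probabilities(a, b, c, d):
    """Return (observed, more_extreme) probabilities, moving towards a larger
    top-left cell while keeping the row and column totals fixed."""
    observed = table_probability(a, b, c, d)
    more_extreme = 0.0
    # Each step shifts one count along the diagonal; stop when b or c reaches zero.
    while b > 0 and c > 0:
        a, b, c, d = a + 1, b - 1, c - 1, d + 1
        more_extreme += table_probability(a, b, c, d)
    return observed, more_extreme

observed, more_extreme = directional_probabilities(12, 7, 2, 8)
total = observed + more_extreme
alpha = 0.05  # significance level assumed from the text

print(f"observed={observed:.4f} extreme={more_extreme:.4f} total={total:.4f}")
# observed=0.0292 extreme=0.0036 total=0.0329
print("reject H0" if total < alpha else "do not reject H0")
# reject H0
```

Under these assumptions the sketch gives the same three values used in the interpretation above (0.0292, 0.0036 and 0.0329).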