Mann-Whitney U-test
This non-parametric test, like many others, examines sample data only after they have been converted to ranks. It makes no estimate of, or hypothesis about, population parameters (e.g., the mean) and does not require the raw data to come from a normal distribution. The test detects statistically significant differences between two independent samples.
The parametric equivalent of this test is the ordinary t-test for two independent sample means.
Script operation
This tool operates in much the same way as most of the others with no specific departures from the usual methods needed.
Raw sample data are entered in two columns as shown. In this example, summary statistics (the count, or 'n', the sorted data and the ranks of the sorted data) are computed for the two columns, and these have been labelled in the output under 'Morning' and 'Afternoon'. If the sample data columns have titles, these can be reproduced in the output by including them in the input data range.
Note that in the example all sample data are written to the spreadsheet twice: first sorted by size, and then as the corresponding ranks.
Raw data:

Morning   Afternoon
66        53
70        81
73        83
75        84
75        84
79        84
82        85
87        86
95        88
          90
          91
          92

Spreadsheet output:

Mann-Whitney U Test

Sorted Data
Morning   Afternoon
66        53
70        81
73        83
75        84
75        84
79        84
82        85
87        86
95        88
          90
          91
          92

Table of Ranks for Sorted Data
2         1
3         8
4         10
5.5       12
5.5       12
7         12
9         14
16        15
21        17
          18
          19
          20

U(Sample 1): 80
U(Sample 2): 28
No. of Cases: 21
n(Sample 1): 9
n(Sample 2): 12

Normal Approximation:
Mean: 54
Variance: 198
St.Dev.: 14.0712
z: -1.8477
P(z to left tail): 0.03232
P(z to mean): 0.46768
P(z to right tail): 0.96768
P Density Function: 0.0724
Z-Critical One-tail(95%): 1.65
Z-Critical One-tail(99%): 2.33
Z-Critical Two-tail(95%): 1.96
Z-Critical Two-tail(99%): 2.58
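The ranks and U statistics in the example output can be reproduced with a short sketch. This is not the script's own code, which is not shown here. The function names `average_ranks` and `mann_whitney_u` are hypothetical, and the formula U = n1*n2 + n(n+1)/2 - R (with R the rank sum of the sample in question) is assumed because it reproduces the values 80 and 28 above. Tied values receive the mean of the ranks they occupy, as in the table of ranks.

```python
def average_ranks(values):
    """Rank values from smallest (rank 1) upward; ties share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the whole run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(sample1, sample2):
    """Return (U1, U2, ranks of sample1, ranks of sample2) from a joint ranking."""
    n1, n2 = len(sample1), len(sample2)
    ranks = average_ranks(list(sample1) + list(sample2))
    ranks1, ranks2 = ranks[:n1], ranks[n1:]
    # Assumed convention: chosen because it matches the example output.
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - sum(ranks1)
    u2 = n1 * n2 + n2 * (n2 + 1) / 2 - sum(ranks2)
    return u1, u2, ranks1, ranks2


morning = [66, 70, 73, 75, 75, 79, 82, 87, 95]
afternoon = [53, 81, 83, 84, 84, 84, 85, 86, 88, 90, 91, 92]
u1, u2, ranks1, ranks2 = mann_whitney_u(morning, afternoon)
print(u1, u2)  # 80.0 28.0
```

The two statistics always sum to n1*n2 (here 9 x 12 = 108), which is a convenient check on the arithmetic.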
Interpretation
There are two ways in which the output from the script may be interpreted in order to draw conclusions. The choice between them depends on the two sample sizes used.
If the size of the smaller sample is <20 and the size of the larger sample is <40, the calculated U statistics can be compared with the distribution of U in order to determine the statistical significance of the result. If the sample sizes are larger than this, the normal approximation method may be used instead, because the distribution of U approaches the normal distribution as the sample sizes increase. In effect, a value of z is calculated and compared with critical values of the t-distribution with infinite degrees of freedom, which is identical to the normal distribution.
In the example above, the following two-tailed null hypothesis may be proposed:
H0: Morning and afternoon figures are the same, i.e., there is no significant difference between them.
The first method: using the U distribution
A table of critical values of the Mann-Whitney U distribution (available in most statistics textbooks) is required in order to ascertain whether a statistically significant result has been obtained. Using such a table, the larger of the two U statistics determined by the analysis tool is compared with the critical value at a given level of significance (e.g., 0.05 or 0.01). To do this, the table must be entered using the n(Sample 1) and n(Sample 2) values provided in the output, with the smaller sample size first:
If n(Sample 1) < n(Sample 2) then the table should be entered using the value of n(Sample 1) first and then followed by the value of n(Sample 2).
If n(Sample 1) > n(Sample 2) then the table should be entered using the value of n(Sample 2) first, to be followed by the value of n(Sample 1).
In the example output above, the first case applies as 9 < 12, and the critical value taken from a table for U(0.05)(2-tail)(9,12) is 82.
As the larger calculated value of U (80) is less than the U-critical value (82), the null hypothesis should be retained at the 0.05 level of significance.
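The table-based decision can be written out as a small sketch. The function name `u_table_decision` is hypothetical, and the critical value is not computed: it must still be read from a printed table and passed in. Treating a U statistic equal to the tabulated value as significant is an assumption, since conventions differ between tables, so check the notes that accompany the table you use.

```python
def u_table_decision(u1, u2, n1, n2, critical_value):
    """Compare the larger U statistic with a critical value read from a table.

    critical_value is supplied by the user from a printed table of U.
    """
    # The table is entered with the smaller sample size first.
    table_entry = (min(n1, n2), max(n1, n2))
    larger_u = max(u1, u2)
    # Assumption: equality with the tabulated value counts as significant.
    reject_null = larger_u >= critical_value
    return table_entry, larger_u, reject_null


# Values from the example output; 82 is the tabulated U(0.05)(2-tail)(9,12).
entry, larger_u, reject_null = u_table_decision(80, 28, 9, 12, critical_value=82)
print(entry, larger_u, reject_null)  # (9, 12) 80 False
```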
The second method: using the normal approximation
As the t-distribution (with v=infinity) is identical to the normal distribution, the z-critical value is equal to the t-critical value. The absolute value of the computed z (here z = -1.8477, so |z| = 1.8477) can therefore be compared with the two-tailed critical value at a given level of significance in order to decide whether the null hypothesis should be rejected.
At the 0.05 level of significance the two-tailed critical value with v=infinity is 1.96. As |z| = 1.8477 does not exceed this value, the normal approximation also indicates that the null hypothesis should be retained.
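The normal approximation figures in the example output can be checked with the sketch below. It assumes the usual formulas, mean = n1*n2/2 and variance = n1*n2*(n1+n2+1)/12 with no correction for ties, which reproduce the Mean, Variance and St.Dev. shown. It also assumes that z is computed from U = 28, since that gives the negative z in the output; whether the script takes U(Sample 2) or simply the smaller U is not stated. The function name `normal_approximation` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def normal_approximation(u, n1, n2):
    """Return (mean, variance, sd, z) for a U statistic, without tie correction."""
    mean = n1 * n2 / 2
    variance = n1 * n2 * (n1 + n2 + 1) / 12
    sd = sqrt(variance)
    z = (u - mean) / sd
    return mean, variance, sd, z


# Assumption: z is based on U = 28, which matches the sign of z in the output.
mean, variance, sd, z = normal_approximation(28, 9, 12)
left_tail = NormalDist().cdf(z)   # P(z to left tail)
right_tail = 1 - left_tail        # P(z to right tail)
reject_two_tailed = abs(z) > 1.96  # two-tailed test at the 0.05 level

print(mean, variance, round(sd, 4), round(z, 4))
print(round(left_tail, 4), round(right_tail, 4), reject_two_tailed)
```

Using the other statistic, U = 80, gives the same magnitude of z with the opposite sign, so the two-tailed conclusion is unaffected by that choice.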