Spearman's rank correlation coefficient
Investigation of the degree of correlation between two variables in effect determines the strength of association between them. There are two commonly used correlation techniques. Spearman's rank correlation coefficient is calculated for data variables that do not exhibit a normal distribution, and is therefore a non-parametric form of analysis.
Related tools:
The parametric equivalent to this test: Pearson product-moment correlation test.
A further non-parametric equivalent: Kendall's rank correlation.
The Pearson product-moment correlation coefficient is more sensitive and therefore preferable but assumes that sample variables are normally distributed.
Spearman's rank correlation coefficient was one of the earliest statistics developed based on ranks. The statistic generated, sometimes known as 'rho', is usually represented as 'r(s)'.
Script operation
This tool operates in much the same way as most of the others, with no specific departures from the usual methods.
Raw sample data must be entered as two equal-sized samples, arranged in columns. Note that sample titles may be included in the output by including them within the input range.
Raw data and spreadsheet output:

    Spearmans Rho - Rank Correlation Non-Parametric Test

     X      Y     Rank_X   Rank_Y
    10.4   7.4      4        5
    10.8   7.6      8.5      7
    11.1   7.9     10       11
    10.2   7.2      1.5      2.5
    10.3   7.4      3        5
    10.2   7.1      1.5      1
    10.7   7.4      7        5
    10.5   7.2      5        2.5
    10.8   7.8      8.5      9.5
    11.2   7.7     11        8
    10.6   7.8      6        9.5
    11.4   8.3     12       12

    Ties: 5
    Rho (no ties):             0.8531
    Rho (corrected for ties):  0.8511
    P(Rho<=rho):               0.000323
    d.f.: 10    t: 5.1261
    P(T<=t) one-tail: 0.000223    T-Critical (95%): 1.8125    T-Critical (99%): 2.7638
    P(T<=t) two-tail: 0.000447    T-Critical (95%): 2.2281    T-Critical (99%): 3.1693
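The headline figures above can be reproduced independently; the following is a minimal sketch assuming SciPy is available (the variable names are illustrative, and scipy.stats.spearmanr applies the tie correction automatically, so it returns the tie-corrected rho):

```python
# Sketch: reproducing the example output with SciPy.
# spearmanr computes the Pearson correlation of the mid-ranks,
# which is the tie-corrected form of Spearman's rho.
from scipy.stats import spearmanr

x = [10.4, 10.8, 11.1, 10.2, 10.3, 10.2, 10.7, 10.5, 10.8, 11.2, 10.6, 11.4]
y = [7.4, 7.6, 7.9, 7.2, 7.4, 7.1, 7.4, 7.2, 7.8, 7.7, 7.8, 8.3]

rho, p = spearmanr(x, y)
print(round(rho, 4))   # ~0.8511, matching "Rho (corrected for ties)"
```

The p-value returned is the two-tailed probability based on the t-distribution, matching the second method described below.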
Interpretation
In the example above the correlation coefficient calculated (0.8531) indicates a fairly strong degree of positive correlation between the two data variables.
A correlation coefficient near -1 indicates negative correlation; a value near +1 indicates positive correlation; a value of 0 indicates no correlation at all.
Although we can see that the positive correlation between the two example variables is reasonably strong, it is orthodox practice to express this result in the usual statistical terms of probability.
There are two ways in which this may be carried out using the output provided:
The first method requires access to a table of significance levels for the value of 'r(s)'; such tables are widely available in the appendices of most statistics textbooks.
The first method: using the r(s) distribution
In the example above, r(s)=0.8531, based upon 12 pairs of observations; the table is entered with n=12. Reference to the table of critical values of r(s) shows that 0.8531 exceeds the critical value (0.587) for a two-tailed test at the 0.05 level of significance; in fact the calculated value of r(s) is significant beyond the 0.001 level, as the probability value in the output confirms (0.000323). In other words, P<0.05, and we can be more than 95% certain that there is a statistically significant correlation between the two variables.
Note that a further value of r(s), corrected for ties, is also calculated. This statistic should be used in preference to the uncorrected value if the number of ties (occurrences of identical data elements within a sample) is larger than approximately a quarter of the sample size.
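The relationship between the two rho values in the output can be sketched with only the standard library: the classic 1 - 6*sum(d^2)/(n(n^2-1)) formula gives the "no ties" value, while the Pearson correlation of the mid-ranks gives the tie-corrected value (the helper function below is illustrative, not part of the tool):

```python
# Sketch of the two rho variants reported in the example output.
from statistics import mean

def midranks(data):
    """Assign average (mid) ranks to tied values, 1-based."""
    order = sorted(range(len(data)), key=lambda i: data[i])
    ranks = [0.0] * len(data)
    i = 0
    while i < len(data):
        j = i
        # extend j over the run of values tied with position i
        while j + 1 < len(data) and data[order[j + 1]] == data[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # mean of rank positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

x = [10.4, 10.8, 11.1, 10.2, 10.3, 10.2, 10.7, 10.5, 10.8, 11.2, 10.6, 11.4]
y = [7.4, 7.6, 7.9, 7.2, 7.4, 7.1, 7.4, 7.2, 7.8, 7.7, 7.8, 8.3]

rx, ry = midranks(x), midranks(y)
n = len(x)

# Classic formula -- exact only when there are no ties:
d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
rho_no_ties = 1 - 6 * d2 / (n * (n * n - 1))    # ~0.8531

# Tie-corrected rho = Pearson correlation of the rank vectors:
mx, my = mean(rx), mean(ry)
num = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
den = (sum((a - mx) ** 2 for a in rx) *
       sum((b - my) ** 2 for b in ry)) ** 0.5
rho_ties = num / den                            # ~0.8511
```

With five tie groups in twelve observations the two values differ only in the third decimal place, which is why the uncorrected value is quoted in the interpretation above.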
The second method: using the t-distribution
If large sample sizes are available the significance of the r(s) statistic (corrected for ties) can also be tested by comparison of a calculated t-statistic with a critical value at a given level of significance. If the critical value at the 0.05 or 0.01 level of significance is exceeded by the t-statistic then the null hypothesis, that there is no correlation, should be rejected.
Note that this procedure assumes the null hypothesis H0: r(s)=0. If a value other than 0 is hypothesised, the correlation coefficient cannot be assumed to come from a distribution approximated by the normal distribution, which in turn invalidates the calculation of a t-statistic.
Using the example above, at the 0.05 level of significance the two-tailed critical value of t with d.f.=10 is 2.2281. As this is exceeded by the computed value of t (5.1261), this also indicates that the null hypothesis should be rejected.
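The t-statistic in the output is obtained as t = r(s) * sqrt((n-2)/(1-r(s)^2)). A sketch of the comparison, assuming SciPy is available and taking the rounded rho from the example output (the small discrepancy against the quoted 5.1261 is rounding):

```python
# Sketch of the second method: rho -> t-statistic with n-2 degrees of
# freedom, compared against the two-tailed critical value.
from scipy.stats import t as t_dist

rho, n = 0.8511, 12          # tie-corrected rho from the example output
df = n - 2
t_stat = rho * ((df / (1 - rho ** 2)) ** 0.5)   # ~5.13

t_crit = t_dist.ppf(1 - 0.05 / 2, df)           # two-tailed, alpha = 0.05
p_two = 2 * t_dist.sf(t_stat, df)               # two-tailed probability

if t_stat > t_crit:
    print("reject H0: statistically significant correlation")
```

Here t_crit reproduces the "T-Critical (95%)" two-tail figure of 2.2281, and p_two matches the "P(T<=t) two-tail" value to within rounding.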