Pearson's product-moment correlation coefficient
Investigating the degree of correlation between two variables in effect determines the strength of the association between them. There are two commonly used correlation techniques. The Spearman rank correlation coefficient is calculated for data that are ranked or that do not exhibit a normal distribution, and is therefore a non-parametric form of analysis.
Related tool:
The non-parametric equivalent to this test: Spearman Rank or rho test.
The Pearson product-moment correlation coefficient is more sensitive and therefore preferable, but it assumes that the sample variables are normally distributed. The statistic calculated (usually denoted 'r') is independent of the raw data units and can only take values between -1 and +1.
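For reference, the textbook definition of 'r' for n paired observations is given below. The symbols x_i, y_i and the sample means \bar{x}, \bar{y} are notation assumed here; they do not come from the script itself.

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\;\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
```

Any change of units in x or y multiplies the numerator and the denominator by the same factor, which is why 'r' is independent of the raw data units.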
Script operation
This tool operates in much the same way as most of the others, with only one departure from the usual method. After the data input and script output ranges have been defined, a further requestor will appear asking for the hypothesized value of the true population correlation coefficient. By default this is set to '0', and in the majority of cases it can be left at that. The parameter can be changed (within the range -1 to +1), but the considerations in doing so are discussed later.
This test requires the input of equal sample sizes. Note that the sample labels 'x' and 'y' have been incorporated into the output by inclusion within the data range. Note also that the hypothesized population correlation coefficient was left as the default '0'.
Raw data:

x     y
5     1
10    6
5     2
11    8
12    5
4     1
3     4
2     6
7     5
1     2

Spreadsheet output:

Correlation: UnGrouped Data

      x       y
x     1       0.575
y     0.575   1

Pearson r:            0.575
r sq.:                0.3307
Std. Error of r:      0.2893
n:                    10
d.f.:                 8
t:                    1.9880

P(T<=t) one-tail:     0.0410
  T-Critical (95%):   1.8595
  T-Critical (99%):   2.8965
P(T<=t) two-tail:     0.0820
  T-Critical (95%):   2.306
  T-Critical (99%):   3.3554

Confidence Intervals:
  95% (+1.96):        1.3958
  95% (-1.96):        -0.0858
  99% (+2.58):        1.6301
  99% (-2.58):        -0.3201
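The main figures in this output can be checked by hand. The following Python sketch is not the script's own code; it uses the standard textbook formulas, and the function name pearson_summary is made up for illustration. It recomputes r, r squared, the standard error of r, the degrees of freedom and t from the ten pairs of raw data.

```python
import math

# Raw data from the example (10 paired observations).
x = [5, 10, 5, 11, 12, 4, 3, 2, 7, 1]
y = [1, 6, 2, 8, 5, 1, 4, 6, 5, 2]

def pearson_summary(x, y):
    """Return r, r squared, standard error of r, n, d.f. and t (for H0: rho = 0)."""
    if len(x) != len(y):
        raise ValueError("x and y must contain the same number of observations")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squares of the deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)
    df = n - 2
    se = math.sqrt((1 - r ** 2) / df)  # standard error of r under H0: rho = 0
    t = r / se
    return {"r": r, "r_sq": r ** 2, "se": se, "n": n, "df": df, "t": t}

summary = pearson_summary(x, y)
for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```

Rounded to four decimal places, the values agree with the Pearson r, r sq., Std. Error of r and t rows of the output.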
Interpretation
In the example above the correlation coefficient calculated (0.575) indicates that there is an intermediate degree of positive correlation between the two data variables.
A correlation coefficient of -1 indicates perfect negative correlation, a value of +1 indicates perfect positive correlation, and a value of 0 indicates that there is no linear correlation.
Although we can see that the positive correlation between the two example variables is not particularly strong it is orthodox practice to express this result in the usual statistical terms of probability. There are two ways in which this may be accomplished.
The first method, based on the significance of the 'r' statistic obtained, requires a table of critical values of 'r' for the sample size in question. Such a table can be found in the appendices of most statistics textbooks.
In the example above, r=0.575 based upon 10 pairs of observations. This means that there are n-2 degrees of freedom (i.e., 10-2=8 d.f.) and this is used to enter the table. Reference to the correlation coefficient table indicates that with d.f.=8 the value of 0.575 is less than the tabulated critical value of 0.632 found at the 0.05 level of significance.
In other words, as P>0.05, the correlation between the two variables is not statistically significant at the 0.05 level, and the null hypothesis that there is no correlation must be retained.
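If no table of 'r' is to hand, the tabulated critical value can be recovered from the two-tailed critical t value, because t and r are linked by t = r * sqrt(d.f.) / sqrt(1 - r^2). The sketch below assumes that relationship and takes the two-tail T-Critical (95%) value of 2.306 from the spreadsheet output. The function name critical_r is illustrative only.

```python
import math

def critical_r(t_critical, df):
    """Convert a two-tailed critical t value into the equivalent critical r."""
    return t_critical / math.sqrt(t_critical ** 2 + df)

df = 8
t_crit_05 = 2.306   # two-tail T-Critical (95%) from the spreadsheet output
r_observed = 0.575

r_crit_05 = critical_r(t_crit_05, df)
significant = abs(r_observed) > r_crit_05

print(f"critical r (0.05 level, d.f.={df}): {r_crit_05:.3f}")
print("significant at the 0.05 level:", significant)
```

For d.f.=8 this reproduces the tabulated value of 0.632, which the observed r of 0.575 does not exceed.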
In the second method Student's t-test may be used. The correlation coefficient (r) is calculated from a pair of samples and is taken to be an estimate of the correlation present in the population that was sampled. The population correlation coefficient is termed 'ρ' (rho). In this case the null hypothesis takes the form:
H0: ρ = 0 and therefore HA: ρ ≠ 0.
As with the first method this is again a two-tailed test as we are not particularly interested in directional change, only whether there is significant correlation in the form of a coefficient significantly different from 0 (which is effectively no correlation).
In the two-tail output above it can be seen that the calculated value of 't' (1.9880) does not exceed the t-critical value at the 0.05 level of significance (2.306). As P>0.05 (the two-tail probability is 0.0820), the correlation between the two variables is not statistically significant at the 0.05 level, and again the null hypothesis must be retained.
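The one-tail and two-tail probabilities in the output can be approximated without statistical tables. The sketch below uses only the Python standard library: it integrates the density of Student's t distribution numerically with Simpson's rule. This is an illustrative substitute, not necessarily how the script computes its probabilities, and the function names are assumptions.

```python
import math

def t_pdf(x, df):
    """Probability density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p(t, df, steps=2000):
    """Approximate P(|T| >= |t|) using Simpson's rule on the interval [0, |t|]."""
    t = abs(t)
    if t == 0:
        return 1.0
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2
        total += weight * t_pdf(i * h, df)
    central_area = 2 * (h / 3) * total  # area between -|t| and +|t|
    return 1 - central_area

p_two = two_tailed_p(1.9880, 8)
p_one = p_two / 2

print(f"P two-tail: {p_two:.4f}")
print(f"P one-tail: {p_one:.4f}")
```

Both values agree with the P(T<=t) rows of the output to four decimal places. Note that the one-tail probability falls below 0.05 while the two-tail probability does not, which is why the choice of a two-tailed test matters for this example.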
Note that in the second method the t-test is unsuitable if the hypothesized population correlation coefficient is anything other than '0'. In this case 'r' must be converted to a 'z' value using Fisher's z-transformation. If you alter the default hypothesized population correlation coefficient of '0' when the requestor appears, a Fisher z-transformation will be carried out and the results will be output to the spreadsheet.
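A minimal sketch of the Fisher z-transformation is given below, assuming the usual textbook form: z = atanh(r), with standard error 1/sqrt(n-3). The hypothesized population value of 0.3 is purely hypothetical, and the function name fisher_z_test is illustrative; the script's actual output layout for this case is not shown in this document.

```python
import math
from statistics import NormalDist

def fisher_z_test(r, n, rho0):
    """Test H0: rho = rho0 using Fisher's z-transformation, z = atanh(r)."""
    z_r = math.atanh(r)
    z_rho0 = math.atanh(rho0)
    se = 1 / math.sqrt(n - 3)      # standard error on the z scale
    z_stat = (z_r - z_rho0) / se   # approximately standard normal under H0
    p_two_tail = 2 * (1 - NormalDist().cdf(abs(z_stat)))
    return z_r, se, z_stat, p_two_tail

r, n = 0.575, 10
rho0 = 0.3  # hypothetical hypothesized population coefficient, for illustration only

z_r, se, z_stat, p_two = fisher_z_test(r, n, rho0)
print(f"z statistic: {z_stat:.4f}, two-tail P: {p_two:.4f}")

# 95% limits on the z scale, then transformed back to the r scale with tanh.
lower_z = z_r - 1.96 * se
upper_z = z_r + 1.96 * se
lower_r = math.tanh(lower_z)
upper_r = math.tanh(upper_z)
print(f"95% limits (z scale): {lower_z:.4f} to {upper_z:.4f}")
print(f"95% limits (r scale): {lower_r:.4f} to {upper_r:.4f}")
```

The 95% limits on the z scale computed here match the values listed under Confidence Intervals in the example output (1.3958 and -0.0858). This suggests those figures are expressed on the z scale, which would explain why one of them lies outside the range -1 to +1. Applying tanh converts them back to limits for 'r'.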