SciPy Spearman Correlation Coefficient

Photo by Melnychuk Nataliya on Unsplash
The Spearman correlation coefficient is a nonparametric measure of association. It ranks the values of each of the two variables, and then uses the differences between the paired ranks to measure the correlation between the two variables.

Spearman Correlation Coefficient

The Spearman correlation coefficient ( ρ ) measures the strength and direction of the monotonic relationship between two variables X (independent variable) and Y (dependent variable). The closer ρ is to 1 or -1, the stronger the relationship. When Spearman’s correlation coefficient is:

  • 0 < ρ <= 1: When X increases, Y tends to increase.
  • -1 <= ρ < 0: When X increases, Y tends to decrease.
  • ρ = 0: When X increases, Y shows no increasing or decreasing trend.
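
To make the sign of ρ concrete, here is a minimal sketch using made-up toy data (the lists below are invented for illustration and are not part of the dataset used later). Because the coefficient depends only on ranks, a strictly increasing relationship should give a value of about 1 even when it is nonlinear, and a strictly decreasing one should give about -1.

```python
from scipy.stats import spearmanr

# Toy data, invented for illustration only.
x = [1, 2, 3, 4, 5]
y_up = [v ** 3 for v in x]     # strictly increasing in x, but nonlinear
y_down = [-2 * v for v in x]   # strictly decreasing in x

rho_up, _ = spearmanr(x, y_up)
rho_down, _ = spearmanr(x, y_down)

print(rho_up)    # close to 1
print(rho_down)  # close to -1
```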

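We define the following hypotheses:

  • H0 (null hypothesis): X and Y are not correlated ( ρ = 0 ).
  • H1 (alternative hypothesis): X and Y are correlated ( ρ ≠ 0 ).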

Next, we calculate the correlation ( ρ ) and its p-value, and choose a significance level α. Generally speaking, α is set to 0.05. When the p-value is greater than or equal to α, we fail to reject the null hypothesis H0; otherwise, we reject H0. Note that failing to reject H0 does not prove that H0 is true.

\text{Fail to reject } H_0 \text{ if } p\text{-value} \geq \alpha

Ranking

The following is part of the data obtained from Women Entrepreneurship and Labor Force.

No.  Country      Women Entrepreneurship Index (X)  Entrepreneurship Index (Y)
1    Germany      63.6                              67.4
2    Greece       43.0                              42.0
3    Ireland      64.3                              65.3
4    Italy        51.4                              41.3
5    Latvia       56.6                              54.5
6    Lithuania    58.5                              54.6
7    Netherlands  69.3                              66.5
8    Slovakia     54.8                              45.4
9    Slovenia     55.9                              53.1
10   Spain        52.5                              49.6

First, sort the data by the Women Entrepreneurship Index and assign each country its rank, Rank(X), where rank 1 is the smallest value. Then sort the data by the Entrepreneurship Index and assign Rank(Y) in the same way. After that, for each country, compute the rank difference di = Rank(Xi) - Rank(Yi). Finally, sum up all di^2.

Country      Index (X)  Index (Y)  Rank (X)  Rank (Y)  di   di^2
Germany      63.6       67.4       8         10        -2   4
Greece       43.0       42.0       1         2         -1   1
Ireland      64.3       65.3       9         8         1    1
Italy        51.4       41.3       2         1         1    1
Latvia       56.6       54.5       6         6         0    0
Lithuania    58.5       54.6       7         7         0    0
Netherlands  69.3       66.5       10        9         1    1
Slovakia     54.8       45.4       4         3         1    1
Slovenia     55.9       53.1       5         5         0    0
Spain        52.5       49.6       3         4         -1   1
Sum                                                         10
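
The ranking table can also be reproduced in code. The following is a minimal sketch that uses scipy.stats.rankdata to assign the ranks. The variable names rank_x, rank_y, d and sum_d2 are our own choices for illustration.

```python
from scipy.stats import rankdata

x = [63.6, 43.0, 64.3, 51.4, 56.6, 58.5, 69.3, 54.8, 55.9, 52.5]
y = [67.4, 42.0, 65.3, 41.3, 54.5, 54.6, 66.5, 45.4, 53.1, 49.6]

# rankdata gives rank 1 to the smallest value.
# Tied values (there are none in this data) would share an average rank.
rank_x = rankdata(x)
rank_y = rankdata(y)

# Rank difference for each country, then the sum of squared differences.
d = rank_x - rank_y
sum_d2 = float((d ** 2).sum())

print(rank_x)
print(rank_y)
print(sum_d2)  # 10.0
```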

Calculate Correlation ( ρ )

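Use the following formula to calculate the correlation ( ρ ), where di is the rank difference of the i-th pair and n is the number of pairs. This simplified formula applies when there are no tied ranks, which is the case for our data.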

\rho=1-\frac{6\sum d_i^2}{n(n^2-1)}

Therefore, with the sum of di^2 equal to 10 and n = 10 from the above table, we obtain ρ ≈ 0.9394.

\rho = 1 - \frac{6 \times 10}{10(10^2 - 1)} = 1 - \frac{60}{990} \approx 0.9394
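
As a quick sanity check, the arithmetic can be repeated in a few lines of Python and compared with SciPy's result. This is only a sketch: sum_d2 is taken from the ranking table, and the names rho_formula and rho_scipy are illustrative.

```python
from scipy.stats import spearmanr

x = [63.6, 43.0, 64.3, 51.4, 56.6, 58.5, 69.3, 54.8, 55.9, 52.5]
y = [67.4, 42.0, 65.3, 41.3, 54.5, 54.6, 66.5, 45.4, 53.1, 49.6]

n = len(x)    # number of pairs (10)
sum_d2 = 10   # sum of squared rank differences from the ranking table

# Simplified formula, valid because the data has no tied ranks.
rho_formula = 1 - (6 * sum_d2) / (n * (n ** 2 - 1))

# SciPy's value for the same data, for comparison.
rho_scipy, _ = spearmanr(x, y)

print(round(rho_formula, 4))  # 0.9394
print(rho_scipy)
```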

Python SciPy

We can use SciPy’s spearmanr() to calculate the correlation ( ρ ) and p-value.

correlation, p = spearmanr(x, y)
  • x, y: Two samples. The type is array_like.
  • correlation: correlation ρ. The type is float.
  • p: p-value. The type is float.

Example

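from scipy.stats import spearmanr

# Women Entrepreneurship Index (X) and Entrepreneurship Index (Y)
x = [63.6, 43.0, 64.3, 51.4, 56.6, 58.5, 69.3, 54.8, 55.9, 52.5]
y = [67.4, 42.0, 65.3, 41.3, 54.5, 54.6, 66.5, 45.4, 53.1, 49.6]

# Spearman's correlation and its p-value
correlation, p = spearmanr(x, y)
print('Correlation:', correlation)
print('p-value:', p)

# Compare the p-value with the 0.05 significance level
if p >= 0.05:
    print('Fail to reject H0')
else:
    print('H0 is rejected')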

The output is as follows.

Correlation: 0.9393939393939393
p-value: 5.484052998513666e-05
H0 is rejected
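
As a rough check on where the p-value comes from, the sketch below recomputes it from ρ using a Student's t approximation with n - 2 degrees of freedom. That SciPy's default p-value follows this approximation is our assumption here, not something stated in this article. The names t_stat and p_manual are purely illustrative.

```python
import math
from scipy.stats import spearmanr, t as t_dist

x = [63.6, 43.0, 64.3, 51.4, 56.6, 58.5, 69.3, 54.8, 55.9, 52.5]
y = [67.4, 42.0, 65.3, 41.3, 54.5, 54.6, 66.5, 45.4, 53.1, 49.6]

n = len(x)
rho, p_scipy = spearmanr(x, y)

# Assumed approximation: convert rho to a t statistic with n - 2 degrees of freedom.
dof = n - 2
t_stat = rho * math.sqrt(dof / (1 - rho ** 2))

# Two-sided p-value from the survival function of the t distribution.
p_manual = 2 * t_dist.sf(abs(t_stat), dof)

print('SciPy p-value: ', p_scipy)
print('Manual p-value:', p_manual)
```

If the two printed values agree, the assumption holds for the installed SciPy version. For a sample this small, the approximation is only a rough guide.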