SciPy Mann-Whitney U Test

Photo by Jaeyoung Geoffrey Kang on Unsplash
The Mann-Whitney U test is a nonparametric test. It combines two samples, sorts them, and assigns ranks according to that order, to test whether the two populations have the same distribution.


Nonparametric Tests

A nonparametric test does not assume that the population follows a particular distribution, such as the normal distribution. If a sample is too small, it may not be possible to assume that the population is normally distributed, so nonparametric tests are very useful for small samples.

In addition, because a nonparametric test uses ranks, we can also use it when we want to compare populations by rank.
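
To illustrate what "uses ranks" means, the following minimal sketch shows that ranks depend only on the order of the values, not on their scale. The sample values and the log transform are chosen purely for illustration; scipy.stats.rankdata is used to assign the ranks.

```python
import numpy as np
from scipy.stats import rankdata

# Illustrative values only
values = np.array([54.9, 63.6, 66.4, 68.8, 64.3])

# Rank 1 is the smallest value
ranks_raw = rankdata(values)

# A monotonic transform (here: log) changes the values but not their order
ranks_log = rankdata(np.log(values))

print(ranks_raw)                             # [1. 2. 4. 5. 3.]
print(np.array_equal(ranks_raw, ranks_log))  # True
```

Because a rank-based test only sees the ranks, it gives the same result on both versions of the data.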

Mann-Whitney U Test

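We define the following hypotheses:

  • H0: The two populations have the same distribution.
  • H1: The two populations do not have the same distribution.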

Next, we calculate the U statistic and the p-value, and choose a significance level α, which is usually 0.05. When the p-value is greater than or equal to α, we fail to reject H0. Otherwise, we reject H0. Note that failing to reject H0 does not prove that H0 is true; it only means the data do not provide enough evidence against it.

\text{Fail to reject } H_0 \text{ if } p\text{-value} \geq \alpha

Ranking

The following is part of the data obtained from Women Entrepreneurship and Labor Force.

No.  Country      European Member  Women Entrepreneurship Index
1    Austria      Member           54.9
2    Belgium      Member           63.6
3    Finland      Member           66.4
4    France       Member           68.8
5    Ireland      Member           64.3
6    Australia    Not Member       74.8
7    Japan        Not Member       40.0
8    Norway       Not Member       66.3
9    Switzerland  Not Member       63.7
10   Taiwan       Not Member       53.4

First, sort the data by the Women Entrepreneurship Index, then rank them, and calculate the sum of the ranks for each group.

No.  Country      European Member  Index  Rank
1    Japan        Not Member       40.0   1
2    Taiwan       Not Member       53.4   2
3    Austria      Member           54.9   3
4    Belgium      Member           63.6   4
5    Switzerland  Not Member       63.7   5
6    Ireland      Member           64.3   6
7    Norway       Not Member       66.3   7
8    Finland      Member           66.4   8
9    France       Member           68.8   9
10   Australia    Not Member       74.8   10

Sum of ranks: Member = 30, Not Member = 25
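
The ranking step can be sketched in Python as follows. This is a minimal sketch that uses scipy.stats.rankdata on the combined samples; the variable names are chosen for illustration.

```python
import numpy as np
from scipy.stats import rankdata

member = [54.9, 63.6, 66.4, 68.8, 64.3]
not_member = [74.8, 40.0, 66.3, 63.7, 53.4]

# Combine both samples and rank them together (rank 1 = smallest value).
# rankdata assigns average ranks to ties; there are no ties in this data.
combined = np.concatenate([member, not_member])
ranks = rankdata(combined)

# Split the ranks back into the two groups and sum them
rank_sum_member = ranks[:len(member)].sum()
rank_sum_not_member = ranks[len(member):].sum()

print(rank_sum_member)      # 30.0
print(rank_sum_not_member)  # 25.0
```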

Calculating U Statistic

Use the following formula to calculate the U statistic.

U_1=N_1N_2+\frac{N_1(N_1+1)}{2}-\Sigma R_1

  • U1: The U statistic of the first sample.
  • N1: The size of the first sample.
  • N2: The size of the second sample.
  • ΣR1: The sum of the ranks of the first sample.

The following is the U statistic of Member.

U_{Member}=5\cdot 5+\frac{5(5+1)}{2}-30=10

The following is the U statistic of Not Member.

U_{NotMember}=5\cdot 5+\frac{5(5+1)}{2}-25=15

After calculating both U statistics, take the smaller one as the test statistic, so the U statistic is 10.
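
The calculation above can be checked with a few lines of Python. This is a minimal sketch; the helper name u_statistic is hypothetical and simply implements the formula given in this section.

```python
def u_statistic(n_own, n_other, rank_sum_own):
    """U = N1*N2 + N1*(N1+1)/2 - sum of ranks of the first sample."""
    return n_own * n_other + n_own * (n_own + 1) / 2 - rank_sum_own

# Rank sums from the ranking table: Member = 30, Not Member = 25
u_member = u_statistic(5, 5, 30)
u_not_member = u_statistic(5, 5, 25)

print(u_member)      # 10.0
print(u_not_member)  # 15.0

# The two U statistics always add up to N1 * N2
assert u_member + u_not_member == 5 * 5

# The smaller value is used as the test statistic
u = min(u_member, u_not_member)
print(u)             # 10.0
```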

Python SciPy

We can use SciPy's mannwhitneyu() to obtain the U statistic and the p-value.

u_statistic, p = scipy.stats.mannwhitneyu(x, y)
  • x, y: Two samples. The type is array_like.
  • u_statistic: U statistic. The type is float.
  • p: p-value. The type is float.

Example

from scipy.stats import mannwhitneyu
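# Women Entrepreneurship Index of the Member countries
x = [54.9, 63.6, 66.4, 68.8, 64.3]
# Women Entrepreneurship Index of the Not Member countries
y = [74.8, 40.0, 66.3, 63.7, 53.4]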
u_statistic, p = mannwhitneyu(x, y)
print('U statistic:', u_statistic)
print('p-value:', p)
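if p >= 0.05:
    print('Fail to reject H0')
else:
    print('Reject H0')

The output is as follows. It was produced with an older SciPy release. Since SciPy 1.7, mannwhitneyu() returns the U statistic of x (15.0 here, rather than the smaller of the two) and a two-sided p-value by default, so newer versions print different numbers.

U statistic: 10.0
p-value: 0.33805165701157347
Fail to reject H0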
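
Because the defaults of mannwhitneyu() depend on the SciPy version, it can help to state the alternative hypothesis and the p-value method explicitly. The following is a minimal sketch, assuming SciPy 1.7 or later, where the method parameter is available and the returned statistic is the U statistic of x. Choosing method='exact' is an assumption made here because the samples are small and contain no ties.

```python
from scipy.stats import mannwhitneyu

x = [54.9, 63.6, 66.4, 68.8, 64.3]  # Member
y = [74.8, 40.0, 66.3, 63.7, 53.4]  # Not Member

# Two-sided test with the exact null distribution (assumes SciPy >= 1.7)
result = mannwhitneyu(x, y, alternative='two-sided', method='exact')

# SciPy reports the U statistic of x; the smaller of the two
# U statistics, as used in this article, is min(U, N1*N2 - U)
u_x = result.statistic
u_small = min(u_x, len(x) * len(y) - u_x)

print('U statistic of x:', u_x)        # 15.0
print('Smaller U statistic:', u_small)  # 10.0
print('p-value:', result.pvalue)        # about 0.69

if result.pvalue >= 0.05:
    print('Fail to reject H0')
else:
    print('Reject H0')
```

The conclusion is the same as before: at a significance level of 0.05, there is not enough evidence to reject H0.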