The Mann-Whitney U test is a nonparametric test of whether two independent samples come from populations with the same distribution. It pools the two samples, sorts the combined values, assigns ranks according to that order, and then compares the rank sums of the two samples.
Nonparametric Tests
A nonparametric test does not assume that the population follows a particular parametric distribution, such as the normal distribution. When a sample is small, it may not be reasonable to assume that the population is normally distributed, so nonparametric tests are especially useful for small samples.
In addition, because nonparametric tests such as this one work with ranks rather than raw values, they can also be used when the data themselves are ranks.
Mann-Whitney U Test
We state the hypotheses as follows:
- Null hypothesis (H0): The two populations have the same distribution.
- Alternative hypothesis (H1): The two populations do not have the same distribution.
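Next, we calculate the U statistic and the p-value, and choose a significance level α, which is usually 0.05. If the p-value is less than α, we reject H0. If the p-value is greater than or equal to α, we fail to reject H0; this does not prove that H0 is true, only that the data do not provide enough evidence against it.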
Ranking
The following data are part of the Women Entrepreneurship and Labor Force dataset.
No. | Country | European Member | Women Entrepreneurship Index |
---|---|---|---|
1 | Austria | Member | 54.9 |
2 | Belgium | Member | 63.6 |
3 | Finland | Member | 66.4 |
4 | France | Member | 68.8 |
5 | Ireland | Member | 64.3 |
6 | Australia | Not Member | 74.8 |
7 | Japan | Not Member | 40.0 |
8 | Norway | Not Member | 66.3 |
9 | Switzerland | Not Member | 63.7 |
10 | Taiwan | Not Member | 53.4 |
First, sort the combined data by Women Entrepreneurship Index, assign ranks from 1 (smallest) to 10 (largest), and calculate the sum of the ranks for each group.
Rank | Country | European Member | Index | Rank (Member) | Rank (Not Member) |
---|---|---|---|---|---|
1 | Japan | Not Member | 40.0 | | 1 |
2 | Taiwan | Not Member | 53.4 | | 2 |
3 | Austria | Member | 54.9 | 3 | |
4 | Belgium | Member | 63.6 | 4 | |
5 | Switzerland | Not Member | 63.7 | | 5 |
6 | Ireland | Member | 64.3 | 6 | |
7 | Norway | Not Member | 66.3 | | 7 |
8 | Finland | Member | 66.4 | 8 | |
9 | France | Member | 68.8 | 9 | |
10 | Australia | Not Member | 74.8 | | 10 |
Sum | | | | 30 | 25 |
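The ranking step can be reproduced with a short sketch in plain Python. The helper `average_ranks` is hypothetical (it is not part of the original); it gives tied values the average of their positions, which is the usual convention, although this data has no ties.

```python
def average_ranks(values):
    """Rank values from 1 (smallest) upward; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

member = [54.9, 63.6, 66.4, 68.8, 64.3]
not_member = [74.8, 40.0, 66.3, 63.7, 53.4]

# Pool both samples, rank the pooled values, then split the ranks back by group
ranks = average_ranks(member + not_member)
rank_sum_member = sum(ranks[:len(member)])
rank_sum_not_member = sum(ranks[len(member):])
print(rank_sum_member, rank_sum_not_member)  # 30.0 25.0
```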
Calculating U Statistic
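Use the following formula to calculate the U statistic:

$$U_1 = N_1 N_2 + \frac{N_1 (N_1 + 1)}{2} - \sum R_1$$

- U1: the U statistic of the first sample.
- N1: the size of the first sample.
- N2: the size of the second sample.
- ΣR1: the sum of the ranks of the first sample.

The U statistic of the second sample is obtained by swapping the roles of the two samples.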
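For the Member sample, N1 = 5, N2 = 5, and ΣR1 = 30:

$$U_{\text{Member}} = 5 \times 5 + \frac{5 \times 6}{2} - 30 = 25 + 15 - 30 = 10$$

For the Not Member sample, the roles are swapped and its rank sum is 25:

$$U_{\text{Not Member}} = 5 \times 5 + \frac{5 \times 6}{2} - 25 = 25 + 15 - 25 = 15$$

After calculating both U statistics, take the smaller one, so the U statistic for this test is 10.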
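A minimal sketch of the same arithmetic, assuming the rank sums 30 and 25 from the ranking table. The function `u_statistic` is a hypothetical helper that restates the formula N1N2 + N1(N1 + 1)/2 − ΣR1.

```python
def u_statistic(n1, n2, rank_sum_1):
    """U statistic of the first sample: N1*N2 + N1*(N1+1)/2 - sum of its ranks."""
    return n1 * n2 + n1 * (n1 + 1) / 2 - rank_sum_1

n_member, n_not_member = 5, 5
u_member = u_statistic(n_member, n_not_member, 30)      # 10.0
u_not_member = u_statistic(n_not_member, n_member, 25)  # 15.0

# The two U values always add up to N1*N2, a handy arithmetic check
assert u_member + u_not_member == n_member * n_not_member

u = min(u_member, u_not_member)
print(u)  # 10.0
```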
Python SciPy
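We can use SciPy's mannwhitneyu() to obtain the U statistic and the p-value.

```python
u_statistic, p = scipy.stats.mannwhitneyu(x, y)
```

- x, y: the two samples, as array_like.
- u_statistic: the U statistic, a float. In SciPy 1.7 and later, this is the U statistic of x, computed as ΣRx − Nx(Nx + 1)/2, so it is not necessarily the smaller of the two U values.
- p: the p-value, a float. In SciPy 1.7 and later, the test is two-sided by default (alternative='two-sided').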
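A sketch of a call that does not depend on the SciPy version, assuming only that the result exposes the `statistic` and `pvalue` fields (the returned named tuple has both). Passing `alternative='two-sided'` explicitly avoids relying on the default, and reducing the returned U to the smaller of U and N1N2 − U reproduces the hand calculation whichever sample SciPy reports U for.

```python
from scipy.stats import mannwhitneyu

x = [54.9, 63.6, 66.4, 68.8, 64.3]  # Member
y = [74.8, 40.0, 66.3, 63.7, 53.4]  # Not Member

# Request a two-sided test explicitly instead of relying on the default
res = mannwhitneyu(x, y, alternative='two-sided')

# U for one sample plus U for the other equals n1 * n2,
# so the smaller U can be recovered from either one
n1, n2 = len(x), len(y)
u_min = min(res.statistic, n1 * n2 - res.statistic)
print('smaller U:', u_min)
print('p-value:', res.pvalue)
```

The p-value itself can still differ between SciPy versions, because newer releases use the exact distribution for small samples without ties, while older ones used the normal approximation.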
Example
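```python
from scipy.stats import mannwhitneyu

# Women Entrepreneurship Index of the two groups
x = [54.9, 63.6, 66.4, 68.8, 64.3]  # Member
y = [74.8, 40.0, 66.3, 63.7, 53.4]  # Not Member

u_statistic, p = mannwhitneyu(x, y)
print('U statistic:', u_statistic)
print('p-value:', p)

# Compare the p-value with the significance level 0.05
if p >= 0.05:
    print('Fail to reject H0')
else:
    print('Reject H0')
```

The output is as follows.

```
U statistic: 10.0
p-value: 0.33805165701157347
Fail to reject H0
```

This output matches SciPy versions older than 1.7, where the default (alternative=None) returned the smaller U statistic and a one-sided p-value based on the normal approximation. Starting with SciPy 1.7, the same call runs a two-sided test, uses the exact distribution for small samples without ties, and returns the U statistic of x, so it prints a U statistic of 15.0 and a p-value of about 0.690. In both cases p ≥ 0.05, so H0 is not rejected at the 0.05 level.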
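To see where such p-values come from, here is a sketch that computes them from the ranks alone, without SciPy. It is a standard textbook approach, not SciPy's internal code, and it assumes there are no ties: the exact p-value enumerates all C(10, 5) = 252 equally likely ways to assign ranks to the first sample, and the normal approximation uses a continuity correction. The one-sided normal-approximation value matches the 0.338 p-value shown in the output.

```python
import math
from itertools import combinations

n1, n2 = 5, 5
u_observed = 10  # the smaller U from the hand calculation
N = n1 + n2

# Exact null distribution: under H0 every set of n1 ranks out of 1..N is
# equally likely for the first sample. R1 - n1(n1+1)/2 is a U statistic;
# U is symmetric about n1*n2/2, so either U has the same distribution.
u_values = [sum(c) - n1 * (n1 + 1) // 2 for c in combinations(range(1, N + 1), n1)]
p_lower = sum(u <= u_observed for u in u_values) / len(u_values)
p_exact_two_sided = min(1.0, 2 * p_lower)

# Normal approximation with continuity correction (no tie correction)
mean_u = n1 * n2 / 2
sd_u = math.sqrt(n1 * n2 * (N + 1) / 12)
z = (u_observed + 0.5 - mean_u) / sd_u
p_normal_one_sided = 0.5 * math.erfc(-z / math.sqrt(2))  # P(Z <= z)
p_normal_two_sided = 2 * p_normal_one_sided

print('exact two-sided:', p_exact_two_sided)    # about 0.690
print('normal one-sided:', p_normal_one_sided)  # about 0.338
print('normal two-sided:', p_normal_two_sided)  # about 0.676
```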