The Mann-Whitney U test is a nonparametric test. It pools the data from two samples, ranks the pooled values by size, and uses those ranks to test whether the two populations have the same distribution.
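Nonparametric Tests
A nonparametric test does not require assuming a particular distribution for the population. For example, when the sample is very small, we may not be able to justify assuming that the population is normally distributed, so nonparametric tests are especially useful with small samples.
In addition, because nonparametric tests work on ranks, they are also a natural choice when we want to compare populations by rank rather than by raw value.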
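Mann-Whitney U Test
We define the following hypotheses:
- Null hypothesis (H0): the two populations have the same distribution.
- Alternative hypothesis (H1): the two populations do not have the same distribution.
Next, we compute the U statistic and the p-value, and choose a significance level to compare the p-value against. The significance level is typically 0.05. When the p-value is greater than or equal to the significance level, we fail to reject H0 (this does not prove that H0 is true); otherwise, we reject H0.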
Ranking
The following is a subset of the data from Women Entrepreneurship and Labor Force.
No. | Country | European Member | Women Entrepreneurship Index |
---|---|---|---|
1 | Austria | Member | 54.9 |
2 | Belgium | Member | 63.6 |
3 | Finland | Member | 66.4 |
4 | France | Member | 68.8 |
5 | Ireland | Member | 64.3 |
6 | Australia | Not Member | 74.8 |
7 | Japan | Not Member | 40.0 |
8 | Norway | Not Member | 66.3 |
9 | Switzerland | Not Member | 63.7 |
10 | Taiwan | Not Member | 53.4 |
First, sort the data by Women Entrepreneurship Index, assign a rank to each row, and then sum the ranks of each group.
No. | Country | European Member | Index | Rank (Member) | Rank (Not Member) |
---|---|---|---|---|---|
1 | Japan | Not Member | 40.0 | | 1 |
2 | Taiwan | Not Member | 53.4 | | 2 |
3 | Austria | Member | 54.9 | 3 | |
4 | Belgium | Member | 63.6 | 4 | |
5 | Switzerland | Not Member | 63.7 | | 5 |
6 | Ireland | Member | 64.3 | 6 | |
7 | Norway | Not Member | 66.3 | | 7 |
8 | Finland | Member | 66.4 | 8 | |
9 | France | Member | 68.8 | 9 | |
10 | Australia | Not Member | 74.8 | | 10 |
Sum | | | | 30 | 25 |
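The ranking step can be sketched in a few lines of plain Python. This is only an illustration: the variable names (`member`, `not_member`, `pooled`, `rank_sums`) are made up for this example, and the simple 1..N ranking assumes there are no tied values, which holds for this data.

```python
# Index values copied from the table above.
member = {"Austria": 54.9, "Belgium": 63.6, "Finland": 66.4,
          "France": 68.8, "Ireland": 64.3}
not_member = {"Australia": 74.8, "Japan": 40.0, "Norway": 66.3,
              "Switzerland": 63.7, "Taiwan": 53.4}

# Pool both samples, remembering which group each value came from.
pooled = [(value, "Member") for value in member.values()]
pooled += [(value, "Not Member") for value in not_member.values()]

# Sort ascending and assign ranks 1..N (assumes no ties).
pooled.sort(key=lambda item: item[0])
rank_sums = {"Member": 0, "Not Member": 0}
for rank, (value, group) in enumerate(pooled, start=1):
    rank_sums[group] += rank

print(rank_sums)  # {'Member': 30, 'Not Member': 25}
```

If the data contained ties, the tied values would need to share an averaged rank, which this sketch does not handle.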
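Computing the U Statistic
Use the following formula to compute the U statistic.
$$U_1 = N_1 N_2 + \frac{N_1 (N_1 + 1)}{2} - \sum R_1$$
- U1: the U statistic of the first sample.
- N1: the size of the first sample.
- N2: the size of the second sample.
- ΣR1: the sum of the ranks of the first sample.
The U statistic for Member is:
$$U_{\text{Member}} = 5 \times 5 + \frac{5 \times 6}{2} - 30 = 10$$
The U statistic for Not Member is:
$$U_{\text{Not Member}} = 5 \times 5 + \frac{5 \times 6}{2} - 25 = 15$$
After computing both U statistics, we take the smaller one, so the resulting U statistic is 10.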
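As a quick check of the arithmetic, here is a minimal sketch that computes both U statistics from the rank sums. It assumes the common form of the formula, U = N1·N2 + N1(N1+1)/2 − ΣR1, and the helper name `u_statistic` is hypothetical.

```python
def u_statistic(n1: int, n2: int, rank_sum: float) -> float:
    """Return n1*n2 + n1*(n1 + 1)/2 - rank_sum for one sample."""
    return n1 * n2 + n1 * (n1 + 1) / 2 - rank_sum


# Rank sums from the ranking table: Member = 30, Not Member = 25.
u_member = u_statistic(5, 5, 30)
u_not_member = u_statistic(5, 5, 25)

# The two U values always add up to N1 * N2, which is a handy sanity check.
assert u_member + u_not_member == 5 * 5

u = min(u_member, u_not_member)
print(u_member, u_not_member, u)  # 10.0 15.0 10.0
```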
Python SciPy
We can use SciPy's mannwhitneyu() to obtain the U statistic and the p-value.

    u_statistic, p = scipy.stats.mannwhitneyu(x, y)

- x, y: the two samples, of type array_like.
- u_statistic: the U statistic, of type float.
- p: the p-value, of type float.
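Example

    from scipy.stats import mannwhitneyu

    # Women Entrepreneurship Index values for each group
    x = [54.9, 63.6, 66.4, 68.8, 64.3]  # Member
    y = [74.8, 40.0, 66.3, 63.7, 53.4]  # Not Member

    u_statistic, p = mannwhitneyu(x, y)
    print('U statistic:', u_statistic)
    print('p-value:', p)

    # Compare the p-value with the 0.05 significance level
    if p >= 0.05:
        print('H0 is accepted')
    else:
        print('H0 is rejected')

The output is as follows.

    U statistic: 10.0
    p-value: 0.33805165701157347
    H0 is accepted

Here "accepted" only means that we fail to reject H0 at the 0.05 level. Note that these values match the behavior of older SciPy releases (before 1.7), which returned the smaller of the two U statistics and a one-sided p-value by default. Newer releases default to a two-sided test and return the U statistic of x, so the same call may print different numbers.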
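To avoid depending on version-specific defaults, one option is to pass `alternative` explicitly and derive the smaller U statistic ourselves. The following is a sketch under that assumption, not the only way to call the function; the names `u_x`, `u_y`, and `alpha` are chosen just for this example.

```python
from scipy.stats import mannwhitneyu

x = [54.9, 63.6, 66.4, 68.8, 64.3]  # Member
y = [74.8, 40.0, 66.3, 63.7, 53.4]  # Not Member

# State the alternative explicitly instead of relying on the default.
result = mannwhitneyu(x, y, alternative='two-sided')

# The returned statistic belongs to one of the two samples; the other
# is N1 * N2 minus it, so the smaller of the pair is well defined.
u_x = result.statistic
u_y = len(x) * len(y) - u_x
u = min(u_x, u_y)
p = result.pvalue

alpha = 0.05  # significance level
print('U statistic:', u)
print('p-value:', p)
print('Fail to reject H0' if p >= alpha else 'Reject H0')
```

The smaller U statistic is 10 again, matching the manual calculation, and the two-sided p-value is well above 0.05, so the conclusion does not change.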