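A crosstab (cross tabulation) shows the frequency distribution of the values of two variables, and it can be used to check whether the two variables are related. pandas.crosstab() computes the crosstab for us and displays it as a nicely formatted table.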
pandas.crosstab( index, columns, values=None, rownames=None, colnames=None, aggfunc=None, margins=False, margins_name='All', dropna=True, normalize=False )
index
: The values shown in the rows. Type: array-like, Series, or list of arrays/Series.
columns
: The values shown in the columns. Type: array-like, Series, or list of arrays/Series.
margins
: Whether to show row and column subtotals (bool, default False).
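To make the three parameters above concrete, here is a minimal sketch on a tiny made-up data set. The `colour` and `size` Series are hypothetical and are not part of the article's data; they only illustrate where `index`, `columns`, and `margins` end up in the result.

```python
import pandas as pd

# Hypothetical toy data, used only to illustrate the parameters
colour = pd.Series(["red", "red", "blue", "blue", "red"], name="colour")
size = pd.Series(["S", "L", "S", "S", "L"], name="size")

# index -> row labels, columns -> column labels,
# margins=True -> extra "All" row and column holding the subtotals
table = pd.crosstab(index=colour, columns=size, margins=True)
print(table)
```

Each cell counts how many observations share that row value and column value, and the "All" entries are the sums along each row and column.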
Example
The data below is taken from Women Entrepreneurship and Labor Force; we use only a subset of it.
import pandas as pd
import numpy as np

df = pd.DataFrame(
    np.array([
        ['Austria', 'Developed', 'Member', 'Euro'],
        ['Spain', 'Developed', 'Member', 'Euro'],
        ['Japan', 'Developed', 'Not Member', 'National Currency'],
        ['Argentina', 'Developing', 'Not Member', 'National Currency'],
        ['Bolivia', 'Developing', 'Not Member', 'National Currency'],
        ['Taiwan', 'Developed', 'Not Member', 'National Currency'],
    ]),
    columns=['Country', 'Level of development', 'European Union Membership', 'Currency']
)

pd.crosstab(df['Level of development'], df['European Union Membership'])
pd.crosstab(df['Level of development'], df['European Union Membership'], margins=True)
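As a rough check of what `margins=True` adds, the sketch below rebuilds only the two columns used in this call (values copied from the sample data above) and compares the "All" subtotals with sums computed by hand. The variable names `plain` and `with_totals` are illustrative only.

```python
import pandas as pd

# Same six countries as the sample data, keeping only the two columns needed here
df = pd.DataFrame({
    "Level of development": [
        "Developed", "Developed", "Developed",
        "Developing", "Developing", "Developed",
    ],
    "European Union Membership": [
        "Member", "Member", "Not Member",
        "Not Member", "Not Member", "Not Member",
    ],
})

plain = pd.crosstab(df["Level of development"], df["European Union Membership"])
with_totals = pd.crosstab(
    df["Level of development"], df["European Union Membership"], margins=True
)

# The "All" column should match the row sums of the plain table
print(plain.sum(axis=1))
print(with_totals["All"])
```

The bottom-right "All" cell is the total number of observations, which is 6 for this subset.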
pd.crosstab(df["Level of development"], [df['European Union Membership'], df['Currency']])
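Passing a list of Series as `columns` produces hierarchical (MultiIndex) column labels. The following is a small sketch, again rebuilding the relevant columns from the sample data, of how individual cells in such a table might be read; the name `wide` is illustrative only.

```python
import pandas as pd

# Relevant columns of the sample data, rebuilt so the snippet runs on its own
df = pd.DataFrame({
    "Level of development": [
        "Developed", "Developed", "Developed",
        "Developing", "Developing", "Developed",
    ],
    "European Union Membership": [
        "Member", "Member", "Not Member",
        "Not Member", "Not Member", "Not Member",
    ],
    "Currency": [
        "Euro", "Euro", "National Currency",
        "National Currency", "National Currency", "National Currency",
    ],
})

wide = pd.crosstab(
    df["Level of development"],
    [df["European Union Membership"], df["Currency"]],
)

# The column labels now have two levels: membership, then currency
print(wide.columns.names)

# A single cell is addressed with a tuple, one entry per column level
print(wide.loc["Developed", ("Member", "Euro")])

# Selecting only the first level returns the sub-table for that group
print(wide["Not Member"])
```

Each column of the result corresponds to one combination of membership and currency, so the table answers questions such as "how many developed countries are EU members that use the euro".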