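Logistic Regression

Logistic regression is a data analysis technique used for classification. It is probably the simplest model among classification algorithms, which makes it a good first classification algorithm for beginners.

The complete code is available for download.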

Logistic Regression
Cost Function
Multiple Logistic Regression
Conclusion
References

Logistic Regression

Sigmoid Function

Before introducing logistic regression, let us first look at the sigmoid function. The sigmoid function, shown below, maps any real value z to a value between 0 and 1.

$g(z)=\frac{1}{1+e^{-z}}, 0<g(z)<1$

As the figure below shows, the two ends of the sigmoid function come very close to 0 and 1.

Sigmoid function, outputs between 0 and 1.

In the code below, sigmoid() implements the sigmoid function.

import numpy as np


def sigmoid(z):
    """
    Sigmoid function

    Parameters
    ----------
    z: (ndarray (m,)) or (scalar)
        m is the number of samples

    Returns
    -------
    (ndarray (m,)) or (scalar)
    """

    return 1 / (1 + np.exp(-z))
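
As a quick, illustrative check of the saturation described above, the sketch below evaluates the sigmoid at a few sample inputs. The input values and the helper name `sigmoid_demo` are chosen here only for illustration and are not part of the original code.

```python
import numpy as np


def sigmoid_demo(z):
    # Same formula as g(z) = 1 / (1 + e^{-z}) above.
    return 1 / (1 + np.exp(-z))


# Illustrative inputs (an assumption, not taken from the article).
z = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
g = sigmoid_demo(z)

# Values rise from near 0 to near 1, and g(0) is exactly 0.5.
print(g)
# The curve is symmetric around 0.5: g(z) + g(-z) = 1.
print(g + sigmoid_demo(-z))
```

The symmetry g(z) + g(-z) = 1 is what later lets the two class probabilities sum to 1.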

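Logistic Regression Model

Logistic regression is a regression model used for classification, and classification is naturally tied to probability. The output of the sigmoid function lies between 0 and 1, which matches the basic requirement that a probability lie between 0% and 100%. The logistic regression model first feeds x into the linear regression model to obtain z, then feeds z into the sigmoid function to obtain a value between 0 and 1. The model function of logistic regression is as follows.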

$z=wx^{(i)}+b \\\\ g(z)=\frac{1}{1+e^{-z}} \\\\ f_{w,b}(x^{(i)})=g(wx^{(i)}+b)=\frac{1}{1+e^{-(wx^{(i)}+b)}} \\\\ \text{parameters}:w,b \\\\ i:i \text{-th example}$

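Suppose x is the size of a tumor, where y = 0 means the tumor is not malignant and y = 1 means it is malignant. If we feed x into the logistic regression model and get 0.7, the model estimates a 70% probability that y is 1. Because the probabilities of y = 0 and y = 1 sum to 1, a 70% probability that y is 1 implies a 30% probability that y is 0.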

$P(y=0)+P(y=1)=1$

In the code below, f_wb() implements the logistic regression model.

def f_wb(x, w, b):
    """
    Logistic regression model

    Parameters
    ----------
    x: (ndarray (m,)) or (scalar)
        m is the number of samples
    w: (scalar)
    b: (scalar)

    Returns
    -------
    (ndarray (m,)) or (scalar)
    """

    return sigmoid(w * x + b)

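Decision Boundary

As mentioned above, the output of logistic regression lies between 0 and 1 and represents the probability that y is 1. We therefore need to set a threshold. When f_w,b is greater than or equal to the threshold, ŷ is 1; when f_w,b is less than the threshold, ŷ is 0.

When z is greater than or equal to 0, f_w,b, that is g(z), is greater than or equal to the threshold 0.5. So to the left of z = 0, ŷ is 0, and to the right, ŷ is 1. This dividing line is called the decision boundary.

In the example below, we can compute the decision boundary by solving z = 0.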
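
For a single feature, setting z = wx + b = 0 gives the decision boundary x = -b / w. The following is a minimal sketch of that calculation. It assumes the rounded parameters w = 0.62 and b = -5.42, which are close to the values obtained in the training example later in this article; `decision_boundary` is a hypothetical helper name.

```python
import numpy as np


def decision_boundary(w, b):
    # Solve w * x + b = 0 for x (requires w != 0).
    return -b / w


w, b = 0.62, -5.42  # assumed, rounded parameters
x_boundary = decision_boundary(w, b)

# With w > 0, inputs at or above the boundary get y_hat = 1.
x = np.array([5.0, 8.0, 10.0, 15.0])
y_hat = (w * x + b >= 0).astype(int)
print(x_boundary, y_hat)
```

Comparing z with 0 is equivalent to comparing g(z) with the threshold 0.5, so the sigmoid does not need to be evaluated to assign a label.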

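Cost Function

Likelihood Function and Maximum Likelihood Estimation

A likelihood function describes how probable the observed data are under given parameter values. Suppose we toss a coin 4 times with the results below, where the probability of landing front on each toss is $\theta$.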

x_i	Result	Probability
x₁	Front	$\theta$
x₂	Front	$\theta$
x₃	Front	$\theta$
x₄	Back	$1-\theta$

From the results above, the likelihood function $L(\theta)$ is as follows. $p(x_i|\theta)$ is a conditional probability: the probability of observing x_i given $\theta$.

$L(\theta)=\displaystyle\prod_{i=1}^{4}p(x_i|\theta) \\\\ \hphantom{L(\theta)}=p(x_1|\theta) \cdot p(x_2|\theta) \cdot p(x_3|\theta) \cdot p(x_4|\theta) \\\\ \hphantom{L(\theta)}=\theta \cdot \theta \cdot \theta \cdot (1-\theta) \\\\ \hphantom{L(\theta)}=\theta^3 \cdot (1-\theta)$

Suppose the probability of front $\theta$ is 0.6; then the likelihood function $L(\theta)$ is 0.0864.

$L(\theta)=0.6^3 \cdot 0.4=0.0864$

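Once we have the likelihood function, we differentiate it and set the derivative to 0 to find the point where it reaches its maximum. This gives $\theta=\frac{3}{4}$ as the probability of front (the other root, $\theta=0$, gives $L(\theta)=0$ and is therefore not the maximum). This procedure is maximum likelihood estimation.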

$\frac{d}{d\theta}L(\theta) =0 \\\\ \Rightarrow \frac{d}{d\theta}(\theta^3 \cdot (1-\theta))=0 \\\\ \Rightarrow \frac{d}{d\theta}(\theta^3-\theta^4)=0 \\\\ \Rightarrow 3\theta^2-4\theta^3=0 \\\\ \Rightarrow \theta=\frac{3}{4}$
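
The calculus result can be cross-checked numerically. The sketch below is not the derivation used above; it swaps in a simple grid search over $\theta$ and picks the value with the largest likelihood. The function name `likelihood` and the grid resolution are assumptions made for this illustration.

```python
import numpy as np


def likelihood(theta):
    # L(theta) = theta^3 * (1 - theta) for 3 fronts and 1 back.
    return theta**3 * (1 - theta)


# Evaluate L(theta) on a fine grid over [0, 1] and pick the maximizer.
thetas = np.linspace(0.0, 1.0, 1001)
theta_hat = thetas[np.argmax(likelihood(thetas))]
print(theta_hat, likelihood(theta_hat))
```

The grid maximizer should agree with $\theta=\frac{3}{4}$ up to the grid spacing.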

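Cost Function and Loss Function

The cost function measures how well f_w,b fits the training data. With a cost function, we can tell whether the adjusted w and b are better than the previous ones. So which cost function should we use to evaluate f_w,b?

Returning to the coin-toss example, the probability of each toss is as follows: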

$p(x^{(i)}|\theta)=\begin{cases} \theta & \text{if } y^{(i)}=1 \\ 1-\theta & \text{if } y^{(i)}=0 \end{cases} \\\\ \text{0: back, 1: front}$

For convenience, we can merge the two cases of $p(x^{(i)}|\theta)$ into a single expression:

$p(x^{(i)}|\theta)=\theta^{(y^{(i)})}\cdot(1-\theta)^{(1-y^{(i)})}$

The likelihood function for n tosses is as follows. Following maximum likelihood estimation, we look for the parameter that maximizes the likelihood function, and we can build the cost function from it.

$L(\theta)=\displaystyle\prod_{i=1}^{n}p(x_i|\theta)$

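Because the expression is a product, which is inconvenient to work with, we first simplify it by taking the logarithm of both sides.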

$L(\theta)=\displaystyle\prod_{i=1}^{n}p(x_i|\theta) \\\\ \Rightarrow L(\theta)=p(x_1|\theta) \cdot p(x_2|\theta) \cdots p(x_n|\theta) \\\\ \Rightarrow \ln L(\theta)=\ln p(x_1|\theta) + \ln p(x_2|\theta) +\cdots+ \ln p(x_n|\theta) \\\\ \Rightarrow \ln L(\theta)=\displaystyle\sum_{i=1}^{n} \ln p(x_i|\theta)$

The term $\ln p(x_i|\theta)$ expands as follows.

$\ln p(x_i|\theta)=\ln (\theta^{(y^{(i)})}\cdot(1-\theta)^{(1-y^{(i)})}) \\\\ \hphantom{\ln p(x_i|\theta)}=y^{(i)}\ln\theta+(1-y^{(i)})\ln(1-\theta)$

Finally, the likelihood function becomes the log-likelihood function below.

$\ln L(\theta)=\displaystyle\sum_{i=1}^{n} \ln p(x_i|\theta) \\\\ \hphantom{\ln L(\theta)}=\displaystyle\sum_{i=1}^{n}y^{(i)}\ln\theta+(1-y^{(i)})\ln(1-\theta)$

Maximum likelihood estimation requires maximizing the likelihood function. By convention, however, a cost function is minimized. We therefore multiply the log-likelihood function by -1, which turns the problem into a minimization.

$-\ln L(\theta)=\displaystyle\sum_{i=1}^{n} -y^{(i)}\ln\theta -(1-y^{(i)})\ln(1-\theta)$
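
To illustrate that minimizing $-\ln L(\theta)$ selects the same $\theta$ as maximizing $L(\theta)$, the sketch below evaluates the negative log-likelihood for the earlier coin example (three fronts, one back). The name `negative_log_likelihood` and the grid of candidate values are assumptions for this illustration.

```python
import numpy as np


def negative_log_likelihood(theta, y):
    # -ln L(theta) = sum_i [ -y_i ln(theta) - (1 - y_i) ln(1 - theta) ]
    return np.sum(-y * np.log(theta) - (1 - y) * np.log(1 - theta))


y = np.array([1, 1, 1, 0])  # 1: front, 0: back
# Candidate values strictly inside (0, 1) so the logarithms stay finite.
thetas = np.linspace(0.01, 0.99, 99)
nll = np.array([negative_log_likelihood(t, y) for t in thetas])
theta_hat = thetas[np.argmin(nll)]
print(theta_hat)
```

Because the logarithm is monotonically increasing, the minimizer of the negative log-likelihood coincides with the maximizer of the likelihood.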

In logistic regression, the probability $\theta$ is computed by f_w,b, so substituting f_w,b for $\theta$ in $p(x_i|\theta)$ yields the loss function of logistic regression.

$loss(f_{w,b}(x^{(i)}),y^{(i)})=-y^{(i)}\log(f_{w,b}(x^{(i)}))-(1-y^{(i)})\log(1-f_{w,b}(x^{(i)}))$

Finally, the cost function of logistic regression is as follows:

$J(w,b)=\frac{1}{m}\displaystyle\sum^{m}_{i=1}loss(f_{w,b}(x^{(i)}),y^{(i)}) \\\\ \hphantom{J(w,b)}=\frac{1}{m}\displaystyle\sum^{m}_{i=1}\left[-y^{(i)}\log(f_{w,b}(x^{(i)}))-(1-y^{(i)})\log(1-f_{w,b}(x^{(i)}))\right] \\\\ m:\text{number of examples} \\\\ \text{Objective}:minimize_{w,b}J(w,b)$

In the code below, compute_cost() implements the cost function J(w,b).

def compute_cost(x, y, w, b):
    """
    Compute cost

    Parameters
    ----------
    x: (ndarray (m,))
        m is the number of samples
    y: (ndarray (m,))
    w: (scalar)
    b: (scalar)

    Returns
    -------
    (scalar)
    """

    m = x.shape[0]
    y_hat = f_wb(x, w, b)
    cost = 1 / m * np.sum(-y * np.log(y_hat) - (1 - y) * np.log(1 - y_hat))
    return cost
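
One simple sanity check for the cost function: with w = 0 and b = 0 the model predicts 0.5 for every example, so the cost is ln 2 ≈ 0.693 regardless of the labels. The sketch below restates the functions so that it runs on its own; the toy data and the second parameter pair (w = 1.0, b = -6.5) are made up for this check.

```python
import numpy as np


def sigmoid(z):
    return 1 / (1 + np.exp(-z))


def compute_cost(x, y, w, b):
    # Mean cross-entropy loss, as in J(w, b) above.
    m = x.shape[0]
    y_hat = sigmoid(w * x + b)
    return 1 / m * np.sum(-y * np.log(y_hat) - (1 - y) * np.log(1 - y_hat))


# Toy data, made up for this check.
x = np.array([1.0, 4.0, 9.0, 12.0])
y = np.array([0, 0, 1, 1])

cost_zero = compute_cost(x, y, 0.0, 0.0)      # every prediction is 0.5
cost_better = compute_cost(x, y, 1.0, -6.5)   # boundary at x = 6.5 separates the classes
print(cost_zero, cost_better)
```

A parameter pair that separates the two classes gives a lower cost than the all-zero starting point, which is the behavior gradient descent relies on.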

Gradient Descent

Gradient descent is an optimization algorithm used to find a local minimum of a function. Readers may first refer to the linear regression article below for an introduction to gradient descent.

Linear Regression, by Wayne, 08/12/2024

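The gradient descent algorithm for logistic regression is as follows. First, pick an initial w and b at random, or simply set them to zero. At each step, w and b are moved by the partial derivative of the cost function multiplied by a learning rate $\alpha$. We also fix the number of iterations for which gradient descent runs, so we may not end up with the optimal w and b.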

$\text{repeat until convergence } \{ \\\\ \phantom{xxxx} w=w-\alpha \frac{\partial J(w,b)}{\partial w} \\\\ \phantom{xxxx} b=b-\alpha \frac{\partial J(w,b)}{\partial b} \\\\ \}$

The partial derivatives of J(w, b) with respect to w and b are computed as follows:

$\frac{\partial}{\partial w}J(w,b)=\frac{1}{m} \displaystyle\sum^{m}_{i=1} (f_{w,b}(x^{(i)})-y^{(i)})x^{(i)} \\\\ \frac{\partial}{\partial b}J(w,b)=\frac{1}{m} \displaystyle\sum^{m}_{i=1} (f_{w,b}(x^{(i)})-y^{(i)})$

The derivatives of the cost functions of logistic regression and linear regression look identical, but note that their f_w,b differ.

$\text{Linear regression: } f_{w,b}(x^{(i)})=wx^{(i)}+b \\ \text{Logistic regression: } f_{w,b}(x^{(i)})=\frac{1}{1+e^{-(wx^{(i)}+b)}}$

The code below computes the partial derivatives of J(w,b) with respect to w and b.

def compute_gradient(x, y, w, b):
    """
    Compute the gradient for logistic regression

    Parameters
    ----------
    x: (ndarray (m,))
        m is the number of samples
    y: (ndarray (m,))
    w: (scalar)
    b: (scalar)

    Returns
    -------
    dj_dw: (scalar)
    dj_db: (scalar)
    """

    m = x.shape[0]
    dj_dw = 1 / m * np.sum((f_wb(x, w, b) - y) * x)
    dj_db = 1 / m * np.sum(f_wb(x, w, b) - y)
    return dj_dw, dj_db
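
A common way to gain confidence in analytic derivatives is to compare them with finite differences of the cost. The sketch below is a self-contained version of that check; the toy data, the evaluation point (w = 0.3, b = -1.0), and the step `eps` are assumptions chosen for illustration.

```python
import numpy as np


def sigmoid(z):
    return 1 / (1 + np.exp(-z))


def cost(x, y, w, b):
    y_hat = sigmoid(w * x + b)
    return np.mean(-y * np.log(y_hat) - (1 - y) * np.log(1 - y_hat))


def gradient(x, y, w, b):
    # Analytic partial derivatives from the formulas above.
    err = sigmoid(w * x + b) - y
    return np.mean(err * x), np.mean(err)


x = np.array([1.0, 4.0, 9.0, 12.0])  # toy data, made up for this check
y = np.array([0, 0, 1, 1])
w, b, eps = 0.3, -1.0, 1e-6

dj_dw, dj_db = gradient(x, y, w, b)
# Central differences approximate the same derivatives numerically.
num_dw = (cost(x, y, w + eps, b) - cost(x, y, w - eps, b)) / (2 * eps)
num_db = (cost(x, y, w, b + eps) - cost(x, y, w, b - eps)) / (2 * eps)
print(dj_dw, num_dw)
print(dj_db, num_db)
```

If the analytic and numerical values disagree beyond a small tolerance, the gradient code or the cost code likely contains a mistake.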

The code below implements gradient descent. The parameter alpha is the learning rate, and epochs is the number of iterations.

def perform_gradient_descent(x, y, w_init, b_init, alpha, epochs):
    """
    Perform gradient descent

    Parameters
    ----------
    x: (ndarray (m,))
        m is the number of samples
    y: (ndarray (m,))
    w_init: (scalar)
    b_init: (scalar)
    alpha: (float)
    epochs: (int)

    Returns
    -------
    w: (scalar)
    b: (scalar)
    """

    w = w_init
    b = b_init
    for i in range(epochs):
        dj_dw, dj_db = compute_gradient(x, y, w, b)
        w -= alpha * dj_dw
        b -= alpha * dj_db
    return w, b

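Example

Next, we use an example to show how to use logistic regression. The example below has 20 values; those less than 10 are labeled 0, and those greater than or equal to 10 are labeled 1.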

import matplotlib.pyplot as plt

x_train = np.array([1, 0, 17, 8, 13, 19, 15, 10, 8, 7, 3, 6, 17, 3, 4, 17, 11, 12, 16, 13])
y_train = np.int_(x_train[:] >= 10)

# x_train is one-dimensional, so the vertical axis shows the label y_train.
plt.scatter(x_train[y_train == 0], y_train[y_train == 0], 60, marker='^', c='c', label='y_train == 0')
plt.scatter(x_train[y_train == 1], y_train[y_train == 1], 60, marker='o', c='m', label='y_train == 1')
plt.xlabel('x_train')
plt.ylabel('y_train')
plt.legend()

In the code below, we set the learning rate to 0.01 and the number of iterations to 10000.

w, b = perform_gradient_descent(x_train, y_train, 0, 0, 0.01, 10000)
w, b

# Output
(np.float64(0.6199251592044446), np.float64(-5.42258186823073))
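
The run above reports only the final parameters. As a sketch of how one might monitor convergence, the variant below records the cost every 1,000 iterations on the same data. `train_with_history`, `history`, and the `every` parameter are hypothetical names introduced for this illustration, not part of the original code.

```python
import numpy as np


def sigmoid(z):
    return 1 / (1 + np.exp(-z))


def compute_cost(x, y, w, b):
    y_hat = sigmoid(w * x + b)
    return np.mean(-y * np.log(y_hat) - (1 - y) * np.log(1 - y_hat))


def train_with_history(x, y, w, b, alpha, epochs, every=1000):
    # Same update rule as perform_gradient_descent(), plus cost logging.
    history = [compute_cost(x, y, w, b)]
    for i in range(epochs):
        err = sigmoid(w * x + b) - y
        w -= alpha * np.mean(err * x)
        b -= alpha * np.mean(err)
        if (i + 1) % every == 0:
            history.append(compute_cost(x, y, w, b))
    return w, b, history


x_train = np.array([1, 0, 17, 8, 13, 19, 15, 10, 8, 7, 3, 6, 17, 3, 4, 17, 11, 12, 16, 13])
y_train = (x_train >= 10).astype(int)
w, b, history = train_with_history(x_train, y_train, 0.0, 0.0, 0.01, 10000)
print(w, b)
print([round(float(c), 3) for c in history])
```

The first entry of `history` is the cost at w = 0 and b = 0, and later entries show how far training has reduced it.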

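Next, we use the computed w and b to predict x_train. The resulting prediction is a series of probabilities. We set the threshold to 0.5, so probabilities less than 0.5 are labeled 0, and probabilities greater than or equal to 0.5 are labeled 1.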

prediction = f_wb(x_train, w, b)
y_hat = np.int_(prediction >= 0.5)
(y_train, prediction, y_hat)
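
Beyond inspecting the arrays by eye, the predictions can be summarized as a training accuracy. The sketch below is self-contained and copies w and b from the training output above; the variable name `accuracy` and the use of training accuracy as a summary are choices made for this illustration.

```python
import numpy as np

x_train = np.array([1, 0, 17, 8, 13, 19, 15, 10, 8, 7, 3, 6, 17, 3, 4, 17, 11, 12, 16, 13])
y_train = (x_train >= 10).astype(int)

# Parameters copied from the training output above.
w, b = 0.6199251592044446, -5.42258186823073

prediction = 1 / (1 + np.exp(-(w * x_train + b)))
y_hat = (prediction >= 0.5).astype(int)

# Fraction of training examples whose predicted label matches the true label.
accuracy = np.mean(y_hat == y_train)
# The learned boundary x = -b / w, where the predicted probability is 0.5.
print(accuracy, -b / w)
```

The learned boundary falls between 8 and 10, the two training values closest to the true cut-off at 10.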

Plotting y_hat on the coordinate plane lets us compare it with the y_train plot above.

plt.scatter(x_train[y_hat == 0], y_train[y_hat == 0], 60, marker='^', c='c', label='y_hat == 0')
plt.scatter(x_train[y_hat == 1], y_train[y_hat == 1], 60, marker='o', c='m', label='y_hat == 1')
plt.xlabel('x_train')
plt.ylabel('y_train')
plt.legend()

y_hat with threshold 0.5, 0: < 0.5, 1: >= 0.5.

Multiple Logistic Regression

So far we have covered simple logistic regression, which has only one variable. Next, we introduce multiple logistic regression.

Multiple Logistic Regression Model

The model function of multiple logistic regression is as follows.

$f_{w,b}(x^{(i)})=g(z^{(i)})=\frac{1}{1+e^{-z^{(i)}}} \\\\ z^{(i)}=w_1x_1^{(i)}+w_2x_2^{(i)}+\cdots+w_nx_n^{(i)}+b$

The vectorized model function is as follows, where $\vec{x}$ and $\vec{w}$ are vectors.

$f_{\vec{w},b}(\vec{x}^{(i)})=g(z^{(i)})=\frac{1}{1+e^{-z^{(i)}}} \\\\ z^{(i)}=\displaystyle\sum_{j=1}^{n}w_jx_j^{(i)}+b=\vec{w}\cdot\vec{x}^{(i)}+b \\\\ \vec{w}=[w_1,w_2,\cdots,w_n] \\\\ \vec{x}^{(i)}=[x_1^{(i)},x_2^{(i)},\cdots,x_n^{(i)}]$

If we stack all the training examples together, they form a matrix. We usually use an uppercase X to denote the matrix formed by the training examples. The model function can then be written as follows:

$f_{w,b}(X)=g(Z)=g(X\cdot w+b)=\frac{1}{1+e^{-(X\cdot w+b)}} \\\\ X=\begin{bmatrix} x_1^{(1)} & x_2^{(1)} & \cdots & x_{n}^{(1)} \\ x_1^{(2)} & x_2^{(2)} & \cdots & x_{n}^{(2)} \\ \vdots & \vdots & \ddots & \vdots \\ x_1^{(m)} & x_2^{(m)} & \cdots & x_{n}^{(m)} \\ \end{bmatrix}, w=\begin{bmatrix} w_1 \\ w_2 \\ \vdots \\ w_{n} \end{bmatrix}, b \text{ is a scalar} \\\\ y=\begin{bmatrix} y^{(1)} \\ y^{(2)} \\ \vdots \\ y^{(m)} \end{bmatrix} \\\\ m: \text{number of examples} \\\\ n: \text{number of features in each example} \\\\ x_j^{(i)}: j \text{-th feature } \text{ in } i \text{-th example}$

The vectorized f_w,b closely resembles the model function of simple logistic regression, and their implementations are nearly the same. The difference is that the vectorized f_w,b uses the dot product to multiply X and w.

def sigmoid(z):
    """
    Sigmoid function

    Parameters
    ----------
    z: (ndarray (m,)) or (scalar)
        m is the number of samples

    Returns
    -------
    (ndarray (m,)) or (scalar)
    """

    return 1 / (1 + np.exp(-z))


def f_wb(X, w, b):
    """
    Logistic regression model

    Parameters
    ----------
    X: (ndarray (m, n))
        m is the number of samples, n is the number of features
    w: (ndarray (n,))
    b: (scalar)

    Returns
    -------
    (ndarray (m,))
    """

    return sigmoid(np.dot(X, w) + b)
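
To make the vectorization concrete, the sketch below compares `np.dot(X, w) + b` with an explicit loop over examples and features. The small matrix `X`, the weights `w`, and the bias `b` are illustrative values, not taken from the article's training run.

```python
import numpy as np


def sigmoid(z):
    return 1 / (1 + np.exp(-z))


X = np.array([[10, 10], [10, 16], [20, 8]], dtype=float)  # shape (m, n) = (3, 2)
w = np.array([0.1, -0.2])  # illustrative weights, shape (n,)
b = 0.5                    # illustrative bias

# Vectorized: one matrix-vector product covers all examples.
vectorized = sigmoid(np.dot(X, w) + b)

# Loop: z^(i) = w_1 x_1^(i) + ... + w_n x_n^(i) + b, one example at a time.
looped = np.array([
    sigmoid(sum(w[j] * X[i, j] for j in range(X.shape[1])) + b)
    for i in range(X.shape[0])
])
print(vectorized.shape, np.allclose(vectorized, looped))
```

Both forms produce one probability per example; the vectorized form simply delegates the inner sums to NumPy.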

Cost Function

The cost function of multiple logistic regression is as follows.

$J(\vec{w},b)=\frac{1}{m}\displaystyle\sum^{m}_{i=1}-y^{(i)}\log(f_{\vec{w},b}(\vec{x}^{(i)}))-(1-y^{(i)})\log(1-f_{\vec{w},b}(\vec{x}^{(i)})) \\\\ f_{\vec{w},b}(\vec{x}^{(i)})=g(\vec{w}\cdot \vec{x}^{(i)}+b)=\frac{1}{1+e^{-(\vec{w}\cdot \vec{x}^{(i)}+b)}}$

In the code below, compute_cost() implements the vectorized cost function.

def compute_cost(X, y, w, b):
    """
    Compute cost

    Parameters
    ----------
    X: (ndarray (m, n))
        m is the number of samples, n is the number of features
    y: (ndarray (m,))
    w: (ndarray (n,))
    b: (scalar)

    Returns
    -------
    (scalar)
    """

    m = X.shape[0]
    y_hat = f_wb(X, w, b)
    cost = 1 / m * np.sum(-y * np.log(y_hat) - (1 - y) * np.log(1 - y_hat))
    return cost

Gradient Descent

Gradient descent for multiple logistic regression is as follows. The equations show that there is one parameter w_j per feature, so we compute the partial derivative of J(w,b) with respect to each w_j.

$\text{repeat until convergence: \{} \\ \phantom{xxxx} w_j=w_j-\alpha\frac{\partial J(w,b)}{\partial w_j}, \text{ for } j=1\dots n \\ \phantom{xxxx} b=b-\alpha\frac{\partial J(w,b)}{\partial b} \\ \}$

The partial derivatives of J(w,b) with respect to each w_j and b are as follows:

$\frac{\partial J(w,b)}{\partial w_j}=\frac{1}{m} \displaystyle\sum^{m}_{i=1} (f_{w,b}(x^{(i)})-y^{(i)})x^{(i)}_j \\\\ \frac{\partial J(w,b)}{\partial b}=\frac{1}{m} \displaystyle\sum^{m}_{i=1} (f_{w,b} (x^{(i)})-y^{(i)})$

The code below computes the partial derivatives of J(w,b) with respect to w_j and b.

def compute_gradient(X, y, w, b):
    """
    Compute the gradient for logistic regression

    Parameters
    ----------
    X: (ndarray (m, n))
        m is the number of samples, n is the number of features
    y: (ndarray (m,))
    w: (ndarray (n,))
    b: (scalar)

    Returns
    -------
    dj_dw: (ndarray (n,))
    dj_db: (scalar)
    """

    m = X.shape[0]
    y_hat = f_wb(X, w, b)
    dj_dw = 1 / m * np.dot(X.T, y_hat - y)
    dj_db = 1 / m * np.sum(y_hat - y)
    return dj_dw, dj_db
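
The expression `np.dot(X.T, y_hat - y)` packs the per-feature sums from the formula into one product. The sketch below checks that against an explicit double loop; the four data points are a subset of the shape used later in this article, and the parameter values are illustrative assumptions.

```python
import numpy as np


def sigmoid(z):
    return 1 / (1 + np.exp(-z))


X = np.array([[10, 10], [10, 16], [20, 8], [10, 24]], dtype=float)
y = np.array([0, 0, 0, 1])
w = np.array([0.01, 0.02])  # illustrative parameters
b = -0.5
m, n = X.shape

err = sigmoid(X @ w + b) - y  # f(x^(i)) - y^(i) for every example

# Vectorized form used in compute_gradient(): (1/m) * X^T (f - y).
dj_dw_vec = X.T @ err / m

# Explicit double loop following the summation formula for each w_j.
dj_dw_loop = np.zeros(n)
for j in range(n):
    for i in range(m):
        dj_dw_loop[j] += err[i] * X[i, j]
dj_dw_loop /= m
print(dj_dw_vec, dj_dw_loop)
```

Row j of `X.T` holds feature j across all examples, so its dot product with the error vector is exactly the sum over i in the formula.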

The code below implements gradient descent.

def perform_gradient_descent(X, y, w_init, b_init, alpha, epochs):
    """
    Perform gradient descent

    Parameters
    ----------
    X: (ndarray (m, n))
        m is the number of samples, n is the number of features
    y: (ndarray (m,))
    w_init: (ndarray (n,))
    b_init: (scalar)
    alpha: (float)
    epochs: (int)

    Returns
    -------
    w: (ndarray (n,))
    b: (scalar)
    """

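    w = np.array(w_init, dtype=float)  # copy, so the caller's w_init is not modified in place
    b = b_init
    for i in range(epochs):
        dj_dw, dj_db = compute_gradient(X, y, w, b)
        w -= alpha * dj_dw
        b -= alpha * dj_db
        if i % 1000 == 0:  # log every 1000 epochs
            print(f'Epoch {i + 1:4}, Cost: {float(compute_cost(X, y, w, b)):8.2f}')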
    return w, b

Example

In the example below, there are 6 points on the coordinate plane, divided into two groups.

X_train = np.array([[10, 10], [10, 16], [20, 8], [10, 24], [20, 16], [30, 30]])
y_train = np.array([0, 0, 0, 1, 1, 1])

plt.scatter(X_train[y_train == 0, 0], X_train[y_train == 0, 1], 60, marker='^', c='c', label='y_train == 0')
plt.scatter(X_train[y_train == 1, 0], X_train[y_train == 1, 1], 60, marker='o', c='m', label='y_train == 1')
plt.xlabel('x_train[:,0]')
plt.ylabel('x_train[:,1]')
plt.legend()

We use gradient descent to train the model and find w and b.

np.random.seed(1)
initial_w = 0.01 * (np.random.rand(2) - 0.5)
initial_b = -8
w, b = perform_gradient_descent(X_train, y_train, initial_w, initial_b, 0.001, 10000)
w, b

# Output
Epoch    1, Cost:     3.75
Epoch 1001, Cost:     0.17
Epoch 2001, Cost:     0.17
Epoch 3001, Cost:     0.17
Epoch 4001, Cost:     0.17
Epoch 5001, Cost:     0.17
Epoch 6001, Cost:     0.17
Epoch 7001, Cost:     0.17
Epoch 8001, Cost:     0.17
Epoch 9001, Cost:     0.17
(array([0.17254296, 0.36047453]), np.float64(-8.219025094555942))

With w and b computed, we can plot the decision boundary.

plt.scatter(X_train[y_train == 0, 0], X_train[y_train == 0, 1], 60, marker='^', c='c', label='y_train == 0')
plt.scatter(X_train[y_train == 1, 0], X_train[y_train == 1, 1], 60, marker='o', c='m', label='y_train == 1')
plot_x = np.array([min(X_train[:, 0]), max(X_train[:, 0])])
plot_y = (-1. / w[1]) * (w[0] * plot_x + b)
plt.plot(plot_x, plot_y, c='g')
plt.xlabel('x_train[:,0]')
plt.ylabel('x_train[:,1]')
plt.legend()

A decision boundary separates points into 2 groups.
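
The expression used for `plot_y` in the plotting code follows from setting z = 0 in the two-feature model and solving for the second feature. A short derivation, writing the horizontal axis as $x_1$ and the vertical axis as $x_2$ and assuming $w_2 \neq 0$:

```latex
\begin{aligned}
z = w_1 x_1 + w_2 x_2 + b &= 0 \\
w_2 x_2 &= -(w_1 x_1 + b) \\
x_2 &= -\frac{1}{w_2}\,(w_1 x_1 + b)
\end{aligned}
```

Points above this line have z > 0 when $w_2 > 0$ and are therefore assigned ŷ = 1.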

Finally, we add a new point (25, 25) and use the model to classify it.

plt.scatter(X_train[y_train == 0, 0], X_train[y_train == 0, 1], 60, marker='^', c='c', label='y_train == 0')
plt.scatter(X_train[y_train == 1, 0], X_train[y_train == 1, 1], 60, marker='o', c='m', label='y_train == 1')
plot_x = np.array([min(X_train[:, 0]), max(X_train[:, 0])])
plot_y = (-1. / w[1]) * (w[0] * plot_x + b)
plt.plot(plot_x, plot_y, c='g')

x = np.array([25, 25])
y_hat = f_wb(x, w, b)
plt.scatter(x[0], x[1], 60, marker='*', c='m' if y_hat >= 0.5 else 'c', label=f'Prediction: {y_hat:.2f}')

plt.xlabel('x_train[:,0]')
plt.ylabel('x_train[:,1]')
plt.legend()
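
The same classification can be done without plotting. The sketch below copies w and b from the training output above and computes z, the probability, and the label for the new point; the variable names `x_new`, `probability`, and `label` are introduced here for illustration.

```python
import numpy as np

# Parameters copied from the training output above.
w = np.array([0.17254296, 0.36047453])
b = -8.219025094555942

x_new = np.array([25, 25])
z = np.dot(w, x_new) + b             # linear part of the model
probability = 1 / (1 + np.exp(-z))   # sigmoid turns z into P(y = 1)
label = int(probability >= 0.5)      # threshold 0.5
print(f'z = {z:.2f}, P(y=1) = {probability:.4f}, label = {label}')
```

Since z is well above 0, the point lies on the ŷ = 1 side of the decision boundary.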

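Conclusion

The logistic regression model closely resembles linear regression. It uses the linear regression model function to compute a value, then uses the sigmoid function to convert that value into a probability. Although the concept of logistic regression is quite simple, it is widely used in practice.

References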

Andrew Ng, Machine Learning Specialization, Coursera.
西内啓 (Hiromu Nishiuchi), 機器學習的數學基礎：AI、深度學習打底必讀, 旗標 (Flag).
