Logistic Regression

Logistic regression is a data analysis technique used for classification. It is probably the simplest model among classification algorithms, which makes it well suited as a first classification algorithm for beginners.

Sigmoid Function

Before introducing logistic regression, let's first look at the sigmoid function (also known as the logistic function). Its formula is as follows, and its output is always between 0 and 1.

g(z)=\frac{1}{1+e^{-z}}, 0<g(z)<1

As shown in the figure below, the two tails of the sigmoid function approach 0 and 1.

Sigmoid function, outputs between 0 and 1.
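
The following is a minimal NumPy sketch of the sigmoid function; the function name and the sample inputs are our own choices, for illustration only.

import numpy as np

def sigmoid(z):
    # Sigmoid (logistic) function: maps any real number into (0, 1).
    return 1 / (1 + np.exp(-z))

print(sigmoid(0))                        # 0.5, exactly at the midpoint
print(sigmoid(np.array([-10.0, 10.0])))  # very close to 0 and 1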

Logistic Regression Model

The logistic regression model first passes x through a linear regression model to obtain z, and then passes z through the sigmoid function to obtain a value between 0 and 1. The formula of the logistic regression model is as follows.

\text{Linear regression model: } z=\vec{w} \cdot \vec{x}+b \\\\ \text{Sigmoid function: } g(z)=\frac{1}{1+e^{-z}} \\\\ \text{Logistic regression model: } f_{\vec{w},b}(\vec{x})=g(\vec{w} \cdot \vec{x}+b)=\frac{1}{1+e^{-(\vec{w} \cdot \vec{x}+b)}}

Therefore, the output value of the logistic regression model is always between 0 and 1. Suppose x is the size of a tumor: y = 0 means the tumor is not malignant, and y = 1 means it is malignant. If we pass x into a logistic regression model and obtain 0.7, this means there is a 70% chance that y is 1.

x \text{ is tumor size} \\\\ y \text{ is 0 (not malignant)} \\\\ y \text{ is 1 (malignant)} \\\\ f_{\vec{w},b}(\vec{x})=0.7 \\\\ \text{70\% chance y is 1}

Because the probability that y is 0 and the probability that y is 1 sum to 1, a 70% probability that y is 1 means there is a 30% probability that y is 0.

P(y=0)+P(y=1)=1
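
As a sketch of the model in code, here is the tumor example with made-up parameters w and b chosen so that the output is roughly 0.7; they are not from any trained model.

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def predict_proba(x, w, b):
    # Logistic regression model: f_{w,b}(x) = g(w . x + b).
    z = np.dot(w, x) + b   # linear regression part
    return sigmoid(z)      # squash into (0, 1)

# Hypothetical one-feature example (tumor size); parameters are made up.
w = np.array([0.8])
b = -2.0
x = np.array([3.56])
print(predict_proba(x, w, b))  # approximately 0.70, i.e. a 70% chance y is 1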

Decision Boundary

As mentioned above, the output value of logistic regression is between 0 and 1 and represents the probability that y is 1. Therefore, we need to set a threshold. When fw,b is greater than or equal to the threshold, ŷ is 1; conversely, when fw,b is less than the threshold, ŷ is 0.

When the threshold is 0.5.

Looking carefully, when z is greater than or equal to 0, fw,b, that is, g(z), is greater than or equal to the threshold 0.5. So, the left side of z = 0 is ŷ = 0, and the right side of z = 0 is ŷ = 1. This boundary is called the decision boundary.

Decision boundary: z=0.

In the example below, we can calculate the decision boundary by setting z = 0.

Example of decision boundary.
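
Continuing the sketch above with a 0.5 threshold: with the made-up parameters w = 0.8 and b = -2.0, the boundary z = 0 falls at x = -b/w = 2.5.

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def predict(X, w, b, threshold=0.5):
    # y-hat is 1 where f_{w,b}(x) >= threshold, otherwise 0.
    probs = sigmoid(X @ w + b)
    return (probs >= threshold).astype(int)

# Hypothetical one-feature inputs around the boundary at x = 2.5.
w = np.array([0.8])
b = -2.0
X = np.array([[1.0], [2.5], [4.0]])
print(predict(X, w, b))  # [0 1 1]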

Cost Function and Loss Function

The cost function is used to measure the accuracy of fw,b. With the cost function, we can measure whether adjusted values of w and b are better than the original ones.

The cost function of logistic regression is as follows:

J(\vec{w},b)=\frac{1}{m}\sum^{m}_{i=1}L(f_{\vec{w},b}(\vec{x}^{(i)}),y^{(i)})

where L is the loss function, defined as follows:

L(f_{\vec{w},b}(\vec{x}^{(i)}),y^{(i)})= \begin{cases} -log(f_{\vec{w},b}(\vec{x}^{(i)})) & \text{if } y^{(i)}=1 \\ -log(1-f_{\vec{w},b}(\vec{x}^{(i)})) & \text{if } y^{(i)}=0 \end{cases}

We can combine the two expressions in the loss function into one, as follows.

L(f_{\vec{w},b}(\vec{x}^{(i)}),y^{(i)})=-y^{(i)}log(f_{\vec{w},b}(\vec{x}^{(i)})) - (1-y^{(i)})log(1-f_{\vec{w},b}(\vec{x}^{(i)}))

The following is the cost function with the combined loss function substituted in.

J(\vec{w},b)=-\frac{1}{m}\sum^{m}_{i=1} \left[ y^{(i)}log(f_{\vec{w},b}(\vec{x}^{(i)})) + (1-y^{(i)})log(1-f_{\vec{w},b}(\vec{x}^{(i)})) \right]
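
Here is a minimal NumPy sketch of this cost function, vectorized over all m training examples; the variable names and the tiny dataset are our own, for illustration only.

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def compute_cost(X, y, w, b):
    # J(w, b): average loss over the m training examples.
    m = X.shape[0]
    f = sigmoid(X @ w + b)  # f_{w,b}(x^(i)) for every example
    loss = -y * np.log(f) - (1 - y) * np.log(1 - f)
    return loss.sum() / m

# Made-up dataset and parameters, for illustration only.
X = np.array([[1.0], [2.5], [4.0]])
y = np.array([0.0, 1.0, 1.0])
print(compute_cost(X, y, np.array([0.8]), -2.0))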

Gradient Descent

The following is the gradient descent algorithm of logistic regression.

\text{repeat \{ } \\\\ \phantom{xxxx} w_j=w_j-\alpha \frac{\partial}{\partial w_j} J(\vec{w},b) \\\\ \phantom{xxxx} b=b-\alpha \frac{\partial}{\partial b} J(\vec{w},b) \\\\ \}

The derivative of the cost function is calculated as follows:

\frac{\partial}{\partial w_j}J(\vec{w},b)=\frac{1}{m} \sum^{m}_{i=1} (f_{\vec{w},b}(\vec{x}^{(i)})-y^{(i)})x_j^{(i)}

\frac{\partial}{\partial b}J(\vec{w},b)=\frac{1}{m} \sum^{m}_{i=1} (f_{\vec{w},b}(\vec{x}^{(i)})-y^{(i)})

The derivatives of the cost functions of linear regression and logistic regression look the same, but in fact their fw,b are different.

\text{Linear regression: } f_{\vec{w},b}(\vec{x})=\vec{w} \cdot \vec{x} +b \\\\ \text{Logistic regression: } f_{\vec{w},b}(\vec{x})=\frac{1}{1+e^{-(\vec{w} \cdot \vec{x} +b)}}
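
Putting everything together, here is a minimal gradient descent loop implementing the updates above; the learning rate, iteration count, and toy dataset are arbitrary choices for illustration.

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def gradient_descent(X, y, alpha=0.1, iterations=1000):
    # Repeatedly apply the update rules until the iteration budget runs out.
    m, n = X.shape
    w = np.zeros(n)
    b = 0.0
    for _ in range(iterations):
        err = sigmoid(X @ w + b) - y      # f_{w,b}(x^(i)) - y^(i)
        w -= alpha * (X.T @ err) / m      # partial of J with respect to w_j
        b -= alpha * err.sum() / m        # partial of J with respect to b
    return w, b

# Toy one-feature dataset: labels switch from 0 to 1 between x = 2 and x = 3.
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
w, b = gradient_descent(X, y)
print(w, b)  # learned parameters; the boundary is where w . x + b = 0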

Conclusion

The logistic regression model is very similar to linear regression. Therefore, before learning logistic regression, you should learn linear regression first.
