Linear Regression

Photo by Louis Hansel @shotsoflouis on Unsplash
Linear regression is a data analysis technique that uses linear functions to predict unknown data. Although the linear regression model is relatively simple, it is a mature statistical technique.

Simple Linear Regression

Model Function

The model function of linear regression is as follows, where w and b are the parameters. When we pass in the variable x, the function fw,b returns a predicted value ŷ.

\hat{y}^{(i)}=f_{w,b}(x^{(i)})=wx^{(i)}+b \\\\ \text{parameters}: w,b

Note that fw,b returns a predicted value ŷ rather than the actual value y, as shown in the figure below. When the ŷ predicted by fw,b is very close to y, we can say that the accuracy of fw,b is high. We will improve the accuracy of fw,b by adjusting w and b.

Linear regression's model function.
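
To make the model function concrete, the following is a minimal NumPy sketch; x_train, w, and b are made-up values used only for illustration.

```python
import numpy as np

def predict(x, w, b):
    """Simple linear regression model: y_hat = w * x + b."""
    return w * x + b

# Made-up example: one feature, three samples
x_train = np.array([1.0, 2.0, 3.0])
w, b = 2.0, 0.5
print(predict(x_train, w, b))  # [2.5 4.5 6.5]
```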

Cost Function

The cost function is used to measure the accuracy of fw,b. With it, we can tell whether an adjusted pair of w and b is better than the original one.

The cost function of linear regression is the squared error, as follows.

J(w,b)=\frac{1}{2m}\sum_{i=0}^{m-1}(\hat{y}^{(i)}-y^{(i)})^2=\frac{1}{2m}\sum_{i=0}^{m-1}(f_{w,b}(x^{(i)})-y^{(i)})^2 \\\\ \text{Find } w,b: \hat{y}^{(i)} \text{ is close to } y^{(i)} \text{ for all } (x^{(i)},y^{(i)}) \\\\ \text{Objective: } \min_{w,b}J(w,b)

We ultimately want to find a pair of w and b such that J(w,b) will be minimal.

Linear regression's cost function: Squared error.
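
As a minimal sketch, the squared-error cost can be computed in NumPy as follows; x_train, y_train, and the parameter values are made-up illustration data.

```python
import numpy as np

def compute_cost(x, y, w, b):
    """Squared-error cost J(w, b) = (1 / 2m) * sum((f(x) - y)^2)."""
    m = x.shape[0]
    error = (w * x + b) - y
    return np.sum(error ** 2) / (2 * m)

# Made-up data generated from w = 2, b = 1, so the cost at (2, 1) is 0
x_train = np.array([1.0, 2.0, 3.0])
y_train = np.array([3.0, 5.0, 7.0])
print(compute_cost(x_train, y_train, w=2.0, b=1.0))  # 0.0
print(compute_cost(x_train, y_train, w=1.0, b=0.0))  # larger cost for a worse fit
```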

Gradient Descent

Although we have the cost function, we still don't know how to choose a better pair of w and b than the current one. Gradient descent is the algorithm that helps us do exactly that.

Basically, the gradient descent algorithm works as shown below. First, pick a random pair of w and b, which may land on either side of the parabola, and then move to the next pair of w and b in the direction of the valley. Repeat this step until J(w,b) can no longer get smaller.

Finding w, b.

The next question is how to know in which direction the valley of the cost function lies. We can differentiate the cost function J(w,b) to get the current slope. With the slope, we know which way to move. Repeat this step until the slope reaches 0, which means we have reached the valley.

Gradient Descent Algorithm.

The following is the gradient descent algorithm.

\text{repeat until convergence } \{ \\\\ \phantom{xxxx} w=w-\alpha \frac{\partial J(w,b)}{\partial w} \\\\ \phantom{xxxx} b=b-\alpha \frac{\partial J(w,b)}{\partial b} \\\\ \}

where the derivatives of the cost function are calculated as follows.

\frac{\partial J(w,b)}{\partial w}=\frac{1}{m} \sum^{m-1}_{i=0} (f_{w,b}(x^{(i)})-y^{(i)}) x^{(i)} \\\\ \frac{\partial J(w,b)}{\partial b}=\frac{1}{m} \sum^{m-1}_{i=0} (f_{w,b}(x^{(i)})-y^{(i)})
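
Putting the update rule and the derivatives together, a minimal sketch of gradient descent for simple linear regression could look like the following; the data, the starting point w = b = 0, the learning rate, and the iteration count are all assumptions made for illustration.

```python
import numpy as np

def compute_gradients(x, y, w, b):
    """Partial derivatives of J(w, b) with respect to w and b."""
    m = x.shape[0]
    error = (w * x + b) - y
    dj_dw = np.sum(error * x) / m
    dj_db = np.sum(error) / m
    return dj_dw, dj_db

def gradient_descent(x, y, w, b, alpha, num_iters):
    """Repeat the update steps a fixed number of times (an assumed stopping rule)."""
    for _ in range(num_iters):
        dj_dw, dj_db = compute_gradients(x, y, w, b)
        w = w - alpha * dj_dw   # w and b are updated simultaneously,
        b = b - alpha * dj_db   # using gradients from the same iteration
    return w, b

# Made-up data generated from w = 2, b = 1
x_train = np.array([1.0, 2.0, 3.0])
y_train = np.array([3.0, 5.0, 7.0])
w, b = gradient_descent(x_train, y_train, w=0.0, b=0.0, alpha=0.1, num_iters=1000)
print(w, b)  # approximately 2.0 and 1.0
```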

Learning Rate

In gradient descent, there is a learning rate called α. As we can see from the gradient descent algorithm, when α is small, it takes more iterations to get close to the valley, so convergence is slow. When α is larger, we reach the valley more quickly. However, when α is too large, each step may overshoot, and we may never reach the valley.

When the learning rate is too large, we may not reach the minimum.
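
Reusing the gradient_descent and compute_cost sketches from the previous sections (with the same made-up data), we can compare a few learning rates and check the resulting cost; the specific values of α are arbitrary choices for illustration.

```python
# Compare learning rates using the sketches defined above
for alpha in (0.001, 0.1, 1.0):
    w, b = gradient_descent(x_train, y_train, w=0.0, b=0.0, alpha=alpha, num_iters=100)
    print(alpha, compute_cost(x_train, y_train, w, b))
# 0.001 -> cost is still large: too few steps toward the valley
# 0.1   -> cost is close to 0: the valley is reached quickly
# 1.0   -> cost blows up: each step overshoots the valley and diverges
```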

Multiple Linear Regression

Model Function

So far, what we have introduced is simple linear regression, which has only one variable and may not be very practical. Next, we will introduce multiple linear regression.

Multiple linear regression has more than one variable. The following is a model function with four variables.

f_{w,b}(x)=w_1x_1+w_2x_2+w_3x_3+w_4x_4+b

In general, the model function of multiple linear regression is as follows.

f_{w,b}(x)=w_1x_1+w_2x_2+\cdots+w_nx_n+b \\\\ \vec{w}=[w_1,w_2,\cdots,w_n] \\\\ \vec{x}=[x_1,x_2,\cdots,x_n]

The vectorized model function is as follows.

f_{\vec{w},b}(\vec{x})=\vec{w} \cdot \vec{x}+b
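
A minimal NumPy sketch of the vectorized model function, with made-up weights and features:

```python
import numpy as np

def predict(x, w, b):
    """Multiple linear regression model: y_hat = w . x + b."""
    return np.dot(w, x) + b

# Made-up weights and features (n = 4)
w = np.array([0.5, -1.2, 3.0, 0.7])
x = np.array([2.0, 1.0, 0.5, 4.0])
b = 1.5
print(predict(x, w, b))  # 0.5*2 - 1.2*1 + 3*0.5 + 0.7*4 + 1.5 = 5.6
```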

Cost Function

The cost function of multiple linear regression is still the squared error, but now its parameters are w_1 through w_n and b, written as follows.

J(w_1,\cdots,w_n,b)

The vectorized cost function is as follows.

J(\vec{w},b)
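
As a small sketch, the vectorized cost can be computed like this, assuming X is an (m, n) matrix with one training example per row.

```python
import numpy as np

def compute_cost(X, y, w, b):
    """Vectorized squared-error cost J(w, b) for multiple linear regression."""
    m = X.shape[0]
    error = X @ w + b - y   # X has shape (m, n): one example per row
    return np.sum(error ** 2) / (2 * m)
```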

Gradient Descent

The gradient descent of multiple linear regression is as follows.

\text{repeat } \{ \\\\ \phantom{xxxx} w_j=w_j-\alpha \frac{\partial}{\partial w_j} J(\vec{w},b) \text{, for } j=1,\cdots,n \\\\ \phantom{xxxx} b=b-\alpha \frac{\partial}{\partial b} J(\vec{w},b) \\\\ \}

where the derivatives of the cost function are calculated as follows.

\frac{\partial J(\vec{w},b)}{\partial w_j}=\frac{1}{m} \sum^{m-1}_{i=0} (f_{\vec{w},b}(\vec{x}^{(i)})-y^{(i)}) x^{(i)}_j \\\\ \frac{\partial J(\vec{w},b)}{\partial b}=\frac{1}{m} \sum^{m-1}_{i=0} (f_{\vec{w},b} (\vec{x}^{(i)})-y^{(i)})
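
A minimal vectorized sketch of gradient descent for multiple linear regression follows; the dataset, starting parameters, learning rate, and iteration count are made-up values for illustration.

```python
import numpy as np

def compute_gradients(X, y, w, b):
    """Gradients of J(w, b); X has shape (m, n), one example per row."""
    m = X.shape[0]
    error = X @ w + b - y        # shape (m,)
    dj_dw = (X.T @ error) / m    # shape (n,): one partial derivative per w_j
    dj_db = np.sum(error) / m
    return dj_dw, dj_db

def gradient_descent(X, y, w, b, alpha, num_iters):
    """Update every w_j and b simultaneously on each iteration."""
    for _ in range(num_iters):
        dj_dw, dj_db = compute_gradients(X, y, w, b)
        w = w - alpha * dj_dw
        b = b - alpha * dj_db
    return w, b

# Made-up dataset: 3 examples, 2 features, generated from w = [2, 1], b = 1
X_train = np.array([[1.0, 2.0], [2.0, 0.0], [3.0, 1.0]])
y_train = np.array([5.0, 5.0, 8.0])
w, b = gradient_descent(X_train, y_train, np.zeros(2), 0.0, alpha=0.1, num_iters=5000)
print(w, b)  # approximately [2. 1.] and 1.0
```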

Conclusion

Linear regression is a relatively simple model, which makes it great for explaining the mathematics behind machine learning models. Even so, it is still very practical.

