Linear regression is a data analysis technique that uses a linear function to predict unknown values from known data. Although the model is relatively simple, it is a mature and widely used statistical technique.
Simple Linear Regression
Model Function
The model function of linear regression is as follows, where $w$ and $b$ are the parameters. When we pass in the variable $x$, the function $f_{w,b}$ returns a predicted value $\hat{y}$.

$$f_{w,b}(x) = wx + b$$

It should be noted that $f_{w,b}$ returns a predicted value $\hat{y}$ rather than the actual value $y$. When the $\hat{y}$ predicted by $f_{w,b}$ is very close to $y$, we can say that the accuracy of $f_{w,b}$ is high. We will improve the accuracy of $f_{w,b}$ by adjusting $w$ and $b$.
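To make this concrete, here is a minimal sketch of the model function in Python (the function name `predict` and the sample numbers are my own, not from the course):

```python
def predict(x, w, b):
    """Simple linear regression model: returns the prediction y-hat."""
    return w * x + b

# Example: with w = 2 and b = 1, the input x = 3 predicts y-hat = 7.
print(predict(3, w=2, b=1))  # 7
```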
Cost Function
The cost function is used to measure the accuracy of $f_{w,b}$. With it, we can tell whether an adjusted pair of $w$ and $b$ is better than the original one.

The cost function of linear regression is the squared error, as follows, where $m$ is the number of training examples and $(x^{(i)}, y^{(i)})$ is the $i$-th example.

$$J(w,b) = \frac{1}{2m} \sum_{i=1}^{m} \left( f_{w,b}(x^{(i)}) - y^{(i)} \right)^2$$

We ultimately want to find a pair of $w$ and $b$ such that $J(w,b)$ is minimal.
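The following is a minimal NumPy sketch of this cost function (the name `compute_cost` and the toy data are my own, for illustration):

```python
import numpy as np

def compute_cost(x, y, w, b):
    """Squared error cost J(w, b) over the m training examples."""
    m = x.shape[0]
    y_hat = w * x + b                        # predictions for all examples
    return np.sum((y_hat - y) ** 2) / (2 * m)

# Example: data generated by y = 2x + 1, so the cost at w = 2, b = 1 is 0.
x = np.array([1.0, 2.0, 3.0])
y = np.array([3.0, 5.0, 7.0])
print(compute_cost(x, y, w=2.0, b=1.0))  # 0.0
print(compute_cost(x, y, w=0.0, b=0.0))  # 13.8333...
```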
Gradient Descent
Although we have the cost function, we still don't know how to choose a better pair of $w$ and $b$ than the current one. For that, we use gradient descent.

The idea of gradient descent is as follows. First, pick a random pair of $w$ and $b$, which may fall on either side of the parabola-shaped cost curve, and then select the next pair of $w$ and $b$ in the direction of the valley. Repeat this step until $J(w,b)$ cannot get any smaller.

The next question is how to know in which direction the valley of the cost function lies. We can differentiate the cost function $J(w,b)$ to get the current slope. With the slope, we know which way to move. Repeat this step until the slope reaches 0, which means we have reached the valley.

The following is the gradient descent algorithm, where $\alpha$ is the learning rate. Note that $w$ and $b$ are updated simultaneously.

$$\text{repeat until convergence:} \quad w \leftarrow w - \alpha \frac{\partial J(w,b)}{\partial w}, \quad b \leftarrow b - \alpha \frac{\partial J(w,b)}{\partial b}$$

where the derivatives of the cost function are calculated as follows.

$$\frac{\partial J(w,b)}{\partial w} = \frac{1}{m} \sum_{i=1}^{m} \left( f_{w,b}(x^{(i)}) - y^{(i)} \right) x^{(i)}, \qquad \frac{\partial J(w,b)}{\partial b} = \frac{1}{m} \sum_{i=1}^{m} \left( f_{w,b}(x^{(i)}) - y^{(i)} \right)$$
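Putting the update rule and the derivatives together, here is a sketch of the whole algorithm in NumPy (the fixed iteration count stands in for a real convergence test, and the toy data is my own):

```python
import numpy as np

def gradient_descent(x, y, w, b, alpha, num_iters):
    """Repeatedly moves w and b downhill along the gradient of J(w, b)."""
    m = x.shape[0]
    for _ in range(num_iters):
        error = (w * x + b) - y             # f_wb(x^(i)) - y^(i) for all i
        dj_dw = np.sum(error * x) / m       # partial derivative w.r.t. w
        dj_db = np.sum(error) / m           # partial derivative w.r.t. b
        w = w - alpha * dj_dw               # update w and b simultaneously,
        b = b - alpha * dj_db               # using the derivatives computed above
    return w, b

# Data generated by y = 2x + 1; gradient descent should recover w ≈ 2, b ≈ 1.
x = np.array([1.0, 2.0, 3.0])
y = np.array([3.0, 5.0, 7.0])
print(gradient_descent(x, y, w=0.0, b=0.0, alpha=0.1, num_iters=2000))
```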
Learning Rate
In gradient descent, there is a learning rate called $\alpha$. As can be seen from the gradient descent algorithm, when $\alpha$ is small, it takes more iterations to get close to the valley, so convergence is slow. Conversely, a larger $\alpha$ reaches the valley more quickly. However, when $\alpha$ is too large, each step may overshoot the valley, so gradient descent may never reach it and can even diverge.
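The effect is easy to see on the same toy data by running a fixed number of iterations with several values of $\alpha$ (the specific values below are my own picks for this example):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([3.0, 5.0, 7.0])  # generated by y = 2x + 1

for alpha in (0.01, 0.1, 0.5):
    w, b = 0.0, 0.0
    for _ in range(30):
        error = (w * x + b) - y
        w -= alpha * np.sum(error * x) / x.shape[0]
        b -= alpha * np.sum(error) / x.shape[0]
    cost = np.sum((w * x + b - y) ** 2) / (2 * x.shape[0])
    print(f"alpha={alpha}: J={cost:.6g}")

# alpha=0.01 is still slowly descending after 30 steps, alpha=0.1 is
# already near the minimum, and alpha=0.5 overshoots and diverges.
```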
Multiple Linear Regression
Model Function
So far, what we have introduced is simple linear regression, which has only one variable and may not be very practical. Next, we will introduce multiple linear regression, which has more than one variable. The following is the model function with 4 variables.

$$f_{w,b}(x) = w_1x_1 + w_2x_2 + w_3x_3 + w_4x_4 + b$$

Generalizing to $n$ variables, the model function of multiple linear regression is as follows.

$$f_{w,b}(x) = w_1x_1 + w_2x_2 + \cdots + w_nx_n + b$$

The vectorized model function is as follows, where $\vec{w} = (w_1, \dots, w_n)$ and $\vec{x} = (x_1, \dots, x_n)$.

$$f_{\vec{w},b}(\vec{x}) = \vec{w} \cdot \vec{x} + b$$
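Here is a sketch of the vectorized model function in NumPy, with 4 made-up feature values and weights of my own:

```python
import numpy as np

def predict(x, w, b):
    """Vectorized model function: f(x) = w · x + b for one example."""
    return np.dot(w, x) + b

w = np.array([0.5, -1.0, 2.0, 0.1])
x = np.array([1.0, 2.0, 3.0, 4.0])
print(predict(x, w, b=3.0))  # 0.5 - 2.0 + 6.0 + 0.4 + 3.0 = 7.9
```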
Cost Function
The cost function of multiple linear regression is the same squared error, now with vector-valued inputs.

$$J(\vec{w},b) = \frac{1}{2m} \sum_{i=1}^{m} \left( f_{\vec{w},b}(\vec{x}^{(i)}) - y^{(i)} \right)^2$$

The vectorized cost function is as follows.

$$J(\vec{w},b) = \frac{1}{2m} \sum_{i=1}^{m} \left( \vec{w} \cdot \vec{x}^{(i)} + b - y^{(i)} \right)^2$$
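A minimal sketch, assuming the $m$ training examples are stacked into a matrix `X` of shape (m, n) with one example per row (a convention of my own here, not given in the text):

```python
import numpy as np

def compute_cost(X, y, w, b):
    """Squared error cost over m examples; X @ w + b computes all predictions."""
    m = X.shape[0]
    y_hat = X @ w + b                        # vector of m predictions
    return np.sum((y_hat - y) ** 2) / (2 * m)
```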
Gradient Descent
The gradient descent algorithm of multiple linear regression is as follows, updating every $w_j$ and $b$ simultaneously.

$$\text{repeat until convergence:} \quad w_j \leftarrow w_j - \alpha \frac{\partial J(\vec{w},b)}{\partial w_j} \;\; (j = 1, \dots, n), \quad b \leftarrow b - \alpha \frac{\partial J(\vec{w},b)}{\partial b}$$

where the derivatives of the cost function are calculated as follows.

$$\frac{\partial J(\vec{w},b)}{\partial w_j} = \frac{1}{m} \sum_{i=1}^{m} \left( f_{\vec{w},b}(\vec{x}^{(i)}) - y^{(i)} \right) x_j^{(i)}, \qquad \frac{\partial J(\vec{w},b)}{\partial b} = \frac{1}{m} \sum_{i=1}^{m} \left( f_{\vec{w},b}(\vec{x}^{(i)}) - y^{(i)} \right)$$
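With the same stacked-matrix convention as above, a compact sketch of the algorithm follows; note that `X.T @ error / m` computes all $n$ partial derivatives at once:

```python
import numpy as np

def gradient_descent(X, y, w, b, alpha, num_iters):
    """Gradient descent for multiple linear regression; X has shape (m, n)."""
    m = X.shape[0]
    for _ in range(num_iters):
        error = X @ w + b - y           # shape (m,): f(x^(i)) - y^(i)
        dj_dw = X.T @ error / m         # shape (n,): one partial per w_j
        dj_db = np.sum(error) / m
        w = w - alpha * dj_dw           # update all w_j and b simultaneously
        b = b - alpha * dj_db
    return w, b
```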
Conclusion
Linear regression is a relatively simple model, which makes it great for explaining the mathematics behind machine learning. At the same time, it is still very practical.