When a model performs poorly, it cannot predict the data accurately. The main causes are usually overfitting or underfitting. In the case of overfitting, we can use regularization to address it.
Regularization
Regularization is a method used to prevent overfitting. Suppose there is an overfitted model, as shown on the lower left. We can make $w_3$ and $w_4$ very small, or close to 0, to reduce the impact of $x_3$ and $x_4$ on the model, so that the model becomes simpler, as shown on the lower right. This is the basic idea of regularization.
Gradient descent finds the minimum of the cost function. If we append large penalty terms on $w_3$ and $w_4$ to the cost function (for example $1000\,w_3^2$ and $1000\,w_4^2$), then $w_3$ and $w_4$ will be small, or close to 0, at the minimum found by gradient descent. Therefore, by modifying the cost function, we can reduce the impact of $x_3$ and $x_4$ on the model during training.
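For example, with the squared-error cost of linear regression (the coefficient 1000 below is just an arbitrarily large number), the modified objective is:

$$\min_{\vec{w},b}\ \frac{1}{2m}\sum_{i=1}^{m}\left(f_{\vec{w},b}(\vec{x}^{(i)}) - y^{(i)}\right)^2 + 1000\,w_3^2 + 1000\,w_4^2$$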
In practice we usually do not know in advance which parameters should be penalized, so the regularization term penalizes all of them. The following formula is the cost function with the regularization term added; the term appended at the end is called the regularization term, and $\lambda$ is called the regularization parameter.
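With $m$ training examples and $n$ features, and following the course convention of scaling the regularization term by $\frac{1}{2m}$, the regularized cost is:

$$J(\vec{w},b) = \frac{1}{2m}\sum_{i=1}^{m}\left(f_{\vec{w},b}(\vec{x}^{(i)}) - y^{(i)}\right)^2 + \frac{\lambda}{2m}\sum_{j=1}^{n} w_j^2$$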
If $\lambda$ is set to a very large value, such as $10^{10}$, then all of the $w_j$ will tend to 0. Therefore, we can control how much the parameters $w_j$ shrink by adjusting $\lambda$.
Regularized Linear Regression
Regularized linear regression uses the cost function of linear regression with the regularization term added, as follows.
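Here $f_{\vec{w},b}(\vec{x}) = \vec{w}\cdot\vec{x} + b$ is the linear model:

$$J(\vec{w},b) = \frac{1}{2m}\sum_{i=1}^{m}\left(f_{\vec{w},b}(\vec{x}^{(i)}) - y^{(i)}\right)^2 + \frac{\lambda}{2m}\sum_{j=1}^{n} w_j^2$$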
The gradient descent algorithm is as follows.
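Here $\alpha$ is the learning rate, and $w_j$ and $b$ are updated simultaneously:

$$\begin{aligned}
&\text{repeat until convergence:}\\
&\qquad w_j \leftarrow w_j - \alpha\,\frac{\partial}{\partial w_j}J(\vec{w},b) \qquad (j = 1,\dots,n)\\
&\qquad b \leftarrow b - \alpha\,\frac{\partial}{\partial b}J(\vec{w},b)
\end{aligned}$$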
After we expand the derivative part, it becomes the following formula.
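The regularization term adds $\frac{\lambda}{m}\,w_j$ to the gradient of each $w_j$; by convention $b$ is not regularized, so its update is unchanged:

$$\begin{aligned}
w_j &\leftarrow w_j - \alpha\left[\frac{1}{m}\sum_{i=1}^{m}\left(f_{\vec{w},b}(\vec{x}^{(i)}) - y^{(i)}\right)x_j^{(i)} + \frac{\lambda}{m}\,w_j\right]\\
b &\leftarrow b - \alpha\,\frac{1}{m}\sum_{i=1}^{m}\left(f_{\vec{w},b}(\vec{x}^{(i)}) - y^{(i)}\right)
\end{aligned}$$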
The simplified expression of the $w_j$ update becomes as follows. From it, it can be clearly seen that we can shrink $w_j$ by increasing $\lambda$.
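Grouping the terms that multiply $w_j$:

$$w_j \leftarrow w_j\left(1 - \alpha\,\frac{\lambda}{m}\right) - \alpha\,\frac{1}{m}\sum_{i=1}^{m}\left(f_{\vec{w},b}(\vec{x}^{(i)}) - y^{(i)}\right)x_j^{(i)}$$

Since $\alpha$, $\lambda$, and $m$ are all positive, the factor $\left(1 - \alpha\frac{\lambda}{m}\right)$ is slightly less than 1, so every iteration shrinks $w_j$ a little before the usual gradient step is applied, and a larger $\lambda$ shrinks it more.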
Regularized Logistic Regression
Adding the regularization term to the cost function of logistic regression gives the following formula.
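Here $f_{\vec{w},b}(\vec{x}) = \dfrac{1}{1 + e^{-(\vec{w}\cdot\vec{x} + b)}}$ is the sigmoid model, and the regularization term is the same as before:

$$J(\vec{w},b) = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log\!\left(f_{\vec{w},b}(\vec{x}^{(i)})\right) + \left(1 - y^{(i)}\right)\log\!\left(1 - f_{\vec{w},b}(\vec{x}^{(i)})\right)\right] + \frac{\lambda}{2m}\sum_{j=1}^{n} w_j^2$$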
The gradient descent algorithm is as follows.
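The updates have the same form as for regularized linear regression:

$$\begin{aligned}
&\text{repeat until convergence:}\\
&\qquad w_j \leftarrow w_j - \alpha\,\frac{\partial}{\partial w_j}J(\vec{w},b) \qquad (j = 1,\dots,n)\\
&\qquad b \leftarrow b - \alpha\,\frac{\partial}{\partial b}J(\vec{w},b)
\end{aligned}$$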
After expanding the derivative, it becomes the following formula. It looks exactly the same as in regularized linear regression, but note that $f_{\vec{w},b}$ in the formula is the sigmoid function of logistic regression.
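With the sigmoid $f_{\vec{w},b}$ substituted in, the updates are:

$$\begin{aligned}
w_j &\leftarrow w_j - \alpha\left[\frac{1}{m}\sum_{i=1}^{m}\left(f_{\vec{w},b}(\vec{x}^{(i)}) - y^{(i)}\right)x_j^{(i)} + \frac{\lambda}{m}\,w_j\right]\\
b &\leftarrow b - \alpha\,\frac{1}{m}\sum_{i=1}^{m}\left(f_{\vec{w},b}(\vec{x}^{(i)}) - y^{(i)}\right)
\end{aligned}$$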
Conclusion
Regularization reduces the size of the parameters to mitigate overfitting. The larger a parameter is, the larger its penalty, so it is shrunk by a greater amount at each update.
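As a quick illustration of this shrinking effect, the following is a minimal NumPy sketch of regularized linear regression trained with gradient descent. It is not code from the course; the function name, the toy data, and the chosen values of `alpha` and `lam` are only illustrative.

```python
import numpy as np

def regularized_linear_regression(X, y, alpha=0.1, lam=1.0, iterations=1000):
    """Gradient descent for linear regression with an L2 regularization term."""
    m, n = X.shape
    w = np.zeros(n)
    b = 0.0
    for _ in range(iterations):
        err = X @ w + b - y                       # prediction error, shape (m,)
        grad_w = (X.T @ err) / m + (lam / m) * w  # squared-error gradient plus (lambda/m) * w
        grad_b = err.mean()                       # b is not regularized
        w -= alpha * grad_w
        b -= alpha * grad_b
    return w, b

# Toy data: y depends only on the first of four features; the rest are noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = 3.0 * X[:, 0] + rng.normal(scale=0.1, size=100)

for lam in (0.0, 10.0, 100.0):
    w, b = regularized_linear_regression(X, y, alpha=0.1, lam=lam, iterations=2000)
    print(f"lambda = {lam:5.1f} -> w = {np.round(w, 3)}")
```

The weights printed for larger `lam` are visibly smaller, which is exactly the effect of regularization described above.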
Reference
- Andrew Ng, Machine Learning Specialization, Coursera.