Regularization

Photo by Ashley Knedler on Unsplash
When a model performs poorly, it cannot predict the data accurately. The main cause is usually overfitting or underfitting. If the problem is overfitting, we can use regularization to address it.

Regularization

Regularization is a method used to prevent overfitting. Suppose there is an overfitted model as shown on the lower left. We can make w3 and w4 very small or tend to 0 to reduce the impact of x3 and x4 on the model, so that the model becomes simpler, as shown on the lower right. This is the basic idea of regularization.

Use regularization to reduce the size of parameters.

Gradient descent finds the parameters that minimize the cost function. If we append the terms 1000w_3^2 and 1000w_4^2 to the cost function, then at the minimum found by gradient descent, w_3 and w_4 must be very small or close to 0. Therefore, by modifying the cost function, we can reduce the impact of x_3 and x_4 on the model during training.

min_{\vec{w},b} \left[ \frac{1}{2m} \sum_{i=1}^m (f_{\vec{w},b}(\vec{x}^{(i)})-y^{(i)})^2 + 1000 w_3^2 + 1000 w_4^2 \right]

The following formula is the cost function with regularization added. The term added at the end is called the regularization term, and λ is called the regularization parameter. If λ is set to a very large value, such as 10^{10}, then all the w_j will tend to 0. Therefore, we can control the size of the w_j by adjusting λ.

J(\vec{w},b)=\frac{1}{2m} \sum_{i=1}^{m} (f_{\vec{w},b}(\vec{x}^{(i)})-y^{(i)})^2 + \frac{\lambda}{2m} \sum_{j=1}^n w_j^2
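As a rough illustration, here is a minimal NumPy sketch of this regularized cost for linear regression. The names regularized_cost, X, y, w, b, and lambda_ are placeholders of my own, not from the original article or any particular library.

import numpy as np

def regularized_cost(X, y, w, b, lambda_):
    # Squared-error cost plus the L2 regularization term on w (b is not regularized).
    m = X.shape[0]
    predictions = X @ w + b                                  # f_{w,b}(x^(i)) for every example
    squared_error = np.sum((predictions - y) ** 2) / (2 * m)
    reg_term = (lambda_ / (2 * m)) * np.sum(w ** 2)
    return squared_error + reg_term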

Regularized Linear Regression

Regularized linear regression adds the regularization term to the cost function of linear regression, as follows.

min_{\vec{w},b} J(\vec{w},b) = min_{\vec{w},b} \left[ \frac{1}{2m} \sum_{i=1}^m (f_{\vec{w},b}(\vec{x}^{(i)})-y^{(i)})^2 + \frac{\lambda}{2m} \sum_{j=1}^n w_j^2  \right]

The gradient descent algorithm is as follows.

\text{repeat \{} \\\\ \phantom{xxxx} w_j=w_j-\alpha \frac{\partial}{\partial w_j}J(\vec{w},b) \\\\ \phantom{xxxx} b=b-\alpha \frac{\partial}{\partial b}J(\vec{w},b) \\\\ \text{\}}

After we expand the derivative part, it becomes the following formula.

\text{repeat \{} \\\\ \phantom{xxxx} w_j=w_j-\alpha \left[ \frac{1}{m} \sum_{i=1}^m \left[ (f_{\vec{w},b}(\vec{x}^{(i)})-y^{(i)})x_j^{(i)} \right] + \frac{\lambda}{m} w_j \right] \\\\ \phantom{xxxx} b=b-\alpha \frac{1}{m} \sum_{i=1}^m (f_{\vec{w},b}(\vec{x}^{(i)})-y^{(i)}) \\\\ \text{\}}
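A minimal NumPy sketch of one iteration of this update might look like the following; the function name and arguments are again placeholders, not from any library.

import numpy as np

def gradient_descent_step(X, y, w, b, alpha, lambda_):
    # One regularized gradient descent update for linear regression.
    m = X.shape[0]
    error = (X @ w + b) - y                        # f_{w,b}(x^(i)) - y^(i)
    dj_dw = (X.T @ error) / m + (lambda_ / m) * w  # the regularization term only affects w
    dj_db = np.sum(error) / m                      # b is not regularized
    return w - alpha * dj_dw, b - alpha * dj_db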

Rearranging the update for w_j gives the following expression. It clearly shows that we can shrink w_j by adjusting λ, since w_j is multiplied by the factor (1 − αλ/m) on every iteration.

w_j=w_j(1-\alpha \frac{\lambda}{m})- \alpha \frac{1}{m} \sum_{i=1}^m (f_{\vec{w},b}(\vec{x}^{(i)})-y^{(i)}) x_j^{(i)}
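For example, with illustrative values α = 0.01, λ = 1, and m = 50, the factor 1 − αλ/m = 0.9998, so each iteration multiplies w_j by a number slightly less than 1 before applying the usual gradient update.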

Regularized Logistic Regression

Adding the regularization term to the cost function of logistic regression gives the following formula.

z=w_1x_1+w_2x_2+w_3x_1^2x_2+w_4x_1^2x_2^2+w_5x_1^2x_2^3+ \cdots +b \\\\ f_{\vec{w},b}(\vec{x})=\frac{1}{1+e^{-z}} \\\\ J(\vec{w},b)=-\frac{1}{m} \sum_{i=1}^m \left[ y^{(i)} \log(f_{\vec{w},b}(\vec{x}^{(i)})) + (1-y^{(i)}) \log(1-f_{\vec{w},b}(\vec{x}^{(i)})) \right] \\\\ \phantom{xxxxxxxx} + \frac{\lambda}{2m} \sum_{j=1}^n w_j^2

The gradient descent algorithm is as follows.

\text{repeat \{} \\\\ \phantom{xxxx} w_j=w_j-\alpha \frac{\partial}{\partial w_j}J(\vec{w},b) \\\\ \phantom{xxxx} b=b-\alpha \frac{\partial}{\partial b}J(\vec{w},b) \\\\ \text{\}}

After expanding the derivative part, it becomes the following formula. It looks exactly the same as for regularized linear regression, but note that f_{\vec{w},b} here is the logistic (sigmoid) function, not the linear one.

\text{repeat \{} \\\\ \phantom{xxxx} w_j=w_j-\alpha \left[ \frac{1}{m} \sum_{i=1}^m \left[ (f_{\vec{w},b}(\vec{x}^{(i)})-y^{(i)})x_j^{(i)} \right] + \frac{\lambda}{m} w_j \right] \\\\ \phantom{xxxx} b=b-\alpha \frac{1}{m} \sum_{i=1}^m (f_{\vec{w},b}(\vec{x}^{(i)})-y^{(i)}) \\\\ \text{\}}
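For comparison, a minimal NumPy sketch of the same update for logistic regression could look like this; only the definition of f_{w,b} changes, and the names are again placeholders.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_gradient_step(X, y, w, b, alpha, lambda_):
    # Same regularized update as linear regression, but f_{w,b} is the sigmoid.
    m = X.shape[0]
    error = sigmoid(X @ w + b) - y                 # only this line differs from the linear case
    dj_dw = (X.T @ error) / m + (lambda_ / m) * w
    dj_db = np.sum(error) / m
    return w - alpha * dj_dw, b - alpha * dj_db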

Conclusion

Regularization solves overfitting by reducing the size of the parameters. The larger a parameter is, the larger its penalty, so it is shrunk more on each update.
