Overfitting and Underfitting

Photo by Jeremy Bishop on Unsplash
Overfitting and underfitting are two of the most common causes of poor model accuracy. Only once we can tell whether a model is overfitting or underfitting can we choose the right way to improve its performance.


Overfitting and Underfitting

When a model does not fit the training data well, we say that it is underfitting, or equivalently that it has high bias. An underfitted model cannot capture the pattern in the training data, so its predictions will not be accurate enough.

The figure below shows an underfitting regression model. We can see that the model does not fit the training data well enough.

Underfitting example of regression.
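As a minimal sketch of underfitting in regression, the snippet below fits a straight line to data that follows a curve. The synthetic quadratic data, the noise level, and the variable names are assumptions made for illustration; they are not the data behind the figure.

```python
import numpy as np

# Synthetic data (an assumption for illustration): y = x^2 plus Gaussian noise.
rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 50)
y = x**2 + rng.normal(scale=0.5, size=x.size)

# Fit a straight line, which is too simple for a quadratic relationship.
slope, intercept = np.polyfit(x, y, deg=1)
y_pred = slope * x + intercept

# Training error and R^2 (the share of the variance in y that the model explains).
train_mse = float(np.mean((y - y_pred) ** 2))
r2 = 1.0 - train_mse / float(np.var(y))

print(f"training MSE: {train_mse:.2f}, R^2: {r2:.3f}")
```

The error here is measured on the training data itself. A large training error and an R^2 close to zero are the signature of high bias: the model is too simple to follow the data it was trained on.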

Classification models can underfit too. As shown in the figure below, the model cannot separate the two classes of objects well.

Underfitting example of classification.
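The same situation can be sketched in code. The example below assumes a synthetic dataset in which one class sits inside a disk and the other on a ring around it, and it uses a least-squares linear classifier as a stand-in for "a model that is too simple". Both choices are assumptions for illustration.

```python
import numpy as np

# Synthetic data (an assumption for illustration): class 0 lies inside a disk,
# class 1 lies on a ring that surrounds it, so no straight line separates them.
rng = np.random.default_rng(0)
n = 200
angle = rng.uniform(0, 2 * np.pi, size=2 * n)
radius = np.concatenate([rng.uniform(0.0, 1.0, size=n),   # inner disk
                         rng.uniform(2.5, 3.5, size=n)])  # outer ring
X = np.column_stack([radius * np.cos(angle), radius * np.sin(angle)])
y = np.concatenate([np.zeros(n), np.ones(n)])

# Linear classifier fitted by least squares on targets -1 / +1.
A = np.column_stack([np.ones(len(X)), X])  # bias column plus the two features
w, *_ = np.linalg.lstsq(A, 2 * y - 1, rcond=None)
pred = (A @ w > 0).astype(float)

train_acc = float(np.mean(pred == y))
print(f"training accuracy of the linear model: {train_acc:.2f}")
```

Because a straight decision boundary cannot wrap around the inner class, the accuracy stays low even on the training data, which is what underfitting looks like for a classifier.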

Conversely, when a model fits the training data perfectly or almost perfectly, the error between its predictions and the actual values can still be very large on new data. This is called overfitting, and we also say that the model has high variance. As shown in the figure below, the regression model passes through every training data point, but becomes very inaccurate when predicting new data.

Overfitting example of regression.
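A minimal sketch of overfitting in regression follows. It assumes synthetic data from a noisy sine curve and a degree-9 polynomial as the overly flexible model. The data, the degree, and the variable names are illustrative assumptions.

```python
import numpy as np

# Synthetic data (an assumption for illustration): a sine curve plus noise.
rng = np.random.default_rng(0)

def noisy_sine(x):
    return np.sin(3 * x) + rng.normal(scale=0.2, size=x.size)

x_train = np.linspace(-1, 1, 10)          # only 10 training points
y_train = noisy_sine(x_train)
x_test = rng.uniform(-1, 1, size=200)     # unseen data from the same process
y_test = noisy_sine(x_test)

# A degree-9 polynomial has 10 coefficients, so it can pass through all 10 points.
coeffs = np.polyfit(x_train, y_train, deg=9)

def mse(x, y):
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

train_mse = mse(x_train, y_train)
test_mse = mse(x_test, y_test)
print(f"training MSE: {train_mse:.2e}, test MSE: {test_mse:.2e}")
```

The training error is essentially zero because the polynomial reproduces the noise in every training point, while the error on unseen points is far larger. That gap between training and test error is the practical symptom of high variance.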

The figure below shows an overfitting classification model.

Overfitting example of classification.
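For classification, one way to sketch overfitting is a 1-nearest-neighbor classifier, which memorizes the training set. The choice of 1-nearest-neighbor and the two overlapping Gaussian classes below are assumptions for illustration, not the model shown in the figure.

```python
import numpy as np

# Synthetic data (an assumption for illustration): two overlapping Gaussian classes.
rng = np.random.default_rng(0)

def sample(n_per_class):
    X0 = rng.normal(loc=[-1.0, 0.0], scale=1.0, size=(n_per_class, 2))
    X1 = rng.normal(loc=[1.0, 0.0], scale=1.0, size=(n_per_class, 2))
    X = np.vstack([X0, X1])
    y = np.concatenate([np.zeros(n_per_class, dtype=int),
                        np.ones(n_per_class, dtype=int)])
    return X, y

X_train, y_train = sample(50)
X_test, y_test = sample(200)

def predict_1nn(X):
    # Label each point with the class of its single nearest training point.
    dists = np.linalg.norm(X[:, None, :] - X_train[None, :, :], axis=2)
    return y_train[np.argmin(dists, axis=1)]

train_acc = float(np.mean(predict_1nn(X_train) == y_train))
test_acc = float(np.mean(predict_1nn(X_test) == y_test))
print(f"training accuracy: {train_acc:.2f}, test accuracy: {test_acc:.2f}")
```

Every training point is its own nearest neighbor, so the training accuracy is perfect. Because the two classes overlap, that perfect score does not carry over to new points.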

When a model fits the training data well without fitting it perfectly, and its error is small, we say that it generalizes well. Such a model has low bias and low variance. When it predicts new data, the error between the predicted value and the actual value will be very small. This is the kind of model we want to train.

The figure below shows a regression model that generalizes well. Although the model does not pass through every training data point, its error is small.

Generalization example of regression.

The figure below shows a classification model that generalizes well. The model classifies all the square objects correctly, although two objects of the other class are grouped with them.

Generalization example of classification.
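In practice, we can look for a model that generalizes by comparing the error on the training data with the error on held-out data. The sketch below assumes synthetic quadratic data and polynomial models of degree 1 to 8. The data, the split sizes, and the names `results` and `best_degree` are illustrative assumptions.

```python
import numpy as np

# Synthetic data (an assumption for illustration): a quadratic trend plus noise.
rng = np.random.default_rng(0)

def sample(n):
    x = rng.uniform(-1, 1, size=n)
    y = 4 * x**2 + rng.normal(scale=0.3, size=n)
    return x, y

x_train, y_train = sample(40)
x_val, y_val = sample(200)   # held-out data, never used for fitting

def mse(coeffs, x, y):
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

results = {}
for degree in range(1, 9):
    coeffs = np.polyfit(x_train, y_train, deg=degree)
    results[degree] = (mse(coeffs, x_train, y_train), mse(coeffs, x_val, y_val))
    print(f"degree {degree}: train MSE {results[degree][0]:.3f}, "
          f"validation MSE {results[degree][1]:.3f}")

# Pick the degree with the lowest error on the held-out data.
best_degree = min(results, key=lambda d: results[d][1])
print("degree with lowest validation MSE:", best_degree)
```

A degree that is too low shows a high error on both sets (underfitting). As the degree grows, the training error keeps shrinking, so the training error alone cannot tell us when to stop. The validation error is what indicates which model generalizes.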

How to Solve Overfitting and Underfitting?

When overfitting occurs in the model, we can try to solve it in the following ways.

  • Collect more training data. With more training data, the model is less able to fit the noise in individual data points, so the fitted model tends to be smoother, which reduces overfitting.
  • Use fewer features. Too many features make the model complex, which can lead to overfitting, because some features may be irrelevant or noisy. Removing those features simplifies the model and can reduce overfitting.
  • Use regularization to reduce the magnitude of the model parameters.
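To make the regularization item concrete, here is a minimal sketch that uses ridge (L2) regularization, which is one common form of regularization and an assumption on our part, since the text above does not name a specific method. The synthetic data, the degree-9 polynomial features, and the penalty strength `lam` are also illustrative assumptions.

```python
import numpy as np

# Synthetic data (an assumption for illustration): a sine curve plus noise.
rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 15)
y = np.sin(3 * x) + rng.normal(scale=0.2, size=x.size)

# Degree-9 polynomial features: columns 1, x, x^2, ..., x^9.
X = np.vander(x, N=10, increasing=True)

def fit(X, y, lam):
    # Closed-form ridge solution: w = (X^T X + lam * I)^(-1) X^T y.
    # lam = 0 is ordinary least squares; for simplicity the bias is penalized too.
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

w_plain = fit(X, y, lam=0.0)
w_ridge = fit(X, y, lam=0.1)

norm_plain = float(np.linalg.norm(w_plain))
norm_ridge = float(np.linalg.norm(w_ridge))
print(f"coefficient norm without regularization: {norm_plain:.2f}")
print(f"coefficient norm with ridge (lam=0.1):   {norm_ridge:.2f}")
```

The penalty term shrinks the coefficients, which limits how sharply the fitted curve can bend between data points. The value of `lam` is normally chosen by checking the error on held-out data rather than fixed in advance.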

However, when the model is underfitting, we can try to solve it in the following ways.

  • Use more features to make the model more complex so that the model can better fit the training data.
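As a sketch of this idea, the example below trains the same least-squares linear classifier twice on synthetic disk-and-ring data: once with the two raw coordinates, and once with an extra feature, the squared distance from the origin. The dataset, the classifier, and the choice of extra feature are assumptions made for illustration.

```python
import numpy as np

# Synthetic data (an assumption for illustration): class 0 inside a disk,
# class 1 on a surrounding ring.
rng = np.random.default_rng(0)
n = 200
angle = rng.uniform(0, 2 * np.pi, size=2 * n)
radius = np.concatenate([rng.uniform(0.0, 1.0, size=n),
                         rng.uniform(2.5, 3.5, size=n)])
x1, x2 = radius * np.cos(angle), radius * np.sin(angle)
y = np.concatenate([np.zeros(n), np.ones(n)])

def train_accuracy(features):
    # Least-squares linear classifier on targets -1 / +1, with a bias column.
    A = np.column_stack([np.ones(len(y))] + features)
    w, *_ = np.linalg.lstsq(A, 2 * y - 1, rcond=None)
    return float(np.mean((A @ w > 0) == (y == 1)))

acc_two = train_accuracy([x1, x2])                      # raw coordinates only
acc_three = train_accuracy([x1, x2, x1**2 + x2**2])     # plus squared radius

print(f"training accuracy with 2 features: {acc_two:.2f}")
print(f"training accuracy with 3 features: {acc_three:.2f}")
```

With only the raw coordinates the model can draw a straight line and nothing else. The added feature lets the same linear model express a circular boundary, so it can finally fit the training data.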

For details on regularization, please refer to the following article.

Conclusion

After reading this article, you should understand what overfitting and underfitting are and why they matter. The two problems call for different solutions. Therefore, when a model's accuracy is low, we must first determine whether it is overfitting or underfitting before we can adjust it correctly.
