In machine learning, before using data to train a model, we usually need to perform feature scaling. This article introduces several common feature scaling methods.
Feature Scaling and Gradient Descent
When the value ranges of the features differ greatly, gradient descent may run slowly. If we perform feature scaling on each feature so that all features fall within a comparable range, gradient descent converges much faster.
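As a rough illustration, here is a minimal sketch (using a small synthetic dataset with made-up feature ranges, not data from this article) comparing batch gradient descent on raw features against the same model trained on z-score-scaled features:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: feature 1 in [0, 1], feature 2 in [0, 2000].
X = np.column_stack([rng.uniform(0, 1, 100), rng.uniform(0, 2000, 100)])
y = 3 * X[:, 0] + 0.05 * X[:, 1] + rng.normal(0, 0.1, 100)

def gradient_descent_mse(X, y, lr, steps):
    """Run batch gradient descent for linear regression; return the final MSE."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        err = X @ w + b - y
        w -= lr * X.T @ err / len(y)
        b -= lr * err.mean()
    return np.mean((X @ w + b - y) ** 2)

# Unscaled: the learning rate must be tiny to avoid divergence along the
# large-range feature, so progress on the small-range feature crawls.
print(gradient_descent_mse(X, y, lr=1e-7, steps=1000))

# Scaled: with comparable ranges, the same number of steps at a much
# larger learning rate converges well.
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)
print(gradient_descent_mse(X_scaled, y, lr=0.1, steps=1000))
```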
Rescaling (Min-Max Normalization)
Rescaling is also called min-max normalization. It subtracts the minimum from each value and divides by the difference between the maximum and the minimum, that is, x' = (x - min) / (max - min). This scales the values into the range [0, 1].
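A minimal NumPy sketch of this transformation (the sample matrix below is made up for illustration); scikit-learn's MinMaxScaler implements the same scaling:

```python
import numpy as np

def min_max_normalize(X):
    """Scale each column of X into [0, 1] via (x - min) / (max - min)."""
    x_min = X.min(axis=0)
    x_max = X.max(axis=0)
    return (X - x_min) / (x_max - x_min)

# Hypothetical example: one feature in [0, 5], another in [0, 2000].
X = np.array([[1.0, 400.0], [3.0, 1200.0], [5.0, 2000.0], [0.0, 0.0]])
print(min_max_normalize(X))  # every entry now lies in [0, 1]
```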
Mean Normalization
Mean normalization subtracts the mean from each value and divides by the difference between the maximum and the minimum, that is, x' = (x - mean) / (max - min). This scales the values into the range [-1, 1].
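A minimal sketch, reusing the same made-up sample matrix as above:

```python
import numpy as np

def mean_normalize(X):
    """Scale each column of X into [-1, 1] via (x - mean) / (max - min)."""
    return (X - X.mean(axis=0)) / (X.max(axis=0) - X.min(axis=0))

X = np.array([[1.0, 400.0], [3.0, 1200.0], [5.0, 2000.0], [0.0, 0.0]])
print(mean_normalize(X))            # entries lie in [-1, 1]
print(mean_normalize(X).mean(axis=0))  # each column now has mean ~0
```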
Standardization (Z-score Normalization)
Standardization is also called z-score normalization. It subtracts the mean from each value and divides by the standard deviation, that is, x' = (x - mean) / std. After standardization, each feature has a mean of 0 and a standard deviation of 1. This method is widely used in many machine learning algorithms.
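A minimal sketch comparing a manual z-score computation against scikit-learn's StandardScaler, which implements this transformation:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[1.0, 400.0], [3.0, 1200.0], [5.0, 2000.0], [0.0, 0.0]])

# Manual z-score: (x - mean) / std, computed per column.
X_manual = (X - X.mean(axis=0)) / X.std(axis=0)

# StandardScaler computes the same transformation.
X_sklearn = StandardScaler().fit_transform(X)

print(np.allclose(X_manual, X_sklearn))             # True
print(X_manual.mean(axis=0), X_manual.std(axis=0))  # ~[0, 0] and [1, 1]
```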
Conclusion
Although feature scaling is simple, it can significantly speed up gradient descent. Therefore, after collecting the data and before training the model, first check whether the value ranges of the features differ greatly. If they do, remember to apply feature scaling before training.