Layer Normalization
Normalization is a data transformation technique that originates in statistics: it rescales data to a target mean and variance so that its distribution becomes more stable and predictable. In deep learning, normalization is widely used to improve the stability and efficiency of model training. This article explains the original statistical concept of normalization, introduces the design and limitations of batch normalization, and explores how layer normalization addresses those limitations to become a standard component in modern language models.
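As a concrete illustration of the statistical operation that batch and layer normalization both build on, the sketch below standardizes a small array to zero mean and unit variance. The `standardize` helper and the small `eps` stabilizer are illustrative choices, not part of the original text.

```python
import numpy as np

def standardize(x: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """Rescale x to have (approximately) zero mean and unit variance.

    eps is a small constant, assumed here to guard against division by
    zero when the variance is (near) zero.
    """
    mean = x.mean()
    var = x.var()
    return (x - mean) / np.sqrt(var + eps)

x = np.array([2.0, 4.0, 6.0, 8.0])
x_hat = standardize(x)
print(x_hat.mean(), x_hat.var())  # ~0.0 and ~1.0
```

Running this prints a mean of roughly 0 and a variance of roughly 1, regardless of the scale of the original data; the normalization layers discussed in the rest of the article apply this same transformation over different slices of a network's activations.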