
Layer Normalization

Normalization is a data transformation technique originating from statistics. It adjusts the mean and variance of data to make it more stable and predictable. In deep learning, normalization is widely used to improve the stability and efficiency of model training. This article explains the original concept of normalization, introduces the design and limitations of batch normalization, and explores how layer normalization addresses these issues to become a standard component in modern language models.
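The core idea can be sketched in a few lines of NumPy: layer normalization computes the mean and variance over each sample's feature dimension (rather than over the batch, as batch normalization does), so the statistics are independent of batch size. This is a minimal illustration, omitting the learnable scale and shift parameters that full implementations include.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Compute statistics over the last (feature) axis of each sample,
    # so the result does not depend on which other samples are in the batch.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    # eps keeps the division numerically stable when the variance is tiny.
    return (x - mean) / np.sqrt(var + eps)

x = np.array([[1.0, 2.0, 3.0],
              [10.0, 20.0, 30.0]])
y = layer_norm(x)
# Each row now has approximately zero mean and unit variance,
# regardless of the original scale of that row.
```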

Adam Optimizer

When training neural networks, choosing a good optimizer is critically important. Adam is one of the most commonly used optimizers, so much so that it has almost become the default choice. Adam builds on the foundations of SGD, Momentum, and RMSprop; by revisiting the evolution of these methods, we can better understand the principles behind Adam.
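The lineage described above is visible directly in the Adam update rule: the first-moment estimate is a Momentum-style moving average of the gradient, and the second-moment estimate is an RMSprop-style moving average of the squared gradient. A minimal single-parameter sketch (hyperparameter defaults follow the original Adam paper; the quadratic objective is just an illustrative example):

```python
import numpy as np

def adam_step(param, grad, m, v, t, lr=0.1,
              beta1=0.9, beta2=0.999, eps=1e-8):
    # First moment: Momentum-style exponential moving average of the gradient.
    m = beta1 * m + (1 - beta1) * grad
    # Second moment: RMSprop-style moving average of the squared gradient.
    v = beta2 * v + (1 - beta2) * grad ** 2
    # Bias correction compensates for the zero initialization of m and v.
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # SGD-style update, with a per-parameter adaptive step size.
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v

# Minimize f(x) = x^2 (gradient is 2x) from a starting point of 5.0.
x, m, v = 5.0, 0.0, 0.0
for t in range(1, 201):
    x, m, v = adam_step(x, 2 * x, m, v, t)
# x has moved close to the minimum at 0.
```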