Layer Normalization
Normalization is a data transformation technique that originates in statistics: it rescales data to a target mean and variance so that its distribution becomes more stable and predictable. In deep learning, normalization is widely used to improve the stability and efficiency of model training. This article explains the original statistical concept of normalization, introduces the design and limitations of batch normalization, and explores how layer normalization addresses those limitations to become a standard component in modern language models.
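As a concrete illustration of the statistical operation that batch and layer normalization both build on, the sketch below standardizes a small array to zero mean and unit variance. The `standardize` helper and the small `eps` stabilizer are illustrative choices, not part of the original text.

```python
import numpy as np

def standardize(x: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """Rescale x to have (approximately) zero mean and unit variance.

    eps is a small constant, assumed here to guard against division by
    zero when the variance is (near) zero.
    """
    mean = x.mean()
    var = x.var()
    return (x - mean) / np.sqrt(var + eps)

x = np.array([2.0, 4.0, 6.0, 8.0])
x_hat = standardize(x)
print(x_hat.mean(), x_hat.var())  # ~0.0 and ~1.0
```

Running this prints a mean of roughly 0 and a variance of roughly 1, regardless of the scale of the original data; the normalization layers discussed in the rest of the article apply this same transformation over different slices of a network's activations.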