
Adam Optimizer

When training neural networks, choosing a good optimizer is critically important. Adam is one of the most widely used optimizers, so much so that it has almost become the default choice. Adam builds upon the foundations of SGD, Momentum, and RMSprop, and by revisiting the evolution of these methods we can better understand the principles behind it.
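
As a quick illustration, here is a minimal NumPy sketch of a single Adam step, combining a Momentum-style first moment with an RMSprop-style second moment (the helper name and default hyperparameters are illustrative, not code from the article):

```python
import numpy as np

def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a single parameter array (illustrative helper)."""
    m = beta1 * m + (1 - beta1) * grad          # Momentum-style first-moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2     # RMSprop-style second-moment estimate
    m_hat = m / (1 - beta1 ** t)                # bias correction (t starts at 1)
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v
```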
Read More

LoRA: Low-Rank Adaptation of Large Language Models

With LLMs now often reaching tens of billions of parameters, a single full fine-tuning run can exhaust an entire GPU. LoRA (Low-Rank Adaptation of Large Language Models) offers a clever solution: instead of modifying the model’s original parameters directly, it learns new knowledge through low-rank matrices. This allows us to adapt the model’s behavior quickly and at very low cost, while still preserving its original performance.
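
As a rough sketch of the idea, the snippet below wraps a frozen linear layer with a trainable low-rank update; the class name, rank, and scaling values are illustrative assumptions, not the article's code:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Illustrative LoRA wrapper: frozen base weight plus a trainable low-rank update B @ A."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                  # the original weights stay untouched
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: no change at start
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale
```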
Read More

CLIP Model

CLIP (Contrastive Language-Image Pre-training) is a model proposed by OpenAI in 2021. It achieves strong generalization capability by aligning visual and language representations in a shared embedding space, and it has a wide range of potential applications. This article will introduce both the theory and practical implementation of CLIP.
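
A minimal sketch of the symmetric contrastive objective behind this idea, assuming a batch of matched image and text embeddings (the function name and temperature value are illustrative):

```python
import torch
import torch.nn.functional as F

def clip_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric contrastive loss over matched image-text pairs (illustrative sketch)."""
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = image_emb @ text_emb.T / temperature   # cosine-similarity matrix
    targets = torch.arange(len(image_emb))          # the i-th image matches the i-th text
    loss_i = F.cross_entropy(logits, targets)       # image -> text direction
    loss_t = F.cross_entropy(logits.T, targets)     # text -> image direction
    return (loss_i + loss_t) / 2
```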
Read More

Generative Pre-trained Transformer (GPT)

Over the past decade in the field of Natural Language Processing (NLP), the Generative Pre-trained Transformer (GPT) has undoubtedly been one of the most iconic technologies. GPT has not only redefined the approach to language modeling but also sparked a revolution centered around pre-training, leading to the rise of general-purpose language models. This article begins with an overview of the GPT architecture and delves into the design principles and technological evolution from GPT-1 to GPT-3.
Read More

Attention Models

An attention mechanism is a method in deep learning that lets a model focus on the most relevant parts of its input when producing each piece of its output. Unlike traditional sequence models that often struggle with longer inputs, attention allows models to dynamically focus on different parts of the input sequence when generating each part of the output sequence.
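
As a minimal sketch, the widely used scaled dot-product form of attention can be written as follows (the function is illustrative and not tied to any particular model):

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(Q, K, V):
    """Each query attends over all keys; the softmax weights decide which inputs to focus on."""
    d_k = Q.size(-1)
    scores = Q @ K.transpose(-2, -1) / d_k ** 0.5   # similarity between queries and keys
    weights = F.softmax(scores, dim=-1)             # attention distribution over the input
    return weights @ V                              # weighted sum of the values
```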
Read More

Sequence to Sequence Model (Seq2Seq)

The Sequence to Sequence (Seq2Seq) model is a neural network architecture that maps one sequence to another. It has revolutionized the field of Natural Language Processing (NLP), significantly enhancing the performance of tasks such as translation, text summarization, and chatbots. This article will dive deeply into the principles behind the Seq2Seq model.
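
A minimal sketch of the encoder-decoder idea, assuming a GRU encoder and decoder and teacher forcing on the target tokens; all names and sizes are illustrative:

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    """Minimal encoder-decoder sketch: the encoder compresses the source sequence into a
    context vector, which initializes the decoder that generates the target sequence."""
    def __init__(self, src_vocab, tgt_vocab, hidden=256):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, hidden)
        self.tgt_emb = nn.Embedding(tgt_vocab, hidden)
        self.encoder = nn.GRU(hidden, hidden, batch_first=True)
        self.decoder = nn.GRU(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, tgt_vocab)

    def forward(self, src_ids, tgt_ids):
        _, context = self.encoder(self.src_emb(src_ids))        # final hidden state = context vector
        dec_out, _ = self.decoder(self.tgt_emb(tgt_ids), context)
        return self.out(dec_out)                                 # logits over the target vocabulary
```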
Read More

Bi-directional Recurrent Neural Networks (BRNNs)

Bi-directional recurrent neural networks (BRNNs) are an extension of standard RNNs specifically designed to process sequential data in both forward and backward directions. Compared to traditional RNNs, BRNN architectures maintain more comprehensive context information, enabling them to capture useful dependencies across entire sequences for improved predictions in various natural language processing and speech recognition tasks.
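
A minimal sketch of the idea using PyTorch's built-in bidirectional GRU; the sizes below are arbitrary examples:

```python
import torch
import torch.nn as nn

# One GRU reads the sequence left-to-right, another reads it right-to-left,
# and their hidden states are concatenated so every time step sees both
# past and future context.
birnn = nn.GRU(input_size=128, hidden_size=64, batch_first=True, bidirectional=True)

x = torch.randn(2, 10, 128)      # (batch, time, features)
outputs, _ = birnn(x)
print(outputs.shape)             # torch.Size([2, 10, 128]) -> 64 forward + 64 backward units
```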
Read More

GloVe Word Embeddings

GloVe is a word embedding model that constructs word vectors based on global co-occurrence statistics. Unlike Word2Vec, which relies on local context windows, GloVe captures the overall statistical relationships between words through matrix factorization. This approach enables GloVe to generate high-quality word representations that effectively encode semantic and syntactic relationships. This article will introduce the principles and training methods of GloVe.
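
As a rough sketch, the weighted least-squares objective that GloVe minimizes over a co-occurrence matrix X can be written as follows (the function and the hyperparameter values follow the standard formulation and are purely illustrative):

```python
import numpy as np

def glove_loss(W, W_ctx, b, b_ctx, X, x_max=100, alpha=0.75):
    """Weighted least-squares GloVe objective over a co-occurrence matrix X (illustrative)."""
    loss = 0.0
    for i, j in zip(*np.nonzero(X)):                       # only observed co-occurrences contribute
        weight = min((X[i, j] / x_max) ** alpha, 1.0)      # down-weight rare pairs, cap frequent ones
        diff = W[i] @ W_ctx[j] + b[i] + b_ctx[j] - np.log(X[i, j])
        loss += weight * diff ** 2
    return loss
```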
Read More