
Byte-Pair Encoding

Byte-Pair Encoding (BPE) is a frequency-based symbol merging algorithm that was originally proposed as a data compression method. In natural language processing (NLP), BPE has been reinterpreted as a subword tokenization technique that strikes a balance between characters and full words. By automatically learning high-frequency fragments from data, BPE can construct a scalable vocabulary without relying on any language-specific knowledge.
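The merge loop at the heart of BPE can be sketched in a few lines of plain Python. This is a toy illustration rather than a production tokenizer; `learn_bpe`, `merge_word`, and the corpus format are names invented for this sketch.

```python
from collections import Counter

def merge_word(symbols, pair):
    # Replace each adjacent occurrence of `pair` with the fused symbol.
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return out

def learn_bpe(corpus, num_merges):
    # `corpus` maps space-separated symbol sequences to frequencies,
    # e.g. {"l o w": 5, "l o w e r": 2}.
    vocab, merges = dict(corpus), []
    for _ in range(num_merges):
        pairs = Counter()
        for word, freq in vocab.items():
            symbols = word.split()
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent adjacent pair
        merges.append(best)
        vocab = {" ".join(merge_word(w.split(), best)): f
                 for w, f in vocab.items()}
    return merges, vocab
```

On a small word-frequency table, each iteration fuses the single most frequent adjacent pair, so frequent fragments like "es" and "est" emerge after only a few merges.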

Chain-of-Thought, CoT

The performance of LLMs on reasoning tasks has undergone substantial change in recent years with the introduction of Chain-of-Thought (CoT) prompting. This technique guides an LLM to produce step-by-step intermediate reasoning, enabling the model to exhibit a human-like structure of thought. As task complexity increases, however, the limitations of traditional CoT have become more apparent, motivating a series of follow-up methods designed to address these issues. This article presents an overview of CoT and its extensions.
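The core idea can be seen by comparing a direct prompt with a CoT-style prompt. A minimal sketch in Python: the function names are illustrative, and the zero-shot trigger phrase "Let's think step by step" is one well-known way to elicit chain-of-thought reasoning.

```python
def direct_prompt(question):
    # Plain prompt: ask the model for an answer immediately.
    return f"Q: {question}\nA:"

def cot_prompt(question):
    # Chain-of-thought prompt: cue the model to produce
    # intermediate reasoning steps before the final answer.
    return f"Q: {question}\nA: Let's think step by step."
```

In practice the CoT variant tends to help most on multi-step problems (arithmetic, logic), where the intermediate steps give the model room to work.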

LoRA: Low-Rank Adaptation of Large Language Models

Now that LLMs often have tens of billions of parameters, a single fine-tuning run can exhaust an entire GPU. LoRA (Low-Rank Adaptation of Large Language Models) offers a clever solution: instead of modifying the model’s original parameters directly, it learns new knowledge through low-rank matrices. This allows us to adapt the model’s behavior quickly and at very low cost, while still preserving its original performance.
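The idea can be sketched without any deep learning framework: freeze the pretrained weight W and add a trainable product of two thin matrices, W + (alpha/r)·B·A, as in the LoRA paper. The class below is a hypothetical toy for a single linear layer; B starts at zero so the adapted layer initially behaves exactly like the pretrained one.

```python
import random

def matmul(A, B):
    # Naive dense matrix multiply for lists of lists.
    n, k, m = len(A), len(B), len(B[0])
    return [[sum(A[i][t] * B[t][j] for t in range(k)) for j in range(m)]
            for i in range(n)]

class LoRALinear:
    """Frozen weight W plus a trainable low-rank update (alpha/r) * B @ A."""

    def __init__(self, W, r, alpha=1.0):
        d_out, d_in = len(W), len(W[0])
        self.W = W                      # frozen pretrained weight (d_out x d_in)
        self.r, self.alpha = r, alpha
        # A gets small random values, B gets zeros, so the low-rank
        # update starts at zero and training departs from the
        # pretrained behavior gradually.
        self.A = [[random.gauss(0, 0.01) for _ in range(d_in)]
                  for _ in range(r)]
        self.B = [[0.0] * r for _ in range(d_out)]

    def forward(self, x):
        # x is a column vector given as a list of single-element rows.
        base = matmul(self.W, x)
        lora = matmul(self.B, matmul(self.A, x))
        s = self.alpha / self.r
        return [[base[i][0] + s * lora[i][0]] for i in range(len(base))]
```

Only A and B (r·(d_in + d_out) numbers) would be trained, which is where the memory savings come from when r is much smaller than the layer dimensions.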

Generative Pre-trained Transformer, GPT

Over the past decade in the field of Natural Language Processing (NLP), the Generative Pre-trained Transformer (GPT) has undoubtedly been one of the most iconic technologies. GPT has not only redefined the approach to language modeling but also sparked a revolution centered around pre-training, leading to the rise of general-purpose language models. This article begins with an overview of the GPT architecture and delves into the design principles and technological evolution from GPT-1 to GPT-3.

Attention Models

An attention mechanism is a deep learning method that lets a model focus on the most relevant parts of its input when producing each piece of its output. Unlike traditional sequence models, which often struggle with longer inputs, attention allows a model to dynamically weight different parts of the input sequence when generating each part of the output sequence.
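As a concrete illustration, here is scaled dot-product attention for a single query, written in plain Python. The helpers `softmax` and `attention` are toy names for this sketch, not references to any particular library.

```python
import math

def softmax(xs):
    # Numerically stable softmax: subtract the max before exponentiating.
    m = max(xs)
    exps = [math.exp(v - m) for v in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(query, keys, values):
    """Scaled dot-product attention for one query vector.

    Scores each key against the query, turns the scores into
    weights with softmax, and returns the weighted sum of values.
    """
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]
```

When the query strongly matches one key, its weight approaches 1 and the output is dominated by the corresponding value, which is exactly the "focus on the relevant part" behavior described above.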

Sequence to Sequence Model (Seq2Seq)

The Sequence to Sequence (Seq2Seq) model is a neural network architecture that maps one sequence to another. It has revolutionized the field of Natural Language Processing (NLP), significantly enhancing the performance of tasks such as translation, text summarization, and chatbots. This article will dive deeply into the principles behind the Seq2Seq model.
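The encoder-decoder data flow can be sketched abstractly: the encoder folds the input into a fixed-size context vector, and the decoder emits output tokens one at a time, each step conditioned on that context and the previous token. The helpers below are toy stand-ins with no learned weights; all names are invented for this sketch.

```python
def encode(src_tokens, embedding, dim=4):
    """Toy encoder: fold the source sequence into one fixed-size
    context vector (a stand-in for the final RNN hidden state)."""
    state = [0.0] * dim
    for tok in src_tokens:
        vec = embedding[tok]
        # Toy recurrence: blend the previous state with the current input.
        state = [0.5 * s + 0.5 * v for s, v in zip(state, vec)]
    return state

def decode(context, start_token, step_fn, max_len=10, eos="<eos>"):
    """Toy decoder loop: generate tokens until an end-of-sequence
    marker, each step seeing the context and the previous token."""
    out, tok = [], start_token
    for _ in range(max_len):
        tok = step_fn(context, tok)
        if tok == eos:
            break
        out.append(tok)
    return out
```

The key structural point is the single context vector between the two halves; its fixed size is exactly the bottleneck that attention mechanisms were later introduced to relieve.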

Bi-directional Recurrent Neural Networks (BRNNs)

Bi-directional recurrent neural networks (BRNNs) are an extension of standard RNNs specifically designed to process sequential data in both forward and backward directions. Compared to traditional RNNs, BRNN architectures maintain more comprehensive context information, enabling them to capture useful dependencies across entire sequences for improved predictions in various natural language processing and speech recognition tasks.
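The bidirectional pass itself is simple to sketch: run one recurrence left-to-right, run another right-to-left over the reversed input, re-align the backward states to the original time order, and concatenate the two hidden states at each timestep. A toy Python illustration with hidden states as plain lists; `rnn`, `birnn`, and the step functions are hypothetical names.

```python
def rnn(inputs, step, init):
    # Run a recurrence over the sequence, collecting the state at each step.
    states, h = [], init
    for x in inputs:
        h = step(h, x)
        states.append(h)
    return states

def birnn(inputs, step_f, step_b, init):
    """Bidirectional pass: forward states plus backward states
    (re-aligned to the original order), concatenated per timestep."""
    fwd = rnn(inputs, step_f, init)
    bwd = list(reversed(rnn(list(reversed(inputs)), step_b, init)))
    return [f + b for f, b in zip(fwd, bwd)]  # list concat joins the vectors
```

Each output position thus sees a summary of everything before it (forward half) and everything after it (backward half), which is the "comprehensive context" advantage described above.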