
Attention Models

An attention mechanism is a deep learning technique that lets a model focus on the most relevant parts of its input when producing each piece of its output. Unlike traditional sequence models, which often struggle with longer inputs, attention allows a model to dynamically focus on different parts of the input sequence when generating each part of the output sequence.
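
To make the idea concrete, here is a minimal NumPy sketch of scaled dot-product attention, one common formulation of the mechanism; the names and dimensions below are illustrative, not taken from the article.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Return the attention output and weights for query/key/value matrices."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)     # similarity of each query to each key
    weights = softmax(scores, axis=-1)  # one distribution over inputs per query
    return weights @ V, weights         # output: weighted mix of the values

# Toy example: 2 output steps attending over 4 input positions.
rng = np.random.default_rng(0)
Q = rng.normal(size=(2, 8))  # queries (one per output step)
K = rng.normal(size=(4, 8))  # keys (one per input position)
V = rng.normal(size=(4, 8))  # values carrying the input content
out, w = scaled_dot_product_attention(Q, K, V)
print(w.sum(axis=-1))  # each row of attention weights sums to 1
```

Each row of `w` shows how strongly one output step attends to each input position, which is exactly the "dynamic focus" described above.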

Sequence to Sequence Model (Seq2Seq)

The Sequence to Sequence (Seq2Seq) model is a neural network architecture that maps one sequence to another. It has revolutionized the field of Natural Language Processing (NLP), significantly improving performance on tasks such as translation, text summarization, and chatbots. This article will dive deeply into the principles behind the Seq2Seq model.
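
As a rough sketch of the architecture, the following PyTorch snippet (an illustration assumed by this summary, not code from the article) wires an encoder GRU to a decoder GRU through the encoder's final hidden state:

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    """Minimal encoder-decoder: the encoder's final hidden state seeds the decoder."""
    def __init__(self, src_vocab, tgt_vocab, emb=32, hidden=64):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb)
        self.encoder = nn.GRU(emb, hidden, batch_first=True)
        self.decoder = nn.GRU(emb, hidden, batch_first=True)
        self.out = nn.Linear(hidden, tgt_vocab)

    def forward(self, src, tgt):
        _, h = self.encoder(self.src_emb(src))           # h summarizes the source
        dec_out, _ = self.decoder(self.tgt_emb(tgt), h)  # decoder conditioned on h
        return self.out(dec_out)                         # logits over target vocab

# Toy batch: 2 source sequences of length 5, target length 6 (teacher forcing).
model = Seq2Seq(src_vocab=100, tgt_vocab=120)
src = torch.randint(0, 100, (2, 5))
tgt = torch.randint(0, 120, (2, 6))
print(model(src, tgt).shape)  # torch.Size([2, 6, 120])
```

The key design point is the handoff: the entire source sequence is compressed into `h`, and the decoder generates the target sequence conditioned on that summary.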

Bi-directional Recurrent Neural Networks (BRNNs)

Bi-directional recurrent neural networks (BRNNs) are an extension of standard RNNs designed to process sequential data in both the forward and backward directions. Compared to traditional RNNs, BRNN architectures maintain more comprehensive context, enabling them to capture useful dependencies across the entire sequence and improve predictions in various natural language processing and speech recognition tasks.
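
A minimal PyTorch illustration of the idea, assuming an LSTM cell (the article may use a different recurrent unit): a bidirectional layer reads the sequence left-to-right and right-to-left and concatenates the two hidden states at every time step.

```python
import torch
import torch.nn as nn

# bidirectional=True runs one RNN forward and one backward over the sequence;
# their per-step hidden states are concatenated in the output.
birnn = nn.LSTM(input_size=16, hidden_size=32, batch_first=True,
                bidirectional=True)

x = torch.randn(4, 10, 16)  # batch of 4 sequences, 10 steps, 16 features each
out, (h, c) = birnn(x)
print(out.shape)  # torch.Size([4, 10, 64]): 32 forward + 32 backward per step
```

Because each position's output mixes both directions, a prediction at step t can depend on context before and after t, which is what a standard (unidirectional) RNN cannot do.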

GloVe Word Embeddings

GloVe is a word embedding model that constructs word vectors based on global co-occurrence statistics. Unlike Word2Vec, which relies on local context windows, GloVe captures the overall statistical relationships between words through matrix factorization. This approach enables GloVe to generate high-quality word representations that effectively encode semantic and syntactic relationships. This article will introduce the principles and training methods of GloVe.
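
For a flavor of the training objective, here is a toy NumPy sketch of GloVe's weighted least-squares loss over a co-occurrence matrix; the random counts, small dimensions, and plain gradient-descent loop are illustrative simplifications, not the article's implementation.

```python
import numpy as np

def weight_fn(x, x_max=100.0, alpha=0.75):
    # GloVe's weighting f(X_ij): down-weights rare pairs, caps frequent ones at 1.
    return np.where(x < x_max, (x / x_max) ** alpha, 1.0)

# Toy co-occurrence counts X[i, j] for a 5-word vocabulary.
rng = np.random.default_rng(0)
X = rng.integers(1, 50, size=(5, 5)).astype(float)

d, lr = 8, 0.05
W = rng.normal(scale=0.1, size=(5, d))    # word vectors
Wc = rng.normal(scale=0.1, size=(5, d))   # context vectors
b, bc = np.zeros(5), np.zeros(5)          # word and context biases

for step in range(500):
    # Model residual: w_i . w~_j + b_i + b~_j - log X_ij
    diff = W @ Wc.T + b[:, None] + bc[None, :] - np.log(X)
    wdiff = weight_fn(X) * diff           # residual weighted by f(X_ij)
    # Gradients of the weighted squared loss (constant factor folded into lr).
    gW, gWc = wdiff @ Wc, wdiff.T @ W
    W, Wc = W - lr * gW, Wc - lr * gWc
    b -= lr * wdiff.sum(axis=1)
    bc -= lr * wdiff.sum(axis=0)

final = W @ Wc.T + b[:, None] + bc[None, :] - np.log(X)
print(np.sum(weight_fn(X) * final ** 2))  # weighted loss, small after training
```

The factorization view is visible in the residual line: the model pushes the dot products of word and context vectors toward the log co-occurrence counts.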

Word2Vec Word Embedding Model

Word2Vec is a model for learning word embeddings: it uses a neural network to encode words and their semantics as vectors. Word2Vec provides two training methods, CBOW and Skip-gram, and improves efficiency through negative sampling and subsampling techniques. This article will introduce the basic principles and training methods of Word2Vec.
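
As a quick usage sketch, the snippet below trains a Skip-gram model with negative sampling using the gensim library; gensim is a choice made here for illustration, and the tiny corpus is made up, so treat the output as a toy result.

```python
from gensim.models import Word2Vec

sentences = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "chased", "the", "cat"],
    ["dogs", "and", "cats", "are", "pets"],
]

# sg=1 selects Skip-gram (sg=0 would be CBOW); negative=5 enables negative
# sampling with 5 noise words; sample=1e-3 subsamples very frequent words.
model = Word2Vec(sentences, vector_size=50, window=2, sg=1,
                 negative=5, sample=1e-3, min_count=1, epochs=50)

print(model.wv.most_similar("cat", topn=3))  # nearest neighbors by cosine
```

With a real corpus, the learned vectors place semantically related words close together, which is the property the article explores.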