
Bradley-Terry Model

In many machine learning and decision-making systems, what we encounter is not a directly measurable quality score, but rather a large number of preference judgments in the form of pairwise comparisons, that is, judgments about which of two options is better. Although such pairwise comparison data is simple in form, it implicitly contains rich structural information. Starting from its probabilistic semantics, this article explains step by step how the Bradley–Terry model can transform these preference comparisons into a learnable representation of latent utilities.
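As a quick sketch of the core idea (the function name and scores below are illustrative, not from the article): the Bradley–Terry model assigns each option a latent utility, and the probability that option i beats option j is a logistic function of the utility difference.

```python
import math

def bt_prob(s_i: float, s_j: float) -> float:
    """Bradley-Terry: P(i beats j) as a logistic function of the
    latent-utility difference s_i - s_j."""
    return 1.0 / (1.0 + math.exp(-(s_i - s_j)))

# Equal utilities give even odds; a higher utility gives a higher win probability.
p_even = bt_prob(1.0, 1.0)   # 0.5
p_high = bt_prob(2.0, 0.0)   # ~0.88
```

Because the comparison likelihood is a sigmoid of a score difference, fitting the utilities by maximum likelihood reduces to logistic regression on the observed wins and losses.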

Entropy

In probabilistic modeling and machine learning, entropy is a fundamental concept for quantifying uncertainty. It not only describes the inherent randomness of data, but also implicitly captures the minimum information cost required in prediction and modeling. Many learning objectives that may appear different on the surface, such as maximizing log-likelihood or designing loss functions, can in fact be traced back and understood through the lens of entropy.
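As a minimal illustration (the helper below is a sketch, not code from the article): Shannon entropy H(p) = -Σ p_i log2 p_i measures, in bits, how unpredictable a distribution is.

```python
import math

def entropy(p):
    """Shannon entropy in bits: H(p) = -sum_i p_i * log2(p_i).
    Terms with p_i = 0 contribute nothing, by convention."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

h_fair = entropy([0.5, 0.5])   # 1.0 bit: maximal uncertainty over two outcomes
h_skew = entropy([0.9, 0.1])   # < 1 bit: a biased coin is more predictable
```

The same quantity is the minimum expected code length per symbol, which is why log-likelihood objectives can be read as minimizing an information cost.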

Byte-Pair Encoding

Byte-Pair Encoding (BPE) is a frequency-based symbol merging algorithm that was originally proposed as a data compression method. In natural language processing (NLP), BPE has been reinterpreted as a subword tokenization technique that strikes a balance between characters and full words. By automatically learning high-frequency fragments from data, BPE can effectively construct a scalable vocabulary without relying on any language-specific knowledge.
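One merge iteration can be sketched as follows (the corpus and helper names are illustrative): count adjacent symbol pairs over the corpus, pick the most frequent pair, and replace it everywhere with a new merged symbol.

```python
from collections import Counter

def most_frequent_pair(words):
    """Count adjacent symbol pairs over a corpus of tokenized words.
    Keys are tuples of symbols; values are word frequencies."""
    pairs = Counter()
    for word, freq in words.items():
        for a, b in zip(word, word[1:]):
            pairs[(a, b)] += freq
    return max(pairs, key=pairs.get)

def merge_pair(words, pair):
    """Replace every occurrence of `pair` with the merged symbol."""
    merged = "".join(pair)
    out = {}
    for word, freq in words.items():
        new_word, i = [], 0
        while i < len(word):
            if i + 1 < len(word) and (word[i], word[i + 1]) == pair:
                new_word.append(merged)
                i += 2
            else:
                new_word.append(word[i])
                i += 1
        out[tuple(new_word)] = freq
    return out

corpus = {("l", "o", "w"): 5, ("l", "o", "t"): 3}
pair = most_frequent_pair(corpus)        # ('l', 'o'), total count 8
corpus = merge_pair(corpus, pair)        # {('lo', 'w'): 5, ('lo', 't'): 3}
```

Repeating this loop a fixed number of times yields the learned merge table; the vocabulary size is controlled simply by the number of merges.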

Policy Gradient

In RL control problems, most methods take value functions as the core learning object, improving the policy indirectly by estimating long-term returns. However, when the state or action space becomes continuous, or when the policy itself must remain stochastic, this approach becomes less direct. Policy gradient methods adopt a different perspective by treating the policy itself as the object of optimization, directly performing gradient ascent on the expected return.
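As a toy sketch of this idea (everything below, including the bandit setup and function names, is illustrative rather than from the article): REINFORCE on a two-armed bandit performs stochastic gradient ascent on expected reward through a softmax policy over action preferences.

```python
import math
import random

def softmax(prefs):
    exps = [math.exp(p) for p in prefs]
    z = sum(exps)
    return [e / z for e in exps]

def reinforce_bandit(true_means, steps=2000, alpha=0.1, seed=0):
    """REINFORCE on a multi-armed bandit: sample an action from a
    softmax policy, observe a noisy reward, and ascend the score-function
    gradient  grad log pi(a)  weighted by that reward."""
    rng = random.Random(seed)
    prefs = [0.0] * len(true_means)
    for _ in range(steps):
        probs = softmax(prefs)
        a = rng.choices(range(len(prefs)), weights=probs)[0]
        r = rng.gauss(true_means[a], 1.0)
        # grad of log pi(a) w.r.t. preference k is 1{k == a} - probs[k]
        for k in range(len(prefs)):
            prefs[k] += alpha * r * ((1.0 if k == a else 0.0) - probs[k])
    return softmax(prefs)

probs = reinforce_bandit([1.0, 0.0])  # the policy should come to favor arm 0
```

Note that the policy stays stochastic throughout, and no value function is ever estimated; the expected return is improved directly through the policy parameters.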

On-Policy Control with Approximation

In practical control problems, the state and action spaces are often high-dimensional, continuous, and noisy, which makes tabular reinforcement learning algorithms difficult to apply directly. Once function approximation is introduced, two components that are cleanly separated in theory, value evaluation and policy improvement, become tightly intertwined, bringing challenges related to stability and variance. This article focuses on on-policy control methods under function approximation, with particular attention to Sarsa.
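A single semi-gradient Sarsa update with a linear action-value function can be sketched like this (the feature vectors and names below are illustrative):

```python
def q_hat(w, x):
    """Linear action-value estimate q(s, a) = w . x(s, a)."""
    return sum(wi * xi for wi, xi in zip(w, x))

def sarsa_update(w, x, r, x_next, alpha, gamma):
    """One semi-gradient Sarsa step: the TD target bootstraps on the
    features x_next of the *next on-policy action*; for a linear q the
    gradient with respect to w is just x."""
    td_error = r + gamma * q_hat(w, x_next) - q_hat(w, x)
    return [wi + alpha * td_error * xi for wi, xi in zip(w, x)]

w = [0.0, 0.0]
w = sarsa_update(w, x=[1.0, 0.0], r=1.0, x_next=[0.0, 1.0], alpha=0.5, gamma=0.9)
# td_error = 1 + 0.9 * 0 - 0 = 1  ->  w = [0.5, 0.0]
```

Because the next action is chosen by the same (e.g. epsilon-greedy) policy being evaluated, evaluation and improvement happen through the same stream of experience, which is exactly where the stability questions enter.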

On-Policy Prediction with Approximation

This chapter focuses on on-policy prediction with approximation, systematically organizing the learning objectives for value estimation under this setting, the feasible learning methods, and the solutions to which they actually converge. By contrasting Gradient Monte Carlo with Semi-Gradient TD(0), we will see the unavoidable trade-offs that arise between theoretically well-defined objectives and methods that are practically viable.
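The contrast can be made concrete with the two update rules side by side, for a linear value function v(s) = w . x(s) (the code below is a sketch with illustrative names, not from the chapter):

```python
def gradient_mc_update(w, x, G, alpha):
    """Gradient Monte Carlo: regress v(s) = w . x toward the observed
    return G. This is true SGD on the mean-squared value error."""
    err = G - sum(wi * xi for wi, xi in zip(w, x))
    return [wi + alpha * err * xi for wi, xi in zip(w, x)]

def semi_gradient_td0_update(w, x, r, x_next, alpha, gamma):
    """Semi-gradient TD(0): the target r + gamma * v(s') bootstraps on w
    itself, and that dependence is ignored when taking the gradient,
    hence 'semi-gradient'."""
    v = lambda feat: sum(wi * xi for wi, xi in zip(w, feat))
    err = r + gamma * v(x_next) - v(x)
    return [wi + alpha * err * xi for wi, xi in zip(w, x)]

w = [0.0, 0.0]
w = gradient_mc_update(w, x=[1.0, 0.0], G=2.0, alpha=0.5)  # err = 2 -> w = [1.0, 0.0]
```

The MC update has an unbiased target but high variance and must wait for the return; the TD update is available every step but converges to a fixed point of the bootstrapped objective rather than the minimum of the value error.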

Dyna Architecture

In reinforcement learning (RL), an agent often needs to learn an effective decision policy under conditions where real interactions with the environment are limited and costly. Relying solely on real experience is conceptually straightforward, but it is often constrained by poor data efficiency and slow learning speed. Conversely, relying entirely on planning with a model may introduce bias when the model is inaccurate. The Dyna architecture was proposed to strike a balance between these two extremes by integrating acting, learning, and planning within a single learning process.
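The integration of acting, learning, and planning can be sketched as a Dyna-Q loop on a deterministic toy MDP (the MDP, the function, and all names below are illustrative assumptions, not from the article):

```python
import random

def dyna_q(transitions, start, goal, episodes=50, n_planning=20,
           alpha=0.1, gamma=0.95, eps=0.1, seed=0):
    """Dyna-Q on a deterministic toy MDP given as
    transitions[(s, a)] = (next_state, reward).
    Each real step does (1) a direct Q-learning update and
    (2) n_planning simulated updates replayed from a learned model."""
    rng = random.Random(seed)
    Q, model = {}, {}
    actions = sorted({a for (_, a) in transitions})
    for _ in range(episodes):
        s = start
        while s != goal:
            # epsilon-greedy over the actions available in s
            acts = [a for a in actions if (s, a) in transitions]
            if rng.random() < eps:
                a = rng.choice(acts)
            else:
                a = max(acts, key=lambda act: Q.get((s, act), 0.0))
            s2, r = transitions[(s, a)]
            # (1) learn from the real transition
            best = max((Q.get((s2, b), 0.0) for b in actions), default=0.0)
            Q[(s, a)] = Q.get((s, a), 0.0) + alpha * (r + gamma * best - Q.get((s, a), 0.0))
            # (2) remember it, then plan from the model
            model[(s, a)] = (s2, r)
            for _ in range(n_planning):
                (ps, pa), (ps2, pr) = rng.choice(list(model.items()))
                pb = max((Q.get((ps2, b), 0.0) for b in actions), default=0.0)
                Q[(ps, pa)] = Q.get((ps, pa), 0.0) + alpha * (pr + gamma * pb - Q.get((ps, pa), 0.0))
            s = s2
    return Q

# A three-state chain: 0 -> 1 -> 2 (goal), with reward 1 on reaching the goal.
transitions = {
    (0, "right"): (1, 0.0), (0, "left"): (0, 0.0),
    (1, "right"): (2, 1.0), (1, "left"): (0, 0.0),
}
Q = dyna_q(transitions, start=0, goal=2)
```

Each unit of real experience is thus reused many times through the model, which is exactly how Dyna buys data efficiency, at the cost of inheriting any model bias.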

Temporal-Difference Learning (TD)

In Reinforcement Learning (RL), Dynamic Programming (DP) offers the most complete and mathematically explicit solution framework. However, its reliance on a known environment model makes it difficult to apply directly to real-world settings. Monte Carlo (MC) methods, in contrast, learn from experience without requiring a model, but they must wait until the end of an entire episode before performing updates, resulting in relatively coarse learning granularity. Temporal Difference (TD) learning represents a compromise between these two approaches: it does not require a model, yet it can update value estimates incrementally after each interaction step.
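The tabular TD(0) update makes this incremental character explicit (a minimal sketch; the states and numbers are illustrative):

```python
def td0_update(V, s, r, s_next, alpha, gamma):
    """Tabular TD(0): after a single transition (s, r, s'), move V(s)
    toward the bootstrapped target r + gamma * V(s')."""
    V[s] = V[s] + alpha * (r + gamma * V[s_next] - V[s])
    return V

V = {"A": 0.0, "B": 0.5}
V = td0_update(V, "A", r=1.0, s_next="B", alpha=0.1, gamma=0.9)
# target = 1.0 + 0.9 * 0.5 = 1.45  ->  V["A"] = 0.0 + 0.1 * 1.45 = 0.145
```

Unlike Monte Carlo, the update happens immediately after one step, using the current estimate V(s') in place of the rest of the return.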

Incremental Implementation

In Reinforcement Learning (RL), many algorithms may appear different in form, yet their core update mechanisms are highly similar. At the implementation level, they all rely on a common numerical estimation approach. This approach is not an independent algorithm, but rather a computational technique for gradually approximating an expectation. Understanding this mechanism helps clarify the fundamental differences among various reinforcement learning methods.
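The shared mechanism is the incremental sample-average update, which can be sketched in a few lines (names are mine, not from the article):

```python
def incremental_mean(Q, n, reward):
    """Incremental sample average:
    Q_{n+1} = Q_n + (1/n) * (R_n - Q_n),
    equal to the mean of the first n rewards but using O(1) memory."""
    return Q + (reward - Q) / n

Q, n = 0.0, 0
for r in [2.0, 4.0, 6.0]:
    n += 1
    Q = incremental_mean(Q, n, r)
# Q == 4.0, the mean of the three rewards
```

Replacing the shrinking step size 1/n with a constant alpha gives the familiar "old estimate plus step size times error" form that nearly every RL update rule instantiates.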

Monte Carlo Methods (MC)

In Dynamic Programming (DP), having a complete environment model is a prerequisite for exact computation. However, this assumption rarely holds in most real-world problems. Monte Carlo (MC) methods choose to forgo reliance on an explicit model and instead learn directly from complete experiences generated through interaction with the environment. By sampling and averaging episode returns, MC provides a practical pathway for estimating value functions grounded in actual experience.
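First-visit MC prediction can be sketched as follows (the episode encoding and names below are illustrative assumptions): compute the return backward through each complete episode, then average the returns that follow the first visit to each state.

```python
def first_visit_mc(episodes, gamma=1.0):
    """First-visit Monte Carlo prediction. Each episode is a list of
    (state, reward) steps, where reward is received on leaving the state.
    V(s) is the average return following the first visit to s."""
    returns = {}
    for episode in episodes:
        # compute returns G_t backward through the episode
        G, gs = 0.0, []
        for s, r in reversed(episode):
            G = r + gamma * G
            gs.append((s, G))
        gs.reverse()
        # record only the return from the first visit to each state
        seen = set()
        for s, G in gs:
            if s not in seen:
                seen.add(s)
                returns.setdefault(s, []).append(G)
    return {s: sum(v) / len(v) for s, v in returns.items()}

V = first_visit_mc([[("A", 0.0), ("B", 1.0)], [("A", 2.0)]])
# episode 1: G(A) = 0 + 1 = 1, G(B) = 1; episode 2: G(A) = 2
# -> V["A"] = 1.5, V["B"] = 1.0
```

No transition model appears anywhere: the estimate is built entirely from sampled episode returns, which is the defining trait of MC methods.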