In Reinforcement Learning (RL), many algorithms may appear different in form, yet their core update mechanisms are highly similar. At the implementation level, they all rely on a common numerical estimation approach. This approach is not an independent algorithm, but rather a computational technique for gradually approximating an expectation. Understanding this mechanism helps clarify the fundamental differences among various reinforcement learning methods.
Incremental Implementation
In Reinforcement Learning (RL), incremental implementations are not independent algorithms, but numerical techniques for estimating expectations. The key idea is to avoid recomputing an expectation from scratch using all available data. Instead, the estimate is adjusted by a small amount each time a new sample is observed.
The difference between the target and the old estimate can be interpreted as the error of the current estimate. The essence of an incremental update is to adjust the estimate in the direction of this error.
The Mathematical Essence of Incremental Updates
Given samples $x_1, x_2, \dots, x_n$, the sample mean can be computed as follows. This is the batch form of the sample mean.

$$\bar{x}_n = \frac{1}{n} \sum_{i=1}^{n} x_i$$

However, the sample mean can also be computed in an incremental form, as shown below. This follows from rewriting the batch form as $\bar{x}_n = \frac{1}{n}\big((n-1)\,\bar{x}_{n-1} + x_n\big)$, and it is a typical incremental update.

$$\bar{x}_n = \bar{x}_{n-1} + \frac{1}{n}\left(x_n - \bar{x}_{n-1}\right)$$

In this formulation, the step size is $\alpha_n = \frac{1}{n}$, and the target is $x_n$. In the incremental form, each update can be viewed as using the newly observed sample $x_n$ as the current sample target.
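The equivalence of the two forms is easy to check numerically. A minimal sketch (the function names `batch_mean` and `incremental_mean` are my own) comparing them on the same data:

```python
import random

def batch_mean(samples):
    """Batch form: recompute the mean from all samples at once."""
    return sum(samples) / len(samples)

def incremental_mean(samples):
    """Incremental form: nudge the estimate toward each new sample.

    The adjustment is step_size * (target - estimate), i.e. a step
    in the direction of the current error.
    """
    estimate = 0.0
    for n, x in enumerate(samples, start=1):
        step_size = 1.0 / n               # alpha_n = 1/n
        estimate += step_size * (x - estimate)
    return estimate

random.seed(0)
samples = [random.gauss(5.0, 2.0) for _ in range(1000)]
print(batch_mean(samples))        # the two values agree up to floating-point error
print(incremental_mean(samples))
```

Note that the incremental form never stores the sample history: it keeps only the current estimate and the sample count, which is what makes it suitable for online interaction.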
It is worth noting that as more samples are collected, the step size $\alpha_n = \frac{1}{n}$ becomes smaller and smaller. As a result, even if new data indicate a significant change in the mean, the estimate may become sluggish and fail to track this change, because the step size is too small. Under a non-stationary data distribution, as $\alpha_n$ approaches zero, the estimate gradually loses its ability to adapt to new data. In this setting, if we wish the estimate to remain sensitive to recent samples, we can instead use a constant step size $\alpha \in (0, 1]$. The incremental update with a constant step size is given as follows:

$$\bar{x}_n = \bar{x}_{n-1} + \alpha\left(x_n - \bar{x}_{n-1}\right)$$
By expanding the above recursion, we can see that older samples have progressively less influence on the estimate:

$$\bar{x}_n = (1 - \alpha)^n\,\bar{x}_0 + \sum_{i=1}^{n} \alpha\,(1 - \alpha)^{n-i}\,x_i$$

Each sample $x_i$ receives the weight $\alpha (1 - \alpha)^{n-i}$, which decays exponentially with the sample's age; for this reason the constant-step-size estimate is sometimes called an exponential recency-weighted average.
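A short sketch (the helper names are my own) that evaluates the constant-step-size recursion and its expanded weighted-sum form on data whose mean shifts halfway through, to show that the estimate tracks the new level:

```python
def constant_step_estimate(samples, alpha, initial=0.0):
    """Constant-alpha incremental update: estimate += alpha * (target - estimate)."""
    estimate = initial
    for x in samples:
        estimate += alpha * (x - estimate)
    return estimate

def expanded_estimate(samples, alpha, initial=0.0):
    """Direct evaluation of the expanded form:
    (1-alpha)^n * initial + sum_i alpha * (1-alpha)^(n-i) * x_i.
    """
    n = len(samples)
    total = (1 - alpha) ** n * initial
    for i, x in enumerate(samples, start=1):
        total += alpha * (1 - alpha) ** (n - i) * x
    return total

# Non-stationary data: the underlying mean jumps from 1 to 10 halfway through.
samples = [1.0] * 50 + [10.0] * 50
alpha = 0.1
print(constant_step_estimate(samples, alpha))  # settles near the recent level of 10
print(expanded_estimate(samples, alpha))       # same value, computed via the expansion
```

With the decaying step size $\alpha_n = 1/n$, the estimate after the same 100 samples would sit near the overall average of 5.5 instead, illustrating why a constant step size is preferred when the distribution drifts.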
Conclusion
Incremental implementation provides a unified perspective for understanding how value estimates are updated across different reinforcement learning methods. By choosing appropriate step sizes and targets, an algorithm can continuously refine its estimate of an expectation under constraints of limited memory and online interaction. When the step size decreases with the number of samples, the estimate converges to the conventional sample mean; when a constant step size is used, the estimate retains the ability to adapt to new data in non-stationary environments. As will be seen later, different reinforcement learning methods arise primarily from defining different targets while applying the same incremental update framework.
References
- Richard S. Sutton and Andrew G. Barto. 2020. *Reinforcement Learning: An Introduction*, 2nd ed. The MIT Press.