Incremental Implementation

Photo by Andriyko Podilnyk on Unsplash
In Reinforcement Learning (RL), many algorithms may appear different in form, yet their core update mechanisms are highly similar. At the implementation level, they all rely on a common numerical estimation approach. This approach is not an independent algorithm, but rather a computational technique for gradually approximating an expectation. Understanding this mechanism helps clarify the fundamental differences among various reinforcement learning methods.

Incremental Implementation

In Reinforcement Learning (RL), incremental implementations are not independent algorithms, but numerical techniques for estimating expectations. The key idea is to avoid recomputing an expectation from scratch using all available data. Instead, the estimate is adjusted by a small amount each time a new sample is observed.

\text{new estimate} \space \leftarrow \space \text{old estimate} \space + \space \text{step size} \space (\text{target} \space - \space \text{old estimate})

The difference between the target and the old estimate can be interpreted as the error of the current estimate. The essence of an incremental update is to adjust the estimate in the direction of this error.

\text{error} \space = \space (\text{target} \space - \space \text{old estimate})
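As a minimal sketch, the generic update rule above can be expressed as a small Python function (the names `incremental_update`, `old_estimate`, `target`, and `step_size` are illustrative, not from any particular library):

```python
def incremental_update(old_estimate, target, step_size):
    """Move the estimate toward the target by a fraction of the error."""
    error = target - old_estimate          # error = target - old estimate
    return old_estimate + step_size * error

# One update: start at 0.0, observe target 10.0, step size 0.1
print(incremental_update(0.0, 10.0, 0.1))  # → 1.0
```

Every method discussed later plugs its own choice of target and step size into this same one-line update.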

The Mathematical Essence of Incremental Updates

Given samples X_1, X_2, \cdots, X_n, the sample mean can be computed as follows. This is the batch form of the sample mean.

\displaystyle \bar{X}_n = \frac{1}{n} \sum_{i=1}^n X_i

However, the sample mean can also be computed in an incremental form, as shown below. This is a typical incremental update.

\displaystyle \bar{X}_n = \frac{1}{n} \sum_{i=1}^n X_i \\\\ \hphantom{\bar{X}_n} = \frac{1}{n} (X_n + \sum_{i=1}^{n-1} X_i) \\\\ \hphantom{\bar{X}_n} = \frac{1}{n} \Big( X_n + (n-1) \frac{1}{n-1} \sum_{i=1}^{n-1} X_i \Big) \\\\ \hphantom{\bar{X}_n} = \frac{1}{n} \Big( X_n + (n-1) \bar{X}_{n-1} \Big) \\\\ \hphantom{\bar{X}_n} = \frac{1}{n} (X_n + n\bar{X}_{n-1} - \bar{X}_{n-1}) \\\\ \hphantom{\bar{X}_n} = \bar{X}_{n-1} + \frac{1}{n} (X_n - \bar{X}_{n-1})

In this formulation, the step size is \alpha_n = \frac{1}{n}, and the target is X_n. In the incremental form, each update can be viewed as using the newly observed sample X_n as the current sample target.
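A short Python sketch of this \frac{1}{n}-schedule update (the function name `incremental_mean` is illustrative) shows that it reproduces the batch sample mean while storing only the running estimate and a counter:

```python
def incremental_mean(samples):
    """Sample mean via the update mean_n = mean_{n-1} + (1/n)(X_n - mean_{n-1})."""
    mean = 0.0
    for n, x in enumerate(samples, start=1):
        mean += (x - mean) / n   # step size 1/n, target X_n
    return mean

samples = [1.0, 2.0, 3.0, 4.0]
batch_mean = sum(samples) / len(samples)
assert abs(incremental_mean(samples) - batch_mean) < 1e-12
```

Unlike the batch form, this version never needs to retain past samples, which is what makes it usable during online interaction.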

It is worth noting that as more samples are collected, the step size \alpha_n becomes smaller and smaller. As a result, even if new data indicate a significant change in the mean, the estimate may become sluggish and fail to track this change because the step size is too small. Under a non-stationary data distribution, as \alpha_n = \frac{1}{n} approaches zero, the estimate gradually loses its ability to adapt to new data. In this setting, if we wish the estimate to remain sensitive to recent samples, we can instead use a constant step size \alpha. The incremental update with a constant step size is given as follows:

\bar{X}_n = \bar{X}_{n-1} + \alpha (X_n - \bar{X}_{n-1})

By expanding the above equation, we can see that older samples have progressively less influence on the estimate.

\displaystyle \bar{X}_n = \bar{X}_{n-1} + \alpha (X_n - \bar{X}_{n-1}) \\\\ \hphantom{\bar{X}_n} = \alpha X_n + (1-\alpha)\bar{X}_{n-1} \\\\ \hphantom{\bar{X}_n} = \alpha X_n + (1-\alpha) \Big( \alpha X_{n-1} + (1-\alpha) \bar{X}_{n-2} \Big) \\\\ \hphantom{\bar{X}_n} = \alpha X_n + (1-\alpha) \alpha X_{n-1} + (1-\alpha)^2 \bar{X}_{n-2} \\\\ \hphantom{\bar{X}_n} = \alpha X_n + (1-\alpha) \alpha X_{n-1} + (1-\alpha)^2 \alpha X_{n-2} + \cdots + (1-\alpha)^{n-2} \alpha X_2 + (1-\alpha)^{n-1} \bar{X}_1 \\\\ \hphantom{\bar{X}_n} = (1-\alpha)^{n-1} \bar{X}_1 + \sum_{i=2}^n \alpha (1-\alpha)^{n-i} X_i
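The practical difference between the two step-size schedules can be seen on a non-stationary stream. In the sketch below (the function name `track` and the jump-in-mean stream are illustrative assumptions), `step_size=None` selects the \frac{1}{n} schedule and a float selects a constant \alpha:

```python
def track(samples, step_size=None):
    """Incremental estimate. step_size=None uses the 1/n schedule
    (sample mean); a float uses a constant step size (recency weighting)."""
    estimate = 0.0
    for n, x in enumerate(samples, start=1):
        alpha = (1.0 / n) if step_size is None else step_size
        estimate += alpha * (x - estimate)
    return estimate

# Non-stationary stream: the underlying mean jumps from 0 to 10 halfway.
stream = [0.0] * 500 + [10.0] * 500
print(track(stream))        # 1/n schedule: stays near 5.0, the overall mean
print(track(stream, 0.1))   # constant alpha: converges close to 10.0
```

With \alpha_n = \frac{1}{n}, the estimate is the overall sample mean and never catches up to the new level; with a constant \alpha, old samples are exponentially discounted and the estimate tracks the jump.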

Conclusion

Incremental implementation provides a unified perspective for understanding how value estimates are updated across different reinforcement learning methods. By choosing appropriate step sizes and targets, an algorithm can continuously refine its estimate of an expectation under constraints of limited memory and online interaction. When the step size decreases with the number of samples, the estimate converges to the conventional sample mean; when a constant step size is used, the estimate retains the ability to adapt to new data in non-stationary environments. As will be seen later, different reinforcement learning methods arise primarily from defining different targets while applying the same incremental update framework.
