Dynamic Programming (DP)
In reinforcement learning (RL), dynamic programming (DP) is one of the earliest and most complete solution frameworks: given a full model of the environment, it computes optimal value functions and policies exactly. DP is rarely practical to apply directly to high-dimensional or continuous environments, but it exposes the mathematical foundations of the core concepts in modern RL. The convergence targets and update rules of most RL algorithms can be traced back to the Bellman equations and the Generalized Policy Iteration (GPI) framework used in DP.
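To make the GPI idea concrete, here is a minimal sketch of policy iteration, which alternates a Bellman expectation backup (evaluation) with a greedy step (improvement) until the policy stops changing. The two-state MDP, the discount factor, and every name in the block (`P`, `GAMMA`, `q_value`, `policy_evaluation`, `policy_improvement`, `policy_iteration`) are assumptions made up for illustration and do not come from the text.

```python
# Hypothetical toy MDP (an assumption for illustration only).
# P[state][action] is a list of (probability, next_state, reward) tuples.
GAMMA = 0.9
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 1.0)]},
    1: {0: [(1.0, 1, 2.0)], 1: [(1.0, 0, 0.0)]},
}


def q_value(V, s, a):
    """One-step lookahead: expected reward plus discounted next-state value."""
    return sum(p * (r + GAMMA * V[s2]) for p, s2, r in P[s][a])


def policy_evaluation(policy, tol=1e-10):
    """Sweep the Bellman expectation backup until the values stop changing."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            v_new = q_value(V, s, policy[s])
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new  # in-place update
        if delta < tol:
            return V


def policy_improvement(V):
    """Act greedily with respect to the current value estimate."""
    return {s: max(P[s], key=lambda a: q_value(V, s, a)) for s in P}


def policy_iteration():
    """Alternate evaluation and improvement until the policy is stable."""
    policy = {s: 0 for s in P}  # arbitrary starting policy
    while True:
        V = policy_evaluation(policy)
        new_policy = policy_improvement(V)
        if new_policy == policy:
            return policy, V
        policy = new_policy


if __name__ == "__main__":
    policy, V = policy_iteration()
    print(policy, V)
```

In this toy problem the stable policy moves from state 0 to state 1 and then stays there. The sketch works only because the transition table `P` is fully known, which is the model requirement that keeps DP from being applied directly to most practical environments.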
