To the Max: Reinventing Reward in Reinforcement Learning

11 citations

arXiv:2402.01361

#942 of 2635 papers in ICML 2024

Top Authors

Grigorii Veviurko, Wendelin Boehmer, Mathijs de Weerdt

Topics

reinforcement learning, reward function design, max-reward optimization, goal-reaching environments, stochastic environments, optimal policy learning

Abstract

In reinforcement learning (RL), different reward functions can define the same optimal policy but result in drastically different learning performance. For some, the agent gets stuck in suboptimal behavior, and for others, it solves the task efficiently. Choosing a good reward function is hence an extremely important yet challenging problem. In this paper, we explore an alternative approach to using rewards for learning. We introduce max-reward RL, where an agent optimizes the maximum rather than the cumulative reward. Unlike earlier works, our approach works for deterministic and stochastic environments and can be easily combined with state-of-the-art RL algorithms. In the experiments, we study the performance of max-reward RL algorithms in two goal-reaching environments from Gymnasium-Robotics and demonstrate their benefits over standard RL. The code is available at https://github.com/veviurko/To-the-Max.
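To make the contrast between the two objectives concrete, the sketch below scores one reward sequence both ways. It is an illustration only, not the authors' implementation: the function names, the example rewards, and the choice to discount each reward before taking the maximum are assumptions, since the abstract does not spell out the exact formulation.

```python
from typing import Sequence


def cumulative_return(rewards: Sequence[float], gamma: float = 0.99) -> float:
    """Standard RL objective: discounted sum of all rewards in the trajectory."""
    return sum(gamma ** t * r for t, r in enumerate(rewards))


def max_reward_return(rewards: Sequence[float], gamma: float = 0.99) -> float:
    """Max-reward objective (assumed form): the largest discounted reward.

    Raises ValueError on an empty trajectory, because max() has no default here.
    """
    return max(gamma ** t * r for t, r in enumerate(rewards))


if __name__ == "__main__":
    # Hypothetical trajectory: small shaping rewards, then a large goal reward.
    rewards = [0.1, 0.1, 0.1, 1.0]
    print("cumulative:", cumulative_return(rewards, gamma=1.0))
    print("max-reward:", max_reward_return(rewards, gamma=1.0))
```

Under the cumulative objective the small intermediate rewards add up and contribute to the score, whereas under the max-reward objective only the single best reward matters, so an agent cannot raise its return by repeatedly collecting small rewards.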
