On Shallow Planning Under Partial Observability

2 citations · #1457 of 3028 papers in AAAI 2025

Abstract

Formulating a real-world problem within the reinforcement learning framework involves non-trivial design choices, such as selecting the discount factor for the learning objective (discounted cumulative reward), which determines the agent's planning horizon. This work investigates how the discount factor affects the bias-variance trade-off as a function of structural parameters of the underlying Markov Decision Process. Our results support the idea that a shorter planning horizon might be beneficial, especially under partial observability.
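
For context, the objective named in the abstract is the standard discounted return, in which the discount factor \(\gamma\) induces an effective planning horizon on the order of \(1/(1-\gamma)\). The notation below is the standard RL formulation, not taken from the paper itself:

\[
G_t = \sum_{k=0}^{\infty} \gamma^k \, r_{t+k+1},
\qquad
H_{\text{eff}} \approx \frac{1}{1-\gamma}.
\]

For example, \(\gamma = 0.99\) corresponds to an effective horizon of roughly 100 steps, while \(\gamma = 0.9\) shortens it to roughly 10.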

Citation History

Jan 27, 2026: 1
Feb 4, 2026: 1
Feb 13, 2026: 2