Regret-Free Reinforcement Learning for Temporal Logic Specifications

ICML 2025

Authors

R. Majumdar, Mahmoud Salamati, Sadegh Soudjani

Abstract

Learning to control an unknown dynamical system with respect to high-level temporal specifications is an important problem in control theory. We present the first regret-free online algorithm for learning a controller for linear temporal logic (LTL) specifications for systems with unknown dynamics. We assume that the underlying (unknown) dynamics is modeled by a finite-state, finite-action Markov decision process (MDP). Our core technical result is a regret-free learning algorithm for infinite-horizon reach-avoid problems on MDPs. For general LTL specifications, we show that the synthesis problem can be reduced to a reach-avoid problem once the graph structure is known. Additionally, we provide an algorithm for learning the graph structure, assuming knowledge of a minimum transition probability, which operates independently of the main regret-free algorithm. Our LTL controller synthesis algorithm provides sharp bounds on how close we are to achieving optimal behavior after a finite number of learning episodes. In contrast, previous algorithms for LTL synthesis only provide asymptotic guarantees, which give no insight into the transient performance during the learning phase.
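
The reach-avoid objective named in the abstract (maximize the probability of reaching a target set while never entering a bad set) can be illustrated with a minimal sketch under stated assumptions. The Python below is not the authors' learning algorithm. It assumes the MDP is fully known and encoded as a hypothetical nested dictionary `P[s][a] = {successor: probability}`, and it computes the optimal reach-avoid probability by standard value iteration. This optimal value is the benchmark that a regret-free learner's episodes are measured against. The second function, again an illustrative assumption rather than the paper's procedure, shows why a known minimum transition probability `p_min` makes the graph structure learnable. A successor reached with probability at least `p_min` goes unobserved in `n` samples of its state-action pair with probability at most `(1 - p_min)**n`.

```python
import math

# Hypothetical encoding (an assumption, not from the paper): P[s][a] maps each
# successor state to its transition probability. Every state must be a key of P;
# goal and avoid states may have no actions, all other states need at least one.


def reach_avoid_values(P, goal, avoid, tol=1e-10, max_iters=100_000):
    """Optimal probability, per state, of reaching `goal` before visiting `avoid`.

    Value iteration from below: starting at 0 (1 on goal states), the iterates
    increase toward the optimal reach-avoid probability.
    """
    V = {s: 1.0 if s in goal else 0.0 for s in P}
    for _ in range(max_iters):
        new_V = {}
        for s in P:
            if s in goal:
                new_V[s] = 1.0
            elif s in avoid:
                new_V[s] = 0.0  # entering an avoid state counts as failure
            else:
                # Bellman update: best action's expected value over successors
                new_V[s] = max(
                    sum(p * V[t] for t, p in succ.items())
                    for succ in P[s].values()
                )
        change = max(abs(new_V[s] - V[s]) for s in P)
        V = new_V
        if change < tol:  # heuristic stopping rule, not a certified error bound
            break
    return V


def samples_for_support(p_min, delta):
    """Samples of one state-action pair after which a fixed successor with
    probability >= p_min has been seen at least once with probability >= 1 - delta.

    Solves (1 - p_min)**n <= delta for the smallest integer n.
    """
    if not (0.0 < p_min < 1.0 and 0.0 < delta < 1.0):
        raise ValueError("p_min and delta must lie in (0, 1)")
    return math.ceil(math.log(delta) / math.log(1.0 - p_min))


if __name__ == "__main__":
    # Toy MDP: from s0, action "b" never enters "bad" directly; action "a" risks it.
    P = {
        "s0": {"a": {"s1": 0.5, "bad": 0.5}, "b": {"s0": 0.2, "s1": 0.8}},
        "s1": {"a": {"goal": 0.9, "bad": 0.1}},
        "goal": {},
        "bad": {},
    }
    V = reach_avoid_values(P, goal={"goal"}, avoid={"bad"})
    print(round(V["s0"], 6))  # approximately 0.9 (action "b" is optimal)
    print(samples_for_support(p_min=0.1, delta=0.01))  # 44
```

Stopping once successive iterates change by less than `tol` is a common heuristic and does not by itself certify closeness to the optimum. The bound from `samples_for_support` holds per successor, so a guarantee over the whole MDP would need to combine these failure probabilities, for example with a union bound over state-action pairs and successors.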