Improved Off-policy Reinforcement Learning in Biological Sequence Design

arXiv:2410.04461 · ICML 2025 (#558 of 3,340 papers) · 10 citations

Abstract

Designing biological sequences with desired properties is challenging due to vast search spaces and limited evaluation budgets. Although reinforcement learning methods use proxy models for rapid reward evaluation, insufficient training data can cause proxy misspecification on out-of-distribution inputs. To address this, we propose a novel off-policy search, $\delta$-Conservative Search, that enhances robustness by restricting policy exploration to reliable regions. Starting from high-score offline sequences, we inject noise by randomly masking tokens with probability $\delta$, then denoise them using our policy. We further adapt $\delta$ based on the proxy's uncertainty on each data point, aligning the level of conservativeness with model confidence. Experimental results show that our conservative search consistently enhances off-policy training, outperforming existing machine learning methods in discovering high-score sequences across diverse tasks, including DNA, RNA, protein, and peptide design.
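
The abstract sketches the core loop: mask tokens with probability $\delta$, denoise with the policy, and adapt $\delta$ to proxy uncertainty. Below is a minimal Python sketch of one such step, assuming a hypothetical `policy_denoise` callable and an illustrative uncertainty-to-$\delta$ schedule; neither the mask token nor the exact adaptation rule is specified in the abstract, so treat both as placeholders rather than the authors' implementation.

```python
import numpy as np

MASK = "<mask>"  # hypothetical mask token; the paper's actual vocabulary may differ

def conservative_search_step(sequence, policy_denoise, delta, rng=None):
    """One round of delta-Conservative Search (sketch).

    Start from a high-score offline sequence, mask each token
    independently with probability delta, then let the policy
    reconstruct (denoise) the masked positions.
    """
    rng = rng or np.random.default_rng()
    tokens = list(sequence)
    mask = rng.random(len(tokens)) < delta  # noise injection
    noised = [MASK if m else t for t, m in zip(tokens, mask)]
    return policy_denoise(noised)  # policy fills in the masked positions

def adaptive_delta(proxy_std, delta_max=0.5, kappa=1.0):
    """Illustrative adaptation rule only: higher proxy uncertainty ->
    smaller delta, i.e. fewer masked tokens and a search that stays
    closer to the trusted offline sequence."""
    return delta_max * float(np.exp(-kappa * proxy_std))

# Example with a trivial stand-in policy that writes 'A' at masked positions:
seq = "ACGTACGT"
d = adaptive_delta(proxy_std=0.3)
out = conservative_search_step(seq, lambda s: ["A" if t == MASK else t for t in s], delta=d)
```

The key design point the abstract emphasizes is that $\delta$ controls how far a candidate can drift from its offline anchor, so shrinking it where the proxy is uncertain keeps exploration within regions the proxy can score reliably.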

Citation History

Jan 28, 2026: 0 citations
Feb 13, 2026: 10 citations (+10)