Papers by Qing-Shan Jia
3 papers found
Conference
CLARIFY: Contrastive Preference Reinforcement Learning for Untangling Ambiguous Queries
Ni Mu, Hao Hu, Xiao Hu et al.
ICML 2025, arXiv:2506.00388
3 citations
STAIR: Addressing Stage Misalignment through Temporal-Aligned Preference Reinforcement Learning
Yao Luan, Ni Mu, Yiqin Yang et al.
NeurIPS 2025 (oral), arXiv:2509.23802
Query-Policy Misalignment in Preference-Based Reinforcement Learning
Xiao Hu, Jianxiong Li, Xianyuan Zhan et al.
ICLR 2024 (spotlight), arXiv:2305.17400
15 citations