Oral "reinforcement learning from human feedback" Papers
4 papers found
A Snapshot of Influence: A Local Data Attribution Framework for Online Reinforcement Learning
Yuzheng Hu, Fan Wu, Haotian Ye et al.
NeurIPS 2025, oral, arXiv:2505.19281
3 citations
MACPO: Weak-to-Strong Alignment via Multi-Agent Contrastive Preference Optimization
Yougang Lyu, Lingyong Yan, Zihan Wang et al.
ICLR 2025, oral, arXiv:2410.07672
16 citations
MA-RLHF: Reinforcement Learning from Human Feedback with Macro Actions
Yekun Chai, Haoran Sun, Huang Fang et al.
ICLR 2025, oral, arXiv:2410.02743
9 citations
CogBench: a large language model walks into a psychology lab
Julian Coda-Forno, Marcel Binz, Jane Wang et al.
ICML 2024, oral, arXiv:2402.18225
57 citations