"self-play algorithms" Papers
2 papers found
Conference
A Minimaximalist Approach to Reinforcement Learning from Human Feedback
Gokul Swamy, Christoph Dann, Rahul Kidambi et al.
ICML 2024, arXiv:2401.04056
139 citations
Learning Diverse Risk Preferences in Population-Based Self-Play
Yuhua Jiang, Qihan Liu, Xiaoteng Ma et al.
AAAI 2024, arXiv:2305.11476
8 citations