Poster "reinforcement learning algorithms" Papers
4 papers found
Optimizing Chain-of-Thought Reasoners via Gradient Variance Minimization in Rejection Sampling and RL
Jiarui Yao, Yifan Hao, Hanning Zhang et al.
NeurIPS 2025, arXiv:2505.02391
13 citations
Position: Benchmarking is Limited in Reinforcement Learning Research
Scott Jordan, Adam White, Bruno da Silva et al.
ICML 2024, arXiv:2406.16241
14 citations
SAPG: Split and Aggregate Policy Gradients
Jayesh Singla, Ananye Agarwal, Deepak Pathak
ICML 2024, arXiv:2407.20230
13 citations
Slow and Steady Wins the Race: Maintaining Plasticity with Hare and Tortoise Networks
Hojoon Lee, Hyeonseo Cho, Hyunseung Kim et al.
ICML 2024, arXiv:2406.02596
29 citations