Papers by Tim Rocktaeschel
4 papers found
Conference
BALROG: Benchmarking Agentic LLM and VLM Reasoning On Games
Davide Paglieri, Bartłomiej Cupiał, Samuel Coward et al.
ICLR 2025, arXiv:2411.13543
74 citations
Procedural Knowledge in Pretraining Drives Reasoning in Large Language Models
Laura Ruis, Maximilian Mozes, Juhan Bae et al.
ICLR 2025, arXiv:2411.12580
28 citations
H-GAP: Humanoid Control with a Generalist Planner
Zhengyao Jiang, Yingchen Xu, Nolan Wagener et al.
ICLR 2024 (spotlight), arXiv:2312.02682
11 citations
Mechanistically analyzing the effects of fine-tuning on procedurally defined tasks
Samyak Jain, Robert Kirk, Ekdeep Singh Lubana et al.
ICLR 2024, arXiv:2311.12786
97 citations