COLM
418 papers tracked across 1 year
Top Papers in COLM 2025
Understanding R1-Zero-Like Training: A Critical Perspective
Zichen Liu, Changyu Chen, Wenjun Li et al.
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
Bowen Jin, Hansi Zeng, Zhenrui Yue et al.
Tulu 3: Pushing Frontiers in Open Language Model Post-Training
Nathan Lambert, Jacob Morrison, Valentina Pyatkin et al.
SimpleRL-Zoo: Investigating and Taming Zero Reinforcement Learning for Open Base Models in the Wild
Weihao Zeng, Yuzhen Huang, Qian Liu et al.
LIMO: Less is More for Reasoning
Yixin Ye, Zhen Huang, Yang Xiao et al.
Training Large Language Models to Reason in a Continuous Latent Space
Shibo Hao, Sainbayar Sukhbaatar, DiJia Su et al.
Cognitive Behaviors that Enable Self-Improving Reasoners, or, Four Habits of Highly Effective STaRs
Kanishk Gandhi, Ayush K Chakravarthy, Anikait Singh et al.
FacTool: Factuality Detection in Generative AI -- A Tool Augmented Framework for Multi-Task and Multi-Domain Scenarios
Ethan Chern, Steffi Chern, Shiqi Chen et al.
L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning
Pranjal Aggarwal, Sean Welleck
PyramidKV: Dynamic KV Cache Compression based on Pyramidal Information Funneling
Zefan Cai, Yichi Zhang, Bofei Gao et al.
SmolVLM: Redefining small and efficient multimodal models
Andrés Marafioti, Orr Zohar, Miquel Farré et al.
Echo Chamber: RL Post-training Amplifies Behaviors Learned in Pretraining
Rosie Zhao, Alexandru Meterez, Sham M. Kakade et al.
Goedel-Prover: A Frontier Model for Open-Source Automated Theorem Proving
Yong Lin, Shange Tang, Bohan Lyu et al.
RWKV-7 "Goose" with Expressive Dynamic State Evolution
Bo Peng, Ruichong Zhang, Daniel Goldstein et al.
Agent S2: A Compositional Generalist-Specialist Framework for Computer Use Agents
Saaket Agashe, Kyle Wong, Vincent Tu et al.
A Sober Look at Progress in Language Model Reasoning: Pitfalls and Paths to Reproducibility
Andreas Hochlehnert, Hardik Bhatnagar, Vishaal Udandarao et al.
Why do LLMs attend to the first token?
Federico Barbero, Alvaro Arroyo, Xiangming Gu et al.
AIOS: LLM Agent Operating System
Kai Mei, Xi Zhu, Wujiang Xu et al.
LV-Eval: A Balanced Long-Context Benchmark with 5 Length Levels Up to 256K
Tao Yuan, Xuefei Ning, Dong Zhou et al.
An Illusion of Progress? Assessing the Current State of Web Agents
Tianci Xue, Weijian Qi, Tianneng Shi et al.