COLM
418 papers tracked across 1 year
Top Papers in COLM 2025
Understanding R1-Zero-Like Training: A Critical Perspective
Zichen Liu, Changyu Chen, Wenjun Li et al.
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
Bowen Jin, Hansi Zeng, Zhenrui Yue et al.
Tulu 3: Pushing Frontiers in Open Language Model Post-Training
Nathan Lambert, Jacob Morrison, Valentina Pyatkin et al.
SimpleRL-Zoo: Investigating and Taming Zero Reinforcement Learning for Open Base Models in the Wild
Weihao Zeng, Yuzhen Huang, Qian Liu et al.
LIMO: Less is More for Reasoning
Yixin Ye, Zhen Huang, Yang Xiao et al.
Training Large Language Models to Reason in a Continuous Latent Space
Shibo Hao, Sainbayar Sukhbaatar, DiJia Su et al.
Cognitive Behaviors that Enable Self-Improving Reasoners, or, Four Habits of Highly Effective STaRs
Kanishk Gandhi, Ayush K Chakravarthy, Anikait Singh et al.
FacTool: Factuality Detection in Generative AI -- A Tool Augmented Framework for Multi-Task and Multi-Domain Scenarios
Ethan Chern, Steffi Chern, Shiqi Chen et al.
L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning
Pranjal Aggarwal, Sean Welleck
PyramidKV: Dynamic KV Cache Compression based on Pyramidal Information Funneling
Zefan Cai, Yichi Zhang, Bofei Gao et al.
SmolVLM: Redefining small and efficient multimodal models
Andrés Marafioti, Orr Zohar, Miquel Farré et al.
Echo Chamber: RL Post-training Amplifies Behaviors Learned in Pretraining
Rosie Zhao, Alexandru Meterez, Sham M. Kakade et al.
Goedel-Prover: A Frontier Model for Open-Source Automated Theorem Proving
Yong Lin, Shange Tang, Bohan Lyu et al.
RWKV-7 "Goose" with Expressive Dynamic State Evolution
Bo Peng, Ruichong Zhang, Daniel Goldstein et al.
Agent S2: A Compositional Generalist-Specialist Framework for Computer Use Agents
Saaket Agashe, Kyle Wong, Vincent Tu et al.
A Sober Look at Progress in Language Model Reasoning: Pitfalls and Paths to Reproducibility
Andreas Hochlehnert, Hardik Bhatnagar, Vishaal Udandarao et al.
Why do LLMs attend to the first token?
Federico Barbero, Alvaro Arroyo, Xiangming Gu et al.
AIOS: LLM Agent Operating System
Kai Mei, Xi Zhu, Wujiang Xu et al.
LV-Eval: A Balanced Long-Context Benchmark with 5 Length Levels Up to 256K
Tao Yuan, Xuefei Ning, Dong Zhou et al.
An Illusion of Progress? Assessing the Current State of Web Agents
Tianci Xue, Weijian Qi, Tianneng Shi et al.