Abstract
Chain-of-Thought (CoT) reasoning has demonstrated remarkable effectiveness in enhancing the reasoning abilities of large language models (LLMs). However, its efficiency remains a challenge due to excessive intermediate reasoning tokens, which introduce both semantic redundancy and unnecessarily detailed reasoning steps. Moreover, computational expense and latency remain high because inference cost scales with the number of output tokens, which include these intermediate steps. In this work, we observe that most CoT tokens are unnecessary, and retaining only a small portion of them is sufficient for high-quality responses. Inspired by this, we propose Hawkeye, a novel post-training and inference framework in which a large model produces concise CoT instructions to guide a smaller model in response generation. Hawkeye quantifies redundancy in CoT reasoning and distills high-density information via reinforcement learning. By leveraging these concise CoTs, Hawkeye expands responses while significantly reducing token usage and computational cost. Our evaluation results show that Hawkeye achieves comparable response quality using only 35\% of the complete CoTs, while improving clarity, coherence, and conciseness by approximately 10\%. Furthermore, Hawkeye accelerates end-to-end reasoning by up to 3.4× on complex math tasks while saving up to 60\% of inference cost. Hawkeye will be open-sourced, and the models will be available soon.
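To make the two-stage pipeline described above concrete, the following is a minimal sketch of the inference flow: a large model first emits a concise CoT instruction, which a smaller model then expands into the full response. The model checkpoints, prompt templates, and generation settings here are illustrative assumptions, not the released Hawkeye models or training recipe.

\begin{verbatim}
# Minimal sketch of the two-stage Hawkeye-style inference flow.
# Model names and prompts are illustrative assumptions only.
from transformers import AutoModelForCausalLM, AutoTokenizer

LARGE_MODEL = "Qwen/Qwen2.5-7B-Instruct"    # assumed "large" CoT-instruction model
SMALL_MODEL = "Qwen/Qwen2.5-1.5B-Instruct"  # assumed "small" response model

def load(name):
    tok = AutoTokenizer.from_pretrained(name)
    model = AutoModelForCausalLM.from_pretrained(name, device_map="auto")
    return tok, model

def generate(tok, model, prompt, max_new_tokens):
    inputs = tok(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    # Decode only the newly generated tokens.
    return tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)

question = "If a train travels 120 km in 1.5 hours, what is its average speed?"

# Stage 1: the large model produces a *concise* CoT instruction (kept short on purpose).
large_tok, large_model = load(LARGE_MODEL)
concise_cot = generate(
    large_tok, large_model,
    f"Give a brief step-by-step plan (no final answer) for solving:\n{question}",
    max_new_tokens=64,
)

# Stage 2: the small model expands the concise CoT into the full response.
small_tok, small_model = load(SMALL_MODEL)
answer = generate(
    small_tok, small_model,
    f"Question: {question}\nPlan: {concise_cot}\nFollow the plan and give the final answer.",
    max_new_tokens=256,
)
print(answer)
\end{verbatim}

The intended saving is that the expensive large model emits only a short instruction, while the bulk of the output tokens are produced by the cheaper small model.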