Abstract
Chain-of-Thought (CoT) reasoning has demonstrated remarkable effectiveness in enhancing the reasoning abilities of large language models (LLMs). However, its efficiency remains a challenge due to excessive intermediate reasoning tokens, which introduce both semantic redundancy and unnecessarily detailed reasoning steps. Moreover, computational expense and latency remain high because inference cost scales with the number of output tokens, which include these intermediate steps. In this work, we observe that most CoT tokens are unnecessary, and retaining only a small portion of them is sufficient for high-quality responses. Inspired by this, we propose Hawkeye, a novel post-training and inference framework in which a large model produces concise CoT instructions to guide a smaller model in response generation. Hawkeye quantifies redundancy in CoT reasoning and distills high-density information via reinforcement learning. By leveraging these concise CoTs, Hawkeye expands responses while significantly reducing token usage and computational cost. Our evaluation results show that Hawkeye achieves comparable response quality using only 35\% of the complete CoTs, while improving clarity, coherence, and conciseness by approximately 10\%. Furthermore, Hawkeye accelerates end-to-end reasoning by up to 3.4× on complex math tasks while saving up to 60\% of inference cost. Hawkeye will be open-sourced, and the models will be available soon.
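To make the two-stage pipeline described above concrete, the following is a minimal sketch of the inference flow: a large model first emits a concise CoT instruction, which a smaller model then expands into the full response. The model checkpoints, prompt templates, and generation settings here are illustrative assumptions, not the released Hawkeye models or training recipe.

\begin{verbatim}
# Minimal sketch of the two-stage Hawkeye-style inference flow.
# Model names and prompts are illustrative assumptions only.
from transformers import AutoModelForCausalLM, AutoTokenizer

LARGE_MODEL = "Qwen/Qwen2.5-7B-Instruct"    # assumed "large" CoT-instruction model
SMALL_MODEL = "Qwen/Qwen2.5-1.5B-Instruct"  # assumed "small" response model

def load(name):
    tok = AutoTokenizer.from_pretrained(name)
    model = AutoModelForCausalLM.from_pretrained(name, device_map="auto")
    return tok, model

def generate(tok, model, prompt, max_new_tokens):
    inputs = tok(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    # Decode only the newly generated tokens.
    return tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)

question = "If a train travels 120 km in 1.5 hours, what is its average speed?"

# Stage 1: the large model produces a *concise* CoT instruction (kept short on purpose).
large_tok, large_model = load(LARGE_MODEL)
concise_cot = generate(
    large_tok, large_model,
    f"Give a brief step-by-step plan (no final answer) for solving:\n{question}",
    max_new_tokens=64,
)

# Stage 2: the small model expands the concise CoT into the full response.
small_tok, small_model = load(SMALL_MODEL)
answer = generate(
    small_tok, small_model,
    f"Question: {question}\nPlan: {concise_cot}\nFollow the plan and give the final answer.",
    max_new_tokens=256,
)
print(answer)
\end{verbatim}

The intended saving is that the expensive large model emits only a short instruction, while the bulk of the output tokens are produced by the cheaper small model.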