AdaptiveStep: Automatically Dividing Reasoning Step through Model Confidence

14 citations

arXiv:2502.13943

ICML 2025 (ranked #406 of 3340 papers by citations)

Top Authors

Yuliang Liu, Junjie Lu, Chaofeng Qu, Zhaoling Chen, Zefan Cai, Jason Liu, Chonghan Liu, Yunhui Xia, Li Zhao, Jiang Bian, Chuheng Zhang, Wei Shen, Zhouhan Lin

Abstract

Current approaches for training Process Reward Models (PRMs) often decompose responses into multiple reasoning steps using rule-based techniques, such as predefined placeholder tokens or a fixed step length. These approaches overlook the fact that the words at which such rules split a response do not usually mark true decision points. To address this, we propose AdaptiveStep, a method that divides reasoning steps based on the model's confidence in predicting the next word. This division provides more decision-making information at each step and improves downstream tasks such as reward model training. Moreover, our method requires no manual annotation. In experiments on mathematical reasoning and code generation, the resulting AdaptiveStep-trained PRM achieves state-of-the-art Best-of-N performance, surpasses the greedy search strategy when used for token-level value-guided decoding, and reduces construction costs by over 30% compared with existing open-source PRMs. We also provide a thorough analysis and case study of its performance, transferability, and generalization capabilities. Our code is available at https://github.com/Lux0926/ASPRM.
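The core idea of ending steps at low-confidence tokens can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions, not the authors' implementation from the repository above: "confidence" is taken to be the probability the model assigned to each generated token, a step is closed immediately after any token whose confidence is at or below a threshold, and the threshold is chosen by a hypothetical helper `confidence_threshold` that takes an empirical quantile so that roughly a chosen fraction of tokens (the hypothetical parameter `break_ratio`) become break points. The tokens and probabilities in the example are made up.

```python
from typing import List, Sequence


def confidence_threshold(confidences: Sequence[float], break_ratio: float) -> float:
    """Hypothetical helper: return an empirical quantile of the confidences so that
    roughly `break_ratio` of the tokens fall at or below the returned value."""
    if not confidences:
        raise ValueError("need at least one confidence value")
    if not 0.0 < break_ratio <= 1.0:
        raise ValueError("break_ratio must be in (0, 1]")
    ordered = sorted(confidences)
    # Position of the last token (in ascending order) that should count as a break point.
    k = max(0, int(len(ordered) * break_ratio) - 1)
    return ordered[k]


def split_by_confidence(
    tokens: Sequence[str], confidences: Sequence[float], threshold: float
) -> List[str]:
    """Split generated tokens into reasoning steps, closing a step right after
    every token whose confidence is at or below `threshold` (where exactly the
    boundary falls is an assumption of this sketch)."""
    if len(tokens) != len(confidences):
        raise ValueError("tokens and confidences must have the same length")
    steps: List[str] = []
    current: List[str] = []
    for token, conf in zip(tokens, confidences):
        current.append(token)
        # Low confidence marks a likely decision point, so the step ends here.
        if conf <= threshold:
            steps.append("".join(current))
            current = []
    if current:  # tokens after the last break point form the final step
        steps.append("".join(current))
    return steps


if __name__ == "__main__":
    # Made-up tokens and per-token probabilities for illustration only.
    tokens = ["We", " have", " x", " =", " 3", ",", " so", " 2x", " =", " 6", "."]
    confidences = [0.99, 0.98, 0.6, 0.97, 0.4, 0.95, 0.7, 0.5, 0.99, 0.9, 0.99]
    threshold = confidence_threshold(confidences, break_ratio=0.3)  # 0.6 here
    print(split_by_confidence(tokens, confidences, threshold))
    # -> ['We have x', ' = 3', ', so 2x', ' = 6.']
```

Because the threshold in this sketch is derived from the model's own confidences, no manual step annotation is needed. Using a quantile rather than a fixed probability cutoff is a design choice of the sketch, made so that the number of steps scales with response length.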

Citation History: 14 citations as of Feb 13, 2026 (+14 since Jan 28, 2026)