"reasoning tasks" Papers

30 papers found

Accelerating RL for LLM Reasoning with Optimal Advantage Regression

Kianté Brantley, Mingyu Chen, Zhaolin Gao et al.

NEURIPS 2025 · arXiv:2505.20686
12 citations

Advancing LLM Reasoning Generalists with Preference Trees

Lifan Yuan, Ganqu Cui, Hanbin Wang et al.

ICLR 2025 · arXiv:2404.02078
183 citations

Analyzing the Power of Chain of Thought through Memorization Capabilities

Lijia Yu, Xiao-Shan Gao, Lijun Zhang

NEURIPS 2025 · arXiv:2511.01190

AREAL: A Large-Scale Asynchronous Reinforcement Learning System for Language Reasoning

Wei Fu, Jiaxuan Gao, Xujie Shen et al.

NEURIPS 2025 · arXiv:2505.24298
117 citations

Asymmetric REINFORCE for off-Policy Reinforcement Learning: Balancing positive and negative rewards

Charles Arnal, Gaëtan Narozniak, Vivien Cabannes et al.

NEURIPS 2025 · arXiv:2506.20520
17 citations

Bag of Tricks for Inference-time Computation of LLM Reasoning

Fan Liu, Wen-Shuo Chao, Naiqiang Tan et al.

NEURIPS 2025 · arXiv:2502.07191
12 citations

Balancing Act: Diversity and Consistency in Large Language Model Ensembles

Ahmed Abdulaal, Chen Jin, Nina Montaña-Brown et al.

ICLR 2025

Benchmarking Agentic Workflow Generation

Shuofei Qiao, Runnan Fang, Zhisong Qiu et al.

ICLR 2025 · arXiv:2410.07869
21 citations

Beyond Accuracy: Dissecting Mathematical Reasoning for LLMs Under Reinforcement Learning

Jiayu Wang, Yifei Ming, Zixuan Ke et al.

NEURIPS 2025 · arXiv:2506.04723
2 citations

C3PO: Optimized Large Language Model Cascades with Probabilistic Cost Constraints for Reasoning

Antonios Valkanas, Soumyasundar Pal, Pavel Rumiantsev et al.

NEURIPS 2025 · arXiv:2511.07396

CDW-CoT: Clustered Distance-Weighted Chain-of-Thoughts Reasoning

Yuanheng Fang, Guoqing Chao, Wenqiang Lei et al.

AAAI 2025 · arXiv:2501.12226
2 citations

Decoder-Hybrid-Decoder Architecture for Efficient Reasoning with Long Generation

Liliang Ren, Congcong Chen, Haoran Xu et al.

NEURIPS 2025 · arXiv:2507.06607
6 citations

Enhancing Language Model Agents using Diversity of Thoughts

Vijay Chandra Lingam, Behrooz Tehrani, Sujay Sanghavi et al.

ICLR 2025
7 citations

Fast attention mechanisms: a tale of parallelism

Jingwen Liu, Hantao Yu, Clayton Sanford et al.

NEURIPS 2025 · arXiv:2509.09001

Generalization v.s. Memorization: Tracing Language Models’ Capabilities Back to Pretraining Data

Xinyi Wang, Antonis Antoniades, Yanai Elazar et al.

ICLR 2025 · arXiv:2407.14985
80 citations

InfiGFusion: Graph-on-Logits Distillation via Efficient Gromov-Wasserstein for Model Fusion

Yuanyi Wang, Zhaoyi Yan, Yiming Zhang et al.

NEURIPS 2025 · arXiv:2505.13893
3 citations

LASeR: Learning to Adaptively Select Reward Models with Multi-Arm Bandits

Duy Nguyen, Archiki Prasad, Elias Stengel-Eskin et al.

NEURIPS 2025 · arXiv:2410.01735
6 citations

Multipole Attention for Efficient Long Context Reasoning

Coleman Hooper, Sebastian Zhao, Luca Manolache et al.

NEURIPS 2025 · arXiv:2506.13059
3 citations

MultiVerse: A Multi-Turn Conversation Benchmark for Evaluating Large Vision and Language Models

Young-Jun Lee, Byung-Kwan Lee, Jianshu Zhang et al.

ICCV 2025 · arXiv:2510.16641
4 citations

On the self-verification limitations of large language models on reasoning and planning tasks

Kaya Stechly, Karthik Valmeekam, Subbarao Kambhampati

ICLR 2025 · arXiv:2402.08115
109 citations

PID-controlled Langevin Dynamics for Faster Sampling on Generative Models

Hongyi Chen, Jianhai Shu, Jingtao Ding et al.

NEURIPS 2025 · arXiv:2511.12603

The Surprising Effectiveness of Negative Reinforcement in LLM Reasoning

Xinyu Zhu, Mengzhou Xia, Zhepei Wei et al.

NEURIPS 2025 · arXiv:2506.01347
89 citations

ThinkBench: Dynamic Out-of-Distribution Evaluation for Robust LLM Reasoning

Shulin Huang, Linyi Yang, Yan Song et al.

NEURIPS 2025 · arXiv:2502.16268
15 citations

TTRL: Test-Time Reinforcement Learning

Yuxin Zuo, Kaiyan Zhang, Li Sheng et al.

NEURIPS 2025 · arXiv:2504.16084
129 citations

VinePPO: Refining Credit Assignment in RL Training of LLMs

Amirhossein Kazemnejad, Milad Aghajohari, Eva Portelance et al.

ICML 2025 · arXiv:2410.01679
56 citations

Language Models with Conformal Factuality Guarantees

Christopher Mohri, Tatsunori Hashimoto

ICML 2024 · arXiv:2402.10978
85 citations

Premise Order Matters in Reasoning with Large Language Models

Xinyun Chen, Ryan Chi, Xuezhi Wang et al.

ICML 2024 · arXiv:2402.08939
52 citations

Reprompting: Automated Chain-of-Thought Prompt Inference Through Gibbs Sampling

Weijia Xu, Andrzej Banburski-Fahey, Nebojsa Jojic

ICML 2024 · arXiv:2305.09993
47 citations

Stay on Topic with Classifier-Free Guidance

Guillaume Sanchez, Alexander Spangher, Honglu Fan et al.

ICML 2024 (spotlight) · arXiv:2306.17806
73 citations

Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning

Zhiheng Xi, Wenxiang Chen, Boyang Hong et al.

ICML 2024 · arXiv:2402.05808
58 citations