"coding benchmarks" Papers
3 papers found
Conference
Breakpoint: Stress-testing systems-level reasoning in LLM agents
Kaivalya Hariharan, Uzay Girit, Zifan Wang et al.
COLM 2025
Soft Thinking: Unlocking the Reasoning Potential of LLMs in Continuous Concept Space
Zhen Zhang, Xuehai He, Weixiang Yan et al.
NeurIPS 2025, arXiv:2505.15778
48 citations
Magicoder: Empowering Code Generation with OSS-Instruct
Yuxiang Wei, Zhe Wang, Jiawei Liu et al.
ICML 2024, arXiv:2312.02120
208 citations