"benchmark contamination" Papers
2 papers found
Conference
LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code
Naman Jain, Han, Alex Gu et al.
ICLR 2025, arXiv:2403.07974
1108 citations
Pretraining on the Test Set Is No Longer All You Need: A Debate-Driven Approach to QA Benchmarks
Linbo Cao, Jinman Zhao
COLM 2025, arXiv:2507.17747
3 citations