Spotlight "language model evaluation" Papers
2 papers found
AbsenceBench: Language Models Can't See What's Missing
Harvey Yiyun Fu, Aryan Shrivastava, Jared Moore et al.
NeurIPS 2025, spotlight
Signal and Noise: A Framework for Reducing Uncertainty in Language Model Evaluation
David Heineman, Valentin Hofmann, Ian Magnusson et al.
NeurIPS 2025, spotlight, arXiv:2508.13144
6 citations