"language model benchmarking" Papers
3 papers found
Predicting Empirical AI Research Outcomes with Language Models
Jiaxin Wen, Chenglei Si, Yueh-Han Chen et al.
NeurIPS 2025, arXiv:2506.00794
5 citations
RealMath: A Continuous Benchmark for Evaluating Language Models on Research-Level Mathematics
Jie Zhang, Cezara Petrui, Kristina Nikolić et al.
NeurIPS 2025, arXiv:2505.12575
12 citations
SWEb: A Large Web Dataset for the Scandinavian Languages
Tobias Norlund, Tim Isbister, Amaru Cuba Gyllensten et al.
ICLR 2025, arXiv:2410.04456
1 citation