Papers by Florian Eddie Dorner
2 papers found
Conference
Limits to scalable evaluation at the frontier: LLM as judge won’t beat twice the data
Florian Eddie Dorner, Vivian Nastl, Moritz Hardt
ICLR 2025
24 citations
Training on the Test Task Confounds Evaluation and Emergence
Ricardo Dominguez-Olmedo, Florian Eddie Dorner, Moritz Hardt
ICLR 2025, arXiv:2407.07890
24 citations