"model evaluation bias" Papers
2 papers found
Conference
Limits to scalable evaluation at the frontier: LLM as judge won’t beat twice the data
Florian Eddie Dorner, Vivian Nastl, Moritz Hardt
ICLR 2025
24 citations
Silencer: From Discovery to Mitigation of Self-Bias in LLM-as-Benchmark-Generator
Peiwen Yuan, Yiwei Li, Shaoxiong Feng et al.
NeurIPS 2025
arXiv:2505.20738
3 citations