Poster "evaluation reliability" Papers
3 papers found
Distributional LLM-as-a-Judge
Luyu Chen, Zeyu Zhang, Haoran Tan et al.
NeurIPS 2025
Justice or Prejudice? Quantifying Biases in LLM-as-a-Judge
Jiayi Ye, Yanbo Wang, Yue Huang et al.
ICLR 2025 | arXiv:2410.02736
229 citations
xFinder: Large Language Models as Automated Evaluators for Reliable Evaluation
Qingchen Yu, Zifan Zheng, Shichao Song et al.
ICLR 2025 | arXiv:2405.11874
15 citations