"evaluation reliability" Papers
4 papers found
Distributional LLM-as-a-Judge
Luyu Chen, Zeyu Zhang, Haoran Tan et al.
NeurIPS 2025
Justice or Prejudice? Quantifying Biases in LLM-as-a-Judge
Jiayi Ye, Yanbo Wang, Yue Huang et al.
ICLR 2025, arXiv:2410.02736
229 citations
xFinder: Large Language Models as Automated Evaluators for Reliable Evaluation
Qingchen Yu, Zifan Zheng, Shichao Song et al.
ICLR 2025, arXiv:2405.11874
15 citations
Better than Random: Reliable NLG Human Evaluation with Constrained Active Sampling
Jie Ruan, Xiao Pu, Mingqi Gao et al.
AAAI 2024, arXiv:2406.07967
8 citations