"evaluation framework" Papers
5 papers found
Conference
Do LLMs Really Forget? Evaluating Unlearning with Knowledge Correlation and Confidence Awareness
Rongzhe Wei, Peizhi Niu, Hans Hao-Hsun Hsu et al.
NeurIPS 2025 · arXiv:2506.05735
9 citations
JudgeBench: A Benchmark for Evaluating LLM-Based Judges
Sijun Tan, Siyuan Zhuang, Kyle Montgomery et al.
ICLR 2025 · arXiv:2410.12784
163 citations
Scalable Evaluation and Neural Models for Compositional Generalization
Giacomo Camposampiero, Pietro Barbiero, Michael Hersche et al.
NeurIPS 2025 · arXiv:2511.02667
Assessing Large Language Models on Climate Information
Jannis Bulian, Mike Schäfer, Afra Amini et al.
ICML 2024 · arXiv:2310.02932
34 citations
HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal
Mantas Mazeika, Long Phan, Xuwang Yin et al.
ICML 2024 · arXiv:2402.04249
802 citations