Poster "language model evaluation" Papers
8 papers found
DATE-LM: Benchmarking Data Attribution Evaluation for Large Language Models
Cathy Jiao, Yijun Pan, Emily Xiao et al.
NeurIPS 2025, arXiv:2507.09424
Eliminating Position Bias of Language Models: A Mechanistic Approach
Ziqi Wang, Hanlin Zhang, Xiner Li et al.
ICLR 2025, arXiv:2407.01100
50 citations
ImpScore: A Learnable Metric For Quantifying The Implicitness Level of Sentences
Yuxin Wang, Xiaomeng Zhu, Weimin Lyu et al.
ICLR 2025, arXiv:2411.05172
2 citations
Justice or Prejudice? Quantifying Biases in LLM-as-a-Judge
Jiayi Ye, Yanbo Wang, Yue Huang et al.
ICLR 2025, arXiv:2410.02736
229 citations
RADAR: Benchmarking Language Models on Imperfect Tabular Data
Ken Gu, Zhihan Zhang, Kate Lin et al.
NeurIPS 2025, arXiv:2506.08249
2 citations
Towards more rigorous evaluations of language models
Desi R Ivanova, Ilija Ilievski, Momchil Konstantinov
ICLR 2025
Do Language Models Exhibit the Same Cognitive Biases in Problem Solving as Human Learners?
Andreas Opedal, Alessandro Stolfo, Haruki Shirakami et al.
ICML 2024, arXiv:2401.18070
24 citations
Open-Domain Text Evaluation via Contrastive Distribution Methods
Sidi Lu, Hongyi Liu, Asli Celikyilmaz et al.
ICML 2024, arXiv:2306.11879
1 citation