Poster "large language model evaluation" Papers
7 papers found
Auto-Vocabulary Semantic Segmentation
Osman Ülger, Maksymilian Kulicki, Yuki Asano et al.
ICCV 2025, arXiv:2312.04539, 4 citations
BenchmarkCards: Standardized Documentation for Large Language Model Benchmarks
Anna Sokol, Elizabeth Daly, Michael Hind et al.
NEURIPS 2025, arXiv:2410.12974, 2 citations
BenTo: Benchmark Reduction with In-Context Transferability
Hongyu Zhao, Ming Li, Lichao Sun et al.
ICLR 2025
FaithEval: Can Your Language Model Stay Faithful to Context, Even If "The Moon is Made of Marshmallows"
Yifei Ming, Senthil Purushwalkam, Shrey Pandit et al.
ICLR 2025, 45 citations
How Benchmark Prediction from Fewer Data Misses the Mark
Guanhua Zhang, Florian E. Dorner, Moritz Hardt
NEURIPS 2025, arXiv:2506.07673, 5 citations
MMQA: Evaluating LLMs with Multi-Table Multi-Hop Complex Questions
Jian Wu, Linyi Yang, Dongyuan Li et al.
ICLR 2025, 23 citations
Dynamic Evaluation of Large Language Models by Meta Probing Agents
Kaijie Zhu, Jindong Wang, Qinlin Zhao et al.
ICML 2024, arXiv:2402.14865, 55 citations