"llm-based evaluation" Papers
3 papers found
SSRB: Direct Natural Language Querying to Massive Heterogeneous Semi-Structured Data
Xin Zhang, Mingxin Li, Yanzhao Zhang et al.
NeurIPS 2025
Trust or Escalate: LLM Judges with Provable Guarantees for Human Agreement
Jaehun Jung, Faeze Brahman, Yejin Choi
ICLR 2025, arXiv:2407.18370
45 citations
MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities
Weihao Yu, Zhengyuan Yang, Linjie Li et al.
ICML 2024, arXiv:2308.02490
1066 citations