"benchmark evaluation" Papers
5 papers found
CoMT: A Novel Benchmark for Chain of Multi-modal Thought on Large Vision-Language Models
Zihui Cheng, Qiguang Chen, Jin Zhang et al.
AAAI 2025 paper, arXiv:2412.12932
30 citations
PokerBench: Training Large Language Models to Become Professional Poker Players
Richard Zhuang, Akshat Gupta, Richard Yang et al.
AAAI 2025 paper, arXiv:2501.08328
8 citations
Advancing Spatial Reasoning in Large Language Models: An In-Depth Evaluation and Enhancement Using the StepGame Benchmark
Fangjun Li, David C. Hogg, Anthony G. Cohn
AAAI 2024 paper, arXiv:2401.03991
53 citations
Benchmarking Large Language Models in Retrieval-Augmented Generation
Jiawei Chen, Hongyu Lin, Xianpei Han et al.
AAAI 2024 paper, arXiv:2309.01431
475 citations
RewriteLM: An Instruction-Tuned Large Language Model for Text Rewriting
Lei Shu, Liangchen Luo, Jayakumar Hoskere et al.
AAAI 2024 paper, arXiv:2305.15685
78 citations