Poster "llm benchmarking" Papers
3 papers found
ConvCodeWorld: Benchmarking Conversational Code Generation in Reproducible Feedback Environments
Hojae Han, seung-won hwang, Rajhans Samdani et al.
ICLR 2025, arXiv:2502.19852
13 citations
DataGen: Unified Synthetic Dataset Generation via Large Language Models
Yue Huang, Siyuan Wu, Chujie Gao et al.
ICLR 2025, arXiv:2406.18966
21 citations
Risk Management for Mitigating Benchmark Failure Modes: BenchRisk
Sean McGregor, Vassil Tashev, Armstrong Foundjem et al.
NeurIPS 2025, arXiv:2510.21460
1 citation