Poster Papers by Wayne Chi
2 papers found
Copilot Arena: A Platform for Code LLM Evaluation in the Wild
Wayne Chi, Valerie Chen, Anastasios Angelopoulos et al.
ICML 2025, arXiv:2502.09328
17 citations
TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks
Frank (Fangzheng) Xu, Yufan Song, Boxuan Li et al.
NeurIPS 2025, arXiv:2412.14161
105 citations