Papers by Karolina Stanczak
2 papers found
Conference
AgentRewardBench: Evaluating Automatic Evaluations of Web Agent Trajectories
Xing Han Lù, Amirhossein Kazemnejad, Nicholas Meade et al.
COLM 2025, arXiv:2504.08942
23 citations
SafeArena: Evaluating the Safety of Autonomous Web Agents
Ada Tur, Nicholas Meade, Xing Han Lù et al.
ICML 2025, arXiv:2503.04957
39 citations