Poster "llm agent evaluation" Papers
2 papers found
Conference
AgentAuditor: Human-level Safety and Security Evaluation for LLM Agents
Hanjun Luo, Shenyu Dai, Chiming Ni et al.
NeurIPS 2025, arXiv:2506.00641
18 citations
SEC-bench: Automated Benchmarking of LLM Agents on Real-World Software Security Tasks
Hwiwon Lee, Ziqi Zhang, Hanxiao Lu et al.
NeurIPS 2025, arXiv:2506.11791
16 citations