"safety benchmarks" Papers
2 papers found
Conference
AgentBreeder: Mitigating the AI Safety Risks of Multi-Agent Scaffolds via Self-Improvement
J Rosser, Jakob Foerster
NeurIPS 2025 (spotlight), arXiv:2502.00757
6 citations
BlueSuffix: Reinforced Blue Teaming for Vision-Language Models Against Jailbreak Attacks
Yunhan Zhao, Xiang Zheng, Lin Luo et al.
ICLR 2025, arXiv:2410.20971
20 citations