"safety vulnerabilities" Papers
5 papers found
BadRobot: Jailbreaking Embodied LLM Agents in the Physical World
Hangtao Zhang, Chenyu Zhu, Xianlong Wang et al.
ICLR 2025
6 citations
Endless Jailbreaks with Bijection Learning
Brian R.Y. Huang, Max Li, Leonard Tang
ICLR 2025 · arXiv:2410.01294
16 citations
MMDT: Decoding the Trustworthiness and Safety of Multimodal Foundation Models
Chejian Xu, Jiawei Zhang, Zhaorun Chen et al.
ICLR 2025 · arXiv:2503.14827
11 citations
Reasoning as an Adaptive Defense for Safety
Taeyoun Kim, Fahim Tajwar, Aditi Raghunathan et al.
NeurIPS 2025 · arXiv:2507.00971
11 citations
Agent Smith: A Single Image Can Jailbreak One Million Multimodal LLM Agents Exponentially Fast
Xiangming Gu, Xiaosen Zheng, Tianyu Pang et al.
ICML 2024 · arXiv:2402.08567
103 citations