"black-box defense" Papers
2 papers found
BlueSuffix: Reinforced Blue Teaming for Vision-Language Models Against Jailbreak Attacks
Yunhan Zhao, Xiang Zheng, Lin Luo et al.
ICLR 2025 · arXiv:2410.20971 · 20 citations
Probe before You Talk: Towards Black-box Defense against Backdoor Unalignment for Large Language Models
Biao Yi, Tiansheng Huang, Sishuo Chen et al.
ICLR 2025 · arXiv:2506.16447 · 23 citations