Poster "adversarial prompts" Papers
4 papers found
Conference
AutoPrompt: Automated Red-Teaming of Text-to-Image Models via LLM-Driven Adversarial Prompts
Yufan Liu, Wanqian Zhang, Huashan Chen et al.
ICCV 2025arXiv:2510.24034
1
citations
Breach By A Thousand Leaks: Unsafe Information Leakage in 'Safe' AI Responses
David Glukhov, Ziwen Han, I Shumailov et al.
ICLR 2025arXiv:2407.02551
10
citations
Short-length Adversarial Training Helps LLMs Defend Long-length Jailbreak Attacks: Theoretical and Empirical Evidence
Shaopeng Fu, Liang Ding, Jingfeng ZHANG et al.
NEURIPS 2025arXiv:2502.04204
6
citations
T2V-OptJail: Discrete Prompt Optimization for Text-to-Video Jailbreak Attacks
Jiayang Liu, Siyuan Liang, Shiqian Zhao et al.
NEURIPS 2025arXiv:2505.06679
6
citations