"model vulnerabilities" Papers
3 papers found
Conference
EFFICIENT JAILBREAK ATTACK SEQUENCES ON LARGE LANGUAGE MODELS VIA MULTI-ARMED BANDIT-BASED CONTEXT SWITCHING
Aditya Ramesh, Shivam Bhardwaj, Aditya Saibewar et al.
ICLR 2025
3
citations
Memories of Forgotten Concepts
Matan Rusanovsky, Shimon Malnick, Amir Jevnisek et al.
CVPR 2025highlightarXiv:2412.00782
6
citations
VERA: Variational Inference Framework for Jailbreaking Large Language Models
Anamika Lochab, Lu Yan, Patrick Pynadath et al.
NEURIPS 2025arXiv:2506.22666
1
citations