"llm vulnerabilities" Papers
2 papers found
Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks
Maksym Andriushchenko, Francesco Croce, Nicolas Flammarion
ICLR 2025 · arXiv:2404.02151
401 citations
Safety Alignment Should be Made More Than Just a Few Tokens Deep
Xiangyu Qi, Ashwinee Panda, Kaifeng Lyu et al.
ICLR 2025 · arXiv:2406.05946
303 citations