"relearning attacks" Papers
2 papers found
Bits Leaked per Query: Information-Theoretic Bounds for Adversarial Attacks on LLMs
Masahiro Kaneko, Timothy Baldwin
NeurIPS 2025 (Spotlight), arXiv:2510.17000
Simplicity Prevails: Rethinking Negative Preference Optimization for LLM Unlearning
Chongyu Fan, Jiancheng Liu, Licong Lin et al.
NeurIPS 2025, arXiv:2410.07163
81 citations