"robustness improvement" Papers
3 papers found
Conference
Robust LLM safeguarding via refusal feature adversarial training
Lei Yu, Virginie Do, Karen Hambardzumyan et al.
ICLR 2025, arXiv:2409.20089
45 citations
Transformers Learn Low Sensitivity Functions: Investigations and Implications
Bhavya Vasudeva, Deqing Fu, Tianyi Zhou et al.
ICLR 2025, arXiv:2403.06925
8 citations
PIDformer: Transformer Meets Control Theory
Tam Nguyen, Cesar Uribe, Tan Nguyen et al.
ICML 2024, arXiv:2402.15989
12 citations