Poster Papers matching "large language models safety"
2 papers found
Booster: Tackling Harmful Fine-tuning for Large Language Models via Attenuating Harmful Perturbation
Tiansheng Huang, Sihao Hu, Fatih Ilhan et al.
ICLR 2025 · arXiv:2409.01586 · 60 citations
Jailbreak Antidote: Runtime Safety-Utility Balance via Sparse Representation Adjustment in Large Language Models
Guobin Shen, Dongcheng Zhao, Yiting Dong et al.
ICLR 2025 · arXiv:2410.02298 · 13 citations