Poster "fine-tuning attacks" Papers
3 papers found
CoreGuard: Safeguarding Foundational Capabilities of LLMs Against Model Stealing in Edge Deployment
Qinfeng Li, Tianyue Luo, Xuhong Zhang et al.
NeurIPS 2025 · arXiv:2410.13903
7 citations
Safety Alignment Should be Made More Than Just a Few Tokens Deep
Xiangyu Qi, Ashwinee Panda, Kaifeng Lyu et al.
ICLR 2025 · arXiv:2406.05946
303 citations
Tamper-Resistant Safeguards for Open-Weight LLMs
Rishub Tamirisa, Bhrugu Bharathi, Long Phan et al.
ICLR 2025 · arXiv:2408.00761
113 citations