Xiangyu Qi
6 papers, 1,693 total citations

Papers (6)
- Fine-tuning Aligned Language Models Compromises Safety, Even When Users Do Not Intend To! (ICLR 2024, arXiv), 966 citations
- Safety Alignment Should be Made More Than Just a Few Tokens Deep (ICLR 2025, arXiv), 303 citations
- Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications (ICML 2024, arXiv), 184 citations
- SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal (ICLR 2025, arXiv), 151 citations
- Towards Practical Deployment-Stage Backdoor Attack on Deep Neural Networks (CVPR 2022, arXiv), 74 citations
- BaDExpert: Extracting Backdoor Functionality for Accurate Backdoor Input Detection (ICLR 2024, arXiv), 15 citations