"safety neurons" Papers
2 papers found
Conference
Towards Understanding Safety Alignment: A Mechanistic Perspective from Safety Neurons
Jianhui Chen, Xiaozhi Wang, Zijun Yao et al.
NeurIPS 2025, arXiv:2406.14144
24 citations
Understanding and Enhancing Safety Mechanisms of LLMs via Safety-Specific Neuron
Yiran Zhao, Wenxuan Zhang, Yuxi Xie et al.
ICLR 2025
29 citations