"safety alignment" Papers
5 papers found
DuMo: Dual Encoder Modulation Network for Precise Concept Erasure
Feng Han, Kai Chen, Chao Gong et al.
AAAI 2025, arXiv:2501.01125, 8 citations
FigStep: Jailbreaking Large Vision-Language Models via Typographic Visual Prompts
Yichen Gong, Delong Ran, Jinyuan Liu et al.
AAAI 2025, arXiv:2311.05608, 302 citations
SCANS: Mitigating the Exaggerated Safety for LLMs via Safety-Conscious Activation Steering
Zouying Cao, Yifei Yang, Hai Zhao
AAAI 2025, arXiv:2408.11491, 23 citations
Sherkala-Chat: Building a State-of-the-Art LLM for Kazakh in a Moderately Resourced Setting
Fajri Koto, Rituraj Joshi, Nurdaulet Mukhituly et al.
COLM 2025, 5 citations
The Blessing and Curse of Dimensionality in Safety Alignment
Rachel S.Y. Teo, Laziz Abdullaev, Tan Minh Nguyen
COLM 2025, arXiv:2507.20333, 6 citations