Paper "steering vectors" Papers
2 papers found
Conference
One-shot Optimized Steering Vectors Mediate Safety-relevant Behaviors in LLMs
Jacob Dunefsky, Arman Cohan
COLM 2025paperarXiv:2502.18862
8
citations
SAEs Can Improve Unlearning: Dynamic Sparse Autoencoder Guardrails for Precision Unlearning in LLMs
Aashiq Muhamed, Jacopo Bonato, Mona T. Diab et al.
COLM 2025paper
17
citations