"concept activation vectors" Papers
3 papers found
Controlling Large Language Models Through Concept Activation Vectors
Hanyu Zhang, Xiting Wang, Chengao Li et al.
AAAI 2025, arXiv:2501.05764
20 citations
GCAV: A Global Concept Activation Vector Framework for Cross-Layer Consistency in Interpretability
Zhenghao He, Sanchit Sinha, Guangzhi Xiong et al.
ICCV 2025, arXiv:2508.21197
Steering LLMs' Behavior with Concept Activation Vectors
Ruixuan HUANG, Shuai Wang
ICLR 2025