"representation editing" Papers
4 papers found
Attribution Analysis Meets Model Editing: Advancing Knowledge Correction in Vision Language Models with VisEdit
Qizhou Chen, Taolin Zhang, Chengyu Wang et al.
AAAI 2025, arXiv:2408.09916, 6 citations
Detoxifying Large Language Models via Autoregressive Reward Guided Representation Editing
Yisong Xiao, Aishan Liu, Siyuan Liang et al.
NeurIPS 2025, arXiv:2510.01243, 2 citations
Re-Imagining Multimodal Instruction Tuning: A Representation View
Yiyang Liu, James Liang, Ruixiang Tang et al.
ICLR 2025, arXiv:2503.00723, 13 citations
SAEs Can Improve Unlearning: Dynamic Sparse Autoencoder Guardrails for Precision Unlearning in LLMs
Aashiq Muhamed, Jacopo Bonato, Mona T. Diab et al.
COLM 2025, 17 citations