Poster "test-time intervention" Papers
3 papers found
Conference
Detoxifying Large Language Models via Autoregressive Reward Guided Representation Editing
Yisong Xiao, Aishan Liu, Siyuan Liang et al.
NeurIPS 2025, arXiv:2510.01243
2 citations
Neural Causal Graph for Interpretable and Intervenable Classification
Jiawei Wang, Shaofei Lu, Da Cao et al.
ICLR 2025
1 citation
Reducing Hallucinations in Large Vision-Language Models via Latent Space Steering
Sheng Liu, Haotian Ye, James Y Zou
ICLR 2025
29 citations