Resolving Lexical Bias in Model Editing

1
citations
#1918
in ICML 2025
of 3340 papers
4
Top Authors
4
Data Points

Abstract

Model editing aims to modify the outputs of large language models after they are trained. Previous approaches have often involved direct alterations to model weights, which can result in model degradation. Recent techniques avoid making modifications to the model's weights by using an adapter that applies edits to the model when triggered by semantic similarity in the representation space. We demonstrate that current adapter methods arecritically vulnerableto strong lexical biases, leading to issues such as applying edits to irrelevant prompts with overlapping words. This paper presents a principled approach to learning a disentangled representation space that facilitates precise localization of edits by maintaining distance between irrelevant prompts while preserving proximity among paraphrases. In our empirical study, we show that our method (Projector Editor Networks for Model Editing - PENME) achieves state-of-the-art model editing results while being more computationally efficient during inference than previous methods and adaptable across different architectures.

Citation History

Jan 28, 2026
0
Feb 13, 2026
1+1
Feb 13, 2026
1
Feb 13, 2026
1