"model unlearning" Papers
9 papers found
Conference
Concept Bottleneck Large Language Models
Chung-En Sun, Tuomas Oikarinen, Berk Ustun et al.
ICLR 2025, arXiv:2412.07992
26 citations
ConceptPrune: Concept Editing in Diffusion Models via Skilled Neuron Pruning
Ruchika Chavhan, Da Li, Timothy Hospedales
ICLR 2025, arXiv:2405.19237
37 citations
Distillation Robustifies Unlearning
Bruce W. Lee, Addie Foote, Alex Infanger et al.
NeurIPS 2025 (spotlight), arXiv:2506.06278
6 citations
Explainable Reinforcement Learning from Human Feedback to Improve Alignment
Shicheng Liu, Siyuan Xu, Wenjie Qiu et al.
NeurIPS 2025, arXiv:2512.13837
Exploring and Leveraging Class Vectors for Classifier Editing
Jaeik Kim, Jaeyoung Do
NeurIPS 2025, arXiv:2510.11268
On Effects of Steering Latent Representation for Large Language Model Unlearning
Huu-Tien Dang, Tin Pham, Hoang Thanh-Tung et al.
AAAI 2025, arXiv:2408.06223
AND: Audio Network Dissection for Interpreting Deep Acoustic Models
Tung-Yu Wu, Yu-Xiang Lin, Lily Weng
ICML 2024, arXiv:2406.16990
3 citations
Safe-CLIP: Removing NSFW Concepts from Vision-and-Language Models
Samuele Poppi, Tobia Poppi, Federico Cocchi et al.
ECCV 2024, arXiv:2311.16254
10 citations
The WMDP Benchmark: Measuring and Reducing Malicious Use with Unlearning
Nathaniel Li, Alexander Pan, Anjali Gopal et al.
ICML 2024, arXiv:2403.03218
333 citations