"model interpretability" Papers
54 papers found • Page 1 of 2
Additive Models Explained: A Computational Complexity Approach
Shahaf Bassan, Michal Moshkovitz, Guy Katz
AttriBoT: A Bag of Tricks for Efficiently Approximating Leave-One-Out Context Attribution
Fengyuan Liu, Nikhil Kandpal, Colin Raffel
Cognitive Mirrors: Exploring the Diverse Functional Roles of Attention Heads in LLM Reasoning
Xueqi Ma, Jun Wang, Yanbei Jiang et al.
Concept Bottleneck Language Models For Protein Design
Aya Ismail, Tuomas Oikarinen, Amy Wang et al.
Data-centric Prediction Explanation via Kernelized Stein Discrepancy
Mahtab Sarvmaili, Hassan Sajjad, Ga Wu
Dataset Distillation for Pre-Trained Self-Supervised Vision Models
George Cazenavette, Antonio Torralba, Vincent Sitzmann
DATE-LM: Benchmarking Data Attribution Evaluation for Large Language Models
Cathy Jiao, Yijun Pan, Emily Xiao et al.
Defining and Discovering Hyper-meta-paths for Heterogeneous Hypergraphs
Yaming Yang, Ziyu Zheng, Weigang Lu et al.
Demystifying Reasoning Dynamics with Mutual Information: Thinking Tokens are Information Peaks in LLM Reasoning
Chen Qian, Dongrui Liu, Hao Wen et al.
Dense SAE Latents Are Features, Not Bugs
Xiaoqing Sun, Alessandro Stolfo, Joshua Engels et al.
Discovering Influential Neuron Path in Vision Transformers
Yifan Wang, Yifei Liu, Yingdong Shi et al.
Enhancing Multimodal Large Language Models Complex Reasoning via Similarity Computation
Xiaofeng Zhang, Fanshuo Zeng, Yihao Quan et al.
Forking Paths in Neural Text Generation
Eric Bigelow, Ari Holtzman, Hidenori Tanaka et al.
From Search to Sampling: Generative Models for Robust Algorithmic Recourse
Prateek Garg, Lokesh Nagalapatti, Sunita Sarawagi
How to Probe: Simple Yet Effective Techniques for Improving Post-hoc Explanations
Siddhartha Gairola, Moritz Böhle, Francesco Locatello et al.
I Am Big, You Are Little; I Am Right, You Are Wrong
David A Kelly, Akchunya Chanchal, Nathan Blake
Interpreting Language Reward Models via Contrastive Explanations
Junqi Jiang, Tom Bewley, Saumitra Mishra et al.
LeapFactual: Reliable Visual Counterfactual Explanation Using Conditional Flow Matching
Zhuo Cao, Xuan Zhao, Lena Krieger et al.
LeGrad: An Explainability Method for Vision Transformers via Feature Formation Sensitivity
Walid Bousselham, Angie Boggust, Sofian Chaybouti et al.
Localizing Knowledge in Diffusion Transformers
Arman Zarei, Samyadeep Basu, Keivan Rezaei et al.
Looking Inward: Language Models Can Learn About Themselves by Introspection
Felix Jedidja Binder, James Chua, Tomek Korbak et al.
Manipulating Feature Visualizations with Gradient Slingshots
Dilyara Bareeva, Marina Höhne, Alexander Warnecke et al.
Narrowing Information Bottleneck Theory for Multimodal Image-Text Representations Interpretability
Zhiyu Zhu, Zhibo Jin, Jiayu Zhang et al.
Register and [CLS] tokens induce a decoupling of local and global features in large ViTs
Alexander Lappe, Martin Giese
Self-Assembling Graph Perceptrons
Jialong Chen, Tong Wang, Bowen Deng et al.
SHAP zero Explains Biological Sequence Models with Near-zero Marginal Cost for Future Queries
Darin Tsui, Aryan Musharaf, Yigit Efe Erginbas et al.
Smoothed Differentiation Efficiently Mitigates Shattered Gradients in Explanations
Adrian Hill, Neal McKee, Johannes Maeß et al.
Start Smart: Leveraging Gradients For Enhancing Mask-based XAI Methods
Buelent Uendes, Shujian Yu, Mark Hoogendoorn
TAB: Transformer Attention Bottlenecks enable User Intervention and Debugging in Vision-Language Models
Pooyan Rahmanzadehgervi, Hung Nguyen, Rosanne Liu et al.
The Fragile Truth of Saliency: Improving LLM Input Attribution via Attention Bias Optimization
Yihua Zhang, Changsheng Wang, Yiwei Chen et al.
The Zero Body Problem: Probing LLM Use of Sensory Language
Rebecca M. M. Hicke, Sil Hamilton, David Mimno
Topology of Reasoning: Understanding Large Reasoning Models through Reasoning Graph Properties
Gouki Minegishi, Hiroki Furuta, Takeshi Kojima et al.
Towards Understanding How Knowledge Evolves in Large Vision-Language Models
Sudong Wang, Yunjian Zhang, Yao Zhu et al.
Unveiling Concept Attribution in Diffusion Models
Nguyen Hung-Quang, Hoang Phan, Khoa D Doan
Accelerating the Global Aggregation of Local Explanations
Alon Mor, Yonatan Belinkov, Benny Kimelfeld
Attention Guided CAM: Visual Explanations of Vision Transformer Guided by Self-Attention
Saebom Leem, Hyunseok Seo
Attribution-based Explanations that Provide Recourse Cannot be Robust
Hidde Fokkema, Rianne de Heide, Tim van Erven
CAPE: CAM as a Probabilistic Ensemble for Enhanced DNN Interpretation
Townim Chowdhury, Kewen Liao, Vu Minh Hieu Phan et al.
Constructing Concept-based Models to Mitigate Spurious Correlations with Minimal Human Effort
Jeeyung Kim, Ze Wang, Qiang Qiu
Distilled Datamodel with Reverse Gradient Matching
Jingwen Ye, Ruonan Yu, Songhua Liu et al.
Explaining Graph Neural Networks via Structure-aware Interaction Index
Ngoc Bui, Trung Hieu Nguyen, Viet Anh Nguyen et al.
Explaining Probabilistic Models with Distributional Values
Luca Franceschi, Michele Donini, Cedric Archambeau et al.
Exploring the LLM Journey from Cognition to Expression with Linear Representations
Yuzi Yan, Jialian Li, Yipin Zhang et al.
Improving Neural Additive Models with Bayesian Principles
Kouroche Bouchiat, Alexander Immer, Hugo Yèche et al.
Iterative Search Attribution for Deep Neural Networks
Zhiyu Zhu, Huaming Chen, Xinyi Wang et al.
KernelSHAP-IQ: Weighted Least Square Optimization for Shapley Interactions
Fabian Fumagalli, Maximilian Muschalik, Patrick Kolpaczki et al.
MAPTree: Beating “Optimal” Decision Trees with Bayesian Decision Trees
Colin Sullivan, Mo Tiwari, Sebastian Thrun
MFABA: A More Faithful and Accelerated Boundary-Based Attribution Method for Deep Neural Networks
Zhiyu Zhu, Huaming Chen, Jiayu Zhang et al.
On Gradient-like Explanation under a Black-box Setting: When Black-box Explanations Become as Good as White-box
Yi Cai, Gerhard Wunder
Position: Cracking the Code of Cascading Disparity Towards Marginalized Communities
Golnoosh Farnadi, Mohammad Havaei, Negar Rostamzadeh