"concept-based interpretability" Papers
3 papers found
Conference
Interpreting Emergent Planning in Model-Free Reinforcement Learning
Thomas Bush, Stephen Chung, Usman Anwar et al.
ICLR 2025, arXiv:1901.03559
125 citations
Towards Compositionality in Concept Learning
Adam Stein, Aaditya Naik, Yinjun Wu et al.
ICML 2024, arXiv:2406.18534
8 citations
Understanding Video Transformers via Universal Concept Discovery
Matthew Kowal, Achal Dave, Rares Andrei Ambrus et al.
CVPR 2024 (highlight), arXiv:2401.10831
18 citations