Zhengxuan Wu
5 papers, 359 total citations
Papers (5)
AxBench: Steering LLMs? Even Simple Baselines Outperform Sparse Autoencoders
ICML 2025, arXiv, 118 citations

Interpretability at Scale: Identifying Causal Mechanisms in Alpaca
NeurIPS 2023, arXiv, 112 citations

CEBaB: Estimating the Causal Effects of Real-World Concepts on NLP Model Behavior
NeurIPS 2022, arXiv, 59 citations

In-Context Sharpness as Alerts: An Inner Representation Perspective for Hallucination Mitigation
ICML 2024, arXiv, 43 citations

ZeroC: A Neuro-Symbolic Model for Zero-shot Concept Recognition and Acquisition at Inference Time
NeurIPS 2022, arXiv, 27 citations