Samuel Marks
3 papers, 1,071 total citations
papers (3)
Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback
ICLR 2025, arXiv, 750 citations
Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models
ICLR 2025, arXiv, 263 citations
SAEBench: A Comprehensive Benchmark for Sparse Autoencoders in Language Model Interpretability
ICML 2025, arXiv, 58 citations