Nathan Helm-Burger
3 papers, 353 total citations
Papers (3)
The WMDP Benchmark: Measuring and Reducing Malicious Use with Unlearning
ICML 2024, arXiv, 333 citations
CoT Red-Handed: Stress Testing Chain-of-Thought Monitoring
NeurIPS 2025, arXiv, 13 citations
Noise Injection Reveals Hidden Capabilities of Sandbagging Language Models
NeurIPS 2025, arXiv, 7 citations