David Krueger
10 papers, 807 total citations

Papers (10)
Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback
ICLR 2025, arXiv, 750 citations

Pitfalls of Evidence-Based AI Policy
ICLR 2025, arXiv, 14 citations

Detecting High-Stakes Interactions with Activation Probes
NeurIPS 2025, arXiv, 13 citations

Thinker: Learning to Plan and Act
NeurIPS 2023, arXiv, 12 citations

Implicit meta-learning may lead language models to trust more reliable sources
ICML 2024, arXiv, 7 citations

Rethinking Safety in LLM Fine-tuning: An Optimization Perspective
COLM 2025, arXiv, 5 citations

From Dormant to Deleted: Tamper-Resistant Unlearning Through Weight-Space Regularization
NeurIPS 2025, arXiv, 4 citations

Input Space Mode Connectivity in Deep Neural Networks
ICLR 2025, arXiv, 1 citation

PoisonBench: Assessing Language Model Vulnerability to Poisoned Preference Data
ICML 2025, 1 citation

Defining and Characterizing Reward Gaming
NeurIPS 2022, 0 citations