ResearchAlpha Leak

David Krueger

Topic trends: 32,543 papers · similarity ≥ 0.4 · year ≥ 2024 · Data sourced from Semantic Scholar

34,598 papers | Abstracts: 31,650 (91.5%) | Citations: 34,598 (100.0%) | arXiv: 26,074 (75.4%)

Built: Feb 14, 2026, 11:22 PM AMS

10 papers · 807 total citations

Papers (10)

Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback

Pitfalls of Evidence-Based AI Policy

Detecting High-Stakes Interactions with Activation Probes

NeurIPS 2025 · arXiv

Thinker: Learning to Plan and Act

NeurIPS 2023 · arXiv

Implicit meta-learning may lead language models to trust more reliable sources

Rethinking Safety in LLM Fine-tuning: An Optimization Perspective

From Dormant to Deleted: Tamper-Resistant Unlearning Through Weight-Space Regularization

NeurIPS 2025 · arXiv

Input Space Mode Connectivity in Deep Neural Networks

PoisonBench: Assessing Language Model Vulnerability to Poisoned Preference Data

Defining and Characterizing Reward Gaming