ResearchAlpha Leak

Dylan Hadfield-Menell

Topic trends: 32,543 papers · similarity ≥ 0.4 · year ≥ 2024 · Data sourced from Semantic Scholar

34,598 papers | Abstracts: 31,650 (91.5%) | Citations: 34,598 (100.0%) | arXiv: 26,074 (75.4%)

Built: Feb 14, 2026, 10:11 PM AMS

9 papers · 1,065 total citations

Papers (9)

Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback

Distributional Preference Learning: Understanding and Accounting for Hidden Context in RLHF

Consequences of Misaligned AI

NeurIPS 2020 · arXiv

Robust Feature-Level Adversaries are Interpretability Tools

NeurIPS 2022 · arXiv

How to talk so AI will learn: Instructions, descriptions, and autonomy

NeurIPS 2022 · arXiv

Diverse Preference Learning for Capabilities and Alignment

Red Teaming Deep Neural Networks with Feature Synthesis Tools

NeurIPS 2023 · arXiv

Pitfalls of Evidence-Based AI Policy

Evaluating Generalization Capabilities of LLM-Based Agents in Mixed-Motive Scenarios Using Concordia

NeurIPS 2025 · arXiv