Anca Dragan
19 papers, 581 total citations
papers (19)
Reward-rational (implicit) choice: A unifying formalism for reward learning (NEURIPS 2020; arXiv; 194 citations)
Learning to Model the World With Language (ICML 2024; arXiv; 71 citations)
AI Alignment with Changing and Influenceable Reward Functions (ICML 2024; arXiv; 43 citations)
On Targeted Manipulation and Deception when Optimizing LLMs for User Feedback (ICLR 2025; arXiv; 43 citations)
AvE: Assistance via Empowerment (NEURIPS 2020; arXiv; 42 citations)
Bridging RL Theory and Practice with the Effective Horizon (NEURIPS 2023; arXiv; 38 citations)
Learning Temporal Distances: Contrastive Successor Features Can Provide a Metric Structure for Decision-Making (ICML 2024; arXiv; 33 citations)
Correlated Proxies: A New Definition and Improved Mitigation for Reward Hacking (ICLR 2025; arXiv; 25 citations)
Learning Optimal Advantage from Preferences and Mistaking It for Reward (AAAI 2024; arXiv; 16 citations)
First Contact: Unsupervised Human-Machine Co-Adaptation via Mutual Information Maximization (NEURIPS 2022; arXiv; 15 citations)
Preference learning along multiple criteria: A game-theoretic perspective (NEURIPS 2020; arXiv; 14 citations)
Context Steering: Controllable Personalization at Inference Time (ICLR 2025; arXiv; 14 citations)
Pragmatic Image Compression for Human-in-the-Loop Decision-Making (NEURIPS 2021; arXiv; 14 citations)
Q-SFT: Q-Learning for Language Models via Supervised Fine-Tuning (ICLR 2025; arXiv; 8 citations)
The Effective Horizon Explains Deep RL Performance in Stochastic Environments (ICLR 2024; arXiv; 5 citations)
AssistanceZero: Scalably Solving Assistance Games (ICML 2025; arXiv; 4 citations)
Coprocessor Actor Critic: A Model-Based Reinforcement Learning Approach For Adaptive Brain Stimulation (ICML 2024; arXiv; 2 citations)
Learning to Influence Human Behavior with Offline Reinforcement Learning (NEURIPS 2023; arXiv; 0 citations)
Uni[MASK]: Unified Inference in Sequential Decision Problems (NEURIPS 2022; 0 citations)