Dmitrii Krasheninnikov
4 papers, 770 total citations
papers (4)
Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback
ICLR 2025, arXiv, 750 citations

Detecting High-Stakes Interactions with Activation Probes
NeurIPS 2025, arXiv, 13 citations

Implicit meta-learning may lead language models to trust more reliable sources
ICML 2024, arXiv, 7 citations

Defining and Characterizing Reward Gaming
NeurIPS 2022, 0 citations