Archit Sharma
Affiliations (2): Google, Stanford
7 papers, 7,479 total citations
Papers (7)
Direct Preference Optimization: Your Language Model is Secretly a Reward Model (NeurIPS 2023, arXiv): 7,188 citations
Preference Fine-Tuning of LLMs Should Leverage Suboptimal, On-Policy Data (ICML 2024, arXiv): 179 citations
Autonomous Reinforcement Learning via Subgoal Curricula (NeurIPS 2021, arXiv): 32 citations
You Only Live Once: Single-Life Reinforcement Learning (NeurIPS 2022, arXiv): 29 citations
When to Ask for Help: Proactive Interventions in Autonomous Reinforcement Learning (NeurIPS 2022, arXiv): 21 citations
Grounding by Trying: LLMs with Reinforcement Learning-Enhanced Retrieval (ICLR 2025, arXiv): 16 citations
RLVF: Learning from Verbal Feedback without Overgeneralization (ICML 2024, arXiv): 14 citations