Aviral Kumar
20 papers · 4,196 total citations

Papers (20)
- Conservative Q-Learning for Offline Reinforcement Learning (NEURIPS 2020, arXiv) · 2,285 citations
- COMBO: Conservative Offline Model-Based Policy Optimization (NEURIPS 2021, arXiv) · 488 citations
- Cal-QL: Calibrated Offline RL Pre-Training for Efficient Online Fine-Tuning (NEURIPS 2023, arXiv) · 200 citations
- Preference Fine-Tuning of LLMs Should Leverage Suboptimal, On-Policy Data (ICML 2024, arXiv) · 179 citations
- Why Generalization in RL is Difficult: Epistemic POMDPs and Implicit Partial Observability (NEURIPS 2021, arXiv) · 149 citations
- ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL (ICML 2024, arXiv) · 135 citations
- DisCor: Corrective Feedback in Reinforcement Learning via Distribution Correction (NEURIPS 2020, arXiv) · 118 citations
- Model Inversion Networks for Model-Based Optimization (NEURIPS 2020, arXiv) · 113 citations
- Stop Regressing: Training Value Functions via Classification for Scalable Deep RL (ICML 2024, arXiv) · 107 citations
- One Solution is Not All You Need: Few-Shot Extrapolation via Structured MaxEnt RL (NEURIPS 2020, arXiv) · 105 citations
- Conservative Data Sharing for Multi-Task Offline Reinforcement Learning (NEURIPS 2021, arXiv) · 87 citations
- Scaling Test-Time Compute Without Verification or RL is Suboptimal (ICML 2025, arXiv) · 73 citations
- Inference-Aware Fine-Tuning for Best-of-N Sampling in Large Language Models (ICLR 2025, arXiv) · 49 citations
- Data-Driven Offline Decision-Making via Invariant Representation Learning (NEURIPS 2022, arXiv) · 33 citations
- Efficient Online Reinforcement Learning Fine-Tuning Need Not Retain Offline Data (ICLR 2025, arXiv) · 30 citations
- Beyond Uniform Sampling: Offline Reinforcement Learning with Imbalanced Datasets (NEURIPS 2023, arXiv) · 27 citations
- Bigger, Regularized, Categorical: High-Capacity Value Functions are Efficient Multi-Task Learners (NEURIPS 2025, arXiv) · 9 citations
- Value-Based Deep RL Scales Predictably (ICML 2025, arXiv) · 9 citations
- ReDS: Offline RL With Heteroskedastic Datasets via Support Constraints (NEURIPS 2023) · 0 citations
- DASCO: Dual-Generator Adversarial Support Constrained Offline Reinforcement Learning (NEURIPS 2022) · 0 citations