Han Zhong
14 papers | 650 total citations
Papers (14)
Iterative Preference Learning from Human Feedback: Bridging Theory and Practice for RLHF under KL-constraint
ICML 2024 | arXiv | 312 citations
Rewards-in-Context: Multi-objective Alignment of Foundation Models with Dynamic Preference Adjustment
ICML 2024 | arXiv | 125 citations
Double Pessimism is Provably Efficient for Distributionally Robust Offline Reinforcement Learning: Generic Algorithm and Robust Partial Coverage
NeurIPS 2023 | arXiv | 50 citations
A Theoretical Analysis of Optimistic Proximal Policy Optimization in Linear Markov Decision Processes
NeurIPS 2023 | arXiv | 39 citations
Why Robust Generalization in Deep Learning is Difficult: Perspective of Expressive Power
NeurIPS 2022 | arXiv | 33 citations
Maximize to Explore: One Objective Function Fusing Estimation, Planning, and Exploration
NeurIPS 2023 | arXiv | 26 citations
Tackling Heavy-Tailed Rewards in Reinforcement Learning with Function Approximation: Minimax Optimal and Instance-Dependent Regret Bounds
NeurIPS 2023 | arXiv | 13 citations
Provably Efficient Exploration in Quantum Reinforcement Learning with Logarithmic Worst-Case Regret
ICML 2024 | arXiv | 11 citations
Breaking the Moments Condition Barrier: No-Regret Algorithm for Bandits with Super Heavy-Tailed Payoffs
NeurIPS 2021 | arXiv | 11 citations
A Reduction-based Framework for Sequential Decision Making with Delayed Feedback
NeurIPS 2023 | arXiv | 9 citations
BRiTE: Bootstrapping Reinforced Thinking Process to Enhance Language Model Reasoning
ICML 2025 | arXiv | 9 citations
Combinatorial Multivariant Multi-Armed Bandits with Applications to Episodic Reinforcement Learning and Beyond
ICML 2024 | arXiv | 8 citations
A3S: A General Active Clustering Method with Pairwise Constraints
ICML 2024 | arXiv | 3 citations
Posterior Sampling for Competitive RL: Function Approximation and Partial Observation
NeurIPS 2023 | arXiv | 1 citation