Haipeng Luo
26 papers, 1,416 total citations
papers (26)
WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct. ICLR 2025. 655 citations.
Cap4Video: What Can Auxiliary Captions Do for Text-Video Retrieval? CVPR 2023. 149 citations.
Bidirectional Cross-Modal Knowledge Exploration for Video Recognition With Pre-Trained Vision-Language Models. CVPR 2023. 80 citations.
Bias no more: high-probability data-dependent regret bounds for adversarial bandits and MDPs. NeurIPS 2020. 61 citations.
Simultaneously Learning Stochastic and Adversarial Episodic MDPs with Known Transition. NeurIPS 2020. 61 citations.
Last-iterate Convergence in Extensive-Form Games. NeurIPS 2021. 50 citations.
Policy Optimization in Adversarial MDPs: Improved Exploration via Dilated Bonuses. NeurIPS 2021. 50 citations.
The best of both worlds: stochastic and adversarial episodic MDPs with unknown transition. NeurIPS 2021. 46 citations.
Uncoupled Learning Dynamics with $O(\log T)$ Swap Regret in Multiplayer Games. NeurIPS 2022. 39 citations.
Near-Optimal No-Regret Learning Dynamics for General Convex Games. NeurIPS 2022. 34 citations.
Uncoupled and Convergent Learning in Two-Player Zero-Sum Markov Games with Bandit Feedback. NeurIPS 2023. 28 citations.
Implicit Finite-Horizon Approximation and Efficient Optimal Algorithms for Stochastic Shortest Path. NeurIPS 2021. 26 citations.
Near-Optimal Regret for Adversarial MDP with Delayed Bandit Feedback. NeurIPS 2022. 25 citations.
Improved Best-of-Both-Worlds Guarantees for Multi-Armed Bandits: FTRL with General Regularizers and Multiple Optimal Arms. NeurIPS 2023. 21 citations.
Follow-the-Perturbed-Leader for Adversarial Markov Decision Processes with Bandit Feedback. NeurIPS 2022. 21 citations.
Regret Matching+: (In)Stability and Fast Convergence in Games. NeurIPS 2023. 13 citations.
No-Regret Online Reinforcement Learning with Adversarial Losses and Transitions. NeurIPS 2023. 12 citations.
Practical Contextual Bandits with Feedback Graphs. NeurIPS 2023. 10 citations.
Near-Optimal Goal-Oriented Reinforcement Learning in Non-Stationary Environments. NeurIPS 2022. 9 citations.
Comparator-Adaptive Convex Bandits. NeurIPS 2020. 8 citations.
Near-Optimal Regret in Linear MDPs with Aggregate Bandit Feedback. ICML 2024. 5 citations.
Efficient Contextual Bandits with Uninformed Feedback Graphs. ICML 2024. 5 citations.
Contextual Linear Bandits with Delay as Payoff. ICML 2025. 3 citations.
Improved Bounds for Swap Multicalibration and Swap Omniprediction. NeurIPS 2025. 2 citations.
ACPO: A Policy Optimization Algorithm for Average MDPs with Constraints. ICML 2024. 2 citations.
Improved Regret and Contextual Linear Extension for Pandora's Box and Prophet Inequality. NeurIPS 2025. 1 citation.