Haitao Mi
5 papers · 71 total citations
Papers (5)
Iterative Nash Policy Optimization: Aligning LLMs with General Preferences via No-Regret Learning
ICLR 2025 · arXiv · 34 citations
Trust, But Verify: A Self-Verification Approach to Reinforcement Learning with Verifiable Rewards
NeurIPS 2025 · arXiv · 18 citations
Two Experts Are All You Need for Steering Thinking: Reinforcing Cognitive Effort in MoE Reasoning Models Without Additional Training
NeurIPS 2025 · arXiv · 10 citations
LiteSearch: Efficient Tree Search with Dynamic Exploration Budget for Math Reasoning
AAAI 2025 · 5 citations
UniGist: Towards General and Hardware-aligned Sequence-level Long Context Compression
NeurIPS 2025 · arXiv · 4 citations