Nathan Kallus
21 papers, 401 total citations
papers (21)
Confounding-Robust Policy Evaluation in Infinite-Horizon Reinforcement Learning | NEURIPS 2020 | arXiv | 67 citations
Post-Contextual-Bandit Inference | NEURIPS 2021 | arXiv | 48 citations
Provable Offline Preference-Based Reinforcement Learning | ICLR 2024 | arXiv | 43 citations
Provably Efficient Reinforcement Learning in Partially Observable Dynamical Systems | NEURIPS 2022 | arXiv | 43 citations
What's the Harm? Sharp Bounds on the Fraction Negatively Affected by Treatment | NEURIPS 2022 | arXiv | 28 citations
The Benefits of Being Distributional: Small-Loss Bounds for Reinforcement Learning | NEURIPS 2023 | arXiv | 25 citations
Future-Dependent Value-Based Off-Policy Evaluation in POMDPs | NEURIPS 2023 | arXiv | 24 citations
Risk Minimization from Adaptively Collected Data: Guarantees for Supervised and Policy Learning | NEURIPS 2021 | arXiv | 17 citations
More Benefits of Being Distributional: Second-Order Bounds for Reinforcement Learning | ICML 2024 | arXiv | 17 citations
Doubly Robust Off-Policy Value and Gradient Estimation for Deterministic Policies | NEURIPS 2020 | arXiv | 16 citations
Control Variates for Slate Off-Policy Evaluation | NEURIPS 2021 | arXiv | 15 citations
$Q\sharp$: Provably Optimal Distributional RL for LLM Post-Training | NEURIPS 2025 | arXiv | 12 citations
Inferring the Long-Term Causal Effects of Long-Term Treatments from Short-Term Experiments | ICML 2024 | arXiv | 11 citations
Peeking with PEAK: Sequential, Nonparametric Composite Hypothesis Tests for Means of Multiple Data Streams | ICML 2024 | arXiv | 11 citations
Offline Minimax Soft-Q-learning Under Realizability and Partial Coverage | NEURIPS 2023 | arXiv | 8 citations
Value-Guided Search for Efficient Chain-of-Thought Reasoning | NEURIPS 2025 | arXiv | 7 citations
Estimating Structural Disparities for Face Models | CVPR 2022 | arXiv | 5 citations
The Implicit Delta Method | NEURIPS 2022 | arXiv | 2 citations
GST-UNet: A Neural Framework for Spatiotemporal Causal Inference with Time-Varying Confounding | NEURIPS 2025 | arXiv | 2 citations
Efficient Adaptive Experimentation with Noncompliance | NEURIPS 2025 | arXiv | 0 citations
Switching the Loss Reduces the Cost in Batch Reinforcement Learning | ICML 2024 | 0 citations