Zhiyuan Li
Affiliation: Toyota Technological Institute at Chicago
21 papers · 1,085 total citations

Papers (21)
1. DeFRCN: Decoupled Faster R-CNN for Few-Shot Object Detection. ICCV 2021 (arXiv). 274 citations.
2. Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training. ICLR 2024 (arXiv). 241 citations.
3. On the Validity of Modeling SGD with Stochastic Differential Equations (SDEs). NeurIPS 2021 (arXiv). 98 citations.
4. Understanding the Generalization Benefit of Normalization Layers: Sharpness Reduction. NeurIPS 2022 (arXiv). 89 citations.
5. Gradient Descent on Two-layer Nets: Margin Maximization and Simplicity Bias. NeurIPS 2021 (arXiv). 84 citations.
6. Reconciling Modern Deep Learning with Traditional Optimization Analyses: The Intrinsic Learning Rate. NeurIPS 2020 (arXiv). 78 citations.
7. Dichotomy of Early and Late Phase Implicit Biases Can Provably Induce Grokking. ICLR 2024 (arXiv). 57 citations.
8. Sharpness Minimization Algorithms Do Not Only Minimize Sharpness To Achieve Better Generalization. NeurIPS 2023 (arXiv). 42 citations.
9. Implicit Bias of Gradient Descent on Reparametrized Models: On Equivalence to Mirror Descent. NeurIPS 2022 (arXiv). 33 citations.
10. Implicit Regularization and Convergence for Weight Normalization. NeurIPS 2020 (arXiv). 26 citations.
11. Why Do You Grok? A Theoretical Analysis on Grokking Modular Addition. ICML 2024 (arXiv). 19 citations.
12. Structured Preconditioners in Adaptive Optimization: A Unified Analysis. ICML 2025 (arXiv). 18 citations.
13. PENCIL: Long Thoughts with Short Memory. ICML 2025 (arXiv). 10 citations.
14. Optimistic Multi-Agent Policy Gradient. ICML 2024 (arXiv). 5 citations.
15. AgentMixer: Multi-Agent Correlated Policy Factorization. AAAI 2025 (arXiv). 4 citations.
16. Simplicity Bias via Global Convergence of Sharpness Minimization. ICML 2024 (arXiv). 3 citations.
17. Non-Asymptotic Length Generalization. ICML 2025 (arXiv). 3 citations.
18. Find a Scapegoat: Poisoning Membership Inference Attack and Defense to Federated Learning. ICCV 2025 (arXiv). 1 citation.
19. Fast Mixing of Stochastic Gradient Descent with Normalization and Weight Decay. NeurIPS 2022. 0 citations.
20. What is the Inductive Bias of Flatness Regularization? A Study of Deep Matrix Factorization Models. NeurIPS 2023. 0 citations.
21. Implicit Bias of AdamW: $\ell_\infty$-Norm Constrained Optimization. ICML 2024. 0 citations.