Sanjiv Kumar
28
papers
1,263
total citations
papers (28)
Rethinking FID: Towards a Better Evaluation Metric for Image Generation
CVPR 2024
arXiv
294
citations
Think before you speak: Training Language Models With Pause Tokens
ICLR 2024
arXiv
200
citations
Batch Active Learning at Scale
NEURIPS 2021
arXiv
187
citations
O(n) Connections are Expressive Enough: Universal Approximability of Sparse Transformers
NEURIPS 2020
arXiv
94
citations
Why are Adaptive Methods Good for Attention Models?
NEURIPS 2020
arXiv
87
citations
Learning discrete distributions: user vs item-level privacy
NEURIPS 2020
arXiv
58
citations
Two-stage LLM Fine-tuning with Less Specialization and More Generalization
ICLR 2024
arXiv
43
citations
Can Looped Transformers Learn to Implement Multi-step Gradient Descent for In-context Learning?
ICML 2024
arXiv
38
citations
Robust large-margin learning in hyperbolic space
NEURIPS 2020
arXiv
36
citations
When Does Confidence-Based Cascade Deferral Suffice?
NEURIPS 2023
arXiv
36
citations
Decoupled Context Processing for Context Augmented Language Modeling
NEURIPS 2022
arXiv
30
citations
TPU-KNN: K Nearest Neighbor Search at Peak FLOP/s
NEURIPS 2022
arXiv
29
citations
Faster Cascades via Speculative Decoding
ICLR 2025
arXiv
23
citations
Multi-Stage Influence Function
NEURIPS 2020
arXiv
21
citations
On student-teacher deviations in distillation: does it pay to disobey?
NEURIPS 2023
arXiv
19
citations
SOAR: Improved Indexing for Approximate Nearest Neighbor Search
NEURIPS 2023
arXiv
18
citations
LoRA Done RITE: Robust Invariant Transformation Equilibration for LoRA Optimization
ICLR 2025
arXiv
16
citations
Tandem Transformers for Inference Efficient LLMs
ICML 2024
arXiv
10
citations
ResMem: Learn what you can and memorize the rest
NEURIPS 2023
arXiv
9
citations
Better autoregressive regression with LLMs via regression-aware fine-tuning
ICLR 2025
7
citations
Promises and Pitfalls of Generative Masked Language Modeling: Theoretical Framework and Practical Guidelines
ICML 2024
arXiv
5
citations
Spark Transformer: Reactivating Sparsity in Transformer FFN and Attention
NEURIPS 2025
2
citations
Analyzing Similarity Metrics for Data Selection for Language Model Pretraining
NEURIPS 2025
arXiv
1
citation
Efficient Training of Retrieval Models using Negative Cache
NEURIPS 2021
0
citations
MarkovGen: Structured Prediction for Efficient Text-to-Image Generation
CVPR 2024
arXiv
0
citations
How Does Noise Help Robustness? Explanation and Exploration under the Neural SDE Framework
CVPR 2020
0
citations
Post-hoc estimators for learning to defer to an expert
NEURIPS 2022
0
citations
USTAD: Unified Single-model Training Achieving Diverse Scores for Information Retrieval
ICML 2024
0
citations