Prateek Mittal
18 papers, with 1,811 total citations
Papers (18)
Fine-tuning Aligned Language Models Compromises Safety, Even When Users Do Not Intend To!
ICLR 2024, arXiv, 966 citations

Safety Alignment Should be Made More Than Just a Few Tokens Deep
ICLR 2025, arXiv, 303 citations

Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications
ICML 2024, arXiv, 184 citations

SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal
ICLR 2025, arXiv, 151 citations

Data Shapley in One Training Run
ICLR 2025, arXiv, 48 citations

Teach LLMs to Phish: Stealing Private Information from Language Models
ICLR 2024, arXiv, 38 citations

Differentially Private Image Classification by Learning Priors from Random Processes
NeurIPS 2023, arXiv, 30 citations

HYDRA: Pruning Adversarially Robust Neural Networks
NeurIPS 2020, arXiv, 25 citations

Understanding Robust Learning through the Lens of Representation Similarities
NeurIPS 2022, arXiv, 18 citations

A New Linear Scaling Rule for Private Adaptive Hyperparameter Optimization
ICML 2024, arXiv, 15 citations

A Randomized Approach to Tight Privacy Accounting
NeurIPS 2023, arXiv, 12 citations

Formulating Robustness Against Unforeseen Attacks
NeurIPS 2022, arXiv, 9 citations

Renyi Differential Privacy of Propose-Test-Release and Applications to Private and Robust Machine Learning
NeurIPS 2022, arXiv, 5 citations

Characterizing the Optimal $0-1$ Loss for Multi-class Classification with a Test-time Attacker
NeurIPS 2023, arXiv, 5 citations

ReliabilityRAG: Effective and Provably Robust Defense for RAG-based Web-Search
NeurIPS 2025, arXiv, 2 citations

PatchDEMUX: A Certifiably Robust Framework for Multi-label Classifiers Against Adversarial Patches
CVPR 2025, arXiv, 0 citations

A Privacy-Friendly Approach to Data Valuation
NeurIPS 2023, 0 citations

Adapting to Evolving Adversaries with Regularized Continual Robust Training
ICML 2025, arXiv, 0 citations