Prateek Mittal

papers

1,811

total citations

papers (18)

Fine-tuning Aligned Language Models Compromises Safety, Even When Users Do Not Intend To!

ICLR 2024, arXiv

966

citations

Safety Alignment Should be Made More Than Just a Few Tokens Deep

ICLR 2025, arXiv

303

citations

Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications

ICML 2024, arXiv

184

citations

Differentially Private Image Classification by Learning Priors from Random Processes

NeurIPS 2023, arXiv

HYDRA: Pruning Adversarially Robust Neural Networks

NeurIPS 2020, arXiv

Understanding Robust Learning through the Lens of Representation Similarities

NeurIPS 2022, arXiv

A New Linear Scaling Rule for Private Adaptive Hyperparameter Optimization

ICML 2024, arXiv

A Randomized Approach to Tight Privacy Accounting

NeurIPS 2023, arXiv

Formulating Robustness Against Unforeseen Attacks

NeurIPS 2022, arXiv

Renyi Differential Privacy of Propose-Test-Release and Applications to Private and Robust Machine Learning

NeurIPS 2022, arXiv

Characterizing the Optimal $0-1$ Loss for Multi-class Classification with a Test-time Attacker

NeurIPS 2023, arXiv

ReliabilityRAG: Effective and Provably Robust Defense for RAG-based Web-Search

NeurIPS 2025, arXiv

PatchDEMUX: A Certifiably Robust Framework for Multi-label Classifiers Against Adversarial Patches

CVPR 2025, arXiv

A Privacy-Friendly Approach to Data Valuation

NeurIPS 2023

Adapting to Evolving Adversaries with Regularized Continual Robust Training

ICML 2025, arXiv

papers (18)

Fine-tuning Aligned Language Models Compromises Safety, Even When Users Do Not Intend To!

Safety Alignment Should be Made More Than Just a Few Tokens Deep

Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications

SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal

Data Shapley in One Training Run

Teach LLMs to Phish: Stealing Private Information from Language Models

Differentially Private Image Classification by Learning Priors from Random Processes

HYDRA: Pruning Adversarially Robust Neural Networks

Understanding Robust Learning through the Lens of Representation Similarities

A New Linear Scaling Rule for Private Adaptive Hyperparameter Optimization

A Randomized Approach to Tight Privacy Accounting

Formulating Robustness Against Unforeseen Attacks

Renyi Differential Privacy of Propose-Test-Release and Applications to Private and Robust Machine Learning

Characterizing the Optimal $0-1$ Loss for Multi-class Classification with a Test-time Attacker

ReliabilityRAG: Effective and Provably Robust Defense for RAG-based Web-Search

PatchDEMUX: A Certifiably Robust Framework for Multi-label Classifiers Against Adversarial Patches

A Privacy-Friendly Approach to Data Valuation

Adapting to Evolving Adversaries with Regularized Continual Robust Training
