Peter Henderson
13 papers · 2,086 total citations

Papers (13)
Fine-tuning Aligned Language Models Compromises Safety, Even When Users Do Not Intend To!
ICLR 2024 · arXiv · 966 citations
LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models
NeurIPS 2023 · arXiv · 319 citations
Safety Alignment Should be Made More Than Just a Few Tokens Deep
ICLR 2025 · arXiv · 303 citations
Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications
ICML 2024 · arXiv · 184 citations
SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal
ICLR 2025 · arXiv · 151 citations
Pile of Law: Learning Responsible Data Filtering from the Law and a 256GB Open-Source Legal Dataset
NeurIPS 2022 · arXiv · 133 citations
Fantastic Copyrighted Beasts and How (Not) to Generate Them
ICLR 2025 · arXiv · 24 citations
Dynamic Risk Assessments for Offensive Cybersecurity Agents
NeurIPS 2025 · arXiv · 4 citations
Position: In-House Evaluation Is Not Enough. Towards Robust Third-Party Evaluation and Flaw Disclosure for General-Purpose AI
ICML 2025 · 2 citations
Position: On the Societal Impact of Open Foundation Models
ICML 2024 · 0 citations
Position: A Safe Harbor for AI Evaluation and Red Teaming
ICML 2024 · 0 citations
Cheaply Estimating Inference Efficiency Metrics for Autoregressive Transformer Models
NeurIPS 2023 · 0 citations
A Multimodal Benchmark for Framing of Oil & Gas Advertising and Potential Greenwashing Detection
NeurIPS 2025 · arXiv · 0 citations