ResearchAlpha Leak



Peter Henderson

Data sourced from Semantic Scholar

Built: Feb 14, 2026, 11:22 PM AMS

13 papers · 2,086 total citations

Papers (13)

Fine-tuning Aligned Language Models Compromises Safety, Even When Users Do Not Intend To!

LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models

NeurIPS 2023 · arXiv

Safety Alignment Should be Made More Than Just a Few Tokens Deep

Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications

SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal

Pile of Law: Learning Responsible Data Filtering from the Law and a 256GB Open-Source Legal Dataset

NeurIPS 2022 · arXiv

Fantastic Copyrighted Beasts and How (Not) to Generate Them

Dynamic Risk Assessments for Offensive Cybersecurity Agents

NeurIPS 2025 · arXiv

Position: In-House Evaluation Is Not Enough. Towards Robust Third-Party Evaluation and Flaw Disclosure for General-Purpose AI

Position: On the Societal Impact of Open Foundation Models

Position: A Safe Harbor for AI Evaluation and Red Teaming

Cheaply Estimating Inference Efficiency Metrics for Autoregressive Transformer Models

A Multimodal Benchmark for Framing of Oil & Gas Advertising and Potential Greenwashing Detection

NeurIPS 2025 · arXiv