Dan Hendrycks

Affiliations

UC Berkeley

15 papers

8,634 total citations

papers (15)

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

ICLR 2025, arXiv

2,226

citations

The Many Faces of Robustness: A Critical Analysis of Out-of-Distribution Generalization

ICCV 2021, arXiv

2,156

citations

Natural Adversarial Examples

CVPR 2021, arXiv

1,783

citations

HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal

ICML 2024, arXiv

802

citations

Decoding Compressed Trust: Scrutinizing the Trustworthiness of Efficient LLMs Under Compression

ICML 2024, arXiv

citations

Forecasting Future World Events With Neural Networks

NeurIPS 2022, arXiv

citations

Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs

NeurIPS 2025, arXiv

citations

MMDT: Decoding the Trustworthiness and Safety of Multimodal Foundation Models

ICLR 2025, arXiv

citations

How Would The Viewer Feel? Estimating Wellbeing From Video Scenarios

NeurIPS 2022, arXiv

citations

A Spectral View of Randomized Smoothing under Common Corruptions: Benchmarking and Improving Certified Robustness

ECCV 2022

citations

papers (15)

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

The Many Faces of Robustness: A Critical Analysis of Out-of-Distribution Generalization

Natural Adversarial Examples

HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal

DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models

The WMDP Benchmark: Measuring and Reducing Malicious Use with Unlearning

OpenOOD: Benchmarking Generalized Out-of-Distribution Detection

PixMix: Dreamlike Pictures Comprehensively Improve Safety Measures

Tamper-Resistant Safeguards for Open-Weight LLMs

Decoding Compressed Trust: Scrutinizing the Trustworthiness of Efficient LLMs Under Compression

Forecasting Future World Events With Neural Networks

Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs

MMDT: Decoding the Trustworthiness and Safety of Multimodal Foundation Models

How Would The Viewer Feel? Estimating Wellbeing From Video Scenarios

A Spectral View of Randomized Smoothing under Common Corruptions: Benchmarking and Improving Certified Robustness
