Dan Hendrycks
Affiliations (1): UC Berkeley
15 papers, 8,634 total citations
Papers (15)

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models
ICLR 2025 · arXiv · 2,226 citations

The Many Faces of Robustness: A Critical Analysis of Out-of-Distribution Generalization
ICCV 2021 · arXiv · 2,156 citations

Natural Adversarial Examples
CVPR 2021 · arXiv · 1,783 citations

HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal
ICML 2024 · arXiv · 802 citations

DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models
NeurIPS 2023 · arXiv · 571 citations

The WMDP Benchmark: Measuring and Reducing Malicious Use with Unlearning
ICML 2024 · arXiv · 333 citations

OpenOOD: Benchmarking Generalized Out-of-Distribution Detection
NeurIPS 2022 · arXiv · 330 citations

PixMix: Dreamlike Pictures Comprehensively Improve Safety Measures
CVPR 2022 · arXiv · 174 citations

Tamper-Resistant Safeguards for Open-Weight LLMs
ICLR 2025 · arXiv · 113 citations

Decoding Compressed Trust: Scrutinizing the Trustworthiness of Efficient LLMs Under Compression
ICML 2024 · arXiv · 49 citations

Forecasting Future World Events With Neural Networks
NeurIPS 2022 · arXiv · 39 citations

Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs
NeurIPS 2025 · arXiv · 36 citations

MMDT: Decoding the Trustworthiness and Safety of Multimodal Foundation Models
ICLR 2025 · arXiv · 11 citations

How Would The Viewer Feel? Estimating Wellbeing From Video Scenarios
NeurIPS 2022 · arXiv · 11 citations

A Spectral View of Randomized Smoothing under Common Corruptions: Benchmarking and Improving Certified Robustness
ECCV 2022 · 0 citations