Dawn Song

papers

7,353

total citations

papers (27)

The Many Faces of Robustness: A Critical Analysis of Out-of-Distribution Generalization

ICCV 2021, arXiv

2,156

citations

The Secret Revealer: Generative Model-Inversion Attacks Against Deep Neural Networks

CVPR 2020, arXiv

488

citations

PixMix: Dreamlike Pictures Comprehensively Improve Safety Measures

CVPR 2022, arXiv

174

citations

Compositional Generalization via Neural-Symbolic Stack Machines

NeurIPS 2020, arXiv

106

citations

TrojDiff: Trojan Attacks on Diffusion Models With Diverse Targets

CVPR 2023, arXiv

104

citations

RigorLLM: Resilient Guardrails for Large Language Models against Undesired Content

ICML 2024, arXiv

citations

Synthesize, Execute and Debug: Learning to Repair for Neural Program Synthesis

NeurIPS 2020, arXiv

citations

DiffAttack: Evasion Attacks Against Diffusion-Based Adversarial Purification

NeurIPS 2023, arXiv

citations

Decoding Compressed Trust: Scrutinizing the Trustworthiness of Efficient LLMs Under Compression

ICML 2024, arXiv

citations

Data Shapley in One Training Run

ICLR 2025, arXiv

citations

Agent Instructs Large Language Models to be General Zero-Shot Reasoners

ICML 2024, arXiv

citations

Forecasting Future World Events With Neural Networks

NeurIPS 2022, arXiv

citations

CodeHalu: Investigating Code Hallucinations in LLMs via Execution-based Verification

AAAI 2025, arXiv

citations

C-RAG: Certified Generation Risks for Retrieval-Augmented Language Models

ICML 2024, arXiv

citations

Scalability vs. Utility: Do We Have To Sacrifice One for the Other in Data Importance Quantification?

CVPR 2021, arXiv

citations

Towards practical differentially private causal graph discovery

NeurIPS 2020, arXiv

citations

How Would The Viewer Feel? Estimating Wellbeing From Video Scenarios

NeurIPS 2022, arXiv

citations

Adversarial Examples for k-Nearest Neighbor Classifiers Based on Higher-Order Voronoi Diagrams

NeurIPS 2021, arXiv

citations

GRATH: Gradual Self-Truthifying for Large Language Models

ICML 2024, arXiv

citations

Position: Evolving AI Collectives Enhance Human Diversity and Enable Self-Regulation

ICML 2024

citations

Latent Execution for Neural Program Synthesis Beyond Domain-Specific Languages

NeurIPS 2021

citations

BIRD: Generalizable Backdoor Detection and Removal for Deep Reinforcement Learning

NeurIPS 2023

citations

SHINE: Shielding Backdoors in Deep Reinforcement Learning

ICML 2024

citations

Position: On the Societal Impact of Open Foundation Models

ICML 2024

citations

papers (27)

The Many Faces of Robustness: A Critical Analysis of Out-of-Distribution Generalization

Natural Adversarial Examples

Model-Contrastive Federated Learning

DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models

The Secret Revealer: Generative Model-Inversion Attacks Against Deep Neural Networks

PixMix: Dreamlike Pictures Comprehensively Improve Safety Measures

Compositional Generalization via Neural-Symbolic Stack Machines

TrojDiff: Trojan Attacks on Diffusion Models With Diverse Targets

RigorLLM: Resilient Guardrails for Large Language Models against Undesired Content

Synthesize, Execute and Debug: Learning to Repair for Neural Program Synthesis

DiffAttack: Evasion Attacks Against Diffusion-Based Adversarial Purification

Decoding Compressed Trust: Scrutinizing the Trustworthiness of Efficient LLMs Under Compression

Data Shapley in One Training Run

Agent Instructs Large Language Models to be General Zero-Shot Reasoners

Forecasting Future World Events With Neural Networks

CodeHalu: Investigating Code Hallucinations in LLMs via Execution-based Verification

C-RAG: Certified Generation Risks for Retrieval-Augmented Language Models

Scalability vs. Utility: Do We Have To Sacrifice One for the Other in Data Importance Quantification?

Towards practical differentially private causal graph discovery

How Would The Viewer Feel? Estimating Wellbeing From Video Scenarios

Adversarial Examples for k-Nearest Neighbor Classifiers Based on Higher-Order Voronoi Diagrams

GRATH: Gradual Self-Truthifying for Large Language Models

Position: Evolving AI Collectives Enhance Human Diversity and Enable Self-Regulation

Latent Execution for Neural Program Synthesis Beyond Domain-Specific Languages

BIRD: Generalizable Backdoor Detection and Removal for Deep Reinforcement Learning

SHINE: Shielding Backdoors in Deep Reinforcement Learning

Position: On the Societal Impact of Open Foundation Models
