Yangsibo Huang

papers

1,449 total citations

papers (12)

Catastrophic Jailbreak of Open-source LLMs via Exploiting Generation

ICLR 2024, arXiv

430 citations

Evaluating Gradient Inversion Attacks and Defenses in Federated Learning

NeurIPS 2021, arXiv

357 citations

Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications

ICML 2024, arXiv

184 citations

Crosslingual Capabilities and Knowledge Barriers in Multilingual Large Language Models

COLM 2025, arXiv

citations

Sparsity-Preserving Differentially Private Training of Large Embedding Models

NeurIPS 2023, arXiv

citations

Position: A Safe Harbor for AI Evaluation and Red Teaming

ICML 2024

citations

papers (12)

Catastrophic Jailbreak of Open-source LLMs via Exploiting Generation

Evaluating Gradient Inversion Attacks and Defenses in Federated Learning

Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications

MUSE: Machine Unlearning Six-Way Evaluation for Language Models

SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal

Recovering Private Text in Federated Learning of Language Models

Fantastic Copyrighted Beasts and How (Not) to Generate Them

Scaling Laws for Differentially Private Language Models

Scaling Embedding Layers in Language Models

Crosslingual Capabilities and Knowledge Barriers in Multilingual Large Language Models

Sparsity-Preserving Differentially Private Training of Large Embedding Models

Position: A Safe Harbor for AI Evaluation and Red Teaming
