Poster "safety evaluation" Papers
7 papers found
Conference
AIR-BENCH 2024: A Safety Benchmark based on Regulation and Policies Specified Risk Categories
Yi Zeng, Yu Yang, Andy Zhou et al.
ICLR 2025
15
citations
Causal Composition Diffusion Model for Closed-loop Traffic Generation
Haohong Lin, Xin Huang, Tung Phan-Minh et al.
CVPR 2025arXiv:2412.17920
13
citations
ProAdvPrompter: A Two-Stage Journey to Effective Adversarial Prompting for LLMs
Hao Di, Tong He, Haishan Ye et al.
ICLR 2025
2
citations
Style Outweighs Substance: Failure Modes of LLM Judges in Alignment Benchmarking
Benjamin Feuer, Micah Goldblum, Teresa Datta et al.
ICLR 2025arXiv:2409.15268
28
citations
Covert Malicious Finetuning: Challenges in Safeguarding LLM Adaptation
Danny Halawi, Alexander Wei, Eric Wallace et al.
ICML 2024arXiv:2406.20053
65
citations
MM-SafetyBench: A Benchmark for Safety Evaluation of Multimodal Large Language Models
Xin Liu, Yichen Zhu, Jindong Gu et al.
ECCV 2024arXiv:2311.17600
199
citations
Position: TrustLLM: Trustworthiness in Large Language Models
Yue Huang, Lichao Sun, Haoran Wang et al.
ICML 2024