"model robustness" Papers

38 papers found

AdvPrompter: Fast Adaptive Adversarial Prompting for LLMs

Anselm Paulus, Arman Zharmagambetov, Chuan Guo et al.

ICML 2025 · arXiv:2404.16873
132 citations

Aligning Visual Contrastive Learning Models via Preference Optimization

Amirabbas Afzali, Borna Khodabandeh, Ali Rasekh et al.

ICLR 2025 · arXiv:2411.08923
3 citations

An Evidence-Based Post-Hoc Adjustment Framework for Anomaly Detection Under Data Contamination

Sukanya Patra, Souhaib Ben Taieb

NEURIPS 2025 (spotlight) · arXiv:2510.21296

Automated Detection of Visual Attribute Reliance with a Self-Reflective Agent

Christy Li, Josep Lopez Camuñas, Jake Touchet et al.

NEURIPS 2025 · arXiv:2510.21704

Beyond Interpretability: The Gains of Feature Monosemanticity on Model Robustness

Qi Zhang, Yifei Wang, Jingyi Cui et al.

ICLR 2025 · arXiv:2410.21331
4 citations

Buffer layers for Test-Time Adaptation

Hyeongyu Kim, GeonHui Han, Dosik Hwang

NEURIPS 2025 · arXiv:2510.21271

Can Knowledge Editing Really Correct Hallucinations?

Baixiang Huang, Canyu Chen, Xiongxiao Xu et al.

ICLR 2025 · arXiv:2410.16251
29 citations

Competing Large Language Models in Multi-Agent Gaming Environments

Jen-Tse Huang, Eric John Li, Man Ho Lam et al.

ICLR 2025
28 citations

Democratic Training Against Universal Adversarial Perturbations

Bing Sun, Jun Sun, Wei Zhao

ICLR 2025 · arXiv:2502.05542
1 citation

Everywhere Attack: Attacking Locally and Globally to Boost Targeted Transferability

Hui Zeng, Sanshuai Cui, Biwei Chen et al.

AAAI 2025 · arXiv:2501.00707
3 citations

Failures to Find Transferable Image Jailbreaks Between Vision-Language Models

Rylan Schaeffer, Dan Valentine, Luke Bailey et al.

ICLR 2025 · arXiv:2407.15211
24 citations

Learning Diverse Attacks on Large Language Models for Robust Red-Teaming and Safety Tuning

Seanie Lee, Minsu Kim, Lynn Cherif et al.

ICLR 2025 · arXiv:2405.18540
47 citations

Modality-Aware SAM: Sharpness-Aware-Minimization Driven Gradient Modulation for Harmonized Multimodal Learning

Hossein Rajoli Nowdeh, Jie Ji, Xiaolong Ma et al.

NEURIPS 2025 · arXiv:2510.24919

Optimal Control for Transformer Architectures: Enhancing Generalization, Robustness and Efficiency

Kelvin Kan, Xingjian Li, Benjamin Zhang et al.

NEURIPS 2025 · arXiv:2505.13499
3 citations

Perturb a Model, Not an Image: Towards Robust Privacy Protection via Anti-Personalized Diffusion Models

Tae-Young Lee, Juwon Seo, Jong Hwan Ko et al.

NEURIPS 2025 · arXiv:2511.01307

Planted in Pretraining, Swayed by Finetuning: A Case Study on the Origins of Cognitive Biases in LLMs

Itay Itzhak, Yonatan Belinkov, Gabriel Stanovsky

COLM 2025 · arXiv:2507.07186
3 citations

Remarkable Robustness of LLMs: Stages of Inference?

Vedang Lad, Jin Hwa Lee, Wes Gurnee et al.

NEURIPS 2025 (oral) · arXiv:2406.19384
95 citations

RepGuard: Adaptive Feature Decoupling for Robust Backdoor Defense in Large Language Models

Chenxu Niu, Jie Zhang, Yanbing Liu et al.

NEURIPS 2025

Resolution Attack: Exploiting Image Compression to Deceive Deep Neural Networks

Wangjia Yu, Xiaomeng Fu, Qiao Li et al.

ICLR 2025

Rethinking Evaluation of Infrared Small Target Detection

Youwei Pang, Xiaoqi Zhao, Lihe Zhang et al.

NEURIPS 2025 · arXiv:2509.16888

Seal Your Backdoor with Variational Defense

Ivan Sabolic, Matej Grcic, Siniša Šegvić

ICCV 2025 · arXiv:2503.08829
1 citation

Simple, Good, Fast: Self-Supervised World Models Free of Baggage

Jan Robine, Marc Höftmann, Stefan Harmeling

ICLR 2025 · arXiv:2506.02612
5 citations

Topological Zigzag Spaghetti for Diffusion-based Generation and Prediction on Graphs

Yuzhou Chen, Yulia Gel

ICLR 2025
3 citations

TransferBench: Benchmarking Ensemble-based Black-box Transfer Attacks

Fabio Brau, Maura Pintor, Antonio Cinà et al.

NEURIPS 2025

Transformer Layers as Painters

Qi Sun, Marc Pickett, Aakash Kumar Nain et al.

AAAI 2025 · arXiv:2407.09298
42 citations

Web Artifact Attacks Disrupt Vision Language Models

Maan Qraitem, Piotr Teterwak, Kate Saenko et al.

ICCV 2025 · arXiv:2503.13652
2 citations

Beyond the Federation: Topology-aware Federated Learning for Generalization to Unseen Clients

Mengmeng Ma, Tang Li, Xi Peng

ICML 2024 · arXiv:2407.04949
7 citations

Energy-based Backdoor Defense without Task-Specific Samples and Model Retraining

Yudong Gao, Honglong Chen, Peng Sun et al.

ICML 2024

HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal

Mantas Mazeika, Long Phan, Xuwang Yin et al.

ICML 2024 · arXiv:2402.04249
802 citations

Improving SAM Requires Rethinking its Optimization Formulation

Wanyun Xie, Fabian Latorre, Kimon Antonakopoulos et al.

ICML 2024 · arXiv:2407.12993
4 citations

Interpretability-Guided Test-Time Adversarial Defense

Akshay Ravindra Kulkarni, Tsui-Wei Weng

ECCV 2024 · arXiv:2409.15190
3 citations

MM-SafetyBench: A Benchmark for Safety Evaluation of Multimodal Large Language Models

Xin Liu, Yichen Zhu, Jindong Gu et al.

ECCV 2024 · arXiv:2311.17600
199 citations

On the Robustness of Large Multimodal Models Against Image Adversarial Attacks

Xuanming Cui, Alejandro Aparcedo, Young Kyun Jang et al.

CVPR 2024 · arXiv:2312.03777
89 citations

Revealing the Dark Secrets of Extremely Large Kernel ConvNets on Robustness

Honghao Chen, Yurong Zhang, Xiaokun Feng et al.

ICML 2024 · arXiv:2407.08972
10 citations

Robustness of Deep Learning for Accelerated MRI: Benefits of Diverse Training Data

Kang Lin, Reinhard Heckel

ICML 2024 · arXiv:2312.10271
9 citations

Unexplored Faces of Robustness and Out-of-Distribution: Covariate Shifts in Environment and Sensor Domains

Eunsu Baek, Keondo Park, Ji-yoon Kim et al.

CVPR 2024 · arXiv:2404.15882
12 citations

Unraveling Batch Normalization for Realistic Test-Time Adaptation

Zixian Su, Jingwei Guo, Kai Yao et al.

AAAI 2024 · arXiv:2312.09486
11 citations

Why Larger Language Models Do In-context Learning Differently?

Zhenmei Shi, Junyi Wei, Zhuoyan Xu et al.

ICML 2024 · arXiv:2405.19592
49 citations