"model robustness" Papers

38 papers found

AdvPrompter: Fast Adaptive Adversarial Prompting for LLMs

Anselm Paulus, Arman Zharmagambetov, Chuan Guo et al.

ICML 2025 · arXiv:2404.16873
132 citations

Aligning Visual Contrastive Learning Models via Preference Optimization

Amirabbas Afzali, Borna Khodabandeh, Ali Rasekh et al.

ICLR 2025 · arXiv:2411.08923
3 citations

An Evidence-Based Post-Hoc Adjustment Framework for Anomaly Detection Under Data Contamination

Sukanya Patra, Souhaib Ben Taieb

NEURIPS 2025 (spotlight) · arXiv:2510.21296

Automated Detection of Visual Attribute Reliance with a Self-Reflective Agent

Christy Li, Josep Lopez Camuñas, Jake Touchet et al.

NEURIPS 2025 · arXiv:2510.21704

Beyond Interpretability: The Gains of Feature Monosemanticity on Model Robustness

Qi Zhang, Yifei Wang, Jingyi Cui et al.

ICLR 2025 · arXiv:2410.21331
4 citations

Buffer layers for Test-Time Adaptation

Hyeongyu Kim, GeonHui Han, Dosik Hwang

NEURIPS 2025 · arXiv:2510.21271

Can Knowledge Editing Really Correct Hallucinations?

Baixiang Huang, Canyu Chen, Xiongxiao Xu et al.

ICLR 2025 · arXiv:2410.16251
29 citations

Competing Large Language Models in Multi-Agent Gaming Environments

Jen-Tse Huang, Eric John Li, Man Ho Lam et al.

ICLR 2025
28 citations

Democratic Training Against Universal Adversarial Perturbations

Bing Sun, Jun Sun, Wei Zhao

ICLR 2025 · arXiv:2502.05542
1 citation

Everywhere Attack: Attacking Locally and Globally to Boost Targeted Transferability

Hui Zeng, Sanshuai Cui, Biwei Chen et al.

AAAI 2025 · arXiv:2501.00707
3 citations

Failures to Find Transferable Image Jailbreaks Between Vision-Language Models

Rylan Schaeffer, Dan Valentine, Luke Bailey et al.

ICLR 2025 · arXiv:2407.15211
24 citations

Learning Diverse Attacks on Large Language Models for Robust Red-Teaming and Safety Tuning

Seanie Lee, Minsu Kim, Lynn Cherif et al.

ICLR 2025 · arXiv:2405.18540
47 citations

Modality-Aware SAM: Sharpness-Aware-Minimization Driven Gradient Modulation for Harmonized Multimodal Learning

Hossein Rajoli Nowdeh, Jie Ji, Xiaolong Ma et al.

NEURIPS 2025 · arXiv:2510.24919

Optimal Control for Transformer Architectures: Enhancing Generalization, Robustness and Efficiency

Kelvin Kan, Xingjian Li, Benjamin Zhang et al.

NEURIPS 2025 · arXiv:2505.13499
3 citations

Perturb a Model, Not an Image: Towards Robust Privacy Protection via Anti-Personalized Diffusion Models

Tae-Young Lee, Juwon Seo, Jong Hwan Ko et al.

NEURIPS 2025 · arXiv:2511.01307

Planted in Pretraining, Swayed by Finetuning: A Case Study on the Origins of Cognitive Biases in LLMs

Itay Itzhak, Yonatan Belinkov, Gabriel Stanovsky

COLM 2025 · arXiv:2507.07186
3 citations

Remarkable Robustness of LLMs: Stages of Inference?

Vedang Lad, Jin Hwa Lee, Wes Gurnee et al.

NEURIPS 2025 (oral) · arXiv:2406.19384
95 citations

RepGuard: Adaptive Feature Decoupling for Robust Backdoor Defense in Large Language Models

Chenxu Niu, Jie Zhang, Yanbing Liu et al.

NEURIPS 2025

Resolution Attack: Exploiting Image Compression to Deceive Deep Neural Networks

Wangjia Yu, Xiaomeng Fu, Qiao Li et al.

ICLR 2025

Rethinking Evaluation of Infrared Small Target Detection

Youwei Pang, Xiaoqi Zhao, Lihe Zhang et al.

NEURIPS 2025 · arXiv:2509.16888

Seal Your Backdoor with Variational Defense

Ivan Sabolic, Matej Grcic, Siniša Šegvić

ICCV 2025 · arXiv:2503.08829
1 citation

Simple, Good, Fast: Self-Supervised World Models Free of Baggage

Jan Robine, Marc Höftmann, Stefan Harmeling

ICLR 2025 · arXiv:2506.02612
5 citations

Topological Zigzag Spaghetti for Diffusion-based Generation and Prediction on Graphs

Yuzhou Chen, Yulia Gel

ICLR 2025
3 citations

TransferBench: Benchmarking Ensemble-based Black-box Transfer Attacks

Fabio Brau, Maura Pintor, Antonio Cinà et al.

NEURIPS 2025

Transformer Layers as Painters

Qi Sun, Marc Pickett, Aakash Kumar Nain et al.

AAAI 2025 · arXiv:2407.09298
42 citations

Web Artifact Attacks Disrupt Vision Language Models

Maan Qraitem, Piotr Teterwak, Kate Saenko et al.

ICCV 2025 · arXiv:2503.13652
2 citations

Beyond the Federation: Topology-aware Federated Learning for Generalization to Unseen Clients

Mengmeng Ma, Tang Li, Xi Peng

ICML 2024 · arXiv:2407.04949
7 citations

Energy-based Backdoor Defense without Task-Specific Samples and Model Retraining

Yudong Gao, Honglong Chen, Peng Sun et al.

ICML 2024

HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal

Mantas Mazeika, Long Phan, Xuwang Yin et al.

ICML 2024 · arXiv:2402.04249
802 citations

Improving SAM Requires Rethinking its Optimization Formulation

Wanyun Xie, Fabian Latorre, Kimon Antonakopoulos et al.

ICML 2024 · arXiv:2407.12993
4 citations

Interpretability-Guided Test-Time Adversarial Defense

Akshay Ravindra Kulkarni, Tsui-Wei Weng

ECCV 2024 · arXiv:2409.15190
3 citations

MM-SafetyBench: A Benchmark for Safety Evaluation of Multimodal Large Language Models

Xin Liu, Yichen Zhu, Jindong Gu et al.

ECCV 2024 · arXiv:2311.17600
199 citations

On the Robustness of Large Multimodal Models Against Image Adversarial Attacks

Xuanming Cui, Alejandro Aparcedo, Young Kyun Jang et al.

CVPR 2024 · arXiv:2312.03777
89 citations

Revealing the Dark Secrets of Extremely Large Kernel ConvNets on Robustness

Honghao Chen, Yurong Zhang, Xiaokun Feng et al.

ICML 2024 · arXiv:2407.08972
10 citations

Robustness of Deep Learning for Accelerated MRI: Benefits of Diverse Training Data

Kang Lin, Reinhard Heckel

ICML 2024 · arXiv:2312.10271
9 citations

Unexplored Faces of Robustness and Out-of-Distribution: Covariate Shifts in Environment and Sensor Domains

Eunsu Baek, Keondo Park, Ji-yoon Kim et al.

CVPR 2024 · arXiv:2404.15882
12 citations

Unraveling Batch Normalization for Realistic Test-Time Adaptation

Zixian Su, Jingwei Guo, Kai Yao et al.

AAAI 2024 · arXiv:2312.09486
11 citations

Why Larger Language Models Do In-context Learning Differently?

Zhenmei Shi, Junyi Wei, Zhuoyan Xu et al.

ICML 2024 · arXiv:2405.19592
49 citations