"model robustness" Papers
38 papers found
AdvPrompter: Fast Adaptive Adversarial Prompting for LLMs
Anselm Paulus, Arman Zharmagambetov, Chuan Guo et al.
Aligning Visual Contrastive learning models via Preference Optimization
Amirabbas Afzali, Borna Khodabandeh, Ali Rasekh et al.
An Evidence-Based Post-Hoc Adjustment Framework for Anomaly Detection Under Data Contamination
Sukanya Patra, Souhaib Ben Taieb
Automated Detection of Visual Attribute Reliance with a Self-Reflective Agent
Christy Li, Josep Lopez Camuñas, Jake Touchet et al.
Beyond Interpretability: The Gains of Feature Monosemanticity on Model Robustness
Qi Zhang, Yifei Wang, Jingyi Cui et al.
Buffer layers for Test-Time Adaptation
Hyeongyu Kim, GeonHui Han, Dosik Hwang
Can Knowledge Editing Really Correct Hallucinations?
Baixiang Huang, Canyu Chen, Xiongxiao Xu et al.
Competing Large Language Models in Multi-Agent Gaming Environments
Jen-Tse Huang, Eric John Li, Man Ho Lam et al.
Democratic Training Against Universal Adversarial Perturbations
Bing Sun, Jun Sun, Wei Zhao
Everywhere Attack: Attacking Locally and Globally to Boost Targeted Transferability
Hui Zeng, Sanshuai Cui, Biwei Chen et al.
Failures to Find Transferable Image Jailbreaks Between Vision-Language Models
Rylan Schaeffer, Dan Valentine, Luke Bailey et al.
Learning Diverse Attacks on Large Language Models for Robust Red-Teaming and Safety Tuning
Seanie Lee, Minsu Kim, Lynn Cherif et al.
Modality-Aware SAM: Sharpness-Aware-Minimization Driven Gradient Modulation for Harmonized Multimodal Learning
Hossein Rajoli Nowdeh, Jie Ji, Xiaolong Ma et al.
Optimal Control for Transformer Architectures: Enhancing Generalization, Robustness and Efficiency
Kelvin Kan, Xingjian Li, Benjamin Zhang et al.
Perturb a Model, Not an Image: Towards Robust Privacy Protection via Anti-Personalized Diffusion Models
Tae-Young Lee, Juwon Seo, Jong Hwan Ko et al.
Planted in Pretraining, Swayed by Finetuning: A Case Study on the Origins of Cognitive Biases in LLMs
Itay Itzhak, Yonatan Belinkov, Gabriel Stanovsky
Remarkable Robustness of LLMs: Stages of Inference?
Vedang Lad, Jin Hwa Lee, Wes Gurnee et al.
RepGuard: Adaptive Feature Decoupling for Robust Backdoor Defense in Large Language Models
Chenxu Niu, Jie Zhang, Yanbing Liu et al.
Resolution Attack: Exploiting Image Compression to Deceive Deep Neural Networks
Wangjia Yu, Xiaomeng Fu, Qiao Li et al.
Rethinking Evaluation of Infrared Small Target Detection
Youwei Pang, Xiaoqi Zhao, Lihe Zhang et al.
Seal Your Backdoor with Variational Defense
Ivan Sabolic, Matej Grcic, Siniša Šegvić
Simple, Good, Fast: Self-Supervised World Models Free of Baggage
Jan Robine, Marc Höftmann, Stefan Harmeling
Topological Zigzag Spaghetti for Diffusion-based Generation and Prediction on Graphs
Yuzhou Chen, Yulia Gel
TransferBench: Benchmarking Ensemble-based Black-box Transfer Attacks
Fabio Brau, Maura Pintor, Antonio Cinà et al.
Transformer Layers as Painters
Qi Sun, Marc Pickett, Aakash Kumar Nain et al.
Web Artifact Attacks Disrupt Vision Language Models
Maan Qraitem, Piotr Teterwak, Kate Saenko et al.
Beyond the Federation: Topology-aware Federated Learning for Generalization to Unseen Clients
Mengmeng Ma, Tang Li, Xi Peng
Energy-based Backdoor Defense without Task-Specific Samples and Model Retraining
Yudong Gao, Honglong Chen, Peng Sun et al.
HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal
Mantas Mazeika, Long Phan, Xuwang Yin et al.
Improving SAM Requires Rethinking its Optimization Formulation
Wanyun Xie, Fabian Latorre, Kimon Antonakopoulos et al.
Interpretability-Guided Test-Time Adversarial Defense
Akshay Ravindra Kulkarni, Tsui-Wei Weng
MM-SafetyBench: A Benchmark for Safety Evaluation of Multimodal Large Language Models
Xin Liu, Yichen Zhu, Jindong Gu et al.
On the Robustness of Large Multimodal Models Against Image Adversarial Attacks
Xuanming Cui, Alejandro Aparcedo, Young Kyun Jang et al.
Revealing the Dark Secrets of Extremely Large Kernel ConvNets on Robustness
Honghao Chen, Yurong Zhang, Xiaokun Feng et al.
Robustness of Deep Learning for Accelerated MRI: Benefits of Diverse Training Data
Kang Lin, Reinhard Heckel
Unexplored Faces of Robustness and Out-of-Distribution: Covariate Shifts in Environment and Sensor Domains
Eunsu Baek, Keondo Park, Ji-yoon Kim et al.
Unraveling Batch Normalization for Realistic Test-Time Adaptation
Zixian Su, Jingwei Guo, Kai Yao et al.
Why Larger Language Models Do In-context Learning Differently?
Zhenmei Shi, Junyi Wei, Zhuoyan Xu et al.