Poster "adversarial attacks" Papers

83 papers found • Page 1 of 2

Adversarial Attacks against Closed-Source MLLMs via Feature Optimal Alignment

Xiaojun Jia, Sensen Gao, Simeng Qin et al.

NEURIPS 2025 • arXiv:2505.21494
18 citations

Adversarial Attacks on Data Attribution

Xinhe Wang, Pingbang Hu, Junwei Deng et al.

ICLR 2025 • arXiv:2409.05657
1 citation

Adversarial Robustness of Discriminative Self-Supervised Learning in Vision

Ömer Veysel Çağatan, Ömer Tal, M. Emre Gursoy

ICCV 2025 • arXiv:2503.06361

Adversary Aware Optimization for Robust Defense

Daniel Wesego, Pedram Rooshenas

NEURIPS 2025

Boosting Adversarial Transferability with Spatial Adversarial Alignment

Zhaoyu Chen, Haijing Guo, Kaixun Jiang et al.

NEURIPS 2025 • arXiv:2501.01015
1 citation

Confidence Elicitation: A New Attack Vector for Large Language Models

Brian Formento, Chuan Sheng Foo, See-Kiong Ng

ICLR 2025 • arXiv:2502.04643
2 citations

Democratic Training Against Universal Adversarial Perturbations

Bing Sun, Jun Sun, Wei Zhao

ICLR 2025 • arXiv:2502.05542
1 citation

DepthVanish: Optimizing Adversarial Interval Structures for Stereo-Depth-Invisible Patches

Yun Xing, Yue Cao, Nhat Chung et al.

NEURIPS 2025 • arXiv:2506.16690

Detecting Adversarial Data Using Perturbation Forgery

Qian Wang, Chen Li, Yuchen Luo et al.

CVPR 2025 • arXiv:2405.16226
3 citations

DIA: The Adversarial Exposure of Deterministic Inversion in Diffusion Models

SeungHoo Hong, GeonHo Son, Juhun Lee et al.

ICCV 2025 • arXiv:2510.00778

Endowing Visual Reprogramming with Adversarial Robustness

Shengjie Zhou, Xin Cheng, Haiyang Xu et al.

ICLR 2025
2 citations

Enhancing Graph Classification Robustness with Singular Pooling

Sofiane Ennadir, Oleg Smirnov, Yassine Abbahaddou et al.

NEURIPS 2025 • arXiv:2510.22643

Exploring Visual Vulnerabilities via Multi-Loss Adversarial Search for Jailbreaking Vision-Language Models

Shuyang Hao, Bryan Hooi, Jun Liu et al.

CVPR 2025 • arXiv:2411.18000
6 citations

Fit the Distribution: Cross-Image/Prompt Adversarial Attacks on Multimodal Large Language Models

Hai Yan, Haijian Ma, Xiaowen Cai et al.

NEURIPS 2025

GSBA^K: top-K Geometric Score-based Black-box Attack

Md Farhamdur Reza, Richeng Jin, Tianfu Wu et al.

ICLR 2025 • arXiv:2503.12827
3 citations

Instant Adversarial Purification with Adversarial Consistency Distillation

Chun Tong Lei, Hon Ming Yam, Zhongliang Guo et al.

CVPR 2025 • arXiv:2408.17064
13 citations

IPAD: Inverse Prompt for AI Detection - A Robust and Interpretable LLM-Generated Text Detector

Zheng Chen, Yushi Feng, Jisheng Dang et al.

NEURIPS 2025 • arXiv:2502.15902

Jailbreaking as a Reward Misspecification Problem

Zhihui Xie, Jiahui Gao, Lei Li et al.

ICLR 2025 • arXiv:2406.14393
11 citations

Jailbreaking Multimodal Large Language Models via Shuffle Inconsistency

Shiji Zhao, Ranjie Duan, Fengxiang Wang et al.

ICCV 2025 • arXiv:2501.04931
30 citations

Keeping an Eye on LLM Unlearning: The Hidden Risk and Remedy

Jie Ren, Zhenwei Dai, Xianfeng Tang et al.

NEURIPS 2025 • arXiv:2506.00359
7 citations

LARGO: Latent Adversarial Reflection through Gradient Optimization for Jailbreaking LLMs

Ran Li, Hao Wang, Chengzhi Mao

NEURIPS 2025 • arXiv:2505.10838
4 citations

MIP against Agent: Malicious Image Patches Hijacking Multimodal OS Agents

Lukas Aichberger, Alasdair Paren, Guohao Li et al.

NEURIPS 2025 • arXiv:2503.10809
10 citations

MOS-Attack: A Scalable Multi-objective Adversarial Attack Framework

Ping Guo, Cheng Gong, Fei Liu et al.

CVPR 2025 • arXiv:2501.07251

Non-Adaptive Adversarial Face Generation

Sunpill Kim, Seunghun Paik, Chanwoo Hwang et al.

NEURIPS 2025 • arXiv:2507.12107
1 citation

NoPain: No-box Point Cloud Attack via Optimal Transport Singular Boundary

Zezeng Li, Xiaoyu Du, Na Lei et al.

CVPR 2025 • arXiv:2503.00063
5 citations

On the Alignment between Fairness and Accuracy: from the Perspective of Adversarial Robustness

Junyi Chai, Taeuk Jang, Jing Gao et al.

ICML 2025

On the Stability of Graph Convolutional Neural Networks: A Probabilistic Perspective

Ning Zhang, Henry Kenlay, Li Zhang et al.

NEURIPS 2025 • arXiv:2506.01213

Prompt2Perturb (P2P): Text-Guided Diffusion-Based Adversarial Attack on Breast Ultrasound Images

Yasamin Medghalchi, Moein Heidari, Clayton Allard et al.

CVPR 2025 • arXiv:2412.09910
4 citations

Robust LLM safeguarding via refusal feature adversarial training

Lei Yu, Virginie Do, Karen Hambardzumyan et al.

ICLR 2025 • arXiv:2409.20089
45 citations

R-TPT: Improving Adversarial Robustness of Vision-Language Models through Test-Time Prompt Tuning

Lijun Sheng, Jian Liang, Zilei Wang et al.

CVPR 2025 • arXiv:2504.11195
15 citations

Safe RLHF-V: Safe Reinforcement Learning from Multi-modal Human Feedback

Jiaming Ji, Xinyu Chen, Rui Pan et al.

NEURIPS 2025 • arXiv:2503.17682
9 citations

SECA: Semantically Equivalent and Coherent Attacks for Eliciting LLM Hallucinations

Buyun Liang, Liangzu Peng, Jinqi Luo et al.

NEURIPS 2025 • arXiv:2510.04398

Stochastic Regret Guarantees for Online Zeroth- and First-Order Bilevel Optimization

Parvin Nazari, Bojian Hou, Davoud Ataee Tarzanagh et al.

NEURIPS 2025 • arXiv:2511.01126
2 citations

TAROT: Towards Essentially Domain-Invariant Robustness with Theoretical Justification

Dongyoon Yang, Jihu Lee, Yongdai Kim

CVPR 2025 • arXiv:2505.06580
1 citation

Towards a 3D Transfer-based Black-box Attack via Critical Feature Guidance

Shuchao Pang, Zhenghan Chen, Shen Zhang et al.

ICCV 2025 • arXiv:2508.15650
2 citations

Towards Certification of Uncertainty Calibration under Adversarial Attacks

Cornelius Emde, Francesco Pinto, Thomas Lukasiewicz et al.

ICLR 2025 • arXiv:2405.13922
2 citations

Towards Understanding the Robustness of Diffusion-Based Purification: A Stochastic Perspective

Yiming Liu, Kezhao Liu, Yao Xiao et al.

ICLR 2025 • arXiv:2404.14309
7 citations

Understanding and Improving Adversarial Robustness of Neural Probabilistic Circuits

Weixin Chen, Han Zhao

NEURIPS 2025 • arXiv:2509.20549

Wolfpack Adversarial Attack for Robust Multi-Agent Reinforcement Learning

Sunwoo Lee, Jaebak Hwang, Yonghyeon Jo et al.

ICML 2025 • arXiv:2502.02844
1 citation

MoE-RBench: Towards Building Reliable Language Models with Sparse Mixture-of-Experts

Guanjie Chen, Xinyu Zhao, Tianlong Chen et al.

ICML 2024 • arXiv:2406.11353
6 citations

Adversarially Robust Deep Multi-View Clustering: A Novel Attack and Defense Framework

Haonan Huang, Guoxu Zhou, Yanghang Zheng et al.

ICML 2024

Adversarial Prompt Tuning for Vision-Language Models

Jiaming Zhang, Xingjun Ma, Xin Wang et al.

ECCV 2024 • arXiv:2311.11261
34 citations

A Secure Image Watermarking Framework with Statistical Guarantees via Adversarial Attacks on Secret Key Networks

Feiyu Chen, Wei Lin, Ziquan Liu et al.

ECCV 2024
1 citation

Breaking the Barrier: Enhanced Utility and Robustness in Smoothed DRL Agents

Chung-En Sun, Sicun Gao, Lily Weng

ICML 2024 • arXiv:2406.18062
6 citations

Concept Arithmetics for Circumventing Concept Inhibition in Diffusion Models

Vitali Petsiuk, Kate Saenko

ECCV 2024 • arXiv:2404.13706
8 citations

CosPGD: an efficient white-box adversarial attack for pixel-wise prediction tasks

Shashank Agnihotri, Steffen Jung, Margret Keuper

ICML 2024 • arXiv:2302.02213
30 citations

DataFreeShield: Defending Adversarial Attacks without Training Data

Hyeyoon Lee, Kanghyun Choi, Dain Kwon et al.

ICML 2024 • arXiv:2406.15635
1 citation

Defense Against Adversarial Attacks on No-Reference Image Quality Models with Gradient Norm Regularization

Yujia Liu, Chenxi Yang, Dingquan Li et al.

CVPR 2024 • arXiv:2403.11397
12 citations

Defense without Forgetting: Continual Adversarial Defense with Anisotropic & Isotropic Pseudo Replay

Yuhang Zhou, Zhongyun Hua

CVPR 2024 • arXiv:2404.01828
7 citations

Enhancing Adversarial Robustness in SNNs with Sparse Gradients

Yujia Liu, Tong Bu, Jianhao Ding et al.

ICML 2024 • arXiv:2405.20355
14 citations