"trojan attacks" Papers
2 papers found
Concept-ROT: Poisoning Concepts in Large Language Models with Model Editing
Keltin Grimes, Marco Christiani, David Shriver et al.
ICLR 2025, arXiv:2412.13341
6 citations
DISTIL: Data-Free Inversion of Suspicious Trojan Inputs via Latent Diffusion
Hossein Mirzaei, Zeinab Taghavi, Sepehr Rezaee et al.
ICCV 2025, arXiv:2507.22813