Papers by Gelei Deng
3 papers found
Conference
Fine-Grained Verifiers: Preference Modeling as Next-token Prediction in Vision-Language Alignment
Chenhang Cui, An Zhang, Yiyang Zhou et al.
ICLR 2025, arXiv:2410.14148
13 citations
RSafe: Incentivizing proactive reasoning to build robust and adaptive LLM safeguards
Jingnan Zheng, Xiangtian Ji, Yijun Lu et al.
NeurIPS 2025, arXiv:2506.07736
11 citations
Safe + Safe = Unsafe? Exploring How Safe Images Can Be Exploited to Jailbreak Large Vision-Language Models
Chenhang Cui, Gelei Deng, An Zhang et al.
NeurIPS 2025, arXiv:2411.11496
4 citations