Papers by Qingcheng Zeng
2 papers found
Conference
CARES: Comprehensive Evaluation of Safety and Adversarial Robustness in Medical LLMs
Sijia Chen, Xiaomin Li, Mengxue Zhang et al.
NeurIPS 2025, arXiv:2505.11413
16 citations
ThinkBench: Dynamic Out-of-Distribution Evaluation for Robust LLM Reasoning
Shulin Huang, Linyi Yang, Yan Song et al.
NeurIPS 2025, arXiv:2502.16268
15 citations