"out-of-distribution evaluation" Papers
3 papers found
Conference
Do ImageNet-trained Models Learn Shortcuts? The Impact of Frequency Shortcuts on Generalization
Shunxin Wang, Raymond Veldhuis, Nicola Strisciuglio
CVPR 2025, arXiv:2503.03519
2 citations
KOR-Bench: Benchmarking Language Models on Knowledge-Orthogonal Reasoning Tasks
Kaijing Ma, Xeron Du, Yunran Wang et al.
ICLR 2025, arXiv:2410.06526
55 citations
ThinkBench: Dynamic Out-of-Distribution Evaluation for Robust LLM Reasoning
Shulin Huang, Linyi Yang, Yan Song et al.
NeurIPS 2025, arXiv:2502.16268
15 citations