"benchmark dataset" Papers
25 papers found
Conference
AerialVG: A Challenging Benchmark for Aerial Visual Grounding by Exploring Positional Relations
Junli Liu, Qizhi Chen, Zhigang Wang et al.
AIComposer: Any Style and Content Image Composition via Feature Integration
Haowen Li, Zhenfeng Fan, Zhang Wen et al.
Are VLMs Ready for Autonomous Driving? An Empirical Study from the Reliability, Data and Metric Perspectives
Shaoyuan Xie, Lingdong Kong, Yuhao Dong et al.
BirdSet: A Large-Scale Dataset for Audio Classification in Avian Bioacoustics
Lukas Rauch, Raphael Schwinger, Moritz Wirth et al.
Can LLMs Separate Instructions From Data? And What Do We Even Mean By That?
Egor Zverev, Sahar Abdelnabi, Soroush Tabesh et al.
ClimaQA: An Automated Evaluation Framework for Climate Question Answering Models
Veeramakali Vignesh Manivannan, Yasaman Jafari, Srikar Eranky et al.
COUNTS: Benchmarking Object Detectors and Multimodal Large Language Models under Distribution Shifts
Jiansheng Li, Xingxuan Zhang, Hao Zou et al.
CXPMRG-Bench: Pre-training and Benchmarking for X-ray Medical Report Generation on CheXpert Plus Dataset
Xiao Wang, Fuling Wang, Yuehang Li et al.
DataSIR: A Benchmark Dataset for Sensitive Information Recognition
Fan Mo, Bo Liu, Yuan Fan et al.
DISTA-Net: Dynamic Closely-Spaced Infrared Small Target Unmixing
Shengdong Han, Shangdong Yang, Yuxuan Li et al.
Do Large Language Models Truly Understand Geometric Structures?
Xiaofeng Wang, Yiming Wang, Wenhong Zhu et al.
From Imitation to Innovation: The Emergence of AI's Unique Artistic Styles and the Challenge of Copyright Protection
Zexi Jia, Chuanwei Huang, Hongyan Fei et al.
GSOT3D: Towards Generic 3D Single Object Tracking in the Wild
Yifan Jiao, Yunhao Li, Junhua Ding et al.
Interactive Medical Image Segmentation: A Benchmark Dataset and Baseline
Junlong Cheng, Bin Fu, Jin Ye et al.
LawShift: Benchmarking Legal Judgment Prediction Under Statute Shifts
Zhuo Han, Yi Yang, Yi Feng et al.
MMQA: Evaluating LLMs with Multi-Table Multi-Hop Complex Questions
Jian Wu, Linyi Yang, Dongyuan Li et al.
OpenRCA: Can Large Language Models Locate the Root Cause of Software Failures?
Junjielong Xu, Qinan Zhang, Zhiqing Zhong et al.
RoomEditor: High-Fidelity Furniture Synthesis with Parameter-Sharing U-Net
Zhenyi Lin, Xiaofan Ming, Qilong Wang et al.
Shot2Story: A New Benchmark for Comprehensive Understanding of Multi-shot Videos
Mingfei Han, Linjie Yang, Xiaojun Chang et al.
SonicSim: A customizable simulation platform for speech processing in moving sound source scenarios
Kai Li, Wendi Sang, Chang Zeng et al.
A New Benchmark and Model for Challenging Image Manipulation Detection
Zhenfei Zhang, Mingyang Li, Ming-Ching Chang
DailyDVS-200: A Comprehensive Benchmark Dataset for Event-Based Action Recognition
Qi Wang, Zhou Xu, Yuming Lin et al.
EgoCVR: An Egocentric Benchmark for Fine-Grained Composed Video Retrieval
Thomas Hummel, Shyamgopal Karthik, Mariana-Iuliana Georgescu et al.
HARDVS: Revisiting Human Activity Recognition with Dynamic Vision Sensors
Xiao Wang, Zongzhen Wu, Bo Jiang et al.
Towards More Practical Group Activity Detection: A New Benchmark and Model
Dongkeun Kim, Youngkil Song, Minsu Cho et al.