"multimodal dataset" Papers
15 papers found

CarbonSense: A Multimodal Dataset and Baseline for Carbon Flux Modelling
Matthew Fortier, Mats L. Richter, Oliver Sonnentag et al.
ICLR 2025 · arXiv:2406.04940 · 2 citations

CrypticBio: A Large Multimodal Dataset for Visually Confusing Species
Georgiana Manolache, Gerard Schouten, Joaquin Vanschoren
NeurIPS 2025 (oral)

Extremely Simple Multimodal Outlier Synthesis for Out-of-Distribution Detection and Segmentation
Moru Liu, Hao Dong, Jessica Kelly et al.
NeurIPS 2025 · arXiv:2505.16985 · 4 citations

MM-CamObj: A Comprehensive Multimodal Dataset for Camouflaged Object Scenarios
Jiacheng Ruan, Wenzhen Yuan, Zehao Lin et al.
AAAI 2025 · arXiv:2409.16084 · 11 citations

MONITRS: Multimodal Observations of Natural Incidents Through Remote Sensing
Shreelekha Revankar, Utkarsh Mall, Cheng Perng Phoo et al.
NeurIPS 2025 (oral) · arXiv:2507.16228

Perceiving and Acting in First-Person: A Dataset and Benchmark for Egocentric Human-Object-Human Interactions
Liang Xu, Chengqun Yang, Zili Lin et al.
ICCV 2025 · arXiv:2508.04681 · 2 citations

Pose as a Modality: A Psychology-Inspired Network for Personality Recognition with a New Multimodal Dataset
Bin Tang, Ke-Qi Pan, Miao Zheng et al.
AAAI 2025 · arXiv:2503.12912 · 1 citation

RoboSense: Large-scale Dataset and Benchmark for Egocentric Robot Perception and Navigation in Crowded and Unstructured Environments
Haisheng Su, Feixiang Song, Cong Ma et al.
CVPR 2025 · arXiv:2408.15503 · 6 citations

ShotBench: Expert-Level Cinematic Understanding in Vision-Language Models
Hongbo Liu, Jingwen He, Yi Jin et al.
NeurIPS 2025 · arXiv:2506.21356 · 7 citations

SK-VQA: Synthetic Knowledge Generation at Scale for Training Context-Augmented Multimodal LLMs
Xin Su, Man Luo, Kris Pan et al.
ICML 2025 (oral) · arXiv:2406.19593 · 6 citations

STING-BEE: Towards Vision-Language Model for Real-World X-ray Baggage Security Inspection
Divya Velayudhan, Abdelfatah Ahmed, Mohamad Alansari et al.
CVPR 2025 (highlight) · arXiv:2504.02823 · 2 citations

TAU-106K: A New Dataset for Comprehensive Understanding of Traffic Accident
Yixuan Zhou, Long Bai, Sijia Cai et al.
ICLR 2025 (oral) · 3 citations

Towards Omnimodal Expressions and Reasoning in Referring Audio-Visual Segmentation
Kaining Ying, Henghui Ding, Guangquan Jie et al.
ICCV 2025 · arXiv:2507.22886 · 6 citations

Whose View of Safety? A Deep DIVE Dataset for Pluralistic Alignment of Text-to-Image Models
Charvi Rastogi, Tian Huey Teh, Pushkar Mishra et al.
NeurIPS 2025 (spotlight) · arXiv:2507.13383 · 3 citations

A Touch, Vision, and Language Dataset for Multimodal Alignment
Letian Fu, Gaurav Datta, Huang Huang et al.
ICML 2024 · arXiv:2402.13232 · 74 citations