"multimodal datasets" Papers
9 papers found
Efficient Multimodal Dataset Distillation via Generative Models
Zhenghao Zhao, Haoxuan Wang, Junyi Wu et al.
NEURIPS 2025, arXiv:2509.15472
2 citations
Filter Like You Test: Data-Driven Data Filtering for CLIP Pretraining
Mikey Shechter, Yair Carmon
NEURIPS 2025, arXiv:2503.08805
2 citations
HarmonySet: A Comprehensive Dataset for Understanding Video-Music Semantic Alignment and Temporal Synchronization
Zitang Zhou, Ke Mei, Yu Lu et al.
CVPR 2025, arXiv:2503.01725
7 citations
Robust Cross-modal Alignment Learning for Cross-Scene Spatial Reasoning and Grounding
Yanglin Feng, Hongyuan Zhu, Dezhong Peng et al.
NEURIPS 2025
V2Xum-LLM: Cross-Modal Video Summarization with Temporal Prompt Instruction Tuning
Hang Hua, Yunlong Tang, Chenliang Xu et al.
AAAI 2025, arXiv:2404.12353
50 citations
Detecting and Preventing Hallucinations in Large Vision Language Models
Anisha Gunjal, Jihan Yin, Erhan Bas
AAAI 2024, arXiv:2308.06394
264 citations
Differentially Private Representation Learning via Image Captioning
Tom Sander, Yaodong Yu, Maziar Sanjabi et al.
ICML 2024, arXiv:2403.02506
7 citations
Mismatch Quest: Visual and Textual Feedback for Image-Text Misalignment
Brian Gordon, Yonatan Bitton, Yonatan Shafir et al.
ECCV 2024, arXiv:2312.03766
17 citations
When Pedestrian Detection Meets Multi-Modal Learning: Generalist Model and Benchmark Dataset
Yi Zhang, Wang Zeng, Sheng Jin et al.
ECCV 2024, arXiv:2407.10125
21 citations