"multi-modal models" Papers
11 papers found
Captured by Captions: On Memorization and its Mitigation in CLIP Models
Wenhao Wang, Adam Dziedzic, Grace Kim et al.
ICLR 2025 · arXiv:2502.07830
4 citations
Knowledge Graph Enhanced Generative Multi-modal Models for Class-Incremental Learning
Xusheng Cao, Haori Lu, Linlan Huang et al.
NeurIPS 2025 · arXiv:2503.18403
1 citation
LIRA: Inferring Segmentation in Large Multi-modal Models with Local Interleaved Region Assistance
Zhang Li, Biao Yang, Qiang Liu et al.
ICCV 2025 · arXiv:2507.06272
1 citation
LiveXiv - A Multi-Modal live benchmark based on Arxiv papers content
Nimrod Shabtay, Felipe Maia Polo, Sivan Doveh et al.
ICLR 2025 · arXiv:2410.10783
12 citations
LLMDet: Learning Strong Open-Vocabulary Object Detectors under the Supervision of Large Language Models
Shenghao Fu, Qize Yang, Qijie Mo et al.
CVPR 2025 (Highlight) · arXiv:2501.18954
35 citations
OBI-Bench: Can LMMs Aid in Study of Ancient Script on Oracle Bones?
Zijian Chen, Tingzhu Chen, Wenjun Zhang et al.
ICLR 2025 · arXiv:2412.01175
16 citations
SpatialReasoner: Towards Explicit and Generalizable 3D Spatial Reasoning
Wufei Ma, Yu-Cheng Chou, Qihao Liu et al.
NeurIPS 2025 · arXiv:2504.20024
23 citations
SSLAM: Enhancing Self-Supervised Models with Audio Mixtures for Polyphonic Soundscapes
Tony Alex, Sara Atito, Armin Mustafa et al.
ICLR 2025 · arXiv:2506.12222
10 citations
UniPixel: Unified Object Referring and Segmentation for Pixel-Level Visual Reasoning
Ye Liu, Zongyang Ma, Junfu Pu et al.
NeurIPS 2025 · arXiv:2509.18094
4 citations
Youku Dense Caption: A Large-scale Chinese Video Dense Caption Dataset and Benchmarks
Zixuan Xiong, Guangwei Xu, Wenkai Zhang et al.
ICLR 2025
Think before Placement: Common Sense Enhanced Transformer for Object Placement
Yaxuan Qin, Jiayu Xu, Ruiping Wang et al.
ECCV 2024