"multi-modal models" Papers

11 papers found

Captured by Captions: On Memorization and its Mitigation in CLIP Models

Wenhao Wang, Adam Dziedzic, Grace Kim et al.

ICLR 2025 · arXiv:2502.07830 · 4 citations

Knowledge Graph Enhanced Generative Multi-modal Models for Class-Incremental Learning

Xusheng Cao, Haori Lu, Linlan Huang et al.

NeurIPS 2025 · arXiv:2503.18403 · 1 citation

LIRA: Inferring Segmentation in Large Multi-modal Models with Local Interleaved Region Assistance

Zhang Li, Biao Yang, Qiang Liu et al.

ICCV 2025 · arXiv:2507.06272 · 1 citation

LiveXiv - A Multi-Modal live benchmark based on Arxiv papers content

Nimrod Shabtay, Felipe Maia Polo, Sivan Doveh et al.

ICLR 2025 · arXiv:2410.10783 · 12 citations

LLMDet: Learning Strong Open-Vocabulary Object Detectors under the Supervision of Large Language Models

Shenghao Fu, Qize Yang, Qijie Mo et al.

CVPR 2025 (highlight) · arXiv:2501.18954 · 35 citations

OBI-Bench: Can LMMs Aid in Study of Ancient Script on Oracle Bones?

Zijian Chen, Tingzhu Chen, Wenjun Zhang et al.

ICLR 2025 · arXiv:2412.01175 · 16 citations

SpatialReasoner: Towards Explicit and Generalizable 3D Spatial Reasoning

Wufei Ma, Yu-Cheng Chou, Qihao Liu et al.

NeurIPS 2025 · arXiv:2504.20024 · 23 citations

SSLAM: Enhancing Self-Supervised Models with Audio Mixtures for Polyphonic Soundscapes

Tony Alex, Sara Atito, Armin Mustafa et al.

ICLR 2025 · arXiv:2506.12222 · 10 citations

UniPixel: Unified Object Referring and Segmentation for Pixel-Level Visual Reasoning

Ye Liu, Zongyang Ma, Junfu Pu et al.

NeurIPS 2025 · arXiv:2509.18094 · 4 citations

Youku Dense Caption: A Large-scale Chinese Video Dense Caption Dataset and Benchmarks

Zixuan Xiong, Guangwei Xu, Wenkai Zhang et al.

ICLR 2025

Think before Placement: Common Sense Enhanced Transformer for Object Placement

Yaxuan Qin, Jiayu Xu, Ruiping Wang et al.

ECCV 2024