"region captioning" Papers
3 papers found
Conference
Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models
Chuofan Ma, Yi Jiang, Jiannan Wu et al.
ECCV 2024, arXiv:2404.13013
107 citations
NExT-Chat: An LMM for Chat, Detection and Segmentation
Ao Zhang, Yuan Yao, Wei Ji et al.
ICML 2024, arXiv:2311.04498
78 citations
Tokenize Anything via Prompting
Ting Pan, Lulu Tang, Xinlong Wang et al.
ECCV 2024, arXiv:2312.09128
36 citations