"vision-language learning" Papers
4 papers found
Multi-Scale Contrastive Learning for Video Temporal Grounding
Thong Thanh Nguyen, Yi Bin, Xiaobao Wu et al.
AAAI 2025, paper, arXiv:2412.07157
3 citations
Perception Encoder: The best visual embeddings are not at the output of the network
Daniel Bolya, Po-Yao Huang, Peize Sun et al.
NeurIPS 2025, oral, arXiv:2504.13181
129 citations
Referring Expression Comprehension for Small Objects
Kanoko Goto, Takumi Hirose, Mahiro Ukai et al.
ICCV 2025, arXiv:2510.03701
1 citation
Vision-Language Action Knowledge Learning for Semantic-Aware Action Quality Assessment
Huangbiao Xu, Xiao Ke, Yuezhou Li et al.
ECCV 2024
14 citations