"vision encoder" Papers
3 papers found
Conference
Enhancing Temporal Understanding in Video-LLMs through Stacked Temporal Attention in Vision Encoders
Leibniz University Hannover, L3S Research Center Ali Rasekh, Erfan Soula, Omid Daliran et al.
NEURIPS 2025oralarXiv:2510.26027
1
citations
LLaVA-SP: Enhancing Visual Representation with Visual Spatial Tokens for MLLMs
Haoran Lou, Chunxiao Fan, Ziyan Liu et al.
ICCV 2025arXiv:2507.00505
OpenVision: A Fully-Open, Cost-Effective Family of Advanced Vision Encoders for Multimodal Learning
Xianhang Li, Yanqing Liu, Haoqin Tu et al.
ICCV 2025arXiv:2505.04601
6
citations