"visual token compression" Papers
9 papers found
ARLON: Boosting Diffusion Transformers with Autoregressive Models for Long Video Generation
Zongyi Li, Shujie Hu, Shujie Liu et al.
ICLR 2025 (oral), arXiv:2410.20502, 28 citations
Dynamic-VLM: Simple Dynamic Visual Token Compression for VideoLLM
Han Wang, Yuxiang Nie, Yongjie Ye et al.
ICCV 2025, arXiv:2412.09530, 15 citations
Efficient Multi-modal Large Language Models via Progressive Consistency Distillation
Zichen Wen, Shaobo Wang, Yufa Zhou et al.
NeurIPS 2025, arXiv:2510.00515, 9 citations
FlashSloth: Lightning Multimodal Large Language Models via Embedded Visual Compression
Bo Tong, Bokai Lai, Yiyi Zhou et al.
CVPR 2025, arXiv:2412.04317, 4 citations
Mini-Monkey: Alleviating the Semantic Sawtooth Effect for Lightweight MLLMs via Complementary Image Pyramid
Mingxin Huang, Yuliang Liu, Dingkang Liang et al.
ICLR 2025, arXiv:2408.02034, 22 citations
OpenMMEgo: Enhancing Egocentric Understanding for LMMs with Open Weights and Data
Hao Luo, Zihao Yue, Wanpeng Zhang et al.
NeurIPS 2025 (oral)
Recoverable Compression: A Multimodal Vision Token Recovery Mechanism Guided by Text Information
Yi Chen, Jian Xu, Xu-Yao Zhang et al.
AAAI 2025, arXiv:2409.01179, 15 citations
StreamForest: Efficient Online Video Understanding with Persistent Event Memory
Xiangyu Zeng, Kefan Qiu, Qingyu Zhang et al.
NeurIPS 2025 (oral), arXiv:2509.24871, 6 citations
Video-XL: Extra-Long Vision Language Model for Hour-Scale Video Understanding
Yan Shu, Zheng Liu, Peitian Zhang et al.
CVPR 2025, arXiv:2409.14485, 155 citations