"visual feature extraction" Papers
4 papers found
Conference
DynImg: Key Frames with Visual Prompts are Good Representation for Multi-Modal Video Understanding
Xiaoyi Bao, Chen-Wei Xie, Hao Tang et al.
ICCV 2025, arXiv:2507.15569
1 citation
Gaze-LLE: Gaze Target Estimation via Large-Scale Learned Encoders
Fiona Ryan, Ajay Bati, Sangmin Lee et al.
CVPR 2025 (highlight), arXiv:2412.09586
20 citations
Image Over Text: Transforming Formula Recognition Evaluation with Character Detection Matching
Bin Wang, Fan Wu, Linke Ouyang et al.
CVPR 2025, arXiv:2409.03643
13 citations
On the Out-Of-Distribution Generalization of Large Multimodal Models
Xingxuan Zhang, Jiansheng Li, Wenjing Chu et al.
CVPR 2025
4 citations