"vision foundation models" Papers

44 papers found

Accessing Vision Foundation Models via ImageNet-1K

Yitian Zhang, Xu Ma, Yue Bai et al.

ICLR 2025, arXiv:2407.10366
8 citations

All-in-One: Transferring Vision Foundation Models into Stereo Matching

Jingyi Zhou, Haoyu Zhang, Jiakang Yuan et al.

AAAI 2025, arXiv:2412.09912
9 citations

ConceptScope: Characterizing Dataset Bias via Disentangled Visual Concepts

Jinho Choi, Hyesu Lim, Steffen Schneider et al.

NeurIPS 2025, arXiv:2510.26186

Connecting Neural Models Latent Geometries with Relative Geodesic Representations

Hanlin Yu, Berfin Inal, Georgios Arvanitidis et al.

NeurIPS 2025, arXiv:2506.01599
2 citations

CustomKD: Customizing Large Vision Foundation for Edge Model Improvement via Knowledge Distillation

Jungsoo Lee, Debasmit Das, Munawar Hayat et al.

CVPR 2025, arXiv:2503.18244
4 citations

DEFOM-Stereo: Depth Foundation Model Based Stereo Matching

Hualie Jiang, Zhiqiang Lou, Laiyan Ding et al.

CVPR 2025, arXiv:2501.09466
40 citations

Depth AnyEvent: A Cross-Modal Distillation Paradigm for Event-Based Monocular Depth Estimation

Luca Bartolomei, Enrico Mannocci, Fabio Tosi et al.

ICCV 2025, arXiv:2509.15224
1 citation

DINO-Foresight: Looking into the Future with DINO

Efstathios Karypidis, Ioannis Kakogeorgiou, Spyridon Gidaris et al.

NeurIPS 2025, arXiv:2412.11673
18 citations

DON’T NEED RETRAINING: A Mixture of DETR and Vision Foundation Models for Cross-Domain Few-Shot Object Detection

Changhan Liu, Xunzhi Xiang, Zixuan Duan et al.

NeurIPS 2025

EgoExo-Gen: Ego-centric Video Prediction by Watching Exo-centric Videos

Jilan Xu, Yifei Huang, Baoqi Pei et al.

ICLR 2025 (oral), arXiv:2504.11732
16 citations

EmbodiedSAM: Online Segment Any 3D Thing in Real Time

Xiuwei Xu, Huangxing Chen, Linqing Zhao et al.

ICLR 2025, arXiv:2408.11811
35 citations

Equipping Vision Foundation Model with Mixture of Experts for Out-of-Distribution Detection

Shizhen Zhao, Jiahui Liu, Xin Wen et al.

ICCV 2025, arXiv:2510.10584
1 citation

Explore In-Context Segmentation via Latent Diffusion Models

Chaoyang Wang, Xiangtai Li, Henghui Ding et al.

AAAI 2025, arXiv:2403.09616
14 citations

Exploring Task-Level Optimal Prompts for Visual In-Context Learning

Yan Zhu, Huan Ma, Changqing Zhang

AAAI 2025, arXiv:2501.08841
2 citations

FastDINOv2: Frequency Based Curriculum Learning Improves Robustness and Training Speed

Jiaqi Zhang, Juntuo Wang, Zhixin Sun et al.

NeurIPS 2025, arXiv:2507.03779
1 citation

FisherTune: Fisher-Guided Robust Tuning of Vision Foundation Models for Domain Generalized Segmentation

Dong Zhao, Jinlong Li, Shuang Wang et al.

CVPR 2025, arXiv:2503.17940
10 citations

FoundationStereo: Zero-Shot Stereo Matching

Bowen Wen, Matthew Trepte, Oluwaseun Joseph Aribido et al.

CVPR 2025, arXiv:2501.09898
105 citations

Harnessing Vision Foundation Models for High-Performance, Training-Free Open Vocabulary Segmentation

Yuheng Shi, Minjing Dong, Chang Xu

ICCV 2025, arXiv:2411.09219
15 citations

Learning Precise Affordances from Egocentric Videos for Robotic Manipulation

Li, Nikolaos Tsagkas, Jifei Song et al.

ICCV 2025, arXiv:2408.10123
17 citations

LoftUp: Learning a Coordinate-Based Feature Upsampler for Vision Foundation Models

Haiwen Huang, Anpei Chen, Volodymyr Havrylov et al.

ICCV 2025, arXiv:2504.14032
12 citations

LUDVIG: Learning-Free Uplifting of 2D Visual Features to Gaussian Splatting Scenes

Juliette Marrie, Romain Menegaux, Michael Arbel et al.

ICCV 2025, arXiv:2410.14462
13 citations

Mamba as a Bridge: Where Vision Foundation Models Meet Vision Language Models for Domain-Generalized Semantic Segmentation

Xin Zhang, Robby T. Tan

CVPR 2025 (highlight), arXiv:2504.03193
20 citations

Mind the Gap: Aligning Vision Foundation Models to Image Feature Matching

Yuhan Liu, Jingwen Fu, Yang Wu et al.

ICCV 2025, arXiv:2507.10318

Multiview Equivariance Improves 3D Correspondence Understanding with Minimal Feature Finetuning

Yang You, Yixin Li, Congyue Deng et al.

ICLR 2025, arXiv:2411.19458
8 citations

Near, far: Patch-ordering enhances vision foundation models' scene understanding

Valentinos Pariza, Mohammadreza Salehi, Gertjan J Burghouts et al.

ICLR 2025, arXiv:2408.11054
9 citations

One Polyp Identifies All: One-Shot Polyp Segmentation with SAM via Cascaded Priors and Iterative Prompt Evolution

Xinyu Mao, Xiaohan Xing, Fei Meng et al.

ICCV 2025, arXiv:2507.16337
2 citations

Online Segment Any 3D Thing as Instance Tracking

Hanshi Wang, Cai Zijian, Jin Gao et al.

NeurIPS 2025 (oral), arXiv:2512.07599
1 citation

OpenBox: Annotate Any Bounding Boxes in 3D

In-Jae Lee, Mungyeom Kim, Kwonyoung Ryu et al.

NeurIPS 2025 (spotlight), arXiv:2512.01352
1 citation

PICO: Reconstructing 3D People In Contact with Objects

Alpár Cseke, Shashank Tripathi, Sai Kumar Dwivedi et al.

CVPR 2025, arXiv:2504.17695
9 citations

SAM-CP: Marrying SAM with Composable Prompts for Versatile Segmentation

Pengfei Chen, Lingxi Xie, Xinyue Huo et al.

ICLR 2025, arXiv:2407.16682
5 citations

STRAP: Robot Sub-Trajectory Retrieval for Augmented Policy Learning

Marius Memmel, Jacob Berg, Bingqing Chen et al.

ICLR 2025, arXiv:2412.15182
22 citations

Stronger, Steadier & Superior: Geometric Consistency in Depth VFM Forges Domain Generalized Semantic Segmentation

Siyu Chen, Ting Han, Changshe Zhang et al.

ICCV 2025, arXiv:2504.12753
2 citations

Swiss Army Knife: Synergizing Biases in Knowledge from Vision Foundation Models for Multi-Task Learning

Yuxiang Lu, Shengcao Cao, Yu-Xiong Wang

ICLR 2025, arXiv:2410.14633
6 citations

Uni4D: Unifying Visual Foundation Models for 4D Modeling from a Single Video

David Yifan Yao, Albert J. Zhai, Shenlong Wang

CVPR 2025 (highlight), arXiv:2503.21761
14 citations

ViPOcc: Leveraging Visual Priors from Vision Foundation Models for Single-View 3D Occupancy Prediction

Yi Feng, Yu Han, Xijing Zhang et al.

AAAI 2025, arXiv:2412.11210
7 citations

Vision Foundation Models as Effective Visual Tokenizers for Autoregressive Generation

Zheng Anlin, Xin Wen, Xuanyang Zhang et al.

NeurIPS 2025
9 citations

Boosting Gaze Object Prediction via Pixel-level Supervision from Vision Foundation Model

Yang Jin, Lei Zhang, Shi Yan et al.

ECCV 2024, arXiv:2408.01044
3 citations

DHR: Dual Features-Driven Hierarchical Rebalancing in Inter- and Intra-Class Regions for Weakly-Supervised Semantic Segmentation

Sanghyun Jo, Fei Pan, In-Jae Yu et al.

ECCV 2024, arXiv:2404.00380
6 citations

FreeZe: Training-free zero-shot 6D pose estimation with geometric and vision foundation models

Andrea Caraffa, Davide Boscaini, Amir Hamza et al.

ECCV 2024, arXiv:2312.00947
47 citations

Knowledge Transfer from Vision Foundation Models for Efficient Training of Small Task-specific Models

Raviteja Vemulapalli, Hadi Pouransari, Fartash Faghri et al.

ICML 2024, arXiv:2311.18237
13 citations

Probabilistic Conceptual Explainers: Trustworthy Conceptual Explanations for Vision Foundation Models

Hengyi Wang, Shiwei Tan, Hao Wang

ICML 2024, arXiv:2406.12649
9 citations

RadOcc: Learning Cross-Modality Occupancy Knowledge through Rendering Assisted Distillation

Haiming Zhang, Xu Yan, Dongfeng Bai et al.

AAAI 2024, arXiv:2312.11829
32 citations

Unleashing the Power of Meta-tuning for Few-shot Generalization Through Sparse Interpolated Experts

Shengzhuang Chen, Jihoon Tack, Yunqiao Yang et al.

ICML 2024, arXiv:2403.08477
4 citations

Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model

Lianghui Zhu, Bencheng Liao, Qian Zhang et al.

ICML 2024, arXiv:2401.09417
1457 citations