"vision foundation models" Papers

44 papers found

Filters:vision foundation models Clear all

Conference

AAAI 2025 (3,028)COLM 2025 (418)CVPR 2025 (2,873)ICCV 2025 (2,701)ICLR 2025 (3,827)ICML 2025 (3,340)ISMAR 2025 (229)NEURIPS 2025 (5,858)AAAI 2024 (2,289)CVPR 2024 (2,716)ECCV 2024 (2,387)ICLR 2024 (2,297)ICML 2024 (2,635)

Paper Type

poster (24,624)paper (8,558)oral (1,594)spotlight (1,421)highlight (975)

Accessing Vision Foundation Models via ImageNet-1K

Yitian Zhang, Xu Ma, Yue Bai et al.

ICLR 2025arXiv:2407.10366

citations

All-in-One: Transferring Vision Foundation Models into Stereo Matching

Jingyi Zhou, Haoyu Zhang, Jiakang Yuan et al.

AAAI 2025paperarXiv:2412.09912

citations

ConceptScope: Characterizing Dataset Bias via Disentangled Visual Concepts

Jinho Choi, Hyesu Lim, Steffen Schneider et al.

NEURIPS 2025arXiv:2510.26186

Connecting Neural Models Latent Geometries with Relative Geodesic Representations

Hanlin Yu, Berfin Inal, Georgios Arvanitidis et al.

NEURIPS 2025arXiv:2506.01599

citations

CustomKD: Customizing Large Vision Foundation for Edge Model Improvement via Knowledge Distillation

Jungsoo Lee, Debasmit Das, Munawar Hayat et al.

CVPR 2025arXiv:2503.18244

citations

DEFOM-Stereo: Depth Foundation Model Based Stereo Matching

Hualie Jiang, Zhiqiang Lou, Laiyan Ding et al.

CVPR 2025arXiv:2501.09466

citations

Depth AnyEvent: A Cross-Modal Distillation Paradigm for Event-Based Monocular Depth Estimation

Luca Bartolomei, Enrico Mannocci, Fabio Tosi et al.

ICCV 2025arXiv:2509.15224

citations

DINO-Foresight: Looking into the Future with DINO

Efstathios Karypidis, Ioannis Kakogeorgiou, Spyridon Gidaris et al.

NEURIPS 2025arXiv:2412.11673

citations

DON’T NEED RETRAINING: A Mixture of DETR and Vision Foundation Models for Cross-Domain Few-Shot Object Detection

Changhan Liu, xunzhi xiang, Zixuan Duan et al.

NEURIPS 2025

EgoExo-Gen: Ego-centric Video Prediction by Watching Exo-centric Videos

Jilan Xu, Yifei Huang, Baoqi Pei et al.

ICLR 2025oralarXiv:2504.11732

citations

EmbodiedSAM: Online Segment Any 3D Thing in Real Time

Xiuwei Xu, Huangxing Chen, Linqing Zhao et al.

ICLR 2025arXiv:2408.11811

citations

Equipping Vision Foundation Model with Mixture of Experts for Out-of-Distribution Detection

Shizhen Zhao, Jiahui Liu, Xin Wen et al.

ICCV 2025arXiv:2510.10584

citations

Explore In-Context Segmentation via Latent Diffusion Models

Chaoyang Wang, Xiangtai Li, Henghui Ding et al.

AAAI 2025paperarXiv:2403.09616

citations

Exploring Task-Level Optimal Prompts for Visual In-Context Learning

Yan Zhu, Huan Ma, Changqing Zhang

AAAI 2025paperarXiv:2501.08841

citations

FastDINOv2: Frequency Based Curriculum Learning Improves Robustness and Training Speed

Jiaqi Zhang, Juntuo Wang, Zhixin Sun et al.

NEURIPS 2025arXiv:2507.03779

citations

FisherTune: Fisher-Guided Robust Tuning of Vision Foundation Models for Domain Generalized Segmentation

Dong Zhao, Jinlong Li, Shuang Wang et al.

CVPR 2025arXiv:2503.17940

citations

FoundationStereo: Zero-Shot Stereo Matching

Bowen Wen, Matthew Trepte, Oluwaseun Joseph Aribido et al.

CVPR 2025arXiv:2501.09898

105

citations

Harnessing Vision Foundation Models for High-Performance, Training-Free Open Vocabulary Segmentation

Yuheng Shi, Minjing Dong, Chang Xu

ICCV 2025arXiv:2411.09219

citations

Learning Precise Affordances from Egocentric Videos for Robotic Manipulation

Li, Nikolaos Tsagkas, Jifei Song et al.

ICCV 2025arXiv:2408.10123

citations

LoftUp: Learning a Coordinate-Based Feature Upsampler for Vision Foundation Models

Haiwen Huang, Anpei Chen, Volodymyr Havrylov et al.

ICCV 2025arXiv:2504.14032

citations

LUDVIG: Learning-Free Uplifting of 2D Visual Features to Gaussian Splatting Scenes

Juliette Marrie, Romain Menegaux, Michael Arbel et al.

ICCV 2025arXiv:2410.14462

citations

Mamba as a Bridge: Where Vision Foundation Models Meet Vision Language Models for Domain-Generalized Semantic Segmentation

Xin Zhang, Robby T. Tan

CVPR 2025highlightarXiv:2504.03193

citations

Mind the Gap: Aligning Vision Foundation Models to Image Feature Matching

Yuhan Liu, Jingwen Fu, Yang Wu et al.

ICCV 2025arXiv:2507.10318

Multiview Equivariance Improves 3D Correspondence Understanding with Minimal Feature Finetuning

Yang You, Yixin Li, Congyue Deng et al.

ICLR 2025arXiv:2411.19458

citations

Near, far: Patch-ordering enhances vision foundation models' scene understanding

Valentinos Pariza, Mohammadreza Salehi, Gertjan J Burghouts et al.

ICLR 2025arXiv:2408.11054

citations

One Polyp Identifies All: One-Shot Polyp Segmentation with SAM via Cascaded Priors and Iterative Prompt Evolution

Xinyu Mao, Xiaohan Xing, Fei MENG et al.

ICCV 2025arXiv:2507.16337

citations

Online Segment Any 3D Thing as Instance Tracking

Hanshi Wang, Cai Zijian, Jin Gao et al.

NEURIPS 2025oralarXiv:2512.07599

citations

OpenBox: Annotate Any Bounding Boxes in 3D

In-Jae Lee, Mungyeom Kim, Kwonyoung Ryu et al.

NEURIPS 2025spotlightarXiv:2512.01352

citations

PICO: Reconstructing 3D People In Contact with Objects

Alpár Cseke, Shashank Tripathi, Sai Kumar Dwivedi et al.

CVPR 2025arXiv:2504.17695

citations

SAM-CP: Marrying SAM with Composable Prompts for Versatile Segmentation

Pengfei Chen, Lingxi Xie, xinyue huo et al.

ICLR 2025arXiv:2407.16682

citations

STRAP: Robot Sub-Trajectory Retrieval for Augmented Policy Learning

Marius Memmel, Jacob Berg, Bingqing Chen et al.

ICLR 2025arXiv:2412.15182

citations

Stronger, Steadier & Superior: Geometric Consistency in Depth VFM Forges Domain Generalized Semantic Segmentation

Siyu Chen, Ting Han, Changshe Zhang et al.

ICCV 2025arXiv:2504.12753

citations

Swiss Army Knife: Synergizing Biases in Knowledge from Vision Foundation Models for Multi-Task Learning

Yuxiang Lu, Shengcao Cao, Yu-Xiong Wang

ICLR 2025arXiv:2410.14633

citations

Uni4D: Unifying Visual Foundation Models for 4D Modeling from a Single Video

David Yifan Yao, Albert J. Zhai, Shenlong Wang

CVPR 2025highlightarXiv:2503.21761

citations

ViPOcc: Leveraging Visual Priors from Vision Foundation Models for Single-View 3D Occupancy Prediction

Yi Feng, Yu Han, Xijing Zhang et al.

AAAI 2025paperarXiv:2412.11210

citations

Vision Foundation Models as Effective Visual Tokenizers for Autoregressive Generation

Zheng Anlin, Xin Wen, Xuanyang Zhang et al.

NEURIPS 2025

citations

Boosting Gaze Object Prediction via Pixel-level Supervision from Vision Foundation Model

Yang Jin, Lei Zhang, Shi Yan et al.

ECCV 2024arXiv:2408.01044

citations

DHR: Dual Features-Driven Hierarchical Rebalancing in Inter- and Intra-Class Regions for Weakly-Supervised Semantic Segmentation

Sanghyun Jo, Fei Pan, In-Jae Yu et al.

ECCV 2024arXiv:2404.00380

citations

FreeZe: Training-free zero-shot 6D pose estimation with geometric and vision foundation models

Andrea Caraffa, Davide Boscaini, Amir Hamza et al.

ECCV 2024arXiv:2312.00947

citations

Knowledge Transfer from Vision Foundation Models for Efficient Training of Small Task-specific Models

Raviteja Vemulapalli, Hadi Pouransari, Fartash Faghri et al.

ICML 2024arXiv:2311.18237

citations

Probabilistic Conceptual Explainers: Trustworthy Conceptual Explanations for Vision Foundation Models

Hengyi Wang, Shiwei Tan, Hao Wang

ICML 2024arXiv:2406.12649

citations

RadOcc: Learning Cross-Modality Occupancy Knowledge through Rendering Assisted Distillation

Haiming Zhang, Xu Yan, Dongfeng Bai et al.

AAAI 2024paperarXiv:2312.11829

citations

Unleashing the Power of Meta-tuning for Few-shot Generalization Through Sparse Interpolated Experts

Shengzhuang Chen, Jihoon Tack, Yunqiao Yang et al.

ICML 2024arXiv:2403.08477

citations

Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model

Lianghui Zhu, Bencheng Liao, Qian Zhang et al.

ICML 2024arXiv:2401.09417

1457

citations