"vision foundation models" Papers
44 papers found
Conference
Accessing Vision Foundation Models via ImageNet-1K
Yitian Zhang, Xu Ma, Yue Bai et al.
All-in-One: Transferring Vision Foundation Models into Stereo Matching
Jingyi Zhou, Haoyu Zhang, Jiakang Yuan et al.
ConceptScope: Characterizing Dataset Bias via Disentangled Visual Concepts
Jinho Choi, Hyesu Lim, Steffen Schneider et al.
Connecting Neural Models Latent Geometries with Relative Geodesic Representations
Hanlin Yu, Berfin Inal, Georgios Arvanitidis et al.
CustomKD: Customizing Large Vision Foundation for Edge Model Improvement via Knowledge Distillation
Jungsoo Lee, Debasmit Das, Munawar Hayat et al.
DEFOM-Stereo: Depth Foundation Model Based Stereo Matching
Hualie Jiang, Zhiqiang Lou, Laiyan Ding et al.
Depth AnyEvent: A Cross-Modal Distillation Paradigm for Event-Based Monocular Depth Estimation
Luca Bartolomei, Enrico Mannocci, Fabio Tosi et al.
DINO-Foresight: Looking into the Future with DINO
Efstathios Karypidis, Ioannis Kakogeorgiou, Spyridon Gidaris et al.
DON’T NEED RETRAINING: A Mixture of DETR and Vision Foundation Models for Cross-Domain Few-Shot Object Detection
Changhan Liu, xunzhi xiang, Zixuan Duan et al.
EgoExo-Gen: Ego-centric Video Prediction by Watching Exo-centric Videos
Jilan Xu, Yifei Huang, Baoqi Pei et al.
EmbodiedSAM: Online Segment Any 3D Thing in Real Time
Xiuwei Xu, Huangxing Chen, Linqing Zhao et al.
Equipping Vision Foundation Model with Mixture of Experts for Out-of-Distribution Detection
Shizhen Zhao, Jiahui Liu, Xin Wen et al.
Explore In-Context Segmentation via Latent Diffusion Models
Chaoyang Wang, Xiangtai Li, Henghui Ding et al.
Exploring Task-Level Optimal Prompts for Visual In-Context Learning
Yan Zhu, Huan Ma, Changqing Zhang
FastDINOv2: Frequency Based Curriculum Learning Improves Robustness and Training Speed
Jiaqi Zhang, Juntuo Wang, Zhixin Sun et al.
FisherTune: Fisher-Guided Robust Tuning of Vision Foundation Models for Domain Generalized Segmentation
Dong Zhao, Jinlong Li, Shuang Wang et al.
FoundationStereo: Zero-Shot Stereo Matching
Bowen Wen, Matthew Trepte, Oluwaseun Joseph Aribido et al.
Harnessing Vision Foundation Models for High-Performance, Training-Free Open Vocabulary Segmentation
Yuheng Shi, Minjing Dong, Chang Xu
Learning Precise Affordances from Egocentric Videos for Robotic Manipulation
Li, Nikolaos Tsagkas, Jifei Song et al.
LoftUp: Learning a Coordinate-Based Feature Upsampler for Vision Foundation Models
Haiwen Huang, Anpei Chen, Volodymyr Havrylov et al.
LUDVIG: Learning-Free Uplifting of 2D Visual Features to Gaussian Splatting Scenes
Juliette Marrie, Romain Menegaux, Michael Arbel et al.
Mamba as a Bridge: Where Vision Foundation Models Meet Vision Language Models for Domain-Generalized Semantic Segmentation
Xin Zhang, Robby T. Tan
Mind the Gap: Aligning Vision Foundation Models to Image Feature Matching
Yuhan Liu, Jingwen Fu, Yang Wu et al.
Multiview Equivariance Improves 3D Correspondence Understanding with Minimal Feature Finetuning
Yang You, Yixin Li, Congyue Deng et al.
Near, far: Patch-ordering enhances vision foundation models' scene understanding
Valentinos Pariza, Mohammadreza Salehi, Gertjan J Burghouts et al.
One Polyp Identifies All: One-Shot Polyp Segmentation with SAM via Cascaded Priors and Iterative Prompt Evolution
Xinyu Mao, Xiaohan Xing, Fei MENG et al.
Online Segment Any 3D Thing as Instance Tracking
Hanshi Wang, Cai Zijian, Jin Gao et al.
OpenBox: Annotate Any Bounding Boxes in 3D
In-Jae Lee, Mungyeom Kim, Kwonyoung Ryu et al.
PICO: Reconstructing 3D People In Contact with Objects
Alpár Cseke, Shashank Tripathi, Sai Kumar Dwivedi et al.
SAM-CP: Marrying SAM with Composable Prompts for Versatile Segmentation
Pengfei Chen, Lingxi Xie, xinyue huo et al.
STRAP: Robot Sub-Trajectory Retrieval for Augmented Policy Learning
Marius Memmel, Jacob Berg, Bingqing Chen et al.
Stronger, Steadier & Superior: Geometric Consistency in Depth VFM Forges Domain Generalized Semantic Segmentation
Siyu Chen, Ting Han, Changshe Zhang et al.
Swiss Army Knife: Synergizing Biases in Knowledge from Vision Foundation Models for Multi-Task Learning
Yuxiang Lu, Shengcao Cao, Yu-Xiong Wang
Uni4D: Unifying Visual Foundation Models for 4D Modeling from a Single Video
David Yifan Yao, Albert J. Zhai, Shenlong Wang
ViPOcc: Leveraging Visual Priors from Vision Foundation Models for Single-View 3D Occupancy Prediction
Yi Feng, Yu Han, Xijing Zhang et al.
Vision Foundation Models as Effective Visual Tokenizers for Autoregressive Generation
Zheng Anlin, Xin Wen, Xuanyang Zhang et al.
Boosting Gaze Object Prediction via Pixel-level Supervision from Vision Foundation Model
Yang Jin, Lei Zhang, Shi Yan et al.
DHR: Dual Features-Driven Hierarchical Rebalancing in Inter- and Intra-Class Regions for Weakly-Supervised Semantic Segmentation
Sanghyun Jo, Fei Pan, In-Jae Yu et al.
FreeZe: Training-free zero-shot 6D pose estimation with geometric and vision foundation models
Andrea Caraffa, Davide Boscaini, Amir Hamza et al.
Knowledge Transfer from Vision Foundation Models for Efficient Training of Small Task-specific Models
Raviteja Vemulapalli, Hadi Pouransari, Fartash Faghri et al.
Probabilistic Conceptual Explainers: Trustworthy Conceptual Explanations for Vision Foundation Models
Hengyi Wang, Shiwei Tan, Hao Wang
RadOcc: Learning Cross-Modality Occupancy Knowledge through Rendering Assisted Distillation
Haiming Zhang, Xu Yan, Dongfeng Bai et al.
Unleashing the Power of Meta-tuning for Few-shot Generalization Through Sparse Interpolated Experts
Shengzhuang Chen, Jihoon Tack, Yunqiao Yang et al.
Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model
Lianghui Zhu, Bencheng Liao, Qian Zhang et al.