Poster "zero-shot learning" Papers
124 papers found • Page 3 of 3
PIVOT: Iterative Visual Prompting Elicits Actionable Knowledge for VLMs
Soroush Nasiriany, Fei Xia, Wenhao Yu et al.
Pix2Gif: Motion-Guided Diffusion for GIF Generation
Hitesh Kandala, Jianfeng Gao, Jianwei Yang
Progressive Semantic-Guided Vision Transformer for Zero-Shot Learning
Shiming Chen, Wenjin Hou, Salman Khan et al.
Propose, Assess, Search: Harnessing LLMs for Goal-Oriented Planning in Instructional Videos
Mohaiminul Islam, Tushar Nagarajan, Huiyu Wang et al.
Recursive Visual Programming
Jiaxin Ge, Sanjay Subramanian, Baifeng Shi et al.
Revisiting the Role of Language Priors in Vision-Language Models
Zhiqiu Lin, Xinyue Chen, Deepak Pathak et al.
Robo-ABC: Affordance Generalization Beyond Categories via Semantic Correspondence for Robot Manipulation
Yuanchen Ju, Kaizhe Hu, Guowei Zhang et al.
SAI3D: Segment Any Instance in 3D Scenes
Yingda Yin, Yuzheng Liu, Yang Xiao et al.
SelfVC: Voice Conversion With Iterative Refinement using Self Transformations
Paarth Neekhara, Shehzeen Hussain, Rafael Valle et al.
Space-Time Diffusion Features for Zero-Shot Text-Driven Motion Transfer
Rafail Fridman, Danah Yatim, Omer Bar-Tal et al.
Split to Merge: Unifying Separated Modalities for Unsupervised Domain Adaptation
Xinyao Li, Yuke Li, Zhekai Du et al.
TI2V-Zero: Zero-Shot Image Conditioning for Text-to-Video Diffusion Models
Haomiao Ni, Bernhard Egger, Suhas Lohit et al.
Tilt your Head: Activating the Hidden Spatial-Invariance of Classifiers
Johann Schmidt, Sebastian Stober
Towards Causal Foundation Model: on Duality between Optimal Balancing and Attention
Jiaqi Zhang, Joel Jennings, Agrin Hilmkil et al.
Training-free Video Temporal Grounding using Large-scale Pre-trained Models
Minghang Zheng, Xinhao Cai, Qingchao Chen et al.
VicTR: Video-conditioned Text Representations for Activity Recognition
Kumara Kahatapitiya, Anurag Arnab, Arsha Nagrani et al.
VideoCon: Robust Video-Language Alignment via Contrast Captions
Hritik Bansal, Yonatan Bitton, Idan Szpektor et al.
Video Question Answering with Procedural Programs
Rohan Choudhury, Koichiro Niinuma, Kris Kitani et al.
VisionGraph: Leveraging Large Multimodal Models for Graph Theory Problems in Visual Context
Yunxin Li, Baotian Hu, Haoyuan Shi et al.
Visual-Text Cross Alignment: Refining the Similarity Score in Vision-Language Models
Jinhao Li, Haopeng Li, Sarah Erfani et al.
ZeroNVS: Zero-Shot 360-Degree View Synthesis from a Single Image
Kyle Sargent, Zizhang Li, Tanmay Shah et al.
Zero-Shot Image Feature Consensus with Deep Functional Maps
Xinle Cheng, Congyue Deng, Adam Harley et al.
Zero-shot Text-guided Infinite Image Synthesis with LLM guidance
Soyeong Kwon, Taegyeong Lee, Taehwan Kim
ZeST: Zero-Shot Material Transfer from a Single Image
Ta-Ying Cheng, Prafull Sharma, Andrew Markham et al.