"zero-shot learning" Papers
164 papers found • Page 3 of 4
A Fixed-Point Approach for Causal Generative Modeling
Meyer Scetbon, Joel Jennings, Agrin Hilmkil et al.
Align Before Adapt: Leveraging Entity-to-Region Alignments for Generalizable Video Action Recognition
Yifei Chen, Dapeng Chen, Ruijin Liu et al.
Anchor-based Robust Finetuning of Vision-Language Models
Jinwei Han, Zhiwen Lin, Zhongyisun Sun et al.
ArtVLM: Attribute Recognition Through Vision-Based Prefix Language Modeling
William Zhu, Keren Ye, Junjie Ke et al.
BDIQA: A New Dataset for Video Question Answering to Explore Cognitive Reasoning through Theory of Mind
Yuanyuan Mao, Xin Lin, Qin Ni et al.
Binding Touch to Everything: Learning Unified Multimodal Tactile Representations
Fengyu Yang, Chao Feng, Ziyang Chen et al.
Bootstrapping Chest CT Image Understanding by Distilling Knowledge from X-ray Expert Models
Weiwei Cao, Jianpeng Zhang, Yingda Xia et al.
C2C: Component-to-Composition Learning for Zero-Shot Compositional Action Recognition
Rongchang Li, Zhenhua Feng, Tianyang Xu et al.
Chinese Spelling Correction as Rephrasing Language Model
Linfeng Liu, Hongqiu Wu, Hai Zhao
Commonsense for Zero-Shot Natural Language Video Localization
Meghana Holla, Ismini Lourentzou
Context-I2W: Mapping Images to Context-Dependent Words for Accurate Zero-Shot Composed Image Retrieval
Yuanmin Tang, Jing Yu, Keke Gai et al.
CORECODE: A Common Sense Annotated Dialogue Dataset with Benchmark Tasks for Chinese Large Language Models
Dan Shi, Chaobin You, Jian-Tao Huang et al.
Data Adaptive Traceback for Vision-Language Foundation Models in Image Classification
Long-Fei Li, Peng Zhao, Zhi-Hua Zhou
Data-Free Generalized Zero-Shot Learning
Bowen Tang, Jing Zhang, Yan Long et al.
DeCoOp: Robust Prompt Tuning with Out-of-Distribution Detection
Zhi Zhou, Ming Yang, Jiang-Xin Shi et al.
Do Generalised Classifiers really work on Human Drawn Sketches?
Hmrishav Bandyopadhyay, Pinaki Nath Chowdhury, Aneeshan Sain et al.
DreamComposer: Controllable 3D Object Generation via Multi-View Conditions
Yunhan Yang, Yukun Huang, Xiaoyang Wu et al.
E3M: Zero-Shot Spatio-Temporal Video Grounding with Expectation-Maximization Multimodal Modulation
Peijun Bao, Zihao Shao, Wenhan Yang et al.
ELF: Encoding Speaker-Specific Latent Speech Feature for Speech Synthesis
Jungil Kong, Junmo Lee, Jeongmin Kim et al.
FALIP: Visual Prompt as Foveal Attention Boosts CLIP Zero-Shot Performance
Jiedong Zhuang, Jiaqi Hu, Lianrui Mu et al.
FreeCompose: Generic Zero-Shot Image Composition with Diffusion Prior
Zhekai Chen, Wen Wang, Zhen Yang et al.
GenerateCT: Text-Conditional Generation of 3D Chest CT Volumes
Ibrahim Ethem Hamamci, Sezgin Er, Anjany Sekuboyina et al.
GroundVLP: Harnessing Zero-Shot Visual Grounding from Vision-Language Pre-training and Open-Vocabulary Object Detection
Haozhan Shen, Tiancheng Zhao, Mingwei Zhu et al.
HiFi-123: Towards High-fidelity One Image to 3D Content Generation
Wangbo Yu, Li Yuan, Yanpei Cao et al.
Image Captioning with Multi-Context Synthetic Data
Feipeng Ma, Y. Zhou, Fengyun Rao et al.
Improving Diffusion Models for Inverse Problems Using Optimal Posterior Covariance
Xinyu Peng, Ziyang Zheng, Wenrui Dai et al.
InstructDoc: A Dataset for Zero-Shot Generalization of Visual Document Understanding with Instructions
Ryota Tanaka, Taichi Iki, Kyosuke Nishida et al.
Interactive Visual Task Learning for Robots
Weiwei Gu, Anant Sah, N. Gopalan
Knowledge-Enhanced Dual-stream Zero-shot Composed Image Retrieval
Yucheng Suo, Fan Ma, Linchao Zhu et al.
LaMI-DETR: Open-Vocabulary Detection with Language Model Instruction
Penghui Du, Yu Wang, Yifan Sun et al.
LangCell: Language-Cell Pre-training for Cell Identity Understanding
Suyuan Zhao, Jiahuan Zhang, Yushuai Wu et al.
Lift3D: Zero-Shot Lifting of Any 2D Vision Model to 3D
Mukund Varma T, Peihao Wang, Zhiwen Fan et al.
Omni-Recon: Harnessing Image-based Rendering for General-Purpose Neural Radiance Fields
Yonggan Fu, Huaizhi Qu, Zhifan Ye et al.
OmniSSR: Zero-shot Omnidirectional Image Super-Resolution using Stable Diffusion Model
Runyi Li, Xuhan Sheng, Weiqi Li et al.
PEARL: Zero-shot Cross-task Preference Alignment and Robust Reward Learning for Robotic Manipulation
Runze Liu, Yali Du, Fengshuo Bai et al.
PIVOT: Iterative Visual Prompting Elicits Actionable Knowledge for VLMs
Soroush Nasiriany, Fei Xia, Wenhao Yu et al.
Pix2Gif: Motion-Guided Diffusion for GIF Generation
Hitesh Kandala, Jianfeng Gao, Jianwei Yang
Progressive Semantic-Guided Vision Transformer for Zero-Shot Learning
Shiming Chen, Wenjin Hou, Salman Khan et al.
Prompting Segmentation with Sound Is Generalizable Audio-Visual Source Localizer
Yaoting Wang, Weisong Liu, Guangyao Li et al.
Propose, Assess, Search: Harnessing LLMs for Goal-Oriented Planning in Instructional Videos
Mohaiminul Islam, Tushar Nagarajan, Huiyu Wang et al.
Recursive Visual Programming
Jiaxin Ge, Sanjay Subramanian, Baifeng Shi et al.
Revisiting the Role of Language Priors in Vision-Language Models
Zhiqiu Lin, Xinyue Chen, Deepak Pathak et al.
Robo-ABC: Affordance Generalization Beyond Categories via Semantic Correspondence for Robot Manipulation
Yuanchen Ju, Kaizhe Hu, Guowei Zhang et al.
SAI3D: Segment Any Instance in 3D Scenes
Yingda Yin, Yuzheng Liu, Yang Xiao et al.
SelfVC: Voice Conversion With Iterative Refinement using Self Transformations
Paarth Neekhara, Shehzeen Hussain, Rafael Valle et al.
Space-Time Diffusion Features for Zero-Shot Text-Driven Motion Transfer
Rafail Fridman, Danah Yatim, Omer Bar-Tal et al.
Split to Merge: Unifying Separated Modalities for Unsupervised Domain Adaptation
Xinyao Li, Yuke Li, Zhekai Du et al.
StyleSinger: Style Transfer for Out-of-Domain Singing Voice Synthesis
Task Contamination: Language Models May Not Be Few-Shot Anymore
Changmao Li, Jeffrey Flanigan
TI2V-Zero: Zero-Shot Image Conditioning for Text-to-Video Diffusion Models
Haomiao Ni, Bernhard Egger, Suhas Lohit et al.