"zero-shot learning" Papers

164 papers found • Page 3 of 4

A Fixed-Point Approach for Causal Generative Modeling

Meyer Scetbon, Joel Jennings, Agrin Hilmkil et al.

ICML 2024 • arXiv:2404.06969
4
citations

Align Before Adapt: Leveraging Entity-to-Region Alignments for Generalizable Video Action Recognition

Yifei Chen, Dapeng Chen, Ruijin Liu et al.

CVPR 2024 • arXiv:2311.15619
17
citations

Anchor-based Robust Finetuning of Vision-Language Models

Jinwei Han, Zhiwen Lin, Zhongyisun Sun et al.

CVPR 2024 • arXiv:2404.06244
10
citations

ArtVLM: Attribute Recognition Through Vision-Based Prefix Language Modeling

William Zhu, Keren Ye, Junjie Ke et al.

ECCV 2024 • arXiv:2408.04102
2
citations

BDIQA: A New Dataset for Video Question Answering to Explore Cognitive Reasoning through Theory of Mind

Yuanyuan Mao, Xin Lin, Qin Ni et al.

AAAI 2024 • arXiv:2402.07402
6
citations

Binding Touch to Everything: Learning Unified Multimodal Tactile Representations

Fengyu Yang, Chao Feng, Ziyang Chen et al.

CVPR 2024 • arXiv:2401.18084
112
citations

Bootstrapping Chest CT Image Understanding by Distilling Knowledge from X-ray Expert Models

Weiwei Cao, Jianpeng Zhang, Yingda Xia et al.

CVPR 2024 • arXiv:2404.04936
15
citations

C2C: Component-to-Composition Learning for Zero-Shot Compositional Action Recognition

Rongchang Li, Zhenhua Feng, Tianyang Xu et al.

ECCV 2024 • arXiv:2407.06113
12
citations

Chinese Spelling Correction as Rephrasing Language Model

Linfeng Liu, Hongqiu Wu, Hai Zhao

AAAI 2024 • arXiv:2308.08796
29
citations

Commonsense for Zero-Shot Natural Language Video Localization

Meghana Holla, Ismini Lourentzou

AAAI 2024 • arXiv:2312.17429
5
citations

Context-I2W: Mapping Images to Context-Dependent Words for Accurate Zero-Shot Composed Image Retrieval

Yuanmin Tang, Jing Yu, Keke Gai et al.

AAAI 2024 • arXiv:2309.16137
60
citations

CORECODE: A Common Sense Annotated Dialogue Dataset with Benchmark Tasks for Chinese Large Language Models

Dan Shi, Chaobin You, Jian-Tao Huang et al.

AAAI 2024 • arXiv:2312.12853
2
citations

Data Adaptive Traceback for Vision-Language Foundation Models in Image Classification

Long-Fei Li, Peng Zhao, Zhi-Hua Zhou

AAAI 2024 • arXiv:2407.08787
4
citations

Data-Free Generalized Zero-Shot Learning

Bowen Tang, Jing Zhang, Yan Long et al.

AAAI 2024 • arXiv:2401.15657
15
citations

DeCoOp: Robust Prompt Tuning with Out-of-Distribution Detection

Zhi Zhou, Ming Yang, Jiang-Xin Shi et al.

ICML 2024 • arXiv:2406.00345
12
citations

Do Generalised Classifiers really work on Human Drawn Sketches?

Hmrishav Bandyopadhyay, Pinaki Nath Chowdhury, Aneeshan Sain et al.

ECCV 2024 • arXiv:2407.03893
2
citations

DreamComposer: Controllable 3D Object Generation via Multi-View Conditions

Yunhan Yang, Yukun Huang, Xiaoyang Wu et al.

CVPR 2024 • arXiv:2312.03611
19
citations

E3M: Zero-Shot Spatio-Temporal Video Grounding with Expectation-Maximization Multimodal Modulation

Peijun Bao, Zihao Shao, Wenhan Yang et al.

ECCV 2024
6
citations

ELF: Encoding Speaker-Specific Latent Speech Feature for Speech Synthesis

Jungil Kong, Junmo Lee, Jeongmin Kim et al.

ICML 2024 • arXiv:2311.11745
3
citations

FALIP: Visual Prompt as Foveal Attention Boosts CLIP Zero-Shot Performance

Jiedong Zhuang, Jiaqi Hu, Lianrui Mu et al.

ECCV 2024 • arXiv:2407.05578
8
citations

FreeCompose: Generic Zero-Shot Image Composition with Diffusion Prior

Zhekai Chen, Wen Wang, Zhen Yang et al.

ECCV 2024 • arXiv:2407.04947
12
citations

GenerateCT: Text-Conditional Generation of 3D Chest CT Volumes

Ibrahim Ethem Hamamci, Sezgin Er, Anjany Sekuboyina et al.

ECCV 2024 • arXiv:2305.16037
53
citations

GroundVLP: Harnessing Zero-Shot Visual Grounding from Vision-Language Pre-training and Open-Vocabulary Object Detection

Haozhan Shen, Tiancheng Zhao, Mingwei Zhu et al.

AAAI 2024 • arXiv:2312.15043
26
citations

HiFi-123: Towards High-fidelity One Image to 3D Content Generation

Wangbo Yu, Li Yuan, Yanpei Cao et al.

ECCV 2024 • arXiv:2310.06744
35
citations

Image Captioning with Multi-Context Synthetic Data

Feipeng Ma, Y. Zhou, Fengyun Rao et al.

AAAI 2024 • arXiv:2305.18072
18
citations

Improving Diffusion Models for Inverse Problems Using Optimal Posterior Covariance

Xinyu Peng, Ziyang Zheng, Wenrui Dai et al.

ICML 2024 • arXiv:2402.02149
41
citations

InstructDoc: A Dataset for Zero-Shot Generalization of Visual Document Understanding with Instructions

Ryota Tanaka, Taichi Iki, Kyosuke Nishida et al.

AAAI 2024 • arXiv:2401.13313
36
citations

Interactive Visual Task Learning for Robots

Weiwei Gu, Anant Sah, N. Gopalan

AAAI 2024 • arXiv:2312.13219
7
citations

Knowledge-Enhanced Dual-stream Zero-shot Composed Image Retrieval

Yucheng Suo, Fan Ma, Linchao Zhu et al.

CVPR 2024 • arXiv:2403.16005
49
citations

LaMI-DETR: Open-Vocabulary Detection with Language Model Instruction

Penghui Du, Yu Wang, Yifan Sun et al.

ECCV 2024 • arXiv:2407.11335
16
citations

LangCell: Language-Cell Pre-training for Cell Identity Understanding

Suyuan Zhao, Jiahuan Zhang, Yushuai Wu et al.

ICML 2024 • arXiv:2405.06708
28
citations

Lift3D: Zero-Shot Lifting of Any 2D Vision Model to 3D

Mukund Varma T, Peihao Wang, Zhiwen Fan et al.

CVPR 2024 • arXiv:2403.18922
13
citations

Omni-Recon: Harnessing Image-based Rendering for General-Purpose Neural Radiance Fields

Yonggan Fu, Huaizhi Qu, Zhifan Ye et al.

ECCV 2024 • arXiv:2403.11131

OmniSSR: Zero-shot Omnidirectional Image Super-Resolution using Stable Diffusion Model

Runyi Li, Xuhan Sheng, Weiqi Li et al.

ECCV 2024 • arXiv:2404.10312
11
citations

PEARL: Zero-shot Cross-task Preference Alignment and Robust Reward Learning for Robotic Manipulation

Runze Liu, Yali Du, Fengshuo Bai et al.

ICML 2024 • arXiv:2306.03615
9
citations

PIVOT: Iterative Visual Prompting Elicits Actionable Knowledge for VLMs

Soroush Nasiriany, Fei Xia, Wenhao Yu et al.

ICML 2024 • arXiv:2402.07872
188
citations

Pix2Gif: Motion-Guided Diffusion for GIF Generation

Hitesh Kandala, Jianfeng Gao, Jianwei Yang

ECCV 2024 • arXiv:2403.04634
5
citations

Progressive Semantic-Guided Vision Transformer for Zero-Shot Learning

Shiming Chen, Wenjin Hou, Salman Khan et al.

CVPR 2024 • arXiv:2404.07713
36
citations

Prompting Segmentation with Sound Is Generalizable Audio-Visual Source Localizer

Yaoting Wang, Liu Weisong, Guangyao Li et al.

AAAI 2024 • arXiv:2309.07929
38
citations

Propose, Assess, Search: Harnessing LLMs for Goal-Oriented Planning in Instructional Videos

Mohaiminul Islam, Tushar Nagarajan, Huiyu Wang et al.

ECCV 2024 • arXiv:2409.20557
10
citations

Recursive Visual Programming

Jiaxin Ge, Sanjay Subramanian, Baifeng Shi et al.

ECCV 2024 • arXiv:2312.02249
10
citations

Revisiting the Role of Language Priors in Vision-Language Models

Zhiqiu Lin, Xinyue Chen, Deepak Pathak et al.

ICML 2024 • arXiv:2306.01879
39
citations

Robo-ABC: Affordance Generalization Beyond Categories via Semantic Correspondence for Robot Manipulation

Yuanchen Ju, Kaizhe Hu, Guowei Zhang et al.

ECCV 2024 • arXiv:2401.07487
84
citations

SAI3D: Segment Any Instance in 3D Scenes

Yingda Yin, Yuzheng Liu, Yang Xiao et al.

CVPR 2024 • arXiv:2312.11557
79
citations

SelfVC: Voice Conversion With Iterative Refinement using Self Transformations

Paarth Neekhara, Shehzeen Hussain, Rafael Valle et al.

ICML 2024 • arXiv:2310.09653
7
citations

Space-Time Diffusion Features for Zero-Shot Text-Driven Motion Transfer

Rafail Fridman, Danah Yatim, Omer Bar-Tal et al.

CVPR 2024 • arXiv:2311.17009
100
citations

Split to Merge: Unifying Separated Modalities for Unsupervised Domain Adaptation

Xinyao Li, Yuke Li, Zhekai Du et al.

CVPR 2024 • arXiv:2403.06946
19
citations

StyleSinger: Style Transfer for Out-of-Domain Singing Voice Synthesis

AAAI 2024 • arXiv:2312.10741
40
citations

Task Contamination: Language Models May Not Be Few-Shot Anymore

Changmao Li, Jeffrey Flanigan

AAAI 2024 • arXiv:2312.16337
132
citations

TI2V-Zero: Zero-Shot Image Conditioning for Text-to-Video Diffusion Models

Haomiao Ni, Bernhard Egger, Suhas Lohit et al.

CVPR 2024 • arXiv:2404.16306
22
citations