Poster "zero-shot generalization" Papers
72 papers found • Page 1 of 2
G1: Teaching LLMs to Reason on Graphs with Reinforcement Learning
Xiaojun Guo, Ang Li, Yifei Wang et al.
Aether: Geometric-Aware Unified World Modeling
Haoyi Zhu, Yifan Wang, Jianjun Zhou et al.
Attribute-formed Class-specific Concept Space: Endowing Language Bottleneck Model with Better Interpretability and Scalability
Jianyang Zhang, Qianli Luo, Guowu Yang et al.
CLEAR: Conv-Like Linearization Revs Pre-Trained Diffusion Transformers Up
Songhua Liu, Zhenxiong Tan, Xinchao Wang
Collaborative Decoding Makes Visual Auto-Regressive Modeling Efficient
Zigeng Chen, Xinyin Ma, Gongfan Fang et al.
Compositional Entailment Learning for Hyperbolic Vision-Language Models
Avik Pal, Max van Spengler, Guido D'Amely di Melendugno et al.
Cross-Embodiment Dexterous Grasping with Reinforcement Learning
Haoqi Yuan, Bohan Zhou, Yuhui Fu et al.
DEFOM-Stereo: Depth Foundation Model Based Stereo Matching
Hualie Jiang, Zhiqiang Lou, Laiyan Ding et al.
Depth Any Camera: Zero-Shot Metric Depth Estimation from Any Camera
Yuliang Guo, Sparsh Garg, S. Mahdi H. Miangoleh et al.
Detect Anything 3D in the Wild
Hanxue Zhang, Haoran Jiang, Qingsong Yao et al.
Disentangling Representations through Multi-task Learning
Pantelis Vafidis, Aman Bhargava, Antonio Rangel
DrVD-Bench: Do Vision-Language Models Reason Like Human Doctors in Medical Image Diagnosis?
Tianhong Zhou, Xu Yin, Yingtao Zhu et al.
EasyControl: Adding Efficient and Flexible Control for Diffusion Transformer
Yuxuan Zhang, Yirui Yuan, Yiren Song et al.
EasyRef: Omni-Generalized Group Image Reference for Diffusion Models via Multimodal LLM
Zhuofan Zong, Dongzhi Jiang, Bingqi Ma et al.
EmbodiedSAM: Online Segment Any 3D Thing in Real Time
Xiuwei Xu, Huangxing Chen, Linqing Zhao et al.
Equilibrium Policy Generalization: A Reinforcement Learning Framework for Cross-Graph Zero-Shot Generalization in Pursuit-Evasion Games
Runyu Lu, Peng Zhang, Ruochuan Shi et al.
Exploring the Limits of Vision-Language-Action Manipulation in Cross-task Generalization
Jiaming Zhou, Ke Ye, Jiayi Liu et al.
GenM3: Generative Pretrained Multi-path Motion Model for Text Conditional Human Motion Generation
Junyu Shi, Lijiang Liu, Yong Sun et al.
IMDPrompter: Adapting SAM to Image Manipulation Detection by Cross-View Automated Prompt Learning
Quan Zhang, Yuxin Qi, Xi Tang et al.
IPFormer: Visual 3D Panoptic Scene Completion with Context-Adaptive Instance Proposals
Markus Gross, Aya Fahmy, Danit Niwattananan et al.
Jasmine: Harnessing Diffusion Prior for Self-supervised Depth Estimation
Jiyuan Wang, Chunyu Lin, Cheng Guan et al.
KGARevion: An AI Agent for Knowledge-Intensive Biomedical QA
Xiaorui Su, Yibo Wang, Shanghua Gao et al.
Kinetix: Investigating the Training of General Agents through Open-Ended Physics-Based Control Tasks
Michael Matthews, Michael Beukman, Chris Lu et al.
Learning to Generate Diverse Pedestrian Movements from Web Videos with Noisy Labels
Zhizheng Liu, Joe Lin, Wayne Wu et al.
Learning to Specialize: Joint Gating-Expert Training for Adaptive MoEs in Decentralized Settings
Yehya Farhat, Hamza ElMokhtar Shili, Fangshuo Liao et al.
LVSM: A Large View Synthesis Model with Minimal 3D Inductive Bias
Haian Jin, Hanwen Jiang, Hao Tan et al.
Mind the Quote: Enabling Quotation-Aware Dialogue in LLMs via Plug-and-Play Modules
Yueqi Zhang, Peiwen Yuan, Yiwei Li et al.
Mint: A Simple Test-Time Adaptation of Vision-Language Models against Common Corruptions
Wenxuan Bao, Ruxi Deng, Jingrui He
On the Out-Of-Distribution Generalization of Large Multimodal Models
Xingxuan Zhang, Jiansheng Li, Wenjing Chu et al.
OpenOmni: Advancing Open-Source Omnimodal Large Language Models with Progressive Multimodal Alignment and Real-time Emotional Speech Synthesis
Run Luo, Ting-En Lin, Haonan Zhang et al.
OW-OVD: Unified Open World and Open Vocabulary Object Detection
Xing Xi, Yangyang Huang, Ronghua Luo et al.
PacGDC: Label-Efficient Generalizable Depth Completion with Projection Ambiguity and Consistency
Haotian Wang, Aoran Xiao, Xiaoqin Zhang et al.
Re-Thinking Inverse Graphics With Large Language Models
Haiwen Feng, Michael J Black, Weiyang Liu et al.
Scalable Autoregressive Monocular Depth Estimation
Jinhong Wang, Jintai Chen, Jian Liu et al.
Scale-invariant attention
Ben Anson, Xi Wang, Laurence Aitchison
Stable Virtual Camera: Generative View Synthesis with Diffusion Models
Jensen Zhou, Hang Gao, Vikram Voleti et al.
Synthetic-to-Real Self-supervised Robust Depth Estimation via Learning with Motion and Structure Priors
Weilong Yan, Ming Li, Li Haipeng et al.
Think Thrice Before You Act: Progressive Thought Refinement in Large Language Models
Chengyu Du, Jinyi Han, Yizhou Ying et al.
Tree-Guided Diffusion Planner
Hyeonseong Jeon, Cheolhong Min, Jaesik Park
UGM2N: An Unsupervised and Generalizable Mesh Movement Network via M-Uniform Loss
Zhichao Wang, Xinhai Chen, Qinglin Wang et al.
UKBOB: One Billion MRI Labeled Masks for Generalizable 3D Medical Image Segmentation
Emmanuelle Bourigault, Amir Jamaludin, Abdullah Hamdi
UniGTE: Unified Graph–Text Encoding for Zero-Shot Generalization across Graph Tasks and Domains
Duo Wang, Yuan Zuo, Guangyue Lu et al.
Unsupervised Zero-Shot Reinforcement Learning via Dual-Value Forward-Backward Representation
Jingbo Sun, Songjun Tu, Qichao Zhang et al.
vesselFM: A Foundation Model for Universal 3D Blood Vessel Segmentation
Bastian Wittmann, Yannick Wattenberg, Tamaz Amiranashvili et al.
Words in Motion: Extracting Interpretable Control Vectors for Motion Transformers
Omer Sahin Tas, Royden Wagner
ZEBRA: Towards Zero-Shot Cross-Subject Generalization for Universal Brain Visual Decoding
Haonan Wang, Jingyu Lu, Hongrui Li et al.
Zero-shot Inexact CAD Model Alignment from a Single Image
Pattaramanee Arsomngern, Sasikarn Khwanmuang, Matthias Nießner et al.
Zero-Shot Monocular Scene Flow Estimation in the Wild
Yiqing Liang, Abhishek Badki, Hang Su et al.
BEAF: Observing BEfore-AFter Changes to Evaluate Hallucination in Vision-language Models
Ye-Bin Moon, Nam Hyeon-Woo, Wonseok Choi et al.
Bridging Environments and Language with Rendering Functions and Vision-Language Models
Théo Cachet, Christopher Dance, Olivier Sigaud