"zero-shot generalization" Papers
87 papers found • Page 1 of 2
Conference
$\texttt{G1}$: Teaching LLMs to Reason on Graphs with Reinforcement Learning
Xiaojun Guo, Ang Li, Yifei Wang et al.
Aether: Geometric-Aware Unified World Modeling
Haoyi Zhu, Yifan Wang, Jianjun Zhou et al.
Attribute-formed Class-specific Concept Space: Endowing Language Bottleneck Model with Better Interpretability and Scalability
Jianyang Zhang, Qianli Luo, Guowu Yang et al.
Bokehlicious: Photorealistic Bokeh Rendering with Controllable Apertures
Tim Seizinger, Florin-Alexandru Vasluianu, Marcos Conde et al.
CLEAR: Conv-Like Linearization Revs Pre-Trained Diffusion Transformers Up
Songhua Liu, Zhenxiong Tan, Xinchao Wang
Collaborative Decoding Makes Visual Auto-Regressive Modeling Efficient
Zigeng Chen, Xinyin Ma, Gongfan Fang et al.
Compositional Entailment Learning for Hyperbolic Vision-Language Models
Avik Pal, Max van Spengler, Guido D'Amely di Melendugno et al.
ConfigX: Modular Configuration for Evolutionary Algorithms via Multitask Reinforcement Learning
Hongshu Guo, Zeyuan Ma, Jiacheng Chen et al.
Cross-Embodiment Dexterous Grasping with Reinforcement Learning
Haoqi Yuan, Bohan Zhou, Yuhui Fu et al.
DEFOM-Stereo: Depth Foundation Model Based Stereo Matching
Hualie Jiang, Zhiqiang Lou, Laiyan Ding et al.
Depth Any Camera: Zero-Shot Metric Depth Estimation from Any Camera
Yuliang Guo, Sparsh Garg, S. Mahdi H. Miangoleh et al.
Detect Anything 3D in the Wild
Hanxue Zhang, Haoran Jiang, Qingsong Yao et al.
Disentangling Representations through Multi-task Learning
Pantelis Vafidis, Aman Bhargava, Antonio Rangel
DrVD-Bench: Do Vision-Language Models Reason Like Human Doctors in Medical Image Diagnosis?
Tianhong Zhou, xu yin, Yingtao Zhu et al.
EasyControl: Adding Efficient and Flexible Control for Diffusion Transformer
Yuxuan Zhang, Yirui Yuan, Yiren Song et al.
EasyRef: Omni-Generalized Group Image Reference for Diffusion Models via Multimodal LLM
Zhuofan Zong, Dongzhi Jiang, Bingqi Ma et al.
EC-Diffuser: Multi-Object Manipulation via Entity-Centric Behavior Generation
Carl Qi, Dan Haramati, Tal Daniel et al.
EmbodiedSAM: Online Segment Any 3D Thing in Real Time
Xiuwei Xu, Huangxing Chen, Linqing Zhao et al.
Equilibrium Policy Generalization: A Reinforcement Learning Framework for Cross-Graph Zero-Shot Generalization in Pursuit-Evasion Games
Runyu Lu, Peng Zhang, Ruochuan Shi et al.
Exploring the Limits of Vision-Language-Action Manipulation in Cross-task Generalization
Jiaming Zhou, Ke Ye, Jiayi Liu et al.
GenM3: Generative Pretrained Multi-path Motion Model for Text Conditional Human Motion Generation
Junyu Shi, Lijiang LIU, Yong Sun et al.
IMDPrompter: Adapting SAM to Image Manipulation Detection by Cross-View Automated Prompt Learning
Quan Zhang, Yuxin Qi, Xi Tang et al.
IPFormer: Visual 3D Panoptic Scene Completion with Context-Adaptive Instance Proposals
Markus Gross, Aya Fahmy, Danit Niwattananan et al.
Jasmine: Harnessing Diffusion Prior for Self-supervised Depth Estimation
Jiyuan Wang, Chunyu Lin, cheng guan et al.
KGARevion: An AI Agent for Knowledge-Intensive Biomedical QA
Xiaorui Su, Yibo Wang, Shanghua Gao et al.
Kinetix: Investigating the Training of General Agents through Open-Ended Physics-Based Control Tasks
Michael Matthews, Michael Beukman, Chris Lu et al.
Know Thyself by Knowing Others: Learning Neuron Identity from Population Context
Vinam Arora, Divyansha Lachi, Ian Knight et al.
Learning to Generate Diverse Pedestrian Movements from Web Videos with Noisy Labels
Zhizheng Liu, Joe Lin, Wayne Wu et al.
Learning to Specialize: Joint Gating-Expert Training for Adaptive MoEs in Decentralized Settings
Yehya Farhat, Hamza ElMokhtar Shili, Fangshuo Liao et al.
LVSM: A Large View Synthesis Model with Minimal 3D Inductive Bias
Haian Jin, Hanwen Jiang, Hao Tan et al.
Mind the Quote: Enabling Quotation-Aware Dialogue in LLMs via Plug-and-Play Modules
Yueqi Zhang, Peiwen Yuan, Yiwei Li et al.
Mint: A Simple Test-Time Adaptation of Vision-Language Models against Common Corruptions
Wenxuan Bao, Ruxi Deng, Jingrui He
OmniManip: Towards General Robotic Manipulation via Object-Centric Interaction Primitives as Spatial Constraints
Mingjie Pan, Jiyao Zhang, Tianshu Wu et al.
On the Out-Of-Distribution Generalization of Large Multimodal Models
Xingxuan Zhang, Jiansheng Li, Wenjing Chu et al.
OpenOmni: Advancing Open-Source Omnimodal Large Language Models with Progressive Multimodal Alignment and Real-time Emotional Speech Synthesis
Run Luo, Ting-En Lin, Haonan Zhang et al.
OpenWorldSAM: Extending SAM2 for Universal Image Segmentation with Language Prompts
Shiting (Ginny) Xiao, Rishabh Kabra, Yuhang Li et al.
OW-OVD: Unified Open World and Open Vocabulary Object Detection
Xing Xi, Yangyang Huang, Ronghua Luo et al.
PacGDC: Label-Efficient Generalizable Depth Completion with Projection Ambiguity and Consistency
Haotian Wang, Aoran Xiao, Xiaoqin Zhang et al.
Repurposing Marigold for Zero-Shot Metric Depth Estimation via Defocus Blur Cues
Chinmay Talegaonkar, Nikhil Gandudi Suresh, Zachary Novack et al.
Re-Thinking Inverse Graphics With Large Language Models
Haiwen Feng, Michael J Black, Weiyang Liu et al.
Scalable Autoregressive Monocular Depth Estimation
Jinhong Wang, Jintai Chen, Jian liu et al.
Scale-invariant attention
Ben Anson, Xi Wang, Laurence Aitchison
Stable Virtual Camera: Generative View Synthesis with Diffusion Models
Jensen Zhou, Hang Gao, Vikram Voleti et al.
Synthetic-to-Real Self-supervised Robust Depth Estimation via Learning with Motion and Structure Priors
Weilong Yan, Ming Li, Li Haipeng et al.
Think Thrice Before You Act: Progressive Thought Refinement in Large Language Models
Chengyu Du, Jinyi Han, Yizhou Ying et al.
Tree-Guided Diffusion Planner
Hyeonseong Jeon, Cheolhong Min, Jaesik Park
UGM2N: An Unsupervised and Generalizable Mesh Movement Network via M-Uniform Loss
Zhichao Wang, Xinhai Chen, Qinglin Wang et al.
UKBOB: One Billion MRI Labeled Masks for Generalizable 3D Medical Image Segmentation
Emmanuelle Bourigault, Amir Jamaludin, Abdullah Hamdi
UniGTE: Unified Graph–Text Encoding for Zero-Shot Generalization across Graph Tasks and Domains
Duo Wang, Yuan Zuo, Guangyue Lu et al.
Unsupervised Zero-Shot Reinforcement Learning via Dual-Value Forward-Backward Representation
Jingbo Sun, Songjun Tu, Qichao Zhang et al.