Yi Zhu
Affiliations (1)
University of Chinese Academy of Sciences
23 papers, 1,821 total citations
papers (23)
Vision-Language Navigation With Self-Supervised Auxiliary Reasoning Tasks (CVPR 2020, arXiv): 267 citations
Earthformer: Exploring Space-Time Transformers for Earth System Forecasting (NeurIPS 2022, arXiv): 259 citations
VidTr: Video Transformer Without Convolutions (ICCV 2021, arXiv): 220 citations
SOON: Scenario Oriented Object Navigation With Graph-Based Exploration (CVPR 2021, arXiv): 168 citations
CrossCLR: Cross-Modal Contrastive Learning for Multi-Modal Video Representations (ICCV 2021, arXiv): 152 citations
Towards Geospatial Foundation Models via Continual Pretraining (ICCV 2023, arXiv): 117 citations
PreDiff: Precipitation Nowcasting with Latent Diffusion Models (NeurIPS 2023, arXiv): 104 citations
Progressive Coordinate Transforms for Monocular 3D Object Detection (NeurIPS 2021, arXiv): 90 citations
CrossNorm and SelfNorm for Generalization Under Distribution Shifts (ICCV 2021, arXiv): 67 citations
ADAPT: Vision-Language Navigation With Modality-Aligned Action Prompts (CVPR 2022, arXiv): 63 citations
Vision-Dialog Navigation by Exploring Cross-Modal Memory (CVPR 2020, arXiv): 52 citations
EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions (CVPR 2025, arXiv): 48 citations
Prompt Pre-Training with Twenty-Thousand Classes for Open-Vocabulary Visual Recognition (NeurIPS 2023, arXiv): 41 citations
Motion-Excited Sampler: Video Adversarial Attack with Sparked Prior (ECCV 2020, arXiv): 37 citations
CoupAlign: Coupling Word-Pixel with Sentence-Mask Alignments for Referring Image Segmentation (NeurIPS 2022, arXiv): 36 citations
Motion-Guided Masking for Spatiotemporal Representation Learning (ICCV 2023, arXiv): 29 citations
rStar-Coder: Scaling Competitive Code Reasoning with a Large-Scale Verified Dataset (NeurIPS 2025, arXiv): 25 citations
Blending Anti-Aliasing into Vision Transformer (NeurIPS 2021, arXiv): 24 citations
MixReorg: Cross-Modal Mixed Patch Reorganization is a Good Mask Learner for Open-World Semantic Segmentation (ICCV 2023, arXiv): 19 citations
CAP-Net: A Unified Network for 6D Pose and Size Estimation of Categorical Articulated Parts from a Single RGB-D Image (CVPR 2025, arXiv): 3 citations
Self-Motivated Communication Agent for Real-World Vision-Dialog Navigation (ICCV 2021): 0 citations
Learning Canonical F-Correlation Projection for Compact Multiview Representation (CVPR 2022): 0 citations
Domain Consensus Clustering for Universal Domain Adaptation (CVPR 2021): 0 citations