Zhe Chen
28 papers, 4,875 total citations
papers (28)
InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks
CVPR 2024
arXiv
2,295
citations
InternImage: Exploring Large-Scale Vision Foundation Models With Deformable Convolutions
CVPR 2023
arXiv
994
citations
VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks
NeurIPS 2023
arXiv
625
citations
DDP: Diffusion Model for Dense Visual Prediction
ICCV 2023
arXiv
205
citations
Contrastive Boundary Learning for Point Cloud Segmentation
CVPR 2022
arXiv
142
citations
The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World
ICLR 2024
arXiv
118
citations
The All-Seeing Project V2: Towards General Relation Comprehension of the Open World
ECCV 2024
arXiv
89
citations
AVSegFormer: Audio-Visual Segmentation with Transformer
AAAI 2024
arXiv
82
citations
ChemVLM: Exploring the Power of Multimodal Large Language Models in Chemistry Area
AAAI 2025
arXiv
58
citations
CLAMP: Prompt-Based Contrastive Learning for Connecting Language and Animal Pose
CVPR 2023
arXiv
40
citations
Recurrent Glimpse-Based Decoder for Detection With Transformer
CVPR 2022
arXiv
39
citations
Invertible Neural BRDF for Object Inverse Rendering
ECCV 2020
arXiv
30
citations
Pose-Disentangled Contrastive Learning for Self-Supervised Facial Representation
CVPR 2023
arXiv
28
citations
SimDistill: Simulated Multi-Modal Distillation for BEV 3D Object Detection
AAAI 2024
arXiv
25
citations
All Points Matter: Entropy-Regularized Distribution Alignment for Weakly-supervised 3D Segmentation
NeurIPS 2023
arXiv
21
citations
Traffic Flow Optimisation for Lifelong Multi-Agent Path Finding
AAAI 2024
arXiv
18
citations
Docopilot: Improving Multimodal Models for Document-Level Understanding
CVPR 2025
arXiv
15
citations
PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models
CVPR 2025
arXiv
12
citations
OCHID-Fi: Occlusion-Robust Hand Pose Estimation in 3D via RF-Vision
ICCV 2023
arXiv
8
citations
Incomplete Modality Disentangled Representation for Ophthalmic Disease Grading and Diagnosis
AAAI 2025
arXiv
8
citations
Online Guidance Graph Optimization for Lifelong Multi-Agent Path Finding
AAAI 2025
arXiv
8
citations
Structural Information Guided Multimodal Pre-training for Vehicle-Centric Perception
AAAI 2024
arXiv
7
citations
HoVLE: Unleashing the Power of Monolithic Vision-Language Models with Holistic Vision-Language Embedding
CVPR 2025
arXiv
5
citations
SHeaP: Self-supervised Head Geometry Predictor Learned via 2D Gaussians
ICCV 2025
arXiv
3
citations
ReactGPT: Understanding of Chemical Reactions via In-Context Tuning
AAAI 2025
0
citations
Concurrent Planning and Execution in Lifelong Multi-Agent Path Finding with Delay Probabilities
AAAI 2025
0
citations
RAD: Towards Trustworthy Retrieval-Augmented Multi-modal Clinical Diagnosis
NeurIPS 2025
arXiv
0
citations
Toward Modality Gap: Vision Prototype Learning for Weakly-supervised Semantic Segmentation with CLIP
AAAI 2025
arXiv
0
citations