Lewei Lu
24 papers, 5,718 total citations
papers (24)
InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks (CVPR 2024), 2,295 citations
Planning-Oriented Autonomous Driving (CVPR 2023), 1,076 citations
InternImage: Exploring Large-Scale Vision Foundation Models With Deformable Convolutions (CVPR 2023), 994 citations
BEVFormer v2: Adapting Modern Image Backbones to Bird's-Eye-View Recognition via Perspective Supervision (CVPR 2023), 386 citations
Scene as Occupancy (ICCV 2023), 228 citations
FuseFormer: Fusing Fine-Grained Information in Transformers for Video Inpainting (ICCV 2021), 156 citations
Efficient Deformable ConvNets: Rethinking Dynamic and Sparse Operator for Vision Applications (CVPR 2024), 148 citations
The All-Seeing Project V2: Towards General Relation Comprehension of the Open World (ECCV 2024), 89 citations
Auto MC-Reward: Automated Dense Reward Design with Large Language Models for Minecraft (CVPR 2024), 58 citations
ControlLLM: Augment Language Models with Tools by Searching on Graphs (ECCV 2024), 57 citations
Towards All-in-One Pre-Training via Maximizing Multi-Modal Mutual Information (CVPR 2023), 56 citations
OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text (ICLR 2025), 49 citations
SynerGen-VL: Towards Synergistic Image Understanding and Generation with Vision Experts and Token Folding (CVPR 2025), 35 citations
MaskGWM: A Generalizable Driving World Model with Video Mask Reconstruction (CVPR 2025), 20 citations
ADDP: Learning General Representations for Image Recognition and Generation with Alternating Denoising Diffusion Process (ICLR 2024), 16 citations
Docopilot: Improving Multimodal Models for Document-Level Understanding (CVPR 2025), 15 citations
PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models (CVPR 2025), 12 citations
Weakly Supervised Monocular 3D Detection with a Single-View Image (CVPR 2024), 12 citations
Modeling Continuous Motion for 3D Point Cloud Object Tracking (AAAI 2024), 6 citations
Masked AutoDecoder is Effective Multi-Task Vision Generalist (CVPR 2024), 5 citations
HoVLE: Unleashing the Power of Monolithic Vision-Language Models with Holistic Vision-Language Embedding (CVPR 2025), 5 citations
Enhancing the Outcome Reward-based RL Training of MLLMs with Self-Consistency Sampling (NeurIPS 2025), 0 citations
Spatial Preference Rewarding for MLLMs Spatial Understanding (ICCV 2025), 0 citations
Distilling Focal Knowledge From Imperfect Expert for 3D Object Detection (CVPR 2023), 0 citations