Tong Lu
25 papers · 12,212 total citations
Papers (25)
Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction Without Convolutions
ICCV 2021 · arXiv · 4,656 citations

InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks
CVPR 2024 · arXiv · 2,295 citations

BEVFormer: Learning Bird’s-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers
ECCV 2022 · arXiv · 1,720 citations

InternImage: Exploring Large-Scale Vision Foundation Models With Deformable Convolutions
CVPR 2023 · arXiv · 994 citations

VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks
NeurIPS 2023 · arXiv · 625 citations

TAM: Temporal Adaptive Module for Video Recognition
ICCV 2021 · arXiv · 341 citations

DDP: Diffusion Model for Dense Visual Prediction
ICCV 2023 · arXiv · 205 citations

Panoptic SegFormer: Delving Deeper Into Panoptic Segmentation With Transformers
CVPR 2022 · arXiv · 176 citations

SeedFormer: Patch Seeds Based Point Cloud Completion with Upsample Transformer
ECCV 2022 · arXiv · 174 citations

Is Ego Status All You Need for Open-Loop End-to-End Autonomous Driving?
CVPR 2024 · arXiv · 172 citations

Adaptive Graph Convolution for Point Cloud Analysis
ICCV 2021 · arXiv · 169 citations

Efficient Deformable ConvNets: Rethinking Dynamic and Sparse Operator for Vision Applications
CVPR 2024 · arXiv · 148 citations

FB-BEV: BEV Representation from Forward-Backward View Transformations
ICCV 2023 · arXiv · 126 citations

The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World
ICLR 2024 · arXiv · 118 citations

AVSegFormer: Audio-Visual Segmentation with Transformer
AAAI 2024 · arXiv · 82 citations

Memory-and-Anticipation Transformer for Online Action Understanding
ICCV 2023 · arXiv · 61 citations

CG-Bench: Clue-grounded Question Answering Benchmark for Long Video Understanding
ICLR 2025 · arXiv · 43 citations

CRA-PCN: Point Cloud Completion with Intra- and Inter-level Cross-Resolution Transformers
AAAI 2024 · arXiv · 30 citations

Spectrum-to-Kernel Translation for Accurate Blind Image Super-Resolution
NeurIPS 2021 · arXiv · 26 citations

AE TextSpotter: Learning Visual and Linguistic Representation for Ambiguous Text Spotting
ECCV 2020 · arXiv · 25 citations

Docopilot: Improving Multimodal Models for Document-Level Understanding
CVPR 2025 · arXiv · 15 citations

EgoExoBench: A Benchmark for First- and Third-person View Video Understanding in MLLMs
NeurIPS 2025 · arXiv · 11 citations

Deconfound Semantic Shift and Incompleteness in Incremental Few-shot Semantic Segmentation
AAAI 2025 · 0 citations

MOERL: When Mixture-of-Experts Meet Reinforcement Learning for Adverse Weather Image Restoration
ICCV 2025 · 0 citations

RepKPU: Point Cloud Upsampling with Kernel Point Representation and Deformation
CVPR 2024 · 0 citations