Jianfeng Wang
28 papers
5,065 total citations
papers (28)
MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities
ICML 2024
arXiv
1,066
citations
Segment Everything Everywhere All at Once
NeurIPS 2023
arXiv
703
citations
End-to-End Semi-Supervised Object Detection With Soft Teacher
ICCV 2021
arXiv
592
citations
An Empirical Study of Training End-to-End Vision-and-Language Transformers
CVPR 2022
arXiv
439
citations
Generalized Decoding for Pixel, Image, and Language
CVPR 2023
arXiv
336
citations
Scaling Up Vision-Language Pre-Training for Image Captioning
CVPR 2022
arXiv
300
citations
End-to-End Object Detection With Fully Convolutional Network
CVPR 2021
arXiv
234
citations
ReCo: Region-Controlled Text-to-Image Generation
CVPR 2023
arXiv
194
citations
TAP: Text-Aware Pre-Training for Text-VQA and Text-Caption
CVPR 2021
arXiv
160
citations
Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone
NeurIPS 2022
arXiv
153
citations
UniTAB: Unifying Text and Box Outputs for Grounded Vision-Language Modeling
ECCV 2022
arXiv
135
citations
Injecting Semantic Concepts Into End-to-End Image Captioning
CVPR 2022
arXiv
123
citations
Compressing Visual-Linguistic Model via Knowledge Distillation
ICCV 2021
arXiv
116
citations
RSG: A Simple but Effective Module for Learning Imbalanced Datasets
CVPR 2021
arXiv
110
citations
NUWA-Infinity: Autoregressive over Autoregressive Generation for Infinite Visual Synthesis
NeurIPS 2022
arXiv
97
citations
Boosting Weakly Supervised Object Detection with Progressive Knowledge Transfer
ECCV 2020
arXiv
58
citations
MM-Narrator: Narrating Long-form Videos with Multimodal In-Context Learning
CVPR 2024
arXiv
50
citations
Rethinking Bayesian Deep Learning Methods for Semi-Supervised Volumetric Medical Image Segmentation
CVPR 2022
arXiv
38
citations
MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos
ICLR 2025
arXiv
36
citations
Segment and Caption Anything
CVPR 2024
arXiv
33
citations
Detection Hub: Unifying Object Detection Datasets via Query Adaptation on Language Embedding
CVPR 2023
arXiv
28
citations
SlowFast-VGen: Slow-Fast Learning for Action-Driven Long Video Generation
ICLR 2025
arXiv
19
citations
DAP: Detection-Aware Pre-Training With Weak Supervision
CVPR 2021
arXiv
16
citations
MMSum: A Dataset for Multimodal Summarization and Thumbnail Generation of Videos
CVPR 2024
arXiv
14
citations
IDOL: Unified Dual-Modal Latent Diffusion for Human-Centric Joint Video-Depth Generation
ECCV 2024
arXiv
11
citations
LiVOS: Light Video Object Segmentation with Gated Linear Matching
CVPR 2025
arXiv
4
citations
A Simple Approach and Benchmark for 21,000-Category Object Detection
ECCV 2022
0
citations
Label Distribution Learning on Auxiliary Label Space Graphs for Facial Expression Recognition
CVPR 2020
0
citations