Licheng Yu
22
papers
1,702
total citations
Papers (22)
UNITER: UNiversal Image-TExt Representation Learning
ECCV 2020
arXiv
469
citations
TVR: A Large-Scale Dataset for Video-Subtitle Moment Retrieval
ECCV 2020
arXiv
329
citations
Behind the Scene: Revealing the Secrets of Pre-trained Vision-and-Language Models
ECCV 2020
arXiv
139
citations
Violin: A Large-Scale Dataset for Video-and-Language Inference
CVPR 2020
arXiv
75
citations
AVID: Any-Length Video Inpainting with Diffusion Model
CVPR 2024
arXiv
69
citations
FlowVid: Taming Imperfect Optical Flows for Consistent Video-to-Video Synthesis
CVPR 2024
arXiv
67
citations
FAME-ViL: Multi-Tasking Vision-Language Model for Heterogeneous Fashion Tasks
CVPR 2023
arXiv
67
citations
VideoSwap: Customized Video Subject Swapping with Interactive Semantic Point Correspondence
CVPR 2024
arXiv
62
citations
FashionViL: Fashion-Focused Vision-and-Language Representation Learning
ECCV 2022
arXiv
60
citations
Apollo: An Exploration of Video Understanding in Large Multimodal Models
CVPR 2025
arXiv
55
citations
Learning Procedure-Aware Video Representation From Instructional Videos and Their Narrations
CVPR 2023
arXiv
48
citations
BachGAN: High-Resolution Image Synthesis From Salient Object Layout
CVPR 2020
arXiv
42
citations
Tell Me What Happened: Unifying Text-Guided Video Completion via Multimodal Masked Video Generation
CVPR 2023
arXiv
40
citations
Fairy: Fast Parallelized Instruction-Guided Video-to-Video Synthesis
CVPR 2024
arXiv
36
citations
Unsupervised Vision-and-Language Pre-Training via Retrieval-Based Multi-Granular Alignment
CVPR 2022
arXiv
35
citations
Connecting What To Say With Where To Look by Modeling Human Attention Traces
CVPR 2021
arXiv
31
citations
CiT: Curation in Training for Effective Vision-Language Data
ICCV 2023
arXiv
31
citations
Layout-Agnostic Scene Text Image Synthesis with Diffusion Models
CVPR 2024
arXiv
17
citations
Accelerating Multimodal Large Language Models by Searching Optimal Vision Token Reduction
CVPR 2025
arXiv
16
citations
Building a Mind Palace: Structuring Environment-Grounded Semantic Graphs for Effective Long Video Analysis with LLMs
CVPR 2025
arXiv
7
citations
ROICtrl: Boosting Instance Control for Visual Generation
CVPR 2025
arXiv
7
citations
GEB+: A Benchmark for Generic Event Boundary Captioning, Grounding and Retrieval
ECCV 2022
0
citations