Juncheng Li
28 papers, 1,496 total citations
papers (28)
Masked Autoencoders that Listen
NEURIPS 2022
arXiv
395
citations
Structure-Preserving Deraining With Residue Channel Prior Guidance
ICCV 2021
arXiv
147
citations
AnyEdit: Mastering Unified High-Quality Image Editing for Any Idea
CVPR 2025
arXiv
135
citations
HalluciDoctor: Mitigating Hallucinatory Toxicity in Visual Instruction Data
CVPR 2024
arXiv
129
citations
Momentor: Advancing Video Large Language Model with Fine-Grained Temporal Reasoning
ICML 2024
arXiv
104
citations
Fine-Grained Semantically Aligned Vision-Language Pre-Training
NEURIPS 2022
arXiv
100
citations
Fine-tuning Multimodal LLMs to Follow Zero-shot Demonstrative Instructions
ICLR 2024
arXiv
90
citations
Compositional Temporal Grounding With Structured Variational Cross-Graph Correspondence Learning
CVPR 2022
arXiv
82
citations
Unsupervised Reinforcement Learning of Transferable Meta-Skills for Embodied Navigation
CVPR 2020
arXiv
79
citations
Visually-Prompted Language Model for Fine-Grained Scene Graph Generation in an Open World
ICCV 2023
arXiv
48
citations
Gradient-Regulated Meta-Prompt Learning for Generalizable Vision-Language Models
ICCV 2023
arXiv
32
citations
Auto-Encoding Morph-Tokens for Multimodal LLM
ICML 2024
arXiv
32
citations
Adaptive Hierarchical Graph Reasoning With Semantic Coherence for Video-and-Language Inference
ICCV 2021
arXiv
28
citations
Generative Multimodal Pretraining with Discrete Diffusion Timestep Tokens
CVPR 2025
arXiv
20
citations
STEP: Enhancing Video-LLMs’ Compositional Reasoning by Spatio-Temporal Graph-guided Self-Training
CVPR 2025
arXiv
15
citations
Benchmarking Multimodal CoT Reward Model Stepwise by Visual Program
ICCV 2025
arXiv
10
citations
SILMM: Self-Improving Large Multimodal Models for Compositional Text-to-Image Generation
CVPR 2025
arXiv
10
citations
IDEATOR: Jailbreaking and Benchmarking Large Vision-Language Models Using Themselves
ICCV 2025
arXiv
9
citations
Janus-Pro-R1: Advancing Collaborative Visual Comprehension and Generation via Reinforcement Learning
NEURIPS 2025
arXiv
7
citations
Iris: Breaking GUI Complexity with Adaptive Focus and Self-Refining
ICCV 2025
arXiv
6
citations
Boosting Virtual Agent Learning and Reasoning: A Step-Wise, Multi-Dimensional, and Generalist Reward Model with Benchmark
ICML 2025
arXiv
6
citations
Mastering Collaborative Multi-modal Data Selection: A Focus on Informativeness, Uniqueness, and Representativeness
ICCV 2025
arXiv
4
citations
Learning 4D Panoptic Scene Graph Generation from Rich 2D Visual Scene
CVPR 2025
arXiv
4
citations
What Limits Virtual Agent Application? OmniBench: A Scalable Multi-Dimensional Benchmark for Essential Virtual Agent Capabilities
ICML 2025
arXiv
4
citations
Are Binary Annotations Sufficient? Video Moment Retrieval via Hierarchical Uncertainty-Based Active Learning
CVPR 2023
0
citations
The Best of Both Worlds: Integrating Language Models and Diffusion Models for Video Generation
ICCV 2025
arXiv
0
citations
Learning Coupled Dictionaries from Unpaired Data for Image Super-Resolution
CVPR 2024
0
citations
DIEM: Decomposition-Integration Enhancing Multimodal Insights
CVPR 2024
0
citations