Siliang Tang
27 papers, 1,242 total citations
Papers (27)
NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models (ICML 2024; arXiv): 306 citations
AnyEdit: Mastering Unified High-Quality Image Editing for Any Idea (CVPR 2025; arXiv): 135 citations
HalluciDoctor: Mitigating Hallucinatory Toxicity in Visual Instruction Data (CVPR 2024; arXiv): 129 citations
Momentor: Advancing Video Large Language Model with Fine-Grained Temporal Reasoning (ICML 2024; arXiv): 104 citations
Fine-Grained Semantically Aligned Vision-Language Pre-Training (NeurIPS 2022; arXiv): 100 citations
Compositional Temporal Grounding With Structured Variational Cross-Graph Correspondence Learning (CVPR 2022; arXiv): 82 citations
Unsupervised Reinforcement Learning of Transferable Meta-Skills for Embodied Navigation (CVPR 2020; arXiv): 79 citations
HealthGPT: A Medical Large Vision-Language Model for Unifying Comprehension and Generation via Heterogeneous Knowledge Adaptation (ICML 2025; arXiv): 69 citations
Visually-Prompted Language Model for Fine-Grained Scene Graph Generation in an Open World (ICCV 2023; arXiv): 48 citations
Auto-Encoding Morph-Tokens for Multimodal LLM (ICML 2024; arXiv): 32 citations
Gradient-Regulated Meta-Prompt Learning for Generalizable Vision-Language Models (ICCV 2023; arXiv): 32 citations
Adaptive Hierarchical Graph Reasoning With Semantic Coherence for Video-and-Language Inference (ICCV 2021; arXiv): 28 citations
Learning in Imperfect Environment: Multi-Label Classification with Long-Tailed Distribution and Partial Labels (ICCV 2023; arXiv): 26 citations
Generative Multimodal Pretraining with Discrete Diffusion Timestep Tokens (CVPR 2025; arXiv): 20 citations
STEP: Enhancing Video-LLMs’ Compositional Reasoning by Spatio-Temporal Graph-guided Self-Training (CVPR 2025; arXiv): 15 citations
Benchmarking Multimodal CoT Reward Model Stepwise by Visual Program (ICCV 2025; arXiv): 10 citations
Janus-Pro-R1: Advancing Collaborative Visual Comprehension and Generation via Reinforcement Learning (NeurIPS 2025; arXiv): 7 citations
Boosting Virtual Agent Learning and Reasoning: A Step-Wise, Multi-Dimensional, and Generalist Reward Model with Benchmark (ICML 2025; arXiv): 6 citations
Iris: Breaking GUI Complexity with Adaptive Focus and Self-Refining (ICCV 2025; arXiv): 6 citations
What Limits Virtual Agent Application? OmniBench: A Scalable Multi-Dimensional Benchmark for Essential Virtual Agent Capabilities (ICML 2025; arXiv): 4 citations
Mastering Collaborative Multi-modal Data Selection: A Focus on Informativeness, Uniqueness, and Representativeness (ICCV 2025; arXiv): 4 citations
Data Shunt: Collaboration of Small and Large Models for Lower Costs and Better Performance (AAAI 2024): 0 citations
Learning To Learn by Jointly Optimizing Neural Architecture and Weights (CVPR 2022): 0 citations
Learning to Generate Visual Questions with Noisy Supervision (NeurIPS 2021): 0 citations
Semi-Supervised Active Learning for Semi-Supervised Models: Exploit Adversarial Examples With Graph-Based Virtual Labels (ICCV 2021): 0 citations
DIEM: Decomposition-Integration Enhancing Multimodal Insights (CVPR 2024): 0 citations
The Best of Both Worlds: Integrating Language Models and Diffusion Models for Video Generation (ICCV 2025; arXiv): 0 citations