Yuxuan Wang
Affiliations
Peking University
22
papers
669
total citations
papers (22)
SHIFT: A Synthetic Driving Dataset for Continuous Multi-Task Domain Adaptation
CVPR 2022
arXiv
220
citations
Efficient Neural Music Generation
NEURIPS 2023
arXiv
84
citations
video-SALMONN: Speech-Enhanced Audio-Visual Large Language Models
ICML 2024
arXiv
76
citations
MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix
NEURIPS 2025
arXiv
57
citations
Neural Dubber: Dubbing for Videos According to Scripts
NEURIPS 2021
arXiv
52
citations
Language Model Can Listen While Speaking
AAAI 2025
arXiv
51
citations
DiTAR: Diffusion Transformer Autoregressive Modeling for Speech Generation
ICML 2025
arXiv
41
citations
PolyVoice: Language Models for Speech to Speech Translation
ICLR 2024
arXiv
29
citations
TimeSiam: A Pre-Training Framework for Siamese Time-Series Modeling
ICML 2024
arXiv
21
citations
Nautilus: Locality-aware Autoencoder for Scalable Mesh Generation
ICCV 2025
arXiv
10
citations
OmniMMI: A Comprehensive Multi-modal Interaction Benchmark in Streaming Video Contexts
CVPR 2025
arXiv
10
citations
CVLUE: A New Benchmark Dataset for Chinese Vision-Language Understanding Evaluation
AAAI 2025
arXiv
7
citations
VideoLLaMB: Long Streaming Video Understanding with Recurrent Memory Bridges
ICCV 2025
arXiv
4
citations
Friends-MMC: A Dataset for Multi-modal Multi-party Conversation Understanding
AAAI 2025
arXiv
4
citations
Reasoning Mamba: Hypergraph-Guided Region Relation Calculating for Weakly Supervised Affordance Grounding
CVPR 2025
2
citations
FairHuman: Boosting Hand and Face Quality in Human Image Generation with Minimum Potential Delay Fairness in Diffusion Models
ICCV 2025
arXiv
1
citations
VGMamba: Attribute-to-Location Clue Reasoning for Quantity-Agnostic 3D Visual Grounding
ICCV 2025
0
citations
GEB+: A Benchmark for Generic Event Boundary Captioning, Grounding and Retrieval
ECCV 2022
0
citations
Empowering Convolutional Neural Nets with MetaSin Activation
NEURIPS 2023
0
citations
Sounding that Object: Interactive Object-Aware Image to Audio Generation
ICML 2025
arXiv
0
citations
Vision-Language Interactive Relation Mining for Open-Vocabulary Scene Graph Generation
ICCV 2025
0
citations
Parallel Beam Search Algorithms for Domain-Independent Dynamic Programming
AAAI 2024
0
citations