"text-to-speech synthesis" Papers
9 papers found
Conference
DiTTo-TTS: Diffusion Transformers for Scalable Text-to-Speech without Domain-Specific Factors
Keon Lee, Dong Won Kim, Jaehyeon Kim et al.
ICLR 2025, arXiv:2406.11427
28 citations
ELLA-V: Stable Neural Codec Language Modeling with Alignment-Guided Sequence Reordering
Yakun Song, Zhuo Chen, Xiaofei Wang et al.
AAAI 2025, arXiv:2401.07333
66 citations
HALL-E: Hierarchical Neural Codec Language Model for Minute-Long Zero-Shot Text-to-Speech Synthesis
Yuto Nishimura, Takumi Hirose, Masanari Ohi et al.
ICLR 2025, arXiv:2410.04380
5 citations
Improved Sampling Algorithms for Lévy-Itô Diffusion Models
Vadim Popov, Assel Yermekova, Tasnima Sadekova et al.
ICLR 2025
MaskGCT: Zero-Shot Text-to-Speech with Masked Generative Codec Transformer
Yuancheng Wang, Haoyue Zhan, Liwei Liu et al.
ICLR 2025, arXiv:2409.00750
161 citations
MoonCast: High-Quality Zero-Shot Podcast Generation
Zeqian Ju, Dongchao Yang, Shen Kai et al.
NeurIPS 2025 (oral), arXiv:2503.14345
19 citations
T2V2: A Unified Non-Autoregressive Model for Speech Recognition and Synthesis via Multitask Learning
Nabarun Goswami, Hanqin Wang, Tatsuya Harada
ICLR 2025 (oral)
UniWav: Towards Unified Pre-training for Speech Representation Learning and Generation
Alexander Liu, Sang-gil Lee, Chao-Han Huck Yang et al.
ICLR 2025, arXiv:2503.00733
4 citations
UniCATS: A Unified Context-Aware Text-to-Speech Framework with Contextual VQ-Diffusion and Vocoding
Chenpeng Du, Yiwei Guo, Feiyu Shen et al.
AAAI 2024, arXiv:2306.07547
59 citations