Poster "sequence length reduction" Papers
2 papers found
Beyond Next Token Prediction: Patch-Level Training for Large Language Models
Chenze Shao, Fandong Meng, Jie Zhou
ICLR 2025 · arXiv:2407.12665
5 citations
MrT5: Dynamic Token Merging for Efficient Byte-level Language Models
Julie Kallini, Shikhar Murty, Christopher Manning et al.
ICLR 2025 · arXiv:2410.20771
16 citations