Poster "decoder-only architectures" Papers
3 papers found
Making Text Embedders Few-Shot Learners
Chaofan Li, Minghao Qin, Shitao Xiao et al.
ICLR 2025, arXiv:2409.15700
89 citations
Nesterov Method for Asynchronous Pipeline Parallel Optimization
Thalaiyasingam Ajanthan, Sameera Ramasinghe, Yan Zuo et al.
ICML 2025, arXiv:2505.01099
2 citations
PolyPythias: Stability and Outliers across Fifty Language Model Pre-Training Runs
Oskar van der Wal, Pietro Lesci, Max Müller-Eberstein et al.
ICLR 2025, arXiv:2503.09543
16 citations