Poster "perplexity optimization" Papers
3 papers found
First Attentions Last: Better Exploiting First Attentions for Efficient Parallel Training
Gyudong Kim, Hyukju Na, Jin Kim et al.
NeurIPS 2025
DOGE: Domain Reweighting with Generalization Estimation
Simin Fan, Matteo Pagliardini, Martin Jaggi
ICML 2024, arXiv:2310.15393
69 citations
InstructRetro: Instruction Tuning post Retrieval-Augmented Pretraining
Boxin Wang, Wei Ping, Lawrence McAfee et al.
ICML 2024, arXiv:2310.07713
70 citations