Poster "data curation" Papers
6 papers found
Analyzing Similarity Metrics for Data Selection for Language Model Pretraining
Dylan Sam, Ayan Chakrabarti, Afshin Rostamizadeh et al.
NeurIPS 2025, arXiv:2502.02494, 1 citation
Data Mixture Optimization: A Multi-fidelity Multi-scale Bayesian Framework
Thomson Yen, Andrew Siah, Haozhe Chen et al.
NeurIPS 2025, arXiv:2503.21023, 2 citations
Understanding the Gain from Data Filtering in Multimodal Contrastive Learning
Divyansh Pareek, Sewoong Oh, Simon Du
NeurIPS 2025, arXiv:2512.14230
Curated LLM: Synergy of LLMs and Data Curation for tabular augmentation in low-data regimes
Nabeel Seedat, Nicolas Huynh, Boris van Breugel et al.
ICML 2024, arXiv:2312.12112, 51 citations
Data Filtering Networks
Alex Fang, Albin Madappally Jose, Amit Jain et al.
ICLR 2024, arXiv:2309.17425, 222 citations
Scaling Laws for Data Filtering -- Data Curation cannot be Compute Agnostic
Sachin Goyal, Pratyush Maini, Zachary Lipton et al.
CVPR 2024, arXiv:2404.07177, 68 citations