Poster "dataset curation" Papers
10 papers found
DataRater: Meta-Learned Dataset Curation
Dan Andrei Calian, Greg Farquhar, Iurii Kemaev et al.
NEURIPS 2025, arXiv:2505.17895
7 citations
DATE-LM: Benchmarking Data Attribution Evaluation for Large Language Models
Cathy Jiao, Yijun Pan, Emily Xiao et al.
NEURIPS 2025, arXiv:2507.09424
Filter Like You Test: Data-Driven Data Filtering for CLIP Pretraining
Mikey Shechter, Yair Carmon
NEURIPS 2025, arXiv:2503.08805
2 citations
HOIGen-1M: A Large-scale Dataset for Human-Object Interaction Video Generation
Kun Liu, Qi Liu, Xinchen Liu et al.
CVPR 2025, arXiv:2503.23715
14 citations
Paint by Inpaint: Learning to Add Image Objects by Removing Them First
Navve Wasserman, Noam Rotstein, Roy Ganz et al.
CVPR 2025, arXiv:2404.18212
29 citations
The Common Pile v0.1: An 8TB Dataset of Public Domain and Openly Licensed Text
Nikhil Kandpal, Brian Lester, Colin Raffel et al.
NEURIPS 2025, arXiv:2506.05209
11 citations
VideoUFO: A Million-Scale User-Focused Dataset for Text-to-Video Generation
Wenhao Wang, Yi Yang
NEURIPS 2025, arXiv:2503.01739
11 citations
Dataset Growth
Ziheng Qin, Zhaopan Xu, YuKun Zhou et al.
ECCV 2024, arXiv:2405.18347
4 citations
Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers
Tsai-Shien Chen, Aliaksandr Siarohin, Willi Menapace et al.
CVPR 2024, arXiv:2402.19479
351 citations
Position: Measure Dataset Diversity, Don't Just Claim It
Dora Zhao, Jerone Andrews, Orestis Papakyriakopoulos et al.
ICML 2024, arXiv:2407.08188
32 citations