Guilherme Penedo
Affiliations (1): Hugging Face
Papers: 2
Total citations: 62
Papers (2)
FineWeb2: One Pipeline to Scale Them All — Adapting Pre-Training Data Processing to Every Language
COLM 2025, arXiv, 51 citations
The Common Pile v0.1: An 8TB Dataset of Public Domain and Openly Licensed Text
NeurIPS 2025, arXiv, 11 citations