Luca Soldaini
Affiliations (1): Allen Institute for AI
9 papers, 841 total citations
papers (9)
Tulu 3: Pushing Frontiers in Open Language Model Post-Training
COLM 2025, arXiv, 494 citations

What's In My Big Data?
ICLR 2024, arXiv, 126 citations

Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Vision-Language Models
CVPR 2025, arXiv, 111 citations

Organize the Web: Constructing Domains Enhances Pre-Training Data Curation
ICML 2025, arXiv, 53 citations

Establishing Task Scaling Laws via Compute-Efficient Model Ladders
COLM 2025, arXiv, 22 citations

DataDecide: How to Predict Best Pretraining Data with Small Experiments
ICML 2025, arXiv, 18 citations

The Common Pile v0.1: An 8TB Dataset of Public Domain and Openly Licensed Text
NeurIPS 2025, arXiv, 11 citations

RouterRetriever: Routing over a Mixture of Expert Embedding Models
AAAI 2025, arXiv, 6 citations

Teaching Models to Understand (but not Generate) High-risk Data
COLM 2025, arXiv, 0 citations