Papers matching "synthetic data"
8 papers found
BiXSE: Improving Dense Retrieval via Probabilistic Graded Relevance Distillation
Christos Tsirigotis, Vaibhav Adlakha, Joao Monteiro et al.
COLM 2025, arXiv:2508.06781, 1 citation
CLIPPER: Compression enables long-context synthetic data generation
Chau Minh Pham, Yapei Chang, Mohit Iyyer
COLM 2025, arXiv:2502.14854, 2 citations
D3: A Dataset for Training Code LMs to Act Diff-by-Diff
Ulyana Piterbarg, Kanishk Gandhi, Lerrel Pinto et al.
COLM 2025
Out-of-Distribution Detection using Synthetic Data Generation
Momin Abbas, Muneeza Azmat, Raya Horesh et al.
COLM 2025, arXiv:2502.03323, 5 citations
ReasonIR: Training Retrievers for Reasoning Tasks
Rulin Shao, Rui Qiao, Varsha Kishore et al.
COLM 2025, arXiv:2504.20595, 44 citations
Recycling the Web: A Method to Enhance Pre-training Data Quality and Quantity for Language Models
Thao Nguyen, Yang Li, Olga Golovneva et al.
COLM 2025, arXiv:2506.04689, 13 citations
Style over Substance: Distilled Language Models Reason Via Stylistic Replication
Philip Lippmann, Jie Yang
COLM 2025, arXiv:2504.01738, 2 citations
The Delta Learning Hypothesis: Preference Tuning on Weak Data can Yield Strong Gains
Scott Geng, Hamish Ivison, Chun-Liang Li et al.
COLM 2025, arXiv:2507.06187, 8 citations