"synthetic data" Papers
13 papers found
BiXSE: Improving Dense Retrieval via Probabilistic Graded Relevance Distillation
Christos Tsirigotis, Vaibhav Adlakha, Joao Monteiro et al.
COLM 2025 · arXiv:2508.06781 · 1 citation
CLIPPER: Compression enables long-context synthetic data generation
Chau Minh Pham, Yapei Chang, Mohit Iyyer
COLM 2025 · arXiv:2502.14854 · 2 citations
Constrained Posterior Sampling: Time Series Generation with Hard Constraints
Sai Shankar Narasimhan, Shubhankar Agarwal, Litu Rout et al.
NeurIPS 2025 · arXiv:2410.12652 · 2 citations
Coupling Generative Modeling and an Autoencoder with the Causal Bridge
Ruolin Meng, Ming-Yu Chung, Dhanajit Brahma et al.
NeurIPS 2025 · arXiv:2509.25599
D3: A Dataset for Training Code LMs to Act Diff-by-Diff
Ulyana Piterbarg, Kanishk Gandhi, Lerrel Pinto et al.
COLM 2025
Out-of-Distribution Detection using Synthetic Data Generation
Momin Abbas, Muneeza Azmat, Raya Horesh et al.
COLM 2025 · arXiv:2502.03323 · 5 citations
ReasonIR: Training Retrievers for Reasoning Tasks
Rulin Shao, Rui Qiao, Varsha Kishore et al.
COLM 2025 · arXiv:2504.20595 · 44 citations
Recycling the Web: A Method to Enhance Pre-training Data Quality and Quantity for Language Models
Thao Nguyen, Yang Li, Olga Golovneva et al.
COLM 2025 · arXiv:2506.04689 · 13 citations
Stretching Each Dollar: Diffusion Training from Scratch on a Micro-Budget
Vikash Sehwag, Xianghao Kong, Jingtao Li et al.
CVPR 2025 · arXiv:2407.15811 · 26 citations
Style over Substance: Distilled Language Models Reason Via Stylistic Replication
Philip Lippmann, Jie Yang
COLM 2025 · arXiv:2504.01738 · 2 citations
Synthetic-powered predictive inference
Meshi Bashari, Roy Maor Lotan, Yonghoon Lee et al.
NeurIPS 2025 · arXiv:2505.13432 · 4 citations
The Delta Learning Hypothesis: Preference Tuning on Weak Data can Yield Strong Gains
Scott Geng, Hamish Ivison, Chun-Liang Li et al.
COLM 2025 · arXiv:2507.06187 · 8 citations
A Tale of Tails: Model Collapse as a Change of Scaling Laws
Elvis Dohmatob, Yunzhen Feng, Pu Yang et al.
ICML 2024 · arXiv:2402.07043 · 110 citations