"synthetic data" Papers

13 papers found

BiXSE: Improving Dense Retrieval via Probabilistic Graded Relevance Distillation

Christos Tsirigotis, Vaibhav Adlakha, Joao Monteiro et al.

COLM 2025 · arXiv:2508.06781 · 1 citation

CLIPPER: Compression enables long-context synthetic data generation

Chau Minh Pham, Yapei Chang, Mohit Iyyer

COLM 2025 · arXiv:2502.14854 · 2 citations

Constrained Posterior Sampling: Time Series Generation with Hard Constraints

Sai Shankar Narasimhan, Shubhankar Agarwal, Litu Rout et al.

NeurIPS 2025 · arXiv:2410.12652 · 2 citations

Coupling Generative Modeling and an Autoencoder with the Causal Bridge

Ruolin Meng, Ming-Yu Chung, Dhanajit Brahma et al.

NeurIPS 2025 · arXiv:2509.25599

D3: A Dataset for Training Code LMs to Act Diff-by-Diff

Ulyana Piterbarg, Kanishk Gandhi, Lerrel Pinto et al.

COLM 2025

Out-of-Distribution Detection using Synthetic Data Generation

Momin Abbas, Muneeza Azmat, Raya Horesh et al.

COLM 2025 · arXiv:2502.03323 · 5 citations

ReasonIR: Training Retrievers for Reasoning Tasks

Rulin Shao, Rui Qiao, Varsha Kishore et al.

COLM 2025 · arXiv:2504.20595 · 44 citations

Recycling the Web: A Method to Enhance Pre-training Data Quality and Quantity for Language Models

Thao Nguyen, Yang Li, Olga Golovneva et al.

COLM 2025 · arXiv:2506.04689 · 13 citations

Stretching Each Dollar: Diffusion Training from Scratch on a Micro-Budget

Vikash Sehwag, Xianghao Kong, Jingtao Li et al.

CVPR 2025 · arXiv:2407.15811 · 26 citations

Style over Substance: Distilled Language Models Reason Via Stylistic Replication

Philip Lippmann, Jie Yang

COLM 2025 · arXiv:2504.01738 · 2 citations

Synthetic-powered predictive inference

Meshi Bashari, Roy Maor Lotan, Yonghoon Lee et al.

NeurIPS 2025 · arXiv:2505.13432 · 4 citations

The Delta Learning Hypothesis: Preference Tuning on Weak Data can Yield Strong Gains

Scott Geng, Hamish Ivison, Chun-Liang Li et al.

COLM 2025 · arXiv:2507.06187 · 8 citations

A Tale of Tails: Model Collapse as a Change of Scaling Laws

Elvis Dohmatob, Yunzhen Feng, Pu Yang et al.

ICML 2024 · arXiv:2402.07043 · 110 citations