Bounding the Excess Risk for Linear Models Trained on Marginal-Preserving, Differentially-Private, Synthetic Data

3 citations

arXiv:2402.04375 PDF

#1646 in ICML 2024 of 2635 papers

Top Authors

Yvonne Zhou, Mingyu Liang, Ivan Brugere, Danial Dervovic, Antigoni Polychroniadou, Min Wu, Dana Dachman-Soled

Topics

differentially-private synthetic data, excess risk bounds, linear models, marginal preservation, empirical risk, Lipschitz loss functions, privacy-preserving machine learning

Abstract

The growing use of machine learning (ML) has raised concerns that an ML model may reveal private information about an individual who has contributed to the training dataset. To prevent leakage of sensitive data, we consider using differentially-private (DP), synthetic training data instead of real training data to train an ML model. A key desirable property of synthetic data is its ability to preserve the low-order marginals of the original distribution. Our main contribution comprises novel upper and lower bounds on the excess empirical risk of linear models trained on such synthetic data, for continuous and Lipschitz loss functions. We perform extensive experimentation alongside our theoretical results.
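
For orientation, the quantity being bounded can be written out as follows. This is the standard textbook definition of excess empirical risk, not notation taken from the paper; the symbols $D$, $\tilde{D}$, $\theta$, $\Theta$, $\ell$, and $\hat{R}$ are assumptions introduced here for illustration.

```latex
% Assumed notation (not from the paper):
%   D = {(x_i, y_i)}_{i=1}^n is the real dataset, \tilde{D} is the DP synthetic dataset,
%   \ell is the loss, and \theta \in \Theta parameterizes the linear model.
\hat{R}_{D}(\theta) = \frac{1}{n} \sum_{i=1}^{n} \ell\big(\langle \theta, x_i \rangle, y_i\big),
\qquad
\tilde{\theta} \in \arg\min_{\theta \in \Theta} \hat{R}_{\tilde{D}}(\theta),
\qquad
\text{excess empirical risk} = \hat{R}_{D}(\tilde{\theta}) - \min_{\theta \in \Theta} \hat{R}_{D}(\theta).
```

Under this reading, the excess empirical risk is nonnegative and measures how much loss on the real data is given up by fitting the linear model to the synthetic data instead of the real data.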

Citation History

[Citation history chart: Jan 28, 2026 to Feb 13, 2026]