Blameless Users in a Clean Room: Defining Copyright Protection for Generative Models
Abstract
Are there any conditions under which a generative model's outputs are guaranteed not to infringe the copyrights of its training data? This is the question of "provable copyright protection" first posed by Vyas, Kakade, and Barak [ICML 2023]. They define near access-freeness (NAF) and propose it as sufficient for protection. This paper revisits the question and establishes new foundations for provable copyright protection---foundations that are firmer both technically and legally. First, we show that NAF alone does not prevent infringement. In fact, NAF models can enable verbatim copying, a blatant failure of copy protection that we dub being tainted. Then, we introduce our blameless copy protection framework for defining meaningful guarantees, and instantiate it with clean-room copy protection. Clean-room copy protection allows a user to control their risk of copying by behaving in a way that is unlikely to copy in a counterfactual "clean-room setting." Finally, we formalize a common intuition about differential privacy and copyright by proving that DP implies clean-room copy protection when the dataset is golden, a copyright deduplication requirement.
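For reference, the NAF condition of Vyas, Kakade, and Barak can be stated roughly as follows; this is a paraphrase rather than a verbatim restatement, and the notation (the safe model $\mathrm{safe}_C$, the divergence $\Delta$, and the bound $k_x$) is assumed from their paper.

% Sketch of the NAF condition, paraphrasing Vyas, Kakade, and Barak [ICML 2023].
% Here $p$ is the generative model, $C$ ranges over copyrighted training points,
% $\mathrm{safe}_C$ is a model trained without access to $C$, and $\Delta$ is a
% divergence between output distributions (e.g., max-divergence or KL).
\[
  p \text{ is } k_x\text{-NAF}
  \;\iff\;
  \forall C \in \mathcal{C},\ \forall \text{ prompts } x:\quad
  \Delta\bigl( p(\cdot \mid x) \,\big\|\, \mathrm{safe}_C(\cdot \mid x) \bigr) \le k_x .
\]

Intuitively, the model's output distribution on any prompt must stay close to that of a counterfactual model that never accessed the copyrighted work; the paper's tainting counterexample shows that this closeness alone does not rule out verbatim copying.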