
An important development - US regulators revisit copyright for AI

George Lawton, September 1, 2023
Summary:
Drilling down on the latest moves by the Copyright Office of the US Library of Congress.


Some of the biggest controversies with generative AI swirl around the nature of copyright, fair use, transparency, and authorship. As I previously reported, the training aspects of generative AI can distill the plots of stories, logic of code, and style of art in ways not considered in existing copyright regulations. Now, the Copyright Office of the US Library of Congress is shining a fresh light on these controversies with a Notice of Inquiry on Copyright and AI.

This is the next step for the Copyright Office’s AI initiative, first launched in early 2023. It builds on feedback and questions arising from four public listening sessions and two webinars. In a press release, Shira Perlmutter, register of copyrights and director of the US Copyright Office, writes:

“We launched this initiative at the beginning of the year to focus on the increasingly complex issues raised by generative AI. This notice of inquiry and the public comments we will receive represent a critical next step. We look forward to continuing to examine these issues of vital importance to the evolution of technology and the future of human creativity.”

The inquiry seeks to address the following issues:

  • Transparency: The appropriate levels of transparency and disclosure with respect to the use of copyrighted works.
  • Authorship: The legal status of AI-generated outputs.
  • Fair use: The use of copyrighted works to train AI models.
  • Likeness: The appropriate treatment of AI-generated outputs that mimic personal attributes of human artists.

It's important to note that the Copyright Office is not currently addressing issues related to fair contracts and terms of service, which sometimes get conflated with copyright.

The full Copyright Office Request for Comments (RFC) includes sixty-seven questions across various domains such as training, transparency & record keeping, generative AI outputs, copyrightability, infringement, labeling or identification, and likeness or publicity. Some of the issues, such as likeness and digital watermarks, currently fall outside the scope of federal copyright law but hint at the possibilities for future regulation on these matters.

Public controversies, lawsuits, and proposed regulations have already erupted over each of these issues. Here are some examples of how they are playing out today:

Transparency

As it stands in the US, generative AI companies are under no obligation to disclose what data their models were trained on. OpenAI and Meta have not disclosed the data used to train their most recent models, which has raised concerns about copyright, privacy, and potential bias.

EU regulators are already proposing to require this kind of disclosure. One big issue is that there is no easy way to remove sensitive information from a trained model, as GDPR's right-to-be-forgotten provisions require, without retraining it. Retraining is expensive because personally identifiable data is spread across billions of neural network weights rather than stored in a single database field that can simply be deleted.

On transparency, the RFC states:

The Office is aware that there is disagreement about whether or when the use of copyrighted works to develop datasets for training AI models (in both generative and non-generative systems) is infringing. This Notice seeks information about the collection and curation of AI datasets, how those datasets are used to train AI models, the sources of materials ingested into training, and whether permission by and/or compensation for copyright owners is or should be required when their works are included.

Fair use

Authors and content creators have filed a string of lawsuits against generative AI companies. The Joseph Saveri Law Firm filed a class action lawsuit against Stability AI, Midjourney, and DeviantArt for copyright infringement. Getty Images filed a separate lawsuit against Stability AI for hoovering up and repurposing its copyrighted images.

Sarah Silverman and other authors have sued OpenAI and Meta for training models on their work, and The New York Times has reportedly been considering legal action against OpenAI.

These cases differ in important ways. AI-generated book summaries may be treated as fair use, much like study guides such as CliffsNotes or York Notes. By contrast, generative AI art that competes directly with the works it was trained on may be considered unfair competition for artists and for platforms like Getty.

On this matter, the Copyright Office requests comments on the following question:

If an output is found to be substantially similar to a copyrighted work that was part of the training dataset, and the use does not qualify as fair, how should liability be apportioned between the user whose instructions prompted the output and developers of the system and dataset?

Authorship

As it stands today, copyright protection is granted only to human authors or creators, and the same goes for patents. In recent years, Stephen Thaler has been suing patent offices around the world to secure patents on inventions he attributes to his AI system, DABUS. All have turned him down except South Africa. He has pursued a similar effort to register copyright in AI-generated art, which has also been rejected.

Another question is what happens when AI helps humans create art, movies, books, or other content. The US Copyright Office initially granted a registration for a graphic novel by Kristina Kashtanova but later moved to cancel it after she posted on social media about using Midjourney to create the images. It then granted her a narrower registration covering the text and the selection and arrangement of the images, but not the images themselves, on the grounds that the Midjourney-generated images were not the product of human authorship.

Here, the RFC asks:

Although we believe the law is clear that copyright protection in the United States is limited to works of human authorship, questions remain about where and how to draw the line between human creation and AI-generated content. For example, are there circumstances where a human’s use of a generative AI system could involve sufficient control over the technology, such as through the selection of training materials and multiple iterations of instructions (“prompts”), to result in output that is human authored.

Likeness

Generative AI can imitate human voices and likenesses in images and video to create realistic deepfakes. In April, Universal Music Group cited copyright infringement after fans created a deepfake of artists Drake and The Weeknd singing “Heart on My Sleeve.” YouTube and TikTok quickly removed the song.

Author Jane Friedman says Amazon initially refused to remove AI-generated books falsely attributed to her because she had not trademarked her name, although it pulled them down after a public outcry. At the other end of the spectrum, the musician Grimes invited creators to use AI to clone her voice for their own songs in exchange for a cut of the profits.

At the moment, personal attributes such as voice, likeness, or style are not generally protected under federal copyright law, but other considerations come into play. The RFC notes:

Although these personal attributes are not generally protected by copyright law, their copying may implicate varying state rights of publicity and unfair competition law, as well as have relevance to various international treaty obligations.

Contract law

Other controversies that are orthogonal to copyright, but sometimes conflated with it, relate to fair contracts and terms of service (ToS). For example, the Joseph Saveri Law Firm also filed a class action lawsuit against GitHub, Microsoft, and OpenAI over GitHub Copilot, alleging violations of open-source license conditions and bringing contract and privacy claims.

The writers' and actors' strikes in the US are fueled in part by disputes over contract provisions on the use of AI in films and the repurposing of performers' likenesses or work. Zoom recently stirred up controversy when it added provisions to its ToS that appeared to allow it to train AI on customer content. It quickly backpedaled.

My take

The legal landscape around generative AI content is still a Wild West, which might look like an opportunity or like theft, depending on which side of the equation you are on. Some firms are rushing in to hoover up data at epic scale to train better models, and at the moment, they have few incentives to disclose what data they use or how.

The first crop of tools did a good job of demonstrating their potential to deliver interesting and useful content, and of showing how they might create value more broadly within the enterprise. But the new generative AI services also shine a light on the ways that large language models, GANs, and other techniques can appropriate and reuse the fruits of human creativity with little or no compensation or credit.

It's telling that the RFC does not mention the terms “contract” or “terms of service” even once. This seems shortsighted, since contracts and terms of service are likely to play a growing role in this debate. For example, Google recently updated its privacy policy to say it may use information people share publicly online to train its AI models. So what happens when content creators who get paid for site visits start losing potential visitors to Google Search Generative Experience summaries of their work, which users can read without clicking through?

This is a difficult problem to solve at many levels. Meta blocked news links for Canadian users after a new Canadian law required platforms to pay publishers, yet both Meta and Google chose to pay up in Australia. Meanwhile, Google is likely to proceed cautiously here so as not to risk its significant share of the digital advertising market, which PwC estimates could grow to $732.6 billion by 2026.
