Generative AI - authors and artists declare war on AI vendors worldwide
Summary:
The fightback against ChatGPT and others starts here, as online rebellions, US class actions, and an alliance of nearly a million professionals take the fight to AI’s creators – companies that have been careless at best.
Sci-fi movie The Creator opens this week, depicting a war for the future between humans, robots, and AI. It’s a well-worn concept, one that dates back to the coining of the word ‘robot’ in Karel Čapek’s 1920 play, R.U.R.
But there are signs that the battle is beginning for real: in the US law courts; in strikes; in Europe, where nearly a million professionals are rebelling; in the UK; and worldwide on social networks.
Below are just some of the growing number of actions.
Talk about artificial intelligence, and copyright and intellectual property are never far from the conversation. That has certainly been the case this week on X, the platform formerly known as Twitter.
There, a rebellion is brewing among authors who have discovered that full copies of their copyrighted works are among the 183,000 texts in the Books3 database, which has been used to train generative AI systems.
Authors can search for their work in the database here: https://full-stack-search-prod.vercel.app.
Tens of thousands of pirated texts were among the data scraped from the internet by AI companies – a fact that creates a critical legal distinction between works in the public domain and works that were merely placed in a public space, the internet, by persons unknown.
Scraping copyrighted data from the internet for commercial purposes is illegal, though it can be permitted for academic research. But the fact that copyrighted data of every kind – including images – is being used to train commercial software, so that it can generate similar works, surely closes the ‘research’ loophole.
Authors speak out
On X, Irvine Welsh, author of Trainspotting, Glue, Marabou Stork Nightmares, and other epochal tales, wrote:
Idea for sci fi novel: writer involuntarily mentors robots of the future to train them in a language and culture that won’t exist in that future – to produce stuff the dwindling number of humans won’t read, and the infinite number of robots won’t need to.
Chocolat author Joanne Harris said:
This is only one website of several, using pirated books to train AI. 18 of my books are on this database, and have already been used to train AI, without my consent.
Sathnam Sanghera, author of The Boy with the Topknot, wrote:
Millions of hours of authors' work being exploited by big tech with zero payment. Funny how writers keep getting shafted. RAGE.
Crime writer Val McDermid was among the famous names on X urging others to take action, while young adult (YA) fiction author Celine Kiernan spoke for many creative professionals:
I wrote these books based on a life of experience, the death of my dad, the many wars that gnaw our world, the history of my country & family. They came directly from my heart. What are they to a machine but words in sequence?
An excellent point, to which she added:
What has the world let itself in for? In the creative fields, AI can never be anything but a warped photograph of a stranger.
Powerful words. And a view I shared in a report earlier this year, which described ‘prompt engineers’ punching the button on AI’s ‘global photocopier’.
As I noted in that and other pieces, it is almost as if AI companies see artists, authors, academics, designers, journalists, musicians, and filmmakers as the world’s most urgent problem – above climate change, pandemics, rogue asteroids, and nuclear catastrophe. If only we could deprive creatives of an income, the world might be saved!
Earlier this month, I reported that the UK’s Publishers Association is also carrying the fight – in this case, to the government, which has omitted copyright altogether from its discussions on AI regulation. An oversight that needs urgent remedy, as the government appears to favor asking vendors what to do.
Addressing a Westminster eForum on AI regulation, Caroline Cummins, the Association’s Head of Policy and Public Affairs, said:
The first [problem] is copyright infringement on a massive scale, when text and data that is subject to copyright is used in the training of AI without consent or compensation.
There's a lack of transparency around training data and AI, which makes it incredibly difficult for creators or rights holders to even see how their work has been used. And there are many examples of human creators – authors, musicians, artists – seeing their work undermined, or in some cases replicated, by AI models.
The US joins the war
That’s certainly the view of the Authors Guild, America’s oldest writers’ organization. Last week, it launched a class action against ChatGPT maker OpenAI. (This follows similar actions by visual artists against other vendors earlier this year.)
The suit alleges “flagrant and harmful infringements of plaintiffs’ registered copyrights”. More than that, it calls ChatGPT a “massive commercial enterprise” that is reliant upon “systematic theft on a mass scale”. Ouch.
George RR Martin, John Grisham, Jodi Picoult, David Baldacci, Sylvia Day, Jonathan Franzen, and Elin Hilderbrand are among the 17 high-profile authors putting their names to the lawsuit – several of them powerful names in Hollywood, too.
On that point, the US film and TV community – and by extension, filmmakers everywhere – have been impacted by AI too. Deep fakes and memes abound online, suggesting a future in which actors’ likenesses will become valuable commercial properties (not to mention targets of crime). But owned by whom?
The strike by the Writers Guild of America (WGA) – which ended this week – was fought partly over the potential use of AI in scriptwriting.
Writers were afraid that studios would use generative systems to write or rewrite stories, effectively replacing the real-world writers’ room with a database full of hidden, uncredited people. Like the authors who are suing OpenAI, in fact.
Under the terms of the forward-looking WGA agreement, AI cannot be used for those purposes, nor can AI-generated texts be considered source material.
This is broadly in line with a US judge’s ruling this month that AI-generated content cannot be copyrighted; only the changes made to it by a human creator can be. A ruling that will – and should – have far-reaching consequences.
The agreement also mandates full disclosure: writers are free to use AI if they wish, but studios cannot order them to use such a tool, or specify which product. Studios must also disclose any AI-generated elements given to a writer.
Again, this is broadly in line with international moves to force the disclosure of AI’s use in creative endeavors.
Meanwhile in Europe…
The European Union has also been leading the fight, via the Draft AI Act and other measures. However, an extraordinary alliance of organizations believes that legislators and regulators should go much further.
Those organizations are:
- CEATL (European Council of Literary Translators’ Associations)
- ECSA (European Composer and Songwriter Alliance)
- IFJ (International Federation of Journalists)
- EFJ (European Federation of Journalists)
- EGAIR (European Guild for Artificial Intelligence Regulation)
- EWC (European Writers’ Council)
- FERA (Federation of European Screen Directors)
- FIA (International Federation of Actors)
- FIM (International Federation of Musicians)
- FSE (Federation of Screenwriters in Europe)
- IAO (International Artist Organisation)
- UNI (an alliance of 140 unions in media, entertainment, and the arts)
- UVA (United Voice Artists)
Together, they represent close to a million creative professionals in Europe and elsewhere. Yesterday (26 September), those professionals sent an open letter to EU policymakers.
It said:
We all share a common concern as generative AI rapidly spreads in a legal environment which is poorly enforced and lacks adequate safeguards regarding the use of our members' works and personal data for AI training purposes.
Equally problematic are the numerous unauthorized, abusive, and deceptively transformative uses of our members' protected works and personal data by AI-powered technologies.
It continued:
We must reiterate our position and insist on the absolute need for a human-centric approach to regulating generative AI.
This approach should recognize, secure, and enforce the right of our members to control the use of their artistic creations during the machine-learning process.
To make sure it protects human artistry and creativity, it must be built upon principles of informed consent, transparency, fair remuneration, and contractual practices.
But isn’t AI itself a creative tool – one already used legitimately by artists of every kind?
The letter acknowledged the “extraordinary technological advancement with immense potential to enhance various aspects of our lives, including in our sectors”, but added:
However, it is crucial to recognize that alongside these benefits, there exists a darker aspect to this technology.
Generative AI is trained on large sets of data and huge amounts of protected contents scraped and copied from the internet. It is programmed to deliver outputs that closely mimic and have the ability to compete with human creation. This technology poses several risks to our creative communities.
So, what are those risks? The letter explained:
Firstly, the protected works, voices, and images of our members are often used without their knowledge, consent, and remuneration to generate content.
[But] there is also a broader societal risk, as people may be led to believe that the content they encounter – whether in text, audio, or visuals – is a genuine and truthful human creation, when it is the mere result of AI generation or manipulation.
This deception can have far-reaching implications for the spread of misinformation and the erosion of trust in the authenticity of digital content.
The financial context
This fightback was inevitable, and it really is a fight for the future.
That’s because copyright and IP underpin countless successful industries, from publishing, music, film, and other creative sectors, through to the proprietary interests of Big Pharma and Big Tech, whose vendors have been stockpiling patents for decades.
Many of those industries see AI as a potential existential threat to the core of their businesses. Indeed, the likes of OpenAI could, unchecked, become economic black holes, sucking in all forms of creative and knowledge-based activity. That prospect demands urgent guardrails and limitations.
There are also security implications. As I reported last month, the fact that most generative AI is deployed by individuals, not enterprises – employees using the likes of ChatGPT and Stable Diffusion as ‘shadow IT’, unsanctioned by managers – is creating another type of IP nightmare.
Source code is the most common form of privileged data being pasted into cloud tools by, in effect, rogue individuals – employees who don’t know any better. By doing so, they are divulging source code and other private or proprietary data to AI vendors, who may use it to train their systems. Can we trust those vendors to be ethical?
So, what are the economics behind this fight?
In the UK alone, government figures show that the creative industries contributed £109 billion ($132 billion) to the economy in 2021 – nearly six percent of total economic output, not far behind Financial Services’ eight percent contribution.
In the US, the figures are extraordinary. There, arts and culture contributed an estimated $1 trillion to the economy in 2021 – more than Construction ($945 billion) and Transportation ($688 billion).
However, the US IT sector brought in nearly $2 trillion – not far off the value of the entire British economy. But is that a good enough reason to sacrifice other industries to the IT sector?
To disembowel them on the altar of growth?
My take
diginomica believes that all generative AI systems should be regarded as derivative work generators, as they need source training data from which to generate new texts or images. If that data has been scraped illegally, or scraped unknowingly from pirate sources, then that can only be a breach of copyright law.
Indeed, saying “I used a derivative work engine to make this content!” doesn’t sound as cool as saying “I used an AI!” Yet it is the truth. So, my advice is, if you can’t say “I used a derivative work engine” with pride, then don’t use generative AI!
On the face of it, it seems unlikely that AI companies were unaware of the extent of what some authors are calling personal identity theft – the scraping of, in many cases, their lives, emotions, and experiences, as well as their imaginations.
The lawsuits were inevitable, then. But the big question is not so much who will win, but whether AI companies – in some cases trillion-dollar corporations, and in others backed by them – knew this would happen.
If they did then, surely, they were well aware they were breaking the law. Or perhaps just wanted to test its limits. Either way, ethics played no part in their strategy.
In related news this week, Reuters reports that a jury must decide the outcome of a lawsuit by Thomson Reuters, which accuses Ross Intelligence of – ironically – copying content from its legal research platform, Westlaw, to train a competing AI platform. As such, it will be one of the first cases of alleged data-scraping by an AI vendor to go to court. Meta and Stability AI are among other vendors battling separate allegations of copyright infringement.
Updated 5pm UK time September 27th, with related news from Reuters.