
How AI impacts the publishing sector, and what needs to be done to protect that industry

By Chris Middleton, September 15, 2023
Publishing has spent over 30 years grappling with technology upheavals, and now AI is here – for both good and ill. What are the key issues?

An image of someone reading a newspaper with the headline ‘the world is changing’
(Image by Gerd Altmann from Pixabay)

Few industries have been as affected by new technology as publishing has over the past 35 years. And now AI is here to change things yet again. But is it a threat or an opportunity?

First, a quick rundown of the many technology changes that have impacted publishers of every kind in recent decades, so we can see where AI fits in. Context is everything!

At the turn of the 90s, desktop publishing convinced everyone they were a writer or designer – and perhaps they were, in that flatter, more democratic world. Today, AI gives people effortless creativity at the touch of a button – or rather, generated derivative works.

Then the Web presented a new world of opportunity and engagement, but also huge challenges for publishers in monetizing their own content. Soon, even some respected brands were chasing clicks for cash, in some cases losing sight of who or what their mastheads once represented. 

That said, the Web offers unparalleled opportunities for gathering data about who consumes our content, with all the personalization, on-demand services, and efficiencies that follow – particularly in areas such as academic, educational, and short-run print with hybrid digital elements. Today, AI may help.

In the midst of all this, search engines sought to put information at everyone’s fingertips, and in many ways succeeded. But this also created a tiny pinhole into a widescreen landscape of data, because everyone stopped scrolling past page one. On the back of this, Google parent Alphabet grew into a $1.7 trillion company that sits across the entire publishing sector. A gatekeeper of sorts – not to mention a syphoner of publishing’s revenues.

So, content providers then invested millions in SEO and soon employed more algorithm-gamers than writers and editors. As a result, more and more of that industry drifted towards machine-readable marcomms that made it onto Google page one, and away from content for intelligent humans. In many ways, AI now makes SEO irrelevant (thank God).

Social media and smartphones accelerated this process, until most people spent a few seconds at best on the average webpage (even on the world’s most popular sites today, it is less than a minute). At this point, information had become valued by the speed at which it moved – and its ability to communicate instantly – rather than its accuracy. Today, AI can summarize content, so you don’t even have to visit another webpage.

Thus, humans became a species that surfed on the crests of unseen fathoms of data, in search of the next meme. Reading a headline, but not the story; clicking Like without critical thinking; swiping past anything that didn’t entertain in an instant; consuming 10 seconds of a song or 20 seconds of a video, until most content became a meaningless grey goo in which only the most extreme, unusual, or provocative thing/person held readers’ attention.

None of that is great for publishing. And in such an environment, the most divisive or extreme individuals sometimes thrived, because there are dollars in clicks, eyeballs, loyalty, and follows – on social platforms that are so badly designed that swarms of bots and fake accounts can push any malign influencers deeper into their own echo chambers.

Thus, hate became a viable business model for an influential minority. Yet at the same time originals, eccentrics, and other personalities could thrive just as quickly. Today, study after study has shown that AI may provide the impetus for a flood of misinformation and fake content.

But what about publishing that is not about celebrity or notoriety, but about advancing verified human data?

Fast forward to today, and people routinely claim that all human knowledge is now online, as if the Web is like some vast, networked Library of Alexandria, some repository of all wisdom. Myth tells us that the ancient library burned, but in fact it fell into disuse over centuries as intellectuals were purged from the city. 

But in the digital world, we are certainly on a burning platform, as our digital pasts are rapidly erased by formats, websites, communities, and servers vanishing or falling into disuse. Information disappears from the Web all the time, and increasingly it is being topped up by misinformation and garbage – often with the express intention of decoupling us from trust and certainty. About… pretty much everything.

Where are we now? 

In 2023, AI is here to tell us that the very concept of intellectual property is also now disintegrating. It is breaking apart into that endless stream of bits unless we protect the interests of the millions of humans (and their publishers) who earn a living from their own ideas, work, expertise, skill, and/or talent. 

Of course, anyone can choose to gift their ideas to the world and thus increase the sum of human knowledge and happiness. They always have, and always will, via Creative Commons, Open Source, and public-domain content; a wonderful thing. But what about the millions who need to earn a living from their own work, to assert their right to say, “I made this”? 

Put another way, why should OpenAI, Stability AI, and the rest, be allowed to scrape copyrighted work with impunity in a quest to become the world’s self-assigned filters, knowledge explainers and, above all, content generators – often from the source data of millions of uncredited humans?

Caroline Cummins is Head of Policy and Public Affairs for the Publishers Association, the influential UK body that represents over 180 companies – the kind that produce digital and print books, research journals, and educational resources. In general, trusted data, therefore.

Addressing a Westminster eForum on AI regulation and market development, she opened in an upbeat tone, saying:

Publishing is an innovative and growing industry, worth £7 billion [$8.7 billion] and driving more than £4 billion [$4.99 billion] in exports. It also underpins many other parts of the UK’s world-class creative industries, including global hits in TV, film, and theatre. 

The conversation on AI is obviously very important right now, but it's been part of publishers’ work for years. Publishers have always innovated with technology to enhance human creativity. And they have now embraced AI in their work to benefit their readers, authors, and businesses. Whether that's driving efficiencies in operations, enhancing marketing, or freeing up researchers by summarizing vast areas of scholarship.

She singled out academic publisher Elsevier, whose generative AI product, Scopus AI, draws on peer-reviewed content from over 27,000 journals. She explained:

This enables researchers to engage with research in a conversational manner and, crucially, because it's based on highly curated, verified content, it is much less likely to produce hallucinations, biases, and inaccuracies [or as Marc Benioff called them this week, “lies”].

So, in that context, we're really excited about the Prime Minister's commitment to make the UK a world leader in safe and responsible AI.

But then Cummins’ presentation took a darker and more combative turn. She said:

It is vital that government upholds two things [in its plans for safe and responsible AI]. First, that the humans who create that incredible art, literature, music, and knowledge for publishers are authors, and the lifeblood of our industry. And second, our gold-standard intellectual property regime, which gives creators the ability to create, and rights holders the incentive to invest. 

Creativity and knowledge are incredible goods in their own right, and can underpin the development of high-quality, safe, and secure AI. But in that context, it's vital that the government's AI strategy addresses some really pressing issues.

So, what might they be? Cummins did not hold back:

The first is copyright infringement on a massive scale, when text and data that is subject to copyright is used in the training of AI without consent or compensation. 

There's a lack of transparency around training data and AI, which makes it incredibly difficult for creators or rights holders to even see how their work has been used. And there are many examples of human creators – authors, musicians, artists – seeing their work undermined, or in some cases replicated, by AI models.

This is of deep concern to publishers and to the wider creative industries.

Yet despite these issues, intellectual property and AI’s impact on creators were not even considered as part of the government's white paper on AI regulation, she said. So, what’s the solution? She added: 

To unleash safe and reliable AI, we are calling on the government to ensure that all AI developers adhere to UK law and international laws, including on IP. That those laws remain in compliance with the UK’s obligations in international treaties. And that regulators are given responsibility in the regulatory framework for compliance.

Where to draw the line 

Cummins then went further, firing a warning shot across the bows of companies such as OpenAI (without mentioning them by name):

We want to see AI companies being fully transparent about the content that they're ingesting, to seek permission before using content that is subject to copyright, and to pay for licences – including retrospectively [ouch!].

We want to see an explicit recognition of intellectual property and the rights of creators within the government's response to the white paper consultation.

Wise words and a clear statement of intent. However, they raise important legal questions. First, is ingesting copyrighted material for training purposes already a restricted act under the Copyright, Designs and Patents Act 1988?

And second, is human authorship a prerequisite for copyright protection – something that the US has insisted must be the case? (The US says you can’t copyright something an AI has generated, only whatever changes a human has made to the generated work.)

Cerys Wyn Davies is Partner at law firm Pinsent Masons. She said:

In terms of copyright infringement, first of all there is protection of the data itself, as to whether the data ingested is entitled to copyright protection as an original work in the first place. That's our first port of call to work out. 

Assuming it is, we then have what is described as a text and data-mining exception, which means that people can use text and data that is taken from the internet, for example, provided it is for non-commercial purposes. So, for a long time, researchers have used material that's available online, and that's been permissible. 

But we then got to a position where the Intellectual Property Office put out to consultation a lot of IP issues. One was to broaden the text and data mining exception to cover commercial use, which would have effectively given a green light to make taking text and data from the internet something that was largely permissible.

One of the challenges here is establishing who put copyrighted content onto the open internet and whether they had permission to do so. It stands to reason that giving the green light to commercial exploitation of legally obtained copyrighted data from the internet would make it harder, and perhaps impossible, to pursue illegal sharers.

The Publishers Association’s Cummins responded:

While the government did consider a possible exception for text and data mining for commercial purposes, that was not brought in. So, at the moment, you are only not infringing copyright if you are doing it for non-commercial research purposes.

However, a lot of recent analyses show that datasets which are understood to have trained Large Language Models do contain copyrighted data, including books – or huge rafts of books that are our members’ copyright, but have still been used in training LLMs. But there is no copyright exception for that.

My take

The sense that we are on the cusp of legal changes or challenges is hard to avoid. 

Might the government throw copyright to the wind in an effort to seize the perceived economic growth advantages of AI? And if so, what might that do to the UK’s creative industries that, emphatically, already bring in billions in revenue and employ an estimated seven percent of the workforce?

The big question must be: to whom is the government listening? To those creative industries, or to the US Big Techs that are, as we explored in my previous report, members of influential think tanks giving keynotes at policy events? 

And the subtext must surely be: why was copyright not even considered in the government’s AI consultation?
