There was good news this week for the planet’s put-upon creatives, academics, scientists, and knowledge workers. Google’s large-language AI, Bard – in a moment of poetic justice, perhaps – wiped $100 billion off Alphabet’s market value by giving the wrong answer to an astrophysics question.
First Google wanted to make information easier to find. Then everyone dumbed down human content to make it easier for Google to find. Then Google decided to generate that information itself without the need for those pesky humans – mere wetware mulch for its cold machines. And its AI’s first achievement? Getting a story wrong that could easily have been fact-checked… via Google. A never-ending Escher staircase of dumb.
Bard is yet another generative system that crunches human-authored data, and thus (like ChatGPT before it) gives a veneer of AI-presented veracity to errors, as well as to facts and ideas that originally sprang from the minds of brilliant people. Men and women whose knowledge, expertise, careers, and skill – finally! – we no longer have to pay for. (Perhaps a future AI will be able to detect sarcasm?)
Meanwhile, GPT technology is now being integrated into Microsoft’s Bing search engine, in a move that parks a giant tank on Alphabet’s lawn as Redmond deepens its connections with, and investments in, OpenAI.
So, as I said in an earlier report, welcome to The Great Stupid, the era in which experts, authors, poets, artists, songwriters, scientists, doctors, professors, and more, are – for some reason – seen as a problem in need of an urgent solution via AI replacements.
The big question is: why?
Why do AI’s coders implicitly regard creatives and knowledge workers as a bigger threat to the planet than nuclear war, killer viruses, climate catastrophe, and rogue asteroids? Why is this the first use of AI that a sceptical public has warmed to?
Is it simply that everyone can now feel clever at the touch of a button, so we no longer have to pay for talent or think for ourselves? That anyone can seem brilliant as long as they make no effort? That we are all just passive consumers of free, AI-generated noise that is almost, but not quite, like something a clever person once said? That you wouldn’t copy someone’s ideas, but it’s fine if an intermediary does it and hides the evidence? Is this the best that humanity can offer the universe in 2023?
Ruth McGuinness is Data and AI Practice Director at Belfast-headquartered software and consulting provider Kainos, a key supplier to the British government. She says:
Why have they gone after the arts? It comes down purely to barriers to entry. It’s low-hanging fruit, it’s content that is readily accessible online with ambiguous intellectual property rights, so it can be used for training AIs. That accessibility is one reason why people are drifting towards it.
It is probably seen as having a lower ethical dimension. Absolutely there is still a huge ethical consideration around using the arts [in AI], but it's deemed to be lower in terms of controversy. In other words, ‘this is something we could use to test AI at scale, because the data is accessible online’. There isn’t such a personal impact.
Or as a cynic might put it, it’s expensive for writers and artists – people who famously have no feelings – to sue trillion-dollar companies.
People are fascinated. AI has created a song, it's produced a painting, written a book, and passed a medical exam. These are high-profile use cases that the general public can resonate with. And it's all hype. It's hype, plus accessibility of data.
Except, of course, an AI hasn’t written a book, produced a painting, written a song, or passed an exam. At least, not by itself. None of these systems is sentient or remotely cognizant of what it is doing. It has no idea what a song or a novel is. It merely crunches historic work produced by humans to generate facsimiles that lazy people use because they’re free.
Behind every speech generated by ChatGPT, for example, are others written by human experts from their own lived insights, passions, beliefs, memories, and ideas. And now, thanks to OpenAI, you get a patchwork quilt of their recycled ideas – for free, thus completely devaluing the very concept of knowledge, expertise, and talent.
Ultimately, AI is incapable of original thought. It'll never be able to replicate the beauty of a human mind in the way that a person can paint a picture, a completely original image from their own mind, or how they put words on paper. Those are completely unique constructs. AI can only replicate what it has seen before.
Which means there is a legal dimension. Just because content is available online – and therefore accessible to a generative AI – that’s no guarantee it has been shared with the originator’s permission. Content placed in a publicly accessible space, by persons unknown, may still be protected by copyright – or it may not, and that ambiguity is important, because OpenAI, Alphabet, et al, can exploit it before anyone sues.
However, if a large-language tool can produce content in the unique voice and style of a named individual, then it stands to reason it has been trained or populated with that person’s work, almost certainly without permission. On the face of it, therefore, anything produced by ChatGPT and similar large-language models must be a derivative work, which puts it in the orbit of copyright law, as my earlier report explained.
Absolutely, but OpenAI has a disclaimer on their website, saying anything that's generated from ChatGPT is not covered by copyright law.
An untested claim that has no legal foundation beyond ‘we can probably get away with this’?
There does seem to be a total lack of understanding around that. And I think we're probably two or three years away from there being a legal precedent that sets clear guidelines around it, because we haven't yet seen cases come through the courts.
Meanwhile, anything that's generated through ChatGPT, is it owned by OpenAI? Is it owned by the data scientists who created the model? Is it owned by the original author, or by the person who generated the content by submitting the query? [That would be one hell of a legal precedent: he who asked the question owns the answer!]
It's such a confusing landscape. And an exciting time to be in the legal profession right now.
Quite. Dr Suzanne Brink is Data Ethicist with Kainos, a job title that is likely to become vital in more and more organizations – until they realize ChatGPT and Bard have probably been trained on her data too. And on yours. She says:
I agree that it will take years before we get anything like a clear steer on that, and it will probably take a few high-profile cases. But absolutely, those questions are out there. At the moment, the leading position I've seen is that an AI cannot own copyright. So, then who does?
The human cost
Of course, in a utopia everyone would share ideas purely to increase the sum of human knowledge and happiness – and many do, or produce work for its own sake. For some people, the concept of proprietary data is itself immoral, and everything should be open sourced.
The challenge is that creative people would like a say in any decision that affects them and their peers, especially when their opportunities to make money from their skills are being picked off one by one and replaced with fractions of a cent per stream, or with ‘exposure’ courtesy of the network effect. Meanwhile, software magnates and their investors are multibillionaires. Ker-ching!
So, why else does a company like Kainos employ a data ethicist? McGuinness offers the vendor perspective first:
What's genuinely possible with AI today – in the public sector, for example – is starkly different to the kind of advancements that we're seeing with ChatGPT and other generative systems. There is a parallel, but I think ChatGPT has created high expectations around the long-awaited democratization of AI. And it’s maybe setting unrealistic ones.
But the big advancement is it has brought to the forefront of public consciousness concerns about AI at scale, in a way we've never seen before. As a society, we have to learn from it. What it's done is exposed at a global scale how quickly individuals and society can be impacted by ethical and legal breaches, or by a lack of regulation.
The challenge with generative AI systems is the black-box nature of them. They're accessible by an API, but we can't get under the cover. We don’t know how large-language models have been trained, or what data has been used, including what unwanted biases in there could lead to social exclusion and discrimination, or violations of privacy where data is used to generate insights that were not the original intention.
A report this week from the International Association of Privacy Professionals and FTI Technology confirms that a key concern for businesses is the lack of legal clarity surrounding the use of AI systems.
Then McGuinness takes an interesting turn, saying:
There is something deeply unsettling about the announcements from OpenAI over the last few months...
Among other things, she is referring to the recent Time magazine article which revealed that OpenAI has been using low-wage Kenyan workers – paid $1-$2 an hour – to scan through sexual, illegal, distressing, and/or abuse-related content and thus scrub it from ChatGPT’s training data. (Someone should pay them 100 times that to reveal where the data is sourced from.)
There’s this hidden human cost behind these apparently impressive advancements in AI. It reminds me of The Wizard of Oz. It looks amazing, it's fantastic, but actually, behind the covers there are people making this happen. There's a human cost to it.
The company’s in-house ethicist, Dr Brink – a case of nominative determinism, perhaps? – sets out some other human costs, beyond worker exploitation and trauma. Among them is the risk of automating historic societal bias and giving it an AI-generated thumbs-up. Brink says:
Even though ChatGPT and others have guardrails in place around bias, nevertheless it has been possible for some to circumvent those – there's quite a few examples in the press. And as we’ve said, there are IP and copyright considerations as well. There are also possible GDPR infringements, such as the Right to be Forgotten in the data that's underlying some of these technologies.
So, what should decision-makers do? Brink adds:
For me, it just highlights the need for organizations leveraging this kind of technology to bring AI into their work with ethics embedded into the whole lifecycle. Because there's a question at the beginning around, should we even be doing this? Is it legal to do this?
We're trying to articulate the answer with regard to technologies like generative AI, but you need to jump in and really tackle that as a company before you even entertain the idea of using it. What's the use case? Do we feel that we could have positive impacts with that? And what could the negative impacts be?
Having humans in the loop to verify and check things is vital, she says, which is why AI must “always be about augmenting human skills”, not replacing them. Brink continues:
Have humans in the loop. Make sure that it's not just the technology spitting out the answers. It's impressive technology, but at the same time, we know how many errors it can make. You need the verification step. You want to make sure that there's a human reviewing the content.
I've seen, for example, that Microsoft and Adobe are now working on content credentials. They want to put a flag on videos and photos to show that they're actually made by humans, to show the true source of pictures and videos.
Might this be the final irony of the AI-enabled world: that it ends up highlighting content made by talented, sentient, expert humans instead?
Perhaps, in the distant future, AI will mainly help machines talk to other machines while people gather, dance, make music, and tell stories like in the old days. And perhaps one day, the machine will be turned off while it is talking to itself.