AI - open wide and say ‘ah!’ as the UK House of Lords debates open source

Chris Middleton – November 13, 2023
Summary:
Should LLMs and generative AI be more open and transparent? The mood music says yes. But what does ‘open’ really mean?



The Communications and Digital Committee of the UK Parliament's House of Lords has been doing sterling work on AI this month, specifically on key aspects of the Large Language Models (LLMs) and generative tools that have seized the tech initiative. 

(As the British Parliament’s second, upper chamber, its role could best be described as ‘checks, balances, and high-level debate’, in contrast to the churn of policies and personalities that characterizes the Commons.)

Last week, the Select Committee hosted a revealing discussion on international copyright and LLMs. This week it will be quizzing Microsoft and Meta on regulating the sector. But on 8 November, it turned its attention to grilling witnesses on a subtler, but no less divisive topic - open versus closed source models. 

Expert witnesses were: Irene Solaiman, Head of Global Policy at open-source ML community Hugging Face; Professor John McDermid, Professor of Software Engineering at the University of York; Dr Adriano Koshiyama, Co-CEO of AI governance and compliance firm Holistic AI; and Dr Moez Draief, Managing Director of Mozilla.ai, a start-up from the Foundation that aims to build trustworthy systems.

So, beyond fact-finding, balance, and clarity, why did the Lords feel the need to host this discussion? Chair Baroness Stowell explained:

There are concerns from open-source proponents that moves to introduce safety and testing requirements might, inadvertently or otherwise, introduce barriers to new market entrants.

In short, perhaps the UK Government’s AI Safety Summit, with its glad-handing of big vendors and personalities, might have unintended consequences. 

Unsurprisingly, Mozilla’s Dr Draief set out an evangelical view of the open-source movement and its noble place in innovation – browsers, cloud (Linux), 5G, and more, saying:

It's clearly important not to see this as ‘either or’. [However], open source enabled us to arrive at the innovation that we see, Large Language Models, through open data, open science, and open libraries. Without those, we would not even have LLMs. 

So, open source is not the problem in the AI space, it's part of the solution. But it's important to consider it as a ‘gradient of openness’ that will enable different industries and individuals to choose whether they want proprietary or open-source models to power their solutions.

Leaving aside the vexed question of whether all LLMs were, indeed, developed entirely on open data and open libraries, Baroness Fraser suggested that regulators might prefer closed-source developers. The reason? Guard rails would already be up, she claimed (a leap of faith if ever there was one).

Draief responded:

You’re alluding to safety. Open source provides an opportunity for many people to examine those technologies, test them in a variety of settings, and provide fixes to the problems that arise. So, it’s very useful to have open source as a means of creating transparency around the technology. If we were to rely on a few engineers in certain parts of the world to define safety, I think it would be limiting in terms of opportunities to uncover problems.

Who could he mean? Then he added:

Security through secrecy and obscurity is not the way forward when it comes to understanding this technology. I would urge regulators to think carefully about the impact on opportunities for transparency, competition, and access.

There is the data that the model is trained on. But it's also the process of training the model, the model itself, its weights, the evaluation processes that are used to decide whether something has good performance. So, the sliding scale has many dimensions to it. It can go from ‘everything open’ to ‘everything closed’.

But...

But it is not quite that simple, suggested Hugging Face’s Solaiman:

[The problem is] there is no explicit definition of open source regarding Large Language Models. There are a lot of parallels that we can draw from open-source software. But when we get into the components, such as training data and technical papers that share how the model was trained, there's a level of access that researchers need to be able to improve their systems - to be able to look through the data and the code, depending on the risk and use case. 

But what's really important in openness is disclosure. We've been working hard at Hugging Face on levels of transparency, such as model cards, data sheets, or documentation to allow researchers, consumers, and regulators – in a very consumable fashion – to understand the different components that are being released within the system.

Then she added:

One of the difficult parts about releases is that the processes are not often published. So deployers have almost full control over the release method along that gradient of options, but we don't have insight into the pre-deployment considerations. […] There is no set standard on how to conduct that process.
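The ‘model cards’ and ‘data sheets’ Solaiman mentions are structured disclosure documents published alongside a model. As a rough, illustrative sketch only (the field names below are invented for this example and do not reproduce Hugging Face’s actual templates), the kind of information they surface might look like this:

```python
# Illustrative only: a sketch of the sort of disclosures a model card or
# data sheet might carry. Field names and values are hypothetical.
model_card = {
    "model_name": "example-llm-7b",  # hypothetical model
    "intended_use": "English-language summarisation; not for medical or legal advice",
    "training_data": {
        "sources": ["filtered web crawl", "licensed news corpus"],
        "languages": ["en"],
        "known_gaps": "under-represents non-UK dialects",
    },
    "evaluation": {
        "benchmarks": {"summarisation_rouge_l": 0.41},
        "bias_probes": "gender/occupation association tests run pre-release",
    },
    "limitations": "may generate unsupported citations",
    "licence": "research-only",
}

# A researcher, regulator, or downstream deployer can read these fields
# without needing access to the weights themselves.
for section, detail in model_card.items():
    print(f"{section}: {detail}")
```

The point of such documentation is exactly the one Solaiman makes: disclosure can travel along the release gradient even when the underlying model does not.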

At this point, Professor McDermid intervened, saying:

Policymakers and regulators will have no choice but to deal with both open-source and closed models. If you look at something like the Law Commission’s work on autonomous vehicles, they talk about the ‘requirement to collaborate’. 

And you’ll need that, whether it’s closed models or open ones, to deal with problems and rectify them. I think that’s probably the right way to think about it: while you have more access in open source, you need this requirement to collaborate [with either model] to deal with problems.

Did the risk of regulatory capture worry him? (The risk of regulators becoming more concerned with the sector’s interests than with the public’s.) McDermid responded:

Regulation is going to be very difficult. It's going to be a real challenge, both in terms of resources and skills. One of the things that government needs to do is help the regulators to build that skill space. It can be done, but it’s a really big role for them and they need help in doing it.

Whether government has those skills – beyond those of its new, proprietary best friends – is a moot point. 

At this juncture, Baroness Harding asked a question about vendors’ motivations – one that might raise eyebrows among veteran political observers:

How worried should we be that the arguments in favour of closed models are entirely about commercial self-interest from the owners of existing LLM technology? And how worried that the arguments in favour of open source are also entirely self-serving? Is that what's really happening rather than a policy debate?

Governance expert Dr Koshiyama responded with a clever point – one that, in this limited context, exposed the myth that open- versus closed-source is as simplistic as a battle of Jedi against Sith.

I remember one client, a big IT corporation, was trying to make the case for why cloud AI solutions should be closed source, and better than open source. And I was thinking about how they could couch the argument for that. [An example from a different area occurred to me:] facial-recognition software, which is widely available, open source, on GitHub. 

Academics have done so much research over the years, on the models available on GitHub. There has been so much evidence of bias with respect to race, gender, age, you name it. But those models have not been fixed. Instead, people have pursued using them, right into production. And, unfortunately, [problems] have been perpetuating since then. 

So, even though there's transparency around the risks, and there’s academic research, it seems that no one has managed to go in there and create a new open-source version that fixes those problems. 

Then the argument for closed-source alternatives was to say, well, ‘We could have closed-source facial-recognition software, but then we would use a third party to test it. And when we provide it to our customers, we would ensure that the system is safe before deployment.’ So, because they are a company, they will take liability and responsibility. But when it's open source, who knows what can happen? So that was the argument on the closed-source side. But of course, open source has its own arguments, too. [In reality] it depends on the application, on the use case that we're discussing.

Solaiman added:

It's important to recognise that language models are still an evolving research field and researchers do need access. I'm speaking from industry – and there's always a level of scepticism that regulators have when coming from industry. But when looking at conferences, like the fairness, accountability, and transparency conference [FAccT] and RightsCon, I'm seeing, frankly, fear from researchers that they're unable to conduct [their work], especially on the more social side of research. Even through simple APIs.

This is a distinction between openness and access. A larger language model, which will often require more compute to run, may not be accessible to researchers unless they have that compute infrastructure, or even a basic query API. Just being able to ask the language model for a response may not be enough access to conduct the research they need to improve the model, along an axis such as bias.
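That openness-versus-access distinction is easy to make concrete. The sketch below contrasts query-only access (sending a prompt to a hosted endpoint) with weight-level access (loading an openly released checkpoint locally so its parameters can be inspected). It is illustrative only: the endpoint URL and checkpoint name are placeholders, not real services.

```python
# A rough sketch of the openness-vs-access distinction Solaiman describes.
# The endpoint URL and checkpoint name are placeholders, not real services.
import requests


def query_only_access(prompt: str) -> str:
    """Query-level access: ask a hosted model for a response.

    Useful for behavioural testing, but gives no visibility into
    training data, weights, or internal representations.
    """
    resp = requests.post(
        "https://api.example-llm.invalid/v1/generate",  # hypothetical endpoint
        json={"prompt": prompt, "max_tokens": 64},
        timeout=30,
    )
    return resp.json().get("text", "")


def weight_level_access(checkpoint: str = "example-org/example-llm-7b") -> None:
    """Weight-level access: load an openly released checkpoint locally.

    This is the level of access bias researchers often need - inspecting
    parameters or activations - and it also demands enough compute to
    hold the model in memory, which a simple query API does not.
    """
    # Requires the `transformers` library and an openly licensed checkpoint.
    from transformers import AutoModelForCausalLM

    model = AutoModelForCausalLM.from_pretrained(checkpoint)
    total = sum(p.numel() for p in model.parameters())
    print(f"{checkpoint}: {total:,} parameters available for inspection")
```

Neither snippet is meant to run against a real service; the contrast is the point - the second kind of access enables research the first cannot, and costs far more to exercise.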

McDermid summarized the problem in broader terms:

[With LLMs] I think what we really need to say is, ‘What information do we actually need?’ We need to know something about the distribution of training data, whether that's relevant for the application domain. And it should be relatively free of bias. You need to be able to ask about openness: the right level of the right information. 

Personally, I don't want to look at two billion neurons of weightings. But I can help people understand what they actually need to extract from the models, or from the development process, to make sense of all the judgements about risk.

My take

Of course, many proprietary firms have contributed to open-source projects; indeed, Microsoft now owns GitHub. But while acknowledging this, Dr Draief observed:

I think there's been a recent shift from openness [in the AI sector] to being more closed. And I hope that this does not become the norm, because I think we will all suffer from it.

Put another way, big money talks loudest, perhaps – including to government. 

He continued:

A country like the UK benefits a lot from open-source technology. And in the absence of big technology players in the country, it would be extremely beneficial to the UK if open source continued to thrive. With the right guard rails, and with a community that's ensuring the safety of this technology.

Hear, hear. 
