While the world debates big-picture concepts of AI risk, such as bias or lack of transparency, more detailed aspects of the technology are sometimes overlooked. One is validation – the need to validate AI models for safe adoption in different industries.
Users need to know that they can trust those systems, which is a more complex challenge than it first appears. In turn, this is inextricably linked with other business imperatives, such as regulation.
Take Financial Services, one of the most tightly regulated sectors since the 2008-09 financial crisis. Long before AI arrived in a viable form, huge decisions rested on whether quantitative models could be validated and trusted, so that analysts could interpret data sets with confidence and understand how markets were moving.
Enter the dragon of AI
Enter AI, and those same principles apply. AI models need validating too, so that banks, investment houses, stockbrokers, and other advisors can be sure that their analytical tools are providing trustworthy insights. And that the system works and is not going to fall over. Indeed, validation is arguably more important than ever in an AI-driven world, as more and more processes become automated – and biases risk doing so too.
All this takes place within the discipline of risk assessment, which is the stock-in-trade of Mumbai, India-headquartered CRISIL. Founded in 1987, the global risk solutions and research company (and India’s first credit ratings agency) has been part of Standard & Poor’s (S&P Global) since 2005. Two-thirds owned by it, in fact.
According to CRISIL, the company has validated over 20,000 quantitative data models since 2015. But this year alone, it has also validated more than 100 AI models so far, including in areas such as financial crime, fraud, and financial strategy, for clients including global banks, asset managers, and derivatives traders.
What’s inside the black box?
That’s good news. But as AI's influence grows – along with its dominance over internal processes – the ‘black box’ nature of some solutions could become a problem, says CRISIL. This is because it may become harder to find out how and why an automated decision was made, or a result generated. In the high-octane world of risk assessment – and it is high octane, given the multibillion-dollar stakes in play – that would be bad news.
Anshuman Prasad is CRISIL’s Global Head of Risk Analytics and Markets Transformation – the man responsible for managing the application of new technologies, such as AI and machine learning, in these contexts. Is he concerned about the headlong rush to adopt them – in Financial Services and elsewhere? He tells me:
There’s been a lot of interest, and there are definite use cases, but this was prior to generative AI, ChatGPT, and Large Language Models. In those cases, adoption hasn't been that quick. Generally, Financial Services has actually lagged behind other industries.
We generally adopt technology that has been proven or tested outside of the sector. And the people who have taken that forward have generally been recruited from outside Financial Services, from companies like Google or other large technology players. They've moved into the industry, bringing that technology with them.
Then there are the regulatory challenges, of course:
Yes, there are stringent regulations that the banks are worried about. So, the use cases where they've adopted AI have, until now, been in places that auditors have scored as low risk. Areas like customer service, where they've used things like text analytics and simplistic AI. But now generative AI is coming into the picture.
The generative challenge emerges
Prasad explains how this is impacting on CRISIL’s work in Financial Services:
We're doing some cutting-edge work on how to build out a validation framework for generative AI models. There are well-established procedures for traditional AI, but when it comes to generative AI or LLMs, how you validate that is more of a challenge, and how you mitigate those risks. We are collaborating with our banking clients to help them understand this.
Meanwhile, fraud detection has long used machine learning for anti-money-laundering, and so on. But the way it has worked has generally been in generating the signals, if you like: the system flags suspicious transactions, which are then investigated by a human researcher.
But there have already been issues where systems have missed suspicious transactions, or generated false positives. Or generated so many alerts that it becomes difficult for a human researcher to actually investigate them.
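The flag-then-investigate pattern Prasad describes, and its alert-volume trade-off, can be sketched in a few lines. This is purely illustrative, not CRISIL's or any bank's system; `score_transaction` and its rules are hypothetical stand-ins for a trained model's anomaly score:

```python
# Illustrative sketch only: a toy anomaly-scoring pipeline of the kind
# described above, where a model generates signals for human review.
# All field names, rules, and thresholds here are hypothetical.

def score_transaction(txn):
    """Hypothetical suspicion score in [0, 1]: higher means more suspicious."""
    score = 0.0
    if txn["amount"] > 10_000:                        # unusually large transfer
        score += 0.5
    if txn["country"] not in txn["usual_countries"]:  # unusual geography
        score += 0.3
    if txn["hour"] < 6:                               # odd-hours activity
        score += 0.2
    return min(score, 1.0)

def flag_for_review(transactions, threshold=0.6):
    """Return transactions a human researcher should investigate.

    A lower threshold catches more genuine cases but buries investigators
    in alerts; a higher one risks missing suspicious transactions, which is
    exactly the trade-off discussed above.
    """
    return [t for t in transactions if score_transaction(t) >= threshold]

txns = [
    {"amount": 50_000, "country": "XX", "usual_countries": {"GB"}, "hour": 3},
    {"amount": 120,    "country": "GB", "usual_countries": {"GB"}, "hour": 14},
]
alerts = flag_for_review(txns)
```

Tuning that single `threshold` parameter is, in miniature, the false-positives-versus-missed-cases problem the quote raises.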
So, it seems that the rise of generative AI, and the surge towards enterprise adoption, may be a cause for concern. The lack of AI skills within specific business use cases is certainly an area that has been flagged in recent research, as is the rise of automated solutions. And not just at the enterprise user’s end, but also the attacker’s.
For example, research from identity verification specialist Onfido this week finds that while nearly 70% of 1,500 businesses surveyed in the US, UK, and Italy recognize the threat of AI being used in fraud, just 27% are prioritizing its use in prevention. Nearly one in three expect generative AI fraud to become a national problem, with US and UK leaders also concerned about data privacy and consent infringements.
But a particular concern for CRISIL is the rise of algorithmic trading models, such as systematic trading, automated arbitrage, speculation, market-making, and inter-market spreading. Prasad explains:
This is an area that needs more scrutiny, because algorithmic trading models can fall outside the normal model-risk framework. There is an established model risk framework, but sometimes an algorithmic trading model will sit outside of it, because an algorithm can be classified as either a model or a non-model, that is, as a procedure or set of calculations.
In general, CRISIL is concerned that some form of end-to-end validation framework for AI is becoming necessary, to both instil appropriate levels of control, and to create trust. This is partly because there are fears that AI may automate entrenched human biases, and even give them a veneer of machine neutrality.
The quantity of AI is not strained
So, how is CRISIL working with its clients in these areas? He explains:
When it comes to the AI frameworks for validation, there is first the modelling part, which is also the quantitative part. So, let's touch on bias in that context. There are ways to mitigate the lack of large samples of data on which to train the model. For example, there is over-sampling. And there are quantitative techniques where you could ensure that models are not biased towards or against race, gender, location, and so on.
But part of the problem is that modelling is only one element of the equation, and all AI models should be rigorously stress-tested for different scenarios. Often the other part of the challenge is that it is the vendor's models that are being used. This is where the bank or financial institution has acquired the model from a vendor, and there are proprietary elements in it. So, that too may operate as a kind of black box.
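The quantitative mitigations Prasad mentions, over-sampling scarce data and checking a model's decisions for group bias, can be sketched minimally. This is a toy illustration, not CRISIL's methodology; the field names and the simple demographic-parity check are assumptions for the example:

```python
import random

def oversample(records, group_key, target_group):
    """Randomly duplicate records from an under-represented group until its
    size matches the rest: one simple form of the over-sampling mentioned above."""
    minority = [r for r in records if r[group_key] == target_group]
    majority_size = len(records) - len(minority)
    balanced = list(records)
    while sum(1 for r in balanced if r[group_key] == target_group) < majority_size:
        balanced.append(random.choice(minority))
    return balanced

def approval_rate_gap(records, group_key, outcome_key):
    """Largest difference in positive-outcome rate between groups:
    a crude demographic-parity check on a model's decisions."""
    groups = {}
    for r in records:
        groups.setdefault(r[group_key], []).append(r[outcome_key])
    rates = [sum(v) / len(v) for v in groups.values()]
    return max(rates) - min(rates)

# Hypothetical decision records: group "B" is under-represented.
data = [
    {"group": "A", "approved": 1},
    {"group": "A", "approved": 1},
    {"group": "A", "approved": 0},
    {"group": "B", "approved": 0},
]
balanced = oversample(data, "group", "B")
gap = approval_rate_gap(data, "group", "approved")
```

In practice, validators would use far richer fairness metrics and stratified sampling, but the principle is the same: the check is quantitative and automatable.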
Buying in AI – and buying in problems
The implication is that Financial Services may not only be buying in a vendor’s technology expertise and implementation skills – often bringing those into the institution itself – but also proprietary models that contain opaque elements. In a highly regulated environment, that may cause problems. He continues:
What also concerns us as an industry is the governance aspects. Things like access control. Imagine an instance that we have to validate, where it's about who can access what kind of information. If you're talking about a global bank, for example, which is using an AI chatbot in its recruitment or HR management processes, then which policy is relevant to which individual in which location?
Plus, there is clearly a cost involved in assessing these models and in a validation review. And that cost needs to be assessed in terms of the proportionality – including the proportionality of the risk. And, do you have appropriate policies in place, and the right ethical considerations about whether you should be using AI at all? Training and awareness are essential. But there are gaps, which need to be addressed.
Tackling the simple stuff too
Not all of the challenge is highly technical, Prasad adds:
There is the application assessment itself, which may include cybersecurity concerns. For example, even something as simple as the load on the system. Might a chatbot crash due to the number of users logging in? So, it's a full, comprehensive assessment that you need to look at.
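Even the 'simple stuff' like load can be checked programmatically. A toy concurrency smoke test of the kind Prasad alludes to might look like this; `handle_request` is a hypothetical stand-in, since a real test would hit the live chatbot service:

```python
import threading
import time

def handle_request(user_id):
    """Hypothetical stand-in for a chatbot endpoint."""
    time.sleep(0.01)  # simulated per-request processing time
    return f"reply to {user_id}"

def load_smoke_test(n_users=50):
    """Fire n concurrent requests and check that all of them complete:
    a toy version of asking 'might the chatbot fall over as users log in?'"""
    replies = []
    lock = threading.Lock()

    def worker(uid):
        reply = handle_request(uid)
        with lock:  # protect the shared list from concurrent appends
            replies.append(reply)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_users)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(replies) == n_users

ok = load_smoke_test()
```

Real load testing uses dedicated tooling and realistic traffic profiles, but the point stands: this part of a 'full, comprehensive assessment' is not exotic at all.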
But then he returns to the problem of the ‘black box’ for Financial Services – one of the world’s most heavily, and punitively, regulated markets (for good reason, as survivors of the 2008-09 financial crisis will attest). Prasad says:
There are two aspects to the ‘black box problem’. One is simply that the modules are built by a third party, so the client doesn’t have full access to the code. But then there’s the deeper ‘explainability’ problem: there are machine learning models, such as some neural networks, that are inherently not explainable.
But there are risk-mitigation steps that can give greater confidence in using these models, and that's where a lot of our research, and academic research, is focused. You incrementally put the black box through different use cases, and you assess the output and test.
But it can take a lot of imagination to create the use cases and scenarios that you put the model through. But that's what we do. We ‘candidate’ a number of those use cases. We look at the performance of the model very rigorously.
And if you put the model into the model risk framework, you would have a second, independent, equally qualified quantitative person whose job is to detect what could go wrong with the model. But even that is not a fail-safe solution, because then you need to look at all these other aspects…
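The incremental, scenario-based testing Prasad describes can be sketched as a simple harness. This is illustrative only; `opaque_model` and the scenarios are hypothetical, and `model` stands for any callable whose internals the validator cannot inspect:

```python
def validate_black_box(model, scenarios):
    """Run an opaque model through candidate scenarios and record which
    expectations it meets: a toy version of incrementally testing a
    black box use case by use case.

    `scenarios` is a list of (name, input, check) tuples, where `check`
    is a predicate applied to the model's output.
    """
    results = {}
    for name, inp, check in scenarios:
        try:
            output = model(inp)
            results[name] = bool(check(output))
        except Exception:
            results[name] = False  # a crash counts as a failed scenario
    return results

# Hypothetical opaque model: the validator sees only inputs and outputs.
def opaque_model(x):
    return x * 2

scenarios = [
    ("doubles positive input", 3, lambda out: out == 6),
    ("handles zero", 0, lambda out: out == 0),
    ("stays finite on large input", 10**9, lambda out: out < float("inf")),
]
report = validate_black_box(opaque_model, scenarios)
```

The hard part, as the quote says, is not the harness but the imagination needed to 'candidate' scenarios that would expose the model's failure modes.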
Who regulates the regulators?
Finally, do the financial regulators themselves have the expertise to look at these areas? Or is that another instance where organisations such as CRISIL come into play? Not just helping Financial Services providers, but also their overseers? Prasad says:
I haven't thought about that question. But, you know, from a regulatory standpoint, I would assume that they have the required level of competence.
CRISIL makes the point that, when it comes to financial models, model risk management (MRM) frameworks can be “tweaked and adapted for checking core AI algorithms”. These include the US Federal Reserve’s supervisory guidance SR 11-7, which dates from 2011, and the UK Prudential Regulation Authority’s SS1/23, published this year, which can be “adapted further for AI”.
A statement from the company says:
The driving factors behind preparing an end-to-end validation framework for AI models are the need for appropriate control over them and for creating trust in terms of bias, robustness, and fairness.
In this light, CRISIL has routinely been validating and checking AI models for large, complex financial institutions for some time, especially in areas such as financial crime prevention and retail operations.
Valid insights from a provider of the same. But insights that reveal the reality of enterprise adoption, especially in heavily regulated sectors. You can’t just plug in your black box and expect it to run your business. You need to explore AI’s impact in a huge number of ways. And you need expert help – sometimes to point out the simple stuff.