What the US AI Safety Consortium means for enterprises

George Lawton, February 14, 2024
The Biden administration has created the US AI Safety Institute Consortium to coordinate AI risk management efforts. Experts weigh in on the new opportunities, challenges, shortcomings and how it compares to efforts in Europe and China.

US Secretary of Commerce Gina Raimondo recently announced the creation of the US AI Safety Institute Consortium (AISIC) on behalf of the Biden administration. The effort promises to “unite AI creators and users, academics, government and industry researchers, and civil society organizations in support of the development and deployment of safe and trustworthy Artificial Intelligence (AI).”

The consortium will be housed within the National Institute of Standards and Technology’s (NIST) US AI Safety Institute (USAISI) set up last November. High level goals include developing guidelines for red-teaming AI security testing, evaluating capabilities, risk management, safety and watermarking synthetic content. Over two hundred companies and organizations have already signed on to the new consortium. NIST said the consortium represents the largest collection of test and evaluation teams established to date and will focus on setting the foundations for a new measurement science in AI safety. 

So, how will it benefit enterprises more broadly, particularly when this may conflict with the interests of AI leaders, regulators, and citizens? For one, by clarifying the terminology. Seth Batey, Data Protection Officer and Senior Managing Privacy Counsel at Fivetran, says:

The consortium can benefit enterprises by giving more prescriptive guidance on responsible AI use. Dialogue around responsible AI use often includes AI buzzwords like transparency or testing for fairness, accuracy, and bias. These terms and the discussion around AI typically fall short of giving real guidelines on how to implement these best practices and how safeguards may differ depending on the use case. 

Batey observes there is already some prescriptive guidance for AI, like NIST's AI Risk Management Framework, but most of it has been fairly general or stops short of demonstrating how to implement AI at a granular level. He argues that including the leading AI companies gives the consortium a better chance of producing meaningful guidance that aligns with the tools companies are currently using or evaluating.

Others are not sure that such a diverse group of companies will find a way to collaborate effectively. Caroline Carruthers, CEO of Carruthers and Jackson, a data consultancy, observes:

It’s interesting to note that the US’s first attempt at doing this has been much slower than other governments. While it’s great news to see that countries like the US are waking up to a new way of thinking, and there are some huge organizations signed up to join AISIC, the concern would be whether they are actually working together. Any consortium requires compromise, but with this many sizeable organizations involved, will they be able to move at the pace they need in the short run to build more trustworthy AI? There are some admirably lofty aims being set out by the consortium, but action needs to start now.

Better science and testbeds

Some are hopeful that the new consortium will lead to better science and testbeds, particularly with NIST’s stewardship involved. NIST has played a seminal role in many important measurement, security, and safety standards. It could play a guiding role in helping more groups understand the limitations of the technology related to safety, governance, and trust. Karen Myers, director of the Artificial Intelligence Center at SRI, says: 

One exciting aspect of the NIST consortium is the focus on improving the science of evaluating AI systems, which is critically needed to achieve safety and trustworthiness. Another key focus of the consortium is on developing AI testbeds that will enable extensive and principled evaluation of technologies using carefully designed methodologies and standards. These testbeds will provide a valuable resource to a broad range of enterprises, both universities and corporations, to comprehensively assess AI systems for specific types of applications.

Abhishek Gupta, Founder and Principal Researcher at the Montreal AI Ethics Institute, believes the consortium will be able to take advantage of NIST's seminal work on the AI Risk Management Framework. He argues:

The reason leadership by NIST here is important is because it is strongly grounded in scientific principles, focused on measurement and technical fundamentals, which is essential for the success of any such effort. Especially, since we've seen a ton of principles, guidelines, etc. come out in recent months which aren't really actionable because they're missing the critical technical and scientific elements that NIST brings together through its history and competence of its staff.

Another promising aspect is that the consortium includes industry, government, academia, and civil society. Gupta believes this will prove a strength in taking the latest in research advances and combining that with the applied experience that the industry brings. This is particularly important when it comes to the innovative advances in products and services that are starting to face the battlefield of real-world deployments.

It also has important implications for improving the connections between new tools and the development of actionable processes, approaches, techniques, frameworks, and methodologies that will shift the conversation from principles to practice. Gupta explains:

What this means in the short-run is that we'll see a greater pressure on the rest of the ecosystem to step up their game in terms of the value that they seek to provide when it comes to responsible AI efforts, e.g., through boosting the emphasis on technical and scientific grounding in their proposals. In the long run, as the effort evolves, it will lay out a roadmap for the necessary tooling and services that the ecosystem should invest in to ensure the successful application of responsible AI in practice, not just as a theoretical exercise but as practical steps that are applicable to large-scale products and services deployed in the real-world meeting the good and the bad that comes from interacting with real users, both well-intentioned and malicious.

Key challenges

Many experts also believe some important issues and risks fall outside the consortium's current focus. Batey is concerned that red-teaming and watermarking will be challenging areas in which to develop guidance that actually reduces risk without undermining business objectives or hampering low-risk use cases.

Dominique Shelton Leipzig, Partner, Cybersecurity & Data Privacy Practice at Mayer Brown, and author of the recent book Trust: Responsible AI, Innovation, Privacy, and Data Leadership, is concerned that current efforts don’t plant the seeds for continuous testing required to detect model drift and new risks that arise in the operational context after the models are released into production. She argues:

It is important for companies licensing large language models to build their own applications to understand this movement and continuously test, monitor, and audit their models, with code running in the background every day in order to know when the models have drifted outside of the company’s risk tolerance and fix the models. Continuous testing running in the background is the only way for a company to detect when a model has drifted.

Leipzig cites a recent incident in which a logistics company disabled its chatbot after a customer posted a viral video of the chatbot swearing at the customer and denigrating the company as the “worst delivery firm in the world.” Continuous testing, monitoring, and auditing frameworks must be improved to alert companies when models depart from their guardrails. She explains:

It will be necessary to insert guardrails within the AI tools themselves if licensees do not want unpleasant surprises like the one described above with the chatbot spewing insults at customers. The AI tool needs guardrails, just like humans, to let them know in computer code the companies’ expectations, and the companies need monitoring tools, just as they did with humans, to ensure that the standards of the company are being effectuated.
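In practice, such a guardrail layer might sit between the model and the user, screening output against company policy before it is returned. The sketch below is purely illustrative, assuming hypothetical rule names, patterns, and a canned fallback response; it is not how any particular vendor implements guardrails:

```python
import re

# Hypothetical policy rules; real deployments would use far richer checks
# (classifiers, allow-lists, human review) rather than regex alone.
BLOCKED_PATTERNS = {
    "profanity": re.compile(r"\b(damn|hell)\b", re.IGNORECASE),
    "self-disparagement": re.compile(r"worst (delivery )?(firm|company)", re.IGNORECASE),
}

def apply_guardrails(model_output: str) -> tuple[str, list[str]]:
    """Return a safe response plus the list of rules the raw output violated."""
    violations = [name for name, pattern in BLOCKED_PATTERNS.items()
                  if pattern.search(model_output)]
    if violations:
        # Fall back to a canned response and surface the violations
        # so monitoring tools can log and alert on them.
        return ("I'm sorry, I can't help with that. Let me connect you "
                "with a human agent.", violations)
    return model_output, []
```

The key design point Leipzig raises is the second return value: guardrails are only half the job, because the violations also need to feed a monitoring pipeline so the company learns when its model is drifting toward its limits.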

Others are concerned about the process itself, which could be challenging with so many competing interests involved. Carruthers says:

A successful consortium requires organizations who are often competing to collaborate closely, which can be difficult to set up, but even harder to keep on track. There needs to be an auditable body checking that members are adhering to standards but also ensuring that the consequences of not doing so are clear. Is there a shared path or consensus on where we should be going, and who is responsible for overseeing that process? What happens if one country or consortium decides to do one thing and another decides to diverge from that path? As we’ve seen, AI doesn’t work this way. 

AI safety also will require developing a framework and processes for reliably auditing AI tools. The AI development lifecycle includes a mix of disparate data sources, foundation models, development tools, data engineering, data science, and operational modeling. Independent auditors will be required to vet the chain of trust spanning the different open source offerings, vendors, and data sources used across this toolchain. This is a far more complex endeavor than auditing finances or organizing a software bill of materials. It will also need to be able to adapt to the discovery of new AI-specific vulnerabilities and risks. Jennifer Kosar, Trust and Transparency Solutions Leader, PwC US, explains:

Certain goals, such as trust in AI or audits of AI systems, may require consideration of how independent assessments of specific AI systems and other relevant practices, or the performance of AI-based models, are conducted, and by whom. Furthermore, given the pace of innovation in this space, the consortium will need to ensure it can respond quickly to technological evolutions and adapt its Responsible AI guidelines accordingly, just as enterprises require robust change management protocols to evolve at an accelerated pace. Additionally, NIST’s guidance will need to be adapted into practical steps that meet the unique risks and operations present in certain spaces, such as banking and healthcare; as such, we may see industry groups and sector-specific regulators reaching to fill this role.

Comparative approaches

There is also some concern that the new consortium won’t provide the right balance between vendors, government regulators, and civil society necessary to implement and enforce appropriate standards and best practices to discourage risky behaviors and bad actors.  Carruthers observes:

The US has been slower off the mark regarding AI safety, which is unusual. Similar to the EU, UK and Chinese approaches, they once again have gone for lofty aims centered on setting standards and telling people what they can and can’t do with AI. The differences between each of these approaches are going to play out through the detail, and none of them, so far, have gone into the level of detail that is required to actually implement these standards.

Leipzig breaks down the essential building blocks that all of these efforts need for building trustworthy AI:

  1. Risk ranking the AI. 
  2. Ensuring high-quality, accurate data for which the company has IP, privacy, and business rights to use for high-risk AI. 
  3. Continuous testing, monitoring, and auditing every second of every minute of every day.
  4. Creating technical documentation that can be used if the model drifts to isolate the issue and return the model to the companies’ expectations. 
  5. Ensuring human involvement to review technical documentation to address model drift and return the model to accuracy when this occurs.
  6. Building fail-safe mechanisms to kill AI use cases when models cannot be returned to accuracy. 
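Steps 3 and 6 above can be sketched as a minimal monitoring loop. The class name, thresholds, and window size here are illustrative assumptions, not a prescribed implementation:

```python
from collections import deque

class DriftMonitor:
    """Rolling accuracy check with a fail-safe, per steps 3 and 6 above."""

    def __init__(self, baseline_accuracy: float, tolerance: float = 0.10,
                 window: int = 1000):
        self.baseline = baseline_accuracy
        self.tolerance = tolerance           # allowed drop before the fail-safe fires
        self.results = deque(maxlen=window)  # rolling record of correct/incorrect outcomes
        self.killed = False

    def record(self, prediction_correct: bool) -> None:
        """Log one production outcome and check it against risk tolerance."""
        self.results.append(1 if prediction_correct else 0)
        if len(self.results) == self.results.maxlen:
            accuracy = sum(self.results) / len(self.results)
            if accuracy < self.baseline - self.tolerance:
                self.kill()  # step 6: fail-safe when drift exceeds tolerance

    def kill(self) -> None:
        # In practice this would route traffic away from the model,
        # page an operator, and trigger the remediation in steps 4 and 5.
        self.killed = True
```

Real systems would track richer signals than raw accuracy (input distribution shift, guardrail violations, user complaints), but the shape is the same: continuous measurement against a declared risk tolerance, with an automatic off-switch when the model cannot be kept within it.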

Batey says that each region is adopting different regulatory priorities, guiding what and how they plan to enforce AI safety. For example, the US has focused on watermarking, specifically preventing fraud, election interference, and other harms related to false, AI-generated content. Additionally, unlike the EU's AI Act, the US has relied on executive action in the absence of congressional action. 

Meanwhile, China has been primarily focused on preventing AI content that it considers “fake news.” It also seems likely to be lenient on copyright issues related to AI development.

The UK has also taken a different approach by shying away from a comprehensive AI bill. It treats the various types of AI risk as situational and context-dependent, allowing sectoral authorities to take the lead on how AI affects their respective industries, a decentralized approach premised on each industry knowing its own domain best. Batey contrasts this with the EU and the US, which are trying to provide comprehensive risk categorization buckets spanning industries to ensure strong safeguards for all of them.

But Batey sees much consensus underneath these differences:

While all of these regimes appear to be taking slightly different approaches and directions, there does seem to be some consensus as to what needs to be covered in AI legislation, regardless of the form it takes.  Additionally, each regime has considered some type of risk categorization approach, which includes a recognition that not all AI use cases, specifically for generative AI, introduce the same amount or types of risks.

These different approaches and directions may also burden enterprises operating across many regions. Kosar explains:

Many jurisdictions share the desire to promote innovation and the opportunities that Generative AI, and AI more broadly, can offer while balancing risk and avoiding harm. Other territories have made progress in defining specific expectations and prohibitions, as the US is expected to continue to do through the Executive Order and actions of the appropriate agencies. What is clear is that organizations should not wait to begin preparing for global regulatory compliance, as both direct and secondary impacts to them will depend on how their AI systems operate and who they affect.

My take

NIST has a proven track record of building some of the world’s most trusted standards. The formation of the consortium under its guidance seems like important progress and may inform best practices across governments and enterprises worldwide.

At the same time, cultural differences and political priorities will also shape the actual implementations of AI risk management in practice across different regions. I hope they get it right since there is much at stake. 
