
Responsible AI isn't just about ethics, it's about accuracy - a deeper look at Workday's evolving AI architecture

By Jon Reed - May 22, 2024
Summary:
Responsible AI is often tied to ethics - but shouldn't we also tie it to AI accuracy? Large Language Models have weaknesses there - but can those weaknesses be mitigated? At Workday's Innovation Summit, I went under the hood with Workday's AI leaders.

(CEO Carl Eschenbach at the Workday Innovation Summit)

As the spring events roll on, I continue my AI 'deep dive' vendor profiles.

We need customer validation and gut checks. We need architectural specifics on how gen AI accuracy is improved - and how customer data is protected. Oh, and if we make assurances about responsible AI, we need transparency there too. What products were canned or altered? What areas of AI are off limits?

I brought an axe to grind about "responsible AI" to Workday's Innovation Summit, but Workday was ready for me. Even at the opening reception, Kathy Pham, Workday VP, Artificial Intelligence and Machine Learning, was up for fielding my (over)heated input on the extent of vendor responsibility for customer education on AI use cases.

I believe vendors need to take deeper responsibility not just for building ethical AI, but for guiding customers on how to make the tools work - and the pitfalls to avoid. Even a responsibly built AI tool is a powerful tool, open to proper use - or misuse. Therefore, "responsible AI" should have customer education as a top-line priority. Soon, I had enough context from Workday execs and customers to publish Workday Innovation Summit - what AI product risks are unacceptable? Workday explains, and customers react.

Responsible AI - enterprises may gloss over ethics, but they can't ignore accuracy

But the debate doesn't end there. What happens if customers don't take the "responsible" part of AI seriously? During my Workday Innovation Summit video with Constellation's Holger Mueller, he raised that exact issue. Mueller isn't sure the AI ethics talk in our industry is going to hold up over the longer term.

He has a point. Recently, I talked to a consulting director who told me only a third of their enterprise customers are serious about getting AI ethics right. The other two thirds want to plow ahead in pursuit of productivity gains and headcount efficiency. That mentality is reflected in this news piece, e.g. "Company leaders seem more focused instead on allocating resources to quickly develop AI in a way that boosts productivity."

But as I see it, "responsible AI" is more than just ethics. As I told Mueller:

I think it's going to be important in the future to divide responsible AI up into ethical stuff, and into stuff that pertains more to output accuracy and actually getting a result. Because part of the responsible AI conversation is the right architecture to get more accurate results, in the proper design. That part of the conversation is not going to go away.

Smart companies will push for accurate results as a way to build AI trust. Others may be scared into it by headlines of embarrassing AI fails. Either way, without some level of accuracy, you don't have user/consumer trust. And without trust, you don't have adoption.

On AI accuracy, and the changing role of LLMs at Workday

Workday still uses external LLMs, but I've seen a big shift toward using the right model for the task at hand - whether that's a Large Language Model, or a medium or smaller model suited to the process. This reduced dependency on external LLMs is promising for several reasons: if you can cut external LLM usage, you reduce compute costs, employ the most relevant model, and further protect/limit any data flow outside your own tools - including customer data - something Workday tracks closely. (A rough sketch of that kind of model routing follows Luke's comments below.) Shane Luke, VP Product and Engineering, Head of AI & Machine Learning at Workday, explained to the analyst audience why small scale ML models were historically limited:

With smaller scale Machine Learning models - you might call these the traditional Machine Learning days, which is kind of funny to call it that, because it was at most a couple years ago. But now, the smaller scale models, they tend to not generalize very well.

 LLMs generalize really well. But they come along with other challenges that we have to deal with: challenges in the data they were trained on, challenges with how they perform, certainly cost efficiency, and for sure, safety challenges - so many challenges to deal with when you're building a platform approach. 
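
Workday didn't share routing specifics, but the "right model for the task" idea boils down to something like the hypothetical Python sketch below. To be clear, the model names, task categories and costs are my own illustrative assumptions, not Workday's:

```python
# Hypothetical sketch of "right model for the task" routing - not Workday's actual code.
from dataclasses import dataclass

@dataclass
class ModelChoice:
    name: str
    runs_in_house: bool        # smaller models can stay inside the vendor's own infrastructure
    cost_per_1k_tokens: float  # illustrative numbers only

# Illustrative registry: task type -> the smallest model that handles it well
MODEL_REGISTRY: dict[str, ModelChoice] = {
    "anomaly_detection":   ModelChoice("small-tabular-model", runs_in_house=True,  cost_per_1k_tokens=0.0001),
    "skills_matching":     ModelChoice("domain-tuned-medium", runs_in_house=True,  cost_per_1k_tokens=0.001),
    "job_description_gen": ModelChoice("external-llm",        runs_in_house=False, cost_per_1k_tokens=0.01),
}

def route(task_type: str) -> ModelChoice:
    """Pick the most relevant (and cheapest, most data-contained) model for a task,
    falling back to the external LLM only when no smaller model is registered."""
    return MODEL_REGISTRY.get(task_type, MODEL_REGISTRY["job_description_gen"])

choice = route("skills_matching")
print(choice.name, "- keeps data in-house:", choice.runs_in_house)
```

The point of the sketch: every request the router keeps away from the external LLM is compute cost saved, and customer data that never leaves the vendor's own tools.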

How Workday mitigates the downside of smaller models

Luke says Workday has addressed some of the historical downsides of smaller models. How? One key is using a very different type of training data than the big LLMs: 

With our models, though, we do have a little bit of an unfair advantage, in that we have this quality data set that's very different than some of the datasets that are used on these large scale LLMs for training. LLMs provide amazing capabilities. It's stunning to see how well some of these models do, and how general they are. But they are trained on Internet scale data that has things like toxic content, and disinformation included in it. So there's a cost to that. With our datasets, we don't have that. All of that is engineering to build trust with customers.

Improving gen AI output can be done in a number of ways, but I've seen one commonality: it requires a well-thought-out technical architecture that puts LLMs in context, rather than using them standalone. This requires more than a foundation model. A few specifics from Workday's approach: A/B testing of model types, using those tests to optimize and fine-tune the models, and engaging customers "as early as possible" in testing cycles. One key for Workday? Using an adapter - a set of weights that is essentially a subset of the model's total, but one that is tuned for a particular customer. In these cases, the model receives an inference request from a particular tenant, and that tenant's adapter is applied. If I've lost you there, Luke explains why that matters:

We have one of these for each tenant that the model has been served. So in this way, a single model can serve many tenants, without data ever mixing - retaining that same trust with the customer.
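
Luke didn't walk through implementation details, and what follows is emphatically not Workday's code - but the general pattern he describes (one shared base model, plus a small set of tuned adapter weights selected by tenant ID at inference time) can be sketched roughly like this, with invented names and dimensions:

```python
# Rough illustration of per-tenant adapters on a shared base model - not Workday's code.
# One frozen base model is shared by everyone; each tenant gets a small tuned weight
# delta, and an inference request only ever touches that tenant's adapter.
import numpy as np

class SharedBaseModel:
    def __init__(self, dim: int = 8):
        self.dim = dim
        self.base_weights = np.random.randn(dim, dim)   # frozen, shared across all tenants
        self.adapters: dict[str, np.ndarray] = {}       # tenant_id -> small tuned delta

    def register_tenant(self, tenant_id: str) -> None:
        # In practice the adapter would be fine-tuned on that tenant's data;
        # here it is just a small random delta for illustration.
        self.adapters[tenant_id] = 0.01 * np.random.randn(self.dim, self.dim)

    def infer(self, tenant_id: str, features: np.ndarray) -> np.ndarray:
        # The request carries the tenant ID; only that tenant's adapter is applied.
        adapter = self.adapters[tenant_id]
        return features @ (self.base_weights + adapter)

model = SharedBaseModel()
model.register_tenant("tenant_a")
model.register_tenant("tenant_b")
print(model.infer("tenant_a", np.ones(8)))  # served by the shared model plus tenant_a's adapter only
```

The design appeal is clear: the expensive base model is built once, while the per-tenant pieces stay small and isolated - which is what keeps one customer's data from ever informing another customer's results.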

Perhaps most important: Luke says this has led to breakthroughs in generalization capabilities from smaller models.

We built recommender systems to have just a handful of components, that can take in data from almost anywhere and work with it - any prompt, any arbitrary data type. It could be workers, job profiles, learning content. It can automatically detect work relationships between data types, and then generalize across many different areas of the product.

My take - "responsible AI" requires collective tech literacy

This content gets more technical than a line of business leader might typically go, but part of earning AI trust is grappling with that technical conversation. At some events this spring, customers told me they were disappointed not to be taken under the hood, into the guts of the vendor's AI architecture. I don't see how we can claim to be responsible AI practitioners without upping our tech literacy. How these models arrive at their output is a big piece of the puzzle.

What's interesting about using smaller models this way is that you might not necessarily need RAG. RAG (Retrieval Augmented Generation) is currently one of the most popular ways of getting a more accurate result from an LLM: retrieve current/custom information and add it to the prompt, in order to generate a more relevant response. But if the model itself is domain-specific, RAG - which I view as an industrial-strength band-aid rather than a cognitive AI breakthrough - may not be needed.
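
For readers who haven't seen the pattern up close, here's a minimal, toy illustration of what RAG actually does - retrieve relevant snippets, then stuff them into the prompt. The corpus and the bag-of-words scoring are deliberately simplistic stand-ins, not anything a production system would use:

```python
# Toy RAG illustration: retrieval plus prompt augmentation.
from collections import Counter
import math

CORPUS = {
    "policy_42": "Contractors are not eligible for the annual bonus program.",
    "policy_17": "Employees accrue 1.5 vacation days per month of service.",
    "policy_08": "Remote workers must complete security training each year.",
}

def bow(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, k: int = 2) -> list[str]:
    q = bow(query)
    ranked = sorted(CORPUS, key=lambda doc_id: cosine(q, bow(CORPUS[doc_id])), reverse=True)
    return ranked[:k]

def build_prompt(query: str) -> str:
    context = "\n".join(CORPUS[doc_id] for doc_id in retrieve(query))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

# The assembled prompt is what gets sent to the LLM, grounding its answer
# in current, customer-specific information rather than its training data alone.
print(build_prompt("How many vacation days do employees accrue?"))
```

If the model is already steeped in the customer's domain, that retrieval step has less work to do - which is the argument for why a domain-specific model may not need the band-aid at all.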

However, Luke did tell me that Workday has knowledge graphs in play in some of these scenarios. Knowledge graphs, in conjunction with gen AI models (and sometimes with RAG as well) are a promising approach, as graphs can indicate relationships between data that the model might not otherwise pick up on. Whether this takes us to what some call "causal AI", where causal inference is integrated with today's most popular deep learning approaches, remains to be seen - but it will certainly help models "understand" our workplace context and relationships.
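
Workday hasn't published how its graphs are structured, but a toy example shows why relationship data helps: a handful of (subject, relation, object) triples can surface connections a model would otherwise have to guess at. The entities and relations below are invented for illustration:

```python
# Toy knowledge graph: triples describing workplace relationships.
TRIPLES = [
    ("maria",  "reports_to",   "dev_manager"),
    ("maria",  "has_skill",    "python"),
    ("python", "required_for", "data_engineer_role"),
]

def neighbors(entity: str) -> list[tuple[str, str, str]]:
    """Return every triple that mentions the entity, in either position."""
    return [t for t in TRIPLES if entity in (t[0], t[2])]

def graph_context(entity: str, hops: int = 2) -> set[tuple[str, str, str]]:
    """Collect triples within a few hops - the kind of relational context that can be
    handed to a gen AI model alongside (or instead of) retrieved text snippets."""
    seen, frontier = set(), {entity}
    for _ in range(hops):
        next_frontier = set()
        for e in frontier:
            for t in neighbors(e):
                if t not in seen:
                    seen.add(t)
                    next_frontier.update({t[0], t[2]})
        frontier = next_frontier
    return seen

for triple in graph_context("maria"):
    print(triple)  # the maria -> python -> data_engineer_role chain emerges from the graph, not the model
```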

Workday is pursuing deeper explainability of AI results, embedded inside user screens/processes. Some vendors seem to think that if the AI gets the job done, the heavy lifting of "how did the AI arrive at this answer?" isn't a priority. Maybe it's not a priority for consumers, but for enterprises, improving the AI audit trail is important to trust - and could prove essential for compliance.
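
What might that audit trail look like in practice? Here's a bare-bones sketch of the kind of record I have in mind - my own illustration, not Workday's design - capturing what went into an AI-assisted decision, which model produced it, and what came out:

```python
# Bare-bones AI audit record - illustrative only.
import json
import uuid
from datetime import datetime, timezone

def record_ai_decision(model_name: str, model_version: str,
                       inputs: dict, output: str, explanation: str) -> dict:
    """Build an audit record suitable for an append-only log or a compliance review."""
    return {
        "decision_id": str(uuid.uuid4()),
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model": {"name": model_name, "version": model_version},
        "inputs": inputs,            # what the model was given
        "output": output,            # what it produced
        "explanation": explanation,  # the human-readable "why" surfaced in the user's screen
    }

entry = record_ai_decision(
    model_name="job_description_generator",  # hypothetical model name
    model_version="2024.05.1",
    inputs={"job_profile": "Data Engineer", "location": "Remote"},
    output="Draft job description ...",
    explanation="Generated from the Data Engineer job profile and the customer's skills data.",
)
print(json.dumps(entry, indent=2))
```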

Brian Sommer's post on the Workday Innovation Summit gives readers another view on Workday's AI pursuits. Though Sommer and I agree on many aspects of enterprise AI, we differ a bit here. I believe Workday has gone further with responsible AI credibility than most vendors; I've already explained why.

The same goes for pricing. Though I agree with Sommer that embedding AI in software should not increase licensing costs, Workday has been one of the vendors out in front with a pricing message customers want to hear: that they won't be charged for core processes where their data is a key part of the value delivered. Vendors monetizing AI on the strength of the customer data they trained it on strikes me as deeply ironic. All vendors should take the AI throw-down challenge: show customers something they've never seen before, something that changes their business in some fundamental way. Charging for that is absolutely fair game. Monetizing co-pilots is dodo birding - that business model will be extinct before we know it.

Where will those transformative scenarios come from? That's Sommer's biggest beef - for AI and beyond. And there, Sommer and I are in full agreement. There is something bigger in play here, but we haven't seen much sign of it on keynote stages this spring. Right now, customers probably want to start with constrained AI in areas such as job description creation, support assistants, enterprise search, or anomaly detection. But there is anticipation of something more - a breakthrough we haven't seen in the wild yet. And no, I'm not counting "agents" or OpenAI's already-controversial voice assistant. Along those lines, it will be interesting to see what Workday's AI marketplace partners come up with - another thread to pick up at Workday Rising later this year.
