In my last article, I made the case for an AI winners-and-losers type of year - not an "everybody wins with AI" year.
Yes, AI might be lifting tech stock prices (for now), but it's not magical pixie dust that disperses the fumes of macroeconomic uncertainty for customers that fast-track it. As I wrote:
The stakes for enterprise AI in 2024 are already high. The shakeout at OpenAI (and the EU AI act) have added new levels of complexity, raising questions about whether open source AI is viable - and how companies will approach AI amidst new IP lawsuits.
Can customers avoid AI vendor lock-in?
Given the risk profile of AI in 2024, customers might plow ahead and break things, or they might sit on the sidelines. But there's a third option: I believe most customers that get results from AI in 2024 will do so in the context of trusted vendor relationships, not through build-your-own:
Customers that move forward with AI, but in a more deliberate manner, [will likely] acquire AI solutions from trusted software vendors, who will theoretically assume a major chunk of the liability risk - and source different Large Language Models (LLMs) as needed without being locked into one.
But what about AI vendor lock-in? Customers wary of open source AI uncertainties might worry that they are locked into doing AI with a handful of vendors. I see a major bright spot for AI vendor options - more customer choice than expected.
This opens up the intriguing possibility that smaller startups of domain experts can build AI into their solutions - and provide impact that is much closer to "out of the box" than training your own LLM, and without the risk mitigation issues that do-AI-yourself currently raises.
Enterprise software vendors should make their platforms available to AI partners - and, ideally, take steps like Workday has, to provide a level of validation for those partners (see: Workday's announcement of their AI marketplace and certification program).
Aisera rolls out new AI bots - can industry LLMs get a better result?
The two most compelling industry AI vendors I found in 2023 were at Workday Rising - and both hail from Workday Ventures as well. But while Legion is focused on a very specific problem (hourly workers), Aisera - the subject of today's piece - has evolved from IT service bots to a much broader purview.
But it gets even more interesting: Aisera has now built out AI service bots ("Co-pilots" in their lingo) for a range of "employee experience" domains, from IT to HR to procurement to SalesOps. They have their own Large Language Models (LLMs) for those domains as well (for better and often for worse, most vendors pull from third party LLMs, due to the hurdle of developing/managing LLMs internally. Even some huge software vendors haven't built their own LLMs yet).
For customers in search of cleaner/more accurate gen AI results, industry-specific LLMs are a potentially vast improvement over the misadventures caused by bots and "co-pilots" trained on GPT models. Aisera has also incorporated different flavors of individual and group reinforcement learning, as another way to raise bot accuracy/effectiveness for each customer. Many vendors shy away from this, preferring solutions that exclude human feedback cycles from the AI loop. But Aisera's approach is worth a closer look.
Aisera has primarily focused on internal, employee-focused bots (aka Co-pilots for EX, CX, Voice Experience and Ops Experience). But that is changing. Aisera just announced DaveGPT, a Generative AI Assistant built for "leading neo-bank Dave." Past generations of service bots were more infuriating than successful. So why a customer service bot? Aisera extols the advantages of a gen AI bot, supported by a domain-specific LLM:
Enterprise chatbots have long been plagued by poor conversational dialog capabilities. An inability to work outside of preset scripts or give and receive clarifying information in real-time resulted in a frustrating member experience and increased cost and contact volumes. DaveGPT, powered by Aisera, works to overcome these challenges by combining the power of Aisera’s conversational interface with Generative AI and domain-specific LLMs tailored for the banking and financial services industry.
One big flaw with most gen AI bots: not enough workflow automation. A bot is only as good as its data. Add to that: a bot is only as good as the automations it can trigger. Aisera is bearing down on this:
DaveGPT is equipped to answer customer inquiries, set up direct deposits, advance account management, and solve customer issues end-to-end without human assistance.
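The "a bot is only as good as the automations it can trigger" point can be sketched as an intent-to-action dispatch table. This is a hypothetical illustration, not Aisera's actual code; the intent names and handler functions are invented for the example:

```python
# Hypothetical sketch: a bot that resolves an intent and triggers a
# workflow automation, rather than only returning text. Intent names
# and handlers are illustrative, not Aisera's real API.

def setup_direct_deposit(user: str) -> str:
    # Placeholder for a real workflow kickoff (form, API call, etc.)
    return f"Direct deposit form started for {user}"

def reset_password(user: str) -> str:
    return f"Password reset link sent to {user}"

AUTOMATIONS = {
    "setup_direct_deposit": setup_direct_deposit,
    "reset_password": reset_password,
}

def handle(intent: str, user: str) -> str:
    """Dispatch a resolved intent to an automation; fall back to
    knowledge-base answers when no automation is registered."""
    action = AUTOMATIONS.get(intent)
    if action is None:
        return "No automation available; answering from knowledge base instead."
    return action(user)
```

The design point is that the dispatch table, not the language model, bounds what the bot can actually do end-to-end; adding capability means registering new automations.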
Sounds like the early results are in:
By using Generative AI models, DaveGPT powered by Aisera, has demonstrated its ability to resolve upwards of 89% of member inquiries, which helps to support member support agents’ productivity by shifting their focus to more nuanced support requests.
Aisera says this "has helped to increase member satisfaction and retention as Dave seeks to level the financial playing field."
On hallucinations, industry LLMs and reinforcement learning
The only thing missing from this press release? More details on bot accuracy, and controlling bot misbehavior and hallucinations. I've seen how bad bot behaviors can be significantly reduced with a proper enterprise architecture. I hope to learn more about Aisera's results here, during an upcoming customer call about DaveGPT.
During our conversation at Workday Rising, Aisera CEO and Co-Founder Muddu Sudhakar addressed hallucinations directly, in the context of Aisera's industry LLMs:
These are what we call domain-specific LLMs - one for HR workspaces, legal, procurement, finance, sales and marketing. Their sizes are typically 10 billion [data points]. Then we did them for vertical industries. We have one for the medical industry, the financial industry, and government.
For Workday HR, [you would use key phrases] like HR policies, company organization, onboarding, benefits claims. I'm just giving a high level, but you're talking about 20-100 billion phrases in our typical LLM. Then you have one for finance as well, and procurement. This is what we've been doing for the last five years - and that reduces hallucination.
That makes sense - but I make a distinction between lack of hallucinations and overall accuracy. It's clear how drawing from a domain-specific LLM can avoid outright hallucinations, because that LLM doesn't have the muck repository of Reddit and YouTube comments, and it probably hasn't been trained on bad poetry either.
But accuracy is a different matter. LLMs can't always be accurate, because they are probabilistic, not cognitive systems. It's a matter of picking the right use cases, and escalating to humans where needed. I pressed Sudhakar on this:
Let's say that gets your accuracy closer. You're still going to have moments where the machine isn't right. So I assume you've compensated for that by designing human-in-the-loop processes, especially in HR, where the stakes are quite high with things like incorrect performance evaluations, or incorrect employee data?
Sudhakar explained that an accurate response has two components:
1. The bot must understand your request.
2. The bot must have a relevant action to take, or a matching resource to share - "If I don't have an FAQ, or a knowledge article, or an action to take, what good is it, even if the bot understands your request?"
Taking this into account, Sudhakar says that today, a well-architected bot can get to a 75-80 percent level of response accuracy. If the bot can't handle an HR query accurately, "In those cases, it goes to a human. It goes to an HR business partner or HR admin."
In these situations, the Aisera bot will indicate it does not have the answer, and it will facilitate a human hand-off. Sudhakar's experience is that people like interacting with this kind of bot, precisely because it is not stiff; it's not rules-based, and it doesn't spit out generic non-answers. Life isn't deterministic. What a human says today won't be exactly the same tomorrow: "the nature of being non-deterministic makes [our bots] more human." Sudhakar says a customer would much rather hear a bot say "I'm not able to answer the question," rather than receive a boilerplate link to an irrelevant web page.
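Sudhakar's two-part accuracy test - did the bot understand the request, and does it have a matching action or resource - plus the human hand-off can be sketched as a simple routing policy. The threshold, field names, and messages below are assumptions for illustration, not Aisera's implementation:

```python
# Hypothetical sketch of the two-part check described above:
# (1) did the bot understand the request, and (2) does it have a
# relevant action or resource? If either fails, hand off to a human.
from dataclasses import dataclass
from typing import Optional

CONFIDENCE_THRESHOLD = 0.8  # assumed cutoff; the real value is unknown

@dataclass
class BotReply:
    text: str
    escalated: bool

def respond(intent_confidence: float, matched_resource: Optional[str]) -> BotReply:
    """Answer only when the bot both understands the request and has a
    matching resource or action; otherwise escalate to a human."""
    if intent_confidence < CONFIDENCE_THRESHOLD:
        return BotReply("I'm not able to answer the question. "
                        "Routing you to an HR business partner.", escalated=True)
    if matched_resource is None:
        # Understood the request, but no FAQ, knowledge article, or action fits
        return BotReply("I understood your request, but I don't have an "
                        "answer yet. Connecting you with a human.", escalated=True)
    return BotReply(f"Here is what I found: {matched_resource}", escalated=False)
```

Note that both failure modes escalate explicitly rather than emitting a boilerplate link - the behavior Sudhakar says customers prefer.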
My take - co-pilots go beyond GPT
Sudhakar aimed a barb at the proliferation of co-pilots trained on bloated data:
Our Co-pilot is not GPT.
In the case of Aisera's HR Co-pilot, he describes a bot reminding you to file your benefits, register for a mandatory training, or apply for a certification. "It's actually your assistant, your butler, your concierge, reminding you of what to do."
I've heard so much hyperbole from AI vendors about "no hallucinations" - much of it from vendors who haven't even built their own LLMs, as Aisera has done. In AI, it is always harder to fix a flawed architecture with duct tape guardrails than to build it for industry from the ground up. In my view, Sudhakar's honest views on these bots' capabilities will serve Aisera well - and earn the trust of customers who want to know the pros and cons and how to roll out with that in mind.
There isn't one way to get to a better enterprise AI result. In Aisera's case, industry LLMs, combined with reinforcement learning methods and a range of customer data, from help tickets to transactional systems, are getting the job done (customer data privacy is, of course, protected in the design). Sudhakar told me he wants accurate results built into the core product. He doesn't want to have to rely on prompt engineers to shoehorn queries; he doesn't want to require big data science teams in order to use Aisera - or a customer's developers for that matter. He also doesn't want to take up the time of domain experts. Notably, he doesn't want to compensate for an LLM's shortcomings with RAG either, as many other enterprise vendors are doing. But users can help fine-tune with RLHF (reinforcement learning from human feedback). ("The only time I need your help is when you come and tell me that an output was wrong.")
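The "tell me when an output was wrong" feedback loop can be illustrated with a toy ranker: user signals down-weight bad answers so future responses shift toward ones that earned positive feedback. This is a deliberately simplified sketch of the idea - real RLHF fine-tunes model weights from preference data - and every name here is invented:

```python
# Toy illustration of a human-feedback loop: each candidate answer carries
# a cumulative score, a "this output was wrong" signal down-weights it, and
# ranking reflects the accumulated feedback. Not Aisera's implementation.
from collections import defaultdict

class FeedbackRanker:
    def __init__(self):
        self.scores = defaultdict(float)  # answer_id -> cumulative score

    def record_feedback(self, answer_id: str, helpful: bool) -> None:
        # The only user input needed is whether an output was right or wrong
        self.scores[answer_id] += 1.0 if helpful else -1.0

    def best(self, candidate_ids: list[str]) -> str:
        # Prefer the candidate with the strongest positive feedback history
        return max(candidate_ids, key=lambda a: self.scores[a])

ranker = FeedbackRanker()
ranker.record_feedback("kb-101", helpful=True)   # user confirmed this answer
ranker.record_feedback("kb-202", helpful=False)  # user flagged this one wrong
print(ranker.best(["kb-101", "kb-202"]))  # kb-101
```

The appeal of this pattern for enterprise buyers is that improvement requires no prompt engineers or data science teams - just ordinary users flagging wrong outputs.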
Aisera's approach seems to be working - they have an impressive logo collection, including customers such as Alcon, GAP, and Vistra. Aisera isn't just putting out next-gen bots; if a customer wants to customize their own LLM or automate workflows with AI, Aisera can do that too (see their official pitch at: Buy, Build, or Bring Your Enterprise LLMs and Operationalize Your Generative AI App). One welcome difference between foundational models and enterprise software like on-premise ERP? Sudhakar says Aisera can update a base model, even after a customer has fine tuned it.
I lack the space to get into pricing here, but Sudhakar's overview of Aisera's pricing options - with a low entry point for user-based consumption, and free trials rounding out the picture - was refreshing as well. I'll get more details on this next time.
If you're wondering how Aisera has advanced so far down the path of industry LLMs, bear in mind that while gen AI mainstreamed last year, Aisera has been pushing into deep learning service bots since 2017, and that maturity clearly shows.
No surprise, Sudhakar buys into my argument that smaller players won't be forced out of "big data gen AI." Rather, they could be the true disruptors. As he wrote to me before press time:
Customers and markets always embrace startups and entrepreneurs to create new solutions - this will be no different for killer apps for generative AI. The industry wants a landscape of startups that provide great solutions that improve user experiences while also offering great ROI, not just flashy capabilities that are hard to implement.
We're about to find out.