
Enterprise hits and misses - enterprise AI addresses hallucinations versus accuracy, and return on transformation gets a rethink

By Jon Reed - January 8, 2024
Summary:
This week - enterprise AI needs trust, which requires accuracy - can generative AI deliver? Are hallucinations a permanent feature? Meanwhile, return on transformation needs a rethink, as we all sharpen our BS filters for 2024.


Lead story - from hallucinations to accuracy: enterprise AI use cases and obstacles in 2024

Those who are looking for precision in our grasp of enterprise AI are right to home in on accuracy.

Generative AI systems, by definition, are probabilistic systems, not capable of 100 percent accuracy; in many instances, the accuracy is significantly lower.

But for those who want to achieve AI project success in 2024, there is some good news:

  • Accuracy needs vary by use case. A good process design can also mitigate AI accuracy issues with human-in-loop principles.
  • Generative AI results can be much more accurate - and customer-specific - with data architectures that infuse customer data, or limit output to particular types of results.
  • Some progress is being made in mitigating the worst of the "hallucination" aspects, though if you expand the definition of hallucination to mean "didn't understand my query" or "the output was off the mark," we should not expect miracles here. Instead, we should benchmark new AI processes against the imperfections of the human processes it is replacing/augmenting.
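The second bullet describes what is often implemented as retrieval-augmented generation: constrain the model to answer from retrieved customer documents rather than letting it free-associate. Here's a minimal sketch of that pattern, with the caveat that the keyword-overlap retriever and the document contents are illustrative stand-ins (production systems typically use vector search over embeddings), not any specific vendor's API:

```python
# Minimal retrieval-grounded prompting sketch. The "retriever" is a toy
# keyword-overlap scorer; the key idea is the prompt assembly, which
# restricts the model to the retrieved customer-specific context.

def retrieve(query: str, documents: list[str], top_k: int = 2) -> list[str]:
    """Rank documents by naive keyword overlap with the query."""
    q_terms = set(query.lower().split())
    scored = sorted(
        documents,
        key=lambda d: len(q_terms & set(d.lower().split())),
        reverse=True,
    )
    return scored[:top_k]

def build_grounded_prompt(query: str, documents: list[str]) -> str:
    """Assemble a prompt that limits the model to the retrieved context."""
    context_block = "\n".join(f"- {doc}" for doc in retrieve(query, documents))
    return (
        "Answer ONLY from the context below. "
        "If the answer is not in the context, say so.\n"
        f"Context:\n{context_block}\n"
        f"Question: {query}"
    )

# Hypothetical customer documents, for illustration only:
docs = [
    "Contract 42: governing law is England and Wales.",
    "Invoice policy: payment due within 30 days.",
    "Contract 42: liability cap is 1M GBP.",
]
prompt = build_grounded_prompt("What is the governing law in contract 42?", docs)
print(prompt)
```

The grounding instruction plus the narrowed context is what makes the output "customer-specific" and easier to audit - the model has far less room to improvise.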

New diginomica pieces back up this point. Gary filed an instructive use case, Top UK law firm Weightmans - ‘AI gives us 98% document search and analysis accuracy’. One key insight: Weightmans' use of document automation and AI with Litera's software has gradually expanded in scope and sophistication, as wins were notched. How did Weightmans start? As Gary explains, they were compelled to give AI-driven document automation a go by a highly-pressurized client deadline. He quotes Weightmans' CIO Stuart Whittle:

To be honest, the client was very skeptical that a piece of technology could replace human intervention here. But given the lower price point and timescale, we got the go-ahead to try.

Over time, the scope of AI use expanded. Gary writes:

The product can now be taught much more abstract concepts - for example, not just finding clauses relevant to a particular clause, but now ‘knowing’ what the jurisdiction and relevant law actually is, in either the US or UK.

And about that accuracy question: for some of the AI processes Gary notes in the article, the AI's accuracy rate is 98 percent (humans also review some of this output material). Accuracy isn't just about the result; it's also a non-negotiable hurdle to achieve trust in an AI process, given the legal context. Gary says "Whittle has such confidence, as audits consistently show an accuracy rate via this process of 98%." More from Gary:

Whittle jokes:

'I would venture to suggest that if we’d had people working long hours on this and getting tired, we wouldn't always hit 98% accuracy doing it manually.'
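The human-in-the-loop principle mentioned earlier - and the human review Weightmans layers on top of the AI output - can be sketched as a simple confidence gate: automated extractions below a threshold are routed to a reviewer queue instead of being accepted outright. The threshold value and record fields below are illustrative assumptions, not Weightmans' or Litera's actual design:

```python
# Sketch of a human-in-the-loop review gate: output below a confidence
# threshold goes to a human reviewer rather than being auto-accepted.
from dataclasses import dataclass

@dataclass
class Extraction:
    clause: str
    confidence: float  # model's self-reported confidence, 0.0-1.0

REVIEW_THRESHOLD = 0.98  # illustrative cutoff for automatic acceptance

def route(extractions: list[Extraction]) -> tuple[list[Extraction], list[Extraction]]:
    """Split extractions into auto-accepted and human-review queues."""
    accepted = [e for e in extractions if e.confidence >= REVIEW_THRESHOLD]
    review = [e for e in extractions if e.confidence < REVIEW_THRESHOLD]
    return accepted, review

batch = [
    Extraction("Governing law: England and Wales", 0.995),
    Extraction("Liability cap: unclear wording", 0.71),
]
accepted, review = route(batch)
```

The design point: accuracy requirements don't have to be met by the model alone - the process as a whole (model plus targeted human review plus audits) is what has to clear the bar.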

As for hallucinations - don't expect a full resolution anytime soon. But George added another view in How SRI research could de-hallucinate AI:

There are a few things that stand out for me. First, most of the existing research has focused on whether AI hallucinates or not. However, there can be vast differences in the rate of hallucination depending on whether you are seeking to retrieve facts, explain concepts, or draw a connection between related data.

George points out that multi-modal AI, across enterprise applications and data types, will make addressing hallucinations more important - and more difficult. I've heard some talk from generative AI enthusiasts that "hallucinations are a feature, not a bug." No offense, but that's ridiculous. While I agree there are times when a certain hallucinatory quality is actually beneficial for creative/brainstorming exercises, if this were truly a feature, the user should be able to turn it off and on. No chance of that anytime soon - but on we go.

Diginomica picks - my top stories on diginomica this week

Vendor analysis, diginomica style. Here are my top choices from our vendor coverage:

  • A peek at Workday's roadmap for Flex Teams to assemble and manage cross-functional project teams - Phil digs into Workday's collaboration moves: "It's interesting to see all the dots gradually being connected as technology opens up the old functional hierarchies and job roles — so that instead of recruiting people to jobs, enterprises can help them match their various skill sets to roles and projects that cut across traditional boundaries."
  • How BT Group performed “open-heart surgery of financial estate” with SAP - Madeline filed an instructive S/4HANA use case, with notable benefits: "With the new finance platform in place, the business has seen four key benefits: user-friendly systems, so staff know exactly how to get things done; the visibility gives them a greater efficiency; reduction in errors and improved data quality; and better insights thanks to the integrated solutions."

Jon's grab bag - Chris provides an early view of UK's 2024 AI dilemmas in Is the UK Government planning a voluntary national AI code of conduct in 2024? Brian, meanwhile, seems to have some leftover vinegar from 2023 for HR leaders: What Santa didn't leave under the HR Christmas tree this year. Finally, I revisited the issue of fake news in the enterprise, but with a generative AI twist - in Does the enterprise have a fake news problem - and will generative AI make it worse?

I think what we're all after, really, is an enterprise context - one that helps us absorb data and apply it. We want a context that is flexible enough to shift quickly, one that can wade through noisy news cycles - and one that balances a hefty dose of skepticism with curiosity for what proper/bold innovation can accomplish.

Does generative AI dramatically change this? I don't think so. If we form discerning networks of smart colleagues and hone those BS filters, I like our chances.

Looks like we'll need to polish those filters frequently in 2024...

Best of the enterprise web


My top six

  • Hackers discover way to access Google accounts without a password - these types of non-password hacks are not necessarily new, but this does seem to take it to a new level: "The researchers who first uncovered the threat said it “underscores the complexity and stealth” of modern cyber attack."
  • New-business building: Six cybersecurity and digital beliefs that can create risk - McKinsey weighs in on risk mitigation: "The basics are not difficult to implement, but they do require experience and expertise."
  • Generative AI Has a Visual Plagiarism Problem - The latest in a series of training data concerns around IP, and the legality/ethics of gen AI training: "When an end user generates something with a LLM, can the user feel comfortable that they are not infringing on copyright? Is there any way for a user who wishes not to infringe to be assured that they are not?"
  • Big Idea: Return on Transformation Investments (RTI) - Constellation's Ray Wang issued an important post on how to quantify business transformations, which have a tendency to elude short-term measurements: "The long view is often ignored in short-term project execution. Even though business transformation projects have longer time frames, most business leaders and technology executives work from yearly budgets but live quarter to quarter. With an average tenure of 2.8 years for transformation leaders, the incentives are not in alignment with long-term benefits. In fact, most ROI calculations fail to consider the longer-term impact analysis."
  • Trying to Push Content Above the Noise - Lora Cecere hits on the enterprise fake news/BS filter problem, but from the supply chain perspective: "In companies, there are many discussions focused on driving improvement through planning, yet, when I take the client case studies on the websites of leading planning technologies and map the intersection of operating margin and inventory turns, I do not see metric improvement, sustained performance, or an increase in value." Ouch!
  • The One Question to Ask Before You Blow Up Your Customer Success Team - Want to dismantle dedicated customer success leaders/programs? Check out Dave Kellogg's advice first.
  • Enterprise month in review - end of year blowout edition with Meg Bear and Brian Sommer - here's the audio version of my year end review show with Brian and special hot seat guest Meg Bear. I also broke down the year in enterprise AI and more during my latest attempt to disrupt DisrupTV.


Whiffs

Slow week for whiffs on this end - what did I miss? Well, there was this one, via Clive Boulton: Gmail 2024 Hack Attack Advice: Turn It Off And On Again, Google Says (have fun doing that across all devices all the time). Depending on your views about having your face catalogued and analyzed, there may be some whiffy stories on this list:

Well, at least there is this doozy of a crypto-executive whiff, also via Clive:

And they didn't even need an AI hologram... See you next time.

If you find an #ensw piece that qualifies for hits and misses - in a good or bad way - let me know in the comments as Clive (almost) always does. Most Enterprise hits and misses articles are selected from my curated @jonerpnewsfeed.
