Making Salesforce admins '10x more powerful' - Einstein Copilot enters beta on the eve of TrailblazerDX

Phil Wainewright, March 1, 2024
We speak to the SVP of Salesforce AI about the work that's gone into creating the Einstein Copilot digital assistant and its likely impact on how people use Salesforce.

Einstein Copilot screenshot (© Salesforce)

The promise of generative AI has been on the lips of enterprise vendors for the past year, but it's in the year ahead that we'll really start to find out how well the finished products live up to it. In the case of CRM giant Salesforce, the buzz about generative AI dominated its TrailblazerDX conference last March, followed by the announcement of its Einstein Copilot and Einstein Copilot Studio products at September's Dreamforce conference. This week, the Einstein Copilot AI assistant has gone into beta, and there will likely be more news around that when this year's TrailblazerDX opens next week. Ahead of that event, we spoke to Jayesh Govindarajan, SVP of Salesforce AI, to find out more about the work that's gone into making Copilot a reality and the likely impact for Salesforce users, admins and developers.

The impact we'll hear about next week is probably greatest for Salesforce admins, who make up the majority of attendees at TrailblazerDX, alongside developers and architects. Applying the large language models (LLMs) that power generative AI to the Salesforce platform will massively expand the automations they're able to build to make their users' lives easier. Govindarajan tells me:

I think we're going to take them on an amazing journey. To tell you the truth, I think we're giving them such power tools, that I think a lot of what they do is going to become 10 times more powerful.

'A full-scale digital assistant'

The reason for this is that Einstein Copilot isn't just about finding information and helping to create content. Probably its most impactful capability is understanding how the Salesforce platform works and how to join different functions together. This has been a huge focus of the Copilot development effort. He comments:

What we've learned in the last six to eight months, and what the world has learned in the last six to eight months, is that LLMs have the capacity not just to generate great content, but go beyond that to become amazing, multi-task learners, and also orchestrate actions on your behalf.

Copilot is therefore seen as an ever-present AI helper, which users can address in natural language conversations to access all the resources and capabilities of the Salesforce platform, and anything else connected to it via MuleSoft. He goes on:

This is the underpinnings of a full-scale digital assistant for an enterprise user, which is really built into the flow of work, whether it be a salesperson or service person, marketing person, a commerce professional. There's work that people do on a day-to-day basis, which can be assisted in many ways. The groundwork that we've built in the last one year, which is the retrieval augmented grounding, which enables customers to write great prompts. It enables customers to generate content, which is very focused on the job to be done, and is deeply integrated into context, which is important...

With Copilot, we're kicking that into entirely new gear, which is to say, use the large language model’s capabilities to not just generate content, but also orchestrate actions on your behalf. So in other words, users will instruct the system to do things using plain English, and the ability to understand the instruction at a very high level, and then break it into one of many tasks — which could be, go look up this field in a database, go create that action component somewhere, and actually execute that action on the Salesforce stack — is what Copilot is actually able to do.

Scoping the context of user requests

There are various components whirring away behind the scenes to make this all happen — and, crucially, to minimize the potential for errors to creep in. One of the concerns about generative AI when applied in an enterprise context has been that the technology has a habit of making up answers — known as hallucinations — instead of sticking to the source material it should be working from. Another issue is often the reliability of that source material in the first place. The Copilot team has therefore spent a lot of time working on how the platform uses a technique called Retrieval Augmented Generation (RAG) to narrow down the scope of a user request, or prompt. The trick is ensuring that the model has as much context as possible about the user's query, so that it only looks at relevant sources. Govindarajan explains:

What we've learned is that it's really important to guide the LLM by narrowing down the search space of all the actions that it can execute, by narrowing down the search space of all the data that it has access to, and narrowing it down based on the instruction that has been given to it at that particular moment to go get something done. The way that works is, just like we ground in that particular context based on what the user's instructed, much the same way we ground on actions that are available to the user based on who they are, where they're logged in, and the access they have in the system.

Here's an example. Let's say you're talking to the Copilot, and you say, 'Help me upgrade my customer to a new product tier.' It understands 'my' to be you. And then it narrows down the space on who your customers are, the customers that you have access to. That helps reduce the degree of errors by orders of magnitude, in fact, because you're not looking to the whole database. And then, for those customers, it's saying what are the product tiers for these customers? That may not be infinite product tiers, it might be two or three product tiers. So now you're basically reducing the search base.

Basically what the Copilot is doing is, in essence, looking at context very, very deeply. Where are you logged in? What page are you on? If you're on a product page, it's likely you're talking about products in that vicinity. You are not talking about some marketing campaign related to some other product. [But] you may be if you're using the Copilot on the marketing page — which is where context is so, so important in reducing the search space, to reduce hallucinations.
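
The 'upgrade my customer' example can be sketched in a few lines of Python. This is only an illustration of the narrowing idea, not Salesforce's implementation: the Customer class, the TIERS list and the scope_upgrade_request function are all hypothetical names invented for this sketch, and a real system would also draw on page context and access rules.

```python
from dataclasses import dataclass

# Hypothetical data model: none of these names come from Salesforce.
@dataclass(frozen=True)
class Customer:
    name: str
    owner: str   # the user who is allowed to see this record
    tier: str    # current product tier

TIERS = ["basic", "plus", "premium"]  # assumed small, ordered tier list

def scope_upgrade_request(customers, user):
    """Return, for each customer the user can access, the tiers above the current one."""
    scoped = {}
    for c in customers:
        if c.owner != user:
            continue  # 'my customer' excludes records the user cannot access
        higher = TIERS[TIERS.index(c.tier) + 1:]
        if higher:  # a customer already on the top tier has nothing to upgrade to
            scoped[c.name] = higher
    return scoped

customers = [
    Customer("Acme", "dana", "basic"),
    Customer("Globex", "dana", "premium"),
    Customer("Initech", "lee", "plus"),
]
candidates = scope_upgrade_request(customers, "dana")
print(candidates)
```

In this toy data set, three customers and three tiers shrink to a single customer with two possible tiers before any model would be asked to choose, which is the kind of reduction in search space the quote describes.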

Accurate execution

The same principle applies whether Einstein Copilot is retrieving information, creating content, or taking actions. In the last case, it uses the multi-tenant runtime execution engine in the Salesforce core, along with MuleSoft APIs as needed, to orchestrate actions. Govindarajan says that having that metadata-driven execution engine already native in the Salesforce platform has made the job a lot easier, along with the accompanying infrastructure governing user access rights and execution rights. He explains:

Just like you need access to good quality content to do great RAG, just the same way you need access to a runtime execution to actually go execute those actions, which is very close to where the plan is being generated.

When a user wants to get something done, instead of having to remember which menu item to pick and what thing to do next, they can simply ask Einstein Copilot for what they want in a back-and-forth conversation. He elaborates:

Part of the Copilot stack is something called a reasoning engine. It’s coupled with an intent understanding component, so it understands intent. When the intent is ambiguous, it'll ask you more questions to clarify. Once the intent is very clear, it's handed off to the reasoning component, which then generates an execution plan.

The execution plan essentially looks like, ‘Go retrieve this data from the database, go make this API call with that data that you retrieved from the database, wrap it into this new object and then write in,’ as an example. In this example, it's orchestrating three actions to go get that broader task done. And I think being near an execution engine, such as Salesforce core, such as MuleSoft, is super key to actually going and executing that action and going the final mile, if you will.
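
The three-step plan in that example (retrieve, call an API, write a new object) might look something like the following Python sketch. Everything here is an assumption made for illustration: the action names, the ACTIONS registry and the execute_plan function are hypothetical, and the 'API call' is a local stand-in rather than a real MuleSoft or Salesforce call.

```python
# Hypothetical action registry; the article does not describe Copilot's real actions.
def retrieve_record(ctx):
    # Step 1: look up the record in the (in-memory) database.
    ctx["record"] = ctx["database"][ctx["record_id"]]

def call_api(ctx):
    # Step 2: stand-in for an external API call that uses the retrieved data.
    ctx["api_result"] = {"score": len(ctx["record"]["name"])}

def write_object(ctx):
    # Step 3: wrap both results into a new object and write it back.
    ctx["database"]["new_object"] = {**ctx["record"], **ctx["api_result"]}

ACTIONS = {"retrieve": retrieve_record, "call_api": call_api, "write": write_object}

def execute_plan(plan, ctx, allowed):
    """Run each named step in order, refusing any step the user may not perform."""
    for step in plan:
        if step not in allowed:
            raise PermissionError(f"step not permitted: {step}")
        ACTIONS[step](ctx)
    return ctx

database = {"acct-1": {"name": "Acme"}}
ctx = {"database": database, "record_id": "acct-1"}
plan = ["retrieve", "call_api", "write"]  # the plan a reasoning component might emit
execute_plan(plan, ctx, allowed={"retrieve", "call_api", "write"})
print(database["new_object"])
```

The permission check before each step reflects the point made earlier about grounding actions in what the user is allowed to do. In this sketch a plan containing a disallowed step fails at that step rather than continuing.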

Feedback loop

Of course, none of this guarantees that things won't go awry. One purpose of the beta is to continue refining the AI assistant's performance and accuracy, and to work out how best to keep humans actively in the loop so that they catch any remaining errors. He goes on:

Part of the Einstein 1 platform is the ability to actually collect feedback when the AI is wrong. Because there is an expert human in the loop, they will edit the output before it lands at the end customer. There is all kinds of friction that is built in to ensure that people read that, and actually edit the responses before they send it back. That — in addition to the thumbs-up and the thumbs-down signal, which is something we're training users to actually indicate — is important, because the more the system gets used, the more these errors are found, the more they're corrected, and the more of that learning goes back into the system. With more utilization, we are collecting signals, feedback signals, that make both the models better, both the prompting better, and the RAG better as well.

This feedback is currently evaluated by people in the development team, although it's likely that in future iterations of the platform some aspects of the process will be automated too. The fine-tuning isn't limited to retraining the LLM — an expensive exercise because of the processing power required. More often, it takes the form of comparing different ways of prompting to evaluate which ones work better, or creating better RAG components to provide more precise results.
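
As a rough illustration of comparing prompting approaches using thumbs-up and thumbs-down signals, the Python sketch below computes an approval rate per prompt variant. The variant labels, the sample feedback and the approval_rates function are hypothetical, and a real evaluation would need far more data and human review than this.

```python
from collections import defaultdict

def approval_rates(feedback):
    """Map each prompt variant to its share of thumbs-up responses.

    feedback is an iterable of (variant, thumbs_up) pairs, where thumbs_up is a bool.
    """
    ups = defaultdict(int)
    totals = defaultdict(int)
    for variant, thumbs_up in feedback:
        totals[variant] += 1
        ups[variant] += int(thumbs_up)
    return {variant: ups[variant] / totals[variant] for variant in totals}

# Invented sample signals for two hypothetical prompt variants.
feedback = [
    ("A", True), ("A", False), ("A", False),
    ("B", True), ("B", True), ("B", False),
]
rates = approval_rates(feedback)
best = max(rates, key=rates.get)  # variant with the highest approval rate
print(rates, best)
```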

The Einstein Copilot beta is now available for Sales Cloud and Service Cloud. Versions for Commerce Cloud and Marketing Cloud will follow later in the year, and a version for Tableau is due in the second half. It is currently available in English only, and data residency is initially limited to the US. Enterprises can buy it bundled as part of any Einstein 1 Edition or as an add-on to Enterprise or Unlimited Editions.

My take

There is a lot going on here. Einstein Copilot significantly expands the impact of generative AI beyond the most obvious use cases of finding information and drafting content, turning it into a conversational interface through which users can access all the information and functionality of the underlying systems. This will have a massive impact on how people use Salesforce and other enterprise systems, although it's important to remember that the technology is still in beta and there are wrinkles to iron out, not least in detecting and excluding errant answers or actions that the AI throws up. At diginomica we're always alert to the risks around AI projects, but the careful approach Salesforce is taking as it rolls out this technology is the right one.

I'll be at next week's TrailblazerDX event and will report back with further product news, executive comment and customer stories that emerge there.
