For those enterprises wondering how to safely apply Large Language Models (LLMs) like ChatGPT to their own stores of documents and other textual information, content teamwork vendor Box introduces a potential answer today. The announcement of Box AI promises the ability to interrogate and summarize individual files stored in Box and to auto-generate new content in Box Notes, using the same OpenAI technology that powers ChatGPT. The beta announced today is just the start of much more to come, according to Aaron Levie, CEO of Box, who I spoke to ahead of the launch. He says:
This is going to be a many-year journey — we're just at the beginning — but the use cases are just mind-blowing ...
Now, for the first time ever, we can synthesize our unstructured data, our content, in a way that just was never possible before. We can ask questions of a document — 'What's one of the most important insights from this file?' Or, 'Summarize this content.' You can ask a question of a large set of data, and it can return an answer in a natural language way.
We're just incredibly excited about where this potential is, and we're seeing breakthrough use cases that we never would have imagined before. That's why Box AI is such an important announcement for us.
LLM technology is particularly powerful because it can be applied without having to be trained to a specific use case. He explains:
The big breakthrough on these large language models is really the horizontal nature of AI today is just so much wider. The fact that in a single model, you can have use cases for life sciences and financial services and media & entertainment — every single industry can be impacted by AI now with the same model. That means that you get a significant boost in the efficiency of building software, leveraging AI, that wasn't possible before, when you had to have very bespoke narrow AI models for every use case you had.
As an enterprise content platform, Box is particularly suited to harnessing that power. He goes on:
What we've always believed is, what's inside that content is knowledge. It's business value. The energy of the company is inside of that data. It's your financial information. It's your marketing assets. It's your product launches. With AI, it's the first time you can unlock that information. We think we stand to benefit more than the vast majority of software, because it can unlock that knowledge and intelligence from your data for the first time ever. That's why we're so excited about this.
Selected customers will initially have private beta access to Box AI, ahead of general availability expected later in the year. While OpenAI is the launch partner, the plan is to open up connections to other AI models in the future. Levie says:
Our whole approach is, how do we connect content from Box and connect it with basically any AI model that customers want to be able to work with? Right now, we're launching with OpenAI, but in the future you can imagine us working with a variety of different vendors.
Unlike ChatGPT and other use cases based on publicly available information, the Box offering will apply the GTP3.5 or GPT4 model to an enterprise's private content store without the risk of it being exposed to other users. Levie explains:
The big breakthrough is, how do you take any AI model and connect it with enterprise in a very secure way that obviously doesn't leak any data. It keeps all the data very sensitive and private, but can leverage the understanding that the AI model has of language, and be able to reason through questions about documents, in a really profound way.
As well as building on the existing Box infrastructure to protect access to enterprise content, Box AI also builds on the company's previous investments in AI, while adding new capabilities specific to LLMs, says Levie. The company has also published a set of principles for responsible use of AI today.
Restricting the content that the model works with reduces the risk of introducing fake information when answering a query — a common LLM phenomenon known as hallucination. Levie explains:
The way that we are sending the data to the AI model, we're instructing it to not answer anything outside of what's in the content. That's dramatically improving any risk of hallucination. Now there's still a risk, to be clear — there's definitely some scenarios where that can happen. But for the most part, it's dramatically reduced this hallucination issue, because you're not asking the AI model the question. You're asking your document the question, using the understanding the AI model has of language, as its ability to reason through the data. That dramatically reduces some of the normal hallucination risks.
Initial use cases will be producing instant summaries of documents, extracting key insights or information points, or rewriting content in a different style or tone. Content can also be generated from scratch in Box Notes, for example turning a set of bullet points into a report, a specification, a blog post or a press release, generating a meeting agenda, or brainstorming new ideas. Levie says:
You could ask a set of documents, 'Help me to brainstorm a way to save money in the company,' or 'Write a sales pitch, using our new product information.' Or you could look at a contract and say, 'What's the riskiest clause in the contract?' Or you could be looking at HR data and say, 'Hey, we want to improve our HR benefits? How would we do that?' It can leverage all of its understanding in the AI model, connect it to your content, and answer that question in a very specific way, just to your business.
That's the breakthrough, being able to ask questions of content that just never was before possible. That's what Box AI is going to be able to do — really unlock the full value of the information in your enterprise.
Initially, however, the implementation is restricted to working with single documents at a time. Further use cases will come when the model can work with collections of documents or the entire content library of an enterprise. Levie comments:
What you're initially seeing right now are just the first foray of use cases. I think the real breakthrough category is going to be asking questions of an arbitrarily large set of data, and then getting back a natural language response ...
Imagine the use case of, you go to an HR portal or a sales portal, and you ask a question of, 'Okay, I need a particular pricing for a product.' You just ask the question, and you get an answer. As opposed to, I go to a file, I open the file, I find the right part of the file. That's the probably really big use case. We're going to be doing that on an individual document level right now. But the big potential is when you do it across a set of documents.
Future use cases
The first iteration also doesn't provide any guidance for users in how to write prompts. It will be up to users to figure out the best prompts to achieve the results they're looking for. In the future it's likely that the platform will introduce templates and filters that will help users construct prompts. There's also no option to add a training feedback loop into the current version, which an enterprise might use to fine-tune the results the model produces. "Over time, you can imagine that would definitely make sense," says Levie.
Future use cases are likely to include incorporating Box AI actions into workflow automations, and helping to automatically build classifications for information sets. Levie elaborates:
You can give it a document, and then you have a prompt that says, 'Please classify this document and explain your reasons why.' It does an incredible job of any kind of granular level of classification you want ...
Let's say you're a film studio, you could have it classify movie scripts, of this genre, of this length, and it will classify that. That's just never been possible before. Because previously, you would have had to train an AI model just on movie scripts. And then you'd have to kick off to the movie script classifying engine. And again, nobody ever did that. It just took too much work.
What you can now do with these AI models is, basically if it's a college graduate level of intellect, now apply it to any kind of document or content process, we can now solve that problem in an automated fashion.
Today's announcement is therefore an important step towards building AI right across the Box product suite. He sums up:
You can imagine AI really being at the centre of our entire architecture. Our content lifecycle is everything from ingesting data to protecting it, classifying it to collaborating on it, that entire lifecycle. We think AI will be infused into how we work in everything that we do.
Box as a company is already well versed in the use of AI to help enterprises manage and make sense of their content stores, but adding LLM technology certainly takes this to a whole new level, while still building in important guardrails to protect the integrity and security of that content. It's good to see Box taking this one step at a time to ensure that the technology produces results that are meaningful in an enterprise context while managing the risk of rogue outcomes.