Box adds AI to automate content analysis of cloud files

Phil Wainewright, October 11, 2017
Cloud content vendor Box adds AI from IBM, Microsoft, Google to automate content analysis of cloud files for enterprise collaboration

Content collaboration vendor Box is bringing artificial intelligence into its platform with the launch of Box Skills today. Available initially in beta, the framework allows customers to apply AI resources from IBM Watson, Microsoft Azure and Google Cloud to their content, and to build custom 'skills' that apply machine learning within a business process. The vendor also introduced Box Graph, which maps relationships to power new applications such as Box Feed, which surfaces relevant content and updates within Box, personalized to a user's activity.

The aim is to add value to content stored in Box, explains chief product officer Jeetu Patel, who briefed diginomica yesterday ahead of today's announcement during the opening keynote at the vendor's annual BoxWorks user conference in San Francisco.

Our goal is, when a piece of content is put in Box, that should be infinitely more valuable to you than if that same piece of content was kept outside of Box.

While Box Graph will be built on the vendor's own algorithms, the Box Skills framework allows Box to harness external AI resources being developed by other vendors.

If you look at folks like IBM, Google or Microsoft, they're spending billions of dollars on AI and machine learning technology. What we feel is, that should be used as a tailwind to really go out and enhance the experience for the content in Box.

We largely want to make sure, when the industry is innovating at such a rapid pace, that we should be able to take all the advancements happening there and build a framework in such a way that you can leverage all of that.

Applying AI to content

Box showed off several examples of how AI skills can be applied to content to support business goals, and will also encourage customers and partners to build their own skills using a developer kit.

  • Analyzing an image library to automatically apply metadata tags that describe elements of the image, to optically recognize text, and — using a company’s proprietary data stored in Box to train its own machine learning model — to recognize the company's own products in images. This makes the library automatically searchable for images with specific characteristics and products.
  • Analyzing video to automatically apply metadata tags, to transcribe speech into text, and to use facial recognition to identify, frame by frame, when each individual's face appears in the video. This can be used, for example, to find clips within a video where a specific person is speaking.
  • A skill that combines IBM’s Watson Speech to Text and Natural Language Understanding services on the IBM Cloud to process customer service recordings and automatically surface priority issues or identify product names that generate positive or negative customer responses. The skill presents a visual slider for each sentiment, which lets the user go directly to the part of the recording where that sentiment occurs. This could be used, for example, to help train agents to be more aware of positive and negative ways that they present information on a call, says Patel.
  • An example of a skill developed by a partner comes from Ephesoft. This detects information in a form or document, such as a legal contract, and extracts it into a custom metadata card in Box that helps automate a process, such as employee onboarding or a loan application. For example, this might detect documents that contain a 'wet' signature and therefore can identify executed contracts.

Implicit in some of these examples is the option for customers to train the framework's third-party algorithms to recognize their proprietary datasets, says Patel.

Most algorithms have had a public data set that they have access to. They have not had private data sets they have access to. What our customers can now do is, if they so choose to, they can enable their private data set being used, so that the machine learning algorithm can be optimized to their dataset, in addition to all that they've done with the public data set.

Graph - not a 'me-too'

The announcement of Box Graph comes more than three years after Microsoft first introduced its Graph. But Box's relationships database, which is still in development, will not be a 'me-too' of the Microsoft Graph, says Patel:

Our Graph will also take into account the integrations that we have. So if you're going out and putting content in Box from Slack you will actually know that, this was content that's put in from Slack, this was content that's put in from Facebook Workplace.

So our entire value proposition is neutrality — and one of those points of neutrality integrates Microsoft's technology. But it's not only Microsoft, we also integrate with Google, we also integrate with Facebook, and Slack, and NetSuite, and Salesforce, and 3,000 other companies. Microsoft primarily is building out the entire stack by themselves.

In any case, he adds, every vendor should have a graph:

Over time there shouldn't be a company that's around that's not going out looking at the telemetry of data and how it's actually being utilized. Just because Microsoft has a graph doesn't mean that we shouldn't have a graph. But the way that we have implemented this is very different.

My take

The reasons why enterprises put their content into Box have evolved over time. In the beginning it was simply a response to employees discovering the convenience of cloud compared to how they had shared files in the past: IT teams turned to Box so that they could gain some oversight of their employees' use of cloud storage.

Box's investments in security solutions then paid off. A few years ago, enterprises would question the security of cloud storage, but nowadays they realise their files are far less secure in ageing enterprise content management systems that were not designed to withstand the security risks of today's connected environment.

Today there is a third reason. Box's platform strategy initiated several years ago is paying off in ways the vendor may not have anticipated at the time. Content in Box can now be exposed to some of the world's leading AI resources. This now begins to build a compelling case to move enterprise content into the cloud — not for defensive reasons but in order to embrace new opportunities.

Alongside this comes Box Graph, which, even if it is an imitation-as-flattery response to Microsoft Graph, is something that Box has to produce. Being able to match relationships is one of the advantages that content and collaboration vendors must leverage.

The one question mark I have is around messaging. There is a sense in which documents are the artefact of the old, paper-based ways of doing business. In the digital era, business takes place in real-time messaging and conversation, which is captured in platforms like Microsoft Teams, Slack and Stride. How Box can make sure that its graph also captures that real-time conversation is a question I'll be exploring further during my time at BoxWorks.
