AI for AI - evaluating the opportunity for embedded AI in data productivity tools

By Neil Raden, January 9, 2020
AI-for-AI is gaining attention - but is the capacity for embedding AI for data productivity overlooked? Let's do a gut check on the views of industry experts.


In a recent article by the BCG Henderson Institute, How to Win with Artificial Intelligence, the authors posed the question:

Among the many companies investing in artificial intelligence, there is one surprisingly exclusive group: companies that generate value from AI. And right now, at least, the odds against gaining admission are sobering. According to a survey of more than 2,500 executives - conducted for a new report by MIT Sloan Management Review, BCG Gamma, and BCG Henderson Institute - seven out of ten companies report minimal or no gains so far from their AI initiatives. Why do some efforts succeed, but many more fail?

The new report by MIT Sloan Management Review, BCG Gamma, and BCG Henderson Institute offers six suggestions to improve the rate at which organizations can succeed with AI:

  1. Integrate AI strategy with business strategy
  2. Prioritize revenue growth over cost reduction
  3. Take on large projects with big impact - even if they're risky
  4. Align the production of AI with the consumption of AI
  5. Treat AI as a major business transformation effort
  6. Invest in AI talent, data governance, and process change

That's all pretty obvious. The only one that struck me was #3. Not that it is so novel, but rather that the authors take a stand on what is still a hotly debated issue: whether to aim high with big, risky projects, or to start small and build momentum. I wouldn't consider #3 a given. It depends on too many factors.

One area where AI has quietly made verifiable and useful progress is not in applications themselves. Instead, we see rapidly expanding use of AI embedded in processes at the other end of the application pipeline: managing, interpreting, and provisioning information.

The point is, using AI for digital transformation and creating gee-whiz customer-facing apps is a lot harder than engineering embedded AI into tools. It's a win-win because failure to provide clean, AI-ready data is the most frequently cited reason for lack of progress. Getting the data ready, at the scale and cadence needed to push AI through, has become an inhuman task. Using AI to get prepared for AI: it has a beautiful sound to it.
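To make the idea concrete, here is a minimal, hypothetical sketch of the kind of automated data profiling these embedded-AI tools perform. The patterns and function names are my own illustrative inventions, not any vendor's API; real products replace the hard-coded rules below with learned models, but the job is the same: infer what a column should look like and flag what doesn't fit.

```python
# A toy sketch of automated data profiling, the task the vendors below embed
# AI into. This version infers a column's dominant type from character
# patterns and flags values that break the inferred pattern. Hypothetical
# code, not any product's actual implementation.
import re
from collections import Counter

PATTERNS = {
    "integer": re.compile(r"^-?\d+$"),
    "decimal": re.compile(r"^-?\d+\.\d+$"),
    "date":    re.compile(r"^\d{4}-\d{2}-\d{2}$"),
    "text":    re.compile(r".*"),  # catch-all
}

def profile_column(values):
    """Infer the dominant pattern and return (inferred_type, anomalies)."""
    votes = Counter()
    for v in values:
        for name, pat in PATTERNS.items():
            if pat.match(v):
                votes[name] += 1
                break
    inferred = votes.most_common(1)[0][0]
    anomalies = [v for v in values if not PATTERNS[inferred].match(v)]
    return inferred, anomalies

col = ["2019-01-04", "2019-02-11", "not-a-date", "2019-03-09"]
inferred, bad = profile_column(col)  # -> "date", ["not-a-date"]
```

Trivial at four rows; at the scale and cadence described above, with thousands of columns and shifting formats, this is exactly the "inhuman task" that gets handed to embedded ML.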

Scanning an article by Gil Press, 120 AI Predictions For 2020, I found some support for this idea. (Note: the article contains only sixty predictions; it is Part 1, I suppose.) I include these as examples, not as a complete industry survey. Here are some highlights from the respondents, and my reactions.

Joe Hellerstein, Co-Founder, and CSO of Trifacta. Joe is a Professor of Computer Science at the University of California, Berkeley, whose work focuses on data-centric systems and the way they drive computing. He is an ACM Fellow, an Alfred P. Sloan Research Fellow, and the recipient of two ACM-SIGMOD "Test of Time" awards for his research. Hellerstein sees the role of AI behind the data:

Expect also to see increased investment in data preparation—an integral component in any data project that is still often regarded as the biggest bottleneck for many—driving improvement in data quality and relieving IT from the pressures of preparing data.

Haoyuan Li, Founder, and CTO, Alluxio:

What used to be statistical models now has converged with computer science and has become AI and machine learning. So data, analytics, and AI teams can't be siloed from one another any longer. They need to collaborate and work together to derive value from the same data that they all use. In 2020, we'll see more organizations building dedicated teams around the data stack.

MyPOV: And they'll be using tools with AI-assisted development. These dedicated teams will need to drastically reduce the amount of code they write for data management, version control, and one-click deployment of algorithms (or tensors) to the cloud. AI/ML doesn't get enough attention for what it can do for the productivity of AI engineers.

Philippe Vincent, CEO, Virtana:

…enterprises … will require AIOps-based solutions that integrate infrastructure monitoring, workload automation, and capacity planning into one platform. As such, vendors who fail to adopt an AIOps model of service and enterprises who fail to invest in end-to-end infrastructure visibility will be unable to deliver on customer requirements and performance SLAs

MyPOV: Good thought, but it doesn't really talk about AI.

Dan Sommer, Senior Director, Global Market Intelligence Lead, Qlik:

It's easier now than ever to do in-database indexing and analytics, and we have tools to make sure data can be moved to the right place. The mysticism of data is gone: consolidation and the rapid demise of Hadoop distributors in 2019 is a signal of this shift. The next focus area will be very distributed, or 'wide data.' Data formats are becoming more varied and fragmented, and as a result different types of databases suitable for various flavors of data have more than doubled.

MyPOV: No human will be able to organize this smorgasbord without tools with embedded machine learning, deep learning, and NLP.

Sanjay Srivastava, Chief Digital Officer, Genpact:

We’ll see the rise of Digital Ethics Officers, who will be responsible for implementing ethical frameworks to make decisions. This includes security, bias, intended use, and built-in governance

MyPOV: This is the only comment that mentions ethics; otherwise, it has nothing to do with the topic. I just wanted to point out that fifty-nine out of sixty prognosticators overlooked the one thing that is going to be red-hot in 2020.

Yaffa Cohen-Ifrah, CMO and Head of Corporate Communications, Sapiens:

AI enables insurers to better utilize the troves of data at their disposal to benefit from vital client insights that maximize their services and products. This results in satisfied customers and a more efficient business.

MyPOV: Actually, it depends on which type of insurer. Long-tail risks, like life insurance, have data in aging applications, with records forty, fifty or more years old. Utilizing these “troves of data” is very difficult, and ripe for solutions using AI, rationalizing a difficult mix of semantics.

Sanjay Jupudi, President, Qentelli:

2020 will see more focus on explainable AI, to reduce any bias in the predictions. Data scientists will become an integral part of the product teams and work closely with them to create a data-first approach to app development, instead of focusing on making sense of data generated by apps

MyPOV: I don’t understand what he means by “data-first” exactly, but I think it alludes to data pipelines informed by AI, so good.

Laurent Bride, CTO, and COO, Talend:

Whether being used to automate repetitive tasks (data prep, etc.) or connecting pipelines through contextual information from you and your peers, AI will begin to infiltrate all areas of business functions.

MyPOV: I don’t believe anyone else mentioned pipelines. The pharmaceutical company GSK demonstrates in public presentations that its consolidation of all clinical trial data operates with more than 10,000 pipelines, all orchestrated by StreamSets. I have not heard explicitly that StreamSets uses AI in its product, but I suspect it does.

Carl Vause, CEO, Soft Robotics:

With respect to Artificial Intelligence, just because data exists within an organization doesn’t mean that data is in a usable, transferable format. 2020 is the year that businesses will begin to understand that their data is not AI-ready, rendering their business processes inefficient, ineffective or inaccurate.

MyPOV: Not mentioned is the need to get that data usable - demanding AI solutions.

Notably, a few technology providers are infusing their DataOps and/or information integration offerings with AI:

Informatica: Three years ago, Informatica announced a product called CLAIRE, which purported to be a suite of AI capabilities infused in its extensive product platform. It took another two years for me to understand, more or less, what they were doing. CLAIRE is now a standalone product with devoted engineers, marketing, and management, which provides AI (ML, various types of neural nets, NLP) to support the entire product platform.

Unifi Software (now part of Boomi): Unifi OneMind AI technology underlies the functionality from data prep and data catalog recommendations, to the discovery of similar datasets, to natural language query support. Based on a knowledge graph, it employs Recurrent Convolutional Neural Network, Hidden Markov Model, and gene sequencing algorithms.

Trifacta describes itself as a “data wrangler” (which so far is not an official, technical term, but maybe watch this space). It uses a combination of machine learning with human nudging to query and organize data to produce various insights.
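As an illustration of that "machine learning with human nudging" loop, here is a hypothetical sketch (my invention, not Trifacta's actual API): the tool proposes a cleanup step it infers from the data, and a human accepts or rejects it. The proposal logic here is trivially rule-based; in a real wrangling product it would be a learned model.

```python
# A toy human-in-the-loop wrangling step: the tool proposes, the human
# disposes. Hypothetical code; the function names and rules are illustrative.
def propose_transform(values):
    """Suggest a cleanup step based on a simple observation about the data."""
    if any(v != v.strip() for v in values):
        return ("strip_whitespace", [v.strip() for v in values])
    return (None, values)

def wrangle(values, accept):
    """Apply the proposed transform only if the human 'nudge' accepts it."""
    name, transformed = propose_transform(values)
    return transformed if name and accept(name) else values

col = ["  alice", "BOB ", "Carol"]
cleaned = wrangle(col, accept=lambda name: name == "strip_whitespace")
# -> ["alice", "BOB", "Carol"]
```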

Paxata (now part of DataRobot): Paxata has AI-like capabilities, though it is not explicitly described as an AI engine. It possesses the ability to auto-discover dependent data preparation projects and data sets, and it automatically creates multi-project data flows. It goes without saying that DataRobot, a company conceived as an AI company, will add AI to Paxata’s data integration and catalog capabilities.

Tamr: There are two main places where Tamr uses machine learning: entity consolidation (deduplication) and entity classification. An interesting aspect of this is that Tamr uses reinforcement learning when there isn’t sufficient training data to build a model initially. Entity classification (record classification) is a multi-step process, with ML parts.
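To show what entity consolidation means in practice, here is a toy sketch; this is not Tamr's method. Production systems learn a match model from labeled pairs, whereas here a fixed string-similarity threshold stands in for that learned classifier.

```python
# A toy sketch of entity consolidation (deduplication): greedily cluster
# records whose names look like the same real-world entity. The threshold
# is a hypothetical stand-in for a learned match model.
from difflib import SequenceMatcher

def similar(a, b, threshold=0.6):
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold

def consolidate(records):
    """Assign each record to the first cluster whose seed it resembles."""
    clusters = []
    for rec in records:
        for cluster in clusters:
            if similar(rec, cluster[0]):
                cluster.append(rec)
                break
        else:
            clusters.append([rec])
    return clusters

rows = ["Acme Corp.", "ACME Corp", "Acme Corporation", "Globex Inc."]
groups = consolidate(rows)
# -> [["Acme Corp.", "ACME Corp", "Acme Corporation"], ["Globex Inc."]]
```

The greedy first-match clustering and the single global threshold are exactly the naive choices a learned model replaces: real training data lets the system decide, per pair of records, how much variation still means "same entity."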

My take

The bulk of conversations about AI involve people, applications, and automation – the end of the application train. Opportunities for AI to energize the invisible parts of that train are maturing, and they will provide a boost to the AI success rate.

Maybe a little tangential to my suggestion, Hackernoon opines:

When AI is applied to how we develop applications, it will transform the way we used to manage the infrastructure. AIOps will replace DevOps, and it will enable your IT department staff to conduct precise root cause analysis. Additionally, it will make it easy for you to find useful insights and patterns from massive data set in no time. Large scale enterprises and cloud vendors will benefit from the convergence of DevOps with AI.

I’d have to ask Irfan Ahmed Khan what happened to DataOps, which seems to have come and gone in no time, but I suppose AIOps is just smarter DataOps.
