Graph analysis - some off-the-wall use cases

By Neil Raden, June 4, 2019
Summary:
Graph technology is on the rise with some interesting applications.


The relentless march of digital technology (and I have to admit I’m getting really tired of saying that) assures that all organizations will adopt AI in some form to stay competitive. The cumulative effect of Moore’s Law, an exponential phenomenon, pushed computing to a tipping point. The rapid transformation of computing of all kinds, but especially for analytics and prediction, put to rest the prevailing model of managing from scarcity, a lingering gestalt that has been around since computing was invented.

In reality, we don’t have unlimited resources, but they are plentiful and economical enough that their prominence in the calculus has diminished to the point where we can finally think about the solutions first. What’s on the top of the heap today is the capability to manage and analyze data at a scale unimaginable only a decade ago. The effects of this are not trivial.

Though it is difficult to generalize, most organizations have not kept up. A combination of factors, a sort of perfect storm, is bearing down on industry that will have substantial effects on how digital business is conducted. This includes InsurTech (for insurance companies), Big Data, AI-driven Customer Experience, Edge Computing, Hybrid Cloud Computing and an explosion of DIY (Do-It-Yourself) AI.

Rather than utilizing modern analytical workflows, such as intelligent data management tools and highly functional analytical tools offering frictionless continuous intelligence from ingest to final results, we find organizations still mostly dependent on personal tools such as Excel, Access and, to a certain extent, data visualization tools like Tableau (this is counter to the messages you no doubt hear that everyone is charging ahead with data science and AI).

There is, of course, a proliferation of data warehouses and data marts, but they are designed to provide curated data at a scale and diversity inadequate for the demands of AI. Many organizations are stuck with BI, or worse, which is inefficient and limiting. More importantly, they lack the capability and promise that current environments of predictive analytics, data science, and AI provide. Of course, the introduction of these disciplines will come with critical ethical issues of bias, privacy, security, and transparency.

People in organizations with titles like data steward, data engineer, and data analyst can have a broad knowledge of the intricacies of certain domains of the business, but these silos are counterproductive. What organizations need is a comprehensive AI-driven platform that extends from data ingest all the way through to an active data catalog, driven by a single knowledge GRAPH.

Graph Analysis

Graph theory is a branch of mathematics closely related to topology, pioneered by Leonhard Euler in the 18th century, that is having a renaissance today as a result of the abundance of computational resources. It is not necessary to learn algebraic topology to use graph theory, and there are quite a few commercial and open source products as well as embedded graph capabilities in packaged software. Graphs are essential because it is impossible to navigate through the ocean of unlike data available for modeling and analysis without some tools to illuminate the process. Graphs are about relationships and provide the ability to traverse faraway relations easily and quickly, something for which relational databases are quite limited.
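
To make "traversing faraway relations" concrete, here is a minimal sketch in Python, not tied to any particular graph product. The toy graph, its node names, and the function within_hops are all hypothetical, invented for illustration. It uses a breadth-first search to find everything within a given number of hops of a starting node, which in a relational database would typically take one self-join per hop.

```python
from collections import deque

# Hypothetical toy graph: each key is a node, each value the nodes it links to.
GRAPH = {
    "Alice": ["Acme Holdings"],
    "Acme Holdings": ["Alice", "Bluewater Ltd"],
    "Bluewater Ltd": ["Acme Holdings", "Cedar Trust"],
    "Cedar Trust": ["Bluewater Ltd", "Bob"],
    "Bob": ["Cedar Trust"],
}


def within_hops(graph, start, max_hops):
    """Return {node: hop_count} for every node reachable within max_hops."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distances[node] == max_hops:
            continue  # do not expand past the hop limit
        for neighbor in graph.get(node, []):
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    return distances


reach = within_hops(GRAPH, "Alice", 4)
print(reach)
```

Raising or lowering max_hops changes how far the search looks without changing the shape of the query, which is the practical appeal over chaining joins.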

The important part of managing unalike data is finding relationships. Manual methods for finding relationships in unalike data are too limited to be effective. Technical metadata like column names is useful, but the magic is investigating the actual (instance) data to determine what it is. Without robust relationship analysis in extensive collections of data, error and bias are unavoidable.
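
One simple way to picture "investigating the instance data" is to compare the values two columns actually hold, rather than their names. The sketch below is an assumption about how such a check might look, not a description of any product: the column names, the values, the 0.5 threshold, and the functions jaccard and candidate_links are all made up for illustration. Column pairs that share enough values become candidate edges in a relationship graph.

```python
from itertools import combinations

# Hypothetical columns from three unrelated sources; names and values are made up.
COLUMNS = {
    "crm.cust_ref": {"C001", "C002", "C003", "C004"},
    "billing.acct": {"C002", "C003", "C004", "C005"},
    "hr.badge_no": {"9001", "9002", "9003"},
}


def jaccard(a, b):
    """Share of distinct values that two columns have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def candidate_links(columns, threshold=0.5):
    """Return (col_a, col_b, score) edges for column pairs whose values overlap."""
    edges = []
    for (name_a, vals_a), (name_b, vals_b) in combinations(columns.items(), 2):
        score = jaccard(vals_a, vals_b)
        if score >= threshold:
            edges.append((name_a, name_b, round(score, 2)))
    return edges


links = candidate_links(COLUMNS)
print(links)
```

Here the two differently named customer columns are linked because their contents overlap, while the badge numbers stay unconnected. Real tools would add type inference, sampling, and fuzzier matching on top of this idea.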

This is not a primer on graph analysis. Rather, it demonstrates, with some interesting examples, a capability that most practitioners are not likely to utilize yet but that will be a bedrock capability when they fully utilize AI. The following are some applications to illustrate the unique power of graphs to facilitate AI:

The Panama Papers - massive criminal fraud detection

This is an example of graph analysis used to unravel an impossibly complex set of events, relationships, and circumstances (not to mention government complicity) and to untangle the relationships of the Deutsche Bank subsidiary, Regula Limited. The Panama Papers story, hidden in 11 million secret files and connecting 140 politicians from more than 50 countries to offshore companies in 21 tax havens, was clearly beyond human capability to sort out.

The documents contained personal financial information about wealthy individuals and public officials that had previously been kept private. While offshore business entities are legal (see Offshore Magic Circle), reporters found that some of the Mossack Fonseca shell corporations were used for illegal purposes, including fraud, tax evasion and evading international sanctions. Graph analysis was able to do what a million SQL queries could not. There are query languages for graphs, such as Cypher, which are somewhat similar to SQL and easy to learn, provided the query builder understands the structure of the graph.

In the case of the Panama Papers, the relationships, money laundering, cross-border transactions, corrupt banks and shell corporations only became illuminated when a massive graph was able to provide almost instantaneous answers to questions.
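
The kind of question a graph answers quickly is "how is this person connected to that entity?" The following Python sketch shows the idea on a tiny made-up dataset. Every name, relationship label, and function here (EDGES, build_adjacency, connection) is hypothetical and is not drawn from the actual Panama Papers data or tooling. It finds the shortest chain of relationships between two nodes and returns the chain itself, so an investigator can see why the two are linked.

```python
from collections import deque

# Hypothetical edges (source, relationship, target); none of these are real entities.
EDGES = [
    ("Official A", "OFFICER_OF", "Shell Co 1"),
    ("Shell Co 1", "SHAREHOLDER_OF", "Shell Co 2"),
    ("Shell Co 2", "REGISTERED_BY", "Law Firm X"),
    ("Bank B", "INTERMEDIARY_OF", "Shell Co 2"),
    ("Official C", "OFFICER_OF", "Shell Co 3"),
]


def build_adjacency(edges):
    """Index every edge in both directions so a path can follow either end."""
    adjacency = {}
    for source, relation, target in edges:
        adjacency.setdefault(source, []).append((relation, target))
        adjacency.setdefault(target, []).append((relation, source))
    return adjacency


def connection(edges, start, goal):
    """Breadth-first search; returns the shortest chain of hops, or None."""
    adjacency = build_adjacency(edges)
    previous = {start: None}  # node -> (parent, relation) used to reach it
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            break
        for relation, neighbor in adjacency.get(node, []):
            if neighbor not in previous:
                previous[neighbor] = (node, relation)
                queue.append(neighbor)
    if goal not in previous:
        return None
    # Walk back from the goal to the start, then reverse into reading order.
    path = []
    node = goal
    while previous[node] is not None:
        parent, relation = previous[node]
        path.append((parent, relation, node))
        node = parent
    return list(reversed(path))


chain = connection(EDGES, "Official A", "Bank B")
for step in chain:
    print(step)
```

Each returned step is listed in the order it was traversed, which may be the reverse of the direction in which the relationship was recorded. A return value of None means the two nodes are not connected at all.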

Who is this guy?

When an auction house puts together a collection, they evaluate each piece, take pictures, assess its condition and put it in a catalog with an estimated bidding range. In this case the auction was a collection of autographed and inscribed first edition books and various other letters and written items with the author’s signature. One of the auction employees, let’s just say not the sharpest knife in the drawer, came across a book, “A Connecticut Yankee in King Arthur’s Court,” not one of Mark Twain’s most memorable books. He graded it B+ based on condition and inscription, “To my good friend Nicholas, Mark Twain,” and priced it at $600-$750. As it turned out, the auction house used a massive graph database (as a service), and when the unwitting novice entered the information he was, let’s say, stunned. The inscription was interpreted by the graph database as, “To my good friend, NIKOLA, Mark Twain,” as in Nikola Tesla, one of Twain’s closest friends.

The catalog estimate was quickly adjusted to $5,000-$10,000.
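
How a graph might catch this can be sketched in a few lines of Python. This is a guess at the general approach, not the auction house's actual system: the list of associates is illustrative, the 0.6 threshold is arbitrary, and the functions similarity and resolve_recipient are hypothetical. The idea is to look up the author's known associates in the graph and fuzzy-match the inscribed name against them, so a misread "Nicholas" can be resolved to a plausible real person.

```python
from difflib import SequenceMatcher

# Illustrative graph of known associates; not the auction house's actual data.
KNOWN_ASSOCIATES = {
    "Mark Twain": [
        "Nikola Tesla",
        "William Dean Howells",
        "Joseph Twichell",
        "Henry H. Rogers",
    ],
}


def similarity(a, b):
    """Case-insensitive similarity between two names, from 0.0 to 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def resolve_recipient(author, inscribed_name, threshold=0.6):
    """Return the associate whose first name best matches, or None if none is close."""
    best_name, best_score = None, 0.0
    for associate in KNOWN_ASSOCIATES.get(author, []):
        first_name = associate.split()[0]
        score = similarity(inscribed_name, first_name)
        if score > best_score:
            best_name, best_score = associate, score
    return best_name if best_score >= threshold else None


match = resolve_recipient("Mark Twain", "Nicholas")
print(match)
```

The graph matters because it narrows the candidates to people actually connected to the author. Matching "Nicholas" against every name in a database would produce far too many false hits.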

Provenance

Another example involves the auction of a race car. It demonstrates the ability of graph analysis to provide the investigator with more information than requested, potentially unveiling an extensive set of circumstances that would otherwise be overlooked. This is a good example where someone unknowingly used the power of graph analysis to avoid a very embarrassing mistake:

An auction employee, knowing nothing about race cars or Paul Newman, in constructing the catalog for a collectible auto auction, looked at comparable prices for obsolete race cars that couldn’t be converted to street use, such as the Porsche 935 race car, and was about to publish the listing for $300,000-$650,000 based solely on comparable sales.

Luckily the auction house maintained an exhaustive graph database of relationships and provenance. At the last minute, he realized that the car was far more valuable: it had been driven to a class win and second place overall in the 24-hour race at Le Mans, one of the most grueling races in auto sports, by a beloved Oscar-winning movie star and philanthropist then in his mid-fifties. He quickly revised his estimate to $1,400,000.

It sold for $4,400,000.

Agent compensation

In financial services and life insurance, “…shared compensation between insurance agents or financial service company representatives can be interpreted as a mathematical graph with points (nodes) representing agents and the lines (links) between the agents representing shared compensation…to provide additional input variables to predictive models that provide insight into agent behavioral aspects that other variables may not capture. Overlap ratios can be calculated directly and input variables derived from summary statistics between agents and the other agents with whom they shared compensation…” Second-degree connections for an agent can be summarized to provide variables that offer additional behavioral insights. This is useful for fraud detection, obviously, but also useful for financial services firms to understand the often intricate network of agents and third parties.
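
A rough Python sketch of how such variables could be derived follows. The quoted passage does not define "overlap ratio," so the definition used here (the fraction of an agent's policies that are shared with a given partner) is an assumption, as are the sample data and the function names build_graph, overlap_ratio, second_degree and agent_features.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical data: policy id -> agents who split its compensation.
POLICY_AGENTS = {
    "P1": ["A", "B"],
    "P2": ["A", "B"],
    "P3": ["A"],
    "P4": ["B", "C"],
    "P5": ["C", "D"],
}


def build_graph(policy_agents):
    """Return (shared, totals): policies each pair splits, and policies per agent."""
    shared = defaultdict(lambda: defaultdict(int))
    totals = defaultdict(int)
    for agents in policy_agents.values():
        unique = sorted(set(agents))
        for agent in unique:
            totals[agent] += 1
        for a, b in combinations(unique, 2):
            shared[a][b] += 1
            shared[b][a] += 1
    return shared, totals


def overlap_ratio(shared, totals, agent, other):
    """Assumed definition: fraction of agent's policies that are shared with other."""
    total = totals.get(agent, 0)
    return shared.get(agent, {}).get(other, 0) / total if total else 0.0


def second_degree(shared, agent):
    """Partners of partners who are not themselves direct partners of agent."""
    direct = set(shared.get(agent, {}))
    two_hops = set()
    for partner in direct:
        two_hops.update(shared.get(partner, {}))
    return two_hops - direct - {agent}


def agent_features(policy_agents, agent):
    """Summarize an agent's position in the graph as model input variables."""
    shared, totals = build_graph(policy_agents)
    partners = shared.get(agent, {})
    ratios = [overlap_ratio(shared, totals, agent, p) for p in partners]
    return {
        "partner_count": len(partners),
        "max_overlap_ratio": max(ratios, default=0.0),
        "second_degree_count": len(second_degree(shared, agent)),
    }


features = agent_features(POLICY_AGENTS, "A")
print(features)
```

The resulting dictionary is the kind of row that could be joined onto an agent-level table and fed to a predictive model alongside conventional variables.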

My take

It is not necessary for AI engineers to understand the mathematics of graph theory. It is now a readily available tool for the analysis of relationships, and it is being introduced in commercial software to build knowledge graphs automatically. These examples are only meant to illustrate some situations that benefit from it.