Beyond collecting, collating and analysing data directly related to managing COVID-19 and its immediate impact, what can analytics tech do to help with the aftermath in a post-pandemic economy? This was a question I put to Michael O’Connell, Chief Analytics Officer of Tibco, and to Sharon Daniels, CEO of natural language processing specialist and Tibco partner, Arria NLG.
The two companies have combined their efforts to not only aim Tibco’s Spotfire analytics tools at providing some of the most detailed and granular real-time analysis of global statistics relating to COVID-19, but also use Arria’s natural language AI-based technologies to provide equally real-time detailed explanations of what the statistics refer to and what they actually mean.
As recent events have demonstrated, one of the increasingly common bones of contention in the ongoing struggle to manage the pandemic is what exactly the statistics are showing. There are many different ways of slicing the raw numbers, and Tibco has shown it can not only cut them in a number of different ways, but also use the Arria tools to provide the necessary explanations. Taken outside the pandemic context, this capability could help democratise the availability and exploitation of the biggest of Big Data, at a time when Big Data itself runs the risk of becoming a silo that only an elite with the right skills can make use of.
As Daniels explains it, the combination of the two technology platforms is an example of what will be needed in everyday business, where data is changing, in real time, all the time. And from an experimental perspective, which is at least in part where both companies are coming at this particular project, it’s a universal example that resonates with just about everyone on the planet, simply because there is so much information and such a great need to make at least some kind of sense of it:
With more and more data, the ability to understand the data very quickly in nearly real-time enables businesses and individuals to take action.
The confusion and distortions of very Big Data
According to O’Connell, Tibco has gathered data from as wide a range of sources as possible, wherever it has been published and made available. This includes getting every Tibco office around the world to scrape together all relevant data from their locations, giving a surprisingly comprehensive view of the global state of play with the virus and its development. The trends are smoothed using Friedman’s ‘Super Smoother’, a non-parametric regression technique. This makes it possible to move from a global view of the data down to a very local one, which in this context means drilling down to the Zip Code level in a specific county of a specific state in the US.
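Friedman’s Super Smoother chooses its smoothing span adaptively at each point, but the basic idea – replacing each noisy daily figure with a local average so the underlying trend shows through – can be sketched far more simply. The sketch below uses a plain fixed-span running mean in Python, purely as an illustration of the principle; the daily case numbers are invented.

```python
# Minimal sketch of non-parametric trend smoothing: a centred,
# fixed-span running mean. Friedman's Super Smoother additionally
# picks the span adaptively per point; this shows only the basic idea.

def running_mean(values, span=7):
    """Smooth a series with a centred window of up to `span` points."""
    half = span // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        window = values[lo:hi]          # window shrinks at the edges
        smoothed.append(sum(window) / len(window))
    return smoothed

# Invented daily case counts with reporting noise
daily_cases = [10, 12, 30, 11, 14, 40, 15, 18, 19, 55, 21, 22]
trend = running_mean(daily_cases, span=7)
```

A seven-day span is a natural choice for case data, since it irons out the weekly reporting cycle as well as day-to-day noise.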
Depending on what data is available about a location, it is then possible to identify the medical facilities in that area, the resources they can provide and, quite possibly, how long it would currently take to drive there. The system estimates trajectories that indicate the cases and fatalities that are going to show up, as well as the impact of social distancing. This has been shown to work well across different states in the US. It also allows comparison between different countries across the full range of metrics analysed.
One important challenge here is that when complex, multi-levelled datasets are talked through by a data scientist skilled in the subject, it is all too easy for the non-specialist to lose track of the information transfer. This is where the natural language tools from Arria can play a vital role, producing relevant explanatory information as the data displayed changes. As Daniels points out, this can have some significant benefits:
With what’s going on right now, there’s a lot of misinterpretation. There’s a lot of lack of trust because people might write one thing and it’s biased, or it’s leaning toward a political view, or whatever it might be. But when the narratives are driven based on the data, you remove all of that.
One of the key things here is being able to look at one set of data – in this case healthcare stats – and correlate it with unrelated data, unearthing unexpected correlations that might be either useful or in urgent need of stopping.
For example, using this pandemic data, it would seem obvious that population density would be a key driver of infection hot spots. But other indicators might point to a poor economy (a lack of money), the political attitudes of a population, or any number of other reasons why a hot spot exists or should be expected soon. The work put in by Tibco’s 4,000 worldwide staff, seeking out local numbers and details of local actions or inactions in terms of quarantines and sanitary behaviours, greatly improved the breadth and depth of granularity in the analyses.
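The kind of cross-dataset check described above – does a candidate driver such as population density actually move with infection rates? – can be illustrated with a simple correlation coefficient. The county figures below are invented for illustration; a coefficient close to +1 would support density as a driver, while a value near zero would send analysts looking at the other indicators.

```python
# Hypothetical sketch: testing whether a candidate driver (population
# density) correlates with infection rates across counties.
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

density   = [120, 4500, 900, 30, 2100]   # people per sq km (invented)
case_rate = [0.8, 9.5, 2.1, 0.3, 5.0]    # cases per 1,000 residents (invented)

r = pearson(density, case_rate)
```

Correlation alone never settles causation, of course – which is exactly why the analyses described here fold in so many other local indicators.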
Opening the economics
O’Connell explains that this extra dimension to the analysis process is already starting to be used to look at possible opportunities in opening up economic activity. Its granularity allows local governments and businesses to identify small, targeted opportunities that can get economic activity going in a manageable way, rather than grand, state- or nation-wide schemes that sound good but are hard to implement quickly:
We’re starting to look now at these county levels. I’ve got a project now with a big retailer, a big Tibco customer, and we’re helping them to understand which stores they should open and in what sequence. They’re giving me their online sales data, which has really ticked up over this last couple of months, and their retail store data from before the outbreak. Weather is also a factor - the company sells outdoor clothing and related items, so in different regions they’re driven by the weather and the length of the winter and stuff like that.
Being able to see the hotspot at a very local region, we can overlay the retail stores on top of that, overlay the weather, and overlay the previous store data from before the outbreak. If your sales were down before the outbreak in a certain region and the weather isn’t favourable, and you’re doing ok in that region with online purchases, then maybe that’s a lower priority store to open. So the question now is, how do we take that forward into different use cases that can really help us try to figure out how to get back to work and so on?
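The overlay logic O’Connell describes can be sketched as a simple scoring exercise: each candidate store gets a score built from its local signals, and stores are then ranked for reopening. The field names, weights and store data below are hypothetical, purely to illustrate the shape of the calculation, not the retailer’s actual model.

```python
# Hypothetical sketch of the store-reopening overlay: combine a few
# normalised local signals (all in the range 0-1) into one priority score.

def reopen_score(store):
    # Higher is better: strong prior in-store sales and favourable weather
    # push a store up; local outbreak intensity and strong online
    # substitution push it down. Weights are invented for illustration.
    return (0.4 * store["prior_sales_index"]
            + 0.3 * store["weather_favourability"]
            - 0.2 * store["hotspot_intensity"]
            - 0.1 * store["online_substitution"])

stores = [
    {"name": "Portland", "prior_sales_index": 0.9, "weather_favourability": 0.8,
     "hotspot_intensity": 0.2, "online_substitution": 0.3},
    {"name": "Phoenix", "prior_sales_index": 0.4, "weather_favourability": 0.2,
     "hotspot_intensity": 0.7, "online_substitution": 0.8},
]

# Highest-scoring stores are the first candidates to reopen
ranked = sorted(stores, key=reopen_score, reverse=True)
```

In this toy example the store with strong prior sales, good weather and a mild local outbreak rises to the top of the reopening queue, matching the reasoning in the quote above.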
This raises an obvious question: could the approach be used to guide individual states on what they should do? Given that there are still arguments between states and the US Federal Government over whether, or at what rate, they should open up their economies, this does seem an area where decisions based on the most granular data could provide the best input to possible plans for action. O’Connell acknowledges that it could indeed be used as a guide:
We do have some conversations going on through senior executive leadership teams, contacts with governors of different states, and we are talking to people about that.
There is certainly a learning here for most businesses around keeping close tabs on a much wider range of information than just what a known marketplace is doing today. The wider that market reaches, the greater the depth and breadth of information that needs to be part of the everyday dataset on which a company bases its decision-making. The Tibco decision to make local information gathering part of the ‘job-spec’ of every employee is perhaps an object lesson all companies should adopt.
It has allowed O’Connell’s team to analyse the pandemic from a wide range of perspectives, including longer term business planning. He is well aware that there are now many services available for analysing the nuts and bolts of the pandemic itself, but holds the opinion that wider issues such as business and employment have not been well thought through. The retail company Tibco is helping to sequence stores re-opening is just one example:
Other companies are trying to figure out when they can safely bring people back into the offices. And so we’re working on systems that help on the public health side, but also on the employer side, as people are trying to figure out how to get back to coming into the office or wherever else that might be.
What are the relevant data at a local geospatial level, at a predictive level – commercial transaction sales data, weather data? It all comes together. You bring those disparate sorts of data together, and try to make sense of that complex situation. We can do all the data science and that’s great! But if you then want to make that more broadly applicable to a casual user, then natural language can help in bringing it all together.
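The final step – turning figures into sentences a casual user can follow – can be illustrated with a toy template. To be clear, this is not how Arria’s NLG technology works; it simply shows the general idea of narratives driven directly by the data, with nothing for a human author’s bias to creep into.

```python
# Toy illustration of data-driven narrative generation: a template turns
# a pair of figures into an explanatory sentence. Region names and
# numbers here are invented; real NLG systems are far more sophisticated.

def narrate(region, this_week, last_week):
    """Produce a one-sentence week-on-week summary of new cases."""
    change = this_week - last_week
    pct = abs(change) / last_week * 100
    direction = "rose" if change > 0 else "fell"
    return (f"New cases in {region} {direction} {pct:.0f}% week on week, "
            f"from {last_week} to {this_week}.")

sentence = narrate("Multnomah County", this_week=130, last_week=100)
```

Because the sentence is regenerated every time the underlying numbers change, the explanation stays in lockstep with the dashboard it sits beside.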
Here is a good example of the fact that the problem of dealing with huge volumes of data is two-pronged. One prong is obvious – wrangling the data in order to squeeze the knowledge and the sense out of it. The trouble is that the results can, in their own way, be as deeply obscure as the original data. So the essential second prong is the use of AI-based natural language tools capable of explaining what the answers mean in ways that democratise them. It is one of the small wonders of the world that those who end up knowing a great deal about the minutiae of data are sometimes less able to see the possibilities of connection between datasets, whereas those who know little can, once a suitable explanation is provided, prove capable of making giant connective leaps. It is both prongs, together, that are most likely to poke the post-pandemic ‘new normal’ economies into action.