Too many organisations today are, to a greater or lesser extent, DRIPs - that's data rich, information poor. Older organisations are particularly vulnerable, having legacy applications that are well-entrenched in their operations. What is more, the risks, costs and time involved in attempting to rip-and-replace all that legacy software with shiny new cloud-native services are just too horrendous to contemplate.
At the same time, as the Scottish Environment Protection Agency (SEPA) discovered, so much of our world is changing at an ever-faster rate. And while the raw data tracking those changes is available, the tools to turn it into meaningful information that many can understand have not always been.
SEPA is a classic example of DRIP in action. It has generated a vast amount of data, particularly scientific and observational data, over the years. But it found itself with the equally classic double-bind of being asked for more detailed justifications and explanations, while at the same time lacking the resources needed to quickly extract the required data or prepare it for consumption.
Yet this is a particularly important part of its business, especially for its Informatics Unit. Its job is to provide the regulatory part of the Agency with the necessary scientific evidence on which to both build and defend Scottish environmental policy. To this end the Unit covers a wide range of subjects, including the traditional sciences of chemistry, ecology, hydrology, and oceanic and meteorological science.
Under the leadership of its manager, Mark Hallard, the Unit also integrates these data sets to assess environmental quality, drawing on information from all of those sciences as well as data from external sources covering areas such as environmental monitoring, meteorological information, river flow data, and impacts on agricultural land. As Hallard explains:
We try and bring it together and answer questions around how the environment is performing: is it good? Is it bad? Could it be better? And what do we need to do to get it better? Part of the problem was finding the extra time needed to answer these questions.
Give me the big picture
His answer was to find an analytics tool with a strong bias towards visualisation capabilities – something that could take in the existing (and still growing) data collected on all these areas and present the results in graphical forms that not only SEPA's senior management, but also those to whom SEPA management ultimately report, could readily assimilate and understand.
The choice ended up being Tibco’s Spotfire, whose visualisation capabilities have allowed the science department not only to analyse the data faster and more thoroughly, but to demonstrate extrapolations of identified environmental trends and identify appropriate solutions. For example, Tibco’s Colin Gray points to a case where data from geographic information systems was combined in Spotfire with data about river pollutants. Gray, a data scientist who has been heavily involved in the SEPA project for the last eight years, pointed out that this allowed the organisation to track the flow of pollutants and the effect water treatment works had on them. This in turn has allowed SEPA to work with farmers on their future use of fertilisers and pesticides, he explains:
It's definitely providing the big picture stuff across the state of the environment. But it is also helping to identify the measures that can be taken to help improve things.
SEPA does not have the budget to pay for large cloud clusters or indulge in wholesale rip-and-replacement of existing applications software that, at least in the data handling part of its role, is still working satisfactorily. These are predominantly Oracle databases and applications, coupled with a fair amount of Microsoft Excel, which has often in the past been the primary data presentation tool. There is also a goodly selection of what Hallard calls "weird and wonderful systems" on the science side, many of which come with their own idiosyncrasies.
The big issue was therefore the struggle to get access to all the relevant data, pull it together, then join, transform and clean it, and get it into a format that would make sense to those who had to act upon the information, as Gray observes:
Instead of doing a ‘rip and replace’, they are using Spotfire to pull it together and create new and better views of the data. The nice thing is, they can merge an Excel file with an Oracle database and then connect to a completely different system, and they could never do that before. Spotfire’s in-memory processing is really fast, so it does a lot of the technical work in memory. The actual visual presentation is the nice, fun part, and also the smaller part. Another key thing is that it can call out to specialist statistics engines such as R, which is very popular in economics.
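Gray's point about joining previously siloed sources can be illustrated outside Spotfire. The minimal sketch below (in Python, with an in-memory SQLite table standing in for the Oracle database and a plain dictionary standing in for the Excel sheet; all site names and figures are hypothetical) joins spreadsheet-style metadata onto database readings entirely in memory:

```python
import sqlite3

# Stand-in for the "Oracle" side: pollutant readings keyed by monitoring site.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE readings (site TEXT, pollutant TEXT, mg_per_l REAL)")
db.executemany("INSERT INTO readings VALUES (?, ?, ?)", [
    ("Clyde-01", "nitrate", 4.2),
    ("Clyde-02", "nitrate", 7.9),
    ("Tay-01", "phosphate", 0.3),
])

# Stand-in for the "Excel" side: site descriptions kept in a spreadsheet.
sheet = {
    "Clyde-01": "upstream of treatment works",
    "Clyde-02": "downstream of treatment works",
}

# Join the two sources in memory, the way an analytics layer might,
# keeping readings even when no spreadsheet metadata exists for a site.
merged = [
    (site, pollutant, mg, sheet.get(site, "no metadata"))
    for site, pollutant, mg in db.execute("SELECT * FROM readings")
]

for row in merged:
    print(row)
```

The output is one combined row per reading, ready for visualisation or further analysis, without either source system having been modified.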
Hallard sees the use of Spotfire as bringing benefits from almost diametrically opposite directions. On the one hand, its ability to work directly with SEPA's legacy Oracle applications and datasets means the Agency does not have to face the risks and traumas of changing its current environment, with all the associated costs and disruption. On the other, the output from Spotfire affords SEPA a far wider range of options when it comes to publishing analysis results. This now ranges from the traditional report model, through a wide range of graphical presentations, including interactive predictions, and on to direct publication to the public if that is deemed necessary. It can even be formatted to appear directly on a user’s mobile phone and is, of course, ready to work with any browser as its default output medium.
Having said that, it does also give him the option – at some future time – of adding other specialist applications, or indeed going for that rip-and-replace strategy. Tibco has comprehensive tools available for working with APIs, so interfacing Spotfire with just about any applications environment is not seen as an issue.