One is the emergence of a limited set of use cases that customers are enquiring about as they start to get their heads round the issues and possible applications. The top two are getting a 360 degree view of the customer and the Internet of Things (IoT), more specifically the Industrial Internet.
Both are being driven by the fact that these are application areas where potential customers can most easily understand where ROI might be found.
Top of the pile is gaining that elusive 360 degree view of the customer. Here, the need is to blend existing structured customer data with unstructured data from clickstreams, social media, emails and a range of other sources. Gallivan says:
For these the company is an arms merchant selling the necessary platform and the domain expertise needed to get an RoI.
The Internet of Things is the second most popular use case, and Gallivan characterises this as users needing a 360 degree view of the sensor:
Here are customers that are the most transformative. This is because we are working with old, stodgy industrial companies that are looking to transform their businesses by making the sensors sing.
With IoT and the Industrial Internet the main use case is preventive maintenance. For example, in the oil and gas production environment, Halliburton uses Pentaho to collect all the machine data coming from its rigs and put it into a Hadoop data lake. From there it can blend all the unstructured machine data with relational, structured data holding specifications, maintenance data, warranties and service data about the rig components, such as pumps and valves.
This is then fed into a Halliburton application used by that company’s specialist engineers to initiate preventive maintenance programmes that not only schedule maintenance before the components fail in production but also ensure replacement components are available on site when the work is done.
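The blending step in the rig example above can be illustrated with a minimal Python sketch. It is not Pentaho's implementation: the record layout, the component IDs, the vibration limits and the `blend` function are all hypothetical, chosen only to show raw machine data being joined with structured component records so that a maintenance flag and a spare-part check fall out of the same pass.

```python
from dataclasses import dataclass
from statistics import mean

# Hypothetical structured records: component specifications keyed by ID.
COMPONENT_SPECS = {
    "pump-01": {"max_vibration_mm_s": 7.0, "spare_on_site": True},
    "valve-07": {"max_vibration_mm_s": 4.0, "spare_on_site": False},
}

# Hypothetical raw machine data, as it might land in a data lake.
RAW_SENSOR_LINES = [
    "2015-06-01T10:00:00 pump-01 vibration=6.1",
    "2015-06-01T10:05:00 pump-01 vibration=7.9",
    "2015-06-01T10:10:00 pump-01 vibration=8.3",
    "2015-06-01T10:00:00 valve-07 vibration=2.2",
    "2015-06-01T10:05:00 valve-07 vibration=2.4",
]


@dataclass
class MaintenanceFlag:
    component_id: str
    avg_vibration: float
    limit: float
    order_spare: bool  # True when no replacement part is on site


def parse_line(line):
    """Split one raw line into (component_id, vibration reading)."""
    _timestamp, component_id, reading = line.split()
    return component_id, float(reading.split("=")[1])


def blend(raw_lines, specs):
    """Join sensor readings with specs; flag components over their limit."""
    readings = {}
    for line in raw_lines:
        component_id, value = parse_line(line)
        readings.setdefault(component_id, []).append(value)

    flags = []
    for component_id, values in sorted(readings.items()):
        spec = specs.get(component_id)
        if spec is None:
            continue  # no structured record to blend with
        avg = mean(values)
        if avg > spec["max_vibration_mm_s"]:
            flags.append(MaintenanceFlag(
                component_id=component_id,
                avg_vibration=round(avg, 2),
                limit=spec["max_vibration_mm_s"],
                order_spare=not spec["spare_on_site"],
            ))
    return flags


flags = blend(RAW_SENSOR_LINES, COMPONENT_SPECS)
for flag in flags:
    print(flag)
```

With this sample data only the pump exceeds its limit, and because a spare is recorded as being on site no order is raised. A real system would draw on far richer signals than a simple average.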
As even the uninitiated will know, unexpected failures in the oil and gas business can be expensive, as well as messy both politically and ecologically, so the potential advantages of preventive maintenance are easy to comprehend:
We are seeing preventive maintenance as the first tap off the ring, and for us it is becoming the biggest IoT use case. And this is where the acquisition by Hitachi Data Systems comes in.
This is because those advantages only come from a much more complex infrastructure of information monitoring, gathering, analysis and management systems, and it is exactly that which HDS is pulling together and building, and of which Pentaho now forms a significant part.
It is hardly surprising that Gallivan sees analysis and data integration as key factors in both pulling such infrastructures together and oiling the flow of data between sensor responses and meaningful management of process outcomes on which business decisions can be made. The factor underpinning such a potentially complex infrastructure is then, he suggests, analysis, and Pentaho’s key contribution is its Streamlined Data Refinery:
What we are doing is putting the analytical power into the hands of the analyst trying to do something with the data. We’re automating that ETL process and putting data into a data refinery where it can be blended.
This is the process of combining and making sense of data from unstructured sources such as a wide range of sensors, plus data from structured sources such as historical maintenance records and component expected performance and lifecycle data.
This, according to Gallivan, is a capability made more important by the batch processing nature of Hadoop, which means it is not yet ready for the real-time processing requirements essential to industrial IoT applications. Pentaho gets round this problem by preparing the data for analysis and managing the data flow through faster, real-time capable systems such as MongoDB or SAP HANA.
These can give users the much lower latency required to provide the real-time preventive maintenance information that is increasingly required. It can, of course, also be extended to the management of more complex tasks such as the optimisation of automated manufacturing environments.
If HDS is serious about going beyond its own extensive IoT requirements to service the wider industrial marketplace, it will have to work with whatever systems customers are already using, in particular well-established relational, structured data systems. So that does mean working with SAP’s HANA in-memory processing technology, though in Gallivan’s view it is not in big data right now:
Data integration in big data is where Pentaho sits, dealing with the unstructured and data-blending worlds and we feed into the next generation data management tools such as Hadoop or NoSQL.
For the analytical work the common flow is to take data out of Hadoop, have it blended, and put it into faster, semi-structured databases such as Greenplum, Vertica or Aster Data.
We see HANA still in the relational world. It’s a helluva product and they should be playing in big data world. It has a rightful place at this table and we’re hoping to help them.
He did not dismiss the notion that SAP had managed to side-track HANA by its efforts to get all its existing applications running on the environment. He also acknowledged that the relational sector is still a $60 billion market, and a sector where the majority of the revenue is to be found at this moment. In his view, it is no longer where the potential is to be found, however.
At the high end of the Industrial IoT infrastructure, where business managements are making decisions based on industrial outcomes, policy management capabilities become far more important. Pentaho’s contribution to this requirement is the application of what he called data lineage.
In essence, this is the auditing and authentication of the data’s ancestry and veracity (that it is what it purports to be), coupled with on-demand blending capabilities. This, he suggests, is particularly important to maintain the health of the data and provide an auditable chain of custody for the data:
Preventive maintenance is the first market for this but it applies in many markets and areas. The next one we see is security, particularly in the home. Safety in the home, especially for older people, is an important opportunity.
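One way to picture an auditable chain of custody is a hash-linked log of processing steps. The Python sketch below is an assumption-laden illustration, not a description of how Pentaho implements data lineage: the step names, source labels and the `record_step` and `verify` helpers are hypothetical. Each entry stores a digest of the data it handled plus the hash of the previous entry, so any later alteration of an earlier step becomes detectable.

```python
import hashlib
import json


def _digest(text):
    """Return the SHA-256 hex digest of a string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def record_step(chain, step, source, payload):
    """Append a lineage entry whose hash also covers the previous entry."""
    body = {
        "step": step,                       # hypothetical step name
        "source": source,                   # hypothetical source label
        "payload_digest": _digest(payload),
        "prev_hash": chain[-1]["hash"] if chain else "",
    }
    entry = dict(body, hash=_digest(json.dumps(body, sort_keys=True)))
    chain.append(entry)
    return chain


def verify(chain):
    """Recompute every hash and check each link back to its predecessor."""
    prev_hash = ""
    for entry in chain:
        body = {k: v for k, v in entry.items() if k != "hash"}
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != _digest(json.dumps(body, sort_keys=True)):
            return False
        prev_hash = entry["hash"]
    return True


chain = []
record_step(chain, "ingest", "rig-sensor-feed", "pump-01 vibration=7.9")
record_step(chain, "blend", "maintenance-db", "pump-01 vibration=7.9 limit=7.0")
intact = verify(chain)

# Simulate tampering with the recorded origin of the first step.
tampered = [dict(entry) for entry in chain]
tampered[0]["source"] = "unknown"
print(intact, verify(tampered))
```

The untouched chain verifies, while the copy with an altered source does not, which is the property an auditor would rely on when asking where a blended figure came from.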
As with many emerging markets, customer education is also proving to be an issue to be addressed. The Pentaho solution is a set of Big Data Blueprints, essentially reference architectures on how to achieve a desired result, such as obtaining a 360 degree view of a customer or building an IoT infrastructure. That will include suggestions for other vendors offering tools and/or domain expertise in the specific area a customer is looking at.
Further out, and quite possibly an opportunity for service provision revenue streams by other HDS businesses, is the notion of providing sector-specific data refineries or lakes. Though the company has no designs on monetising the data directly, Gallivan does see an opportunity in monetising the means of monetising the data. He sees this as the next wave of industrial internet and the company is already doing some of this in specific areas.
For example, in the financial services market, the US financial industry group, FINRA, provides members with analytics services based on a centralised data lake of 30 billion financial trades a day:
It is not something Pentaho would do itself. But as with FINRA, it would be down to an organisation being willing to charter the service. They would be the ones to monetise the data.
360 degree views of customers in retail, and preventive maintenance in industry are both obvious targets as starter markets with huge and readily definable potentials. But IoT also opens up fascinating new opportunities.
For example, the mind-boggling volumes of data that IoT will produce offer tantalising prospects for organisations looking to extract industry or market sector-specific information resources, such as real-time data relevant to insurance actuaries.
That in turn offers interesting prospects to those businesses capable of servicing and hosting such facilities.
Disclosure - at time of writing, SAP is a premier partner of diginomica.