Harte Hanks picks MapR over Cloudera for Hadoop Big Data engine
Summary: After trialling Cloudera last year, the marketing services company signed a deal with Cloudera’s rival MapR in April 2015 for technology that will underpin sophisticated new analytical capabilities for clients.
Total Customer Discovery, the service this technology underpins, will allow Harte Hanks clients - which include some of the world’s best-known brands such as FedEx and Samsung - to get a complete view of customers by tracking their interactions across different websites, devices and offline channels. Its aim is to address the challenge that disconnected customer-profile data poses to marketers.
Delivering this kind of service requires Harte Hanks to be able to aggregate a stack of structured and unstructured data on behalf of its clients and serve it up to them in ways that give them insights that can be quickly put into action - in the form of a new email marketing campaign targeting a particular customer group, for example, or a new online advertising strategy.
It also requires the company to deploy some serious big data technologies. That is how it came to implement MapR’s distribution of the Hadoop big data framework alongside Splice Machine’s relational database management system (RDBMS), which sits on top of Hadoop deployments and lets users run SQL-based queries against the data they contain.
This combination of technologies, however, wasn’t arrived at without a few detours along the way.
Last year, Splice Machine proudly announced that it was working with Harte Hanks and published a case study outlining how its technology would sit on top of the company’s Hadoop infrastructure.
But at that time, Splice Machine’s database was sitting on top of Cloudera’s distribution of Hadoop, a deployment for which Harte Hanks scooped a business technology innovation award from Ventana Research.
This deployment, Splice Machine claimed at the time, had enabled Harte Hanks to replace the Oracle RAC databases previously powering its campaign management solution...
...which were just too expensive even for the existing data volumes, let alone future growth. Harte Hanks evaluated whether to continue scaling up to larger and more proprietary servers or to seek solutions that can affordably scale out on commodity hardware.
That’s clearly why it settled on Hadoop, an open source framework that can run across huge clusters of low-cost commodity servers.
But by April this year, the situation had changed somewhat, with Harte Hanks signing a deal with Cloudera competitor MapR. So while Splice Machine remains a key element of Harte Hanks’ big data engine, the underlying Hadoop distribution has changed.
What happened?
That was the question I put to Donna Belanger, head of partner tools at Harte Hanks.
The swap was made for two reasons, she says. First, Harte Hanks felt more comfortable working with MapR than with market leader Cloudera - or as Belanger puts it:
“We found MapR to be more responsive to our needs as a customer.”
Second - and what really swung it for MapR - was its proven support for multi-tenancy deployments. In other words, Harte Hanks needed a Hadoop infrastructure that could separate data belonging to different clients into their own secure ‘pools’, each of which could scale up and down according to the needs of that client. Says Belanger:
“At the time that we made the decision to partner with MapR rather than Cloudera, the latter was less able to demonstrate enterprise-level multi-tenancy in a production-ready environment, while MapR provided us with the confidence in their capability of this functionality.”
Plus, she adds, MapR also demonstrated data-mirroring capabilities that would give Harte Hanks better disaster recovery options. So, in summary:
“While Cloudera is a solid Hadoop distribution product for Harte Hanks’ purposes, MapR gave us both the flexibility of multi-tenancy and the responsive support we were seeking.”
In a statement from MapR, Harte Hanks’ head of technology and development Sean Iannuzzi elaborates on this multi-tenancy issue. He says:
“Other distributions of Hadoop would be more expensive, because they would require individual clusters per client. MapR allowed us to leverage the same investment in infrastructure for multiple clients without having to create multiple clusters.”
This argument may be deemed contentious - especially by MapR’s larger rivals, Cloudera and Hortonworks, both of which make their own claims around support for multi-tenancy. But either way, MapR is now in place at Harte Hanks, enabling its clients to store, integrate and analyse massive quantities of data. Says Belanger:
“What we’re trying to do is look ahead. Our clients’ business problems are changing and the way they use our solution is changing, too. It requires us to be able to house bigger and broader sets of data for decision-making.
“Much of that now comes from social and other digital sources. As we bring those streams in, data volumes multiply very quickly and the ability to scale that is something we need to look very closely at, so we can do it in a cost-justified and methodical way.”