Back then, analysts were giving products such as Informix, Ingres, Sybase and Rdb as much chance of becoming the clear leader of the then-nascent sector as SQL Server, DB2 and the eventual winner, Oracle, which today commands around 50% of the market, according to Gartner, with RDBMS revenues greater than those of the rest of the top five combined.
It is those kinds of numbers, and the chance to mold the next Oracle, that brought Gary Bloom (himself a 14-year veteran of the Redwood Shores, CA software giant) back into the database fray in 2012 after a dozen years in storage, security and smart meter management software. And even at this early stage of the NoSQL market, there are signs that his bet was a sound one.
“It has been at least 25 years since there was a transition of database technology, and I looked at [the current market dynamics] and concluded that if Oracle became this dominant force by focusing on managing about 20% of the data [the fifth that’s structured], what could I do with the next generation of database technology that deals with the other 80% of the world’s data that is highly unstructured?”
Coming from a purely enterprise software background, Bloom had mixed views on the NoSQL field: while he recognized it as the next generation of database technology, he saw that none of the players took the fundamentals of enterprise software seriously, with one exception. “At the time, MarkLogic was the only company paying attention to enterprise-class issues: the notion that says you have to have security, high availability and transactional consistency, and all these things that real data centers demand if they are going to run your technology. Just because you’re thinking of changing database technologies (and there are lots of advantages to doing so) doesn’t mean you throw away all of the requirements for things like data protection and proper backups that recover after a disk failure. Those are table-stakes qualities,” says Bloom, who became CEO of the company two years ago.
That differentiation aside, it also didn’t escape his attention that MarkLogic was already the biggest player in a young, crowded market.
According to market analysts at Wikibon, MarkLogic led the $542 million market for NoSQL and Hadoop software and services in 2012 with a 13% share, ahead of Cloudera (10%), IBM (9%, mostly from services) and MongoDB (7%). Wikibon reckons the market doubled last year and will surge by another 70% this year to reach $1.7 billion, and grow at a 45% CAGR through to 2017, with NoSQL the slightly larger of the two segments.
Unlike many of its rivals, MarkLogic owes its pole position to a history that dates back to 2001. Its core product fuses database, search-style indexing and application server operations, using XML documents as its data model, so its early wins naturally came in document-centric industries such as content publishing. As Bloom points out, its technology could manage documents based on their actual content, by indexing their words and values, rather than only at the level of the whole document. That gave it a competitive edge over traditional document management tools like OpenText and Documentum. Over the last decade it has consolidated that position, and in content publishing its product is now something of a standard.
Dow Jones, for example, is in the process of moving one of its major revenue streams, the Factiva financial information resource, from an Oracle-based system and an Autonomy document management system to MarkLogic. “They are doing a complete standardization on MarkLogic across all of their digital properties,” claims Bloom.
The other mainstream area where MarkLogic has seen success over several years is anti-terrorism. “That is a massively heterogeneous data use case. The US federal government has information sharing agreements with numerous other agencies in the US and elsewhere. But the US teams receiving the data don’t control the formats so they need to ensure they have a very flexible technology if they are going to be able to work with those. There are whole global anti-terrorism programs that are completely dependent on a MarkLogic database today,” he maintains.
Those were the early adopters. But the number of companies who have realized they have a content management problem has swollen. “From auto manufacturers with service guides going back 30 years to public authorities managing real estate licenses, pretty much everyone has a content management problem,” he says.
More recently, the company has been pushing aggressively beyond that content-centric world: “One thing that has dramatically changed in MarkLogic’s business is the realization that not only is NoSQL technology great for all that unstructured data, it is also extremely powerful for heterogeneous data.”
From banks to healthcare
Two examples of this currently dominate his thinking. One involves a large international bank based in London that has come under regulatory pressure to bring together its trading data from 20 different systems so it can be analyzed in a ‘trade store.’ Those transactional systems, built in Sybase, Oracle, mainframe databases and others, were designed by the bank to operate independently, ironically for regulatory reasons.
“The financial authority is now saying, ‘We’re going to regulate you as a single entity.’ That kind of heterogeneous data problem is very difficult to solve with relational technology; you’d have to build a data model that describes all the different systems, all their different versions, and create a layer which transforms everything into a standard format. The issue is that whenever one of those source databases changes (adds a column, changes a table structure), you’d have to rewrite all those layers above it.” That is not necessary in a NoSQL environment, he says.
“So it has turned out that not only are we really good for this unstructured data — rich media, video, and so on — we are also very good for traditional data too.”
Another large-scale project where unstructured and structured data come together in huge quantities is HealthCare.gov, the ‘ObamaCare’ US federal marketplace for affordable health insurance, which uses MarkLogic as its underlying database system.
While Bloom acknowledges there were serious problems at launch, the system was reconfigured last November and has been stable ever since, with high uptime and good response times. “We have all this unstructured policy data coming in from insurance companies and agencies across the US, none of which has a standard format. If they change their format and you were in a relational mode you’d have to change all your table structure to deal with those new formats that you don’t control.”
But MarkLogic also handles the transactional side of buying insurance, registering policies and passing them on to the insurance companies who manage the customer relationship. “We have now signed up several million US citizens to healthcare policies through the HealthCare.gov system, and the database is the hub for all the related IRS, tax, immigration and credit data.” The system handles thousands of concurrent sessions, he says: 35,000 on a normal day, while peak periods have seen 80,000 to 85,000 users online.
Relational staying power
Bloom’s not saying companies should be thinking of abandoning their relational products. Far from it. “It is just that there are some modern problems that relational is not so hot at. If you’re running your straight SAP general ledger on top of an Oracle dataset, I won’t recommend you use NoSQL for that. If you have purely unstructured data, then a NoSQL database is very good for that. If you are a cross between the two it depends on the application, but there is a huge class of applications that NoSQL is better for.”
While relational vendors, rather than other NoSQL companies, are MarkLogic’s primary competition today, Bloom says the NoSQL vendors (especially the majority whose products are based on open source NoSQL code) are actually the source of its best leads.
Open lead generation
That’s because MarkLogic is focused on serving corporate customers’ needs, he argues. “How well do the open source NoSQL products serve them? Some of them have backup, but that is about it. And they are going in with a story that there’s no security, there’s no high availability, there is no transactional consistency. What the open source guys have been for us is by far the best lead generation engine one could ever hope for.”
“Open source companies like Cloudera and MongoDB have spent hundreds of millions of dollars evangelizing the market. Most of that money went on persuading people they need another database, and we’re a huge beneficiary of that,” he says.
“They have helped people get familiar with NoSQL technology. Their products are easy to download, run really fast, and allow you to build an application quickly. It is a great experience for developers who can solve a problem in a couple of hours that they may have been working on for six weeks with relational technology. But when they want to run that application in production, they find they don’t have [the enterprise fundamentals] expected in the Oracle, SQL Server, DB2 and Sybase world.”
As Bloom emphasizes, we are just at the beginning of a market-reshaping trend, with companies finding more and more challenges that are not well served by relational technologies. For someone once tipped as a possible successor to Oracle’s Larry Ellison, the role of potential giant slayer is very much to his liking. “We are already the biggest in the market, and the fact that the incumbents are so dismissive of what we are doing, well, I love it. I’ve had a great career (a great time at Oracle, a really interesting time at Veritas), but this is certainly where I’ve had the most fun.”