How to put data storage on autopilot: use software abstraction layers and ML-augmented data analytics

By Kurt Marko, January 16, 2019
Managing storage is no one's favorite IT task, but a fresh crop of newcomers is automating the core processes using ML-augmented analytics. The cost and efficiency benefits are clear.

The IT zeitgeist is defined by two overarching themes: intelligence, whether artificial, superficial or just feigned, and automation. Many of the articles on diginomica either implicitly or explicitly touch on these topics. Business leaders, like many diginomica readers, want to know how such technologies can be exploited to increase sales, build new products, improve operations and reduce overhead.

In consumer technology, the payoff is product differentiation through better, more relevant user experiences and new features. However, as I recently discussed, IT is also benefiting from advanced data analytics via an emerging discipline colloquially known as AIOps, which applies machine learning to massive streams of system and application telemetry to solve a growing list of operational problems.

AI systems and programmatic automation have a long history in IT operations, where machine learning reveals subtle connections and correlations between events, and predictive analytics proactively identifies and solves problems before something breaks. Although still clouded by ambiguity and exaggeration, AIOps has emerged from the marketing hype to become an increasingly important way of improving IT operational efficiency and infrastructure performance.

Although that column focused on network and application performance management, data analytics and automation are changing other areas of the data center, notably storage systems. Storage is particularly ripe for such improvements given the continued explosive growth in data stored and generated by IT systems, intelligent devices, applications and customers.

Storage management and configuration is ripe for automation

The systems tasked with holding the geometrically expanding datasets critical to machine learning and other enterprise applications are themselves prime candidates for data-driven disruption.

Storage systems are famously tricky to configure, optimize and maintain. That problem explains why most enterprise storage systems have incorporated some form of quantitative automation into their management software. Like any trend, the market is stuffed with hyperbole and debatable claims. But I was reminded of the opportunity for significant improvements during a recent conversation with Guy Churchward, CEO of Datera, and Marc Fleischmann, its co-founder and president.

Datera emerged from stealth in 2016 with $40 million in VC funding to develop storage software that runs on commodity servers and is designed for scalable private or hosted cloud deployments.

Its Data Services Platform works with any underlying storage technology (hard disk, SATA or NVMe flash, or newer forms of persistent memory such as Intel Optane) to provide logical storage devices suitable for a wide variety of price-performance requirements.

Fleischmann co-founded the company in 2013 to develop an automated storage system that can dynamically and bidirectionally scale in response to workload and capacity demand while delivering performance as defined by broad policies, not low-level volume or system parameters. Churchward, a storage industry veteran and former EMC executive, became Datera’s CEO in December to shepherd its growth, both organically and through industry partnerships.

There are many software-defined storage products. Datera’s differentiator is policy-based automation driven by machine learning, designed to maximize usage of the various storage resources while meeting the throughput, IOPS and cost requirements of a diverse set of workloads.
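To make the idea of intent-based placement concrete, here is a minimal Python sketch. It is not Datera's implementation: the tier names, performance figures, `Tier`, `Policy` and `place` are all hypothetical. The administrator states what a volume needs (an IOPS and throughput floor) rather than which device to use, and the software picks the cheapest media that satisfies it. The selection rule here is deliberately simple; in an ML-augmented system the per-tier performance estimates would be learned from telemetry rather than hard-coded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    """A class of storage media (hypothetical figures, not vendor specs)."""
    name: str
    max_iops: int       # sustained IOPS the tier can deliver to one volume
    max_mbps: int       # sustained throughput in MB/s
    cost_per_gb: float  # relative cost units


@dataclass(frozen=True)
class Policy:
    """What a volume needs, expressed as intent rather than device settings."""
    min_iops: int
    min_mbps: int


# Illustrative tiers spanning disk, flash and persistent memory.
TIERS = [
    Tier("hdd", 500, 200, 0.02),
    Tier("sata_ssd", 50_000, 500, 0.10),
    Tier("nvme", 500_000, 3_000, 0.25),
    Tier("pmem", 2_000_000, 6_000, 1.00),
]


def place(policy: Policy, tiers=TIERS) -> Tier:
    """Return the cheapest tier that meets the policy's performance floor."""
    candidates = [
        t for t in tiers
        if t.max_iops >= policy.min_iops and t.max_mbps >= policy.min_mbps
    ]
    if not candidates:
        raise ValueError("no tier satisfies the policy")
    return min(candidates, key=lambda t: t.cost_per_gb)
```

For example, `place(Policy(min_iops=20_000, min_mbps=300))` selects `sata_ssd`: disk cannot deliver the IOPS, and NVMe and persistent memory would also work but cost more.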

Its system automates manual, error-prone administrative tasks and shuffles data between systems using different storage technologies to meet predefined performance and reliability policies, even on multi-tenant systems. Administrators can change policies after deployment and the system will rebalance storage volumes without disruption. Like some other storage products, Datera also collects and anonymizes system telemetry to feed its machine learning models and improve the prediction of system bottlenecks and operational problems.
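The planning half of a policy-driven rebalance can be sketched in a few lines. This is a hypothetical illustration, not Datera's algorithm: `TIERS`, `cheapest_tier` and `plan_rebalance` are invented names, policies are reduced to a single IOPS floor, and the hard part (migrating data live without disrupting I/O) is out of scope. After an administrator edits a policy, each volume's current tier is compared with the tier its policy now calls for, producing both promotions and demotions of over-provisioned volumes.

```python
from __future__ import annotations

# Hypothetical tiers: name -> (sustained IOPS ceiling, relative cost per GB).
TIERS = {
    "hdd": (500, 0.02),
    "sata_ssd": (50_000, 0.10),
    "nvme": (500_000, 0.25),
}


def cheapest_tier(min_iops: int) -> str:
    """Cheapest tier whose IOPS ceiling meets the policy floor."""
    fits = [(cost, name) for name, (iops, cost) in TIERS.items() if iops >= min_iops]
    if not fits:
        raise ValueError(f"no tier delivers {min_iops} IOPS")
    return min(fits)[1]


def plan_rebalance(volumes: dict[str, str], policies: dict[str, int]) -> list[tuple[str, str, str]]:
    """List the moves needed to bring every volume in line with its policy.

    volumes:  volume name -> tier it currently lives on
    policies: volume name -> minimum IOPS its (possibly edited) policy requires
    Returns (volume, source tier, destination tier) tuples; volumes already
    on the right tier are left alone.
    """
    moves = []
    for vol, current in sorted(volumes.items()):
        target = cheapest_tier(policies[vol])
        if target != current:
            moves.append((vol, current, target))
    return moves
```

With volumes `{"db01": "sata_ssd", "logs": "nvme", "web": "hdd"}` and revised policies `{"db01": 200_000, "logs": 300, "web": 300}`, the plan promotes `db01` to NVMe and demotes the over-provisioned `logs` volume to disk, while `web` stays put.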

Fleischmann was inspired to develop a self-tuning, policy- or intent-based storage system by work he did more than two decades ago at HP on its then-revolutionary AutoRAID product. Unknown to either of us until recently, Fleischmann and I share a history with that product: I was one of the first internal beta testers of AutoRAID, using it in an IT test lab I co-built and operated. Indeed, the research paper that introduced the AutoRAID hierarchical storage system sows the seeds of future software-based storage automation (emphasis added).

AutoRAID automatically and transparently manages migration of data blocks between these two levels [RAID 1 mirroring and RAID 5 striping] as access patterns change. The result is a fully redundant storage system that is extremely easy to use, is suitable for a wide variety of workloads, is largely insensitive to dynamic workload changes, and performs much better than disk arrays with comparable numbers of spindles and much larger amounts of front-end RAM cache. Because the implementation of the HP AutoRAID technology is almost entirely in software, the additional hardware cost for these benefits is very small.

AutoRAID hardware (top: power supplies; left: disks; right: controllers).
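The two-level migration described in the AutoRAID passage above can be illustrated with a toy model. The sketch assumes one simple rule: a block that is written moves to the small, fast mirrored (RAID 1) level, and when that level fills, the least-recently-written block is demoted to the RAID 5 level. `TwoLevelStore` and its methods are hypothetical, and the real product's policies and on-disk layout were considerably more sophisticated.

```python
from collections import OrderedDict


class TwoLevelStore:
    """Toy hierarchy: a small mirrored (RAID 1) level over a large RAID 5 level."""

    def __init__(self, mirrored_capacity: int):
        self.mirrored_capacity = mirrored_capacity
        # Ordered from least- to most-recently-written block.
        self.mirrored: OrderedDict[int, bytes] = OrderedDict()
        self.raid5: dict[int, bytes] = {}

    def write(self, block: int, data: bytes) -> None:
        # A write (re)promotes the block to the mirrored level.
        self.raid5.pop(block, None)
        self.mirrored[block] = data
        self.mirrored.move_to_end(block)  # mark as most recently written
        # Demote least-recently-written blocks until the mirrored level fits.
        while len(self.mirrored) > self.mirrored_capacity:
            old_block, old_data = self.mirrored.popitem(last=False)
            self.raid5[old_block] = old_data

    def read(self, block: int) -> bytes:
        # Reads are served from whichever level holds the block; they do not migrate it.
        if block in self.mirrored:
            return self.mirrored[block]
        return self.raid5[block]

    def level_of(self, block: int) -> str:
        return "raid1" if block in self.mirrored else "raid5"
```

With a mirrored capacity of two blocks, writing blocks 1, 2 and 3 demotes block 1 to RAID 5; rewriting block 1 promotes it again and pushes block 2 down instead. The application never sees the migration, which is the transparency the paper emphasizes.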

Having used the technology, I can attest that AutoRAID was a truly revolutionary system that, while simple by today’s standards, inspired Fleischmann to develop the Datera scale-out storage fabric and policy-based storage automation.

Although Datera can provide both block volumes and S3-compatible object storage while using any underlying storage technology, it primarily competes with high-end all-flash arrays such as the Dell EMC VxFlex (formerly ScaleIO). Ironically, given its technological heritage, Datera has just announced a partnership with HP Enterprise, which will resell Datera as part of its HPE Complete program.

According to Marty Lans, a storage GM at HP Enterprise who works with Datera, the partnership resulted from demand by customers and HPE sales reps for a product that could compete with VxFlex and complement HPE's existing storage portfolio. It doesn't hurt that Datera runs on standard servers such as the HPE ProLiant line. Indeed, the partnership agreement includes four SKUs of ProLiant Gen10 systems paired with Datera software and storage devices, sold by HPE as an integrated product.


Predictive analytics spreads to other storage vendors

The irony surrounding the HPE-Datera deal is that a previous HPE acquisition, Nimble Storage, was a pioneer in using predictive analytics to reduce management overhead. Now that Nimble is part of HPE, the company has added similar capabilities to its 3PAR product line via the InfoSight management software.

Tintri, now part of DataDirect Networks, also incorporates predictive analytics into its products and has developed a SaaS management product that uses Apache Spark and Elasticsearch as part of the machine learning backend. Tintri Analytics addresses three areas of storage administration:

  • Performance and resource monitoring including storage metrics for particular applications
  • Resource and capacity planning to project the need for storage capacity, throughput, CPU and memory up to 18 months out
  • System redesign and experimentation via scenario modeling to predict the performance and resource implications of various application deployments and load profiles
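The models behind Tintri Analytics are not described here, so the sketch below swaps in the simplest technique that illustrates capacity projection: an ordinary least-squares linear trend fitted to monthly used capacity and extrapolated 18 months ahead. The function names and sample figures are hypothetical; production tools would account for seasonality, step changes from new deployments, and uncertainty bands.

```python
def fit_linear(ys: list[float]) -> tuple[float, float]:
    """Least-squares fit of y = a + b*x, where x = 0, 1, ..., n-1 (one point per month)."""
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two months of history")
    mean_x = (n - 1) / 2
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def project(history_tb: list[float], months_ahead: int = 18) -> list[float]:
    """Projected used capacity (TB) for each of the next `months_ahead` months."""
    a, b = fit_linear(history_tb)
    last = len(history_tb) - 1
    return [a + b * (last + k) for k in range(1, months_ahead + 1)]


def months_until_full(history_tb: list[float], capacity_tb: float, horizon: int = 18):
    """First month within the horizon where projected usage reaches capacity, else None."""
    for month, used in enumerate(project(history_tb, horizon), start=1):
        if used >= capacity_tb:
            return month
    return None
```

Given six months of usage growing by 10 TB a month (100, 110, ..., 150 TB), the trend projects 160 TB next month, and `months_until_full(history, 200)` returns 5, flagging a purchase decision well before the array fills.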

Indeed, using machine learning to improve management software is rapidly becoming table stakes in the storage industry, with Dell EMC (CloudIQ), Hitachi Vantara and IBM (Storage Insights) among those exploiting data analytics.

My take

Storage system automation and optimization is a natural application of data analytics, and products like Datera show the evolutionary path from simple cases, like automatically setting configuration parameters, to more complicated storage decisions. As I wrote in my column on AIOps,

The glut of data finding its way into every corner of the enterprise is only useful when it is analyzed, summarized, intelligently extrapolated and ultimately, acted upon. Although the impetus for incorporating data analytics and various ML and AI techniques into organizations has rightly been to improve business results, we’re entering the second phase of usage in which the same approaches are being used to enhance internal operations. As one of the primary sources of enterprise data, IT shouldn’t let it go to waste.

The marriage of storage virtualization (which creates the necessary software abstraction layer), granular data measurement and sophisticated predictive analytic models can not only take the drudgery out of storage administration but also significantly improve resource utilization, lowering costs and boosting application performance.

Such capabilities should be on the shortlist of any organization evaluating products for a storage infrastructure upgrade or expansion.