How data sprawl chokes innovation — and what to do now

By Erik Duffield, November 19, 2021
Summary:
Decades of data sprawl are choking innovation. Erik Duffield of Hakkoda explores five strategies to help data flow faster and remove friction, for Tercera.

(Data engineering innovation concept © wan wei - Shutterstock)

Companies are in an existential race to leverage data inside and outside their business, and every business leader knows it.

That’s why so many leaders are taking data initiatives into their own hands.

The result is an uncontrolled epidemic of data sprawl: the unplanned result of layer after layer of individual initiatives in which data is copied, organized, and modeled across various applications, creating duplication or, worse, inconsistent logic.

It’s the natural — but costly — byproduct of organizations chasing opportunity.

To understand the state (and cost) of data sprawl, Hakkoda commissioned Dimensional Research to survey 300+ business and IT leaders responsible for data and analytics initiatives at mid-to-large-sized companies. The results were eye-opening, to say the least.

We knew data sprawl was an issue in the enterprise, but respondents confirmed just how expansive — and costly — the problem is. Consider a few of these stats:

  • One in four data leaders has more than 10 BI and reporting apps.
  • 35% use five or more data warehouses.
  • 25% have over 500 data analysts in their organization.
  • Nearly a quarter rely on 10 or more service providers.

When asked where costs were highest, the top answer was BI and reporting (54%). This isn’t surprising given the number of apps and the sheer number of analysts using those apps. A quarter of respondents had more than 500 data analysts in the organization. If each data analyst earns between $80,000 and $100,000 a year, those organizations are spending as much as $50 million per year just trying to keep up with data sprawl.

However, the real cost of data sprawl is harder to calculate but even more consequential to a company’s future success. The vast majority of respondents (94%) reported barriers to innovation in their data programs.

Decades of data sprawl are choking innovation.

When our researchers asked what’s making it hard to innovate, the number one barrier was a lack of internal expertise: 97% of respondents reported problems finding talent. The hardest role to hire? Data scientists skilled in machine learning.

Unfortunately, this talent shortage will only get worse. Spending on big data and business analytics (BDA) is forecast to exceed $215 billion in 2021, an increase of more than 10% from the prior year. Forrester predicts that by 2021 over 60% of B2B sellers will be enabled by AI and automation.

Even when leaders can spot innovation opportunities, they have no way to pursue them. The bottom line: you can’t innovate with data you can’t analyze, and you can’t analyze data with engineers you can’t hire. So what should leaders do? The first step is a mindset shift: from hiring new talent to doing more with the talent they already have.

Here are five things you can do right now to get back to innovation:

Remove friction via automation

The most obvious way to cope with a talent shortage is to push machines to do more of the work. The right approach can help you automate processes for data ingestion, transformation and validation to increase the speed and accuracy of data moving to your analytics. When data flows faster to the right places, the effects are felt quickly. Insights happen faster and innovation thrives — without adding headcount.
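To make the ingest → transform → validate idea concrete, here is a minimal sketch in Python. The field names, sample data, and validation rule are illustrative assumptions, not from the article; real pipelines would use orchestration and data-quality tooling rather than hand-rolled functions.

```python
import csv
import io

# Hypothetical sample feed; the columns and values are invented for illustration.
RAW = "customer_id,revenue\n42, 1999.50 \n43,not-a-number\n"

def ingest(raw_csv):
    """Ingestion: parse raw CSV text into a list of row dicts."""
    return list(csv.DictReader(io.StringIO(raw_csv)))

def transform(rows):
    """Transformation: normalize whitespace in every field."""
    return [{k: v.strip() for k, v in row.items()} for row in rows]

def validate(rows):
    """Validation: split rows into valid and rejected by a simple type rule."""
    valid, rejected = [], []
    for row in rows:
        try:
            row["revenue"] = float(row["revenue"])  # must be numeric
            valid.append(row)
        except ValueError:
            rejected.append(row)
    return valid, rejected

valid, rejected = validate(transform(ingest(RAW)))
print(len(valid), len(rejected))  # 1 valid row, 1 rejected row
```

Once each stage is an automated, testable function, it can run on a schedule with no analyst in the loop; bad records are quarantined rather than silently polluting downstream reports.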

Empower your data scientists

Help your existing data experts work smarter, not harder, by giving them better tools and workflows that let them perform complex transformations on larger datasets. This will not only improve productivity and throughput; it should also improve retention and motivation. The best data scientists want to do interesting science, not dull prep.

Here are three things you can — and should — do now as you begin to scale your teams and data science capabilities:

  1. Automate data pipelines;
  2. Implement MLOps processes and tools that include continuous deployment and continuous training; and
  3. Move to a feature store to centralize feature management and access.
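The third step, a feature store, is just the idea that features are defined once and served consistently to every model. A toy in-memory sketch (class, method names, and feature logic are all illustrative assumptions; production systems use dedicated tools) might look like:

```python
class FeatureStore:
    """Toy feature store: one registry of feature definitions, one serving path."""

    def __init__(self):
        self._features = {}   # feature name -> compute function
        self._entities = {}   # entity id -> raw record

    def register_feature(self, name, fn):
        """Define a feature once, for every training and inference consumer."""
        self._features[name] = fn

    def upsert_entity(self, entity_id, record):
        """Store or refresh the raw record backing an entity's features."""
        self._entities[entity_id] = record

    def get_features(self, entity_id, names):
        """Serve a consistent feature vector for the requested entity."""
        record = self._entities[entity_id]
        return {n: self._features[n](record) for n in names}

store = FeatureStore()
store.register_feature("order_count", lambda r: len(r["orders"]))
store.register_feature("avg_order_value", lambda r: sum(r["orders"]) / len(r["orders"]))
store.upsert_entity("cust-1", {"orders": [20.0, 40.0]})

print(store.get_features("cust-1", ["order_count", "avg_order_value"]))
# {'order_count': 2, 'avg_order_value': 30.0}
```

Because every model reads the same registered definition, teams stop re-implementing "average order value" slightly differently in each pipeline, which is exactly the inconsistent-logic problem data sprawl creates.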

Cloud-based data platforms such as Snowflake, along with emerging data science tools and environments, have opened up many possibilities for innovating with data. If you’re not investing in these next-gen platforms and tools, it will only get harder to keep up with the competition.

Create Innovation Pods to push capabilities beyond existing constraints

Innovation can’t happen when internal teams are bogged down in daily ‘run’ operations. Consider creating Innovation Pods with the help of outside partners. Combining internal depth with external skills, techniques and resources not only gives you a multi-discipline team to focus on the big, hairy challenges, it also provides the flexibility to adapt skills and focus as needs change.

Modernize with rich data applications

Nearly every company is in the process of modernizing dozens or even hundreds of internal and external legacy applications. Don’t treat it like a chore; use it to drive innovation. For example, with the right rich data services you can feed AI-driven insights into these applications, or build on no-code application platforms like Unqork that are built to serve the data.

Get the help you need to get your data house in order

Going it alone is hard. Working with an outside data engineering partner can provide the guidance, support and solutions you need to get your data house in order. The other big advantage of using partners? As 'insider-outsiders' they’re free to challenge orthodoxies and help you drive the organization to exciting new places — no matter where you are in your data journey.