Humans landed on the Moon over half a century ago, despite the limited data processing capabilities of the time. NASA’s scientists and engineers made critical calculations using slide rules and scraps of paper. Astronauts like Neil Armstrong, and Buzz Aldrin in particular, were experts in maths and orbital physics as much as pilots and adventurers. They had huge support teams, but the computers they used were still crude and prone to error.
Today, Earthbound data scientists have access to vast computing and information resources, analytics, and artificial intelligence (AI), yet still the capabilities these technologies promise are not filtering deeper into organisations and helping them to make better, faster decisions. But why?
Dean Stoecker is CEO of data science and self-service data analytics provider Alteryx. He believes that one of the reasons is that technologists have focused too many of their solutions on the ‘astronauts’ – the scientists and PhD-qualified analysts and statisticians – who are out there space-walking. Meanwhile, the ‘passengers’, the bulk of people who are onboard for the mission, are being left out, he says:
Sometimes we talk about the audience for data science and analytics being made up of astronauts, pilots, and passengers. But it’s often the passengers who need to make the decisions every day. Historically, we've had data science in most business sectors, but it has been limited to PhD-trained statisticians. We're woefully short around the world in having enough trained statisticians to solve the world's problems.
That includes today's problem number 1 - Coronavirus:
For example, you would think that with all of the expertise in the world that we would have been able to predict the curves associated with the Coronavirus by city, by calendar, by population, but we're still in the early stages of that kind of data science. Alteryx is busy helping healthcare organizations here in the States figure out who needs masks and ventilators. You keep people from panicking in grocery stores by optimising supply chains and making sure the stores have the right things and that people know about them. It's a good call for people to amp up their data science skills so that we can react more efficiently next time something like this happens, because there will be a next time.
That raises the question of how close we are to creating truly data-led organisations, and how we reach that critical point. Stoecker says:
Around the world we've seen renewed interest in amplifying the skills of ordinary data workers to emulate the skills that trained statisticians have. There are 54 million data analysts around the world, most of whom don't like their jobs much, because they're just not productive or efficient. Our own platform is designed to liberate thinking and reward people for thinking, and makes it easy to work in a drag-and-drop, click-and-run environment to solve complex problems. I think the reason why we've been growing so fast is we created an environment where ordinary people can solve complex data science challenges, without having to be a trained statistician.
But data scientists and analysts are not happy and productive at the moment, posing a challenge for organizations, he suggests:
Trained statisticians aren't happy because a lot of the models they build never get deployed. Meanwhile, the ordinary data worker has been locked out of powerful platforms that allow them to move up the analytic continuum from basic descriptive analytics that would end up in a dashboard, for example, to prescriptive modelling, spatial modelling, and cognitive services.
The world's supposedly going to get taken over by machine learning, but it will never happen unless we amplify the skill sets of 54 million analysts, and allow scientists to actually work on edge cases. We still have autonomous vehicles crashing. We need data scientists to be working on all of these edge cases, like the Coronavirus, like autonomous vehicles, like all the other things that we're challenged by in this world.
But at the same time, businesses need to run more efficiently. They need better insights to leverage the data they hold, both behind the firewall and out in the cloud. And in order for that to happen, you need a platform that is code free for the non-coder, who maybe had a statistics class but wasn't trained to build algorithms, and a code-friendly platform for the PhDs that makes it easy for them to do their work, to deploy algorithms as a service.
You can’t handle the truth
In 2018, the World Economic Forum produced a report looking at the employment impacts of AI, machine learning, robotics, and other Industry 4.0 technologies. That report suggested that the job market would shift strongly towards data science and analyst jobs. Stoecker can see this beginning to happen:
There are lots of technologies that are getting people to think and use data differently. We see how people become very social in their use of analytics. A few years ago, no one would ever have thought that analytics would be social, but people want to learn from each other, they want to amplify their skills, they want to collaborate. It's that old adage that power is not in what you know, it’s in what you share. And we're seeing that in the data science world. And I think, historically, what's kept this from happening has been the lack of easy-to-use software.
A few years ago, we would talk to C-level executives about data science and machine learning, and they would always point you ‘down into the hole’, to the PhDs using last-generation tools like SAS or SPSS. But I think that ‘selling out of the human’ is beginning to dissipate. I think people are beginning to realise that everyday data workers, if they're given a platform that allows them to solve any challenge against any data – big, little, structured, unstructured – without writing a stitch of code, could liberate thinking.
It's all about creating the right user experience, he suggests. Computer scientist and theorist JCR Licklider said in his seminal paper, Man-Computer Partnership (1965), that if we can reduce the friction between the human and the computer, people will get to the thinking stage sooner.
But it remains to be seen how big a challenge the lack of standardisation in many data sets, and the lack of interoperability between them, poses in this data-centric world. If an analyst spends 80% of his or her time just organising the data and only 20% actually looking at it, how big a problem is that? Stoecker reckons:
If you look over the last 20 years or more in the tech world, from the vendor side most of the attempts at innovation have been around creating the enterprise data warehouse, the single version of the truth. Now it’s a great notion, but it’s not gonna happen. It didn't happen then, it's not happening now, and it will never happen in the future.
The enterprise data warehouse is dead because most of the disparate data you need for complex challenges is never going to be standardised. Take a use case like hyperlocal merchandising in retail. With coronavirus, retailers are scared because they’re trying to figure out how to keep the shelves restocked – the right way to create value not just for consumers, but also for the business itself. But to actually do hyperlocal merchandising, you need six or seven disparate databases and they're not all in the data warehouse. They're in SQL stores, Hadoop data lakes, EPOS systems, and price books in Excel. It's all over the place, and we are never going to get to a stage where we have a single version of the truth in the dataset.
Banks in Canary Wharf are spending tens of millions of dollars trying to build Hadoop data lakes, but they’ve forgotten that all the data is going to change again tomorrow. They’re going to create new data sets from a cloud service, a data aggregator, or a social media pipe. None of that data is going to end up in the data warehouse. What we really should be focusing on is letting the data live where it lives – whether it’s structured, unstructured, or semi-structured. And let’s put the power back in the hands of the mere mortals in the line of business departments. People who can actually grab all that data, integrate it, standardise it, enhance it, build models from it, and create value.
This is where Alteryx’s tech comes in. Stoecker explains:
We are a platform, not a point solution. Just like JCR Licklider said, if you take out all the friction between the human and the machine, all bets are off and any question is up for grabs. My intention has always been, if you want to get to machine learning and artificial intelligence, you have to amplify the human first. We have to address the human skills. If you go back to Alan Turing’s day, Enigma, and so on, those problems would never have been solved without human beings programming those machines.
I think we're now focused on the right aspects of data science, and that's the human element. And the hardest part in all of this – in data science, in machine learning – is knowing what questions to ask or what tasks you want to automate. If we free up people's time, the drudgery of getting data ready to process, we allow them to wonder about all of these other questions.
The disaster of digital transformation
So with all that taken on board, what is limiting our ability to solve society's biggest problems? Why aren’t the right tools more accessible to more people? Stoecker can point to some possible culprits:
I blame the tech community more than anything. I think all humans want to be able to use their minds to ask big questions and get big answers. But most of the technology capabilities over the last 20 years have been built for the scientist, as I said.
We see this in the digital transformation efforts that are occurring around the globe. Version 1.0 of digital transformation was a complete disaster, a complete failure by anyone's standards. The last report I read said something like, only three percent of digital transformation efforts since 2010 have seen any sustainable levels of success. And that's because we forced these tools, these efforts, onto the scientists, not the ordinary users.
If that was Version 1.0, what lies in wait for the next iteration? Stoecker suggests:
I think what we're seeing now in digital transformation 2.0 is that executives are beginning to realise that it's not just the scientist that's going to solve challenges, it's going to be the data worker who just needs easier-to-use tools. Digital transformation 2.0 is all about collaboration and up-skilling. It’s not like people don't have the raw intelligence, they've just been absent the tools.
General Electric spent billions of dollars investing in data science platforms built for scientists. We've seen the challenges that IBM Watson has had, and that Salesforce has had with Einstein. These tools were built for the smallest audience of all – the trained statisticians. But if you want to solve problems, you have to broaden the aperture of who can use these capabilities. It's liberating because it gives the data worker more options to build their careers. I've seen data workers move from office to finance to supply chain, and from supply chain into FPGAs and into marketing operations. That never used to happen, because people were stuck in a rut, hating their jobs.