I first learned about Alteryx in the summer of 2011. July 19th to be exact, in Chicago. I didn't know what Alteryx was all about, but I wanted to see my good friend, Mark Smith, founder of Ventana Research and one of the speakers at the Alteryx "lunch and learn" event.
Some Alteryx people who were there, or whom I met shortly afterward, were:
- Rick Schultz, Sr. VP Marketing
- George Matthew, President & COO
- Laura Sellars, Senior Director of Product Management
- Bob Laurent, Director of Industry Marketing
- *Brandy Baxter, Director Communications and Brand
- *Dean Stoecker, Founder, and CEO
*Baxter and Stoecker are still employees of Alteryx.
One of the invited speakers was a statistician from a Chicago company. Her presentation was entirely about the "democratization" of analytics. She demonstrated Alteryx's capability to build models by dragging and dropping statistical and AI icons. I was surprised by this and asked her afterward why she thought it was a good idea. Her premise was that by letting non-statisticians and non-data-scientists experiment, it was possible for them to invent models that the isolated statisticians hadn't considered. Still, the models would have to be vetted and would not go into production until the professionals cleaned them up.
I was never able to follow up to see if this scheme panned out. In earlier incarnations of this idea with Business Intelligence, my experience was that there is an initial surge of interest, followed by disillusionment when business analysts discover how difficult it is to develop credible models, especially when they don't understand the underlying theory. Typically, a handful have success, but the "pervasive" or "democratized" analytics promise doesn't materialize.
In the ensuing months and years, I had quite a bit of contact with Alteryx, but I was always puzzled by their go-to-market strategy. In 2012, I understood them to be a data science platform, but they always had a separate product line for data blending. In fact, at a Tableau conference a year later, George Matthew told me he didn't need a sales force; he had Tableau. So closely did the two organizations work together that pairing an Alteryx data blender with Tableau was almost universal. Of course, Matthew was being tongue-in-cheek, but at a Tableau conference, the analytics product was de-emphasized.
At some point, the Alteryx message came to cover both aspects of the product suite - and it does today. Alteryx has bulked up its capabilities through acquisitions such as ClearStory Data (data analysis and discovery), Feature Labs (feature engineering), Yhat (model deployment), and Semanta (metadata management and governance). Despite my confusion some years ago, Alteryx is now, in their words, a "data science and analytics vendor engineered to make advanced analytics accessible to any data worker."
Chris Middleton of diginomica recently wrote about Alteryx CEO Dean Stoecker in "Data science and the disaster of digital transformation." In the article, Stoecker takes a position exactly opposite to mine. Don't get me wrong: I like Stoecker and admire what he's built. He has worked tirelessly and successfully to build an outstanding company. Alteryx was released in 2006, but the predecessor company, SRC LLC, also founded by Stoecker, was started in 1997.
Where I diverge from Stoecker is when he uses terms like "ordinary workers." I've spent decades working with what I'd rather call "talented analysts who do not apply linear algebra to their work." Sure, some are ordinary, but many are not. So the line between "PhD-trained statisticians" and "ordinary workers" is, as they say, a distinction without a difference.
But it is undoubtedly true that there is a yawning gap in quantitative skills and experience between data scientists and business analysts. Is that gap an insurmountable problem for the so-called digital transformation? What is the potential for creating data-driven or data-led organizations? Here is Stoecker's response, which is his central premise:
There are 54 million data analysts around the world, most of whom don't like their jobs much, because they're just not productive or efficient. Our own platform is designed to liberate thinking and reward people for thinking, and makes it easy to work in a drag-and-drop, click-and-run environment to solve complex problems.
That statement about his platform is undeniably true, but it does not address the situation on the ground:
Fact #1: Eliminating code and replacing it with GUI interfaces that have mysterious effects discourages people at the outset.
Fact #2: It's questionable how many people in an organization are interested in analytics.
Fact #3: Long after the idea of a wider audience for analytics was suggested, daily work in organizations has become punctuated, multi-tasking, situational, and collaborative. Developing a model requires uninterrupted time to conceptualize, experiment, iterate, and prove out.
A little later in the article, Stoecker uses the term "easy-to-use software":
Everyday data workers, if they're given a platform that allows them to solve any challenge against any data - big, little, structured, unstructured - without writing a stitch of code, could liberate thinking.
I can't agree with that, based on Facts #1, #2, and #3.
Let me wrap up by vigorously agreeing with Stoecker when he says:
The enterprise data warehouse is dead because most of the disparate data you need for complex challenges is never going to be standardised. Take a use case like hyper-local merchandising in retail. With Coronavirus, retailers are scared because they're trying to figure out how to keep the shelves restocked - the right way to create value not just for consumers, but also for the business itself. But to actually do hyperlocal merchandising, you need six or seven disparate databases and they're not all in the data warehouse. They're in SQL stores, Hadoop data lakes, EPOS systems, and price books in Excel. It's all over the place, and we are never going to get to a stage where we have a single version of the truth in the dataset.
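Stoecker's point about disparate sources can be made concrete. As a hypothetical sketch (the table, SKUs, stores, and prices are all invented for illustration, and I use an in-memory SQLite table to stand in for an EPOS feed and a CSV string for a price book exported from Excel), blending two sources that will never live in one warehouse looks like this:

```python
import csv
import io
import sqlite3

# Stand-in for an EPOS system: an in-memory SQL store of units sold per SKU per store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE epos_sales (sku TEXT, store TEXT, units INTEGER)")
conn.executemany(
    "INSERT INTO epos_sales VALUES (?, ?, ?)",
    [("A100", "chicago-01", 12), ("A100", "chicago-02", 7), ("B200", "chicago-01", 3)],
)

# Stand-in for a price book kept in Excel, exported to CSV.
price_book_csv = "sku,price\nA100,4.99\nB200,12.50\n"
prices = {row["sku"]: float(row["price"])
          for row in csv.DictReader(io.StringIO(price_book_csv))}

# The "blend": join the sales feed against the price book to get revenue per store.
revenue = {}
for sku, store, units in conn.execute("SELECT sku, store, units FROM epos_sales"):
    revenue[store] = revenue.get(store, 0.0) + units * prices[sku]
```

The join key (`sku`) lives in neither system's schema documentation, which is exactly the kind of cross-source knowledge a data blending tool asks the analyst to supply.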
So here we are on the same page. Quoting myself from a recent article, Solving data integration at scale - DataOps, knowledge graphs and permissioned blockchains emerge:
There is no such thing as context-free data; data cannot manifest the kind of perfect objectivity that is sometimes imagined. At a certain level the collection and management of data may be said to presuppose interpretation. "Raw data" is not merely a practical impossibility, owing to the reality of pre-processing; instead, it is a conceptual impossibility, for data collection itself already is a form of processing. As an industry, we have made stumbling and inadequate progress in applying data to solving problems.
Is digital transformation dependent on 54 million analysts learning to build classification, categorization, and regression models, and stochastic models like Monte Carlo, Markov processes, and Bayes nets? No. I believe the gap between the rara avis of data scientists and business analysts will shrink somewhat. Still, as in that question I asked the statistician at the Alteryx meeting in Chicago in 2011, the roles will blend, and those without the credentials will be supervised and vetted by those who have them.
The reason is that it is still too dangerous to let data speak for itself. If you need a model for the shop floor, you need to know something about the shop. And God help us if we begin to rely on Machine Learning for medicine without good bench science.
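To give a sense of what the stochastic models named above actually demand, here is a toy Monte Carlo sketch of my own (nothing to do with any vendor's platform): estimating pi by random sampling. Even this simplest of simulations carries statistical fine print - the estimate's error shrinks only as 1 over the square root of the sample count - which drag-and-drop tooling can hide but not remove.

```python
import random

def estimate_pi(n_samples: int, seed: int = 42) -> float:
    """Monte Carlo estimate of pi: the fraction of random points in the unit
    square that land inside the quarter circle, multiplied by 4."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    inside = sum(
        1 for _ in range(n_samples)
        if rng.random() ** 2 + rng.random() ** 2 <= 1.0
    )
    return 4.0 * inside / n_samples

estimate = estimate_pi(100_000)
```

With 100,000 samples the estimate is good to roughly two decimal places; knowing whether that is good enough for a business decision is precisely the judgment call that separates the vetted model from the experiment.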