Health researchers need better data - Talend joins the fight against Coronavirus with an open source ETL tool for COVID-19 data
Summary: My first talk with new Talend CEO Christal Bemont hit on the surge in their cloud growth and their role in business transformation. But we kept coming back to the imperative of data quality - and that's exactly how Talend is pushing into the fight against COVID-19.
I don't seek out many vendors for content. Talend is one exception; they provoke the right data conversations. As in my 2018 piece: IT must get unstuck from a legacy cycle to turn data into a business asset.
In 2019, we pushed into multi-cloud and serverless hype: Cloud data migration is taking off, but why hasn't multi-cloud lived up to its billing?
But change has a way of being relentless, does it not? In January 2020, Talend named long-time SAP Concur exec Christal Bemont as its new CEO.
And I'm willing to bet: this year is a tad different than what she signed up for. This week, I got Bemont on the phone to find out how Talend and its customers are faring. My timing was good: they are joining the COVID-19 fight with a public data project.
Joining a cloud transformation in progress
Rewind for a second: what prompted Bemont to take on this role? Former Talend CEO (and current board member) Mike Tuchen brought Talend a long way in seven years. Why the change? Short answer: Tuchen was ready to hand off the baton, and Talend's cloud pursuits loom large. As Bemont told me:
I spent many months with Mike and the board. That was a great opportunity for me to really get under the covers and look at the area I really specialize in, which is scaling and building companies around cloud. When I joined Concur, we were an on-prem company. Talend was going through the same process of on-prem to cloud.
Talend's cloud revenues are surging. At the time of my last chat with Tuchen, it was at 14 percent. Bemont told me that it's been growing 100 percent year over year, with 179 percent growth in cloud revenue last year. "I think it just became a snowball effect," says Bemont.
But hold up - vendors can get into trouble here, patting themselves on the back about cloud revenue, without following the cloud business model into deeper change. That's exactly what Bemont wants to drive at Talend:
Cloud go-to-market is not just about one thing. It's not just about a delivery mechanism for a product. It's not just about how you line up your salespeople. It's about orchestrating an entire business front to back.
Yep - it's about supporting your own internal transformation, and the one your customers embarked on. The pursuit of a new data architecture without a business imperative is a non-starter. Of course, thanks to COVID-19, the circumstances under which this plays out are dramatically different. But one thing hasn't changed: you start by listening to your customers. The questions Bemont asked when she took on the job are still among the right ones:
What are the attributes of the business we need to pivot on to really start thinking about: why are people buying? What's important to them? What are the things in the ecosystem they're thinking about? Then we need to make sure that we're lining up around something that pivots from a tool that is in the hands of a developer, to a strategic business partner - one that's in the hands of an organization that has a reason to want to change their outcomes for a number of different use cases.
Talend joins the fight with a free ETL tool for COVID-19 datasets
We can't put the brakes on change, but we can - and should - put the brakes on the happy talk of business growth. Now is the time to help, not to sell. So how is Talend helping? Start with this: Talend joins the fight against COVID-19: unlocking the best data for health researchers. A joint team from Talend, Bytecode and developers from the Singer open source community built an ETL tool for COVID-19 datasets. As Talend's Thomas Bennett explains:
We standardize the data, augment it with metadata, then route the results to a data warehouse or data lake: Amazon Redshift, Amazon S3, Snowflake, Microsoft Azure Synapse Analytics, Delta Lake for Databricks, or Google BigQuery. Data engineers and scientists can run the tool on their own infrastructure or use Stitch for free.
Why the project? Because the major COVID-19 data repositories, from Johns Hopkins to the New York Times, lack a standard data format. This is a classic Talend obsession: even minor variations in data structure can pollute data sets, or distract data scientists from important missions. Bennett:
The data stored in these repositories lacks a common format. For instance, the EU Data comprises data from different countries, and the header names for the same type of data differ. Even slight changes like these require data professionals to take extra time and steps to cleanse and standardize data. Having these datasets processed through our ETL gives users guaranteed consistency for this data so they can focus more on their models or visualizations and make faster and more confident decisions.
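To make the header problem concrete, here is a minimal Python sketch of the kind of standardization Bennett describes. It is not the Talend/Singer tool itself. The alias table, the `standardize` function, the `_source` metadata field, and the sample header names are all assumptions made up for illustration.

```python
import csv
import io

# Hypothetical alias table: maps source-specific header names (lowercased)
# to one canonical column name. Real feeds would need their own entries.
HEADER_ALIASES = {
    "country/region": "country",
    "countriesandterritories": "country",
    "confirmed": "cases",
    "daterep": "date",
}


def canonical_header(name):
    """Return the canonical column name for a raw header."""
    key = name.strip().lower()
    return HEADER_ALIASES.get(key, key)


def standardize(csv_text, source):
    """Parse CSV text, rename headers to the canonical schema,
    and tag every row with the feed it came from (simple metadata)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = []
    for raw in reader:
        row = {
            canonical_header(k): (v or "").strip()
            for k, v in raw.items()
            if k is not None  # skip overflow fields from malformed rows
        }
        row["_source"] = source
        rows.append(row)
    return rows


# Two illustrative feeds that name the same columns differently.
feed_a = "Country/Region,Confirmed,Deaths\nFrance,10,1\n"
feed_b = "countriesAndTerritories,cases,deaths\nFrance,12,2\n"

combined = standardize(feed_a, "feed_a") + standardize(feed_b, "feed_b")
```

After this step, every row in `combined` carries the same keys (`country`, `cases`, `deaths`, `_source`), so downstream models or visualizations do not need per-source special cases.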
This project traces back to Talend employees who wanted to make an impact on what we are all facing down. As Bemont told me:
We had some folks on our team who found out that Johns Hopkins and NHS and a few other organizations were collecting all these different sets of data. They were trying to make it meaningful and usable for researchers to try to solve for coming up with vaccines and so forth. So when we were thinking about what we can do for our employees, we started this COVID-19 response.
Internal teams rallied:
We said, "What could we do to leverage the services we offer?" People came up with the idea of "Let's go take Talend, and let's go do some of the work on behalf of these datasets that are being produced." So hopefully, researchers can get to an answer faster.
The wrap - employees want to make a difference in the COVID-19 fight
I talked to Bemont about the multi-cloud issue, and Talend's cloud-agnostic position. We covered the potential of "AI" in data cleansing - something Talend is heavily investing in right now; I expect them to have more to say about that later this year.
But Bemont kept bringing the conversation back to data quality. That practical bent makes sense - after all, AI gets exposed when data sets are flawed (Neil Raden just wrote another piece which hits on that).
COVID-19 is pushing us all, and there's no looking back. Bemont says Talend customers like Domino's are:
Trying to look at things in a different way - how do they instrument their business, because they have a whole different set of circumstances. It's not about looking back. It's about trying to figure out, based on all the new variables, how to look ahead.
Suddenly, in a sense, Domino's is on the front lines. But if they treat this only as a crisis or a stress test, they miss a chance for deeper change. Bemont:
If you look at the example of Domino's, they were already moving from a brick and mortar to an e-commerce driven business. But the most recent thing is: contactless payments. How do we deliver pizza without having contact with our customers?
COVID-19 is forcing people to not only look at their business differently, but reshape it and reform it. We're right in the middle of that, making sure that those new data elements are there. You've talked about how Talend is the data pipes. But it's that second part you talked about in that last interview with Mike - this is where data quality and strong trust that you have all the data you need comes in. You need data you can use to get to the right, predictable outcomes for the future.
Times are too uncertain to paint a rosy view of tech pulling us through to a better place. But it sounds like Bemont and her Talend team have found their mission:
It's a really strong role that we can play, at a moment in time that is so critical to help empower these businesses to see around the corner... When we were looking at how we can help our employees, this emotional and mental piece of it - people are looking for a way to contribute; people are looking for a way to play a role and do their part.