Data management with a crowdsourced human touch

By Phil Wainewright, October 31, 2013

Computer-aided design software company Autodesk has to manage millions of customer records. Even when working with such massive volumes, data quality remains a priority.

Dissatisfied with the results from traditional data service providers and offshore professional services, Autodesk has turned to crowdsourcing provider CrowdFlower to enhance its data. The results have demonstrated the credibility of crowdsourcing this type of micro-task, says Patrick Booher, director of enterprise data management:

"We're seeing higher results with the crowd in terms of accuracy than we saw with offshoring ... I see this disrupting offshoring business process outsourcing for less complex tasks.

"The second disruption is with traditional data service providers. They need to catch up — they're already losing market share to this. They're leaving money on the table."

Booher is responsible for ensuring the quality of Autodesk's business data. That means checking and adding to the information it holds for each organization in the database, not all of which can be acquired when a prospect fills out a registration form. Examples include linking subsidiaries to parent companies and allocating each customer to an industry sector, which at Autodesk influences the sales compensation paid.

CrowdFlower was originally brought in to help with the task of classifying businesses by industry sector. At the time, this information, sourced from traditional data service providers, was known for around 70 percent of customers. That left almost a third where the incentives to salespeople could not be accurately determined.

Autodesk had tried improving this hit rate by using an offshore outsourcer and bringing in temporary staff on-site, but neither route had proved successful. The breakthrough came when the company hired CrowdFlower, says Booher:

"We scaled up with them quickly. They got us up to about 95 percent and we used human labor on site to get us to 100 percent."

Since then, Autodesk has put around 3 million business customer records through CrowdFlower. As well as categorization, tasks have included data enhancement, such as filling in missing information or investigating possible duplicates surfaced by automated tools. The company is now looking beyond the data management area to other tasks such as localization and sentiment analysis.

Algorithmic accuracy

Booher has found crowdsourcing more reliable than other forms of outsourced service because of the more precise performance measurement and cross-checking built into the platform, which rewards contributors for accurate task completion.
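
To make the idea of measurement and cross-checking concrete, here is a minimal sketch of one common approach: score each contributor on tasks whose answers are already known, then weight their votes on real tasks by that score. This is an illustration only, not CrowdFlower's actual method, which the article does not describe; the function names, the sample contributors and the use of known-answer tasks are all assumptions.

```python
from collections import defaultdict

def contributor_accuracy(gold_answers, responses):
    """Fraction of known-answer tasks each contributor answered correctly."""
    correct = defaultdict(int)
    seen = defaultdict(int)
    for contributor, task_id, answer in responses:
        if task_id in gold_answers:
            seen[contributor] += 1
            if answer == gold_answers[task_id]:
                correct[contributor] += 1
    return {c: correct[c] / seen[c] for c in seen}

def aggregate(responses, accuracy, gold_answers, default_weight=0.5):
    """Accuracy-weighted vote on every task that has no known answer.

    Returns {task_id: (winning_label, share_of_total_weight)}.
    Contributors with no measured accuracy get default_weight (an assumption).
    """
    votes = defaultdict(lambda: defaultdict(float))
    for contributor, task_id, answer in responses:
        if task_id in gold_answers:
            continue  # known-answer tasks are only used for scoring
        votes[task_id][answer] += accuracy.get(contributor, default_weight)
    results = {}
    for task_id, tally in votes.items():
        total = sum(tally.values())
        label = max(tally, key=tally.get)
        results[task_id] = (label, tally[label] / total if total else 0.0)
    return results

# Hypothetical sample data: (contributor, task_id, answer)
gold = {"g1": "Construction", "g2": "Media"}
responses = [
    ("ann", "g1", "Construction"), ("ann", "g2", "Media"),
    ("bob", "g1", "Construction"), ("bob", "g2", "Retail"),
    ("cy", "g1", "Retail"), ("cy", "g2", "Retail"),
    ("ann", "t1", "Manufacturing"), ("bob", "t1", "Manufacturing"),
    ("cy", "t1", "Media"),
]
accuracy = contributor_accuracy(gold, responses)
results = aggregate(responses, accuracy, gold)
```

In this sketch a contributor who fails every known-answer task carries no weight in the final vote, which is one simple way of rewarding accurate work over volume.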

CrowdFlower's CEO Lukas Biewald told me that, paradoxically, it's because the company has no structure for taking its workers on trust that it has had to build other ways of assessing output:

"That forces us to have really good procedures. We have more data scientists than any outsourcing company ...

"The reason we've been successful is we've set up incentives for people to do higher quality work — that's really crucial to making the model work."

Booher sums up:

"It's using technology to open up a much wider pool of labor and help manage, measure and improve the quality of work coming from that labor pool."

This means crowdsourcing can yield valuable information that other sources fail to provide. Booher cites the example of identifying company URLs, a task that should have been well within the domain of data service providers but on which they could not deliver:

"You would think it's easy to get a URL for a company ... The best we could get [from specialist providers] was five percent. [So] we crowdsourced that. It cost us a lot of money. It wasn't about cost efficiency. It was about getting data that we really needed to have and were willing to pay a premium for it."

One advantage for a global company like Autodesk is the ability to easily source contributors with diverse language skills and local knowledge. "We have people in every country. They speak any language you can think of," said Biewald.

Understanding how to define tasks and set up automated processes for evaluating the accuracy of results is crucial to the success of crowdsourcing. It means learning a new skillset, says Booher:

"Your results are only as good as the jobs and training that you write. How do you break down your job into very simple tasks and how do you train the crowd to do those tasks? ..."

Biewald says that many of CrowdFlower's customers have relied on the provider to help them define tasks in the most effective way to get results:

"Our model only works if it's completely clear whether someone did the task well."

The vendor found it had to intervene a lot in the early days to prevent customers running poorly designed tasks, he told me:

"When we started we had to present ourselves more like an outsourcing company. Over time we learned best practices and now after a day we're able to get people up and running and building jobs for themselves."

As a further check on job design, CrowdFlower collects feedback from contributors after each task. "If jobs score too low we take them down," said Biewald.
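
As a rough illustration of that feedback check, the sketch below averages contributor ratings per job and flags any job that falls under a cutoff. The 1-to-5 rating scale, the threshold, the minimum number of ratings and the job names are all hypothetical; the article does not say how CrowdFlower scores jobs.

```python
from statistics import mean

MIN_JOB_SCORE = 3.0  # hypothetical cutoff on an assumed 1-5 rating scale
MIN_RATINGS = 5      # hypothetical: do not judge a job on too few ratings

def jobs_to_take_down(feedback):
    """Return the names of jobs whose average contributor rating is too low."""
    return sorted(
        job
        for job, ratings in feedback.items()
        if len(ratings) >= MIN_RATINGS and mean(ratings) < MIN_JOB_SCORE
    )

# Hypothetical feedback collected after each task
feedback = {
    "classify-industry": [5, 4, 4, 5, 4],
    "find-url": [2, 1, 3, 2, 2],
    "dedupe": [1, 2],  # too few ratings to act on yet
}
flagged = jobs_to_take_down(feedback)
```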

Direct integration

Several customers are now starting to take more direct control, Autodesk among them, as Booher explains:

"Where we're moving to in our relationship with this specific vendor is we will license their platform, we will write our own jobs now and distribute them to CrowdFlower's labor pool.

"It puts more of the control in our hands. Looking at the platform now, it's a channel, an access point to labor and not a data service provider."

One of the attractions of directly licensing the platform is the ability to automate integration between Autodesk's data stores and the crowdsourced labor pool. For example, the company is currently working to build a link from its master data management tool to the CrowdFlower platform:

"Every time a new customer record comes in, if it's a potential match [to an existing record] we want to just route it out to the crowd and ideally we want a one-hour service level response time, where the crowd gives us a response and it flows back into the tool."

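The routing Booher describes could be sketched along the following lines: compare each incoming record against existing ones, merge obvious matches automatically, and send only the uncertain cases to the crowd with a one-hour deadline. This is a simplified illustration under stated assumptions, not Autodesk's implementation: it compares company names only, the similarity thresholds are invented, and `CrowdTask` and `route` are hypothetical names rather than part of any CrowdFlower API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from difflib import SequenceMatcher

AUTO_MATCH = 0.95       # hypothetical: at or above this, merge without review
POSSIBLE_MATCH = 0.75   # hypothetical: at or above this, ask the crowd
SLA = timedelta(hours=1)  # the one-hour response target from the quote

@dataclass
class CrowdTask:
    new_name: str
    candidate_name: str
    similarity: float
    due_by: datetime

def similarity(a, b):
    """Case-insensitive string similarity between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()

def route(new_name, existing_names, now):
    """Decide what to do with an incoming record.

    Returns ("merge", matched_name), ("crowd", CrowdTask) or ("create", None).
    """
    if not existing_names:
        return ("create", None)
    best = max(existing_names, key=lambda name: similarity(new_name, name))
    score = similarity(new_name, best)
    if score >= AUTO_MATCH:
        return ("merge", best)
    if score >= POSSIBLE_MATCH:
        return ("crowd", CrowdTask(new_name, best, score, now + SLA))
    return ("create", None)
```

The crowd's answer would then flow back into the master data tool as either a merge or a new record; that return path is left out of the sketch.
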
Another benefit of working directly with the platform is that Autodesk can fine-tune the jobs it runs much more closely:

"It's a very scalable labor channel. We can do many things very fast ...

"We can do lots of little test jobs and get immediate results back and tune and tweak as we go — get a much more iterative, fast cycle."

The human insight that crowdsourced labor can bring is a useful complement to automated data management tools, Booher added:

"It allows us to test and audit our automation. We're improving it using the human touch. We could never do that before ... I think it'll help our automation efforts a lot."

Photo credit: Silhouette figures © mario beauregard; Lukas Biewald headshot courtesy of CrowdFlower.