Reddit is a world-famous US-based social news aggregation, web content rating, and discussion website. Composed of a network of communities where individuals can find experiences built around their interests, hobbies, and passions, it has more than 430 million global users ("Redditors") monthly and over 4 billion monthly engagements from users, including votes, posts, and comments. It also claims to be home to the most open and authentic conversations on the Internet.
As a result of its popularity, however, it was beginning to have a problem: 15 years of content at this level of interaction is a lot. As Jon Gifford, Principal Engineer at the group, puts it, while user-supplied content is as interesting, useful, insightful and funny today as it was a year ago, or five, or ten, there is simply no way to present the breadth and depth of all of that content as a feed.
Instead, enterprise-class search tools are offered:
By using search, Redditors can discover the best content regardless of what their preferences are or what community they're visiting. Search gives us the ability to display the many different types of content that you can expect to find, and is more than just another feed that keeps you up to date. It means we can help our users answer questions or conduct research on almost any topic imaginable, from buying a new car or gadget or jacket to deciding between makeup products, from discovering the hottest K-pop band to learning about molten salt nuclear reactors. And since Reddit is an online platform, we want to ensure that we're using the best available technology to deliver our product.
But for most of its early life, Reddit's search functionality was so bad it became its own meme ("Where do you see Reddit in 10 years?" "Reddit search might work by then"). Indeed, a common workaround, used by Gifford himself before he joined the team, was to use Google to search for content on Reddit because it was quicker.
That was seen as a problem, because good search matters to a website in two ways:
First, good search helps our existing users fall more deeply in love with a product they already adore; I can't think of a better advertisement for a product than having everyone who uses it say good things about it. Second, as we improve our ability to deliver results that can be tailored to the type of experience that users want, we increase the number of people who want to use the platform.
Helping moderators better serve their communities
Just over a year ago, with the platform continuing to grow, management agreed search finally needed to be taken more seriously at Reddit, and a dedicated internal team was set up to take charge. A key decision was then taken: to see what AI (Artificial Intelligence) could bring to the table, says Gifford. This led Reddit to work with Lucidworks, which claims to be able to apply key AI techniques like Natural Language Processing, clustering and classification algorithms, and Machine Learning to improve the search experience. A big attraction of this particular vendor for Reddit was its commitment to key Open Source technologies such as Lucene and Solr.
It's a move, says Gifford, that has allowed Reddit to improve both the site's search functionality and the overall user experience. This mainly comes through what he calls a focus on relevance, which means indexing more of its data while also "taking another look" at how it's currently indexing the data it has.
We have a rich set of metadata available to us from our data science teams, who have been working on understanding more about our various communities and how our users engage with the product. Adding this data to our search systems will significantly improve the quality of our search results.
At the same time, the new tool is also addressing some more mundane but still important tasks, such as spelling corrections and better search suggestions. Hard work has also gone into supporting Reddit's large set of volunteer moderators with an upgraded search tool.
That's actually invisible to most users, but moderators have heaped praise on this improvement since it makes it significantly easier for them to serve their communities. And as we mitigate potential risks, we are designing safeguards to protect against errors and making sure we are optimizing for the right metrics.
Has this improvement been measured? Gifford says it has: the default search results page now supports a big increase in search volume (a five-fold increase in 12 months) alongside a very welcome drop in latency, with the worst performers, the so-called 99th percentile results, coming in at under 100 milliseconds instead of 300. There has also been a marked increase in uptime, he adds.
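For readers unfamiliar with the term, a 99th percentile latency is the response time that 99% of requests meet or beat, so it describes the slowest tail rather than the average. The sketch below shows one common way to compute it, the nearest-rank method. The function name and the sample latencies are invented for illustration and are not Reddit's data or tooling.

```python
import math


def percentile_nearest_rank(samples, pct):
    """Return the pct-th percentile of samples using the nearest-rank method."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in the range (0, 100]")
    ordered = sorted(samples)
    # 1-based rank of the smallest value with at least pct% of samples at or below it.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


# Hypothetical request latencies in milliseconds (made-up values):
# most requests are fast, a few are slow, and one is an outlier.
latencies_ms = [20] * 95 + [80] * 4 + [300]

p50 = percentile_nearest_rank(latencies_ms, 50)
p99 = percentile_nearest_rank(latencies_ms, 99)
print(f"p50 = {p50} ms, p99 = {p99} ms")
```

In this made-up sample the median stays low while the 99th percentile reflects the slower requests, which is why tail latency is often tracked separately from the average.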
Gifford says there have also been some very positive signs from recent work in Machine Learning, with active plans to incorporate those advances into search. But how will this improved search change the user experience going forward? In the short term, Gifford states, the team will continue working to grow search volume and relevance, but it will also be looking to roll out improvements to how the tool handles queries. The motivation here: allow it to do a better job of determining the intent of each and every query, and of deciding how to handle the query to get the best results.
This would be a significant upgrade to the system and help the brand respond more quickly to any changes it sees in user search behavior, ultimately allowing automatic adaptation.
Layered on top of that, we are also looking at how we can use ML to fine-tune our search results, though this is a long-term project, with search and ML operating in a symbiotic relationship. We'll be fleshing out our approach here in the next six months using internal prototypes for validation.