Viber migrates from MongoDB to Couchbase, halves number of AWS servers

SUMMARY:

The popular voice and messaging application needed a more scalable NoSQL solution to grow over the next few years

Hot on the heels of the likes of Skype and WhatsApp, Viber is an incredibly fast-growing application that allows users to make free calls and send free messages, provided that they are hooked up to some sort of decent internet connection. Although the service only launched in 2010, Viber already has 300 million users who send billions of messages and talking minutes across the platform every month. All very impressive – which is why it isn’t surprising that the ‘start-up’ was acquired by Japanese e-commerce platform Rakuten in February this year for a tidy sum of $900 million.

Viber also shows no signs of slowing down. It is currently adding 1 million users to its service every single day and it expects that next year it will need three or four times the infrastructure capacity it has today, which is forcing the company to think about its database requirements in the near future. Although only a few years old, Viber is currently migrating to its third generation database architecture – where it is moving from MongoDB to a NoSQL competitor, Couchbase.

Back when Viber launched and started to see traction in 2011, it had a few thousand users and was using an in-house in-memory database (although it wasn’t specified which one). Amir Ish-Shalom, system architect at Viber, was speaking at a Couchbase event recently in London, where he explained that the company quickly realised that it needed to find a new solution.

“The backend is in charge of sending billions of messages, sub-second latencies, millions of users. When we started Viber it was much smaller and we had our Viber clients connected to our application servers, with a simple in-house in-memory database. This was fine for the first few months, but we soon realised we needed a much more scalable solution – we needed a NoSQL solution that was easy to implement.

“At the time in 2011 there weren’t many solutions, so we chose to use MongoDB and we had to grow with them.”

A second generation architecture

So, Viber implemented MongoDB for its second generation database architecture, where it also moved all of its application servers to Amazon Web Services. It also then added a Redis cache, an open source tool that works in a similar way to in-memory databases, to solve some of the performance issues with MongoDB – Viber found it was having to use Redis to process the bigger datasets it had and eventually began moving some of the datasets out of Mongo and into Redis permanently.

This setup lasted the company until about a year ago, but Viber realised that it needed a more scalable solution to ease some of the performance problems it was having with MongoDB. Ish-Shalom explained:

“MongoDB got us this far, which was very important – it supported us through three years of high growth and usage. We never lost any data from MongoDB, we had downtime and we always got our data back. The performance of Redis was very good, always gave us the speed we needed. 

“But this whole system was not working very well and we needed to look for a different solution. There were problems, first and foremost the performance of MongoDB – it only gave us tens of thousands of operations per second, whereas we needed hundreds of thousands, if not millions, of operations per second. It had problems with very large datasets – we have datasets that are in the billions of records and we found the performance wasn’t good when using these. 

“We have hundreds of application servers that are connecting to our back-end NoSQL database – each of our application servers has multiple threads connecting to database clusters. MongoDB had a separate stack and thread for each of these connections, when you have hundreds of application servers, this is very wasteful in terms of CPU and memory.”
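To see why a thread and stack per connection adds up, the arithmetic can be sketched in a few lines of Python. Every figure below (300 application servers, 50 threads each, 1 MB of stack per connection) is a hypothetical placeholder chosen for illustration, not a number given by Viber or MongoDB, and the function name is likewise an assumption.

```python
def connection_overhead(app_servers, threads_per_server, stack_mb_per_connection):
    """Estimate server-side cost when each client connection gets its own thread.

    Returns (total connections, total stack memory in MB).
    All inputs are illustrative assumptions, not measured values.
    """
    connections = app_servers * threads_per_server
    stack_mb = connections * stack_mb_per_connection
    return connections, stack_mb


# Hypothetical fleet: 300 app servers, 50 database threads each, 1 MB stack per connection.
connections, stack_mb = connection_overhead(300, 50, 1)
print(f"{connections} connections, about {stack_mb / 1024:.1f} GB of thread stacks")
```

Under these assumed inputs the database side would be holding 15,000 threads before doing any useful work, which is the kind of overhead the quote describes.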

A third generation architecture

Faced with these performance and capacity challenges, the Viber team set out to spec a new architecture and began to migrate away from MongoDB and onto Couchbase. Ish-Shalom explained that Viber had the following requirements from the new solution:

Needed to be able to support close to a million operations per second, large datasets with billions of records and be scalable enough to deal with ‘exponential’ growth.

Needed to be robust – Ish-Shalom said that although AWS is good in terms of scalability, it has stability challenges. Viber needed a system that is able to cope with server failures and can continue working without any downtime. It must also be possible to upgrade the system without any disruption to the service.

Needed to be able to back-up the system on a daily basis and store in something like S3.

Ish-Shalom said:

“This brought us to Couchbase. We still have our clients connecting to AWS application servers, which connect to our Couchbase clusters. This time we chose not to use a single cluster, but several different ones – we think it’s better to have different clusters using separate types of operation, we also didn’t want to have too many nodes in each cluster. At the moment 60 is our biggest one.

“In addition we have a back-up cluster, which is being used to replicate some of the more critical parts of the clusters, so we have a live update of the database in case of any failures. Each one of the clusters is uploaded daily to a local drive and then uploaded to S3, just in case of a very big failure.”

With the second generation architecture Viber had one single MongoDB cluster, with three copies of the data, as well as three different Redis clusters. In the new Couchbase architecture, Viber will have seven different clusters and a number of different replicas. It has seen an increase in performance and capabilities, whilst still managing to halve the number of application servers it requires from Amazon Web Services.

Ish-Shalom said:

“Migrating between has been an interesting task. We were already using NoSQL, but we had to migrate a live system, which means that the system was continuously processing hundreds of thousands of operations per second, introducing millions of new users, all through the migration phase. 

“We are not allowed any downtime during the migration and the system has to continue running without even a second of downtime. We have to make sure no data is lost and we are expecting node failures during the migration process, so we have to make sure the migration continues working with these failures – which is not very easy. 

“We also have to make sure the data is consistent, so making sure the data we are migrating is being constantly updated even as it’s being migrated.”
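Ish-Shalom did not describe how Viber keeps data consistent during the move, but one common pattern for migrating a live system is to write to both stores while a background job copies the old records across. The sketch below is a minimal illustration of that general pattern, not Viber's or Couchbase's actual tooling. Plain Python dictionaries stand in for the two databases, and the class and method names are hypothetical.

```python
class DualWriteMigrator:
    """Hypothetical dual-write helper; dicts stand in for the old and new databases."""

    def __init__(self, old_store, new_store):
        self.old = old_store
        self.new = new_store

    def write(self, key, value):
        # Live traffic updates both stores for the duration of the migration.
        self.old[key] = value
        self.new[key] = value

    def read(self, key):
        # Prefer the new store; fall back to the old one for keys not yet copied.
        if key in self.new:
            return self.new[key]
        return self.old.get(key)

    def backfill(self, batch_size=2):
        """Copy existing records in batches, pausing after each batch.

        Yields the number of keys processed so far, so live writes can
        interleave between batches.
        """
        keys = list(self.old)
        for start in range(0, len(keys), batch_size):
            batch = keys[start:start + batch_size]
            for key in batch:
                # setdefault inserts only when the key is absent, so a value
                # already written by live traffic is never replaced by a stale copy.
                self.new.setdefault(key, self.old[key])
            yield start + len(batch)


old = {"user:1": "v1", "user:2": "v1", "user:3": "v1"}
new = {}
migrator = DualWriteMigrator(old, new)

steps = migrator.backfill(batch_size=2)
next(steps)                          # first batch copies user:1 and user:2
migrator.write("user:3", "v2")       # live update arrives before user:3 is copied
migrator.write("user:4", "v1")       # a new user signs up mid-migration
for _ in steps:                      # finish the remaining batches
    pass

print(new == old)
```

The sketch ignores deletes, concurrent writers and node failures, all of which a production migration of the kind described above would have to handle.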

Viber has already migrated five clusters and hopes that it will complete the project by the middle of this year.
