Databases with meatballs - IKEA's migration from flat(pack) data model to open source Postgres

By Stuart Lauchlan, July 14, 2020
Summary:
Global retail icon IKEA has seen a rise in online customer activity, prompting an overview of the firm's database landscape and ultimately a shift to an open source platform around Postgres.

With its famous Big Blue Box store design present in 30 countries around the globe, IKEA is one of the world’s most recognizable brands for millions of customers looking to kit out their homes with trendy furniture and accessories. Behind the scenes, the Swedish retailer has undergone its own modernization makeover, one centered on a migration to the Postgres open source database.

It’s a shift that’s been driven by wider trends in the retail marketplace and changing customer needs, says Dinesh Adhikari, Infrastructure Manager at IKEA Retail, particularly in terms of increased customer meeting points as people visit online via the company website or through the IKEA app. The popularity of online options increases the firm’s reach, but also adds to the complexity of demands on the systems underpinning the retailer’s day-to-day operations. There’s already a lot going on, says Adhikari: 

There are supply chain systems behind it, there are financial systems behind it, there are customer solutions and then co-worker solutions to run such a large operation. Most of the [operational] information is stored in the databases and sent out to different systems, received from different systems, sent out to external partners that we have in supply chain and other areas.

Delivering those databases is something that the Infrastructure team at IKEA provides, he adds, powering and supporting the rise of e-commerce, for example. But managing the database portfolio was becoming a bigger ask, he recalls: 

We have been using databases for quite a long time. We have built different database versions over the years. Running our database operations is a bit of an ‘old task’ that we are doing within the company. As the number of stores has grown and the business operations have grown, we have seen quite a lot of challenges occur. Over the years we have created quite a lot of complexity to do with how we work with databases or what our databases can provide.

We have worked with mostly proprietary databases in our portfolio, so every time [a user] comes needing a database, we just go, 'OK, you're going to use this database and this database only', and that has worked for us so far. That has worked quite nicely, but we see that there are gaps, there are shortcomings. We have received quite a lot of feedback that our operations, which have worked so far, probably will face [more] challenges. We are not quick enough, we are not doing enough to meet what is required from us [in terms of] how we are changing our application landscape around us.

The result of this feedback was to kick off an assessment of database strategy to determine what changes could be most beneficially made. Adhikari explains:

It was a huge review that we did for our landscape. We have been to every corner, basically lifted up every stone. We wanted to know what we have and how we are running before we made any decision saying, ‘This is what we want to do’. So in this review, we looked at our deployed base: how many databases do we have, and where are these databases running? Do we run in our data centers? We have stores where we have some solutions which use databases - what kind of databases do we have in place [there]? What are our policies? What kind of processes do we have existing today when it comes to deploying those databases, removing those databases, making modifications on these databases? How are they fulfilling the needs we have, and where are they in terms of meeting the need in the future?

The review also included lifting the lid on life inside database operations and examining the workloads of the DBA teams internally at IKEA: 

What kind of challenges do they face? What kind of task categorizations do they have? This whole assessment is to take down each part of this for the operations and then the whole estate… What is actually done inside the database? What kind of logic is coming in, what kind of code do we have in place? Then we also did a little bit of mapping of that onto our applications to see what kind of applications we have. But the main usage part was more [to get] insight from a DBA point of view, from a database operations point of view, from a database capability point of view. But at the same time, we also looked at what kind of capabilities we need. So we had some discussions with architects, we also spoke to some of the application owners. We were just looking into: what is it that we are missing? What do we need more of? How can we make better services or applications to use?

And inevitably the final element of the review was around Total Cost of Ownership:

It's not just about our database licensing cost or one particular cost in compute. What is the total cost in terms of how much it costs to run a DB? How much does it cost to build on a DB? How much does it cost to do life cycle management? These are the costs we actually looked at from different aspects to get an understanding of our whole estate.

A catalogue of outcomes 

The entire review ran for a little over two months and threw up some interesting outcomes that would go on to shape future decisions, says Adhikari:

There are quite a lot of facts that came out which we probably would have ignored if we had just kept looking at the database from a holistic point of view. We found out that we have more development and test databases than actual production. I can't remember exactly what the ratio was, but it was quite a high number compared to the number of production databases. So that tells us we do quite a lot of testing - and that's a good thing - but then we spin up a lot of instances, probably more than what we need. We also learned that our provisioning is quite slow, backed up by a heavy process. That's where we have a lot of handovers between the different teams, so the provisioning process was extremely slow.

Unprovisioning was even slower:

It takes quite a lot of effort to get a database in place, and then it takes quite a lot of effort to get the database out of place. And just because of this complexity of doing provisioning and unprovisioning, the rate of decommissioning was quite low, meaning we were provisioning more, but then we were not removing enough. That is adding complexity to our operations; that is adding to the TCO to run those databases in place. For the period of time those databases are in large test environments, they start to live their own life cycle, so the state becomes so contaminated that even though they are test and development databases, they cannot be used, because they need to be fresh. They need to be deleted, they need to be recreated again. That was quite an interesting finding from a practice point of view.
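
In Postgres terms - and purely as an illustration, since IKEA's estate at the time was largely proprietary - "refreshing" a contaminated test database amounts to dropping the stale copy and recreating it from a known-good template. A minimal sketch, in which the database names and connection string are assumptions:

```python
# Illustrative sketch only: refresh a stale test database by recreating it
# from a template database. Names and the admin DSN are hypothetical.
import psycopg2
from psycopg2 import sql

def refresh_test_database(admin_dsn: str, test_db: str, template_db: str) -> None:
    conn = psycopg2.connect(admin_dsn)
    conn.autocommit = True  # DROP/CREATE DATABASE cannot run inside a transaction
    try:
        with conn.cursor() as cur:
            cur.execute(sql.SQL("DROP DATABASE IF EXISTS {}").format(sql.Identifier(test_db)))
            # Requires no other sessions to be connected to the template database
            cur.execute(sql.SQL("CREATE DATABASE {} TEMPLATE {}").format(
                sql.Identifier(test_db), sql.Identifier(template_db)))
    finally:
        conn.close()
```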

Other learnings included the conclusion that DB operations is a bit of a mundane role: 

DBAs are basically engaged in creating storage, they are involved in creating data or creating table-spaces, extending table-spaces. They are involved in creating data refresh cycles, back-ups, monitoring back-up jobs, so it's become quite a mundane operation to run. What we also learned is that our compute utilization is quite low, so whatever compute capacity we are provisioning, we are not using as much. So we create databases, we create a compute infrastructure, we allocate storage, we allocate CPUs and memory to secure quite a lot of things, but then we are not utilizing that much in the end.

The nature of the underlying database tech itself also threw up some unexpected points, he adds: 

Once we looked inside the databases, we found many of them had a flat data structure. Some applications were just using the databases to store data in tables and columns. There were no or minimal [relational elements], so there was very little code inside. That was actually quite an interesting finding. We would not usually go inside the database and look at what we are running inside. We also found out that a large number of the databases are around 30GB, which we would call quite a small size. We have big databases running multiple terabytes, but a large number of databases are quite small.
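
That kind of inside-the-database inspection can be approximated with a short script. As a hedged sketch - written against Postgres, and not IKEA's actual review tooling - one might count user-defined routines and tables and measure on-disk size to spot databases that are little more than "tables and columns":

```python
# Illustrative profiling sketch: gauge whether a Postgres database is used
# as little more than flat table storage. Assumes psycopg2 and a caller-supplied DSN.
import psycopg2

def profile_database(dsn: str) -> dict:
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        # User-defined functions and procedures (excluding system schemas)
        cur.execute("""
            SELECT count(*)
            FROM pg_proc p
            JOIN pg_namespace n ON n.oid = p.pronamespace
            WHERE n.nspname NOT IN ('pg_catalog', 'information_schema')
        """)
        routines = cur.fetchone()[0]

        # Ordinary tables in user schemas
        cur.execute("""
            SELECT count(*)
            FROM information_schema.tables
            WHERE table_type = 'BASE TABLE'
              AND table_schema NOT IN ('pg_catalog', 'information_schema')
        """)
        tables = cur.fetchone()[0]

        # On-disk size of the current database
        cur.execute("SELECT pg_database_size(current_database())")
        size_bytes = cur.fetchone()[0]

    return {
        "routines": routines,
        "tables": tables,
        "size_gb": round(size_bytes / 1024**3, 1),
        # Crude heuristic: plenty of tables but no server-side code
        "looks_flat": tables > 0 and routines == 0,
    }
```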

Planning a makeover 

Based on all this information, IKEA’s next step was to consider what was happening in the wider database industry and how that could be mapped onto the retailer’s changing needs:

What we have bought [before] has worked for us, has worked very well in the past, but now the time is coming when changes are happening quite rapidly. We have a 46% increase in sales through online channels. That means, where e-commerce is picking up, we need to do faster deployment changes, we need to have better testing capabilities in place. Having those long lead times or having a contaminated test database is probably not a very big help.

The decision was taken to go for a database services provisioning approach, rather than rolling out standard database deployments. There were several drivers behind this move, explains Adhikari:

We wanted to consolidate our databases. We want to leverage what we are provisioning in a much lighter way, so we have enough of everything - not too much, not too little. And we want to also create basically an API and database platform services, where we are able to do provisioning on-the-fly, where people are able to create databases on demand, and they should be able to remove those databases or clone databases on demand. There are quite a lot of HR operations that we want to run on our databases, and we want to be able to scale up whenever we have a need, or scale down if the need is not there anymore. So [it is about] creating models which basically create flexibility and agility in our database platforms.
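
To make the "databases on demand" idea concrete, the sketch below shows what a thin client over such a platform API might look like. The service URL, endpoints and payload fields are hypothetical assumptions for illustration, not IKEA's actual service:

```python
# Hypothetical illustration of self-service database provisioning behind a
# platform API. Endpoint names and fields are assumed for the sketch.
import requests

BASE_URL = "https://dbaas.example.internal/api/v1"  # assumed internal endpoint

def create_database(name: str, size_gb: int = 30, environment: str = "test") -> dict:
    """Request a new Postgres database instance on demand."""
    resp = requests.post(f"{BASE_URL}/databases", json={
        "name": name,
        "engine": "postgres",
        "size_gb": size_gb,
        "environment": environment,
    }, timeout=30)
    resp.raise_for_status()
    return resp.json()

def clone_database(source_name: str, clone_name: str) -> dict:
    """Clone an existing database, e.g. to get a fresh test copy."""
    resp = requests.post(f"{BASE_URL}/databases/{source_name}/clone",
                         json={"name": clone_name}, timeout=30)
    resp.raise_for_status()
    return resp.json()

def remove_database(name: str) -> None:
    """Decommission a database when it is no longer needed."""
    requests.delete(f"{BASE_URL}/databases/{name}", timeout=30).raise_for_status()
```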

We also wanted to create fit-for-purpose databases. As I said before, we have worked with proprietary databases. We have worked with relational databases over the years, but given the facts that we had [uncovered] in the assessment, some of those enterprise-grade databases were just used to store tables and columns. So, what does that say? It says maybe we are not using our database technologies in the right way. So, we need to have a better database offering where it basically fulfils that need and there is a balance between the cost and the performance we require, the availability and scalability that we require.

Automation and cost reduction were also critical factors in the new thinking: 

We also wanted to create a DB platform with efficient operations, meaning that [we can create] tasks, create automations, create an environment where things can happen through machine learning, through creating capabilities which trigger certain responses and fix those mundane jobs in a much better way. We also looked at the new [capabilities we need to] have. How should we handle interruptions in a better way? How should we handle data applications in a better way? How should we work with our test data? How should we back up? How should we recover? How are we able to work with these new capabilities in this target outcome? And of course, it's the TCO reduction. If we are able to put a lot of automations in place, then how does it impact our TCO? We want to reduce the TCO if we are not using our compute quite as much [as when] we are deploying proprietary enterprise-grade databases.
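
As a small example of automating one of those mundane jobs, a nightly logical backup with pruning of old dumps might look like the sketch below; the paths, retention window and reliance on pg_dump are illustrative assumptions:

```python
# Minimal sketch of automating a routine DBA task: nightly pg_dump backups
# plus pruning of old dump files. Paths and retention are assumptions.
import subprocess
import datetime
import pathlib

BACKUP_DIR = pathlib.Path("/var/backups/postgres")  # assumed location
RETENTION_DAYS = 14

def backup_database(dbname: str) -> pathlib.Path:
    """Run pg_dump in custom format; credentials come from ~/.pgpass or the environment."""
    BACKUP_DIR.mkdir(parents=True, exist_ok=True)
    stamp = datetime.date.today().isoformat()
    target = BACKUP_DIR / f"{dbname}-{stamp}.dump"
    subprocess.run(
        ["pg_dump", "--format=custom", f"--file={target}", dbname],
        check=True,
    )
    return target

def prune_old_backups() -> None:
    """Delete dump files older than the retention window."""
    cutoff = datetime.datetime.now() - datetime.timedelta(days=RETENTION_DAYS)
    for dump in BACKUP_DIR.glob("*.dump"):
        if datetime.datetime.fromtimestamp(dump.stat().st_mtime) < cutoff:
            dump.unlink()
```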

Enter Postgres

Conversations with infrastructure architects on how to turn those needs into reality included debate around selecting a relational database provider solution, recalls Adhikari, but the focus shifted to Postgres: 

In the discussion we basically spoke about certain database names and why we should choose them, where should we go? But in the end, we basically said, ‘Let’s have a test-and-learn activity around Postgres. What can Postgres do for us, what is Postgres? Let's dig down into it.’ So we spent a few weeks learning about Postgres. We basically installed Postgres. We created some use cases and we tested Postgres around those use cases, and then had quite a bit of activity around this, so that the technical team that worked with it saw it is a really good product, something we can use, and we can use here and now.

There were a number of key reasons for coming to this conclusion, he says: 

It's basically an open source relational database. From the experience point of view, [it was] the simplicity of the product. There's quite a lot of confidence in our applications team and our technical team to start using Postgres. One piece of feedback I got from my colleagues is that the [Postgres] community… is strong. There is quite a lot of contribution from [the community], there is quite a lot of development and a lot of knowledge sharing happening around the community. So community was another reason for us going towards Postgres. It also has an architecture which basically meets our scalability needs - how are we able to create clusters? How are we able to create highly available environments, highly scalable environments? How are we able to bring mission-critical applications into Postgres? - so the architecture itself and all the extensions or frameworks that come with Postgres are definitely a big plus.
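
One of the high-availability building blocks he alludes to is streaming replication. A minimal sketch of checking standby health from a primary - with the connection string and lag threshold as assumptions - could look like this:

```python
# Illustrative check of streaming-replication health on a Postgres primary.
# The DSN and lag threshold are assumptions for the sketch.
import psycopg2

def replication_status(primary_dsn: str, max_lag_bytes: int = 16 * 1024 * 1024):
    with psycopg2.connect(primary_dsn) as conn, conn.cursor() as cur:
        cur.execute("""
            SELECT application_name,
                   state,
                   pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS lag_bytes
            FROM pg_stat_replication
        """)
        standbys = cur.fetchall()

    for name, state, lag in standbys:
        healthy = state == "streaming" and (lag or 0) <= max_lag_bytes
        print(f"{name}: state={state}, lag={lag} bytes, healthy={healthy}")
    return standbys
```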

But despite the appealing simplicity of the product, Postgres was going to have to undergo what Adhikari calls “enterprise level upscaling”, and that brought some additional support needs:

We basically needed a few capabilities in place. How are we able to create this infrastructure? How are we able to monitor it? How are we able to run operations behind it? And how are we able to secure whatever data we bring on this platform? When we started this journey, we had some experience. We did test-and-learn, but we did not have all the capabilities to create Postgres as an enterprise service, something we can use within IKEA Retail. So then we reached out to [service provider] EDB [formerly EnterpriseDB], and that's where we got quite a lot of good support in terms of creating Postgres as a service at an enterprise level of scale. We had quite a lot of sit-downs with them on how we were able to use the different tools that come from EDB to help us create a Postgres database service.

IKEA is now using the community edition of Postgres and also has EDB Postgres Advanced Server. To date, the migration has been successful, but there’s still more to be done, says Adhikari:

It is basically a journey. That's something we continue… Postgres is definitely a plus and quite an addition, but our journey does not stop here. Now we are engaged in quite a lot of initiatives to see how we are able to create a better service to use for IKEA. I think it is going to take quite some time before we get there, but we are definitely on the right track.