The cloud and how to make a mess of it

By Martin Banks, April 24, 2020
Summary:
How do you know when enough is enough in terms of cloud utilization? Don't overfeed yourself!

There is more than a touch of scare-mongering to be had in the suggestion that the cloud is going to ‘run out’, its resources used up faster than they can be added to. But with the surge in home working that has become the new normal in the Coronavirus lockdown, Internet resources are undeniably under pressure.

But there is a problem to be faced here, one that both users and service providers need to address sooner rather than later. In sum, a large amount of resource is being wasted, frittered away by, at best, sloppy management on the users’ part and, at worst, a somewhat callous attitude among service providers towards the level of understanding and expertise many current cloud users have of that particular tool of their trade.

The bottom line is that cloud utilization has reached the point where service providers are faced with the question: if users can’t manage their use of the cloud properly, do we just take the money anyway?

This issue emerges right at the beginning of any company’s transition to the cloud, according to Simon Ratcliffe, Principal Consultant at managed services provider, Ensono:

I suspect most of the business users making that transition have no idea what questions to ask or what to compare against, because apart from anything else their background is going to be on-premise installations, where you build in redundancy to make sure that you cover for any peaks and failures. So, you're always ordering twice as much as you need.

Buying small, medium or large

This approach then has to be set against what the Cloud Service Providers (CSPs) offer – normally a range of packaged-up nominal sizes of resources and service capabilities that are the equivalent of selling small, medium or large. In Ratcliffe’s view users then always trend in one direction: buy large, just in case.

This can be compounded by another factor that a good percentage of businesses transitioning to the cloud adopt, especially if moving to public cloud providers. This is the notion of doing a straight ‘lift and shift’ of on-premise applications into the cloud, only to find that it doesn’t always work that well and is often more expensive than it was on-premise. There is always the intention of moving to cloud-native implementations, but there never seems to be time to do that. As a consequence, it seems there is a growing trend for businesses to rethink the whole transition process. Ratcliffe says:

There was a piece of research on the back end of last year which suggested that over the next three to five years, quite a substantial percentage of people are going to exit the public cloud, effectively come out, put everything back where it was, rethink what they are doing, and start again.

The bottom line, according to Ratcliffe, is that businesses need to take a much closer, deeper look at what they have and what they want to achieve. They probably do want to build on a cloud-first strategy, but their reality is still significantly different. They still have a great many legacy applications, and there are options open for some of them: moving Microsoft server applications into Azure, or SAP or Oracle apps into their respective cloud environments, for example, can make economic sense if the whole package deal is right. For most users, though, much of that legacy software is not going to make the transition well.

For a significant slice of the marketplace those potential cloud customers will still have a mainframe in the back office. While IBM has recently talked about the cloud potential of its latest mainframe machines, Ratcliffe remains adamant that those machines and applications are going nowhere, and certainly not into the cloud. That produces an economic commitment to on-premise that can skew the economics of cloud transition.

But if the move is made, then the other factors follow on. Resource planning is always based on specifying N+1 environments providing more capacity than the maximum workload could ever reach. That is repeated out into the cloud. In addition, of course, systems running on-premise tend to be left on, so the notion of shutting down cloud instances when not in use, releasing the resources for other users and reducing the resulting costs, is still a hard one to learn.
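To put rough numbers on the habit of leaving instances switched on, here is a minimal Python sketch. It assumes an instance is billed purely per hour while it is running and ignores storage and other charges; the hourly rate and the working pattern are hypothetical figures chosen for illustration, not any provider's actual pricing.

```python
HOURS_PER_WEEK = 24 * 7


def weekly_cost(hourly_rate: float, hours_running: float) -> float:
    """Cost of one instance, assuming billing only for hours it is running."""
    return hourly_rate * hours_running


def savings_fraction(hours_needed: float) -> float:
    """Fraction of an always-on bill saved by running only when needed."""
    if not 0 <= hours_needed <= HOURS_PER_WEEK:
        raise ValueError("hours_needed must be between 0 and 168")
    return 1 - hours_needed / HOURS_PER_WEEK


if __name__ == "__main__":
    # Hypothetical example: 0.10 per hour, needed 12 hours a day, 5 days a week.
    rate = 0.10
    needed = 12 * 5
    print(f"always on: {weekly_cost(rate, HOURS_PER_WEEK):.2f} per week")
    print(f"scheduled: {weekly_cost(rate, needed):.2f} per week")
    print(f"saved:     {savings_fraction(needed):.0%}")
```

Under those assumptions, an instance that is only needed during an extended working week is idle for well over half of the hours an always-on deployment pays for.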

Another lesson users need to learn is that their usual approach to managing an on-premise environment – following the OSI model and managing from the bottom up – does not work well in a cloud environment, says Ratcliffe:

They need to realise there is a different way of operating in the Cloud, you need to reverse that model and come down through the application and how it is all hanging together. And unless you get that piece right you will never even know how you are managing. You can only know that individual elements are working as they should, but it can still be a mess.

It's counterintuitive. You need to take your time to get it right. People think the cloud is about speed because it's about speed when you arrive, but the journey there is not that fast.

Ratcliffe sees one of the solutions for users as being the adoption of multi-cloud infrastructures. This is the obvious longer-term goal for every cloud user, though it will probably only exacerbate the service management problem to begin with. But getting there will ensure that users learn to ask the right questions, particularly in terms of the trade-off between operating cost and business performance.

It will also, as he notes, move them away from the current predominant thought: ‘the answer is AWS, now what was the question?’ Different CSPs offer different capabilities and specialisms that can be of value to specific users, so choosing a CSP becomes a task of mapping workloads onto CSP skillsets and value propositions.

The gotcha of ‘Purchasing as Code’ = 50% waste

If management is the issue, the next two questions for many users new to the cloud will be: what to manage, and how to manage it?

One of the key targets for management, according to Chuck Tatham, the CMO and Senior Vice President of Business Development with AI-based service monitoring vendor Densify, is the developer community, and especially DevOps, where there is, he suggests, now a rash of over-provisioning of resources:

The way software is now built in the cloud the developers have really no idea what the real resource requirements are for their new ‘children’. Their prime directive is to build valuable business functionality quickly and make sure it runs damn well.

Couple this to the growing use of containers and it leads to the growth of ‘Infrastructure as Code’, where the developers embed the required infrastructure definitions into the application code, in the container. Products like HashiCorp Terraform, AWS CloudFormation and Red Hat Ansible allow developers not only to deploy their applications, but also to include the code to provision the infrastructure needed.

At one level this makes obvious sense, but it also has a serious flaw, as Tatham observes:

So you've got this perfect storm of the developer being able to hard code what they want for their ‘children’, and they are going to ensure that those children are well fed, without really knowing how much food their children actually need.

Managing that potential for a cloud resource feeding frenzy, driven by the developers’ natural tendency to supersize their children, is the issue Densify is aiming at, using analytics to determine how much food each child actually needs.

The other part of this problem is that in the traditional on-premise environment there was always a degree of control which came from someone having to sign off the payments for new hardware and software. Now there is no annual purchasing cycle, in fact there are no real systems at all, and what is required is ‘rented’ as part of the infrastructure definition encoded into the application code in the container. Tatham says:

We use the term ‘Purchasing as Code’. Micro-purchases are happening every day, and it should be sending a shiver down the spine of finance organisations because they don't have the same control points that they once did. Now companies have to worry about the true consumption quantity, not the discount on what they're buying but how much they're actually consuming on a monthly basis. It's like a supply chain of any other type.

What makes it worse is that those infrastructure definitions end up being hard coded into the application code. Now add in the fact that if there is any problem with that code, it will be the developer that gets called in to sort it out. So, what is any developer going to do? Make sure the child is over-fed and make sure the food is always there by hard coding it. The bills keep mounting because the resources, though rarely if ever used, are clearly specified in hard code.

Tatham reckons that around 50% of Cloud resources are, therefore, being paid for by users, but never used.

Densify’s pitch to solve this problem is to use AI tools to monitor the actual resource utilisation of applications and couple it with ‘soft code’ infrastructure definitions – using live utilisation data to adjust the definitions to more tolerable levels while not impeding the performance of the applications. The same data can then be used to find other applications that can exploit the resources now made available.
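The general idea of sizing a definition from observed usage rather than a hard-coded guess can be sketched in a few lines of Python. This is an illustrative assumption about how such right-sizing might work, not a description of Densify's product: the `ResourceDefinition` type, the 95th-percentile target, the 20% headroom and the sample figures are all hypothetical.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class ResourceDefinition:
    """Hypothetical infrastructure definition for one application."""
    name: str
    cpu_cores: float


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile of the observed samples."""
    if not samples:
        raise ValueError("need at least one utilisation sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def rightsize(defn: ResourceDefinition, used_cores: list[float],
              pct: float = 95, headroom: float = 0.2) -> ResourceDefinition:
    """Return a new definition sized to observed usage plus headroom."""
    target = percentile(used_cores, pct) * (1 + headroom)
    return ResourceDefinition(defn.name, round(target, 2))


if __name__ == "__main__":
    # Hypothetical hard-coded request and made-up utilisation samples.
    hard_coded = ResourceDefinition("billing-service", cpu_cores=8.0)
    samples = [1.0, 1.2, 0.8, 1.5, 2.0, 1.1, 0.9, 1.3, 1.0, 1.4]
    adjusted = rightsize(hard_coded, samples)
    unused = 1 - adjusted.cpu_cores / hard_coded.cpu_cores
    print(f"{hard_coded.cpu_cores} cores requested, {adjusted.cpu_cores} suggested")
    print(f"share of the original request left unused: {unused:.0%}")
```

The choice of percentile and headroom is the trade-off between cost and performance risk; a real tool would presumably also look at memory, I/O and seasonal peaks before changing anything.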

Ultimately, it opens the possibility for a business to know its resource requirements and so gain the chance to move from paying for ‘two smalls and a medium’ when it comes to CSP bills, and instead opt for a single, well-managed ‘large’ with headroom to spare.

My take

During my conversation with Simon Ratcliffe he made a passing reference to the cloud being a bit like the Wild West, and one can see what he means: everyone there can see the potential, but without a sensible set of rules and behavioural guidelines the chances of being ripped off are high. That is not to say that any CSP is doing anything deliberately underhand, but it is fair to say that their service charging models veer towards the simplistic when in practice they should be fiercely complex. The majority of users, it would seem, still know no better and end up managing themselves poorly. They also fail to notice that what constituted OK practice in an on-premise environment is very definitely not best practice at all when it comes to operations in the cloud.