KubeCon + CloudNativeCon 2024 - AI means it’s Springtime for open source, says CNCF

Chris Middleton, March 21, 2024
Summary:
On day one of its biggest conference to date, the Cloud Native Computing Foundation claimed natural partnership with the AI revolution.


Cloud-native and AI are the two most critical technology trends today, with the former powering the AI movement as the “only infrastructure” that can keep pace with innovation in that sector. So said Priyanka Sharma, Executive Director of the Cloud Native Computing Foundation (CNCF), as over 12,000 delegates gathered at KubeCon + CloudNativeCon in Paris. 

The four-day conference – featuring speakers from Google DeepMind, Microsoft, Oracle, AWS, Huawei, NVIDIA, Ollama, CERN, Intel, VMware, Bloomberg, BT, Goldman Sachs, Red Hat, Shopify, Heroku, and Zeiss, among others – is the largest in the CNCF’s history, as open-source container orchestration platform Kubernetes celebrates its first decade.

Indeed, the dominant theme of the first full day of the event was the key role Kubernetes has to play in AI inference and training, with Sanjay Chatterjee, Engineering Manager at hardware giant NVIDIA, observing that AI represents “the Linux moment” for Kubernetes.

On the subject of Linux moments, enterprise Linux vendor SUSE used KubeCon to announce a range of enhancements across its cloud-native and edge portfolio, with new capabilities for Rancher Prime and SUSE Edge 3.0.

Meanwhile, Red Hat enhanced its hybrid cloud application platform OpenShift, adding support for Testcontainers. It also announced upgraded versions of its Advanced Cluster Security, Red Hat Quay, and Podman Desktop products, and launched a new State of Application Modernization report in partnership with research company Illuminas.

In other news from the event, Intel has begun a limited offering of a new Kubernetes managed service in the Intel Developer Cloud – the Intel Kubernetes Service – which it says provides developers with clusters for application development, AI/ML training and inference, and more.  

And Microsoft announced enhancements and new features in Azure, Azure Kubernetes Service (AKS), and its open-source projects, most of which have the aim of helping developers “adopt Kubernetes with confidence and convenience”.

For example, AI toolchain operator KAITO can now run specialized machine learning workloads, such as Large Language Models (LLMs), on AKS “more cost-effectively and with less manual configuration”. Microsoft also trailed new features designed to enhance the security and scalability of AKS clusters and nodes.
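For a sense of what that looks like in practice, here is a minimal sketch of deploying an LLM inference preset through KAITO, using the Kubernetes Python client. The field names follow KAITO’s public v1alpha1 Workspace examples; the exact schema, instance type, and available presets may differ between releases, so treat it as illustrative rather than definitive:

```python
# A minimal sketch, not a definitive recipe: creating a KAITO Workspace that
# serves a falcon-7b inference preset on AKS. Field names follow KAITO's
# published v1alpha1 examples and may vary by release.
from kubernetes import client, config

config.load_kube_config()  # assumes kubectl access to an AKS cluster with KAITO installed

workspace = {
    "apiVersion": "kaito.sh/v1alpha1",
    "kind": "Workspace",
    "metadata": {"name": "workspace-falcon-7b"},
    # KAITO provisions the GPU nodes itself: you declare an instance type,
    # not a node pool, which is where the reduced manual configuration comes in.
    "resource": {
        "instanceType": "Standard_NC12s_v3",
        "labelSelector": {"matchLabels": {"apps": "falcon-7b"}},
    },
    # A preset bundles the model image with a tuned runtime configuration.
    "inference": {"preset": {"name": "falcon-7b"}},
}

client.CustomObjectsApi().create_namespaced_custom_object(
    group="kaito.sh",
    version="v1alpha1",
    namespace="default",
    plural="workspaces",
    body=workspace,
)
```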

The sense that this is Springtime for cloud-native and open source was heightened as Paris basked in temperatures above 20°C for the first time this year. Giving the opening keynote, the CNCF’s Sharma said:

Just six years ago at KubeCon in Berlin, OpenAI told us that the future of AI was going to be powered by cloud native. Fast forward to today, and you see that the world of AI has expanded and we’re entering what Alan Greenspan, former chair of the Federal Reserve of the United States, would perhaps call an ‘age of irrational exuberance’.

But she added:

Nothing important has ever been built without irrational exuberance. And in this latest AI era, we are the people who are building the infrastructure that supports the future.

Bold words, alongside a prediction from the keynote stage that the cloud-native development market will be worth $2.3 trillion worldwide by 2029, up from $547 billion in 2022 – a 320% increase in less than a decade.

Speaking later, Sharma explained:

Cloud native has enabled businesses to move faster, build applications, change applications that deliver and deploy resiliently – and have confidence. So, a whole world has become the technology [sic]. I am confident and proud to say that we [the cloud-native community] can take credit for that.

Like peanut butter and jelly

The CNCF, which announced 45 new members at the event, described itself as the “vendor-neutral home” of 183 graduated, incubating, or sandbox open-source projects, including Prometheus and Envoy, alongside Kubernetes itself. It now numbers more than 233,000 contributors, with the US, China, and India being the big three developer hotspots. 

In Sharma’s view, cloud-native and AI technologies are developing an increasingly symbiotic relationship in 2024:

Gen-AI is prompting cloud-native to rethink infrastructure paradigms to accommodate AI workloads, improve platform engineering’s focus with AI insights, and ensure AI-ready systems. This integration represents a significant shift in how we design, deploy, and manage cloud-native solutions.

Cloud-native and AI are like peanut butter and jelly. One can exist and thrive without the other, but together they create something extraordinary.

For Americans with a sweet tooth, perhaps. But are things really that simple? She continued:

We both have best practices for handling data. We also support open-source tools for data validation and storage. We both favour ethical AI, prompted by community-led projects that favour it. 

We both aim to set new standards for responsible AI development in the cloud-native landscape by bringing the community together and, most importantly, working together in public to bridge gaps between AI technologies and cloud-native principles.

A question of sustainability 

Accordingly, the CNCF’s AI Working Group launched its new Cloud Native AI white paper at the event. 

The paper echoes KubeCon’s upbeat assessment of the mutually supportive roles that the two technologies play, but notes that challenges remain. Among these are managing large data sizes, ensuring data synchronization during development and deployment, and adhering to data governance policies.

The white paper also acknowledges the environmental challenges involved with running large AI workloads:

Resource proper sizing and reactive scheduling to meet varying workload demands are even more compelling in the context of accelerators such as GPUs, which are expensive and limited in supply. It drives the need to be able to fractionalize GPUs to utilize them better.

Reducing the carbon footprint during model-serving can be achieved using an autoscaling serving framework, which dynamically adjusts resources based on demand.

Sustainability can be significantly improved by various means, such as using smaller, more specialized models, using a mixture of experts, and techniques such as compression and distillation. Distributing ML serving into geographical regions powered by renewable or cleaner energy sources can significantly reduce carbon footprint.

Responsible development of ML models can include metadata on carbon footprints to aid in tracking and reporting the impact of model emissions on the environment.
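To make the fractionalization the white paper calls for concrete: on MIG-capable hardware such as NVIDIA’s A100, a workload can request a slice of a GPU as an ordinary Kubernetes extended resource. A hedged sketch using the Kubernetes Python client – the MIG profile name and container image are stated assumptions, not prescriptions:

```python
# A hedged sketch of the fractional-GPU pattern: scheduling a pod onto a MIG
# slice of an NVIDIA A100 rather than a whole card. Assumes the NVIDIA device
# plugin advertises MIG profiles as extended resources; the profile name
# (nvidia.com/mig-1g.5gb, one seventh of an A100 40GB) and the image are
# illustrative placeholders.
from kubernetes import client, config

config.load_kube_config()  # assumes an existing kubeconfig

pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "inference-on-a-gpu-slice"},
    "spec": {
        "restartPolicy": "Never",
        "containers": [
            {
                "name": "inference",
                "image": "nvcr.io/nvidia/pytorch:24.02-py3",
                "command": ["nvidia-smi", "-L"],  # just show the slice is visible
                "resources": {
                    # Requesting a MIG profile instead of nvidia.com/gpu is
                    # what lets several pods share one physical device.
                    "limits": {"nvidia.com/mig-1g.5gb": 1}
                },
            }
        ],
    },
}

client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)
```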

In the main hall, these were among the themes picked up by NVIDIA. Distinguished Engineer Kevin Klues said:

At the heart of this [AI] revolution are our GPUs and the platform that provides applications access – for many, Kubernetes has already become this platform. But we still have a lot of work to do before we can unlock the full potential of GPUs to accelerate AI workloads on Kubernetes. This includes changes to both the low-level mechanisms used to request access, and the high-level processes.

His counterpart at Google, Distinguished Engineer Clayton Coleman, added:

The GPU-only approach isn't necessarily sustainable. We need something that's affordable, available, and easy to use.

With NVIDIA’s market cap currently exceeding Google’s ($2.2 trillion vs $1.85 trillion), the search giant has reason to feel aggrieved, but the worldwide shortage and expense of GPUs was a constant theme at the event. 

So how can those running large AI workloads get the most resource-efficient use of their GPUs? Ricardo Rocha is a Computing Engineer at nuclear research centre CERN in Geneva, home to the Large Hadron Collider. He said:

That’s something we've been looking at for quite a while. […] There are some things that we can do to try to optimize this pattern, which is to try to share, plus better partitioning of the GPUs so that we can make the best of them. And there is a lot of work in the community as well to better support this.
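For hardware that cannot be partitioned at the silicon level, the sharing Rocha describes is usually done by time-slicing instead: the NVIDIA device plugin advertises each physical GPU as several schedulable replicas. A minimal sketch, assuming the GPU Operator reads its sharing configuration from a ConfigMap – the name, namespace, and data key here are hypothetical, and how the config is wired into the operator’s ClusterPolicy varies by installation:

```python
# A minimal sketch of GPU time-slicing configuration for the NVIDIA device
# plugin. ConfigMap name, namespace, and data key are hypothetical; adapt
# them to how your GPU Operator installation expects its sharing config.
from kubernetes import client, config

config.load_kube_config()

TIME_SLICING = """\
version: v1
sharing:
  timeSlicing:
    resources:
    - name: nvidia.com/gpu
      replicas: 4  # one physical GPU appears as four allocatable GPUs
"""

config_map = client.V1ConfigMap(
    metadata=client.V1ObjectMeta(
        name="time-slicing-config",  # hypothetical name
        namespace="gpu-operator",    # assumes the operator's usual namespace
    ),
    data={"any": TIME_SLICING},
)

client.CoreV1Api().create_namespaced_config_map(
    namespace="gpu-operator", body=config_map
)
```

Unlike MIG partitions, time-sliced replicas share memory and offer no isolation between workloads, which is why the better partitioning Rocha mentions remains an active area of community work.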

Meanwhile, media giant Bloomberg shared how it uses open source, machine learning, and CNCF projects to help it manage – and share – a vast amount of data: 300 billion market messages, 200 million trusted documents, and 450 billion data points served daily.

The company handles 1.5 million news stories in 30 languages, ingested from 125,000 sources, explained Team Lead Yuzhui Liu and Software Engineer Leon Zhou.

My take

An inspiring first day from a rare event that feels like a mutually supportive community, rather than the usual scenario at big conferences: corporations preaching to their stans and converts. 

More user stories to come in my next reports from Paris.
