Back from KubeCon + CloudNativeCon Europe 2021: key learnings and takeaways

Co-authored by Christian Alonso Chavez Ley

At Deezer, we think that taking part in the community is of utmost importance. Contributing to open source projects is the best way to do so, but participating in community events should not be overlooked. And when you work on Kubernetes platforms like we do, what’s better than going to the KubeCon + CloudNativeCon conference?

For those who aren’t familiar with KubeCon, it’s the main event organised by the CNCF (the foundation that hosts many of the cloud native open source projects). The conference spans a whole week, with the first day reserved for co-located events (like PromCon) and sponsored events. Each year, the conference brings together around 3,000 to 4,500 open source and cloud native enthusiasts.

This year, three of us attended KubeCon Europe 2021 (May 4–7; we didn’t attend any co-located events). Like last year, this KubeCon was virtual due to the health situation.

Some of us had already attended a virtual or in-person KubeCon, and one of us had never been to one. We were all very excited because, from experience, KubeCon is always a place where you learn a lot in a very short time. Everything was ready:

  • Individual sched.com KubeCon schedules filled out and shared with colleagues? Check!
  • Private Slack and Mumble channels to exchange views between sessions? Check!
  • Shared, collaborative notes? Check!
  • Mountain of coffee? Check!
  • “Do not disturb” signs everywhere and cleared work calendars? Almost checked, we’ll come back to this later.

There is obviously too much to talk about to fit in one article, so we’ve written a best-of compilation of what we learned there, and identified some key takeaways — should you want to participate in such an event in the future.

We feel this year’s conference was built around three main topics: CI/CD, security and networking. We will focus on them here.

Are you ready to dive in?

CI/CD in Kubernetes

Putting Chaos Into Continuous Delivery to Increase App…, by Juergen Etzlstorfer & Karthik Satchitanand

The continuing maturity of the ecosystem means that we are moving from modeling pipelines (with tools like Argo CD and Tekton) to hardening resilience with techniques like chaos testing.

This all goes back to the seminal eight fallacies of distributed computing and the fact that applications running on K8s are, by their very nature, distributed systems, even if they were not conceived as such. So the question we now face is: “how can we easily test these assumptions in our platform?”

Litmus Chaos is a Kubernetes-native, declarative, experiment-based (whew, that’s a lot of adjectives) tool that lets the community share chaos tests in a portable manner. These tests can then be executed as a stage in a CD pipeline or against production systems (for those who feel like living dangerously).

To illustrate Service-Level Objective (SLO)-based quality gates with chaos tests, the speakers ran Litmus Chaos through another tool called Keptn, which supports such quality gates natively, using Prometheus and other data sources. Nevertheless, it shouldn’t be too hard to plug chaos tests into your current CD pipelines, whatever they may be.

We particularly like and wholeheartedly agree with their key takeaways:

  1. Establish a process of continuously evaluating resiliency
  2. Do chaos testing in addition to performance tests
  3. Evaluate based on Service-Level Objectives (SLOs)

Those are good rules of thumb to follow, regardless of the tools you are using.
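
To make the idea of an SLO-based quality gate more concrete, here is a minimal sketch in Python. It is not how Keptn or Litmus Chaos implement it: the `Objective` class, the `evaluate_gate` function, the metric names and the thresholds are all hypothetical, and we assume the measurements were already collected (from Prometheus or elsewhere) while the chaos experiment was running.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Objective:
    """One SLO: a metric name, a threshold, and which direction is 'good'."""
    name: str
    threshold: float
    higher_is_better: bool


def evaluate_gate(objectives, measurements):
    """Compare measurements taken during a chaos experiment against SLOs.

    Returns (passed, failures). A missing measurement counts as a failure,
    so a broken metrics pipeline cannot silently open the gate.
    """
    failures = []
    for objective in objectives:
        value = measurements.get(objective.name)
        if value is None:
            failures.append(f"{objective.name}: no measurement")
            continue
        if objective.higher_is_better:
            ok = value >= objective.threshold
        else:
            ok = value <= objective.threshold
        if not ok:
            failures.append(
                f"{objective.name}: {value} violates threshold {objective.threshold}"
            )
    return (not failures, failures)


if __name__ == "__main__":
    # Hypothetical SLOs and hypothetical values observed under chaos.
    slos = [
        Objective("availability", 0.99, higher_is_better=True),
        Objective("p95_latency_seconds", 0.5, higher_is_better=False),
    ]
    during_chaos = {"availability": 0.995, "p95_latency_seconds": 0.72}
    passed, failures = evaluate_gate(slos, during_chaos)
    print("promote" if passed else f"block release: {failures}")
```

The point of the sketch is the third takeaway: the pipeline decision depends on whether the objectives still hold while the experiment runs, not on whether the experiment itself "succeeded".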

Security in Kubernetes

Another great topic at this year’s conference was security in Kubernetes. Even though securing it has always been a concern, Kubernetes is gaining a lot of attention (especially from hackers), and the CNCF ecosystem is responding accordingly.

A lot of projects on this topic have been added to the CNCF landscape, and there were of course talks about them. But aside from these very specific, software-related talks, there were also many (very good) beginner talks.

Judging from those talks, the audience seems to be a bit behind on security, and that’s a bit worrying.

One of the best talks we’ve seen on security this year was Uncovering a Sophisticated Kubernetes Attack in Real-Time, by Jed Salazar & Natália Réka Ivánkó. After painting a very (very) quick picture of the security best practices everyone should (whoops 😨) already have in place in a Kubernetes cluster, they demonstrated how an attack could take place in a “supposedly safe” cluster and how to track malicious actions with Cilium (using eBPF programs).

The main idea there was “trust, but verify”, which is just applying DevOps logic to security.

Another great talk to better understand the magic behind Falco, Cilium and all related runtime security tools in Kubernetes is the talk on eBPF by Quentin Monnet: eBPF on the Rise — Getting Started. If you intend to work with any tools using eBPF (and there are a lot of them now), you really should watch this session.

Networking in Kubernetes

How to Break your Kubernetes Cluster with Networking, by Thomas Graf (Isovalent), was one of the most fun presentations we saw during KubeCon.

Thomas Graf is a Cilium maintainer, so the talk could have been incredibly detailed (and might have lost part of the audience), but instead it was structured as an introduction, in a very nice “avoid common pitfalls” style.

Because let’s face it: for all the benefits that Kubernetes brings to development teams (and we firmly believe it does bring a lot!), operating a K8s cluster is no joke. Even when using a managed service (like GKE, EKS, AKS, etc.), you still need to understand how components work together and react to each other, particularly when it comes to networking.

To use another old reference: Kubernetes is an example of a Leaky Abstraction, and ignoring or misunderstanding the abstractions and your chosen implementation can come back to haunt you.

Some of the most notable examples he talked about were:

  • The ndots default (the number of dots a name must contain before the resolver tries it as-is first, instead of appending each search domain in turn) is set to 5, so most external names trigger several failed lookups before resolving. With a lot of outgoing traffic, you may end up DDoSing your own DNS server.
  • When migrating to a “deny all egress traffic by default” approach with Network Policies, don’t forget to allow traffic to the DNS pods (CoreDNS in kube-system by default), or else your application won’t be able to resolve the services you’re actually interested in.
  • Custom resources (instances of CRDs) are stored in etcd and often watched, so they are frequently transmitted between the API server and the nodes. At scale (imagine thousands of nodes), this potentially means gigabytes of network traffic that can DDoS an unsuspecting API server.
  • Take a look at the presentation for much more 😉
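
To see why the ndots setting matters, here is a minimal Python sketch of the order in which a resolver tries names when a search list is configured. It is a simplified model of the usual resolv.conf behaviour, not code from the talk: `candidate_queries` is a hypothetical helper, and the search domains below are the ones we would typically expect for a pod in the `default` namespace (an assumption, your cluster may differ).

```python
def candidate_queries(name, search_domains, ndots=5):
    """Return the names a resolver would try, in order, for `name`.

    Simplified model: a name ending with a dot is already absolute and is
    queried as-is. Otherwise, a name with fewer than `ndots` dots goes
    through every search domain before being tried as-is; a name with at
    least `ndots` dots is tried as-is first. The resolver stops at the
    first successful answer.
    """
    if name.endswith("."):
        return [name]
    absolute = name + "."
    expanded = [f"{name}.{domain}." for domain in search_domains]
    if name.count(".") >= ndots:
        return [absolute] + expanded
    return expanded + [absolute]


if __name__ == "__main__":
    # Assumed search list for a pod in the "default" namespace.
    search = ["default.svc.cluster.local", "svc.cluster.local", "cluster.local"]
    for attempt in candidate_queries("api.example.com", search):
        print(attempt)
```

In this model, an external name like `api.example.com` only has two dots, so it is tried against the three cluster search domains before the real name is queried. Writing the name with a trailing dot, or lowering ndots for pods that mostly call external services, avoids those extra lookups.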

Even though many of these cases are well documented, most of us operating Kubernetes clusters in production have been bitten by them in one way or another.

This is why we like the message in one of the last slides so much:

Words to live by 🙂

Takeaways

Once again, KubeCon + CloudNativeCon proved to be the place to be if you work with Kubernetes or Cloud Native tools. This year’s sessions were clearly focused on CI/CD, observability, networking, as well as security.

If we had to describe this KubeCon with only one word, it would undoubtedly be eBPF.

Whether in emerging or maturing projects in the CNCF ecosystem, eBPF was everywhere! From Intrusion Detection Systems to CNI plugins, there were a lot of presentations about it. We think it’s a signal that the technology (which has been around for some time now) is mature. You should really take the time to get familiar with it if that’s not already the case.

On a less technical note, we also want to share that attending a virtual conference is hard, but not in the way we thought. After more than a year of forced full-remote work, staying focused on talks for 3 days straight was not so challenging after all (no “Zoom fatigue” felt).

But (there’s always a catch) the downside of being on your computer at home, like on a “normal day at work”, is the temptation to work in parallel while attending the conference. Our advice is to block your calendar for the whole 3 days (or 5 if you do the whole gig) and, more importantly, to stick to it!

Another takeaway is that KubeCon is really tightly packed. This year, there were no breaks from 10am to 5pm besides the 15 minutes between each talk (no lunch break). That’s rough, and we hope that proper breaks will be back next time.

Last but not least, one of the advantages of conferences is the possibility to have impromptu meetings with strangers. This is a great way to talk about problems you are facing and discuss solutions other people have already found for them, in ways you may not have imagined otherwise. Even though there are open channels on the CNCF Slack, we found that having “hallway conversations” is way harder in remote conferences. And that’s too bad.

Let’s hope that we can safely meet again soon!

Do you wish to learn and share with our community of developers, and make the Deezer experience a better one? Check our open positions!