Update scenarios on AKS

In this article, I am going to talk about a very important topic: updates. When I say updates, I mean not only Kubernetes updates but also the worker node OS updates. Many people forget about patching the worker node OS, believing that Microsoft takes care of it. This is not the case.

Author: Richard Hooper, Principal Azure Architect

It is currently the customer's responsibility to keep the cluster up to date. Below I answer a question I am asked a lot: why should I update my cluster? I also explain the different update options, what exactly you need to update yourself, and the easiest way to do it.

Why should I update my cluster?

Kubernetes is developed rapidly. A new minor release comes out every three to four months. Until recently, each version was officially supported for only 9 months; with Kubernetes 1.19, this has increased to 12 months. With these minor releases, you can expect new features and improvements, and sometimes breaking changes, such as API version changes, that stop your implementation from working. Patches are released more frequently, sometimes weekly, to fix critical bugs and security vulnerabilities. There is also no Long Term Support (LTS) release of Kubernetes. This sounds like a nightmare, and you are probably thinking right now: why update when there is a chance of things breaking?

First of all: support. Azure only supports the last 3 minor versions of Kubernetes that AKS has made generally available (GA). A version is classified as GA once AKS has released it in all supported Azure regions. So, say version 1.19 has become GA for AKS; then only the 1.19, 1.18, and 1.17 minor versions are supported. Within each minor version, AKS supports only the two latest patches. So for version 1.19, you can only use the last 2 patches, 1.19.3 and 1.19.1. 1.19.0 was supported until patch 1.19.3 came out.
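
The support window described above can be worked out mechanically. The following is a minimal sketch, not an AKS API: the function names `supported_minor_versions` and `latest_patches` are made up for illustration, and the list of released patches is simply the example from this article.

```python
from typing import List


def supported_minor_versions(latest_ga: str, window: int = 3) -> List[str]:
    """Return the minor versions inside the support window, newest first."""
    major_text, minor_text = latest_ga.split(".")[:2]
    major, minor = int(major_text), int(minor_text)
    # Count down from the latest GA minor version, never going below minor 0.
    return [f"{major}.{m}" for m in range(minor, max(minor - window, -1), -1)]


def latest_patches(released: List[str], keep: int = 2) -> List[str]:
    """Return the newest `keep` patch versions, compared numerically."""
    ordered = sorted(
        released,
        key=lambda version: tuple(int(part) for part in version.split(".")),
        reverse=True,
    )
    return ordered[:keep]


# Example values taken from the article text.
print(supported_minor_versions("1.19"))
print(latest_patches(["1.19.0", "1.19.1", "1.19.3"]))
```

With 1.19 as the latest GA version, the first call yields 1.19, 1.18, and 1.17, and the second keeps 1.19.3 and 1.19.1 while dropping 1.19.0.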

AKS control plane update

In AKS you have something called the control plane. This is essentially the set of master nodes, which Microsoft provides and runs for you, apart from the updates. There is no supported automatic way to update the AKS control plane, and that is probably a good thing. You should always test that your application still works and is still deployable on a new version before updating your production systems.

You can update the control plane via the Azure portal or Azure CLI. Of course, it's a lot easier via the portal because it's just a drop-down list (see image below).
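
For the Azure CLI route, a control plane upgrade could be scripted roughly as follows. This is a sketch under assumptions: the resource group `my-rg` and cluster name `my-aks` are placeholders, the helper names are hypothetical, and the script only builds the `az aks get-upgrades` and `az aks upgrade --control-plane-only` commands. Nothing is executed unless you call `run` yourself on a machine with the Azure CLI installed and logged in.

```python
import subprocess
from typing import List


def list_upgrades_command(resource_group: str, cluster_name: str) -> List[str]:
    """Build the command that lists the versions the cluster can upgrade to."""
    return [
        "az", "aks", "get-upgrades",
        "--resource-group", resource_group,
        "--name", cluster_name,
        "--output", "table",
    ]


def control_plane_upgrade_command(
    resource_group: str, cluster_name: str, version: str
) -> List[str]:
    """Build the command that upgrades only the control plane."""
    return [
        "az", "aks", "upgrade",
        "--resource-group", resource_group,
        "--name", cluster_name,
        "--kubernetes-version", version,
        "--control-plane-only",  # leave the node pools on their current version
        "--yes",                 # skip the interactive confirmation prompt
    ]


def run(command: List[str]) -> None:
    """Execute a command and raise if the Azure CLI reports a failure."""
    subprocess.run(command, check=True)


# Placeholder names; printing keeps the sketch free of side effects.
print(" ".join(list_upgrades_command("my-rg", "my-aks")))
print(" ".join(control_plane_upgrade_command("my-rg", "my-aks", "1.19.3")))
```

Building the argument list separately from running it makes it easy to review the exact command before it touches a production cluster.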

When performing a control plane update, Microsoft takes care of everything by updating your cluster's control plane components. Not much is known about what happens behind the scenes, but after a short wait you will see that the control plane has been updated.

AKS Node update

Worker nodes, on the other hand, need different types of updates. Below, I'm going to tell you more about the Kubernetes version updates. In the next section, I'll go into more detail about node OS updates.

As with the control plane, you can use either the Azure portal or the Azure CLI to update nodes. In AKS, each node sits in something called a node pool (think Virtual Machine Scale Set). AKS also supports multiple node pools, and they can be Windows or Linux, but not both in the same node pool.

You can only upgrade a node pool to the version of the control plane or lower. So, suppose your control plane is running Kubernetes version 1.18.10. Then you can upgrade your node pool to 1.18.10, or to an older version such as 1.18.8. You can't upgrade it to 1.19.3 unless you upgrade the control plane to that version first.
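
The version rule above comes down to a numeric comparison of version parts. Here is a minimal sketch, assuming plain `major.minor.patch` version strings; the helper names are hypothetical, and the command it builds uses `az aks nodepool upgrade` with placeholder resource names.

```python
from typing import List, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '1.18.10' into (1, 18, 10) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def can_upgrade_node_pool(control_plane: str, target: str) -> bool:
    """A node pool target must not be newer than the control plane version."""
    return parse_version(target) <= parse_version(control_plane)


def node_pool_upgrade_command(
    resource_group: str, cluster_name: str, node_pool: str,
    control_plane: str, target: str,
) -> List[str]:
    """Build the node pool upgrade command, refusing an invalid target."""
    if not can_upgrade_node_pool(control_plane, target):
        raise ValueError(
            f"{target} is newer than the control plane ({control_plane}); "
            "upgrade the control plane first"
        )
    return [
        "az", "aks", "nodepool", "upgrade",
        "--resource-group", resource_group,
        "--cluster-name", cluster_name,
        "--name", node_pool,
        "--kubernetes-version", target,
    ]


# The article's example: control plane on 1.18.10.
for candidate in ("1.18.10", "1.18.8", "1.19.3"):
    print(candidate, can_upgrade_node_pool("1.18.10", candidate))
```

Comparing tuples of integers rather than raw strings matters here: as strings, "1.18.8" would sort after "1.18.10".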

When you upgrade, AKS adds something called a buffer node to your cluster. By default there is one buffer node; you can change that number with a setting called max surge. The buffer node runs the Kubernetes version you are upgrading to.

The cluster will then drain one or more of the older nodes, depending on the max surge setting, to help minimize disruption to running applications. When an older node is completely drained, it is reimaged with the latest VM image from Microsoft for the selected Kubernetes version. This reimaged node then becomes the buffer node. This continues until one old node is left; once that node is completely drained it is removed, so the cluster ends up with the same number of worker nodes as before.
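
To make the sequence easier to follow, here is a toy simulation of the process as this article describes it, assuming a max surge of one. It is an illustration only: the function name and node names are made up, and it does not call AKS or model pod scheduling.

```python
from typing import List, Tuple


def simulate_rolling_upgrade(old_nodes: List[str]) -> Tuple[List[str], List[str]]:
    """Return (final_nodes, events) for an upgrade with one buffer node."""
    if not old_nodes:
        return [], []
    events = ["add buffer node 'buffer' on the new version"]
    upgraded = ["buffer"]
    pending = list(old_nodes)
    while pending:
        node = pending.pop(0)
        events.append(f"drain {node}")
        if pending:
            # More old nodes remain: this one is reimaged and becomes
            # the buffer for the next node in line.
            events.append(f"reimage {node}; it becomes the next buffer")
            upgraded.append(node)
        else:
            # Last old node: removed so the node count returns to the original.
            events.append(f"remove {node}")
    return upgraded, events


final_nodes, events = simulate_rolling_upgrade(["node-1", "node-2", "node-3"])
for event in events:
    print(event)
print("nodes after upgrade:", final_nodes)
```

The point of the sketch is the invariant: the cluster briefly runs one extra node and finishes with the same number of worker nodes it started with.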

This whole process can take some time, depending on the workload. Each node is allowed up to 10 minutes for its upgrade. So when you plan the upgrade, reserve 10 minutes multiplied by the number of nodes you need to upgrade, to make sure you have enough time for the update.
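
The planning rule above is simple arithmetic. As a small sketch (the function name is hypothetical, and it deliberately ignores any speed-up from a higher max surge, so treat the result as an upper-bound style estimate):

```python
def estimate_upgrade_minutes(node_count: int, minutes_per_node: int = 10) -> int:
    """Time to reserve: the per-node allowance multiplied by the node count."""
    if node_count < 0:
        raise ValueError("node_count must not be negative")
    return node_count * minutes_per_node


# A node pool with 6 nodes: 6 * 10 = 60 minutes to reserve.
minutes = estimate_upgrade_minutes(6)
print(f"reserve about {minutes} minutes ({minutes / 60:.1f} hours)")
```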

AKS node OS updates

You now know more about control plane updates and node updates for Kubernetes versions, but the operating system still needs to be patched, just like on any server you would normally run. Fortunately, you don't have to install these patches yourself. Microsoft takes care of this, at least for the Linux nodes. All you have to do is reboot the Linux node for the updates to take effect. Windows, on the other hand, is a little different. First, let's look at the Linux nodes.

Linux nodes are configured to check for updates every night. If a security or kernel update is available, that update is automatically downloaded and installed. Some of these updates, such as kernel updates, require a reboot. When a node requires a reboot, a file named reboot-required is created under /var/run/. You can create your own solution to monitor this file, or you can use an open-source tool called Kured (Kubernetes Reboot Daemon) from Weaveworks. If you want to install it, follow the instructions in their GitHub repo. It even lets you schedule reboots and send notifications to Slack or Teams.
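
If you build your own monitoring instead of using Kured, the core check is whether the marker file exists. A minimal sketch, assuming the script runs on the node itself (or in a pod with the host's /var/run mounted); the function name is made up, and draining and rebooting the node safely is left out.

```python
from pathlib import Path

# Marker file that the node's package tooling creates when a reboot is pending.
DEFAULT_SENTINEL = Path("/var/run/reboot-required")


def reboot_required(sentinel: Path = DEFAULT_SENTINEL) -> bool:
    """Return True when the reboot marker file is present."""
    return sentinel.exists()


if reboot_required():
    print("reboot pending: cordon and drain this node before restarting it")
else:
    print("no reboot pending")
```

The path is a parameter so the check can be tried against a temporary file without touching a real node.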

For Windows node pools, you need to upgrade the node OS image. Every week, Microsoft makes a new node image available for both Windows and Linux. You can then use the Azure CLI to upgrade the nodes in your cluster, either for the entire cluster or for a single node pool. Right now there is no way to automate this. Hopefully that will come one day, but until then you can look at using Logic Apps or Azure Automation to do it.
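
A node image upgrade through the Azure CLI could be wrapped like this, for example as the body of a scheduled job. This is a sketch with placeholder names (`my-rg`, `my-aks`, `winpool`) and a hypothetical helper; it builds `az aks upgrade --node-image-only` for the whole cluster or `az aks nodepool upgrade --node-image-only` for one node pool, and only prints the commands.

```python
from typing import List, Optional


def node_image_upgrade_command(
    resource_group: str, cluster_name: str, node_pool: Optional[str] = None
) -> List[str]:
    """Build a node-image-only upgrade for the cluster or a single node pool."""
    if node_pool is None:
        # Whole cluster: every node pool gets the latest node image.
        return [
            "az", "aks", "upgrade",
            "--resource-group", resource_group,
            "--name", cluster_name,
            "--node-image-only",
            "--yes",
        ]
    # Single node pool: the Kubernetes version stays unchanged.
    return [
        "az", "aks", "nodepool", "upgrade",
        "--resource-group", resource_group,
        "--cluster-name", cluster_name,
        "--name", node_pool,
        "--node-image-only",
    ]


print(" ".join(node_image_upgrade_command("my-rg", "my-aks")))
print(" ".join(node_image_upgrade_command("my-rg", "my-aks", "winpool")))
```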

The last option you have is to update the Kubernetes version on your node pools. As mentioned earlier, when you upgrade Kubernetes, it creates new nodes with the latest image version for your selected Kubernetes version, so you get a new node OS image with all the security patches.
