
15 AKS Cost Optimisation Tips That Save You A Fortune

You migrate to Azure, everyone is excited – especially about Azure Kubernetes Service (AKS). Who isn’t?

But then the first invoice arrives… and it holds some shocking surprises. Life is suddenly not so colourful anymore.

This raises the question: “Why is AKS costing us this much?”

In this article, we’ll break down 15 AKS cost optimisation tips and techniques you can use to save money. 

Niels Kroeze, IT Business Copywriter

Reading time: 13 minutes | Published: 19 September 2025

15 Tips to reduce your AKS cost 

AKS offers a free control plane and a quick, easy way to provision clusters. However, precisely because provisioning is so easy, costs can escalate quickly. Minor oversights often lead to thousands of dollars’ worth of consumption in Azure.

The following techniques and tips will lower your AKS bill: 

  1. Find the correct Family, Size and Version
  2. Stop & start cluster (Control Plane + Worker Nodes) 
  3. Stop & start specific user node pools
  4. Autoscale node pools
  5. Autoscale by combining HPA with Cluster Autoscaler
  6. Set the proper resource requests and limits 
  7. Leverage Vertical Pod Autoscaler (VPA)
  8. Combine HPA and VPA
  9. Autoscale with Kubernetes Event-Driven Autoscaling (KEDA)
  10. Remove orphaned resources
  11. Use Azure Reservations
  12. Opt for Azure Savings Plans
  13. Use Azure Spot VMs
  14. Leverage Azure Hybrid Benefit
  15. Leverage the free SKU for Dev/Test

Let's dive into each:

 

1. Find the correct Family, Size and Version 

“Rightsizing” is one of the most common AKS cost optimisation tips you’ll find anywhere – but it’s not just about picking a D8 or D12. It’s about choosing the right VM family, size and version based on your workload requirements.

Azure VM Families 

Azure offers several VM families, each designed for different types of workloads: 

  • General-purpose VMs 
  • Memory-optimised 
  • Compute-optimised 
  • VMs with GPU 
  • Storage-optimised 

Azure Virtual Machine options visualised such as General Purpose, Compute Optimized, Memory Optimized, Storage Optimized, GPU, and High-Performance Compute.

In addition, each family has different sizes available, each with a different: 

  • Max IOPS 
  • Number of disks you can mount 

Each VM size also has its own pricing. But how do you find the right family?

Start by figuring out your CPU-to-RAM ratio. 

CPU-to-RAM ratio 

When you create an AKS cluster, the default is usually a D-series VM (e.g. D8 = 8 vCPUs + 32 GB RAM). However, not all workloads require the same CPU-to-RAM ratio. Hence, you need to know what you’re using it for to select the VMs and the node count appropriately.  

Once you know your workload needs, you can pick the right family, for example: 

  • CPU-heavy workloads → F-series (e.g. F8 = 8 vCPUs + 16 GB RAM) 
  • Memory-heavy workloads → E-series or M-series with more RAM per vCPU 

But then you'll still need to choose a version, which can impact cost. Take the D-series: it has up to six versions (v1 to v6). Newer versions usually offer better performance, but not always at the lowest cost.  

Pricing depends on: 

  • How new the version is 
  • How readily available the hardware is to Microsoft 
  • Your selected region, as costs vary between locations 

Always compare versions and regional pricing before locking in your choice. You can always scale out after you get things going, but every running node costs money – so choose sizes that actually fit your AKS cluster.
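If you want to compare what’s on offer, the Azure CLI can list the sizes and SKUs available per region. A quick sketch – the region and the D-series filter are just examples:

# List VM sizes with vCPU and memory counts for a region
az vm list-sizes --location westeurope --output table

# Filter the available SKUs by family (D-series here)
az vm list-skus --location westeurope --size Standard_D --output table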

Azure Cost Management Whitepaper

Want to save on your monthly Azure cost?

Get our Azure Cost Management Whitepaper! With the best tips, tricks, and background knowledge to optimise your cloud costs.

Download for free!

2. Stop & start cluster (Control Plane + Worker Nodes) 

Each node in your Azure Kubernetes Service (AKS) cluster is essentially a standalone virtual machine, so Microsoft bills you the whole time they’re up and running. When you create a cluster and just leave it running, it’ll stay active 24/7. In other words, costs stack up regardless of its use. 

Ask yourself: “Do we really need these resources running 24/7 for the whole year?” And: “What happens when developers aren’t using the cluster, overnight or during weekends?”

Consider stopping and starting the cluster for Dev and Test environments that you only need during working hours. This is a quick way to reduce running costs: it shuts down the Kubernetes cluster when it is not needed, without deleting it – check the visual below:

Timeline graphic showing daily start and stop times for a cluster for Monday, Tuesday, Friday, Saturday, and Sunday.

When doing so, it will stop all the virtual machines running in the cluster, including: 

  • The virtual machines of the control plane 
  • The virtual machines for the worker nodes 

The cluster state is saved in Azure, and when you start the AKS cluster again, it restores that state.

Keep in mind

This only pauses the node pools. It doesn’t stop costs from other resources like: 

  • Load balancers 
  • Public IPs 
  • Ingress controllers 

Overall, stopping and starting clusters is a simple measure that can yield quick savings across many AKS clusters.

Note

It’s not recommended for production clusters, which must be up and running at all times. 

You can use either the Azure Portal or the Azure CLI commands (and even integrate it into a DevOps pipeline).

The examples below demonstrate how to stop and start an AKS cluster with CLI commands.

To stop a running cluster, use the az aks stop command:
az aks stop --name myAKSCluster --resource-group myResourceGroup

To start a stopped AKS cluster, use the az aks start command:
az aks start --name myAKSCluster --resource-group myResourceGroup
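To verify whether a cluster is currently stopped or running, you can query its power state – a small sketch using the same example names:

az aks show --name myAKSCluster --resource-group myResourceGroup --query powerState.code --output tsv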

Learn more about how to start and stop your AKS clusters.

 

3. Stop & start specific user node pools 

What if you don’t want to stop and start all the cluster's virtual machines (including the control plane)? Then you can stop just some virtual machines in specific user node pools.

This only applies to the user node pools, not the system node pools. System node pools should always be up and running within a cluster – unless you want to stop all the AKS cluster virtual machines. 

To stop or start the virtual machines in a specific node pool, use the command line:

az aks nodepool stop:
az aks nodepool stop --resource-group myResourceGroup --cluster-name myAKSCluster --nodepool-name testnodepool
 

az aks nodepool start:
az aks nodepool start --resource-group myResourceGroup --cluster-name myAKSCluster --nodepool-name testnodepool 

Alternatively, you can use the Azure Portal: Go to the specific node pool, then click on the Stop button for that node pool. 

 

4. Autoscale node pools 

In an AKS cluster, you can run one or more node pools. Each can be auto-scaled independently.  

Independent autoscaling means that if your cluster doesn't need that many VMs, the autoscaler shrinks the node pool and removes the unneeded VMs.

Thus, it directly lowers your cost, since you’re billed for the number of VMs in each virtual machine scale set (VMSS) backing a node pool.
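Enabling the autoscaler on an existing node pool takes a single command. A sketch – the pool name and the min/max counts are example values:

az aks nodepool update --resource-group myResourceGroup --cluster-name myAKSCluster --name mynodepool --enable-cluster-autoscaler --min-count 1 --max-count 5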

 

5. Autoscale by combining HPA with Cluster Autoscaler 

Trying to answer “How much compute power does my production traffic exactly need?” is often extremely hard, especially upfront.

When creating an AKS cluster, the “not too much, not too little” approach is often used to determine the number of nodes to set.

While it might work early on, as traffic increases and apps need more CPU and memory, it won’t be efficient. Before long, you’re manually scaling nodes up and down to keep things going – something you’d rather avoid.  

This is why you should auto-scale in AKS. To enable auto-scaling in AKS, two critical components play a key role: 

  • Horizontal Pod Autoscaler (HPA) 
  • Cluster Autoscaler 

Diagram illustrating Azure Kubernetes Service (AKS) cluster with Cluster Autoscaler and Horizontal Pod Autoscaler scaling nodes and pods.

Horizontal Pod Autoscaler (HPA) 

The Horizontal Pod Autoscaler (HPA) addresses the number of replicas your application needs. Put simply, HPA dynamically increases and decreases the number of pods based on resource demand.

When you configure HPA for a given deployment, you: 

  • Define the minimum and maximum number of pod replicas it can scale to 
  • Define a metric to monitor and base the scaling decisions on, such as CPU or memory utilisation 

As load increases, HPA spins up more pods (creating more replicas) and places them on different nodes. But it doesn’t create any new nodes – it simply creates more pods to meet application demand, so sooner or later the nodes will run out of capacity. That’s where the cluster autoscaler comes into the picture.
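Creating an HPA for a deployment can be as simple as one kubectl command. A minimal sketch – the deployment name myapp and the thresholds are illustrative:

kubectl autoscale deployment myapp --cpu-percent=70 --min=2 --max=10

This keeps the deployment between 2 and 10 replicas, adding pods whenever average CPU utilisation exceeds 70%.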

Cluster autoscaler 

When the horizontal pod autoscaler (HPA) tries to create more pod replicas but the current nodes can’t handle them, the cluster autoscaler steps in. It looks for unschedulable pods like “can’t schedule due to insufficient CPU or memory” and then automatically adds new nodes to the cluster to provide the needed capacity. 

Once the new nodes are up, the Kubernetes scheduler assigns the pending pods to them. So, HPA and the cluster autoscaler work perfectly together. If the load is low, the cluster stays small. When a storm of traffic hits, HPA begins scaling up pods to handle it. If the existing nodes can't cope, the cluster autoscaler adds new ones. And when the storm passes, everything scales back down – so you're not left paying for unused capacity. Instead, you only provision and pay for the resources you need at any given moment. 

 

6. Set the proper resource requests and limits 

Efficient resource allocation starts with properly setting CPU and memory requests and limits for your pods.

Can you pack your pods in as efficiently as possible, minimising resource waste on infrastructure you’re already paying for? This is also known as “bin packing”.

  • Resource requests: define the amount of CPU or memory a pod needs in order to be scheduled onto a node. If your pod requests 2 GB of memory but your node only has 1 allocatable gigabyte, the pod will not be scheduled; it goes into a pending state until the required resources become available. If actual usage is significantly below the requests, the reserved headroom is wasted.
  • Resource limits: define the maximum CPU or memory a pod may use.

If limits are set too low and actual usage exceeds them, your app may get throttled or run into out-of-memory issues. Getting this balance right helps the scheduler pack pods more efficiently, reduces unused capacity, and ultimately keeps your AKS costs lower.
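As an illustration, here’s a minimal deployment sketch with requests and limits set – the name myapp, the image, and the numbers are placeholders you’d base on observed usage:

kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  replicas: 2
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
      - name: myapp
        image: nginx:1.27
        resources:
          requests:            # what the scheduler reserves for the pod
            cpu: 250m
            memory: 256Mi
          limits:              # hard cap; exceeding memory gets the container OOM-killed
            cpu: 500m
            memory: 512Mi
EOF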

 

7. Leverage Vertical Pod Autoscaler (VPA) 

Why manually keep the resource requests for the containers in your pods up to date when you can let VPA do it for you? VPA automatically gives a pod more or fewer resources (CPU and memory) by adjusting the pod’s requests and limits.

This keeps your workloads right-sized and efficient without you breaking a sweat.

VPA supports four update modes: 

  • Off: No changes are applied – only recommendations you can act on manually. 
  • Initial: Applies recommendations at pod creation only. Running pods remain unchanged. 
  • Recreate: Evicts and restarts pods if their current settings differ significantly from the recommendation. 
  • Auto: Similar to Recreate, but allows in-place updates without needing to evict pods.
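On AKS, you can enable the VPA add-on via the CLI and then attach a VerticalPodAutoscaler to a workload. A minimal sketch – the cluster and workload names are examples, and starting in Off mode means you only get recommendations:

az aks update --resource-group myResourceGroup --name myAKSCluster --enable-vpa

kubectl apply -f - <<EOF
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: myapp-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp              # the workload to right-size
  updatePolicy:
    updateMode: "Off"        # recommendations only; switch to Auto to apply them
EOF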

 

8. Combine HPA and VPA 

You can also use the Horizontal Pod Autoscaler (HPA) and the Vertical Pod Autoscaler (VPA) together. However, there are a few things to keep in mind: 

  • VPA and HPA should not scale on the same CPU/memory metrics.
  • This combination can lead to conflicts, as both autoscalers respond to the same changes.

Best practice

Let VPA handle CPU or memory adjustments, and use HPA for custom metrics to avoid overlap.  

Alternatively, keep VPA in off mode to get recommendations, while HPA handles horizontal scaling based on CPU utilisation. 

 

9. Autoscale with Kubernetes Event-Driven Autoscaling (KEDA) 

Apart from HPA and VPA, there is another workload autoscaler in AKS: KEDA – Kubernetes Event-Driven Autoscaling.

KEDA dynamically scales your application to meet demand in a sustainable and cost-efficient manner. 

Diagram showing the architecture of KEDA and how it extends Kubernetes

It can trigger scaling based on a wide range of events and data sources from Azure services, for example: 

  • Count of unprocessed events in Azure Event Hub 
  • Length of message queues in Azure Storage Queue 

How it works: 

  • You define a ScaledObject that specifies: 
    • The scaler you want to use 
    • The workload type to scale 
    • The min/max replicas 
  • KEDA uses a Scale Controller to monitor the ScaledObjects 
  • When scaling is needed, KEDA creates an HPA and feeds metrics into it 

KEDA allows scaling not just for Deployments, but also StatefulSets, Jobs or any custom resource you define. All in all, by leveraging KEDA, you can ensure your applications scale efficiently based on demand – leading to significant cost savings and optimised resource usage. 
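As an example, AKS offers KEDA as an add-on, and a ScaledObject scaling a deployment on Azure Storage Queue length could look like the sketch below – the workload name, queue name and thresholds are assumptions, and the storage connection string is read from an environment variable on the workload:

az aks update --resource-group myResourceGroup --name myAKSCluster --enable-keda

kubectl apply -f - <<EOF
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: myapp-scaler
spec:
  scaleTargetRef:
    name: myapp              # the Deployment to scale
  minReplicaCount: 0         # scale to zero when the queue is empty
  maxReplicaCount: 10
  triggers:
    - type: azure-queue
      metadata:
        queueName: orders
        queueLength: "5"     # target messages per replica
        connectionFromEnv: STORAGE_CONNECTION_STRING
EOF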

 

10. Remove orphaned resources 

Provisioning new resources is pretty easy in Azure Kubernetes Service (AKS) – and it’s just as easy to overlook or forget them. Those idle resources (often hidden or forgotten) can add up quickly and quietly on your Azure bill. Think about: 

  • Unused disks from deleted pods 
  • Forgotten load balancers 
  • Leftover public IPs from deleted clusters or workloads 

Often, organisations deal with orphaned instances – resources that “belong to everyone, but are owned by no one.” This issue is especially common in large organisations where multiple apps run at once and a centralised resource visibility is lacking. 

It’s therefore crucial to:

  • Audit and monitor your AKS environment regularly.
  • Use tags and naming conventions to trace ownership.
  • Set lifecycle policies or automation scripts to flag and clean up unused resources. 

Because in the end, what you don’t see will cost you. 
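A couple of CLI queries can help surface the usual suspects. A sketch – the JMESPath filters are illustrative:

# Managed disks not attached to any VM
az disk list --query "[?diskState=='Unattached'].{name:name, resourceGroup:resourceGroup}" --output table

# Public IPs not associated with any resource
az network public-ip list --query "[?ipConfiguration==null].{name:name, resourceGroup:resourceGroup}" --output table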


Azure Cost Scan

We'd like to help you make sense of your costs. Our in-house experts provide a professional savings recommendation, based on your current Azure cloud usage. 

Yes, I want it!

11. Use Azure Reservations 

Showing how capacity works with Azure Reservations

For consistent and predictable workloads, Azure Reservations can cut your compute costs by up to 50–70% – if done correctly.

You should consider all the variables (match them to usage patterns) and not just follow Azure Advisor blindly. 

Note

Reservations don’t always fit dynamic scaling perfectly, but if you have a baseline usage level, they’re a quick way to lower your AKS bill. 

12. Opt for Azure Savings Plans 

If reservations don’t work for you – for example, when your workloads are not very consistent and predictable – you can go with Azure Savings Plans instead. With Savings Plans, you commit to a fixed hourly spend on compute; any usage above that commitment is billed at regular pay-as-you-go rates.

Showing the potential savings in a bar chart comparing hourly spend on saving plans, pay-as-you-go, and unused savings over time.

If you are unsure whether to go for Savings Plans or Reservations, read this article.

 

13. Use Azure Spot VMs 

For non-critical workloads, Azure Spot VMs can save you significant computing costs, up to 90%. Spot instances are Azure’s excess compute capacity offered at a discount. Therefore, the actual prices depend on supply and demand, and range from very cheap to slightly cheaper than pay-as-you-go. 

Diagram illustrating datacenter VM capacity with used and unused sections, customer VM allocations, and a graph showing unused capacity increasing with discount.

With Azure Spot Virtual Machines, you can: 

  • Set a max price → the VM is stopped or deleted when the spot price goes above it 
  • Or simply use spot VMs for as long as there is spare capacity 

However, this should only be used for Dev/Test or non-critical environments, not for production. 
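Adding a spot node pool to an existing cluster is a single command. A sketch with example names – a max price of -1 means you never pay more than the pay-as-you-go rate:

az aks nodepool add --resource-group myResourceGroup --cluster-name myAKSCluster --name spotpool --priority Spot --eviction-policy Delete --spot-max-price -1 --enable-cluster-autoscaler --min-count 1 --max-count 3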

 

14. Leverage Azure Hybrid Benefit 

If you only run Linux node pools, you can skip this saving method – it applies solely to Windows-based node pools.

Part of the costs of your Windows-based node pools come from licensing the Windows OS, per vCPU – which means the larger the nodes, the higher the licensing fee.  

Windows VM cost structure for licensing cost and infrastructure cost in Azure Hybrid Benefit visualised

If your organisation already has on-prem Windows licences, you can use Azure Hybrid Benefit (AHB), which lets you bring existing licences to Azure. 
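If you hold such licences, enabling the benefit on a cluster with Windows node pools is a single flag – a sketch using the example names from earlier:

az aks update --resource-group myResourceGroup --name myAKSCluster --enable-ahub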

 

15. Leverage the free SKU for Dev/Test 

Another way to save is choosing the right control plane SKU: Free or Standard. By default, an AKS cluster is created with the Free SKU for the control plane.

Worth noting: the Free SKU means you don’t get an SLA. While that might be fine for dev and test clusters, it isn’t for production – enable the Standard SKU for production environments.
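You can pick the tier at creation time and change it later. A sketch with example cluster names:

az aks create --resource-group myResourceGroup --name myDevCluster --tier free --generate-ssh-keys

az aks update --resource-group myResourceGroup --name myAKSCluster --tier standard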

 

Closing thoughts

We’ve discussed several optimisation techniques to reduce your AKS compute costs – from choosing the right VM sizes to taking advantage of Microsoft’s discount plans. We also showed how you can benefit from the various autoscaling options in AKS, both workload- and infrastructure-wise.

To that end, the more you lean on autoscaling to rightsize, the more efficiently you can bin pack and drive cost savings. 

Ultimately, your chosen methods will most likely depend on your application scenario and organisational needs.  


Get in Touch!

Are you looking for ways to save on your Azure costs? Contact us and we will happily help you out.