Eviction Types and Policies
Eviction Types
When Azure evicts a Spot Virtual Machine, the VM becomes unavailable, and any workloads running on it are disrupted. Azure will do this for two reasons:
- Azure needs the capacity back for higher-priority workloads (e.g. PAYG resources, capacity reservations).
- The current Spot VM price is higher than the maximum price you set.
Yet, Microsoft is flexible; you can decide on the eviction types and policies.
Concerning eviction type, you’ve got two options:
- Capacity only: Microsoft evicts your virtual machines only when Azure needs the capacity back, that is, when higher-priority workloads start to grow into the previously unused capacity your VMs are running on.
- Price or capacity: You set a maximum price you're willing to pay, so you can retain the VMs for weeks, even if there is a slight increase in the Spot prices. As Azure runs low on unused capacity, Spot prices rise; once the price exceeds your maximum, Microsoft evicts the VMs. They can also still be evicted when Azure needs the capacity back.
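To make the difference between the two eviction types concrete, here is a minimal sketch of the decision logic. It assumes Azure's convention that a maximum price of -1 means "never evict for price reasons" (the capacity-only case); the function name and the prices in the usage note are hypothetical and only illustrate the rule.

```python
from typing import Optional

# Assumption: a max price of -1 is the sentinel for "capacity only".
CAPACITY_ONLY = -1.0


def eviction_reason(max_price: float, spot_price: float,
                    capacity_available: bool) -> Optional[str]:
    """Return why a Spot VM would be evicted, or None if it keeps running."""
    if not capacity_available:
        # Both eviction types lose the VM when Azure needs the capacity back.
        return "capacity"
    if max_price != CAPACITY_ONLY and spot_price > max_price:
        # Only "price or capacity" VMs can be evicted for price reasons.
        return "price"
    return None
```

For example, with a hypothetical maximum price of 0.10 per hour, `eviction_reason(0.10, 0.12, True)` reports a price eviction, while the same Spot price with `CAPACITY_ONLY` keeps the VM running as long as capacity is available.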
Eviction Policy
When you deploy an Azure Spot Instance, you must also define an eviction policy. The policy determines what happens to the VM when it is evicted. There are two options:
- Stop / Deallocate: For a single VM or a VM scale set, you can set the eviction policy to deallocate. When the VM is deallocated, your disks and your network resources persist, and the data stored on the persistent disk(s) is not deleted. All temporary disk data is lost, as it is whenever a VM is deallocated.
- Delete: You also have the option to delete the VM. In this case, the VM will be deleted, and all the associated resources are also deleted. This includes all data stored on any attached disks.
When to use which eviction policy
The table below shows when to choose each eviction policy: Delete vs Deallocate / Stop.
| Policy type | When to use the policy |
| --- | --- |
| Delete | For ephemeral compute and data. When you don't want to pay for disks. For minimal budget scenarios. |
| Deallocate / Stop | When a specific VM size is required. When location flexibility isn't an option. For long application installation processes. For situations with indefinite wait times. When the decision is not purely driven by cost savings. |
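The criteria in the table can be sketched as a small decision helper. This is only an illustration: the `Workload` fields and the order in which the checks are applied are assumptions, not an official Azure rule.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical description of a workload; field names are illustrative."""
    needs_disk_state: bool       # long installs or data that must survive eviction
    fixed_size_or_region: bool   # cannot move to another VM size or location
    disk_cost_acceptable: bool   # willing to keep paying for disks while evicted


def choose_eviction_policy(workload: Workload) -> str:
    """Map the table's criteria to 'Delete' or 'Deallocate'."""
    if not workload.disk_cost_acceptable:
        # Deallocated VMs keep their disks, so a minimal budget points to Delete.
        return "Delete"
    if workload.needs_disk_state or workload.fixed_size_or_region:
        # Keep the VM definition and disks, then wait for capacity to return.
        return "Deallocate"
    # Ephemeral compute and data: nothing worth keeping.
    return "Delete"
```

A render worker with no local state would come out as `Delete`, while a VM with a long application install that must run on one specific size would come out as `Deallocate`.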
What should you consider when using Spot Virtual Machines?
You can start utilising Azure spot compute instances without re-architecting. Although it can be a flexible, scalable and cost-effective purchasing option for Virtual Machines, you must know that it has its drawbacks:
- Microsoft can take capacity back for their on-demand customers at any time.
- You’ll receive a 30-second warning before Microsoft takes back the server from you. This is subject to best efforts, and you must opt in to receive these notifications.
- Terminations are based on the availability of capacity and the maximum configured price.
- Microsoft Azure doesn’t provide SLAs for these VMs.
- Not all VM sizes are supported (for example, B-series and promotional SKUs such as Dv2 and NV).
- You can't use Reserved Instances with Azure Spot virtual machines.
- Ephemeral OS disks are not supported.
- Capacity availability depends on region, size and time.
- You can try to bring deallocated VMs back online, but this only succeeds if there is sufficient unutilised capacity.
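The 30-second warning mentioned above can be picked up from inside the VM. The sketch below assumes the notification arrives through Azure Scheduled Events on the Instance Metadata Service, where a Spot eviction appears as an event of type `Preempt`. The API version in the URL is one that exists at the time of writing, but check the current documentation; the helper names are hypothetical.

```python
import json
import urllib.request
from typing import List

# Instance Metadata Service endpoint; only reachable from inside an Azure VM.
SCHEDULED_EVENTS_URL = (
    "http://169.254.169.254/metadata/scheduledevents?api-version=2020-07-01"
)


def fetch_scheduled_events(timeout: float = 2.0) -> dict:
    """Fetch the current Scheduled Events document for this VM."""
    request = urllib.request.Request(
        SCHEDULED_EVENTS_URL, headers={"Metadata": "true"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def pending_evictions(document: dict, vm_name: str) -> List[dict]:
    """Return the 'Preempt' events that list this VM among their resources."""
    return [
        event
        for event in document.get("Events", [])
        if event.get("EventType") == "Preempt"
        and vm_name in event.get("Resources", [])
    ]
```

A worker would typically poll `fetch_scheduled_events()` every few seconds and, as soon as `pending_evictions(...)` is non-empty, stop taking new work and save its state. Because the notice is best effort, the workload still has to survive an eviction that arrives without warning.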
Now, you may think that interruptions will take your servers away all the time and never let you complete your work. Yet, Microsoft says that Azure interrupts and reclaims capacity less than 10% of the time.
While you must always be prepared to handle interruptions, you are more likely to turn off the virtual machines yourself.
Use Cases for Azure Spot VMs
Spot virtual machines excel in various use cases where interruptible workloads do not need to be completed within a specific timeframe, such as:
High-performance computing scenarios
Spot VMs are widely used in high-performance computing (grid computing or high-throughput workloads). These workloads are often loosely coupled, meaning a single node failure won’t take down the entire cluster. As with big data workloads, the work can be reprocessed if an instance is interrupted. The time lost is often insignificant compared to the cost savings you get.
Batch processing jobs and visual rendering applications
Spot VMs are great for batch jobs, which are automated tasks, often processed in large groups – “batches”. These tasks have flexible timing, so pausing or restarting them won’t affect the workload.
Dev/Test environment
Useful for temporary environments where you don’t need uptime. Suppose you want to do load testing for a new web app to see how it performs under high traffic. You then need to spin up multiple virtual machines to simultaneously simulate thousands of users hitting the site.
With Spot VMs, you can:
- Spin up many test instances for a fraction of the cost.
- Live with interruptions since the test can be restarted if needed.
- Scale up instances dynamically without worrying about high costs.
Since load testing is temporary and doesn’t require uptime, spot instances let you test efficiently without overspending on cloud resources.
CI/CD Pipelines
Spot VMs are great for continuous integration and deployment (CI/CD). Testing and build processes already have to handle failures, so a Spot eviction is simply one more type of failure to manage. If your pipelines can handle interruptions, Spot VMs can save you a lot of infrastructure costs.
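One simple way for a pipeline to handle interruptions is to rerun the interrupted step. The sketch below is a generic retry wrapper, not a feature of any particular CI system; `Interrupted` is a hypothetical exception standing in for however your build agent reports an eviction.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class Interrupted(Exception):
    """Hypothetical signal that the build agent was evicted mid-job."""


def run_with_retries(job: Callable[[], T], max_attempts: int = 3) -> T:
    """Run a job, rerunning it from the start whenever it is interrupted."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return job()
        except Interrupted:
            if attempt == max_attempts:
                raise  # give up and surface the interruption to the pipeline
    raise AssertionError("unreachable")
```

Only `Interrupted` is retried; a genuine build or test failure propagates immediately, so real defects are not masked by the retry loop.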
AI and Machine Learning
You can train models using Spot Virtual Machines and achieve significant savings on compute costs. Let’s say you want to tune a machine learning (ML) model. You need to try various hyperparameters to find the best configuration. This means running multiple training jobs, but each job is independent and doesn’t need to run continuously or in parallel.
By using Spot VMs, you can:
- Run multiple training experiments at a much lower cost.
- Tolerate interruptions since failed jobs can simply be retried.
- Save up to 90% compared to using standard VMs.
If an instance is interrupted, the system can resume from the last checkpoint, minimising lost work while maximising cost savings.
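Resuming from the last checkpoint can be sketched as follows. This is a minimal illustration that assumes the checkpoint lives on storage that survives the eviction (a persistent disk or a mounted file share); the JSON state and the placeholder training step are hypothetical, and a real job would save model weights with its ML framework.

```python
import json
import os
import tempfile
from typing import Optional


def load_checkpoint(path: str) -> dict:
    """Return the saved state, or a fresh state if no checkpoint exists."""
    if not os.path.exists(path):
        return {"epoch": 0}
    with open(path) as handle:
        return json.load(handle)


def save_checkpoint(path: str, state: dict) -> None:
    """Write to a temp file and rename, so an eviction mid-write
    cannot leave a half-written checkpoint behind."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as handle:
        json.dump(state, handle)
    os.replace(tmp_path, path)


def train(path: str, total_epochs: int, stop_after: Optional[int] = None) -> int:
    """Train up to total_epochs, resuming from the checkpoint at path.
    stop_after simulates an eviction after that many epochs in this run."""
    state = load_checkpoint(path)
    epochs_this_run = 0
    while state["epoch"] < total_epochs:
        if stop_after is not None and epochs_this_run >= stop_after:
            break  # simulated eviction
        # ... one epoch of real training would go here ...
        state["epoch"] += 1
        save_checkpoint(path, state)
        epochs_this_run += 1
    return state["epoch"]
```

If the first run is cut off after two epochs, the next call to `train` with the same path picks up at epoch 2 rather than starting over, so at most one epoch of work is lost per interruption.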
Large-scale stateless apps
You might think these VMs aren’t suitable for web services because websites need to be up 24/7. But if you design a stateless and scalable web architecture, Spot VMs can work. Even if an instance is taken away, existing requests will be completed, and future requests can be routed to other available instances. So, Spot VMs are suitable for large-scale load-balanced web apps.
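The routing idea can be illustrated with a toy round-robin balancer that simply stops sending traffic to an evicted instance. In practice a managed load balancer with health probes does this for you; the class and instance names here are hypothetical.

```python
from typing import Iterable, List


class RoundRobinBalancer:
    """Toy balancer: rotates over whichever instances are still available."""

    def __init__(self, instances: Iterable[str]) -> None:
        self._instances: List[str] = list(instances)
        self._next = 0

    def evict(self, instance: str) -> None:
        """Remove an instance that Azure has reclaimed."""
        self._instances.remove(instance)

    def route(self) -> str:
        """Pick the instance that should serve the next request."""
        if not self._instances:
            raise RuntimeError("no instances available")
        instance = self._instances[self._next % len(self._instances)]
        self._next += 1
        return instance
```

Because the app is stateless, any remaining instance can serve any request, so an eviction reduces capacity but does not take the service down as long as at least one instance is left.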
Fault-tolerant applications
Apps with built-in redundancy and failover can use Azure Spot VMs. These workloads can handle interruptions by automatically shifting to other available instances.
Examples include:
- Distributed databases
- Microservices architectures with auto-recovery
- Large-scale caching systems