
Secure your data science models in Azure 

In this article, we explain how to secure your data science models in Azure.

Published: 18 January 2023

This article explains how to properly secure your data science models on Azure. To do so, we will focus on the following topics:

  • What are machine learning model attacks? 
  • Why machine learning model attacks exist
  • What is the impact of a hack?
  • How to prevent and mitigate data science model attacks 

 

Are you curious to explore the world of data science but don’t know where to start? We recommend you begin with our introductory article to familiarize yourself with data science. Or learn how to secure your current data environment.

 

What are machine learning model attacks?  

What’s in a name? As you might have guessed, a machine learning model attack is an attack aimed at your machine learning model itself. Let’s put ourselves in the shoes of a hacker to understand what such an attack looks like and why these types of attacks exist in the first place.

A hacker can attack your data science models for a multitude of reasons and during a variety of phases. We can divide machine learning model attacks into three pillars: (1) Confidentiality Attacks, (2) Integrity Attacks, and (3) Availability Attacks.

 

1. Confidentiality attacks 

This type of attack is about extracting sensitive information from your data science system through your model.

An example of a confidentiality attack is a Model Stealing attack. This attack happens in the production phase of your machine learning model. If you’ve found a competitor’s model you want to own or use, you could attack this target model by querying it with samples and observing how it responds. Once you’ve collected enough query-response pairs, you can reverse engineer the target model and train your own copy of it.
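A minimal sketch of this idea is shown below, using scikit-learn and a locally trained "target model" as a stand-in for a competitor’s prediction API (in a real attack, the attacker would only see the query/response interface):

```python
# Conceptual model stealing sketch. The "target model" is trained locally here
# as a stand-in for a competitor's prediction API; in a real attack the
# attacker would only see the query/response interface.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier

# The victim trains a model the attacker cannot inspect directly.
X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
target_model = LogisticRegression(max_iter=1000).fit(X[:1000], y[:1000])

# The attacker sends their own query samples and records the responses.
X_queries = X[1000:]
stolen_labels = target_model.predict(X_queries)

# The attacker trains a surrogate model on the (query, response) pairs.
surrogate = DecisionTreeClassifier(random_state=0).fit(X_queries, stolen_labels)

# The surrogate now mimics the target model on data it has never queried.
agreement = accuracy_score(target_model.predict(X[:1000]), surrogate.predict(X[:1000]))
print(f"Surrogate agrees with target on {agreement:.0%} of samples")
```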

2. Integrity attacks 

This type of attack quietly fools the model into making mistakes. You will have a hard time noticing that your model suddenly classifies a red traffic light as a green one, which could cause serious traffic issues.
 
An example of an integrity attack is an Evasion Attack. This attack happens during the inferencing phase of your data science model and focuses on crafting malicious input for your model. To a human, the input ‘looks’ like standard input, but it contains changes that are practically invisible to the human eye. This attack is common with computer vision APIs, where images are used: the pixels of a picture are changed right before it’s uploaded to the computer vision API, so the API fails to classify the ‘changed pixel’ picture correctly. For example, a cat is no longer classified as a cat by your model, which could cause tremendous problems.
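The sketch below illustrates the idea on a simple linear classifier from scikit-learn rather than a real computer vision API; a small, low-visibility pixel perturbation can be enough to change the predicted class (the step size of 0.15 is an assumption and may need tuning):

```python
# Evasion attack sketch on a simple linear classifier (not a real computer
# vision API). A small pixel perturbation, barely visible to a human, nudges
# the prediction towards another class. The step size 0.15 is an assumption
# and may need tuning before the prediction actually flips.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression

digits = load_digits()
X, y = digits.data / 16.0, digits.target          # scale pixels to [0, 1]
clf = LogisticRegression(max_iter=2000).fit(X, y)

x = X[0]                                          # a correctly classified digit
true_class = int(clf.predict([x])[0])
probs = clf.predict_proba([x])[0]
wrong_class = int(np.argsort(probs)[-2])          # the runner-up class is easiest to reach

# Push every pixel slightly in the direction that favours the wrong class.
direction = np.sign(clf.coef_[wrong_class] - clf.coef_[true_class])
x_adv = np.clip(x + 0.15 * direction, 0, 1)

print("original prediction:   ", true_class)
print("adversarial prediction:", int(clf.predict([x_adv])[0]))
print("max pixel change:      ", float(np.abs(x_adv - x).max()))
```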

3. Availability attacks 

This type of attack focuses on taking your whole system down. An attacker could poison your data to the point that your machine learning model learns so poorly that the model itself becomes garbage, leaving you no option but to shut it down.

An example of an availability attack is a Data Poisoning attack. This attack focuses on creating inputs that are accepted as training data for your model but never should have been. As you can imagine, when you have terabytes of image data in your storage or database, a handful of ‘wrong’ or poisoned images added to the training set will probably go unnoticed. Your model is then trained on poisoned data, which degrades the correctness of your prediction output.
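To make this concrete, here is a minimal, hypothetical sketch in which an attacker silently flips 20% of the training labels before training runs; even this crude form of poisoning noticeably degrades the resulting model:

```python
# Data poisoning sketch: flipping the labels of 20% of the training data
# before training noticeably degrades the resulting model. All data here is
# synthetic; a real attacker would tamper with your storage or database.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Clean baseline: train on unmodified labels.
clean_model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# The attacker silently flips 20% of the training labels.
rng = np.random.default_rng(0)
poisoned = y_train.copy()
idx = rng.choice(len(poisoned), size=int(0.2 * len(poisoned)), replace=False)
poisoned[idx] = 1 - poisoned[idx]
poisoned_model = LogisticRegression(max_iter=1000).fit(X_train, poisoned)

print("clean accuracy:   ", accuracy_score(y_test, clean_model.predict(X_test)))
print("poisoned accuracy:", accuracy_score(y_test, poisoned_model.predict(X_test)))
```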

Read more about threat modeling in AI/ML.

 

Why machine learning model attacks exist  

Even though all these types of attacks seem scary, we might ask ourselves: why would someone attack data science models in the first place? The reason largely depends on the type of attack. To get some understanding of this, let’s consider a few examples:

 

  • An attacker might want to reduce the time needed to develop their own models. In that case, they would use a Model Stealing attack and try to duplicate your model.

  • Some attackers have a more financial motive: organizations pay good money for people who perform data poisoning attacks during the training of a model, to ensure that once the model is deployed, it’s more robust.

  • Other attackers are hired by organizations with integrity in mind: they use evasion attacks to test and ensure the robustness of the deployed model.

 

What is the impact of a hack?  

  
The impact of a hack on your data science models can be both financial and reputational. 
 

Financial impact  

Firstly, a hack can have a substantial financial impact. Think about it: who wants never-ending lawsuits from victims draining their bank account? Consider this example: you’re working in a safety-critical domain and depend on the accurate classification of images, say, automatically classifying broken bones from MRI images. An evasion attack happens, and a wrong classification of an MRI causes a misdiagnosis, putting a patient’s health at risk.

Reputational damage  

An attack on your model can cause you to be held accountable for many reasons, including discrimination or privacy breaches, which of course lead to serious reputational damage. For example, after ingesting the wrong training data, your model might only show negative ads to specific people based on their religion or heritage.

 

Figure: Reasons to hack a model include reducing the time needed to create models, financial benefits, and testing integrity.

How to prevent and mitigate data science model attacks 

Given the considerable impact of a machine learning model hack, we want to prevent these attacks to the best of our ability!

An ounce of prevention is worth a pound of cure. This is also true for securing data science models. So, how can we prevent attacks instead of just dealing with the aftermath? Unfortunately, there is no single solution or 100% guaranteed prevention rate. But luckily for us, there are steps we can take to make our data science models more secure. Think of the following concepts:  

  • Processes 
  • Testing 
  • Data Security 
  • Diversity 

Processes  

First, always make sure you have security processes in place for emergencies, as well as a general cyber security strategy. Ensure you have a well-developed and tested incident recovery process so you know what to do when you’ve been attacked. Create controls that can delay or stop your model’s processing so you can start debugging its classifiers; a small sketch of such a control follows below. Also, consider creating impact assessments and know who to inform when things go south. Luckily for us, Microsoft has a guide we can use; find the guide here.
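A minimal sketch of such a "pause scoring" control, assuming a plain Python wrapper around your deployed model (the class name and incident reference are illustrative, not part of any Azure SDK):

```python
# Illustrative "pause scoring" control around a deployed model. The class and
# the incident reference are hypothetical and not part of any Azure SDK; the
# point is that operations can stop predictions with a single call while the
# classifiers are being debugged.
from dataclasses import dataclass


@dataclass
class ModelGate:
    """Wraps a model and refuses to serve predictions while scoring is paused."""
    model: object
    scoring_enabled: bool = True

    def pause(self, reason: str) -> None:
        # In production this flag would live in shared configuration,
        # not in process memory.
        print(f"Scoring paused: {reason}")
        self.scoring_enabled = False

    def predict(self, features):
        if not self.scoring_enabled:
            raise RuntimeError("Model is paused for incident investigation")
        return self.model.predict(features)


# Usage during an incident (model and incident number are placeholders):
# gate = ModelGate(model=my_trained_model)
# gate.pause("suspected data poisoning, incident INC-123")
```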

Testing  

Secondly, ensure you’ve tested your models extensively before putting them into production. Your model should be robust, so during training, deliberately create inputs that trigger wrong responses from the model. Once you’ve deployed your model, ensure you have anomaly detection mechanisms in place (see the sketch below). In addition, as you develop your model pipeline, run a penetration test to see which phases of your pipeline are vulnerable.
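As one possible approach, the sketch below uses scikit-learn’s IsolationForest to flag inference requests that look nothing like the training data; the feature values here are synthetic stand-ins:

```python
# Anomaly detection sketch for incoming inference requests, assuming the
# training data distribution is a reasonable baseline for "normal" input.
# The feature values below are synthetic stand-ins.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
X_train = rng.normal(0, 1, size=(1000, 5))         # stand-in for real training features

detector = IsolationForest(contamination=0.01, random_state=0).fit(X_train)

normal_request = rng.normal(0, 1, size=(1, 5))
suspicious_request = np.full((1, 5), 8.0)           # far outside the training range

for name, request in [("normal", normal_request), ("suspicious", suspicious_request)]:
    verdict = "accept" if detector.predict(request)[0] == 1 else "flag for review"
    print(f"{name} request -> {verdict}")
```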

Data Security  

Third, make sure you have verified and cross-checked your data properly. During training, ensure you’re using inclusive data. In addition, make sure to secure your data properly and anonymize sensitive datasets (a small example follows below). Be cautious when using public or open-source datasets, as they’re harder to secure. Whenever possible, focus on datasets that verified companies have used.
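For example, a sensitive column can be pseudonymized before it enters a training dataset. The sketch below is a simple illustration with made-up column names; a real pipeline would keep the salt in a secret store such as Azure Key Vault:

```python
# Pseudonymisation sketch: hash a sensitive column before the data enters a
# training set. Column names and the salt are made up for illustration; in a
# real pipeline the salt would come from a secret store such as Azure Key Vault.
import hashlib

import pandas as pd

SALT = "replace-with-a-secret-from-your-key-vault"


def pseudonymise(value: str) -> str:
    """One-way hash so records stay joinable but identities are not readable."""
    return hashlib.sha256((SALT + value).encode()).hexdigest()[:16]


patients = pd.DataFrame({
    "patient_name": ["Alice Jansen", "Bob de Vries"],
    "diagnosis_code": ["S52.5", "S42.2"],
})
patients["patient_id"] = patients["patient_name"].map(pseudonymise)
training_ready = patients.drop(columns=["patient_name"])
print(training_ready)
```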

Diversity  

Lastly, create and run multiple models to recognize attacks. Different models can focus on different aspects of an attack, such as attributes, classifiers, or file types. This diversity ensures robustness, making it more difficult for attackers to find weaknesses in your MLOps system; the sketch below shows the idea.
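One simple way to apply this idea is to run several different model types on the same input and treat disagreement between them as a signal that the input may be adversarial. The sketch below uses three scikit-learn classifiers purely as an illustration:

```python
# Diversity sketch: run several different model types on the same input and
# treat disagreement between them as a signal that the input may be
# adversarial. The models and data here are illustrative.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

X, y = make_classification(n_samples=2000, n_features=10, random_state=1)
models = [
    LogisticRegression(max_iter=1000).fit(X, y),
    RandomForestClassifier(random_state=1).fit(X, y),
    SVC(random_state=1).fit(X, y),
]


def predict_with_consensus(sample):
    """Return the agreed label, or None when the models disagree."""
    votes = [int(m.predict([sample])[0]) for m in models]
    if len(set(votes)) > 1:
        return None, votes            # disagreement: escalate instead of answering
    return votes[0], votes


label, votes = predict_with_consensus(X[0])
print("votes:", votes, "-> answer:", label if label is not None else "flagged for review")
```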

In addition to these concepts, tools are available that help assess the security of your machine learning models. For example, you could utilize Counterfit, which provides a generic automation layer for assessing the security of machine learning systems. The tool is agnostic to environment (any cloud), model, and data (any type), so it can be used by everyone. It helps you perform security risk assessments to ensure that your models are reliable, robust, and can be trusted. If you want to know more, check out the Counterfit project.

 

Conclusion  

This article has been your stepping stone toward understanding and recognizing data science model attacks, and we have given you an idea of how to mitigate them and build a strategy against them. Seems like a lot to handle? Don’t panic! At Intercept, we are ready to help you find and build solutions to prevent attacks. Following the DLM approach, Intercept tackles your Data Science project step by step. Together with our Data Scientists, we can guide you through your business requirements, translating them into a Data Design. This Data Design acts as the blueprint for your Data Science project.

Schedule a meeting below to identify what challenges you’d like to solve with Data Science.

Curious about what we can do for you?

Request a Data Design or Second Opinion now

Find out what ADF, Azure Databricks, Azure Synapse Analytics, or a combination of these tools can do for your company.