
Secure your data science environment in Azure

In this article, we explain how to secure your data science environment. Better safe than sorry!

Published: 06 February 2023

In this article, we’ll tackle the following subjects: 

  1. Why secure your data science environment in Azure?
  2. How to secure your environment in Azure?

Note: Are you curious to explore the world of data science but don’t know where to start? We recommend you start with our introductory article to familiarize yourself with data science. Do you want to learn how to move your current data science environment to Azure? Please click here to find out more!   

Azure offers a multitude of tools to secure your data science environment. At Intercept, the tools we use most often are:

  • Identity and Access Management (IAM)
  • Networking
  • Azure Monitor
  • Azure Log Analytics
  • Azure Policy
  • Defender for Cloud

Please read on if you want to get an idea of these tools and how to use them.

 

Identity and Access Management (IAM) 

IAM on Azure helps us define roles for the users of our data science infrastructure. Roles can be assigned at the subscription, resource group, or individual resource level. In the portal, roles can be assigned to users, (AD) groups, service principals, or managed identities.

Please consider this example: some users in your data science team are only allowed to view Azure Machine Learning. Hence, they will receive a “Reader” role for the Azure Machine Learning resource. Administrators might be allowed to make modifications to the Azure Machine Learning resource and will therefore receive the “Contributor” role.    

All in all, IAM helps us limit users' access rights in line with the principle of least privilege, which helps prevent unauthorized access.
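
To make the scope levels concrete, here is a minimal Python sketch of how a role assigned at one scope applies to that scope and everything beneath it. It is a simplified model for illustration only, not the Azure SDK: the `RoleAssignment` class, the `effective_roles` helper, the subscription ID, and all resource and user names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoleAssignment:
    principal: str  # user, group, service principal, or managed identity
    role: str       # for example "Reader" or "Contributor"
    scope: str      # subscription, resource group, or resource ID


def _covers(scope: str, resource_id: str) -> bool:
    """A scope covers a resource if it is that resource or one of its parents."""
    scope = scope.rstrip("/").lower()
    resource_id = resource_id.rstrip("/").lower()
    return resource_id == scope or resource_id.startswith(scope + "/")


def effective_roles(assignments, principal, resource_id):
    """Return the roles a principal holds on a resource, including inherited ones."""
    return sorted(
        {a.role for a in assignments
         if a.principal == principal and _covers(a.scope, resource_id)}
    )


# Hypothetical resource IDs, following the usual Azure ID layout.
SUB = "/subscriptions/00000000-0000-0000-0000-000000000000"
RG = f"{SUB}/resourceGroups/rg-datascience"
AML = f"{RG}/providers/Microsoft.MachineLearningServices/workspaces/ml-workspace"
STORAGE = f"{RG}/providers/Microsoft.Storage/storageAccounts/stdatascience"

assignments = [
    RoleAssignment("analyst@example.com", "Reader", AML),     # view only
    RoleAssignment("admin@example.com", "Contributor", AML),  # may modify
]

print(effective_roles(assignments, "analyst@example.com", AML))      # ['Reader']
print(effective_roles(assignments, "analyst@example.com", STORAGE))  # []
```

Because the analyst's role is scoped to the Azure Machine Learning resource only, it grants nothing on the neighbouring storage account. Assigning the same role at the resource group level would cover both, which is why the narrowest workable scope is usually the least-privilege choice.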

 

Networking 

Everything we do on Azure depends on authorized connections: communication that should be allowed has to work, and communication that shouldn’t be allowed has to be refused. This applies to everything from accessing our data science environment to connecting data sources. But how do you get started? Two important things to consider when configuring networking on Azure are:

  1. Azure Firewall rules
  2. Azure Private Link

Azure Firewall 

When it comes to Azure firewall rules, you are in charge of allowlisting addresses, for example to determine who is authorized to connect to the data science environment. It’s important to understand that you have to configure this yourself and keep it up to date, for instance when an employee leaves your company. For communication between Azure services, for example between a storage account and your data science environment through Azure Machine Learning, the configuration is based on Azure security standards, and the protocols used can differ per Azure service.
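
The allowlisting idea can be sketched in a few lines of Python with the standard `ipaddress` module. This only illustrates the logic of an IP allowlist and is not how Azure evaluates its rules; the address ranges are reserved documentation ranges, and the `is_allowed` helper is a hypothetical name.

```python
import ipaddress

# Hypothetical allowlist: an office network and one employee's home address.
ALLOWED_RANGES = ["203.0.113.0/24", "198.51.100.17/32"]


def is_allowed(client_ip: str, allowed_ranges) -> bool:
    """Return True if the client address falls inside any allowlisted range."""
    address = ipaddress.ip_address(client_ip)
    return any(address in ipaddress.ip_network(cidr) for cidr in allowed_ranges)


print(is_allowed("203.0.113.42", ALLOWED_RANGES))  # True: inside the office range
print(is_allowed("192.0.2.10", ALLOWED_RANGES))    # False: not on the list

# When an employee leaves, their entry has to be removed explicitly.
remaining = [cidr for cidr in ALLOWED_RANGES if cidr != "198.51.100.17/32"]
print(is_allowed("198.51.100.17", remaining))      # False after removal
```

The last step is the important one in practice: an allowlist is only as good as its maintenance, so stale entries should be removed as part of offboarding.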

 

Azure Private Link 

Another component of network security is securing the Azure resources in your data science environment with virtual networks, using Azure Private Link. With this service, Azure Machine Learning is accessed through a private endpoint in your virtual network. Private endpoints are easy to set up and manage, allow your Azure resource to be accessed privately on Azure, and help protect it from data leakage, all through a simple workflow.

Securing your networking components helps prevent your network from being infiltrated, so start before it’s too late.

 


Azure Monitor 

Azure Monitor is used on Azure to collect, analyze, and act on telemetry data gathered from your resources in the Azure cloud. Azure Monitor enables you to proactively act upon issues affecting the performance and security of your Azure resource by implementing alerts.
 
Azure Monitor can detect and diagnose issues in your applications and infrastructure with Application Insights and VM Insights. It enables you to create, view, and manage alerts based on metrics for your Azure resource, for example, when a model deployment has failed or you have unusable nodes. Once an alert fires, you can drill through it to find out what caused it, with troubleshooting and diagnostics provided through the Log Analytics integration. Another feature of Azure Monitor is change analysis, which detects resource changes at the subscription level and helps us understand the cause of an issue. The first step in fixing a security issue is becoming aware of it, and the capabilities of Azure Monitor let you do so.
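
As a rough illustration of what a metric-based alert does, the Python sketch below fires when the average of the most recent samples exceeds a threshold. It is a simplified stand-in and not Azure Monitor's actual evaluation logic; the `evaluate_alert` function, the window size, and the sample values for unusable nodes are all assumptions made for the example.

```python
from statistics import mean


def evaluate_alert(samples, threshold, window=5):
    """Fire when the average of the last `window` samples exceeds `threshold`."""
    if len(samples) < window:
        return False  # not enough data to evaluate yet
    return mean(samples[-window:]) > threshold


# Hypothetical readings of unusable compute nodes, one sample per minute.
unusable_nodes = [0, 0, 0, 1, 2, 2, 3]

print(evaluate_alert(unusable_nodes, threshold=1))  # True: last five average 1.6
print(evaluate_alert(unusable_nodes, threshold=2))  # False: 1.6 is below 2
```

Averaging over a window rather than reacting to a single sample is a common way to avoid alerts on short-lived spikes.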

 

Azure Log Analytics in Azure Monitor 

Azure Log Analytics is used for querying the data gathered by Azure Monitor. It offers features such as filtering and sorting, which make analyzing the logs stored by Azure Monitor much easier. Queries in Azure Log Analytics are written in the Kusto Query Language (KQL). Log Analytics also has more advanced features for statistical analysis of the data and for visualizations that support trend analysis.
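
To give an impression of the filter-and-sort style of KQL, the Python sketch below builds a query string that lists failed operations from the activity log. It assumes the activity log is routed to a Log Analytics workspace and lands in the `AzureActivity` table with its usual columns; the `"Failure"` status value and the `failed_operations_query` helper are assumptions for illustration, so check them against your own workspace before relying on them.

```python
def failed_operations_query(hours: int = 24) -> str:
    """Build a KQL query listing failed operations in the last `hours` hours."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    return "\n".join([
        "AzureActivity",
        f"| where TimeGenerated > ago({hours}h)",       # filter on time
        '| where ActivityStatusValue == "Failure"',     # filter on outcome
        "| project TimeGenerated, Caller, OperationNameValue, ResourceGroup",
        "| sort by TimeGenerated desc",                 # newest first
    ])


print(failed_operations_query(48))
```

The printed query can be pasted into the Logs blade of a Log Analytics workspace. The `Caller` column shows who triggered each operation, which is often the first thing to check during a security review.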

 

Azure Policy  

When you need to assess the compliance of your data science environment, Azure Policy is the tool for you! 
 
Azure Policy works by evaluating resources and comparing them to specific business rules, each described in a policy definition in JSON format. You can define these policies yourself. If you have multiple business rules, you can also group them into a policy set, also called an initiative. As an example of a business rule in a policy definition: due to regulatory compliance standards, your business may need to control the physical location where resources are deployed, because some locations aren’t permitted. In that case, you can use or create a ‘location’ policy so that users can only deploy resources in West Europe, but not in China, for example.

The policy definitions you’ve created can then be assigned to a scope in Azure, such as a subscription, a resource group, or an individual resource such as Azure Machine Learning. So, do you want to be compliant and be able to assess your compliance at a large scale? Use Azure Policy.
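
The ‘location’ rule described above could look roughly like the sketch below, written as a Python dictionary so it can be printed as JSON. The `if`/`then` structure follows the general shape of Azure Policy rules, but the display name and the default value are illustrative, and the `is_denied` function is a toy evaluator for this one rule shape only, not how Azure evaluates policies.

```python
import json

# Sketch of a policy definition that only permits deployments in West Europe.
policy_definition = {
    "properties": {
        "displayName": "Allowed locations (sketch)",
        "mode": "Indexed",
        "parameters": {
            "allowedLocations": {
                "type": "Array",
                "defaultValue": ["westeurope"],
            }
        },
        "policyRule": {
            "if": {
                "not": {
                    "field": "location",
                    "in": "[parameters('allowedLocations')]",
                }
            },
            "then": {"effect": "deny"},
        },
    }
}


def is_denied(resource: dict, definition: dict) -> bool:
    """Toy check: deny when the resource location is not in the allowed list."""
    props = definition["properties"]
    allowed = props["parameters"]["allowedLocations"]["defaultValue"]
    effect = props["policyRule"]["then"]["effect"]
    return effect == "deny" and resource["location"].lower() not in allowed


print(json.dumps(policy_definition, indent=2))
print(is_denied({"name": "ml-workspace", "location": "westeurope"}, policy_definition))
print(is_denied({"name": "ml-workspace-cn", "location": "chinaeast"}, policy_definition))
```

The first resource passes and the second would be denied. Keeping the allowed locations in a parameter means the same definition can be assigned with different values at different scopes.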

 

Microsoft Defender for Cloud 

If you need a cloud security posture management and workload protection platform for your data science environment, look at Microsoft Defender for Cloud. It helps you manage the security of your resources and workloads in multi-cloud environments, on-premises, or entirely on Azure. It assesses your security with a secure score, so that your security posture is trackable and progress can be measured. It also acts as a recommendation engine that guides you through the actions you can take to become more secure, shows where security risks occur, and explains how to assess them. Alerts from Defender for Cloud are real-time, so you can respond to risks immediately and continuously ensure your data science environment is secure.

 

How Intercept tackles the move to Azure      

Have we convinced you of the importance of securing your data science solutions on Azure? Or are you not sure where to start? At Intercept, we are ready to help! Together with our Data Scientist, we can guide you through your requirements and help you with every step of your data journey.

 

Curious what we can do for you?

Request a Data Design or Second Opinion now

Find out what ADF, Azure Databricks, Azure Synapse Analytics, or a combination of these tools can do for your company.