In this article, we’ll tackle the following subjects:
- Why secure your data science environment in Azure?
- How to secure your environment in Azure?
Note: Are you curious to explore the world of data science but don’t know where to start? We recommend starting with our introductory article to familiarize yourself with data science. Do you want to learn how to move your current data science environment to Azure? Read our article on that topic to find out more!
Azure offers a multitude of tools to secure your data science environment. At Intercept, the tools we use most often are:
- Identity and Access Management (IAM)
- Networking
- Azure Monitor
- Azure Log Analytics
- Azure Policy
- Defender for Cloud
Please read on if you want to get an idea of these tools and how to use them.
Identity and Access Management (IAM)
IAM on Azure helps us define roles for the users of our data science infrastructure. Roles can be assigned at the subscription, resource group, or individual resource level, and a role assigned at a higher level is inherited by everything beneath it. In the portal, roles can be assigned to users, Azure AD groups, service principals, or managed identities.
Consider this example: some users on your data science team are only allowed to view Azure Machine Learning. They therefore receive the “Reader” role on the Azure Machine Learning resource. Administrators, who are allowed to modify the Azure Machine Learning resource, receive the “Contributor” role.
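To make the idea of scopes and inheritance concrete, here is a minimal, self-contained Python model of the example above. It is not the Azure SDK or Azure’s actual authorization engine: the two-flag permission sets, the `is_allowed` helper, and the principal names and resource IDs are hypothetical. Real built-in roles such as Contributor are defined by action patterns, not by simple "read"/"write" flags.

```python
from dataclasses import dataclass

# Hypothetical, simplified permission sets. Real Azure built-in roles are
# defined by action patterns (e.g. "*/read"), not by these two flags.
ROLE_PERMISSIONS = {
    "Reader": {"read"},
    "Contributor": {"read", "write"},
}


@dataclass(frozen=True)
class RoleAssignment:
    principal: str  # user, group, service principal or managed identity
    role: str       # name of a role in ROLE_PERMISSIONS
    scope: str      # subscription, resource group or resource ID


def is_within(scope: str, resource_id: str) -> bool:
    """True if resource_id equals scope or lies beneath it in the hierarchy."""
    scope = scope.rstrip("/").lower()  # resource IDs are case-insensitive
    resource_id = resource_id.rstrip("/").lower()
    return resource_id == scope or resource_id.startswith(scope + "/")


def is_allowed(assignments, principal, action, resource_id) -> bool:
    """True if any assignment grants `action` to `principal` on the resource."""
    return any(
        a.principal == principal
        and action in ROLE_PERMISSIONS.get(a.role, set())
        and is_within(a.scope, resource_id)
        for a in assignments
    )


# Hypothetical IDs that follow the layout of Azure resource IDs.
RG = "/subscriptions/sub-1/resourceGroups/rg-datascience"
WORKSPACE = RG + "/providers/Microsoft.MachineLearningServices/workspaces/ws-ml"

assignments = [
    RoleAssignment("analyst@example.com", "Reader", WORKSPACE),
    RoleAssignment("admin@example.com", "Contributor", RG),  # inherited by the workspace
]

print(is_allowed(assignments, "analyst@example.com", "read", WORKSPACE))   # True
print(is_allowed(assignments, "analyst@example.com", "write", WORKSPACE))  # False
print(is_allowed(assignments, "admin@example.com", "write", WORKSPACE))    # True
```

Note how the administrator’s role, assigned once at the resource group, also applies to the workspace inside it, while the analyst’s access stays limited to reading that single resource.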
All in all, IAM lets us limit each user’s access rights to what they actually need, following the principle of least privilege, which reduces the risk of unauthorized access.
Networking
Everything we do on Azure, from accessing our data science environment to connecting data sources, requires authorized connections, while connections that shouldn’t be authorized must be refused. But how do you get started? Two important things to consider when configuring networking on Azure are:
- Azure Firewall rules
- Azure Private Link
Azure Firewall
When it comes to Azure Firewall, you are responsible for whitelisting addresses, for example, the addresses of people who are authorized to connect to the data science environment. It’s important to understand that you have to configure and maintain these rules yourself: they need updating when your requirements change or when an employee leaves your company. For communication between Azure services, for example, between a storage account and your data science environment in Azure Machine Learning, the configuration follows Azure’s security standards, and the protocols used may differ per Azure service.
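The logic behind such address rules is "deny by default, allow only listed ranges." The sketch below illustrates that logic with Python’s standard `ipaddress` module; it is not the Azure Firewall API, where rules are configured in Azure itself. The ranges are hypothetical placeholders taken from the RFC 5737 documentation blocks, and `is_ip_allowed` and `remove_range` are illustrative helpers.

```python
import ipaddress

# Hypothetical allowlist; RFC 5737 documentation ranges used as placeholders.
ALLOWED_RANGES = [
    ipaddress.ip_network("203.0.113.0/24"),   # e.g. an office network
    ipaddress.ip_network("198.51.100.7/32"),  # e.g. one employee's fixed IP
]


def is_ip_allowed(client_ip: str, allowed=ALLOWED_RANGES) -> bool:
    """Return True only if client_ip falls inside an allowlisted range (deny by default)."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in allowed)


def remove_range(cidr: str, allowed=ALLOWED_RANGES) -> list:
    """Return a new allowlist without `cidr`, e.g. when an employee leaves."""
    target = ipaddress.ip_network(cidr)
    return [net for net in allowed if net != target]


print(is_ip_allowed("203.0.113.42"))  # True: inside the office range
print(is_ip_allowed("192.0.2.10"))    # False: not on the allowlist

updated = remove_range("198.51.100.7/32")
print(is_ip_allowed("198.51.100.7", updated))  # False once the entry is removed
```

Returning a new list from `remove_range` keeps the original allowlist untouched, which mirrors the idea that rule changes should be deliberate, reviewable updates rather than silent in-place edits.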
Azure Private Link
Another component of network security is securing the Azure resources in your data science environment with virtual networks by using Azure Private Link. With Private Link, you access Azure Machine Learning through a private endpoint: a network interface with a private IP address from your virtual network, so traffic between the virtual network and the service travels over the Microsoft backbone network instead of the public internet. Azure Private Link with private endpoints is easy to set up and manage. It ensures that your Azure resource can be accessed privately and helps protect it against data leakage.
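One practical way to check that a private endpoint is doing its job is to resolve the service’s hostname from a machine inside the virtual network and confirm that it returns a private IP address from the virtual network’s range rather than a public one. The sketch below only classifies addresses with Python’s standard library and does not call any Azure API; the virtual network range `10.0.0.0/16` and the example IPs are hypothetical.

```python
import ipaddress
import socket


def check_private_endpoint(resolved_ip: str, vnet_cidr: str) -> str:
    """Classify the address a service hostname resolved to."""
    addr = ipaddress.ip_address(resolved_ip)
    vnet = ipaddress.ip_network(vnet_cidr)
    if addr in vnet:
        return "private endpoint in VNet"
    if addr.is_private:
        return "private, but outside this VNet"
    return "public address: traffic would leave the VNet"


def resolve(hostname: str) -> str:
    """Resolve a hostname with the OS resolver; run this from a VM inside the VNet."""
    return socket.gethostbyname(hostname)


VNET = "10.0.0.0/16"  # hypothetical address space of the data science VNet
print(check_private_endpoint("10.0.1.5", VNET))    # private endpoint in VNet
print(check_private_endpoint("172.16.0.4", VNET))  # private, but outside this VNet
print(check_private_endpoint("20.50.2.10", VNET))  # public address: traffic would leave the VNet
```

In practice you would pass the result of `resolve(...)` for your own workspace hostname into `check_private_endpoint`; a public result usually points to a missing or misconfigured private DNS setup.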
Securing your networking components helps prevent attackers from infiltrating your network, so start before it’s too late.