John Downs

Building Human-Focused Software

Filtering by Tag: Security

Automatic Key Rotation for Azure Services

This post was originally published on the Kloud blog.

Securely managing keys for services that we use is an important, and sometimes difficult, part of building and running a cloud-based application. In general I prefer not to handle keys at all, and instead rely on approaches like managed service identities with role-based access control, which allow for applications to authenticate and authorise themselves without any keys being explicitly exchanged. However, there are a number of situations where do we need to use and manage keys, such as when we use services that don't support role-based access control. One best practice that we should adopt when handling keys is to rotate (change) them regularly.

Key rotation is important to cover situations where your keys may have compromised. Common attack vectors include keys having been committed to a public GitHub repository, a log file having a key accidentally written to it, or a disgruntled ex-employee retaining a key that had previously been issued. Changing the keys means that the scope of the damage is limited, and if keys aren't changed regularly then these types of vulnerability can be severe.

In many applications, keys are used in complex ways and require manual intervention to rotate. But in other applications, it's possible to completely automate the rotation of keys. In this post I'll explain one such approach, which rotates keys every time the application and its infrastructure components are redeployed. Assuming the application is deployed regularly, for example using a continuous deployment process, we will end up rotating keys very frequently.

Approach

The key rotation process I describe here relies on the fact that the services we'll be dealing with - Azure Storage, Cosmos DB, and Service Bus - have both a primary and a secondary key. Both keys are valid for any requests, and they can be changed independently of each other. During each release we will pick one of these keys to use, and we'll make sure that we only use that one. We'll deploy our application components, which will include referencing that key and making sure our application uses it. Then we'll rotate the other key.

The flow of the script is as follows:

  1. Decide whether to use the primary key or the secondary key for this deployment. There are several approaches to do this, which I describe below.

  2. Deploy the ARM template. In our example, the ARM template is the main thing that reads the keys. The template copies the keys into an Azure Function application's configuration settings, as well as into a Key Vault. You could, of course, output the keys and have your deployment script put them elsewhere if you want to.

  3. Run the other deployment logic. For our simple application we don't need to do anything more than run the ARM template deployment, but for many deployments you might copy your application files to a server, swap the deployment slots, or perform a variety of other actions that you need to run as part of your release.

  4. Test the application is working. The Azure Function in our example will perform some checks to ensure the keys are working correctly. You might also run other 'smoke tests' after completing your deployment logic.

  5. Record the key we used. We need to keep track of the keys we’ve used in this deployment so that the next deployment can use the other one.

  6. Rotate the other key. Now we can rotate the key that we are not using. The way that we rotate keys is a little different for each service.

  7. Test the application again. Finally, we run one more check to ensure that our application works. This is mostly a last check to ensure that we haven't accidentally referenced any other keys, which would break our application now that they've been rotated.

We don't rotate any keys until after we've already switched the application to using the other set of keys, so we should never end up in a situation where we've referenced the wrong keys from the Azure Functions application. However, if we wanted to have a true zero-downtime deployment then we could use something like deployment slots to allow for warming up our application before we switch it into production.

A Word of Warning

If you're going to apply this principle in this post or the code below to your own applications, it's important to be aware of an important limitation. The particular approach described here only works if your deployments are completely self-contained, with the keys only used inside the deployment process itself. If you provide keys for your components to any other systems or third parties, rotating keys in this manner will likely cause their systems to break.

Importantly, any shared access signatures and tokens you issue will likely be broken by this process too. For example, if you provide third parties with a SAS token to access a storage account or blob, then rotating the account keys will cause the SAS token to be invalidated. There are some ways to avoid this, including generating SAS tokens from your deployment process and sending them out from there, or by using stored access policies; these approaches are beyond the scope of this post.

The next sections provide some detail on the important steps in the list above.

Step 1: Choosing a Key

The first step we need to perform is to decide whether we should use the primary or secondary keys for this deployment. Ideally each deployment would switch between them - so deployment 1 would use the primary keys, deployment 2 the secondary, deployment 3 the primary, deployment 4 the secondary, etc. This requires that we store some state about the deployments somewhere. Don’t forget, though, that the very first time we deploy the application we won’t have this state set. We need to allow for this scenario too.

The option that I’ve chosen to use in the sample is to use a resource group tag. Azure lets us use tags to attach custom metadata to most resource types, as well as to resource groups. I’ve used a custom tag named CurrentKeys to indicate whether the resources in that group currently use the primary or secondary keys.

There are other places you could store this state too - some sort of external configuration system, or within your release management tool. You could even have your deployment scripts look at the keys currently used by the application code, compare them to the keys on the actual target resources, and then infer which key set is being used that way.

A simpler alternative to maintaining state is to randomly choose to use the primary or secondary keys on every deployment. This may sometimes mean that you end up reusing the same keys repeatedly for several deployments in a row, but in many cases this might not be a problem, and may be worth the simplicity of not maintaining state.

Step 2: Deploy the ARM Template

Our ARM template includes the resource definitions for all of the components we want to create - a storage account, a Cosmos DB account, a Service Bus namespace, and an Azure Function app to use for testing. You can see the full ARM template here.

Note that we are deploying the Azure Function application code using the ARM template deployment method.

Additionally, we copy the keys for our services into the Azure Function app's settings, and into a Key Vault, so that we can access them from our application.

Step 4: Testing the Keys

Once we've finished deploying the ARM template and completing any other deployment steps, we should test to make sure that the keys we're trying to use are valid. Many deployments include some sort of smoke test - a quick test of core functionality of the application. In this case, I wrote an Azure Function that will check that it can connect to the Azure resources in question.

Testing Azure Storage Keys

To test connectivity to Azure Storage, we run a query against the storage API to check if a blob container exists. We don't actually care if the container exists or not; we just check to see if we can successfully make the request:

Testing Cosmos DB Keys

To test connectivity to Cosmos DB, we use the Cosmos DB SDK to try to retrieve some metadata about the database account. Once again we're not interested in the results, just in the success of the API call:

Testing Service Bus Keys

And finally, to test connectivity to Service Bus, we try to get a list of queues within the Service Bus namespace. As long as we get something back, we consider the test to have passed:

You can view the full Azure Function here.

Step 6: Rotating the Keys

One of the last steps we perform is to actually rotate the keys for the services. The way in which we request key rotations is different depending on the services we're talking to.

Rotating Azure Storage Keys

Azure Storage provides an API that can be used to regenerate an account key. From PowerShell we can use the New-AzureRmStorageAccountKey cmdlet to access this API:

Rotating Cosmos DB Keys

For Cosmos DB, there is a similar API to regenerate an account key. There are no first-party PowerShell cmdlets for Cosmos DB, so we can instead a generic Azure Resource Manager cmdlet to invoke the API:

Rotating Service Bus Keys

Service Bus provides an API to regenerate the keys for a specified authorization rule. For this example we're using the default RootManageSharedAccessKey authorization rule, which is created automatically when the Service Bus namespace is provisioned. The PowerShell cmdlet New-AzureRmServiceBusKey can be used to access this API:

You can see the full script here.

Conclusion

Key management and rotation is often a painful process, but if your application deployments are completely self-contained then the process described here is one way to ensure that you continuously keep your keys changing and up-to-date.

You can download the full set of scripts and code for this example from GitHub.

Demystifying Managed Service Identities on Azure

This post was originally published on the Kloud blog.

Managed service identities (MSIs) are a great feature of Azure that are being gradually enabled on a number of different resource types. But when I'm talking to developers, operations engineers, and other Azure customers, I often find that there is some confusion and uncertainty about what they do. In this post I will explain what MSIs are and are not, where they make sense to use, and give some general advice on how to work with them.

What Do Managed Service Identities Do?

A managed service identity allows an Azure resource to identify itself to Azure Active Directory without needing to present any explicit credentials. Let's explain that a little more.

In many situations, you may have Azure resources that need to securely communicate with other resources. For example, you may have an application running on Azure App Service that needs to retrieve some secrets from a Key Vault. Before MSIs existed, you would need to create an identity for the application in Azure AD, set up credentials for that application (also known as creating a service principal), configure the application to know these credentials, and then communicate with Azure AD to exchange the credentials for a short-lived token that Key Vault will accept. This requires quite a lot of upfront setup, and can be difficult to achieve within a fully automated deployment pipeline. Additionally, to maintain a high level of security, the credentials should be changed (rotated) regularly, and this requires even more manual effort.

With an MSI, in contrast, the App Service automatically gets its own identity in Azure AD, and there is a built-in way that the app can use its identity to retrieve a token. We don't need to maintain any AD applications, create any credentials, or handle the rotation of these credentials ourselves. Azure takes care of it for us.

It can do this because Azure can identify the resource - it already knows where a given App Service or virtual machine 'lives' inside the Azure environment, so it can use this information to allow the application to identify itself to Azure AD without the need for exchanging credentials.

What Do Managed Service Identities Not Do?

Inbound requests: One of the biggest points of confusion about MSIs is whether they are used for inbound requests to the resource or for outbound requests from the resource. MSIs are for the latter - when a resource needs to make an outbound request, it can identify itself with an MSI and pass its identity along to the resource it's requesting access to.

MSIs pair nicely with other features of Azure resources that allow for Azure AD tokens to be used for their own inbound requests. For example, Azure Key Vault accepts requests with an Azure AD token attached, and it evaluates which parts of Key Vault can be accessed based on the identity of the caller. An MSI can be used in conjunction with this feature to allow an Azure resource to directly access a Key Vault-managed secret.

Authorization: Another important point is that MSIs are only directly involved in authentication, and not in authorization. In other words, an MSI allows Azure AD to determine what the resource or application is, but that by itself says nothing about what the resource can do. For some Azure resources this is Azure's own Identity and Access Management system (IAM). Key Vault is one exception - it maintains its own access control system, and is managed outside of Azure's IAM. For non-Azure resources, we could communicate with any authorisation system that understands Azure AD tokens; an MSI will then just be another way of getting a valid token that an authorisation system can accept.

Another important point to be aware of is that the target resource doesn't need to run within the same Azure subscription, or even within Azure at all. Any service that understands Azure Active Directory tokens should work with tokens for MSIs.

How to Use MSIs

Now that we know what MSIs can do, let's have a look at how to use them. Generally there will be three main parts to working with an MSI: enabling the MSI; granting it rights to a target resource; and using it.

  1. Enabling an MSI on a resource. Before a resource can identify itself to Azure AD,it needs to be configured to expose an MSI. The way that you do this will depend on the specific resource type you're enabling the MSI on. In App Services, an MSI can be enabled through the Azure Portal, through an ARM template, or through the Azure CLI, as documented here. For virtual machines, an MSI can be enabled through the Azure Portal or through an ARM template. Other MSI-enabled services have their own ways of doing this.

  2. Granting rights to the target resource. Once the resource has an MSI enabled, we can grant it rights to do something. The way that we do this is different depending on the type of target resource. For example, Key Vault requires that you configure its Access Policies, while to use the Event Hubs or the Azure Resource Manager APIs you need to use Azure's IAM system. Other target resource types will have their own way of handling access control.

  3. Using the MSI to issue tokens. Finally, now that the resource's MSI is enabled and has been granted rights to a target resource, it can be used to actually issue tokens so that a target resource request can be issued. Once again, the approach will be different depending on the resource type. For App Services, there is an HTTP endpoint within the App Service's private environment that can be used to get a token, and there is also a .NET library that will handle the API calls if you're using a supported platform. For virtual machines, there is also an HTTP endpoint that can similarly be used to obtain a token. Of course, you don't need to specify any credentials when you call these endpoints - they're only available within that App Service or virtual machine, and Azure handles all of the credentials for you.

Finding an MSI's Details and Listing MSIs

There may be situations where we need to find our MSI's details, such as the principal ID used to represent the application in Azure AD. For example, we may need to manually configure an external service to authorise our application to access it. As of April 2018, the Azure Portal shows MSIs when adding role assignments, but the Azure AD blade doesn't seem to provide any way to view a list of MSIs. They are effectively hidden from the list of Azure AD applications. However, there are a couple of other ways we can find an MSI.

If we want to find a specific resource's MSI details then we can go to the Azure Resource Explorer and find our resource. The JSON details for the resource will generally include an identity property, which in turn includes a principalId:

1-resource.png

That principalId is the client ID of the service principal, and can be used for role assignments.

Another way to find and list MSIs is to use the Azure AD PowerShell cmdlets. The Get-AzureRmADServicePrincipal cmdlet will return back a complete list of service principals in your Azure AD directory, including any MSIs. MSIs have service principal names starting with https://identity.azure.net, and the ApplicationId is the client ID of the service principal:

2-applicationid.png

Now that we've seen how to work with an MSI, let's look at which Azure resources actually support creating and using them.

Resource Types with MSI and AAD Support

As of April 2018, there are only a small number of Azure services with support for creating MSIs, and of these, currently all of them are in preview. Additionally, while it's not yet listed on that page, Azure API Management also supports MSIs - this is primarily for handling Key Vault integration for SSL certificates.

One important note is that for App Services, MSIs are currently incompatible with deployment slots - only the production slot gets assigned an MSI. Hopefully this will be resolved before MSIs become fully available and supported.

As I mentioned above, MSIs are really just a feature that allows a resource to assume an identity that Azure AD will accept. However, in order to actually use MSIs within Azure, it's also helpful to look at which resource types support receiving requests with Azure AD authentication, and therefore support receiving MSIs on incoming requests. Microsoft maintain a list of these resource types here.

Example Scenarios

Now that we understand what MSIs are and how they can be used with AAD-enabled services, let's look at a few example real-world scenarios where they can be used.

Virtual Machines and Key Vault

Azure Key Vault is a secure data store for secrets, keys, and certificates. Key Vault requires that every request is authenticated with Azure AD. As an example of how this might be used with an MSI, imagine we have an application running on a virtual machine that needs to retrieve a database connection string from Key Vault. Once the VM is configured with an MSI and the MSI is granted Key Vault access rights, the application can request a token and can then get the connection string without needing to maintain any credentials to access Key Vault.

API Management and Key Vault

Another great example of an MSI being used with Key Vault is Azure API Management. API Management creates a public domain name for the API gateway, to which we can assign a custom domain name and SSL certificate. We can store the SSL certificate inside Key Vault, and then give Azure API Management an MSI and access to that Key Vault secret. Once it has this, API Management can automatically retrieve the SSL certificate for the custom domain name straight from Key Vault, simplifying the certificate installation process and improving security by ensuring that the certificate is not directly passed around.

Azure Functions and Azure Resource Manager

Azure Resource Manager (ARM) is the deployment and resource management system used by Azure. ARM itself supports AAD authentication. Imagine we have an Azure Function that needs to scan our Azure subscription to find resources that have recently been created. In order to do this, the function needs to log into ARM and get a list of resources. Our Azure Functions app can expose an MSI, and so once that MSI has been granted reader rights on the resource group, the function can get a token to make ARM requests and get the list without needing to maintain any credentials.

App Services and Event Hubs/Service Bus

Event Hubs is a managed event stream. Communication to both publish onto, and subscribe to events from, the stream can be secured using Azure AD. An example scenario where MSIs would help here is when an application running on Azure App Service needs to publish events to an Event Hub. Once the App Service has been configured with an MSI, and Event Hubs has been configured to grant that MSI publishing permissions, the application can retrieve an Azure AD token and use it to post messages without having to maintain keys.

Service Bus provides a number of features related to messaging and queuing, including queues and topics (similar to queues but with multiple subscribers). As with Event Hubs, an application could use its MSI to post messages to a queue or to read messages from a topic subscription, without having to maintain keys.

App Services and Azure SQL

Azure SQL is a managed relational database, and it supports Azure AD authentication for incoming connections. A database can be configured to allow Azure AD users and applications to read or write specific types of data, to execute stored procedures, and to manage the database itself. When coupled with an App Service with an MSI, Azure SQL's AAD support is very powerful - it reduces the need to provision and manage database credentials, and ensures that only a given application can log into a database with a given user account. Tomas Restrepo has written a great blog post explaining how to use Azure SQL with App Services and MSIs.

Summary

In this post we've looked into the details of managed service identities (MSIs) in Azure. MSIs provide some great security and management benefits for applications and systems hosted on Azure, and enable high levels of automation in our deployments. While they aren't particularly complicated to understand, there are a few subtleties to be aware of. As long as you understand that MSIs are for authentication of a resource making an outbound request, and that authorisation is a separate thing that needs to be managed independently, you will be able to take advantage of MSIs with the services that already support them, as well as the services that may soon get MSI and AAD support.