Securely managing keys for services that we use is an important, and sometimes difficult, part of building and running a cloud-based application. In general I prefer not to handle keys at all, and instead rely on approaches like managed service identities with role-based access control, which allow applications to authenticate and authorise themselves without any keys being explicitly exchanged. However, there are a number of situations where we do need to use and manage keys, such as when we use services that don't support role-based access control. One best practice that we should adopt when handling keys is to rotate (change) them regularly.
Key rotation is important to cover situations where your keys may have been compromised. Common attack vectors include keys having been committed to a public GitHub repository, a log file having a key accidentally written to it, or a disgruntled ex-employee retaining a key that had previously been issued. Changing the keys limits the window in which a leaked key can be used; if keys aren't changed regularly, these types of vulnerability can be severe.
In many applications, keys are used in complex ways and require manual intervention to rotate. But in other applications, it's possible to completely automate the rotation of keys. In this post I'll explain one such approach, which rotates keys every time the application and its infrastructure components are redeployed. Assuming the application is deployed regularly, for example using a continuous deployment process, we will end up rotating keys very frequently.
The key rotation process I describe here relies on the fact that the services we'll be dealing with - Azure Storage, Cosmos DB, and Service Bus - each have both a primary and a secondary key. Either key can be used for any request, and the two can be regenerated independently of each other. During each release we will pick one of these keys to use, and we'll make sure that we only use that one. We'll deploy our application components, which includes referencing that key and making sure our application uses it. Then we'll rotate the other key.
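To make that reasoning concrete, here's a small Python simulation of the scheme. It assumes a service that, like the three above, accepts either of two keys and lets each one be regenerated on its own. TwoKeyService and deploy are hypothetical names for illustration only; nothing here calls Azure.

```python
import secrets


class TwoKeyService:
    """Hypothetical stand-in for a service with independent primary and secondary keys."""

    def __init__(self) -> None:
        self.keys = {"primary": secrets.token_hex(16), "secondary": secrets.token_hex(16)}

    def is_valid(self, key: str) -> bool:
        # Either key is accepted for any request.
        return key in self.keys.values()

    def regenerate(self, which: str) -> None:
        # Regenerating one key leaves the other one untouched.
        self.keys[which] = secrets.token_hex(16)


def deploy(service: TwoKeyService, use: str) -> str:
    """Point the application at one key, check it, then rotate the other key."""
    app_key = service.keys[use]  # what the deployment hands to the application
    if not service.is_valid(app_key):  # smoke test before touching any keys
        raise RuntimeError("smoke test failed; not rotating")
    other = "secondary" if use == "primary" else "primary"
    service.regenerate(other)  # only the key we are not using changes
    return app_key


if __name__ == "__main__":
    svc = TwoKeyService()
    previous = None
    for n, use in enumerate(["primary", "secondary", "primary", "secondary"], start=1):
        current = deploy(svc, use)
        print(n, use, "current valid:", svc.is_valid(current),
              "previous valid:", previous is not None and svc.is_valid(previous))
        previous = current
```

Each deployment's key stays valid until the following deployment rotates it, which is why the application must already be using the fresh key before that happens.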
The flow of the script is as follows:
1. Decide whether to use the primary key or the secondary key for this deployment. There are several ways to do this, which I describe below.
2. Deploy the ARM template. In our example, the ARM template is the main component that reads the keys. The template copies the keys into an Azure Function application's configuration settings, as well as into a Key Vault. You could, of course, output the keys and have your deployment script put them elsewhere if you want to.
3. Run the other deployment logic. For our simple application we don't need to do anything more than run the ARM template deployment, but for many deployments you might copy your application files to a server, swap the deployment slots, or perform a variety of other actions that you need to run as part of your release.
4. Test that the application is working. The Azure Function in our example will perform some checks to ensure the keys are working correctly. You might also run other 'smoke tests' after completing your deployment logic.
5. Record the key we used. We need to keep track of the keys we've used in this deployment so that the next deployment can use the other ones.
6. Rotate the other key. Now we can rotate the key that we are not using. The way that we rotate keys is a little different for each service.
7. Test the application again. Finally, we run one more check to ensure that our application works. This is mostly a last check to ensure that we haven't accidentally referenced any other keys, which would break our application now that they've been rotated.
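The ordering of these steps is what keeps the process safe: nothing is recorded or rotated until the first test has passed. Here's a minimal Python sketch of that ordering, assuming each step is a callable that raises on failure. The step names and run_release are hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical labels for the seven steps listed above, in order.
RELEASE_ORDER = [
    "choose_key",
    "deploy_template",
    "other_deployment_logic",
    "test",
    "record_key",
    "rotate_other_key",
    "test_again",
]


def run_release(steps: Dict[str, Callable[[], None]], completed: List[str]) -> None:
    """Run each step in order, appending its name to `completed` once it finishes.

    A step signals failure by raising; the exception propagates, so no later
    step runs. In particular, nothing is rotated if the first test fails.
    """
    for name in RELEASE_ORDER:
        steps[name]()
        completed.append(name)
```

If the first smoke test fails, the release stops with the old keys untouched, so the previous deployment's key is still valid and can be redeployed.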
We don't rotate any keys until after we've already switched the application to using the other set of keys, so we should never end up in a situation where we've referenced the wrong keys from the Azure Functions application. However, if we wanted to have a true zero-downtime deployment then we could use something like deployment slots to allow for warming up our application before we switch it into production.
A Word of Warning
If you're going to apply the principle in this post, or the code below, to your own applications, be aware of an important limitation. The approach described here only works if your deployments are completely self-contained, with the keys used only inside the deployment process itself. If you provide keys for your components to any other systems or third parties, rotating keys in this manner will likely cause their systems to break.
Importantly, any shared access signatures (SAS) and tokens you issue will likely be broken by this process too. For example, if you provide third parties with a SAS token to access a storage account or blob, then rotating the account key that signed it will invalidate the SAS token. There are some ways to avoid this, including generating SAS tokens from your deployment process and distributing them from there, or using stored access policies; these approaches are beyond the scope of this post.
The next sections provide some detail on the important steps in the list above.
Step 1: Choosing a Key
The first step we need to perform is to decide whether we should use the primary or secondary keys for this deployment. Ideally each deployment would switch between them - so deployment 1 would use the primary keys, deployment 2 the secondary, deployment 3 the primary, deployment 4 the secondary, etc. This requires that we store some state about the deployments somewhere. Don’t forget, though, that the very first time we deploy the application we won’t have this state set. We need to allow for this scenario too.
The option that I’ve chosen to use in the sample is to use a resource group tag. Azure lets us use tags to attach custom metadata to most resource types, as well as to resource groups. I’ve used a custom tag named CurrentKeys to indicate whether the resources in that group currently use the primary or secondary keys.
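The decision logic itself is small. Here's a Python sketch that treats the resource group's tags as a plain dictionary (in PowerShell they could come from the resource group object returned by Get-AzureRmResourceGroup). The tag values "Primary" and "Secondary" and the function name are my assumptions; the post doesn't specify them. A missing tag, as on the very first deployment, falls back to the primary keys.

```python
from typing import Dict, Optional, Tuple

TAG_NAME = "CurrentKeys"  # the tag name used in this post
# Hypothetical tag values; the post does not say what values it stores.
PRIMARY, SECONDARY = "Primary", "Secondary"


def choose_key_set(tags: Optional[Dict[str, str]]) -> Tuple[str, Dict[str, str]]:
    """Pick the key set for this deployment from the resource group's tags.

    Returns the chosen key set and a copy of the tags with CurrentKeys updated.
    The copy should only be written back after the smoke tests pass (step 5).
    """
    new_tags = dict(tags or {})  # a brand-new resource group may have no tags at all
    if new_tags.get(TAG_NAME) == PRIMARY:
        chosen = SECONDARY
    else:
        # Secondary, a missing tag (first deployment) or an unexpected value.
        chosen = PRIMARY
    new_tags[TAG_NAME] = chosen
    return chosen, new_tags
```

Returning a copy rather than mutating the input keeps the "record the key we used" step separate, so a failed deployment never updates the tag.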
There are other places you could store this state too - some sort of external configuration system, or within your release management tool. You could even have your deployment scripts look at the keys currently used by the application code, compare them to the keys on the actual target resources, and then infer which key set is being used that way.
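The inference approach could look roughly like this Python sketch. It assumes you have already fetched the key the application is configured with (for example from its app settings or Key Vault) and the resource's current primary and secondary keys; the function names are hypothetical.

```python
import hmac
from typing import Optional


def _same(a: str, b: str) -> bool:
    # Constant-time comparison, a sensible habit when comparing secrets.
    return hmac.compare_digest(a.encode("utf-8"), b.encode("utf-8"))


def infer_current_key_set(app_key: Optional[str], primary: str, secondary: str) -> Optional[str]:
    """Return "primary" or "secondary" if the app's key matches one, else None."""
    if app_key is None:
        return None  # nothing deployed yet
    if _same(app_key, primary):
        return "primary"
    if _same(app_key, secondary):
        return "secondary"
    return None  # e.g. keys were changed outside the deployment process


def next_key_set(app_key: Optional[str], primary: str, secondary: str) -> str:
    """Use the opposite of the inferred set, defaulting to primary if unknown."""
    current = infer_current_key_set(app_key, primary, secondary)
    return "secondary" if current == "primary" else "primary"
```

This avoids storing any state, at the cost of reading secrets during the deployment just to compare them.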
A simpler alternative to maintaining state is to randomly choose to use the primary or secondary keys on every deployment. This may sometimes mean that you end up reusing the same keys repeatedly for several deployments in a row, but in many cases this might not be a problem, and may be worth the simplicity of not maintaining state.
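A minimal Python sketch of the random option, along with a quick way to quantify the trade-off. Both function names are hypothetical. If each deployment picks a set with probability 1/2, the key in use now escapes rotation over the next N deployments only if all N pick the same set again, which happens with probability (1/2)^N.

```python
import random
from typing import Optional


def choose_random_key_set(rng: Optional[random.Random] = None) -> str:
    """Pick primary or secondary with equal probability; no state needed."""
    if rng is None:
        rng = random.SystemRandom()  # OS-backed randomness for real deployments
    return rng.choice(["primary", "secondary"])


def chance_key_survives(deployments: int) -> float:
    """Probability that the key in use now is not rotated in the next N deployments.

    Each deployment independently reuses the same set, and so leaves this key
    unrotated, with probability 1/2.
    """
    if deployments < 0:
        raise ValueError("deployments must be non-negative")
    return 0.5 ** deployments
```

For example, chance_key_survives(3) is 0.125, so with frequent deployments long streaks of reused keys are unlikely.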
Step 2: Deploy the ARM Template
Our ARM template includes the resource definitions for all of the components we want to create - a storage account, a Cosmos DB account, a Service Bus namespace, and an Azure Function app to use for testing. You can see the full ARM template here.
Note that we are deploying the Azure Function application code using the ARM template deployment method.
Additionally, we copy the keys for our services into the Azure Function app's settings, and into a Key Vault, so that we can access them from our application.
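In the template itself this is done with ARM expressions such as listKeys(), which read the chosen key from each resource. To show the shape of the resulting settings, here's a Python sketch that builds standard connection strings from whichever keys were selected. The setting names are hypothetical, and the endpoints assume the public Azure cloud.

```python
from typing import Dict


def build_app_settings(storage_account: str, storage_key: str,
                       cosmos_account: str, cosmos_key: str,
                       sb_namespace: str, sb_key: str,
                       sb_rule: str = "RootManageSharedAccessKey") -> Dict[str, str]:
    """Build connection-string settings from whichever key set was chosen.

    Setting names are hypothetical; endpoints assume the public Azure cloud.
    """
    return {
        "StorageConnectionString": (
            "DefaultEndpointsProtocol=https;"
            f"AccountName={storage_account};AccountKey={storage_key};"
            "EndpointSuffix=core.windows.net"
        ),
        "CosmosDbConnectionString": (
            f"AccountEndpoint=https://{cosmos_account}.documents.azure.com:443/;"
            f"AccountKey={cosmos_key};"
        ),
        "ServiceBusConnectionString": (
            f"Endpoint=sb://{sb_namespace}.servicebus.windows.net/;"
            f"SharedAccessKeyName={sb_rule};SharedAccessKey={sb_key}"
        ),
    }
```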
Step 4: Testing the Keys
Once we've finished deploying the ARM template and completing any other deployment steps, we should test to make sure that the keys we're trying to use are valid. Many deployments include some sort of smoke test - a quick test of core functionality of the application. In this case, I wrote an Azure Function that will check that it can connect to the Azure resources in question.
Testing Azure Storage Keys
To test connectivity to Azure Storage, we run a query against the storage API to check if a blob container exists. We don't actually care if the container exists or not; we just check to see if we can successfully make the request:
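The pattern here is "the call succeeded" rather than "the call returned what I expected". A minimal Python sketch of that pattern follows; request_succeeds is a hypothetical helper. With the azure-storage-blob package, `call` might be something like a lambda wrapping a container client's exists() method, but treat that as an assumption about your SDK version.

```python
from typing import Callable


def request_succeeds(call: Callable[[], object]) -> bool:
    """Return True if `call` completes, regardless of the value it returns.

    For the storage check, `call` might wrap a "does this container exist?"
    query. True and False are both passes: either way the service accepted the
    key. A rejected key shows up as an exception instead.
    """
    try:
        call()
    except Exception:  # a real check might catch only the SDK's auth errors
        return False
    return True
```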
Testing Cosmos DB Keys
To test connectivity to Cosmos DB, we use the Cosmos DB SDK to try to retrieve some metadata about the database account. Once again we're not interested in the results, just in the success of the API call:
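The SDK hides the authentication details, but it can help to see what this check actually exercises. As I understand the documented master-key scheme for the Cosmos DB REST API, each request carries an HMAC-SHA256 signature computed with the account key, so once a key is regenerated, signatures made with the old key are rejected. This is a rough Python sketch of that token, not the SDK's own code; cosmos_auth_token and rfc1123_now are hypothetical names.

```python
import base64
import hashlib
import hmac
from email.utils import formatdate
from urllib.parse import quote


def cosmos_auth_token(verb: str, resource_type: str, resource_link: str,
                      date: str, master_key: str) -> str:
    """Authorization header value for a Cosmos DB REST call using a master key."""
    # Verb, resource type and date are lowercased; the resource link is not.
    payload = f"{verb.lower()}\n{resource_type.lower()}\n{resource_link}\n{date.lower()}\n\n"
    # Cosmos DB account keys are base64 strings; the decoded bytes are the HMAC key.
    digest = hmac.new(base64.b64decode(master_key), payload.encode("utf-8"),
                      hashlib.sha256).digest()
    signature = base64.b64encode(digest).decode("ascii")
    return quote(f"type=master&ver=1.0&sig={signature}", safe="")


def rfc1123_now() -> str:
    """Current time in the RFC 1123 format expected in the x-ms-date header."""
    return formatdate(usegmt=True)
```

For an account-level metadata read, the resource type and link would presumably be empty strings; check that against the REST API documentation before relying on it.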
Testing Service Bus Keys
And finally, to test connectivity to Service Bus, we try to get a list of queues within the Service Bus namespace. As long as we get something back, we consider the test to have passed:
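Note that an empty list of queues still counts as a pass: the check is about whether the key is accepted, not about what the namespace contains. Under the hood, Service Bus requests are authorised with a shared access signature derived from the authorization rule's key. Here's a Python sketch of the documented token format, with a hypothetical `now` parameter added so the output is deterministic.

```python
import base64
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import quote, quote_plus


def service_bus_sas_token(resource_uri: str, key_name: str, key: str,
                          ttl_seconds: int = 3600, now: Optional[float] = None) -> str:
    """Build a Service Bus shared access signature for `resource_uri`."""
    issued = time.time() if now is None else now
    expiry = str(int(issued + ttl_seconds))
    encoded_uri = quote_plus(resource_uri)
    # The key string's UTF-8 bytes (not a base64 decoding) are the HMAC key.
    digest = hmac.new(key.encode("utf-8"), f"{encoded_uri}\n{expiry}".encode("utf-8"),
                      hashlib.sha256).digest()
    signature = quote(base64.b64encode(digest).decode("ascii"), safe="")
    return f"SharedAccessSignature sr={encoded_uri}&sig={signature}&se={expiry}&skn={key_name}"
```

A token signed with a key that has since been regenerated no longer verifies, which is exactly the failure the smoke test is designed to catch.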
Step 6: Rotating the Keys
One of the last steps we perform is to actually rotate the keys for the services. The way in which we request key rotations is different depending on the services we're talking to.
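Each service also names its two keys differently. Here's a Python sketch that, given the key set this deployment uses, regenerates the other one everywhere. The `regenerate` callables are hypothetical; in the post's PowerShell they would correspond roughly to New-AzureRmStorageAccountKey (key1/key2), Cosmos DB's regenerateKey action (primary/secondary), and New-AzureRmServiceBusKey (PrimaryKey/SecondaryKey). Mapping "primary" to Storage's key1 is my assumption about convention.

```python
from typing import Callable, Dict

# Names each management API uses for the two keys. Mapping the post's "primary"
# to Storage's key1 is an assumption about convention.
KEY_NAMES: Dict[str, Dict[str, str]] = {
    "storage": {"primary": "key1", "secondary": "key2"},
    "cosmosdb": {"primary": "primary", "secondary": "secondary"},
    "servicebus": {"primary": "PrimaryKey", "secondary": "SecondaryKey"},
}


def rotate_unused_keys(active: str,
                       regenerate: Dict[str, Callable[[str], None]]) -> Dict[str, str]:
    """Regenerate, for every service, the key NOT used by this deployment.

    `regenerate` maps a service to a hypothetical callable that asks that
    service to regenerate the named key. Returns the key name rotated per service.
    """
    if active not in ("primary", "secondary"):
        raise ValueError(f"unknown key set: {active!r}")
    unused = "secondary" if active == "primary" else "primary"
    rotated: Dict[str, str] = {}
    for service, names in KEY_NAMES.items():
        regenerate[service](names[unused])
        rotated[service] = names[unused]
    return rotated
```

Validating `active` up front means a typo can never cause the key currently in use to be regenerated.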
Rotating Azure Storage Keys
Azure Storage provides an API that can be used to regenerate an account key. From PowerShell we can use the New-AzureRmStorageAccountKey cmdlet to access this API:
Rotating Cosmos DB Keys
For Cosmos DB, there is a similar API to regenerate an account key. There are no first-party PowerShell cmdlets for Cosmos DB, so we can instead use a generic Azure Resource Manager cmdlet to invoke the API:
Rotating Service Bus Keys
Service Bus provides an API to regenerate the keys for a specified authorization rule. For this example we're using the default RootManageSharedAccessKey authorization rule, which is created automatically when the Service Bus namespace is provisioned. The PowerShell cmdlet New-AzureRmServiceBusKey can be used to access this API:
Key management and rotation are often painful, but if your application deployments are completely self-contained then the process described here is one way to keep your keys changing continuously.