John Downs

Building Human-Focused Software

Filtering by Tag: Azure Functions

Low-Cost Rate Limiting for Azure Functions APIs with API Management's Consumption Tier

This post was originally published on the Kloud blog.

Azure Functions can be used as a lightweight platform for building APIs. They support a number of helpful features for API developers including custom routes and a variety of output bindings that can implement complex business rules. They also have a consumption-based pricing model, which provides a low-cost, pay-per-use pricing model while you have low levels of traffic, but can scale or burst for higher levels of demand.

The Azure Functions platform also provides Azure Functions Proxies, which gives another set of features to further extend APIs built on top of Azure Functions. These features include more complex routing rules and the ability to do a small amount of request rewriting. These features have led some people to compare Azure Functions Proxies to a very lightweight API management system. However, there are a number of features of an API management platform that Azure Functions Proxies doesn't support. One common feature of an API management layer is the ability to perform rate limiting on incoming requests.

Azure API Management is a hosted API management service that provides a large number of features. Until recently, API Management's pricing model was often prohibitive for small APIs, since using it for production workloads required provisioning a service instance with a minimum of about a AUD$200 monthly cost. But Microsoft recently announced a new consumption tier for API Management. Based on a similar pricing model to Azure Functions, the consumption tier for API Management bills per request, which makes it a far more appealing choice for serverless APIs. APIs can now use features like rate limiting - and many others - without needing to invest in a large monthly expense.

In this post I'll describe how Azure Functions and the new API Management pricing tier can be used together to build a simple serverless API with rate limiting built in, and at a very low cost per transaction.

Note: this new tier is in preview, and so isn't yet ready for production workloads - but it will hopefully be generally available and supported soon. In the meantime, it's only available for previewing in a subset of Azure regions. For my testing I've been using Australia East.

Example Scenario

In this example, we'll build a simple serverless API that would benefit from rate limiting. In our example function we simulate performing some business logic to calculate shipping rates for orders. Our hypothetical algorithm is very sophisticated, and so we may later want to monetise our API to make it available for high-volume users. In the meantime we want to allow our customers to try it out a little bit for free, but we want to put limits around their use.

There may be other situations where we need rate limiting too - for example, if we have a back-end system we call into that can only cope with a certain volume of requests, or that bills us when we use it.

First, let's write a very simple function to simulate some custom business logic.

Function Code

For simplicity I'm going to write a C# script version of an Azure Function. You could easily change this to a precompiled function, or use any of the other languages that Azure Functions supports.

Our simulated function logic is as follows:

  • Receive an HTTP request with a body containing some shipping details.

  • Calculate the shipping cost.

  • Return the shipping cost.

In our simulation we'll just make up a random value, but of course we may have much more sophisticated logic in future. We could also call into other back-end functions or APIs too.

Here's our function code:

If we paste this code into the Azure Functions portal, we'll be able to try it out, and sure enough we can get a result:

API Management Policy

Now that we've got our core API function working, the next step is to put an API Management gateway in front of it so we can apply our rate limiting logic. API Management works in terms of policies that are applied to incoming requests. When we work with the consumption tier of API Management we can make use of the policy engine, although there are some limitations. Even with these limitations, policies are very powerful and let us express and enforce a lot of complex rules. A full discussion of API Management's policy system is beyond the scope of this post, but I recommend reviewing the policy documentation.

Here is a policy that we can use to perform our rate limiting:

This policy uses the caller's IP address as the rate limit key. This means that if the same IP address makes three API calls within a 15-second period, it will get rate limited and told to try again later. Of course, we can adjust the lockout time, the number of calls allowed, and even the way that we group requests together when determining the rate limit.

Because we may have additional APIs in the future that would be subject to this rate limit, we'll create an API Management product and apply the policy to that. This means that any APIs we add to that product will have this policy applied.

Securing the Connection

Of course, there's not much point in putting an API Management layer in front of our function API if someone can simply go around it and call the function directly. There are a variety of ways of securing the connection between an API Management instance and a back-end Azure Functions app, including using function keys, function host keys, and Azure AD tokens. In other tiers of API Management you can also use the IP address of the API Management gateway, but in the consumption tier we don't get any IP addresses to perform whitelisting on.

For this example we'll use the function key for simplicity. (For a real production application I'd recommend using a different security model, though.) This means that we will effectively perform a key exchange:

  • Requests will arrive into the API Management service without any keys.

  • The API Management service will perform its rate limiting logic.

  • If this succeeds, the API Management service will call into the function and pass in the function key, which only it knows.

In this way, we're treating the API Management service as a trusted subsystem - we're configuring it with the credentials (i.e. the function key) necessary to call the back-end API. Azure API Management provides a configuration system to load secrets like this, but for simplicity we'll just inject the key straight into a policy. Here's the policy we've used:

We'll inject the function key into the policy at the time we deploy the policy.

As this logic is specific to our API, we'll apply this policy to the API and not to our product.

Deploying Through an ARM Template

We'll use an ARM template to deploy this whole example. The template performs the following actions, approximately in this order:

  • Deploys the Azure Functions app.

  • Adds the shipping calculator function into the app using the deployment technique I discussed in a previous post.

  • Deploys an API Management instance using the consumption tier.

  • Creates an API in our API Management instance.

  • Configures an API operation to call into the shipping calculator function.

  • Adds a policy to the API operation to add the Azure Functions host key to the outbound request to the function.

  • Creates an API Management product for our shipping calculator.

  • Adds a rate limit policy to the product.

Here's the ARM template:

There's a lot going on here, and I recommend reading the API Management documentation for further detail on each of these. One important note is that whenever you interact with an API Management instance on the consumption tier using ARM templates, you must use API version 2018-06-01-preview or newer.

Calling our API

Now that we've deployed our API we can call it through our API Management gateway's public hostname. In my case I used Postman to make some API calls. The first few calls succeeded:

But then after I hit the rate limit, as expected I got an error response back:


Trying again 13 seconds later, the request succeeded. So we can see our API Management instance is configured correctly and is performing rate limiting as we expected.


With the new consumption tier of Azure API Management, it's now possible to have a low-cost set of API management features to deploy alongside your Azure Functions APIs. Of course, your APIs could be built on any technology, but if you are already in the serverless ecosystem then this is a great way to protect your APIs and back-ends. Plus, if your API grows to a point where you need more of the features of Azure API Management that aren't provided in the consumption tier, or if you want to switch to a fixed-cost pricing model, you can always upgrade your API Management instance to one of the higher tiers. You can do this by simply modifying the sku property on the API Management resource within the ARM template.

Integration Testing Timer-Triggered Precompiled v2 Azure Functions

This post was originally published on the Kloud blog.

In a recent post, I described a way to run integration tests against precompiled C# Azure Functions using the v2 runtime. In that post, we looked at an example of invoking an HTTP-triggered function from within an integration test.

Of course, there are plenty of other triggers available for Azure Functions too. Recently I needed to write an integration test against a timer-triggered function and decided to investigate the best way to do this.

The Azure Functions runtime provides a convenient API for invoking a timer-trigger function. You can issue an HTTP POST request against an endpoint for your function, and the function runtime will start up and trigger the function. I wasn't able to find any proper documentation about this, so this blog post is a result of some of my experimentation.

To invoke a timer-triggered function named MyFunction, you need to issue an HTTP request as follows:

Replace  with either your real Azure Functions hostname - such as - or, if you're running through the Azure Functions CLI, use localhost and the port number you're running it on, such as localhost:7071.

Interestingly, invoking this endpoint immediately returns an HTTP 201 response, but the function runs separately. This makes sense, though, since timer-trigger functions are not intended to return data in the same way that HTTP-triggered functions are.

I've created an updated version of the GitHub repository from the previous post with an example of running a test against a timer-triggered function. In this example, the function simply writes a message to an Azure Storage queue, which we can then look at to confirm the function has run. Normally the function would only run once a week, at 9.30am on Mondays, but our integration test triggers it each time it runs to verify that it works correctly.

In this version of the test fixture we also wait for the app to start before our test runs. We do this by polling the / endpoint with GET requests until it responds. This ensures that we can access the timer invocation HTTP endpoint successfully from our tests.

Of course, just like in the previous post's integration tests, a timer-triggered integration test can run from an Azure Pipelines build too, so you can include timer-triggered functions in your continuous integration and testing practices alongside the rest of your code. In fact, the same build.yaml that we used in the previous post can be used to run these tests, too.

Integration Testing Precompiled v2 Azure Functions

This post was originally published on the Kloud blog.

Azure Functions code can often contain important functionality that needs to be tested. The two most common ways of testing code are unit testing and integration testing. Unit testing runs pieces of code in isolation, and this is relatively simple to do with Azure Functions. Integration testing can be a little trickier though, and I haven't found any good documentation about how do this with version 2 of the Functions runtime. In this post I'll outline the approach I'm using to run integration tests against my Azure Functions v2 code.

In an application with a lot of business logic, unit testing may get us most of the way to verifying the code's quality. But Azure Functions code often involves pieces of functionality that can't be easily unit tested. For example, triggers, input and output bindings are very powerful features that let us avoid writing boilerplate code to bind to HTTP requests, connect to Azure Storage and Service Bus blobs, queues, and tables, or building our own timer logic. Similarly, we may need to connect to external services or databases, or work with libraries that can't be easily mocked or faked. If we want to test these parts of our Functions apps then we need some form of integration testing.

Approaches to Integration Testing

Integration tests involve running code in as close to a real environment as is practicable. The tests are generally run from our development machines or build servers. For example, ASP.NET Core lets us host an in-memory server for our application, which we can then connect to real databases, in-memory versions of systems like the Entity Framework, or emulators for services like Azure Storage and Cosmos DB.

Azure Functions v1 included some features to support integration testing of script-based Functions apps. But to date, I haven't found any guidance on how to run integration tests using a precompiled .NET Azure Functions app running against the v2 runtime.

Example Functions App

For the purposes of this post I've written a very simple Functions App with two functions that illustrate two common use cases. One function (HelloWorld) receives an HTTP message and returns a response, and the second (HelloQueue) receives an HTTP message and writes a message to a queue. The actual functions themselves are really just simple placeholders based on the Visual Studio starter function template:

In a real application you're likely to have a lot more going on than just writing to a queue, but the techniques below can be adapted to cover a range of different scenarios.

You can access all the code for this blog post on GitHub.

Implementing Integration Tests

The Azure Functions v2 core tools support local development and testing of Azure Functions apps. One of the components in this set of tools is func.dll, which lets us host the Azure Functions runtime. By automating this from our integration test project we can start our Functions app in a realistic host environment, run our tests, and tear it down again. This is ideal for running a suite of integration tests.

While you could use any test framework you like, the sample implementation I've provided uses xUnit.

Test Collection and Fixture

The xUnit framework provides a feature called collection fixtures. These fixtures let us group tests together; they also let us run initialisation code before the first test runs, and run teardown code after the last test finishes. Here's a placeholder test collection and fixture that will support our integration tests:

Starting the Functions Host

Now we have the fixture class definition, we can use it to start and stop the Azure Functions host. We will make use of the System.Diagnostics.Process class to start and stop the .NET Core CLI (dotnet.exe), which in turn starts the Azure Functions host through the func.dll library.

Note: I assume you already have the Azure Functions v2 core tools installed. You may already have these if you've got Visual Studio installed with the Azure Functions feature enabled. If not, you can install it into your global NPM packages by using the command npm install -g azure-functions-core-toolsas per the documentation here.

Our fixture code looks like this:

The code is fairly self-explanatory: during initialiation it reads various paths from configuration settings and starts the process; during teardown it kills the process and disposes of the Process object.

I haven't added any Task.Delay code, or anything to poll the function app to check if it's ready to receive requests. I haven't found this to be necessary. However, if you find that the first test run in a batch fails, this might be something you want to consider adding at the end of the fixture initialisation.


Some of the code in the above fixture file won't compile yet because we have some configuration settings that need to be passed through. Specifically, we need to know the path to the dotnet.exe file (the .NET Core CLI), the func.dll file (the Azure Functions host), and to our Azure Functions app binaries.

I've created a class called ConfigurationHelper that initialises a static property that will help us:

The Settings class is then defined with the configuration settings we need:

Then we can create an appsettings.json file to set the settings:

That's all we need to run the tests. Now we can write a couple of actual test classes.

Writing Some Tests

You can write your integration tests in any framework you want, whether that be something very simple like pure xUnit tests or a more BDD-oriented framework like SpecFlow.

Personally I like BDDfy as a middle ground - it's simple and doesn't require a lot of extra plumbing like SpecFlow, while letting us write BDD-style tests in pure code.

Here are a couple of example integration tests I've written for the sample app:

Test the HelloWorld Function

This test simply calls the HTTP-triggered HelloWorld function and checks the output is as expected.

Test the HelloQueue Function

The second test checks that the HelloQueue function posts to a queue correctly. It does this by clearing the queue before it runs, letting the HelloQueue function run, and then confirming that a single message - with the expected contents - has been enqueued.

Running the Tests

Now we can compile the integration test project and run it from Visual Studio's Test Explorer. Behind the scenes it runs the .NET Core CLI, starts the Azure Functions host, executes our tests, and then kills the host when they're finished. And we can see the tests pass!

Running from Azure Pipelines

Getting the tests running from our local development environment is great, but integration tests are most useful when they run automatically as part of our continuous integration process. (Sometimes integration tests take so long to run that they get relegated to run on nightly builds instead, but that's a topic that's outside the scope of this post.)

Most build servers should be able to run our integration tests without any major problems. I'll use the example here of Azure Pipelines, which is part of the Azure DevOps service (and which used to be called VSTS's Build Configurations feature). Azure Pipelines lets us define our build process as a YAML file, which is also a very convenient way to document and share it!

Here's a build.yaml for building our Azure Functions app and running the integration tests:

The three key parts here are:

  • Lines 3 and 4 override the FunctionHostPath app setting with the location that Azure Pipelines hosted agents use for NPM packages, which is different to the location on most developers' PCs.

  • Line 6 links the build.yaml with the variable group IntegrationTestConnectionStrings. Variable groups are outside the scope of this post, but briefly, they let us create a predefined set of variables that are available as environment variables. Inside the IntegrationTestConnectionStrings variable group, I have set two variables - AzureWebJobsStorage and StorageConnectionString - to a connection string for an Azure Storage account that I want to use when I run from the hosted agent.

  • Lines 16 through 21 install the Azure Functions Core Tools, which gives us the func.dll host that we use. For Azure Pipelines hosted agents we need to run this step every time we run the build since NPM packages are reset after each build completes.

  • Lines 23 through 28 use the dotnet test command, which is part of the .NET Core tooling, to execute our integration tests. This automatically publishes the results to Azure DevOps too.

When we run the build, we can see the tests have run successfully:


And not only that, but we can see the console output in the build log, which can be helpful when diagnosing any issues we might see:

I've found this approach to be really useful when developing complex Azure Functions apps, where integration testing is a part of the quality control necessary for functions that run non-trivial workloads.

Remember you can view the complete code for this post on GitHub.

Update: A second post, adapting this method to test timer-triggered functions, is now available too. TODO

Automatic Key Rotation for Azure Services

This post was originally published on the Kloud blog.

Securely managing keys for services that we use is an important, and sometimes difficult, part of building and running a cloud-based application. In general I prefer not to handle keys at all, and instead rely on approaches like managed service identities with role-based access control, which allow for applications to authenticate and authorise themselves without any keys being explicitly exchanged. However, there are a number of situations where do we need to use and manage keys, such as when we use services that don't support role-based access control. One best practice that we should adopt when handling keys is to rotate (change) them regularly.

Key rotation is important to cover situations where your keys may have compromised. Common attack vectors include keys having been committed to a public GitHub repository, a log file having a key accidentally written to it, or a disgruntled ex-employee retaining a key that had previously been issued. Changing the keys means that the scope of the damage is limited, and if keys aren't changed regularly then these types of vulnerability can be severe.

In many applications, keys are used in complex ways and require manual intervention to rotate. But in other applications, it's possible to completely automate the rotation of keys. In this post I'll explain one such approach, which rotates keys every time the application and its infrastructure components are redeployed. Assuming the application is deployed regularly, for example using a continuous deployment process, we will end up rotating keys very frequently.


The key rotation process I describe here relies on the fact that the services we'll be dealing with - Azure Storage, Cosmos DB, and Service Bus - have both a primary and a secondary key. Both keys are valid for any requests, and they can be changed independently of each other. During each release we will pick one of these keys to use, and we'll make sure that we only use that one. We'll deploy our application components, which will include referencing that key and making sure our application uses it. Then we'll rotate the other key.

The flow of the script is as follows:

  1. Decide whether to use the primary key or the secondary key for this deployment. There are several approaches to do this, which I describe below.

  2. Deploy the ARM template. In our example, the ARM template is the main thing that reads the keys. The template copies the keys into an Azure Function application's configuration settings, as well as into a Key Vault. You could, of course, output the keys and have your deployment script put them elsewhere if you want to.

  3. Run the other deployment logic. For our simple application we don't need to do anything more than run the ARM template deployment, but for many deployments you might copy your application files to a server, swap the deployment slots, or perform a variety of other actions that you need to run as part of your release.

  4. Test the application is working. The Azure Function in our example will perform some checks to ensure the keys are working correctly. You might also run other 'smoke tests' after completing your deployment logic.

  5. Record the key we used. We need to keep track of the keys we’ve used in this deployment so that the next deployment can use the other one.

  6. Rotate the other key. Now we can rotate the key that we are not using. The way that we rotate keys is a little different for each service.

  7. Test the application again. Finally, we run one more check to ensure that our application works. This is mostly a last check to ensure that we haven't accidentally referenced any other keys, which would break our application now that they've been rotated.

We don't rotate any keys until after we've already switched the application to using the other set of keys, so we should never end up in a situation where we've referenced the wrong keys from the Azure Functions application. However, if we wanted to have a true zero-downtime deployment then we could use something like deployment slots to allow for warming up our application before we switch it into production.

A Word of Warning

If you're going to apply this principle in this post or the code below to your own applications, it's important to be aware of an important limitation. The particular approach described here only works if your deployments are completely self-contained, with the keys only used inside the deployment process itself. If you provide keys for your components to any other systems or third parties, rotating keys in this manner will likely cause their systems to break.

Importantly, any shared access signatures and tokens you issue will likely be broken by this process too. For example, if you provide third parties with a SAS token to access a storage account or blob, then rotating the account keys will cause the SAS token to be invalidated. There are some ways to avoid this, including generating SAS tokens from your deployment process and sending them out from there, or by using stored access policies; these approaches are beyond the scope of this post.

The next sections provide some detail on the important steps in the list above.

Step 1: Choosing a Key

The first step we need to perform is to decide whether we should use the primary or secondary keys for this deployment. Ideally each deployment would switch between them - so deployment 1 would use the primary keys, deployment 2 the secondary, deployment 3 the primary, deployment 4 the secondary, etc. This requires that we store some state about the deployments somewhere. Don’t forget, though, that the very first time we deploy the application we won’t have this state set. We need to allow for this scenario too.

The option that I’ve chosen to use in the sample is to use a resource group tag. Azure lets us use tags to attach custom metadata to most resource types, as well as to resource groups. I’ve used a custom tag named CurrentKeys to indicate whether the resources in that group currently use the primary or secondary keys.

There are other places you could store this state too - some sort of external configuration system, or within your release management tool. You could even have your deployment scripts look at the keys currently used by the application code, compare them to the keys on the actual target resources, and then infer which key set is being used that way.

A simpler alternative to maintaining state is to randomly choose to use the primary or secondary keys on every deployment. This may sometimes mean that you end up reusing the same keys repeatedly for several deployments in a row, but in many cases this might not be a problem, and may be worth the simplicity of not maintaining state.

Step 2: Deploy the ARM Template

Our ARM template includes the resource definitions for all of the components we want to create - a storage account, a Cosmos DB account, a Service Bus namespace, and an Azure Function app to use for testing. You can see the full ARM template here.

Note that we are deploying the Azure Function application code using the ARM template deployment method.

Additionally, we copy the keys for our services into the Azure Function app's settings, and into a Key Vault, so that we can access them from our application.

Step 4: Testing the Keys

Once we've finished deploying the ARM template and completing any other deployment steps, we should test to make sure that the keys we're trying to use are valid. Many deployments include some sort of smoke test - a quick test of core functionality of the application. In this case, I wrote an Azure Function that will check that it can connect to the Azure resources in question.

Testing Azure Storage Keys

To test connectivity to Azure Storage, we run a query against the storage API to check if a blob container exists. We don't actually care if the container exists or not; we just check to see if we can successfully make the request:

Testing Cosmos DB Keys

To test connectivity to Cosmos DB, we use the Cosmos DB SDK to try to retrieve some metadata about the database account. Once again we're not interested in the results, just in the success of the API call:

Testing Service Bus Keys

And finally, to test connectivity to Service Bus, we try to get a list of queues within the Service Bus namespace. As long as we get something back, we consider the test to have passed:

You can view the full Azure Function here.

Step 6: Rotating the Keys

One of the last steps we perform is to actually rotate the keys for the services. The way in which we request key rotations is different depending on the services we're talking to.

Rotating Azure Storage Keys

Azure Storage provides an API that can be used to regenerate an account key. From PowerShell we can use the New-AzureRmStorageAccountKey cmdlet to access this API:

Rotating Cosmos DB Keys

For Cosmos DB, there is a similar API to regenerate an account key. There are no first-party PowerShell cmdlets for Cosmos DB, so we can instead a generic Azure Resource Manager cmdlet to invoke the API:

Rotating Service Bus Keys

Service Bus provides an API to regenerate the keys for a specified authorization rule. For this example we're using the default RootManageSharedAccessKey authorization rule, which is created automatically when the Service Bus namespace is provisioned. The PowerShell cmdlet New-AzureRmServiceBusKey can be used to access this API:

You can see the full script here.


Key management and rotation is often a painful process, but if your application deployments are completely self-contained then the process described here is one way to ensure that you continuously keep your keys changing and up-to-date.

You can download the full set of scripts and code for this example from GitHub.

Deploying Azure Functions with ARM Templates

This post was originally published on the Kloud blog.

There are many different ways in which an Azure Function can be deployed. In a future blog post I plan to go through the whole list. There is one deployment method that isn't commonly known though, and it's of particular interest to those of us who use ARM templates to deploy our Azure infrastructure. Before I describe it, I'll quickly recap ARM templates.

ARM Templates

Azure Resource Manager (ARM) templates are JSON files that describe the state of a resource group. They typically declare the full set of resources that need to be provisioned or updated. ARM templates are idempotent, so a common pattern is to run the template deployment regularly—often as part of a continuous deployment process—which will ensure that the resource group stays in sync with the description within the template.

In general, the role of ARM templates is typically to deploy the infrastructure required for an application, while the deployment of the actual application logic happens separately. However, Azure Functions' ARM integration has a feature whereby an ARM template can be used to deploy the files required to make the function run.

How to Deploy Functions in an ARM Template

In order to deploy a function through an ARM template, we need to declare a resource of type Microsoft.Web/sites/functions, like this:

There are two important parts to this.

First, the config property is essentially the contents of the function.json file. It includes the list of bindings for the function, and in the example above it also includes the disabled property.

Second, the files property is an object that contains key-value pairs representing each file to deploy. The key represents the filename, and the value represents the full contents of the file. This only really works for text files, so this deployment method is probably not the right choice for precompiled functions and other binary files. Also, the file needs to be inlined within the template, which may quickly get unwieldy for larger function files—and even for smaller files, the file needs to be escaped as a JSON string. This can be done using an online tool like this, or you could use a script to do the escaping and pass the file contents as a parameter into the template deployment.

Importantly, in my testing I found that using this method to deploy over an existing function will remove any files that are not declared in the files list, so be careful when testing this approach if you've modified the function or added any files through the portal or elsewhere.


There are many different ways you can insert your function file into the template, but one of the ways I tend to use is a PowerShell script. Inside the script, we can read the contents of the file into a string, and create a HashTable for the ARM template deployment parameters:

Then we can use the New-AzureRmResourceGroupDeployment cmdlet to execute the deployment, passing in $templateParameters to the -TemplateParameterObject argument.

You can see the full example here.

Of course, if you have a function that doesn't change often then you could instead manually convert the file into a JSON-encoded string using a tool like this one, and paste the function right into the ARM template. To see a full example of how this can be used, check out this example ARM template from a previous blog article I wrote.

When to Use It

Deploying a function through an ARM template can make sense when you have a very simple function that is comprised of one, or just a few, files to be deployed. In particular, if you already deploy the function app itself through the ARM template then this might be a natural extension of what you're doing.

This type of deployment can also make sense if you're wanting to quickly deploy and test a function and don't need some of the more complex deployment-related features like control over handling locked files. It's also a useful technique to have available for situations where a full deployment script might be too heavyweight.

However, for precompiled functions, functions that have binary files, and for complex deployments, it's probably better to use another deployment mechanism. Nevertheless, I think it's useful to know that this is a tool in your Azure Functions toolbox.

Avoiding Cosmos DB Bill Shock with Azure Functions

This post was originally published on the Kloud blog.

Cosmos DB is a fantastic database service for many different types of applications. But it can also be quite expensive, especially if you have a number of instances of your database to maintain. For example, in some enterprise development teams you may need to have dev, test, UAT, staging, and production instances of your application and its components. Assuming you're following best practices and keeping these isolated from each other, that means you're running at least five Cosmos DB collections. It's easy for someone to accidentally leave one of these Cosmos DB instances provisioned at a higher throughput than you expect, and before long you're racking up large bills, especially if the higher throughput is left overnight or over a weekend.

In this post I'll describe an approach I've been using recently to ensure the Cosmos DB collections in my subscriptions aren't causing costs to escalate. I've created an Azure Function that will run on a regular basis. It uses a managed service identity to identify the Cosmos DB accounts throughout my whole Azure subscription, and then it looks at each collection in each account to check that they are set at the expected throughput. If it finds anything over-provisioned, it sends an email so that I can investigate what's happening. You can run the same function to help you identify over-provisioned collections too.

Step 1: Create Function App

First, we need to set up an Azure Functions app. You can do this in many different ways; for simplicity, we'll use the Azure Portal for everything here.

Click Create a Resource on the left pane of the portal, and then choose Serverless Function App. Enter the information it prompts for - a globally unique function app name, a subscription, a region, and a resource group - and click Create.


Step 2: Enable a Managed Service Identity

Once we have our function app ready, we need to give it a managed service identity. This will allow us to connect to our Azure subscription and list the Cosmos DB accounts within it, but without us having to maintain any keys or secrets. For more information on managed service identities, check out my previous post.

Open up the Function Apps blade in the portal, open your app, and click Platform Features, then Managed service identity:


Switch the feature to On and click Save.

Step 3: Create Authorisation Rules

Now we have an identity for our function, we need to grant it access to the parts of our Azure subscription we want it to examine for us. In my case I'll grant it the rights over my whole subscription, but you could just give it rights on a single resource group, or even just a single Cosmos DB account. Equally you can give it access across multiple subscriptions and it will look through them all.

Open up the Subscriptions blade and choose the subscription you want it to look over. Click Access Control (IAM):


Click the Add button to create a new role assignment.

The minimum role we need to grant the function app is called Cosmos DB Account Reader Role. This allows the function to discover the Cosmos DB accounts, and to retrieve the read-only keys for those accounts, as described here. The function app can't use this role to make any changes to the accounts.

Finally, enter the name of your function app, click it, and click Save:

This will create the role assignment. Your function app is now authorised to enumerate and access Cosmos DB accounts throughout the subscription.

Step 4: Add the Function

Next, we can actually create our function. Go back into the function app and click the + button next to Functions. We'll choose to create a custom function:

Then choose a timer trigger:

Choose C# for the language, and enter the name CosmosChecker. (Feel free to use a name with more panache if you want.) Leave the timer settings alone for now:


Your function will open up with some placeholder code. We'll ignore this for now. Click the View files button on the right side of the page, and then click the Add button. Create a file named project.json, and then open it and paste in the following, then click Save:

This will add the necessary package references that we need to find and access our Cosmos DB collections, and then to send alert emails using SendGrid.

Now click on the run.csx file and paste in the following file:

I won't go through the entire script here, but I have added comments to try to make its purpose a little clearer.

Finally, click on the function.json file and replace the contents with the following:

This will configure the function app with the necessary timer, as well as an output binding to send an email. We'll discuss most of these settings later, but one important setting to note is the schedule setting. The value I've got above means the function will run every hour. You can change it to other values using cron expressions, such as:

  • Run every day at 9.30am UTC: 0 30 9 * * *

  • Run every four hours: 0 0 */4 * * *

  • Run once a week: 0 0 * * 0

You can decide how frequently you want this to run and replace the schedule with the appropriate value from above.

Step 5: Get a SendGrid Account

We're using SendGrid to send email alerts. SendGrid has built-in integration with Azure Functions so it's a good choice, although you're obviously welcome to switch out for anything else if you'd prefer. You might want an SMS message to be sent via Twilio, or a message to be sent to Slack via the Slack webhook API, for example.

If you don't already have a SendGrid account you can sign up for a free account on their website. Once you've got your account, you'll need to create an API key and have it ready for the next step.

Step 6: Configure Function App Settings

Click on your function app name and then click on Application settings:


Scroll down to the Application settings section. We'll need to enter three settings here:

  1. Setting name: SendGridKey. This should have a value of your SendGrid API key from step 5.

  2. Setting name: AlertToAddress. This should be the email address that you want alerts to be sent to.

  3. Setting name: AlertFromAddress. This should be the email address that you want alerts to be sent from. This can be the same as the 'to' address if you want.

Your Application settings section should look something like this:


Step 7: Run the Function

Now we can run the function! Click on the function name again (CosmosChecker), and then click the Run button. You can expand out the Logs pane at the bottom of the screen if you want to watch it run:

Depending on how many Cosmos DB accounts and collections you have, it may take a minute or two to complete.

If you've got any collections provisioned over 2000 RU/s, you should receive an email telling you this fact:

Configuring Alert Policies

By default, the function is configured to alert whenever it sees a Cosmos DB collection provisioned over 2000 RU/s. However, your situation may be quite different to mine. For example, you may want to be alerted whenever you have any collections provisioned over 1000 RU/s. Or, you may have production applications that should be provisioned up to 100,000 RU/s, but you only want development and test collections provisioned at 2000 RU/s.

You can configure alert policies in two ways.

First, if you have a specific collection that should have a specific policy applied to it - like the production collection I mentioned that should be allowed to go to 100,000 RU/s - then you can create another application setting. Give it the name MaximumThroughput:, and set the value to the limit you want for that collection.

For example, a collection named customers in a database named customerdb in an account named myaccount-prod would have a setting named MaximumThroughput:myaccount-prod:customerdb:customers. The value would be 100000, assuming you wanted the function to check this collection against a limit of 100,000 RU/s.

Second, by default the function has a default quota of 2000 RU/s. You can adjust this to whatever value you want by altering the value on line 17 of the function code file (run.csx).

ARM Template

If you want to deploy this function for yourself, you can also use an ARM template I have prepared. This performs all the steps listed above except step 3, which you still need to do manually.

Of course, you are also welcome to adjust the actual logic involved in checking the accounts and collections to suit your own needs. The full code is available on GitHub and you are welcome to take and modify it as much as you like! I hope this helps to avoid some nasty bill shocks.

Monitoring Azure Storage Queues with Application Insights and Azure Monitor

This post was originally published on the Kloud blog.

Azure Queues provides an easy queuing system for cloud-based applications. Queues allow for loose coupling between application components, and applications that use queues can take advantage of features like peek-locking and multiple retry attempts to enable application resiliency and high availability. Additionally, when Azure Queues are used with Azure Functions or Azure WebJobs, the built-in poison queue support allows for messages that repeatedly fail processing attempts to be moved to a dedicated queue for later inspection.

An important part of operating a queue-based application is monitoring the length of queues. This can tell you whether the back-end parts of the application are responding, whether they are keeping up with the amount of work they are being given, and whether there are messages that are causing problems. Most applications will have messages being added to and removed from queues as part of their regular operation. Over time, an operations team will begin to understand the normal range for each queue’s length. When a queue goes out of this range, it’s important to be alerted so that corrective action can be taken.

Azure Queues don’t have a built-in queue length monitoring system. Azure Application Insights allows for the collection of large volumes of data from an application, but it does not support monitoring queue lengths with its built-in functionality. In this post, we will create a serverless integration between Azure Queues and Application Insights using an Azure Function. This will allow us to use Application Insights to monitor queue lengths and set up Azure Monitor alert emails if the queue length exceeds a given threshold.

Solution Architecture

There are several ways that Application Insights could be integrated with Azure Queues. In this post we will use Azure Functions. Azure Functions is a serverless platform, allowing for blocks of code to be executed on demand or at regular intervals. We will write an Azure Function to poll the length of a set of queues, and publish these values to Application Insights. Then we will use Application Insights’ built-in analytics and alerting tools to monitor the queue lengths.

Base Application Setup

For this sample, we will use the Azure Portal to create the resources we need. You don't even need Visual Studio to follow along. I will assume some basic familiarity with Azure.

First, we'll need an Azure Storage account for our queues. In our sample application, we already have a storage account with two queues to monitor:

  • processorders: this is a queue that an API publishes to, and a back-end WebJob reads from the queue and processes its items. The queue contains orders that need to be processed.

  • processorders-poison: this is a queue that WebJobs has created automatically. Any messages that cannot be processed by the WebJob (by default after five attempts) will be moved into this queue for manual handling.

Next, we will create an Azure Functions app. When we create this through the Azure Portal, the portal helpfully asks if we want to create an Azure Storage account to store diagnostic logs and other metadata. We will choose to use our existing storage account, but if you prefer, you can have a different storage account than the one your queues are in. Additionally, the portal offers to create an Application Insights account. We will accept this, but you can create it separately later if you want.


Once all of these components have been deployed, we are ready to write our function.

Azure Function

Now we can write an Azure Function to connect to the queues and check their length.

Open the Azure Functions account and click the + button next to the Functions menu. Select a Timer trigger. We will use C# for this example. Click the Create this function button.


By default, the function will run every five minutes. That might be sufficient for many applications. If you need to run the function on a different frequency, you can edit the schedule element in the function.json file and specify a cron expression.

Next, paste the following code over the top of the existing function:

This code connects to an Azure Storage account and retrieves the length of each queue specified. The key parts here are:

var connectionString = System.Configuration.ConfigurationManager.AppSettings["AzureWebJobsStorage"];

Azure Functions has an application setting called AzureWebJobsStorage. By default this refers to the storage account created when we provisioned the functions app. If you wanted to monitor a queue in another account, you could reference the storage account connection string here.

var queue = queueClient.GetQueueReference(queueName);
var length = queue.ApproximateMessageCount;

When you obtain a reference to a queue, you must explicitly fetch the queue attributes in order to read the ApproximateMessageCount. As the name suggests, this count may not be completely accurate, especially in situations where messages are being added and removed at a high rate. For our purposes, an approximate message count is enough for us to monitor.

log.Info($": ");

For now, this line will let us view the length of the queues within the Azure Functions log window. Later, we will switch this out to log to Application Insights instead.

Click Save and run. You should see something like the following appear in the log output window below the code editor:

2017-09-07T00:35:00.028 Function started (Id=57547b15-4c3e-42e7-a1de-1240fdf57b36)
2017-09-07T00:35:00.028 C# Timer trigger function executed at: 9/7/2017 12:35:00 AM
2017-09-07T00:35:00.028 processorders: 1
2017-09-07T00:35:00.028 processorders-poison: 0
2017-09-07T00:35:00.028 Function completed (Success, Id=57547b15-4c3e-42e7-a1de-1240fdf57b36, Duration=9ms)

Integrating into Azure Functions

Azure Functions has integration with Appliation Insights for logging of each function execution. In this case, we want to save our own custom metrics, which is not currently supported by the built-in integration. Thankfully, integrating the full Application Insights SDK into our function is very easy.

First, we need to add a project.json file to our function app. To do this, click the View files tab on the right pane of the function app. Then click the + Add button, and name your new file project.json. Paste in the following:

This adds a NuGet reference to the Microsoft.ApplicationInsights package, which allows us to use the full SDK from our function.

Next, we need to update our function so that it writes the queue length to Application Insights. Click on the run.csx file on the right-hand pane, and replace the current function code with the following:

The key new parts that we have just added are:

private static string key = TelemetryConfiguration.Active.InstrumentationKey = System.Environment.GetEnvironmentVariable("APPINSIGHTS_INSTRUMENTATIONKEY", EnvironmentVariableTarget.Process);
private static TelemetryClient telemetry = new TelemetryClient() { InstrumentationKey = key };

This sets up an Application Insights TelemetryClient instance that we can use to publish our metrics. Note that in order for Application Insights to route the metrics to the right place, it needs an instrumentation key. Azure Functions' built-in integration with Application Insights means that we can simply reference the instrumentation key it has set up for us. If you did not set up Application Insights at the same time as the function app, you can configure this separately and set your instrumentation key here.

Now we can publish our custom metrics into Application Insights. Note that Application Insights has several different types of custom diagnostics that can be tracked. In this case, we use metrics since they provide the ability to track numerical values over time, and set up alerts as appropriate. We have added the following line to our foreach loop, which publishes the queue length into Application Insights.

telemetry.TrackMetric($"Queue length – ", (double)length);

Click Save and run. Once the function executes successfully, wait a few minutes before continuing with the next step - Application Insights takes a little time (usually less than five minutes) to ingest new data.

Exploring Metrics in Application Insights

In the Azure Portal, open your Application Insights account and click Metrics Explorer in the menu. Then click the + Add chart button, and expand the Custom metric collection. You should see the new queue length metrics listed.


Select the metrics, and as you do, notice that a new chart is added to the main panel showing the queue length count. In our case, we can see the processorders queue count fluctuate between one and five messages, while the processorders-poison queue stays empty. Set the Aggregation property to Max to better see how the queue fluctuates.


You may also want to click the Time range button in the main panel, and set it to Last 30 minutes, to fully see how the queue length changes during your testing.

Setting up Alerts

Now that we can see our metrics appearing in Application Insights, we can set up alerts to notify us whenever a queue exceeds some length we specify. For our processorders queue, this might be 10 messages - we know from our operations history that if we have more than ten messages waiting to be processed, our WebJob isn't processing them fast enough. For our processorders-poison queue, though, we never want to have any messages appear in this queue, so we can have an alert if more than zero messages are on the queue. Your thresholds may differ for your application, depending on your requirements.

Click the Alert rules button in the main panel. Azure Monitor opens. Click the + Add metric alert button. (You may need to select the Application Insights account in the Resource drop-down list if this is disabled.)

On the Add rule pane, set the values as follows:

  • Name: Use a descriptive name here, such as `QueueProcessOrdersLength`.

  • Metric: Select the appropriate `Queue length - queuename` metric.

  • Condition: Set this to the value and time period you require. In our case, I have set the rule to `Greater than 10 over the last 5 minutes`.

  • Notify via: Specify how you want to be notified of the alert. Azure Monitor can send emails, call a webhook URL, or even start a Logic App. In our case, I have opted to receive an email.

Click OK to save the rule.


If the queue count exceeds your specified limit, you will receive an email shortly afterwards with details:



Monitoring the length of your queues is a critical part of operating a modern application, and getting alerted when a queue is becoming excessively long can help to identify application failures early, and thereby avoid downtime and SLA violations. Even though Azure Storage doesn't provide a built-in monitoring mechanism, one can easily be created using Azure Functions, Application Insights, and Azure Monitor.