
Automating Azure Instrumentation and Monitoring - Part 2: Application Insights

This post was originally published on the Kloud blog.

Application Insights is a component of Azure Monitor for application-level instrumentation. It collects telemetry from your application infrastructure like web servers, App Services, and Azure Functions apps, and from your application code. In this post we'll discuss how Application Insights can be automated in several key ways: first, by setting up an Application Insights instance in an ARM template; second, by connecting it, through automation scripts, to various types of Azure application components, including Azure Functions, App Services, and API Management; and third, by configuring its smart detection features to emit automatic alerts in a configurable way. As this is the first time in this series that we'll deploy instrumentation code, we'll also discuss an approach that can be used to manage the deployment of different types and levels of monitoring into different environments.

This post is part of a series:

  • Part 1 provides an introduction to the series by describing why we should instrument our systems, outlines some of the major tools that Azure provides such as Azure Monitor, and argues why we should be adopting an 'infrastructure as code' mindset for our instrumentation and monitoring components.

  • Part 2 (this post) describes Azure Application Insights, including its proactive detection and alert features. It also outlines a pattern for deploying instrumentation components based on the requirements we might typically have for different environments, from short-lived development and test environments through to production.

  • Part 3 discusses how to publish custom metrics, both through Application Insights and to Azure Monitor. Custom metrics let us enrich the data that is available to our instrumentation components.

  • Part 4 covers the basics of alerts and metric alerts. Azure Monitor's powerful alerting system is a big topic, and in this part we'll discuss how it works overall, as well as how to get alerts for built-in and custom metrics.

  • Part 5 covers log alerts and resource health alerts, two other major types of alerts that Azure Monitor provides. Log alerts let us alert on information coming into Application Insights logs and Log Analytics workspaces, while resource health alerts notify us when Azure itself is having an issue that may result in downtime or degraded performance.

  • Part 6 (coming soon) describes dashboards. The Azure Portal has a great dashboard UI, and our instrumentation data can be made available as charts. Dashboards are also possible to automate, and I'll show a few tips and tricks I've learned when doing this.

  • Part 7 (coming soon) covers availability tests, which let us proactively monitor our web applications for potential outages. We'll discuss deploying and automating both single-step (ping) and multi-step availability tests.

  • Part 8 (coming soon) describes autoscale. While this isn't exactly instrumentation in and of itself, autoscale is built on much of the same data used to drive alerts and dashboards, and autoscale rules can be automated as well.

  • Finally, part 9 (coming soon) covers exporting data to other systems. Azure Monitor metrics and log data can be automatically exported, as can Application Insights data, and the export rules can be set up and managed from automation scripts.

Setting up Application Insights

When using the Azure Portal or Visual Studio to work with various types of resources, Application Insights will often be deployed automatically. This is useful when we're exploring services or testing things out, but when it comes time to build a production-grade application, it's better to have some control over the way that each of our components is deployed. Application Insights can be deployed using ARM templates, which is what we'll do in this post.

Application Insights is a simple resource to create from an ARM template. An instance with the core functionality can be created with a small ARM template:
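Something like the following is enough to get started; the parameter names here are my own, and you should check the API version against the current ARM reference for Microsoft.Insights/components:

```json
{
  "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#",
  "contentVersion": "1.0.0.0",
  "parameters": {
    "appInsightsName": {
      "type": "string"
    },
    "appInsightsLocation": {
      "type": "string",
      "metadata": {
        "description": "A region that supports Application Insights, which may differ from the rest of your infrastructure."
      }
    }
  },
  "resources": [
    {
      "type": "Microsoft.Insights/components",
      "apiVersion": "2015-05-01",
      "name": "[parameters('appInsightsName')]",
      "location": "[parameters('appInsightsLocation')]",
      "kind": "web",
      "properties": {
        "Application_Type": "web"
      }
    }
  ]
}
```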

There are a few important things to note about this template:

  • Application Insights is only available in a subset of Azure regions. This means you may need to deploy it in a region other than the region your application infrastructure is located in. The template above includes a parameter to specify this explicitly.

  • The name of your Application Insights instance doesn't have to be globally unique. Unlike App Services and Cosmos DB accounts, there are no DNS names attached to an Application Insights instance, so you can use the same name across multiple instances if they're in different resource groups.

  • Application Insights isn't free. If you have a lot of data to ingest, you may incur large costs. You can use quotas to manage this if you're worried about it.

  • There are more options available that we won't cover here. This documentation page provides further detail.

After an Application Insights instance is deployed, it has an instrumentation key that can be used to send data to the correct Application Insights instance from your application. The instrumentation key can be accessed both through the portal and programmatically, including within ARM templates. We'll use this when publishing telemetry.
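For example, within an ARM template we can use the reference() function to read the key from an instance deployed in the same template (assuming a parameter named appInsightsName, as in the template above):

```json
"[reference(resourceId('Microsoft.Insights/components', parameters('appInsightsName')), '2015-05-01').InstrumentationKey]"
```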

Publishing Telemetry

There are a number of ways to publish telemetry into Application Insights. While I won't cover them all, I'll give a quick overview of some of the most common ways to get data from your application into Application Insights, and how to automate each.

Azure Functions

Azure Functions has built-in integration with Application Insights. If you create an Azure Functions app through the portal it asks whether you want to set up this integration. Of course, since we're using automation scripts, we have to do a little work ourselves. The magic lies in an app setting, APPINSIGHTS_INSTRUMENTATIONKEY, which we can attach to any function app. Here is an ARM template that deploys an Azure Functions app (and associated app service plan and storage account), an Application Insights instance, and the configuration to link the two:
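In the interests of space, the sketch below shows just the function app resource and the app setting that links it to Application Insights; the app service plan, storage account, and the other required app settings (such as AzureWebJobsStorage and FUNCTIONS_EXTENSION_VERSION) are omitted, and the parameter names are my own:

```json
{
  "type": "Microsoft.Web/sites",
  "apiVersion": "2016-08-01",
  "name": "[parameters('functionsAppName')]",
  "location": "[resourceGroup().location]",
  "kind": "functionapp",
  "dependsOn": [
    "[resourceId('Microsoft.Insights/components', parameters('appInsightsName'))]"
  ],
  "properties": {
    "siteConfig": {
      "appSettings": [
        {
          "name": "APPINSIGHTS_INSTRUMENTATIONKEY",
          "value": "[reference(resourceId('Microsoft.Insights/components', parameters('appInsightsName')), '2015-05-01').InstrumentationKey]"
        }
      ]
    }
  }
}
```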

App Services

If we're deploying a web app into an app service using ASP.NET, we can use the ASP.NET integration directly. In fact, this works across many different hosting environments, and is described in the next section.

If you've got an app service that is not using ASP.NET, though, you can still get telemetry from your web app into Application Insights by using an app service extension. Extensions augment the built-in behaviour of an app service by installing some extra pieces of logic into the web server. Application Insights has one such extension, and we can configure it using an ARM template:
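The sketch below shows the relevant resources: the APPINSIGHTS_INSTRUMENTATIONKEY app setting and the extension itself, which is deployed as a siteextensions child resource of the app service with the extension ID Microsoft.ApplicationInsights.AzureWebSites. The parameter names and API versions are illustrative, so check them against the current reference documentation:

```json
{
  "type": "Microsoft.Web/sites/config",
  "apiVersion": "2016-08-01",
  "name": "[concat(parameters('appServiceName'), '/appsettings')]",
  "dependsOn": [
    "[resourceId('Microsoft.Web/sites', parameters('appServiceName'))]",
    "[resourceId('Microsoft.Insights/components', parameters('appInsightsName'))]"
  ],
  "properties": {
    "APPINSIGHTS_INSTRUMENTATIONKEY": "[reference(resourceId('Microsoft.Insights/components', parameters('appInsightsName')), '2015-05-01').InstrumentationKey]"
  }
},
{
  "type": "Microsoft.Web/sites/siteextensions",
  "apiVersion": "2016-08-01",
  "name": "[concat(parameters('appServiceName'), '/Microsoft.ApplicationInsights.AzureWebSites')]",
  "dependsOn": [
    "[resourceId('Microsoft.Web/sites/config', parameters('appServiceName'), 'appsettings')]"
  ]
}
```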

As you can see, similarly to the Azure Functions example, the APPINSIGHTS_INSTRUMENTATIONKEY is used here to link the app service with the Application Insights instance.

One word of warning - I've found that the site extension ARM resource isn't always deployed correctly the first time the template is deployed. If you get an error the first time you deploy, try it again and see if the problem goes away. I've tried, but have never fully been able to understand why this happens or how to stop it.

ASP.NET Applications

If you have an ASP.NET application running in an App Service or elsewhere, you can install a NuGet package into your project to collect Application Insights telemetry. This process is documented here. If you do this, you don't need to install the App Services extension from the previous section. Make sure to set the instrumentation key in your configuration settings and then flow it through to Application Insights from your application code.
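As a rough sketch, a classic ASP.NET application might apply the key at startup like this; the configuration key name is just an example, so use whatever setting name your project defines:

```csharp
using System.Configuration;
using Microsoft.ApplicationInsights.Extensibility;

public static class ApplicationInsightsBootstrapper
{
    // Call this once at application startup, e.g. from Global.asax.
    public static void Configure()
    {
        // Read the instrumentation key from configuration so that each environment
        // can point at its own Application Insights instance.
        TelemetryConfiguration.Active.InstrumentationKey =
            ConfigurationManager.AppSettings["ApplicationInsightsInstrumentationKey"];
    }
}
```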

API Management

If you have an Azure API Management instance, you might be aware that this can publish telemetry into Application Insights too. This allows for monitoring of requests all the way through the request handling pipeline. When it comes to automation, Azure API Management has very good support for ARM templates, and its Application Insights integration is no exception.

At a high level there are two things we need to do: first, we create a logger resource to establish the API Management-wide connection with Application Insights; and second, we create a diagnostic resource to instruct our APIs to send telemetry to the Application Insights instance we have configured. We can create a diagnostic resource for a specific API or to cover all APIs.

The diagnostic resource includes a sampling rate, which is the percentage of requests that should have their telemetry sent to Application Insights. There is a lot of detail to be aware of with this feature, such as the performance impact and the ways in which sampling can reduce that impact. We won't get into that here, but I encourage you to read more detail from Microsoft's documentation before using this feature.

Here's an example ARM template that deploys an API Management instance, an Application Insights instance, and configuration to send telemetry from every request into Application Insights:
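To keep things readable, the sketch below shows only the two API Management resources involved - the logger and the diagnostic - with my own parameter names. The API Management diagnostics schema has changed across API versions, so treat the property names and versions as indicative and verify them against the current reference documentation:

```json
{
  "type": "Microsoft.ApiManagement/service/loggers",
  "apiVersion": "2018-06-01-preview",
  "name": "[concat(parameters('apiManagementServiceName'), '/', parameters('appInsightsName'))]",
  "dependsOn": [
    "[resourceId('Microsoft.ApiManagement/service', parameters('apiManagementServiceName'))]",
    "[resourceId('Microsoft.Insights/components', parameters('appInsightsName'))]"
  ],
  "properties": {
    "loggerType": "applicationInsights",
    "credentials": {
      "instrumentationKey": "[reference(resourceId('Microsoft.Insights/components', parameters('appInsightsName')), '2015-05-01').InstrumentationKey]"
    }
  }
},
{
  "type": "Microsoft.ApiManagement/service/diagnostics",
  "apiVersion": "2018-06-01-preview",
  "name": "[concat(parameters('apiManagementServiceName'), '/applicationinsights')]",
  "dependsOn": [
    "[resourceId('Microsoft.ApiManagement/service/loggers', parameters('apiManagementServiceName'), parameters('appInsightsName'))]"
  ],
  "properties": {
    "loggerId": "[resourceId('Microsoft.ApiManagement/service/loggers', parameters('apiManagementServiceName'), parameters('appInsightsName'))]",
    "sampling": {
      "samplingType": "fixed",
      "percentage": 100
    }
  }
}
```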

Smart Detection

Application Insights provides a useful feature called smart detection. Application Insights watches your telemetry as it comes in, and if it notices unusual changes, it can send an alert to notify you. For example, it can detect the following types of issues:

  • An application suddenly starts returning a higher rate of HTTP 5xx (error) responses than it was returning previously.

  • The time it takes for an application to communicate with a database has increased significantly above the previous average.

Of course, this feature is not foolproof - for example, in my experience it won't detect slow changes in error rates over time that may still indicate an issue. Nevertheless, it is a very useful feature to have available to us, and it has helped me identify problems on numerous occasions.

Smart detection is enabled by default. Unless you configure it otherwise, smart detection alerts are sent to all owners of the Azure subscription in which the Application Insights instance is located. In many situations this is not desirable: when your Azure subscription contains many different applications, each with different owners; or when the operations or development team are not granted the subscription owner role (as they should not be!); or when the subscriptions are managed by a central subscription management team who cannot possibly deal with the alerts they receive from all applications. We can configure each smart detection alert using the proactiveDetectionConfigs ARM resource type.

Here is an example ARM template showing how the smart detection alerts can be redirected to an email address you specify:
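Each smart detection rule is configured by its own proactiveDetectionConfigs child resource, named after the rule it controls. The sketch below covers a single rule; the rule name, property names, and API version are as I recall them from the preview API, so verify them against the current schema before relying on this:

```json
{
  "type": "Microsoft.Insights/components/proactiveDetectionConfigs",
  "apiVersion": "2018-05-01-preview",
  "name": "[concat(parameters('appInsightsName'), '/degradationinserverresponsetime')]",
  "dependsOn": [
    "[resourceId('Microsoft.Insights/components', parameters('appInsightsName'))]"
  ],
  "properties": {
    "enabled": true,
    "sendEmailsToSubscriptionOwners": false,
    "customEmails": [
      "[parameters('alertEmailAddress')]"
    ]
  }
}
```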

In development environments, you may not want to have these alerts enabled at all. Development environments can be used sporadically, and can have a much higher error rate than normal, so the signals that Application Insights uses to proactively monitor for problems aren't as useful. I find that it's best to configure smart detection myself so that I can switch it on or off for different environments, and for those environments that do need it, I'll override the alert configuration to send to my own alert email address and not to the subscription owners. This requires us to have different instrumentation configuration for different environments.

Instrumentation Environments

In most real-world applications, we end up deploying the application in at least three environments: development environments, which are used by software developers as they actively work on a feature or change; non-production environments, which are used by testers, QA engineers, product managers, and others who need to access a copy of the application before it goes live; and production environments, which are used by customers and may be monitored by a central operations team. Within these categories, there can be multiple actual environments too - for example, there can be different non-production environments for different types of testing (e.g. functional testing, security testing, and performance testing), and some of these may be long-lived while others are short-lived.

Each of these different environments also has different needs for instrumentation:

  • Production environments typically need the highest level of alerting and monitoring since an issue may affect our customers' experiences. We'll typically have many alerts and dashboards set up for production systems. But we also may not want to collect large volumes of telemetry from production systems, especially if doing so may cause a negative impact on our application's performance.

  • Non-production environments may need some level of alerting, but there are certain types of alerts that may not make sense compared to production environments. For example, we may run our non-production systems on a lower tier of infrastructure compared to our production systems, and so an alert based on the application's response time may need different thresholds to account for the lower expected performance. But in contrast to production environments, we may consider it to be important to collect a lot of telemetry in case our testers do find any issues and we need to diagnose them interactively, so we may allow for higher levels of telemetry sampling than we would in a production environment.

  • Development environments may only need minimal instrumentation. Typically in development environments I'll deploy all of the telemetry collection that I would deploy for non-production environments, but turn all alerts and dashboards off. In the event of any issues, I'll be interactively working with the telemetry myself anyway.

Of course, your specific needs may be different, but in general I think it's good to categorise our instrumentation across types of environments. For example, here is how I might typically deploy Application Insights components across environments:

| Instrumentation Type | Development | Non-Production | Production |
| --- | --- | --- | --- |
| Application Insights smart detection | Off | On, sending alerts to developers | On, sending alerts to production monitoring group |
| Application Insights Azure Functions integration | On | On | On |
| Application Insights App Services integration | On | On | On |
| Application Insights API Management integration | On, at 100% sampling | On, at 100% sampling | On, at 30% sampling |

Once we've determined those basic rules, we can then implement them. In the case of ARM templates, I tend to use ARM template parameters to handle this. As we go through this series we'll see examples of how we can use parameters to achieve this conditional logic. I'll also present versions of this table with my suggestions for the components that you might consider deploying for each environment.

Configuring Smart Detection through ARM Templates

Now that we have a basic idea of how we'll configure instrumentation in each environment, we can reconsider how we might configure Application Insights. Typically I suggest deploying a single Application Insights instance for each environment the system will be deployed into. If we're building up a complex ARM template with all of the system's components, we can embed the conditional logic required to handle different environments in there.

A single large ARM template can bring together everything we've created in this post and support all three environment type modes:
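To keep this post readable, here's a sketch of just the heart of the environment handling: an environmentType parameter that drives resource conditions and values such as the API Management sampling percentage. The parameter and variable names are mine, and the template is abbreviated to the relevant pieces:

```json
{
  "parameters": {
    "environmentType": {
      "type": "string",
      "allowedValues": [ "Development", "NonProduction", "Production" ],
      "defaultValue": "Development"
    }
  },
  "variables": {
    "apiManagementSamplingPercentage": "[if(equals(parameters('environmentType'), 'Production'), 30, 100)]"
  },
  "resources": [
    {
      "condition": "[not(equals(parameters('environmentType'), 'Development'))]",
      "type": "Microsoft.Insights/components/proactiveDetectionConfigs",
      "apiVersion": "2018-05-01-preview",
      "name": "[concat(parameters('appInsightsName'), '/degradationinserverresponsetime')]",
      "properties": {
        "enabled": true,
        "sendEmailsToSubscriptionOwners": false,
        "customEmails": [ "[parameters('alertEmailAddress')]" ]
      }
    }
  ]
}
```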

Summary

Application Insights is a very useful tool for monitoring our application components. It collects telemetry from a range of different sources, which can all be automated. It provides automatic analysis of some of our data and has smart detection features, which again we can configure through our automation scripts. Furthermore, we can publish data into it ourselves as well. In fact, in the next post in this series, we'll discuss how we can publish custom metrics into Application Insights.

Monitoring Azure Storage Queues with Application Insights and Azure Monitor

This post was originally published on the Kloud blog.

Azure Queues provides an easy queuing system for cloud-based applications. Queues allow for loose coupling between application components, and applications that use queues can take advantage of features like peek-locking and multiple retry attempts to enable application resiliency and high availability. Additionally, when Azure Queues are used with Azure Functions or Azure WebJobs, the built-in poison queue support allows for messages that repeatedly fail processing attempts to be moved to a dedicated queue for later inspection.

An important part of operating a queue-based application is monitoring the length of queues. This can tell you whether the back-end parts of the application are responding, whether they are keeping up with the amount of work they are being given, and whether there are messages that are causing problems. Most applications will have messages being added to and removed from queues as part of their regular operation. Over time, an operations team will begin to understand the normal range for each queue’s length. When a queue goes out of this range, it’s important to be alerted so that corrective action can be taken.

Azure Queues don’t have a built-in queue length monitoring system. Azure Application Insights allows for the collection of large volumes of data from an application, but it does not support monitoring queue lengths with its built-in functionality. In this post, we will create a serverless integration between Azure Queues and Application Insights using an Azure Function. This will allow us to use Application Insights to monitor queue lengths and set up Azure Monitor alert emails if the queue length exceeds a given threshold.

Solution Architecture

There are several ways that Application Insights could be integrated with Azure Queues. In this post we will use Azure Functions. Azure Functions is a serverless platform, allowing for blocks of code to be executed on demand or at regular intervals. We will write an Azure Function to poll the length of a set of queues, and publish these values to Application Insights. Then we will use Application Insights’ built-in analytics and alerting tools to monitor the queue lengths.

Base Application Setup

For this sample, we will use the Azure Portal to create the resources we need. You don't even need Visual Studio to follow along. I will assume some basic familiarity with Azure.

First, we'll need an Azure Storage account for our queues. In our sample application, we already have a storage account with two queues to monitor:

  • processorders: this is a queue that an API publishes to, and a back-end WebJob reads from the queue and processes its items. The queue contains orders that need to be processed.

  • processorders-poison: this is a queue that WebJobs has created automatically. Any messages that cannot be processed by the WebJob (by default after five attempts) will be moved into this queue for manual handling.

Next, we will create an Azure Functions app. When we create this through the Azure Portal, the portal helpfully asks if we want to create an Azure Storage account to store diagnostic logs and other metadata. We will choose to use our existing storage account, but if you prefer, you can have a different storage account than the one your queues are in. Additionally, the portal offers to create an Application Insights account. We will accept this, but you can create it separately later if you want.


Once all of these components have been deployed, we are ready to write our function.

Azure Function

Now we can write an Azure Function to connect to the queues and check their length.

Open the Azure Functions app and click the + button next to the Functions menu. Select a Timer trigger. We will use C# for this example. Click the Create this function button.


By default, the function will run every five minutes. That might be sufficient for many applications. If you need to run the function on a different frequency, you can edit the schedule element in the function.json file and specify a cron expression.
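For example, a schedule that runs the function every minute looks like this; note that the Azure Functions cron format includes a seconds field:

```json
{
  "bindings": [
    {
      "name": "myTimer",
      "type": "timerTrigger",
      "direction": "in",
      "schedule": "0 */1 * * * *"
    }
  ],
  "disabled": false
}
```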

Next, paste the following code over the top of the existing function:
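Something along these lines will do the job; adjust the queue names to match your own application:

```csharp
#r "Microsoft.WindowsAzure.Storage"
#r "System.Configuration"

using System;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Queue;

// The queues we want to monitor.
private static readonly string[] queueNames = { "processorders", "processorders-poison" };

public static void Run(TimerInfo myTimer, TraceWriter log)
{
    log.Info($"C# Timer trigger function executed at: {DateTime.Now}");

    // Use the storage account that the function app itself is configured with.
    var connectionString = System.Configuration.ConfigurationManager.AppSettings["AzureWebJobsStorage"];
    var storageAccount = CloudStorageAccount.Parse(connectionString);
    var queueClient = storageAccount.CreateCloudQueueClient();

    foreach (var queueName in queueNames)
    {
        // FetchAttributes populates ApproximateMessageCount.
        var queue = queueClient.GetQueueReference(queueName);
        queue.FetchAttributes();
        var length = queue.ApproximateMessageCount;

        log.Info($"{queueName}: {length}");
    }
}
```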

This code connects to an Azure Storage account and retrieves the length of each queue specified. The key parts here are:

var connectionString = System.Configuration.ConfigurationManager.AppSettings["AzureWebJobsStorage"];

Azure Functions has an application setting called AzureWebJobsStorage. By default this refers to the storage account created when we provisioned the functions app. If you wanted to monitor queues in a different storage account, you could use that account's connection string here instead.

var queue = queueClient.GetQueueReference(queueName);
queue.FetchAttributes();
var length = queue.ApproximateMessageCount;

When you obtain a reference to a queue, you must explicitly fetch the queue attributes in order to read the ApproximateMessageCount. As the name suggests, this count may not be completely accurate, especially in situations where messages are being added and removed at a high rate. For our purposes, an approximate message count is enough for us to monitor.

log.Info($"{queueName}: {length}");

For now, this line will let us view the length of the queues within the Azure Functions log window. Later, we will switch this out to log to Application Insights instead.

Click Save and run. You should see something like the following appear in the log output window below the code editor:

2017-09-07T00:35:00.028 Function started (Id=57547b15-4c3e-42e7-a1de-1240fdf57b36)
2017-09-07T00:35:00.028 C# Timer trigger function executed at: 9/7/2017 12:35:00 AM
2017-09-07T00:35:00.028 processorders: 1
2017-09-07T00:35:00.028 processorders-poison: 0
2017-09-07T00:35:00.028 Function completed (Success, Id=57547b15-4c3e-42e7-a1de-1240fdf57b36, Duration=9ms)

Integrating Application Insights into Azure Functions

Azure Functions has built-in integration with Application Insights for logging each function execution. In this case, we want to save our own custom metrics, which is not currently supported by the built-in integration. Thankfully, integrating the full Application Insights SDK into our function is very easy.

First, we need to add a project.json file to our function app. To do this, click the View files tab on the right pane of the function app. Then click the + Add button, and name your new file project.json. Paste in the following:
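The package version below is simply the one that was current when I wrote this; use the latest 2.x version of the SDK:

```json
{
  "frameworks": {
    "net46": {
      "dependencies": {
        "Microsoft.ApplicationInsights": "2.4.0"
      }
    }
  }
}
```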

This adds a NuGet reference to the Microsoft.ApplicationInsights package, which allows us to use the full SDK from our function.

Next, we need to update our function so that it writes the queue length to Application Insights. Click on the run.csx file on the right-hand pane, and replace the current function code with the following:
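The updated function looks something like this:

```csharp
#r "Microsoft.WindowsAzure.Storage"
#r "System.Configuration"

using System;
using Microsoft.ApplicationInsights;
using Microsoft.ApplicationInsights.Extensibility;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Queue;

// Reuse the instrumentation key that the Azure Functions integration has already configured.
private static string key = TelemetryConfiguration.Active.InstrumentationKey =
    System.Environment.GetEnvironmentVariable("APPINSIGHTS_INSTRUMENTATIONKEY", EnvironmentVariableTarget.Process);
private static TelemetryClient telemetry = new TelemetryClient() { InstrumentationKey = key };

// The queues we want to monitor.
private static readonly string[] queueNames = { "processorders", "processorders-poison" };

public static void Run(TimerInfo myTimer, TraceWriter log)
{
    var connectionString = System.Configuration.ConfigurationManager.AppSettings["AzureWebJobsStorage"];
    var storageAccount = CloudStorageAccount.Parse(connectionString);
    var queueClient = storageAccount.CreateCloudQueueClient();

    foreach (var queueName in queueNames)
    {
        var queue = queueClient.GetQueueReference(queueName);
        queue.FetchAttributes();
        var length = queue.ApproximateMessageCount;

        log.Info($"{queueName}: {length}");

        // Publish the queue length as a custom metric in Application Insights.
        telemetry.TrackMetric($"Queue length - {queueName}", (double)length);
    }
}
```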

The key new parts that we have just added are:

private static string key = TelemetryConfiguration.Active.InstrumentationKey = System.Environment.GetEnvironmentVariable("APPINSIGHTS_INSTRUMENTATIONKEY", EnvironmentVariableTarget.Process);
private static TelemetryClient telemetry = new TelemetryClient() { InstrumentationKey = key };

This sets up an Application Insights TelemetryClient instance that we can use to publish our metrics. Note that in order for Application Insights to route the metrics to the right place, it needs an instrumentation key. Azure Functions' built-in integration with Application Insights means that we can simply reference the instrumentation key it has set up for us. If you did not set up Application Insights at the same time as the function app, you can configure this separately and set your instrumentation key here.

Now we can publish our custom metrics into Application Insights. Note that Application Insights has several different types of custom diagnostics that can be tracked. In this case, we use metrics since they provide the ability to track numerical values over time, and set up alerts as appropriate. We have added the following line to our foreach loop, which publishes the queue length into Application Insights.

telemetry.TrackMetric($"Queue length - {queueName}", (double)length);

Click Save and run. Once the function executes successfully, wait a few minutes before continuing with the next step - Application Insights takes a little time (usually less than five minutes) to ingest new data.

Exploring Metrics in Application Insights

In the Azure Portal, open your Application Insights account and click Metrics Explorer in the menu. Then click the + Add chart button, and expand the Custom metric collection. You should see the new queue length metrics listed.


Select the metrics, and as you do, notice that a new chart is added to the main panel showing the queue length count. In our case, we can see the processorders queue count fluctuate between one and five messages, while the processorders-poison queue stays empty. Set the Aggregation property to Max to better see how the queue fluctuates.


You may also want to click the Time range button in the main panel, and set it to Last 30 minutes, to fully see how the queue length changes during your testing.

Setting up Alerts

Now that we can see our metrics appearing in Application Insights, we can set up alerts to notify us whenever a queue exceeds some length we specify. For our processorders queue, this might be 10 messages - we know from our operations history that if we have more than ten messages waiting to be processed, our WebJob isn't processing them fast enough. For our processorders-poison queue, though, we never want to have any messages appear in this queue, so we can have an alert if more than zero messages are on the queue. Your thresholds may differ for your application, depending on your requirements.

Click the Alert rules button in the main panel. Azure Monitor opens. Click the + Add metric alert button. (You may need to select the Application Insights account in the Resource drop-down list if this is disabled.)

On the Add rule pane, set the values as follows:

  • Name: Use a descriptive name here, such as `QueueProcessOrdersLength`.

  • Metric: Select the appropriate `Queue length - queuename` metric.

  • Condition: Set this to the value and time period you require. In our case, I have set the rule to `Greater than 10 over the last 5 minutes`.

  • Notify via: Specify how you want to be notified of the alert. Azure Monitor can send emails, call a webhook URL, or even start a Logic App. In our case, I have opted to receive an email.

Click OK to save the rule.


If the queue count exceeds your specified limit, you will receive an email shortly afterwards with details:


Summary

Monitoring the length of your queues is a critical part of operating a modern application, and getting alerted when a queue is becoming excessively long can help to identify application failures early, and thereby avoid downtime and SLA violations. Even though Azure Storage doesn't provide a built-in monitoring mechanism, one can easily be created using Azure Functions, Application Insights, and Azure Monitor.