Cosmos DB Server-Side Programming with TypeScript - Part 2: User-Defined Functions
User-defined functions (UDFs) in Cosmos DB allow for simple calculations and computations to be performed on values, entities, and documents. In this post I will introduce UDFs, and then provide detailed steps to set up a basic UDF written in TypeScript. Many of these same steps will be applicable to stored procedures and triggers, which we'll look at in future posts.
This is the second part of a series of blog posts on server-side development using Cosmos DB with TypeScript.
Part 1 gives an overview of the server side programmability model, the reasons why you might want to consider server-side code in Cosmos DB, and some key things to watch out for.
Part 2 (this post) deals with user-defined functions, the simplest type of server-side programming, which allow for adding simple computation to queries.
Part 3 talks about stored procedures. These provide a lot of powerful features for creating, modifying, deleting, and querying across documents - including in a transactional way.
Part 4 introduces triggers. Triggers come in two types - pre-triggers and post-triggers - and allow for behaviour like validating and modifying documents as they are inserted or updated, and creating secondary effects as a result of changes to documents in a collection.
Part 5 discusses unit testing your server-side scripts. Unit testing is a key part of building a production-grade application, and even though some of your code runs inside Cosmos DB, your business logic can still be tested.
Finally, part 6 explains how server-side scripts can be built and deployed into a Cosmos DB collection within an automated build and release pipeline, using Microsoft Visual Studio Team Services (VSTS).
UDFs are the simplest type of server-side development available for Cosmos DB. UDFs generally accept one or more parameters and return a value. They cannot access Cosmos DB's internal resources, and cannot read or write documents from the collection, so they are really only intended for simple types of computation. They can be used within queries, including in the
UDFs are simple enough that types are almost not necessary, but for consistency we will use TypeScript for these too. This will also allow us to work through the setup of a TypeScript project, which we'll reuse for the next parts of this series.
Accept a single string as a parameter. Do some string parsing on it, then return the parsed value.
Accept a single string as well as another parameter. Based on the value of the second parameter, change the parsing behaviour, then return the parsed value.
Accept an array of values as a parameter. Combine the values using some custom logic that you define, then return the combined value.
Accept no parameters. Return a piece of custom data based on the current date and time.
Accept a complex type as a parameter. Do some parsing of the document and then return a single output.
Invoking a UDF
A UDF can be invoked from within the
WHERE clauses of a SQL query. To invoke a UDF, you need to include the prefix
udf. before the function name, like this:
SELECT udf.parseString(c.stringField) FROM c
In this example,
udf. is a prefix indicating that we want to call a UDF, and
You can pass in multiple parameters to the UDF by comma delimiting them, like this:
SELECT udf.parseString(c.stringField, 1234) FROM c SELECT udf.parseString(c.stringField1, c.stringField2) FROM C
To pass a hard-coded array into a UDF, you can simply use square brackets, like this:
SELECT udf.parseArray(["arrayValue1", "arrayValue2", "arrayValue3"])
Now that we've talked through some of the key things to know about UDFs let's try writing one, using our sample scenario from part 1.
Defining our UDF
Let's imagine that our order system was built several years ago, and our business has now evolved significantly. As a result, we are in the middle of changing our order schema to represent customer IDs in different ways. Cosmos DB makes this easy by not enforcing a schema, so we can simply switch to the new schema when we're ready.
Our old way of representing a customer ID was like this:
Now, though, we are representing customers with additional metadata, like this:
However, we still want to be able to easily use a customer's ID within our queries. We need a way to dynamically figure out the customer's ID for an order, and this needs to work across our old and new schemas. This is a great candidate for a UDF. Let's deploy a Cosmos DB account and set up this UDF.
Setting Up a Cosmos DB Account and Collection
First, we'll deploy a Cosmos DB account and set up a database and collection using the Azure Portal. (Later in this series, we will discuss how this can be made more automatable.) Log into the Azure Portal and click
New, then choose
Cosmos DB. Click
We need to specify a globally unique name for our Cosmos DB account - I have used
johnorders, but you can use whatever you want. Make sure to select the
SQL option in the
API drop-down list. You can specify any subscription, resource group, and location that you want. Click
Create, and Cosmos DB will provision the account - this takes around 5-10 minutes.
Once the account is created, open it in the Portal. Click
Add Collection to add a new collection.
Let's name the collection
Orders, and it can go into a database also named
Orders. Provision it with a fixed (10GB) capacity, and 400 RU/s throughput (the minimum).
Note that this collection will cost money to run, so you'll want to remember to delete the collection when you're finished. You can leave an empty Cosmo DB account for no charge, though.
Preparing a Folder
TypeScript requires that we provide it with some instructions on how to compile our code. We'll also use Node Package Manager (NPM) to tie all of our steps together, and so need to prepare a few things before we can write our UDF's code.
Note that in parts 2, 3, and 4 of this series, we will write each server-side component as if it was its own independent application. This is to keep each of these posts easy to follow in a standalone way. However, in parts 5 and 6, we will combine these into a single folder structure. This will more accurately represent a real-world application, in which you are likely to have more than one server-side component.Create a new folder on your local machine and add a file named
package.json into it. This contains the NPM configuration we need. The contents of the file should be as follows:
project.json file does the following:
It defines the package dependencies we have when we develop our code. Currently we only have one -
It also defines a script that we can execute from within NPM. Currently we only have one -
At a command prompt, open the folder we've been working in. Run
npm install, which will find and install the TypeScript compiler. If you don't have NPM installed, install it by following the instructions here.
Next, create a file named
tsconfig.json. This is the TypeScript project configuration. This should contain the following:
tsconfig.json instructs TypeScript to do the following:
Find and compile all of the files with the
.tsextension inside the
srcfolder (we haven't created this yet!).
output/udf-getCustomerId.js. This filename is arbitrary and it could be any name we want, but I find it helpful to use the convention of
.js, especially as we add more code in later parts of this series.
Note that the
outFiledirective means that, even if we included multiple source TypeScript files, TypeScript will save the complete compiled script into a single output file. This is in keeping with the requirement that Cosmos DB imposes that a server-side component has to be specified in a single file.
Writing the UDF
Now we can write our actual UDF code! If you want to compare against my completed UDF, you can access it on GitHub.
Create a folder named
src, and within that, create a file named
getCustomerId.ts. (Once again, this file doesn't need to be named this way, but I find it helpful to use the UDF's name for the filename.) Here's the contents of this file:
Briefly, here's an explanation of what this file does:
It declares a function named
getCustomerId, which accepts a single parameter of type
OrderDocumentand returns a string. This is the function that represents our UDF.
The function inspects the
documentprovided, and depending on which version of the schema it follows, it pulls the customer ID out of the appropriate field.
If the customer ID isn't in either of the places it expects to find them, it throws an error. This will be further thrown up to the client by Cosmos DB.
Finally, it declares an interface named
OrderDocument. This represents the shape of the data we're expecting to store in our collection, and it has both of the ways of representing customer IDs.
Note that we are using an interface and not a class, because this data type has no meaning to Cosmos DB - it's only for use at development and build time.
Also note that we could put this into its own
orderDocument.tsfile if wanted to keep things separated out.
At the end of this, your folder should look something like this:
We have now written our first UDF! We're almost ready to run it - but before then, we need to compile it.
Compiling the UDF
At the command line run
npm run build. This will run the build script we defined inside the
package.json file, which in turn simply runs the
tsc (TypeScript compiler) command-line application.
tsc will find the
tsconfig.json file and knows what to do with it.
Once it's finished, you should see a new
output folder containing a file named
udf-getCustomerId.js. This is our fully compiled UDF! It should look like the following:
Deploying the UDF
Now we can deploy the UDF. Back in the Azure Portal, open
Script Explorer under the
Collections section, and then click
Create User Defined Function.
getCustomerId in the
Now we can copy and paste the contents of the
udf-getCustomerId.js file into the large script text box.
Save to install the UDF to Cosmos DB.
Testing the UDF
Finally, let's test the UDF! We'll need to add a couple of pieces of sample data. Click
Document Explorer under the
Collections section, and then click the
Create button. Paste in this sample document:
Save, and then close the blade and create a second document with the following contents:
This gives us enough to test with. Now click
Query Explorer under the
Collections section. Enter the following query:
SELECT c.id, udf.getCustomerId(c) AS customerId FROM c
This query does the following:
Refers to each document within the current collection (
getCustomerIdUDF, passing in the document contents. Note that to refer to a UDF, you must prefix the name of the UDF with
Projects out the document ID (
id) and the customerId as
You should see the following output:
That's exactly what we wanted to see - the UDF has pulled out the correct field for each document.
As a point of interest, notice the
Request Charge on the query results. Try running the query a few times, and you should see that it fluctuates a little - but is generally around 3.5 RUs.
Now let's try passing in an invalid input into our UDF. Run this query:
SELECT udf.getCustomerId('123') FROM c
Cosmos DB will give you back the error that our UDF threw because the input data (
123) didn't match either of the schemas it expected:
So we've now tested out the old customer ID format, the new customer ID format, and some invalid input, and the UDF behaves as we expect.
UDFs provide us with a way to encapsulate simple computational logic, and to expose this within queries. Although we can't refer to other documents or external data sources, UDFs are a good way to expose certain types of custom business logic. In this post, we've created a simple UDF for Cosmos DB, and tested it using a couple of simple documents. In the next part of this series we'll move on to stored procedures, which allow for considerably more complexity in our server-side code.
UDFs are intended for simple computation.
They can be used within queries, including in the
UDFs cannot access anything within the Cosmos DB collection, nor can they access any external resources.
They can accept one or more parameters, which must be provided when calling the UDF.
UDFs can be invoked from within a query by using the
SELECT udf.getCustomerId(c) FROM c.