Wednesday, July 31, 2013

Using Windows Azure Caching efficiently across multiple Cloud Service roles

Windows Azure Caching is a great way to improve the performance of your Azure application at no additional cost. The cache runs alongside your application in Cloud Service roles. The only thing you need to decide is how much of a role's memory you want to use for it (a co-located cache role). You can also dedicate all of a role's memory to caching if you want (a dedicated cache role). Roles that host caching are called cache clusters.

Starting with Azure Caching is so easy that it can be a while before you fully understand the best way to use it. On a recent project my first thought was to enable caching on all the Cloud Service roles as a co-located service. This caused us problems.

First of all, being a developer, I debug my application using the local Azure compute emulator. The emulator runs one cache service for each instance of a role that hosts a cache cluster. The application has 2 web roles and 1 worker role, so when I start a debugging session with multiple instances per role I need a lot of memory to run everything. More importantly, cache clusters do not share cached data with each other. This caused us to have stale data in the application.

That is when I figured out that I needed to read a bit more on Azure Caching if I was to use it efficiently.

Understanding cache clusters


When you enable caching on an Azure role, each instance of that role will run a cache service using a portion of its memory (or all of it if it's a dedicated cache role). The cache services running on the instances of a single role are managed as a single cache cluster. Cache services can talk to each other and synchronize data, but only inside the same cache cluster (same role). That is why enabling caching on many roles might not be the best thing to do.


Another thing to mention is that cache clusters can only be created on Small role instances or larger. The reason is that an Extra Small instance only gives you 768 MB of RAM, which is pretty much all used up by anything you run on those instances.

Now, to enable a cache cluster on your role, go to the role's property page and open the Caching tab.


Here you will notice that I also enabled notifications, which will allow us to use local caches efficiently later on.
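For reference, the choices you make on the Caching tab end up as settings in ServiceConfiguration.cscfg. Below is a rough sketch of what they look like for a co-located cache; the role name and values are only examples, and the notifications option itself is stored as JSON in a separate Microsoft.WindowsAzure.Plugins.Caching.NamedCaches setting that I left out here.

<Role name="WebRole1">
  <ConfigurationSettings>
    <!-- Percentage of each instance's memory reserved for the co-located cache. -->
    <Setting name="Microsoft.WindowsAzure.Plugins.Caching.CacheSizePercentage" value="30" />
    <!-- Storage account the cache cluster uses to maintain its runtime state. -->
    <Setting name="Microsoft.WindowsAzure.Plugins.Caching.ConfigStoreConnectionString" value="UseDevelopmentStorage=true" />
    <!-- Verbosity of the caching diagnostics. -->
    <Setting name="Microsoft.WindowsAzure.Plugins.Caching.DiagnosticLevel" value="1" />
  </ConfigurationSettings>
</Role>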

For more information on the different configuration options for cache clusters go here.

Configuring roles to use cache clients


Now that we took care of the server side of the caching configuration, let's talk about the client side. Every instance of every role inside the same cloud deployment can connect to a cache cluster. If you run only one cluster, then you are guaranteed to access the same cached data from whatever role you are in inside your application (as long as you have a valid configuration).
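As a minimal sketch, accessing the cache from any role looks like this, assuming the Microsoft.ApplicationServer.Caching client assemblies are referenced and a cache client named "default" is configured (the class and key names are just examples):

using System;
using Microsoft.ApplicationServer.Caching;

public static class AppCache
{
    // DataCacheFactory is expensive to create, so keep a single instance around.
    private static readonly DataCacheFactory Factory = new DataCacheFactory();
    private static readonly DataCache Cache = Factory.GetDefaultCache();

    public static void Store(string key, object value)
    {
        // Keep the item in the cache cluster for 10 minutes.
        Cache.Put(key, value, TimeSpan.FromMinutes(10));
    }

    public static object Load(string key)
    {
        // Returns null when the item is not (or no longer) in the cache.
        return Cache.Get(key);
    }
}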


One nice feature we can enable in each role's configuration is the local cache client. With this, a role instance keeps a local in-memory copy of the data it recently fetched from the cache cluster, for even faster access. Remember the Notification option we enabled on the server side? Using the configuration below in the Web.config or App.config of your role will ensure data stored in the local cache client gets updated whenever the cache server version of that data changes. Basically, the local cache client will invalidate data based on notifications received from the cache cluster.
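The configuration looks something like this; the cache worker role identifier, object count and TTL are only examples, and the exact attributes may vary slightly with the SDK version you use.

<!-- Under <configuration>: register the section, then define the cache client. -->
<configSections>
  <section name="dataCacheClients"
           type="Microsoft.ApplicationServer.Caching.DataCacheClientsSection, Microsoft.ApplicationServer.Caching.Core"
           allowLocation="true" allowDefinition="Everywhere" />
</configSections>

<dataCacheClients>
  <dataCacheClient name="default">
    <!-- Find the cache cluster hosted by the role named here (example name). -->
    <autoDiscover isEnabled="true" identifier="CacheWorkerRole" />
    <!-- Keep recently fetched items in the instance's own memory and invalidate them
         from cache cluster notifications instead of a fixed timeout. -->
    <localCache isEnabled="true" sync="NotificationBased" objectCount="10000" ttlValue="300" />
    <!-- How often (in seconds) the client polls the cluster for notifications. -->
    <clientNotification pollInterval="60" />
  </dataCacheClient>
</dataCacheClients>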


For more information on client side configuration go here.

Other concerns


This post is only an overview of the considerations of running multiple cache clusters versus a single one. Azure Caching has a lot more configuration options you should take a look at here. Also really important is how to use Azure Caching in your application.

Conclusions


I've spent a lot of time figuring out how all of this works. I hope this post will help you with your own learning experience.

Other useful links




Monday, July 22, 2013

Handling Azure Storage Queue poison messages

This post will talk about what to do now that we are handling poison messages in our Azure Storage Queues.

First, let's review what we've done so far.


Messages that continuously fail to process will end up in the error queue. Someone asked me why we need error queues at all. We could simply log the errors and delete the message, right? Well, if you have a really efficient and proactive DevOps team, I suppose logging errors along with the original messages ought to be enough.
Someone will review why the message failed, and if it was only a transient error they can send the original message to the queue again.
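Re-sending is straightforward with the storage client library. Here is a minimal sketch, assuming queue names like "orders" and "orders-error" (those names, and the idea of replaying one message at a time, are just examples):

using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Queue;

public static class ErrorQueueReplay
{
    public static void ResendOne(string connectionString)
    {
        CloudQueueClient client = CloudStorageAccount.Parse(connectionString).CreateCloudQueueClient();
        CloudQueue errorQueue = client.GetQueueReference("orders-error");
        CloudQueue queue = client.GetQueueReference("orders");

        CloudQueueMessage poison = errorQueue.GetMessage();
        if (poison != null)
        {
            // Put the original content back on the normal queue...
            queue.AddMessage(new CloudQueueMessage(poison.AsString));
            // ...and remove it from the error queue.
            errorQueue.DeleteMessage(poison);
        }
    }
}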

We could also store failed messages in an Azure Storage Table.
Then we could simply monitor new entries in this table and act on them. Again, the original message should be stored in the table so we can send it again if we choose.
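A table entity for this could look like the sketch below; the entity and property names are hypothetical, the important part is keeping the original content around.

using System;
using Microsoft.WindowsAzure.Storage.Table;

public class FailedMessageEntity : TableEntity
{
    public FailedMessageEntity() { }

    public FailedMessageEntity(string queueName, string messageId, string content, string error)
    {
        PartitionKey = queueName;   // group failures by the queue they came from
        RowKey = messageId;         // the original queue message id
        Content = content;          // original message body, kept so it can be re-sent
        Error = error;              // why processing failed
        FailedAtUtc = DateTime.UtcNow;
    }

    public string Content { get; set; }
    public string Error { get; set; }
    public DateTime FailedAtUtc { get; set; }
}

Inserting an entity is then a single TableOperation.Insert executed against a table your team monitors.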

I think the best reason to use a queue for error messages is if you want to have an administrative tool to monitor, review and re-send messages. In this case the queue mechanics let you handle those messages like any other queue-based process.

For me, one of the unpleasant side effects of using error queues is that all the queues in my system are now multiplied by two (one normal and one error queue). It's not too bad if your naming scheme is consistent, but even then, if you do operational work using a tool like Cerebrata Azure Management Studio or even from Visual Studio's Server Explorer, you will feel overwhelmed by the number of queues.

Managing queue messages with Cerebrata Azure Management Studio
Managing queue messages with Visual Studio Server Explorer

Whatever you do, I suggest you always at least log failures properly. Later, while debugging the issue, you will be thankful that you can easily match the failure logs with the message that caused it.
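For example, a small wrapper along these lines logs the message id, dequeue count and content next to the exception. Trace-based logging is just one option; use whatever logger you already have.

using System;
using System.Diagnostics;
using Microsoft.WindowsAzure.Storage.Queue;

public static class QueueProcessing
{
    public static void HandleWithLogging(CloudQueueMessage message, Action<CloudQueueMessage> process)
    {
        try
        {
            process(message);
        }
        catch (Exception ex)
        {
            // Log the message id and content together with the exception so the
            // failure log can be matched back to the message that caused it.
            Trace.TraceError(
                "Message {0} (dequeued {1} times) failed: {2}. Content: {3}",
                message.Id, message.DequeueCount, ex, message.AsString);
            throw;
        }
    }
}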