Erlang bindings for Windows Azure
Background
I'm a procrastinator. Where others see an obstacle to surmount, I see an opportunity to put something away for later. Which is exactly what I did with my Erlang wrapper around Windows Azure storage a few days ago. When I discovered that Erlang didn't support SHA-256 out of the box, which is necessary for the HMAC-SHA256 used to authenticate storage requests, I threw up my hands in a mediocre blog post and filed away my code for a later date: either when Erlang added SHA-256 support to the crypto module (likely) or when I got around to implementing SHA-256 myself (not likely).
However, my procrastination this time was short-lived. Steve Vinoski showed up in the comments and, to my astonishment, whipped together a SHA-256 implementation. Having run out of excuses, I spent some time and finished up my client. Along the way, I wrote an HMAC-SHA256 module in Erlang, which was an interesting exercise as well. You can find the GitHub project here. It's in its infancy and can only deal with blobs at the moment, but I am taking feature requests, so holler if you need something.
Sample
I'm still unsure of the idiomatic way to do most things in Erlang. I built the client on the gen_server OTP behaviour, though in retrospect a simple module would have sufficed. Here's some sample test code which accesses storage (the account and key are those of the local development storage running on my machine).
-module(winazuretest).
-export([test/0]).

test() ->
    winazure:start({"devstoreaccount1",
                    "Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw==",
                    true}),
    winazure:create_container("test", true),
    winazure:put_blob("test", "blob1", "Hello!", "text/plain"),
    "Hello!" = winazure:get_blob("test", "blob1"),
    winazure:delete_blob("test", "blob1"),
    winazure:delete_container("test").
Observations
- I'm still learning the 'Erlang way' to do things. This is evident in my HMAC implementation and in my storage client. In some places, I have some ugly unspeakable hacks because I couldn't figure out 'the right way' to do it. I would love some feedback on how to 'Erlang-ize' my code.
- When implementing HMAC based on Wikipedia pseudocode, don't assume the 'block size' is 256 just because your hash algorithm is 'SHA-256'. (SHA-256 produces a 256-bit digest but processes its input in 512-bit, i.e. 64-byte, blocks.) An assumption like that could just lead to you staring at random strings of numbers for long periods of time on New Year's Eve.
- There is some issue with either Erlang's HTTP 1.1 support or Windows Azure's HTTP 1.1 support or the combination of the two. The client keeps dying after a few calls. I couldn't figure it out so I switched to using HTTP 1.0.
- Erlang's bit syntax seems to be the equivalent of cryptic Perl one-liners. Something that a seasoned programmer can crank out with elan but a newbie scratches their head at. My 'bit-syntax-fu' is as weak as my 'regex-fu' and my nightmares now have double angle brackets in them.
- Distel and Erlang's Emacs mode are the best way to code Erlang right now. I tried using TextMate's Erlang bundle for a few days, but hitting 'C-c C-k' to compile the current buffer and being thrown into an Erlang buffer is too good to give up.
- I now need to find some meaty project to do in Erlang which would need me to make use of Erlang's concurrency support.
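The block-size pitfall from the observations above is easier to see with the HMAC construction written out by hand. The following is a minimal sketch in Python rather than Erlang; it is not the module from the GitHub project, and the function name hmac_sha256 is only illustrative. It pads the key to the hash's block size (64 bytes for SHA-256) and cross-checks the result against Python's standard hmac module.

```python
import hashlib
import hmac


def hmac_sha256(key: bytes, message: bytes) -> bytes:
    """HMAC-SHA256 written out by hand, following the textbook construction."""
    # The block size is 64 bytes (512 bits), not 256. The digest is 32 bytes.
    block_size = hashlib.sha256().block_size

    # Keys longer than one block are hashed first; shorter keys are zero-padded.
    if len(key) > block_size:
        key = hashlib.sha256(key).digest()
    key = key.ljust(block_size, b"\x00")

    # XOR the padded key with the inner and outer pad constants.
    inner_pad = bytes(b ^ 0x36 for b in key)
    outer_pad = bytes(b ^ 0x5C for b in key)

    inner_digest = hashlib.sha256(inner_pad + message).digest()
    return hashlib.sha256(outer_pad + inner_digest).digest()


# Cross-check against the standard library implementation.
sample_key = b"key"
sample_message = b"The quick brown fox jumps over the lazy dog"
expected = hmac.new(sample_key, sample_message, hashlib.sha256).digest()
assert hmac_sha256(sample_key, sample_message) == expected
```

Padding the key to 256 bytes instead of 64 would still produce a 32-byte result, just the wrong one, which is why the mistake is hard to spot by eye.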
Thoughts on Erlang
- First off, I'm blocked from completing my library because Erlang doesn't seem to have SHA-256 support built in. Or rather, it seems to be commented out, and though there are a couple of questions on the mailing list, there is no clear indication of why this is the case. Since Windows Azure uses HMAC-SHA256 for its authentication, I'm hosed. Of course, I could try and implement SHA-256 support myself, but that really isn't my cup of tea :-)
- Every language has a culture surrounding it, a purpose it shines at. For Perl, this is text processing. For VB, it was business applications. Erlang shines at concurrency and at writing fault-tolerant, distributed applications. In fact, I wish I had discovered Erlang when I was trying to write high-performance networking code a while back: it makes things very, very elegant.
- Erlang's abstractions for concurrent and distributed programming are to be marvelled at. I was blown away by how easy it is to launch a process on a remote machine and play with it. There's a section in the book where Armstrong talks about how he launches Erlang nodes on machines without having made up his mind on how he's going to use them. That's very, very remarkable and I would love something similar in the managed family of languages.
- The language seems awkward and dated. I still can't intuitively remember where to place a semicolon, a comma or a period. The language doesn't 'feel' modern.
- The documentation is sorely lacking. I've been spoilt by my time in Python, Ruby and C#, where searching on the web will almost always get you the right result, whereas in Erlang you end up at a one-line description of a function.
I'll update this post as I play with Erlang more. Erlang experts out there, if you can help me with my SHA 256 problem, you'll make a fan for life :).
How Red Dog works
Labels: windowsazure
Compressed GZip content from Windows Azure blob storage
On one of our internal mailing lists, someone asked how to serve gzipped content from Windows Azure storage directly. When serving static content (like JS/CSS files) over HTTP, serving it compressed usually gives you big wins in network performance since you send fewer bytes over the wire.
Windows Azure currently doesn't compress uncompressed data on the fly for you. However, there's nothing stopping you from storing the data compressed in the first place. The key is to set the Content-Encoding header to 'gzip' when uploading the blob so that when blob storage serves it back out, clients know that the content they're getting is compressed and know how to deal with it.
One caveat with this approach is that there is no way to support clients which can't support gzip decompression. However, all modern browsers support this really well so you shouldn't run into any issues for plain vanilla HTTP content.
Here's a snippet of code illustrating how to do this. In it, I take a file posted through an ASP.NET file upload control and store it compressed in blob storage.
/* Blob to be uploaded, in this case coming from a file upload control */
var inputstream = fileUpload.PostedFile.InputStream;
var contentType = fileUpload.PostedFile.ContentType;

/* We'll use the filename without the path as the blob key */
var blobKey = Path.GetFileName(fileUpload.PostedFile.FileName);

var container = BlobStorage.Create(
        StorageAccountInfo.GetDefaultBlobStorageAccountFromConfiguration()
    ).GetBlobContainer("js");

/* Compress the blob we're storing */
var compressedData = new MemoryStream();
var gzipStream = new GZipStream(compressedData, CompressionMode.Compress);
gzipStream.Write(fileUpload.FileBytes, 0, fileUpload.FileBytes.Length);
gzipStream.Close();

/* Store the compressed content to blob store
   and set the right Content-Encoding header */
container.CreateBlob(
    new BlobProperties(blobKey)
    {
        /* This is the part that makes it all work! */
        ContentEncoding = "gzip",
        ContentType = contentType
    },
    new BlobContents(compressedData.ToArray()),
    false /* Don't overwrite */
);
I uploaded a 20K sample JS file and then hit the blob's URL with Firefox. The screenshot shows how the 20K JS file got compressed down to 7K and was then sent down with the right header for the browser to decompress it on the fly.

This technique typically works better on large text content like CSS and JS files. I wouldn't recommend using this on images as they are typically highly compressed already. For more on such performance techniques, I would strongly recommend Steve Souders' excellent book.
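The difference between text and already-compressed content can be checked locally before uploading anything. The following is a minimal sketch in Python using only the standard library; the helper name gzip_for_upload and the sample payloads are illustrative assumptions, and the headers are plain HTTP header names rather than any storage SDK's API. Random bytes stand in for an already-compressed image.

```python
import gzip
import os


def gzip_for_upload(data: bytes, content_type: str):
    """Return (compressed_body, headers) for a blob stored pre-compressed."""
    body = gzip.compress(data)
    headers = {
        "Content-Type": content_type,
        "Content-Encoding": "gzip",  # tells clients to decompress on receipt
    }
    return body, headers


# Repetitive text, standing in for a JS or CSS file.
script = b"function add(a, b) { return a + b; }\n" * 500
# Random bytes, standing in for already-compressed data such as a JPEG.
image_like = os.urandom(len(script))

compressed_script, script_headers = gzip_for_upload(script, "application/javascript")
compressed_image, _ = gzip_for_upload(image_like, "image/jpeg")

# Print original and compressed sizes for both payloads.
print(len(script), "->", len(compressed_script))
print(len(image_like), "->", len(compressed_image))
```

The text payload shrinks dramatically, while the random payload does not shrink at all, which is the reason to skip this technique for images.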
Labels: windowsazure
Python wrapper for Windows Azure storage
The best way to really learn a system is to write code against it. I spent some time over a weekend and started writing a Python wrapper on top of our storage APIs. I've gotten it to the point where I can authenticate against a storage endpoint, be it development storage on your local machine or the *.core.windows.net endpoints in the sky. I spent some time today implementing the basic blob primitives (list containers, get/put blob), but there is a long way to go before it is usable.
Though it is raw, I'm making the code public since I thought it would be instructive for folks trying to figure out how authentication works or trying to implement wrappers in other languages.
I'm hosting the code at a GitHub repository here and you can follow commits as I make them. I'm hoping to get all the blob primitives done by the end of the week, and queues if possible. I need to spend some time thinking about the right way to model table storage in Python. Here is some sample code that lists containers, creates one, and then puts and gets a blob.
conn = WAStorageConnection(DEVSTORE_HOST, DEVSTORE_ACCOUNT, DEVSTORE_SECRET_KEY)
for (container_name, etag, last_modified) in conn.list_containers():
    print container_name
    print etag
    print last_modified
conn.create_container("testcontainer", False)
conn.put_blob("testcontainer", "test", "Hello World!")
print conn.get_blob("testcontainer", "test")
Here's the magic signing code. It is based on my reading of the docs, so it shouldn't be considered 'official' in any way (consult the SDK sample for something of production quality).
def _get_auth_header(self, http_method, path, data, headers):
    # As documented at http://msdn.microsoft.com/en-us/library/dd179428.aspx
    string_to_sign = ""

    # First element is the method
    string_to_sign += http_method + NEW_LINE

    # Second is the optional content MD5
    string_to_sign += NEW_LINE

    # Content type - this should have been initialized at least to a blank value
    if headers.has_key("content-type"):
        string_to_sign += headers["content-type"]
    string_to_sign += NEW_LINE

    # Date - we don't need to add the header here since the special date storage header
    # always exists in our implementation
    string_to_sign += NEW_LINE

    # Construct canonicalized storage headers.
    # TODO: Note that this doesn't implement parts of the spec - combining header fields with same name,
    # unfolding long lines and trimming white spaces around the colon
    ms_headers = [header_key for header_key in headers.keys() if header_key.startswith(PREFIX_STORAGE_HEADER)]
    ms_headers.sort()
    for header_key in ms_headers:
        string_to_sign += "%s:%s%s" % (header_key, headers[header_key], NEW_LINE)

    # Add canonicalized resource
    string_to_sign += "/" + self.account_name + path

    utf8_string_to_sign = unicode(string_to_sign).encode("utf-8")
    hmac_digest = hmac.new(self.secret_key, utf8_string_to_sign, hashlib.sha256).digest()
    return base64.encodestring(hmac_digest).strip()
NOTE: This is code written by a program manager with too much free time :-). Use at your own risk and don't blame me if your computer blows up or if demons fly out of your nose.
Thanks to Igor Dvorkin for helping me out with this.
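For readers on Python 3, the same signing steps can be sketched as standalone functions. This is an illustrative port, not the code in the repository. The values given to NEW_LINE and PREFIX_STORAGE_HEADER are assumptions, since the original only shows their names. The key, date and path in the usage lines are made up, and the secret key is assumed to be passed in as raw bytes (the storage key is issued as a base64 string, so it would be decoded first). Like the original, it skips the parts of the spec that deal with duplicate headers, folded lines and whitespace trimming.

```python
import base64
import hashlib
import hmac

NEW_LINE = "\n"                  # assumed value of the original constant
PREFIX_STORAGE_HEADER = "x-ms-"  # assumed value of the original constant


def build_string_to_sign(account_name, http_method, path, headers):
    """Assemble the string to sign in the same order as _get_auth_header."""
    parts = [
        http_method,                      # HTTP verb
        "",                               # optional Content-MD5, left blank
        headers.get("content-type", ""),  # content type, blank if absent
        "",                               # Date, blank because x-ms-date is sent
    ]
    # Canonicalized storage headers: sorted "name:value" lines.
    for name in sorted(h for h in headers if h.startswith(PREFIX_STORAGE_HEADER)):
        parts.append("%s:%s" % (name, headers[name]))
    # Canonicalized resource: "/<account><path>".
    parts.append("/" + account_name + path)
    return NEW_LINE.join(parts)


def sign(secret_key: bytes, string_to_sign: str) -> str:
    """Return the base64-encoded HMAC-SHA256 of the UTF-8 string to sign."""
    digest = hmac.new(secret_key, string_to_sign.encode("utf-8"), hashlib.sha256).digest()
    return base64.b64encode(digest).decode("ascii")


# Hypothetical usage with a made-up key, date and path.
demo_headers = {
    "content-type": "text/plain",
    "x-ms-date": "Sat, 08 Nov 2008 00:00:00 GMT",
}
to_sign = build_string_to_sign("devstoreaccount1", "PUT", "/testcontainer/test", demo_headers)
signature = sign(b"not-a-real-key", to_sign)
print(repr(to_sign))
print(signature)
```

Keeping the string construction separate from the HMAC step makes it possible to inspect the exact bytes being signed, which is usually where authentication failures come from.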
Labels: windowsazure
Windows Azure at Silicon Valley Code Camp
I'll be doing a talk on Windows Azure at Silicon Valley Code Camp tomorrow at 3:45pm. I haven't made up my mind on what exactly I'm going to say or what demos I'm going to show, but I can promise you that it'll be fun. If you're in the area, ping me if you want to meet up.
What: Windows Azure: Everything you wanted to know about Microsoft's operating system for the cloud
When: 3:45pm on 11/8/2008
Where: Foothill College, 12345 El Monte Road (Parking Lot 5), Los Altos Hills, CA 94022
How do I register: http://www.siliconvalley-codecamp.com/Register.aspx
Labels: windowsazure
Windows Azure - Links and Resources

I've collected a set of Windows Azure links and resources so that I have one central place to point people to. I'll update this page as I find new links or as these links change.
Core Resources
www.azure.com: Central site for everything Azure related. From there you'll find the links to sign up to get on the Windows Azure waiting list.
Windows Azure SDK : Lets you build, debug and package services locally. In particular, check out the development fabric and development storage which provide a replica of what you'll see in the cloud
Windows Azure Tools for Microsoft Visual Studio: Don't you love short names? :-) This extends Visual Studio and integrates with the SDK so that you can create, build, debug and publish from within the comfy and familiar environment that is devenv.exe
Windows Azure on MSDN: We have a great page with links to the documentation, blogs, screencasts, etc. Highly recommended
Windows Azure Forum: Several members from my team hang out here and answer questions. If you have a question or comment, do post it here.
PDC talks
Lap around Windows Azure: If you have the time to watch only one talk (and for some reason, you decide that you don't want to watch mine), watch this talk. Manuvir's talk not only filled up the big 2K people room, it filled up three overflow rooms as well
Developing and deploying your first Windows Azure service: Probably the PDC talk with the most code written. Watch Steve go from hello world to showing off AtomPub on top of Windows Azure, all using Asp.Net
Windows Azure: Programming in the cloud: Daniel Wang and Stefan Schakow walk through the programming model and APIs, using both Microsoft and non-Microsoft technologies, and write a lot of code along the way.
Architecting Services for Windows Azure: Yousef Khalidi, who runs the fabric team, talks about the principles and design of the fabric and the service model. If you want to 'grok' the fabric, watch this talk and Erick's talk below
Under the Hood: Inside the Windows Azure Hosting Environment: Erick Smith starts off by going under the hood of the fabric and talks about how we deploy and manage the services. ChuckL (yes, the same ChuckL from the original NT team) takes over and talks about how our virtualization works. Lots of low-level details.
Windows Azure: Essential Cloud Storage Services: *The* talk to watch if you want to understand Windows Azure storage. Brad Calder (or 'The Brad' as some of us call him) walks through blobs, tables and queues and dives deep into each of them. Brad is the chief architect of our storage systems so he knows everything that is worth knowing about our storage systems :-)
Windows Azure Tables: Programming Cloud Table Storage: Niranjan and Pablo dig into how to model data on top of our tables and show off the Ado.Net data services (Astoria) programming model.
Windows Azure: Cloud service development best practices: The best talk of the lot ;-). I dig into best practices for building cloud services and how they map onto Windows Azure. And I have pictures of goofy monsters and peanut butters among other things
Showcase: Windows Azure enables Live Meeting: How the next generation Live Meeting app was built on top of Windows Azure
Blogs
Windows Azure Blog: The 'official' blog. It's a bit light on content right now but we'll have more soon
Steve Marx: My manager and official team fashion model. This blog actually runs on top of Windows Azure and if you watch his PDC talk, you'll see him create it on the fly
Jim Nakashima: Jim works on the VS extensions and has some great posts. In fact, I'm annoyed that he beat me to some of the posts I intended to write!
Cloud Computing Tools: This is Jim's team's blog. I'm jealous of their prime blogs.msdn.com URL.
Sriram Krishnan: Yours truly
Channel 9 videos
Manuvir Das: Introducing Windows Azure: The whiteboard intro to our stuff
Steve Marx: Windows Azure for developers: Steve talks about what Microsoft's operating system for the cloud means for developers. And I love the look on his face in that thumbnail (caption contest, anyone?)
Labels: windowsazure