Azure DocumentDB: first use cases

A few weeks ago Microsoft released (in preview mode) its new NoSQL Database: DocumentDB.

Not Only SQL (NoSQL) databases are typically segmented in the following categories: Key-Value (e.g. Azure Table Storage, Redis), Column (e.g. HBase, Cassandra), Document (e.g. CouchDB, MongoDB) & Graph. By its name but mostly by its feature set, DocumentDB falls in the document category.

My first reaction was Wow, a bit late at the party!

Indeed, the technology space of NoSQL has slowly started to consolidate so it would seem a bit late to get a new product on this crowded market place, unless you have value-added features.

And DocumentDB does. Its main marketing points are:

SQL syntax for queries (easy ramp-up)
4 consistency policies, giving you flexibility and choice

But then you read a little bit more and you realise that DocumentDB is the technology powering OneNote in production with zillions of users. So it has been in production for quite a while and should be rather stable. I wouldn’t be surprised to learn that it is behind the new Azure Search as well (released in preview mode the same day).

Now what to do with that new technology?

I don’t see it replacing SQL Server as the backbone of major project anytime soon.

I do see it replacing its other Azure NoSQL brother-in-law… Yes, looking at you Azure Table-Storage with your dead-end feature set.

Table Storage had a nice early start and stalled right after. Despite the community asking for secondary indexes, they never came, making Table Storage the most scalable write-only storage solution on-the-block.

In comparison DocumentDB has secondary indexes and the beauty is that you do not even need to think about them, they are dynamically created to optimize the queries you throw at the engine!

On top of indexes, DocumentDB, supporting SQL syntax, supports batch-operation. Something as simple as saying ‘delete all the verbose logs older than 2 weeks’ requires a small program in Table Storage and that program will run forever if you have loads of records since it will load each record before deleting it. In comparison, DocumentDB will accept a delete-criteria SQL (one line of code) command and should perform way faster.

Actually Logging is the first application I’m going to use DocumentDB for.

Having logs in Table Storage is a royal pain when time comes to consume the log. Azure Storage Explorer? Oh, that’s fantastic if you have 200 records. Otherwise you either load them in Excel or SQL, in both cases defying the purpose of having a scalable back-end.

Yes, I can see DocumentDB as a nice intermediary between Azure SQL Federation (where scalability isn’t sufficiently transparent) and Table Storage (for reasons I just enumerated). In time I can see it replacing Table Storage, although that will depend on the pricing.

I’ll try to do a logging POC. Stay tune for news on that.

Vincent-Philippe Lauzon's

Azure, Data & Machine Learning