Azure DocumentDB: first use cases

A few weeks ago Microsoft released (in preview mode) its new NoSQL Database: DocumentDB.

Not Only SQL (NoSQL) databases are typically segmented into the following categories: Key-Value (e.g. Azure Table Storage, Redis), Column (e.g. HBase, Cassandra), Document (e.g. CouchDB, MongoDB) & Graph. By its name but mostly by its feature set, DocumentDB falls into the document category.

My first reaction was: wow, a bit late to the party!

Indeed, the NoSQL technology space has slowly started to consolidate, so it would seem a bit late to bring a new product to this crowded marketplace unless you have value-added features.

And DocumentDB does. Its main marketing points are:

  • SQL syntax for queries (easy ramp-up)
  • 4 consistency policies, giving you flexibility and choice

But then you read a little bit more and you realise that DocumentDB is the technology powering OneNote in production with zillions of users. So it has been in production for quite a while and should be rather stable. I wouldn’t be surprised to learn that it is behind the new Azure Search as well (released in preview mode the same day).

Now what to do with that new technology?

I don’t see it replacing SQL Server as the backbone of major projects anytime soon.

I do see it replacing its other Azure NoSQL brother-in-law… Yes, I’m looking at you, Azure Table Storage, with your dead-end feature set.

Table Storage had a nice early start and stalled right after. Despite the community asking for secondary indexes, they never came, making Table Storage the most scalable write-only storage solution on the block.

In comparison, DocumentDB has secondary indexes, and the beauty is that you do not even need to think about them: they are created dynamically to optimize the queries you throw at the engine!

On top of indexes, DocumentDB, with its SQL syntax, supports batch-style operations. Something as simple as ‘delete all the verbose logs older than two weeks’ requires a small program in Table Storage, and that program will run forever if you have loads of records, since it has to load each record before deleting it. In comparison, DocumentDB lets you express the criteria in a single line of SQL and should perform far faster.
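As a rough sketch of that log-cleanup scenario, here is what it could look like with the DocumentDB .NET SDK (preview). The collection link, the document shape (level & timestamp properties) and the date format are assumptions for illustration only.

```csharp
// Minimal sketch, assuming log documents carrying 'level' and 'timestamp'
// (ISO 8601) properties. All names here are illustrative.
using System;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.Azure.Documents;
using Microsoft.Azure.Documents.Client;

class LogCleanup
{
    public static async Task PurgeOldVerboseLogsAsync(DocumentClient client, string collectionLink)
    {
        var cutoff = DateTime.UtcNow.AddDays(-14).ToString("o");

        // The selection criteria fits in a single SQL query...
        var sql = string.Format(
            "SELECT * FROM logs l WHERE l.level = 'Verbose' AND l.timestamp < '{0}'",
            cutoff);
        var oldLogs = client.CreateDocumentQuery<Document>(collectionLink, sql).AsEnumerable();

        // ...while the delete itself is issued per matching document (or wrapped
        // in a server-side stored procedure to keep the round-trips service-side).
        foreach (var doc in oldLogs)
        {
            await client.DeleteDocumentAsync(doc.SelfLink);
        }
    }
}
```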

Actually, logging is the first application I’m going to use DocumentDB for.

Having logs in Table Storage is a royal pain when the time comes to consume them. Azure Storage Explorer? Oh, that’s fantastic if you have 200 records. Otherwise you either load them in Excel or SQL, in both cases defeating the purpose of having a scalable back-end.

 

Yes, I can see DocumentDB as a nice intermediary between Azure SQL Federation (where scalability isn’t sufficiently transparent) and Table Storage (for reasons I just enumerated). In time I can see it replacing Table Storage, although that will depend on the pricing.

I’ll try to do a logging POC. Stay tuned for news on that.

How to improve Azure: Granularity of access

In this blog series I explore some of the shortcomings of the Microsoft Azure platform (as of this date, April 2014) and discuss ways it could be improved. This isn’t a rant against the platform: I’ve been using and promoting the platform for more than four (4) years now and I’m very passionate about it. Here I am pointing at problems and suggesting solutions. Feel free to jump in the discussion in the comments section!

The past blog entries are:

  • How to improve Azure
  • How to improve Azure: Can you keep a secret?
  • How to improve Azure: Security Models

In my last post, I discussed the variety of security models in the Azure platform, i.e. the different ways to authenticate or to get access to a resource.  This time, I would like to discuss the granularity of access.

In any system, once you are authenticated, you are authorized to perform certain actions.  Access comes in different flavours:  read/write, create/delete, etc.

A recurring theme in authorization schemes is the concept of a hierarchy of rights, used to get around the complexity that comes with fine-grained access.  For example, in both the Windows file system & SharePoint, each file can be denied to a group or to specific users, but typically access is managed at a higher level, e.g. folders, library, site, site collection, etc.

If we now look at different Azure services, we’ll find the same diversity we found in security models.

SQL Azure has the same authorization scheme as SQL Server, i.e. databases, schemas & objects (e.g. tables, views).  Service Bus has three possible actions (manage, read & write), and those can be granted hierarchically to an entire namespace or to some sub-domain.  Active Directory has two permissions for the Graph API, read & read-write, applied to the entire directory.

Now, the weakness I’ve experienced is the lack of granularity in some services, mainly Azure Storage & Active Directory.

Active Directory Graph API grants read or read-write access to the entire directory.  I cannot give an application the ability to manage only some groups:  it’s a master switch; once I grant access, it’s total.

In the case of Azure Storage, there are two types of access:  via SAS or via access keys.  With an access key, you’re the king of the entire storage account:  you can create / delete containers and read / write wherever you want.  SAS come in two flavours.  Ad hoc SAS have quite granular access:  they can be given on a file with read or read-write access, or on a container with read, read-write or list access (to obtain the list of files).  SAS policies, on the other hand, apply to an entire container, so once you give one to a system, it can do whatever you authorized it to do (e.g. write) on the entire container.  So ad hoc SAS sound good, don’t they?  Except they are short-lived and must be created…  using the access keys!
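To make that last point concrete, here is a minimal sketch (Azure Storage .NET client) of issuing an ad hoc, read-only SAS on a single blob; the container and blob names are made up. Notice that the account access key, carried by the connection string, is what signs the SAS.

```csharp
using System;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Blob;

class AdHocSasSample
{
    public static string CreateReadOnlyBlobSas(string connectionString)
    {
        // The connection string embeds the account access key:  without the key,
        // no ad hoc SAS can be signed.
        var account = CloudStorageAccount.Parse(connectionString);
        var blob = account.CreateCloudBlobClient()
            .GetContainerReference("logs")              // illustrative container
            .GetBlockBlobReference("2014-08/app.log");  // illustrative blob

        // Short-lived, read-only, and scoped to that single blob.
        var sas = blob.GetSharedAccessSignature(new SharedAccessBlobPolicy
        {
            Permissions = SharedAccessBlobPermissions.Read,
            SharedAccessExpiryTime = DateTimeOffset.UtcNow.AddHours(1)
        });

        return blob.Uri + sas;
    }
}
```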

 

The main problem with not having fine-grained enough access controls is that you end up giving more access than you want to.  I give you access to write to this entire blob container, but please write only in this sub-folder.  If files start to disappear, though, you’ll be on the list of suspects.

There is an overarching principle in information technology, the principle of least privilege:  give an agent access ONLY to what it needs to do its business.  In order to do that, you need the underlying platform to support it.

Microsoft Patterns & Practices has a corresponding pattern, the valet key pattern.  This is a quite specialized view of the principle of least privilege, limited to SAS.  The problem with SAS, as I mentioned in previous articles, is that it multiplies the number of secrets you need to keep as a system.  If, as a system, I access 10 resources, I need to keep 10 SAS, i.e. 10 secrets, since if I divulge a SAS to a third party, said third party now has access it shouldn’t have.  An authentication / authorization mechanism minimizes the number of secrets to one:  the secret you need to authenticate.

 

So…  how can we improve?

Well, the conceptual solution is quite easy:

  • Make sure to have the most granular access scheme possible
  • Use other mechanisms, e.g. hierarchy, to ensure the granularity doesn’t make the access unmanageable

This is quite coupled with the previous article about unifying the security models:  you need to authenticate a user / agent in order to grant them access.  But even with SAS you could go more granular than is currently the case.

How to improve Azure: Security Models

In this blog series I explore some of the shortcomings of the Microsoft Azure platform (as of this date, April 2014) and discuss ways it could be improved. This isn’t a rant against the platform: I’ve been using and promoting the platform for more than four (4) years now and I’m very passionate about it. Here I am pointing at problems and suggesting solutions. Feel free to jump in the discussion in the comments section!

The past blog entries are:

  • How to improve Azure
  • How to improve Azure: Can you keep a secret?

 

What is the security model of Microsoft Azure?

This question is at the heart of the weakness I would like to discuss in this article:  there is no unique security model in Azure.  There is a plethora of models, depending on which services you are consuming.  Let’s look at some examples:

  • Azure Storage:  you can either use an access key (there are two:  primary & secondary) in the Authorization header of HTTP requests, which gives you all privileges within an entire storage account, or you can use a Shared Access Signature (SAS) that you place in the query string of your HTTP requests, which gives you limited access (see valet key).  There are actually two flavours of SAS, an ad hoc form and a more permanent one.
  • SQL Azure:  user name / password of a SQL account.  This is the same mechanism used with on-premises SQL Server, which Microsoft has long discouraged in favour of Windows Integrated authentication; the latter isn’t supported by SQL Azure.
  • Service Bus:  access token provided by Access Control Service (ACS) or a SAS (independent of the Azure Storage SAS).  Both mechanisms can give limited access to resources (see my past blog on how to secure Service Bus access).
  • Azure Active Directory Graph API:  JWT access token provided by Azure Active Directory using OAuth-2.
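Just to illustrate how different those models feel in code, here is a small sketch contrasting two of them: Azure Storage authenticated with an account key versus Service Bus authenticated with a SAS token provider. The connection string, namespace and policy names are placeholders.

```csharp
using System;
using Microsoft.ServiceBus;
using Microsoft.ServiceBus.Messaging;
using Microsoft.WindowsAzure.Storage;

class SecurityModelZoo
{
    public static void Connect(
        string storageConnectionString,   // embeds the storage account key
        string serviceBusNamespace,       // e.g. "mynamespace"
        string sasKeyName,                // e.g. "myPolicy"
        string sasKey)
    {
        // Azure Storage:  the account key is the credential and grants everything.
        var storage = CloudStorageAccount.Parse(storageConnectionString);
        var blobClient = storage.CreateCloudBlobClient();

        // Service Bus:  a SAS token provider scoped to whatever the policy allows,
        // completely unrelated to the Azure Storage SAS mentioned above.
        var tokenProvider = TokenProvider.CreateSharedAccessSignatureTokenProvider(sasKeyName, sasKey);
        var factory = MessagingFactory.Create(
            ServiceBusEnvironment.CreateServiceUri("sb", serviceBusNamespace, string.Empty),
            tokenProvider);
    }
}
```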

Those are just a few, and they only cover access to the service itself.  When you want to manage those services (e.g. creating a SQL Azure database), you may face a different security model altogether.

If you use Microsoft Azure lightly, i.e. one or two services, it might not appear to be a problem.  Once you start using Azure a bit more as a development platform, this lack of uniformity will hit you:

  1. You have a lot of protocols to understand and implement.
  2. You need to work around different limitations, e.g. I can be very granular about access on the Service Bus, but with SAS policies in Azure Storage, a SAS is good for an entire container.
  3. You have a lot of secrets to store, since many services do not share secrets (see my blog on secrets for more details).
  4. You have a lot of credentials to manage, e.g. you need to implement rotation policies and renew different secrets in different services, which all have different management interfaces.  This increases the chances of error or, more typically, of such policies never being automated.

 

How could we fix that?

 

I would propose a dual security model for each Service.

The primary Security Model would be claims-based and not even limited to Microsoft Azure Active Directory, but open to any identity provider.  For ease of use, your Azure subscription could be configured with a set of trusted identity providers (i.e. URI, signing keys & token type).  When you want to configure access, you could then have a standard dialog where you pick the identity provider and compose a claims rule (e.g. I want this claim type & value, or those types & values, or either this combination or that one, etc.).

This would enable your solution to have Service Identities managed wherever you want (although Azure Active Directory would be easier since it is part of the platform), and the same identity could be used to access SQL Azure, Service Bus, Media Services, etc.  Quite a bit like we do on-premises with service accounts.

The access token provided by the identity provider would be passed in the Authorization header of each HTTP request, or differently for different protocols (e.g. TCP TDS for SQL, proprietary TCP for Service Bus, etc.).
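None of this exists today; the following is a purely hypothetical sketch of what a trusted-identity-provider entry and a claims rule could look like at the subscription level, just to make the proposal more tangible. Every type and property name is invented.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical subscription-level configuration (nothing like this exists in Azure):
// the trusted identity providers and, per resource, a claims rule describing who gets access.
class TrustedIdentityProvider
{
    public Uri IssuerUri { get; set; }          // the identity provider's URI
    public string TokenType { get; set; }       // e.g. "JWT" or "SAML"
    public string SigningKeyThumbprint { get; set; }
}

class ClaimsRule
{
    // "Grant access if the incoming token contains ANY of these claim type / value pairs."
    public List<KeyValuePair<string, string>> AnyOf { get; set; }
}
```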


The secondary Security Model could be unique to each Service but would typically address the shortcomings of the primary one.  The primary model I define here requires access to an identity provider (which must be up), configuration of an account in that provider, etc.  A simpler security model, e.g. SAS, would be quite good for faster ramp-up with a Service (although it has the weakness of multiplying the number of secrets).

 

With this Primary / Secondary Security Model we would have a robust security model (Primary) and one for quicker use (Secondary).  When using the primary security model, it is quite possible to have only one secret to store for an application:  the user name & password of the Service Identity associated with the application.  This would allow for central management of the secret and much less complex logic to store and use the secrets.

 

Hope this was useful!

How to improve Azure: Can you keep a secret?

In this blog series I explore some of the shortcomings of the Windows Azure platform (as of this date, March 2014) and discuss ways it could be improved. This isn’t a rant against the platform: I’ve been using and promoting the platform for more than four (4) years now and I’m very passionate about it. Here I am pointing at problems and suggesting solutions. Feel free to jump in the discussion in the comments section!

   
 

What is a secret in the context of a Cloud Application?

A secret is any credential giving access to something. Do I mean a password? Well, I mean a password, a username, an encryption key, a Shared Access Signature (SAS), whatever gives access to resources.

A typical Cloud application interacting with a few services accumulates a few of those. As an example:

  • User name / password to authenticate against the Azure Access Control Service (ACS) related to an Azure Service Bus (you access more than one Service Bus namespace? You’ll have as many credentials as namespaces you are interacting with)
  • SAS to access a blob container
  • Storage Account access key to access a table in a Storage Account (yes, you could do it with SAS now, but I’m striving for diversity in this example ;) )

All those secrets are used as input to some Azure SDK libraries during the runtime of the application. For instance, in order to create a MessagingFactory for the Azure Service Bus, you’ll need to call a CreateAsync method with the credentials of the account you wish to use.
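For instance, a typical Service Bus bootstrap looks like the sketch below (ACS-era shared secret; the namespace and config keys are illustrative): the issuer secret has to transit through the application’s own code.

```csharp
using System.Configuration;
using System.Threading.Tasks;
using Microsoft.ServiceBus;
using Microsoft.ServiceBus.Messaging;

class BusBootstrap
{
    public static async Task<MessagingFactory> CreateFactoryAsync()
    {
        // The application reads the credentials (typically from web.config)...
        var issuerName = ConfigurationManager.AppSettings["BusIssuerName"];
        var issuerSecret = ConfigurationManager.AppSettings["BusIssuerSecret"];

        // ...and hands them to the SDK:  the app therefore knows the secret.
        var tokenProvider = TokenProvider.CreateSharedSecretTokenProvider(issuerName, issuerSecret);

        return await MessagingFactory.CreateAsync(
            ServiceBusEnvironment.CreateServiceUri("sb", "mynamespace", string.Empty),
            tokenProvider);
    }
}
```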

This means your application needs to know the credentials: a weakness right there!

Compare this with the typical way you configure an application on Windows Server. For instance, you want an IIS process to run under a given service account? You ask your favorite sys-admin to punch the service account name & password into the IIS console at configuration time (i.e. not at runtime). The process will then run under that account and the app will never need to know the password.

This might look like a mere convenience but it’s actually a big deal. If your app is compromised in the Windows Server scenario, there is no way it can reveal the user credentials. In the case of your Azure app, well, it could reveal them. Once a malicious party has the account credentials, it has far more freedom to attack you than if it merely had access to an app running under that account.

But it doesn’t stop there…

Where do you store your secrets in your Azure app? 99% of the time, in the web.config. That makes it especially easy for a malicious party to access your secrets.

Remember, an application deployed in Azure is accessible by anyone. The only thing protecting it is authentication. If you take an application living behind your firewall and port it to the cloud, you just made it much more accessible (which is often an advantage, because partners or even your employees, from a hotel room, have access to it without going through the hoops of VPN connections), but you are also forced to store credentials in a less secure way!

On top of that, in terms of management, it’s a bit awkward because it mixes application parameters with secrets. Once a developer deploys, or creates a deployment package to pass to the sys-admin (or whoever plays that role; it might be a dev-ops developer, but typically not everyone in the dev group will know about production credentials), the developer must specify some arbitrary config keys the sys-admin must override.

So in summary, we have the following issues:

  • The application knows the secrets
  • Secrets are stored in an insecure way in the web.config
  • Secrets are stored with other configuration parameters and do not have a standard naming (you need to come up with one)

 

Ok. How do we fix it?

This one isn’t easy. Basically, my answer is: in the long run we could, but cloud platforms haven’t reached a mature enough level to implement it today. But we can establish a roadmap and get there one day, with intermediary steps easing the pain along the way.

Basically, the current situation is:


That is, the app gets credentials from an insecure secret store (typically the web.config), then requests an access token from an identity / token provider. It then uses that token to access resources. The credentials aren’t used after that.

So a nice target solution would be:


Here the application requests the token from Windows Azure (we’ll discuss how), and Azure reads the secrets and fetches the token on behalf of the application. The application never knows about the secrets. If the application is compromised, it might still be able to get tokens, but not the credentials. This is comparable to the Windows Server scenario we talked about above.

Nice. Now how would that really work?

Well, it would require a component in Azure, let’s call it the secret gateway, to have the following characteristics:

  • Has access to your secrets
  • Knows how to fetch tokens using the secrets (credentials)
  • Has a way to authenticate the application so that only the application can access it

That sounds like a job for an API. Here the danger is to design a .NET-specific solution. Remember that Azure isn’t only targeting .NET. It is able to host PHP, Ruby, Python, node.js, etc. On the other hand, if we move it to something super accessible (e.g. a web service), we’ll have the same problem authenticating the calls (i.e. requirement #3) as when we started.

I do not aim at a final solution here, so let’s just say that the API would need to be accessible by any runtime. It could be a local web service, for instance. The ‘authentication’ could then be a simple network rule. This isn’t trivial in the case of a free Web Site, where a single VM is shared (multi-tenant) among other customers. Well, I’m sure there’s a way!

The first requirement is relatively easy. It would require Azure to define a vault and only the secret gateway to have access to it. No rocket science here, just basic encryption, maybe a certificate deployed with your application without your knowledge…

The second requirement is where the maturity of the cloud platform becomes a curse. Whatever you design today, e.g. OAuth-2 authentication with SWT or JWT, is guaranteed to be obsolete within 2-3 years. The favorite token type seems to change every year (SAML, SWT, JWT, etc.), as does the authentication protocol (WS-Federation, OAuth, OAuth-2, XAuth, etc.).

Nevertheless it could be done. It might be full of legacy stuff after 2 years, but it can keep evolving.

I see the secret gateway being configured in two parts:

  • You specify a bunch of key / values (e.g. BUS_SVC_IDENTITY : “svc.my.identity”)
  • You specify token mechanism and their parameter (e.g. Azure Storage SAS using STORAGE_ACCOUNT & STORAGE_ACCOUNT_ACCESS_KEY)

You could even have a trivial mechanism simply providing you with a secret. The secret gateway would then act as a vault…
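To make the idea a bit more concrete, here is a purely hypothetical sketch of the surface the secret gateway could expose to an application; nothing of the sort exists in Azure, and every name below is invented.

```csharp
using System.Threading.Tasks;

// Hypothetical local endpoint exposed to the application:  the app asks for
// tokens (or, in the trivial mode, for a named secret) and the gateway does the
// authentication dance with the stored credentials on the app's behalf.
interface ISecretGateway
{
    // e.g. GetTokenAsync("serviceBus") would return a ready-to-use access token,
    // obtained with credentials the application never sees.
    Task<string> GetTokenAsync(string resourceName);

    // Trivial mode:  the gateway simply acts as a vault for a named secret.
    Task<string> GetSecretAsync(string secretName);
}
```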

We could actually build it today as a separate service if it wasn’t for the third requirement.

 

Do you think this solution would be able to fly? Do you think the problem is worth Microsoft putting resources behind it (for any solution)?

Hope you enjoyed the ride!

How to improve Azure

I’m very passionate about Windows Azure. I’ve been using and promoting the platform for more than four (4) years now.

So I’ve been working with the technology for a while, but in recent months I’ve been involved in an intensive architecture project where we pushed the envelope of the platform. As a consequence, we hit quite a few limitations of the platform.

I also had the pleasure of working directly with Microsoft to resolve some of those issues.

In this blog series I will address what remain, to this date (March 2014), limitations of the platform. Instead of whining about them, I will suggest ways Azure could be improved to address those shortcomings. That will be more constructive and should generate some discussion. Feel free to jump in the discussion in the comments section!

Azure ACS fading away

ACS has been on life support for quite a while now.  It was never fully integrated into the Azure Portal, keeping the UI it had in its Azure Labs days (circa 2010, for those who were around back then).

In an article last summer, Azure Active Directory is the future of ACS, Vittorio Bertocci announced the roadmap:  the demise of ACS as Windows Azure Active Directory (WAAD) beefs up its feature set.

In a more recent article about Active Directory Authentication Library (ADAL), it is explained that ACS didn’t get feature parity with WAAD on Refresh Token capabilities.  So it has started.

For me, the big question is Azure Service Bus.  The Service Bus uses ACS as its Access Control mechanism.  As I explained in a past blog, the Service Bus has a quite elegant and granular way of securing its different entities through ACS.

Now, what is going to happen to all that when ACS goes away?  It is anyone’s guess.

Hopefully the same mechanisms will be transposed to WAAD.