I am designing a back-end where Database contention is the number-one issue for scalability.
We use those darn distributed transactions and therefore I do not know how long the transaction my component is participating into will last. If a transaction remains open for a few seconds, many of those would create contention in the Database my component is using. This will result in dead-locks, time-outs and scalability would go out of the window.
I didn’t want to go the all no-sql way and forgo any transaction management. The crux of the deal is that during one of those distributed transaction, we are inserting one record while a recurrent task performs a large and complicated select on the entire table. I though if I could wait long enough to insert, until the end of the transaction, I would reduce the contention. Here is how to do it.
First, what you do not want to do is to hook on the TransactionCompleted event of the Transaction.Current object. It is tempting, it is easy, it is right at hand, but it doesn’t work. Well… it works but whatever code you run in the event handler will exist outside the transaction since as the name of the event suggests, the transaction has already completed by then. The point of using transaction is to have many actions being atomically bundled so I didn’t want to have a key operation happening outside the transaction.
The real solution isn’t that much more complicated actually. Check out Transaction.Current.EnlistVolatile. This method allow you to hook yourself in the transaction process. You basically become a transaction participant. We use the volatile version of the method since we do not want to appear as a durable participant since that would involve us being able, like the antic Pheonix, to be born again from our ashes if the process fail. No need for that, I was happy for this to work only if the original process stayed online for the duration of the transaction.
public Enlistment EnlistVolatile( IEnlistmentNotification enlistmentNotification, EnlistmentOptions enlistmentOptions )
We need to implement the IEnlistmentNotification interface:
public interface IEnlistmentNotification
void Commit(Enlistment enlistment);
void InDoubt(Enlistment enlistment);
void Prepare(PreparingEnlistment preparingEnlistment);
void Rollback(Enlistment enlistment);
In my case I simply needed to hold a few variable within the object implementing that interface. Basically the data I wanted to insert in the DB and the Transaction instance itself. Then I had the simple following implementation:
- In Prepare, do the insertion and call PreparingEnlistment.Prepared. This last step is essential since we simulate a transaction resource, we need to confirmed that we prepared in the two-phase commit.
- In Commit, simply call Enlistment.Done.
That’s it! You’re now in the Transaction pipeline and can do late operations.
I wrote a blog entry back in November 2010 about ACID Transactions & SQL Azure.
I assumed light transactions using System.Transactions.TransactionScope were not supported based on an article I’ve just read back then. The article suggested using explicit transaction management (i.e. BeginTransaction, Commit & Rollback on the returned SqlTransaction object) which I found quite backward, hence the title: back to 2001.
I went as far as creating an entry on My Great Windows Azure Idea requesting the ability to use TransactionScope for light transaction-only.
Well, it turns out I was wrong and you do not have to go that close to the metal: TransactionScope is already supported as long you don’t enlist more than one connection into, in which case it would escalate to a distributed transaction, which isn’t supported in SQL Azure.
Sorry about the misleading and thank you Luis Panzano for pointing that out to me!
I meant to write about this since I read about it a little back in July, today is the day.
You know I love Microsoft SQL Azure.
The technology impressed me when it was released. Until then Azure contained only Azure storage. Azure Storage is a great technology if you plan to be the next e-Bay on the block and shove billions of transactions a day to your back-end. If you’re interested into migrating your enterprise application or host your mom & pop transactional web site in the cloud, it’s both an overkill in terms of scalability and a complete paradigm shift. The latter frustrated a lot of early adopter. A few months later, Microsoft replied by releasing SQL Azure. I was impressed.
Not only did they listen to feedback but they worked around the clock to release quite a nice product. SQL Azure isn’t just an SQL Server hosted in the cloud. It’s totally self managed. SQL Azure hosts 3 copies of your data in redundancy, so it’s totally resilient to hardware failures and maintenance: like the rest of Azure it’s build with failure in mind as being part of life and dealt with by the platform. Also, SQL Azure is trivial to provision: just log to Windows Azure portal and click new SQL Azure…
This enables a lot of very interesting scenarios. For instance, if you need to stage data once a month and don’t have to capacity in-house, go for it, you’re gona pay only for the time the DB is on-line. You can easily sync it with other SQL Azure DB and soon you’re gona be able to run reporting in the cloud with it. It’s a one stop shop, you pay for use, you don’t need to buy a license for SQL Server nor for Windows Server running underneath.
Now that is all nice and you might think, let’s move everything there! Ok, it’s currently limited to 50Gb which is a show stopper for some enterprise applications and certainly a lot of e-Commerce applications, but that leaves a lot of scenarios addressed by it.
A little caveat I wanted to talk to you about today is… its lack of distributed transaction support.
Of course, that makes sense. An SQL Azure DB is a virtual service. You can imagine that bogging down those services with locks wouldn’t scale very well. Plus, it’s not because two SQL Azure instances resides in your account that they reside on the same servers. So supporting distributed transactions would lead to quite a few issues.
Now most of you are probably saying to themselves: “who cares, I hate those MS-DTS transactions requiring an extra port to be open anyway and I never use it”. Now you might not use that but you might have become accustomed to using .NET Framework (2.0 and above) class System.Transactions.TransactionScope. This wonderful component allows you to write code with the following elegant pattern:
// Do DB operations
This pattern allows you to declaratively manage your transactions, committing them and rolling back if an exception is thrown.
Now… that isn’t supported in SQL Azure! How come? Well, yes you’ve been using it with SQL Server 2005 & 2008 without ever needing Microsoft Distributed Transaction Service (MS DTS) but maybe you didn’t notice it but you were actually using a feature introduced in SQL Server 2005: upgradable transaction. This allows SQL Server 2005 to start a transaction as a light transaction on one DB and if need be, with time, to upgrade it to a distributed transaction on more than one transactional resources (e.g. another SQL Server DB, an MSMQ queue or what have you).
When your server doesn’t support upgradable transactions (e.g. SQL Server 2000), when you use System.Transactions.TransactionScope, it opens a distributed transaction right away.
Well, SQL Azure doesn’t support upgradable transactions (presumably because they have nothing to upgrade to), so when your code will run, it will try to open a distributed transaction and will blow.
Now, it depends what you do. If you’re like a vast majority of developers (and even some architect) and you think that ACID transactions is related to LSD, then you probably never manage transactions at all in your code, so this news won’t affect you too much. If you’re aware of transaction and like me embraced System.Transactions.TransactionScope and sprinkled it over your code like if it was paprika on an Hungarian dish, then you might think that migrating to SQL Azure will take a little longer than an afternoon.
Now it all varies. If you wrapped your SQL Connection creation in a factory, you might be able to pull out something a little faster.
Anyhow, I found that feature quite disappointing. A lot of people use SQL Server light transactions and that would be (I think) relatively easy to support. The API could blow when you try to upgrade the transaction. I suppose this would be a little complicated since it would require a new provider for SQL Azure. This is what I proposed today at:
So please go and vote for it!
I’ve been working some more on some detailed design of the Azure Pub-Sub CodePlex project. This project aims at creating a pub-sub messaging infrastructure running on Windows Azure & using Azure Storage.
Azure storage doesn’t support ACID transactions in general. Each operation is a transaction within a given partition, but you can’t span a transaction on many operations. Now there is a very good reason for that: ACID transactions don’t scale well. They are typically implemented using locks which create contention and also require a trust between parties (should I trust you holding a lock on my resources?). It also creates challenges in terms of high-availability: what do you do if one of the party fails during a transaction, should the transaction be shared with fail-over servers? Azure is built with a scale-less mind-set and therefore it doesn’t support transactions spanning multiple operations. SQL Azure does support ACID transactions within one database. But SQL Azure isn’t Azure Storage despite what the name of the technologies would let you believe. Azure Storage is basically Tables, Queues & Blobs.
The challenge for me is that I’m not used to design in an environment not supporting transactions. I eat transactions for breakfast and architected many systems to rely on them to guarantee the integrity of the systems. At the same time, this is the whole point of this CodePlex project: to meet those challenges and learn from it. So let’s do that!
The main scenario I was trying to secure was the pushing of messages into a queue. Now this scenario goes as follow:
- An agent (Azure Worker Role) takes a message from an Azure Queue
- The agent process the message and determine which subscriber it should send it to
- The agent persists a bunch of meta data in tables & blobs
- The agent persists an ID in another Azure Queue
- The agent sends a notification via TCP or HTTP
I wanted to have this scenario to be fail-tolerant in that, if the agent fails at any of those point, at the end, the message will still be sent once-and-only-once to the subscribers.
You can’t secure this scenario with a wrapping transaction. So what can you do?
Well, I won’t go through the trial & errors that went though my head. Instead I’ll go directly to the solution. First let’s formalize the operations happening:
- Read Input Azure Queue (making the message invisible)
- Process Message, Find subscribers
- Persists Meta Data in Tables
- Persists Data in Blobs
- For each targeted subscriber
- Check a confirmation table to see if the message exists, if so, go to step 6.
- Push an ID in the Azure Queue associated with the subscriber
- Write the subscriber queue-message ID into a confirmation table
- Send a notification to them (TCP)
- Remove the Input Azure Queue message (it was invisible, now we permanently delete it)
Well, that’s about the solution, if you add some checking code in there. Basically if we fail before point #6, the input message will eventually become visible again and will be reprocessed by the agent again. The only thing we have to make sure is that we are fault tolerant if the agent finds half-done work on that message.
Fort instance, when we’re writing meta in the tables or blob, we have to make sure that the write either overwrite previous writes or fail and that failure is detected, we check if what is already written is correct and continue.
The key point is at point 5.3. Basically, we confirm the message-push operation. This ensures that the message is sent once-and-only-once. If we fail at point 5.3, the process will restart and the message will be sent twice on the subscriber queue BUT the agent READING it will be able to discard the previous version because they don’t have the queue-ID in the confirmation table.
I find this solution pretty elegant because it actually supports another more complex scenario, which was bulk-messaging. I need to be able to process a message containing a sequence of messages, each of those should be processed and sent to their respective subscribers. With this pattern, I can do it.
The main axis of this pattern is to have ONE gateway for a data operation, even if the data is spread around. Basically, you can write data everywhere, but your transaction isn’t considered completed until this gateway data operation is completed. This way you can restart as often as you like as long as you can handle already-written-data.
The major draw-back of this pattern is that it can create incomplete data-operation junk in storage. This can be taken care of by purging mechanism.
This pattern doesn’t cover every transaction, but is quite generic so I’ll be able to leverage it at a bunch of places.
Did you ever face a similar challenge? What was the solution you used? Does this pattern seems optimal to you? If not, why I’ would love to hear from you!