How to simulate bag access in Windows Azure Table Storage? (Part 1)


It is nice to hear that Microsoft is providing table storage. Hopefully we will get it on non-Azure platforms as well. The idea is fast, scalable access to persisted objects without the limitations of the tabular world. No doubt relational databases are amazing, and they allow super complex queries and transactions. The downside is their complexity of design and usage. It tends to be extremely hard to scale relational data while still satisfying service level agreements on response time, availability, etc.

Efforts to develop non-relational, schema-free data stores are as old as databases themselves, and in the cloud era they make a lot of sense. For example, Mnesia is a lovely database designed to work with Erlang, with a LINQ-like query language. Suffice it to say that it grew out of Ericsson’s Erlang work (Erlang itself dates from the 1980s), is easy to scale, and is built for very high availability (you get a mechanism for hot patching). I also read a few days ago about another database, RavenDB, which is built on a similar motive.

One important thing to remember when working with non-relational databases is that they are not relational. You don’t run SQL scripts against them, and there are no joins, no views, no foreign keys and no primary keys. Those terms make sense for tabular data. Databases like table storage are semi-structured data stores. Structured data is tabular, and relational databases store it. Semi-structured data is XML, JSON, or any other form of persisted object. Unstructured data is the web, free-form text, etc.

Mnesia (as a pioneer of table-storage-like databases) stores data in sets and bags. A set is a table in which each record has a unique key. Fair enough: we are used to working with tables that have a primary key, which is the same thing. A bag, however, is a table in which many records can share a key, so there may be no way to address a single row, because no row has a unique key. (You may now say: WTF? What happens to my candidate keys and primary keys? My answer is: wait a minute. We are not in the relational world, so none of these terms exist here.)
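To make the difference concrete, here is a minimal in-memory sketch in Python. It is not Mnesia's API; the class names SetTable and BagTable and their write/read methods are made up for illustration. The only behaviour it tries to mirror is that a set keeps one record per key, while a bag keeps every distinct record written under a key.

```python
from collections import defaultdict


class SetTable:
    """Hypothetical set table: each key maps to exactly one record."""

    def __init__(self):
        self._rows = {}

    def write(self, key, record):
        # Writing an existing key replaces the previous record.
        self._rows[key] = record

    def read(self, key):
        # Returns a list (zero or one record) for symmetry with BagTable.
        return [self._rows[key]] if key in self._rows else []


class BagTable:
    """Hypothetical bag table: many records may share one key."""

    def __init__(self):
        self._rows = defaultdict(list)

    def write(self, key, record):
        # Assumption: an identical record under the same key is stored once.
        if record not in self._rows[key]:
            self._rows[key].append(record)

    def read(self, key):
        # You get every record under the key; there is no handle on "row 2".
        return list(self._rows.get(key, []))
```

With the bag, the key is a handle on a group of records rather than on one row, which is exactly the situation the rest of this post is about.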

So what is the value of having a row in a table that we cannot access directly? It does have some value. Bearing in mind again that table storage is not relational, a good design paradigm is to NEVER query on anything except the key (and, for table storage, the partition key). Any other query (one that is not bound to a partition key, in table storage) is similar to a full table scan in your SQL Server database, and a full table (or index) scan is THE killer. You can never be scalable if even a single operation does a full table scan over your growing data.
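A toy model shows why. The Python sketch below is not the Azure SDK; ToyTable, its methods and the rows_examined counter are hypothetical names used only to count how many rows each kind of query has to touch. It assumes rows are grouped by partition key and then addressed by row key.

```python
class ToyTable:
    """Hypothetical table: rows grouped by partition key, then by row key."""

    def __init__(self):
        self._partitions = {}
        self.rows_examined = 0  # how many rows the queries have touched so far

    def insert(self, partition_key, row_key, entity):
        self._partitions.setdefault(partition_key, {})[row_key] = entity

    def get(self, partition_key, row_key):
        # Key lookup: the cost does not depend on how big the table is.
        self.rows_examined += 1
        return self._partitions.get(partition_key, {}).get(row_key)

    def scan(self, predicate):
        # Query on a non-key property: every row in every partition is read.
        matches = []
        for partition in self._partitions.values():
            for entity in partition.values():
                self.rows_examined += 1
                if predicate(entity):
                    matches.append(entity)
        return matches


def build_orders_table(customers, orders_per_customer):
    """Fill a ToyTable with made-up order entities for the comparison."""
    table = ToyTable()
    for c in range(customers):
        for o in range(orders_per_customer):
            table.insert(f"customer-{c}", f"order-{o}", {"customer": c, "total": o})
    return table
```

In this model, a get on a table of 1,000 rows touches one row, while a scan for orders with a given total touches all 1,000, and that second number grows with your data.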

to be continued…
