To Cache or Not To Cache? This is the Question!

How many times in a developing a business app you have wondered if you should cache the data in your web server to reduce the number of database queries for performance.

Assuming you have unlimited memory, should you cache the whole table into memory (e.g. using ASP.Net method output cache) or you should rely on database and use Linq to database directly?

I guess, the answer to this question is pretty obvious. We know the limitation of both methods, linq to object, till now has no support for indexes and would perform memory scan for any query, while linq to DB has all the overhead of connection and query parsing, etc.

So, lets do a little test to see what is the actual limits of these methods? We will set up a super simple DB with a table that contains two integers. Then hammer DB with queries and cache lookups for different cache size to compare the result.

Run the following query to set up our DB:

create table Smash( Id int primary key, Val int )

declare @i int = 0
while @i<1000000
	insert into Smash (Id, Val)
	values (@i, @i)
	set @i = @i + 1

create nonclustered index [IX_val] on [dbo].[Smash]
	[Val] asc,
	[Id] asc

Now that we have our DB ready, lets write query for the result.

        const int Cnt = 100;
        private void button1_Click(object sender, EventArgs e)
            var cacheSize = 1000000;
            for (cacheSize = 100; cacheSize < 1000000; cacheSize *= 2)
                var cache = GetCecheAsBigAs(cacheSize);
                var i = cacheSize / 2;
                var time1 = QueryDBMultipleTimes(Cnt, i);
                WriteResult(time1, "DB", cacheSize);
                var time2 = QueryMemoryMultipleTimes(Cnt, i, cache);
                WriteResult(time2, "MEM", cacheSize);

And the query methods:

        private List GetCecheAsBigAs(int cacheSize)
            using (var db = new TempDataContext())
                return db.Smashes.Take(cacheSize).ToList();

        private TimeSpan QueryMemoryMultipleTimes(int Cnt, int lookup, IEnumerable cache)
            var t = DateTime.Now;
            for(var i=0; i< Cnt; i++)
               cache.Where(s => s.Val == lookup).First().Val.Value;
            return DateTime.Now - t;

        private TimeSpan QueryDBMultipleTimes(int Cnt, int lookup)
            using (var db = new TempDataContext())
                var t = DateTime.Now;
                for (var i = 0; i < Cnt; i++)
                     db.Smashes.Where(s => s.Val == lookup).First().Val.Value;
                return DateTime.Now - t;

Now comes the interesting part. I ran the above code and graphed the result:

The blue line represents cache access and the red one represents DB access. It can be clearly seen that cache terribly beats DB until the cache size becomes 20% of the table size. It can also be spotted that DB access has a steady response time. Remember we don’t change the DB size at all during the test. However, this graph has got no new message. It is the same old graph of linear vs tree access to data.

But don’t be fooled with this experiment. There are things to note of. First, our data structure is super simple. On a more complex query when lookups and joins come in, DB would definitely over perform memory drastically due to all the query optimization effort made into DBMS cores in last 30 years. This graph also doesn’t show the overhead of populating and re-populating cache. However, if your data is reasonably small (up-to a few thousand records), and your query is fairly simple, cache it in the web server.



One Response to “To Cache or Not To Cache? This is the Question!”

  1. jerald Says:

    Nice article.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: