View Matching

View matching is a fairly old technique, first seriously utilised around the beginning of the millennium. Larson et al. from Microsoft Research exploited this technique for SQL Server 2000.

The idea is to utilise existing materialised views (or indexed views, in SQL Server terms) for query optimisation. For example, if you run the query "SELECT * FROM Cars WHERE Brand='BMW' AND Price BETWEEN 40000 AND 70000" and there is already an indexed view over Cars called v_LuxuryCars, defined as "SELECT * FROM Cars WHERE Price >= 40000", the query optimiser can exploit v_LuxuryCars to execute the query. The view can be much smaller than the base table, so the query can run faster. The real beauty of exploiting materialised views becomes obvious in more complex queries, though.

The challenge here is how to efficiently figure out whether a view can be used for the query; in essence, whether the query predicate is contained in the view definition predicate (as in the example above, where Price BETWEEN 40000 AND 70000 is contained in Price >= 40000). This is not straightforward, since predicates can be really complicated. For example, the query may have the predicate A=2 AND B=2 while the view definition has the predicate A=B; in that case, inferring the containment relationship between the predicates is not obvious.
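For the simple range case above, the containment test itself is easy; here is a toy C# sketch (purely illustrative, not taken from any paper; RangePredicate and the class names are mine) of what such a check looks like for a single-column range predicate:

public class RangePredicate
{
    public decimal Low;
    public decimal High;

    // The query predicate is contained in the view predicate if every value
    // accepted by the query is also accepted by the view; only then can the
    // view be used, with any remaining predicates re-applied on top of it.
    public bool IsContainedIn(RangePredicate view)
    {
        return Low >= view.Low && High <= view.High;
    }
}

public static class ViewMatchingToy
{
    public static void Main()
    {
        // Price BETWEEN 40000 AND 70000 (query) vs Price >= 40000 (v_LuxuryCars)
        var query = new RangePredicate { Low = 40000m, High = 70000m };
        var view  = new RangePredicate { Low = 40000m, High = decimal.MaxValue };

        System.Console.WriteLine(query.IsContainedIn(view)); // True
    }
}

A real view-matching algorithm has to handle conjunctions of such predicates, equality and join predicates, grouping, and so on, which is where the hard work (and the cost) comes from.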

However, this problem is well studied in relational algebra, and even though Microsoft has been very active in developing view-matching techniques, it is not used in the SQL Server core, possibly for performance reasons.

The reason I am interested in view matching is that I have started a project for view matching over the IQueryable interface. I will keep you posted on the progress.


Source control for SQL Server Databases! and More…

The era of painful SQL Server development is over. SQL Server code name "Denali" will ship with a new toolset, code name "Juneau", that has amazing improvements for database developers. The Server Explorer gives you a project-oriented offline database development feature, as described in an MSDN article:

Inside the Server Explorer, you can create a new database project from a running database for offline development. The schema of the current database is then imported into the database project, with each database object represented by a script in the Solution Explorer. For a better viewing experience, you have the option of creating a folder in the Solution Explorer for each schema and/or each object type during the import operation.
While you are working offline, you can invoke the same visual designer tools (TSQL editor and Table Designer) available for online development to make changes to the database scripts, and save all changes to be deployed later.
The offline development experience also provides you with source control functionalities to manage all your scripts.
Another amazing feature is seamless integration with SQLCLR: you can now add SQLCLR objects directly to the same database project that is opened, without resorting to opening a specific SQL CLR project. Your TSQL stored procedures can interact with your SQLCLR objects within the same project. Debugging and deployment can also happen seamlessly.

UPDATE: Check out the TechEd 2011 video here.

Sampling Data with SQL Server

Larson et al. from Microsoft Research published a paper in ACM SIGMOD 2007 in which they introduce an interesting sampling mechanism for SQL Server using materialized views (which is what SQL Server calls indexed views). If you create an indexed view over a table, or over a bunch of tables joined together, using their technique you get a tidy sample of your data which is kept up to date automatically.

What caught my eye was that their basic sampling approach is surprisingly simple:


CREATE VIEW Sample_TheTableName WITH SCHEMABINDING AS
  SELECT ...,
         -- tag each row with a (pseudo-)random value to sample on
         __RAND = CAST(ROW_NUMBER() OVER (ORDER BY RAND()) AS tinyint)
  FROM ...

The additional __RAND column imposes a random distribution on the data, which can give us a statistically valid sample.

A generic Database Project using MEF (Managed Extensibility Framework) – Part 1

A database project, as the Visual Studio template defines it, is a series of .SQL scripts that run sequentially and create the schema of the project. My experience shows that in most real-world scenarios this is not enough. I list the generic requirements of a database project here and then take the first steps towards developing one:

  1. There should be a choice to either create the database initially from the schema scripts or load it from a basic backup file.
  2. Prerequisites on the server should be checked.
  3. It should be cross-platform: basically, anything that works by running scripts (not only SQL) should be runnable with it.
  4. It should be extensible (as any other project should be).

Let’s create the solution. A console application should be good for this. We create a new solution called SDP (Simple Database Project).

For the fourth point, we use MEF (Managed Extensibility Framework). MEF is an open-source project supported by Microsoft, and I think it is a great framework. In this project we additionally use MEF for IoC (Inversion of Control). In order to use MEF, we add a reference to System.ComponentModel.Composition to our solution. Note that a version of MEF ships with .NET Framework 4.0 and Silverlight 4.0, so you don't have to download anything.

The first step is to generate (create or restore) the initial database. For this purpose we define IDatabaseGenerator, which has a method GenerateDatabase(). Any database generator needs to do something with a database system, which can be a server like SQL Server or an application like Excel. Whatever the database system is, we do not introduce a requirement for a database connection, etc. into our project. To keep the project extensible and simple, we assume everything can be done by running a script using some tool or API. Hence, let's define IScriptExecuter, which executes a given script; it has a method ExecuteScript(). Any script executer needs a list of scripts to execute, so let's define the generic interface IScriptProvider<T>. We keep this interface generic since we don't know what type of script we should provide. It has a method GetScripts() that returns IEnumerable<T>.
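The interfaces are not spelled out in this part, but based on the description above they would look roughly like this (a minimal sketch; the parameter of ExecuteScript is my assumption, not the project's actual signature):

using System.Collections.Generic;

// Minimal sketch of the interfaces described above; exact signatures are
// an assumption rather than the project's actual code.
public interface IDatabaseGenerator
{
    // Creates or restores the initial database.
    void GenerateDatabase();
}

public interface IScriptExecuter
{
    // Executes one script using whatever tool or API fits the target system.
    void ExecuteScript(string scriptPath);
}

public interface IScriptProvider<T>
{
    // Returns the scripts to execute; generic because the script type is not fixed.
    IEnumerable<T> GetScripts();
}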

So let's implement our classes to run a SQL script with a command-line tool and generate a database. We need a context to put the essentials together, so let's define the SdpContext class. Before that, we should have a Script class defined to use with our context:

using System;
using System.Collections.Generic;
using System.ComponentModel.Composition;
using System.ComponentModel.Composition.Hosting;
using System.Reflection;

public class Script : IComparable<Script>
{
    public virtual string Path { get; protected set; }
    public virtual string Title { get; protected set; }
    public virtual int Order { get; set; }

    public int CompareTo(Script other)
    {
        return Order.CompareTo(other.Order);
    }

    public Script(string path, string title, int order)
    {
        Path = path;
        Title = title;
        Order = order;
    }
}

public partial class SdpContext
{
    [ImportMany]
    public IEnumerable<Lazy<IDatabaseGenerator>> DbGenerators { get; set; }

    [ImportMany]
    public IEnumerable<Lazy<IScriptProvider<Script>>> ScriptProviders { get; set; }

    [Import(AllowDefault = true, RequiredCreationPolicy = CreationPolicy.Shared)]
    public IVersionController VersionController { get; set; }

    public SdpContext(Assembly[] plugins)
    {
        ComposeFromAssemblies(plugins);
    }

    public void ComposeFromAssemblies(Assembly[] plugins)
    {
        // Build a catalog from the plugin assemblies and compose this context
        // so that the imports declared above get populated.
        var catalog = new AggregateCatalog();
        foreach (var plugin in plugins)
        {
            catalog.Catalogs.Add(new AssemblyCatalog(plugin));
        }

        var container = new CompositionContainer(catalog);
        container.ComposeParts(this);
    }
}

You may have spotted some new stuff in the creation of the SdpContext class. SdpContext is supposed to be self-sufficient, meaning that it should run by itself; of course, you need to feed it with a bunch of plugins.
SdpContext is a partial class because I wanted to separate the basic functionality of the context from other extended functionalities.
The Import and ImportMany attributes are MEF attributes. They mean that our context needs a bunch of database generators and script providers. MEF then magically extracts any IDatabaseGenerator and IScriptProvider from the plugins and instantiates them for us. They are all defined as Lazy to avoid instantiating them before we really need them. The interface IVersionController is new here, and we still don't know what it should look like.
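To give a feel for how the composition is meant to be consumed, here is a hypothetical plugin (none of these classes come from the post; providing scripts from a folder and running them through sqlcmd are just example choices of mine):

using System;
using System.Collections.Generic;
using System.ComponentModel.Composition;
using System.Diagnostics;
using System.IO;
using System.Linq;

// Hypothetical plugin: provides .sql scripts from a folder, ordered by file name.
[Export(typeof(IScriptProvider<Script>))]
public class FolderScriptProvider : IScriptProvider<Script>
{
    private readonly string folder = @".\Scripts";

    public IEnumerable<Script> GetScripts()
    {
        var files = Directory.GetFiles(folder, "*.sql").OrderBy(f => f).ToArray();
        for (var i = 0; i < files.Length; i++)
            yield return new Script(files[i], Path.GetFileNameWithoutExtension(files[i]), i);
    }
}

// Hypothetical plugin: generates the database by running each script
// through the sqlcmd command-line tool.
[Export(typeof(IDatabaseGenerator))]
public class SqlCmdDatabaseGenerator : IDatabaseGenerator
{
    [Import]
    public IScriptProvider<Script> Scripts { get; set; }

    public void GenerateDatabase()
    {
        foreach (var script in Scripts.GetScripts().OrderBy(s => s.Order))
        {
            // -E: trusted connection, -i: input file. Adjust for your environment.
            var info = new ProcessStartInfo("sqlcmd", "-E -i \"" + script.Path + "\"")
            {
                UseShellExecute = false
            };
            using (var process = Process.Start(info))
            {
                process.WaitForExit();
            }
        }
    }
}

With such a plugin assembly in place, the context could be composed and used along these lines: var context = new SdpContext(new[] { typeof(FolderScriptProvider).Assembly }); followed by context.DbGenerators.First().Value.GenerateDatabase();.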