Source control for SQL Server Databases! and More…

The era of painful SQL Server development is over. SQL Server code name Denali will be shipped with a new toolset with code name “Juneau” that has a amazing improvements for database developers. Server explorer gives you Project-Oriented Offline Database Development feature, as written in an MSDN article:

Inside the Server Explorer, you can create a new database project from a running database for offline development. The schema of the current database is then imported into the database project, with each database object represented by a script in the Solution Explorer. For a better viewing experience, you have the option of creating a folder in the Solution Explorer for each schema and/or each object type during the import operation.
While you are working offline, you can invoke the same visual designer tools (TSQL editor and Table Designer) available for online development to make changes to the database scripts, and save all changes to be deployed later.
The offline development experience also provides you with source control functionalities to manage all your scripts.
Another AMAZING feature: You can now add SQLCLR objects directly to the same database project that is opened, without resorting to opening a specific SQL CLR project.  Your TSQL store procedures can interact with your SQLCLR objects within the same project.  Debugging and deployment can also happen seamlessly.
Yet another one:Seamless Integration with SQLCLR  You can now add SQLCLR objects directly to the same database project that is opened, without resorting to opening a specific SQL CLR project.  Your TSQL store procedures can interact with your SQLCLR objects within the same project.  Debugging and deployment can also happen seamlessly.

UPDATE: Check the teched-2011 video from here.

Advertisement

Sampling Data with SQL Server

Larson et al. from Microsoft Research have this paper in ACM SIGMOD 2007. They introduce an interesting sampling mechanism over SQL Server using materialized views. Materialized views are the same as indexed views. If you create an indexed view over a table or a bunch of tables joined together, using their technique, you have a tidy sample for your data which is kept up-to-date automatically.

What caught my eyes was that their base sampling approach is interestingly simple:


CREATE VIEW Sample_TheTableName WITH SCHEMA_BINDING AS
  SELECT ...,
        __RAND = (CAST( ROW_NUMBER()  OVER (ORDER BY RAND()) AS tinyint)
  FROM ...

The additional __RAND column gives a random distribution on data which can give us a statistically valid sample.

The answer for the previous challenge

This is the question in my last post:
If we have two coins which one is double headed, and we toss a coin and is head, what is the probability for the other side to be a tail?

If your answer is 1/2, then you are wrong (as I was). I thought this is the same problem as: what is the probability of a selected coin being a normal coin which is 1/2.

I was told that the answer is 1/3 and for some time I was convinced that 1/3 is the right answer but after thinking more about the problem. I thought again in calm and again realized how naive I was the first time.

To understand the problem better let’s rephrase it as below:
We have two boxes one contains a pair of black beans and the other contains a black bean and a white bean. We pick a bean from a box and it is black. What is the probability of “picking another bean from the same box and it being white”?
Look at the figure below:

After taking a black bean out, the bean might be either from the box with  white bean in it or from the box with black bean in it or again from the box with another black bean in it.

Sometimes formulas come to rescue.

P(H) is the probability of getting the Head. And P(H) = 1 – P(T). There is no conditional probability here. If we translate the problem as: Having a head out, what is the probability of NOT having another head.

Having a head out P(H) = 2/3 and P(T)=1-P(H)=1/3.

A little coin probability problem

Today I have been asked the following question: If we have two coins which one is double headed, and we toss a coin and is head, what is the probability for the other side to be a tail?

I will write the correct answer and why in comments.

Analyze table space usage in SQL Server

It is pretty useful to check space usage of your tables. SQL server gives a nice system stored procedure called sp_spaceused that returns what you want. You can run the following script on your database and export the results to excel.

You can come up with nice graphs showing why your data base is big and what partitioning strategy to choose. For example I ran this query for a project with a 10GB database. I could not understand why only a few million small records  occupy 10GB. The result was expectable. There was an exception log table which occupied half of the space and dropping that table saved so much time for everybody.


DECLARE @spu TABLE(NAME VARCHAR(max), [ROWS] BIGINT, reserved VARCHAR(50), DATA VARCHAR(50), index_size VARCHAR(50), unused VARCHAR(50)
)

DECLARE @TName VARCHAR(80)

DECLARE C CURSOR FAST_FORWARD READ_ONLY FOR
SELECT o.name FROM sys.objects AS o
WHERE o.schema_id = SCHEMA_ID('dbo')
AND o.type = 'U'

OPEN C

FETCH NEXT FROM C INTO @Tname

WHILE @@FETCH_STATUS = 0
BEGIN

	INSERT INTO @spu
	EXEC sys.sp_spaceused @objname = @TName -- nvarchar(776)

FETCH NEXT FROM C INTO @Tname

END

CLOSE C
DEALLOCATE C

SELECT * FROM @spu ORDER BY DATA DESC