Unboxing MS SQL Server 2012 – The new analytic functions

As mentioned in my previous blot post, SQL server 2012 comes with 8 new analytic functions:

  • LEAD
  • LAG
  • LAST_VALUE
  • FIRST_VALUE
  • PERCENTILE_CONT
  • PERCENTILE_DISC
  • PERCENT_RANK
  • CUME_DIST

All the new analytic functions work with the window functions I described earlier in this blog series, and calculate an aggregate in the given window.
Let’s start by looking at the LAG and LEAD functions. The syntax looks like this

LAG/LEAD (scalar_expression [,offset] [,default])
OVER ( [ partition_by_clause ] order_by_clause )

Let’s add them to the query and dataset I introduced earlier:


SELECT TOP 10
[Date] ,
[Last] ,
LEAD([Last],1) OVER
(PARTITION  BY YEAR(Date), MONTH(Date), Name ORDER BY Date DESC ) AS 'Lead 1' ,
LAG([Last],1) OVER
(PARTITION  BY YEAR(Date), MONTH(Date), Name ORDER BY Date DESC ) AS 'Lag 1' ,
LAG([Last],2) OVER
(PARTITION  BY YEAR(Date), MONTH(Date), Name ORDER BY Date DESC ) AS 'Lag 2'
FROM dbo.Stock ORDER BY DATE DESC

Which yield the result:


Date                    Last                  Lead 1                Lag 1                 Lag 2

----------------------- --------------------- --------------------- --------------------- ---------------------

2012-04-19 00:00:00.000 51,45                 51,65                 NULL                  NULL

2012-04-18 00:00:00.000 51,65                 53,35                 51,45                 NULL

2012-04-17 00:00:00.000 53,35                 55,55                 51,65                 51,45

2012-04-16 00:00:00.000 55,55                 48,95                 53,35                 51,65

2012-04-13 00:00:00.000 48,95                 50,55                 55,55                 53,35

2012-04-12 00:00:00.000 50,55                 50,65                 48,95                 55,55

2012-04-11 00:00:00.000 50,65                 50,20                 50,55                 48,95

2012-04-10 00:00:00.000 50,20                 52,50                 50,65                 50,55

2012-04-09 00:00:00.000 52,50                 52,50                 50,20                 50,65

2012-04-06 00:00:00.000 52,50                 52,50                 52,50                 50,20

The LAG and LEAD functions are pretty self-explaining, but it is a feature that I often need in my work, and this new way of getting those values are a great addition to the SQL Server.

The FIRST/LAST_Value works pretty much the same way, and are just as self-explaining. The syntax looks like this

FIRST/LAST_VALUE ( [scalar_expression )     OVER ( [ partition_by_clause ] order_by_clause rows_range_clause )

Pay attention to the rows_range clause, if you want to make sure that the first/last row function doesn’t stop at the current row. Let’s look at the SQL:

SELECT TOP 50
[Date] ,
[Last] ,
FIRST_VALUE([Last]) OVER
        (PARTITION  BY YEAR(Date), MONTH(Date), Name ORDER BY Date ASC )
        AS 'First value in month' ,
LAST_VALUE([Last]) OVER
        (PARTITION  BY YEAR(Date), MONTH(Date), Name ORDER BY Date ASC
        RANGE BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING )
        AS 'Last Value in month'
FROM dbo.Stock
ORDER BY DATE DESC

Result:

Date                    Last                  First value in month  Last Value in month

----------------------- --------------------- --------------------- ---------------------

2012-04-19 00:00:00.000 51,45                 55,30                 51,45

2012-04-18 00:00:00.000 51,65                 55,30                 51,45

2012-04-17 00:00:00.000 53,35                 55,30                 51,45

2012-04-16 00:00:00.000 55,55                 55,30                 51,45

2012-04-13 00:00:00.000 48,95                 55,30                 51,45

2012-04-12 00:00:00.000 50,55                 55,30                 51,45

2012-04-11 00:00:00.000 50,65                 55,30                 51,45

2012-04-10 00:00:00.000 50,20                 55,30                 51,45

2012-04-09 00:00:00.000 52,50                 55,30                 51,45

2012-04-06 00:00:00.000 52,50                 55,30                 51,45

2012-04-05 00:00:00.000 52,50                 55,30                 51,45

2012-04-04 00:00:00.000 52,50                 55,30                 51,45

2012-04-03 00:00:00.000 54,95                 55,30                 51,45

2012-04-02 00:00:00.000 55,30                 55,30                 51,45

2012-03-30 00:00:00.000 56,60                 56,45                 56,60

2012-03-29 00:00:00.000 55,60                 56,45                 56,60

2012-03-28 00:00:00.000 58,20                 56,45                 56,60

2012-03-27 00:00:00.000 58,15                 56,45                 56,60

2012-03-26 00:00:00.000 60,95                 56,45                 56,60

2012-03-23 00:00:00.000 60,00                 56,45                 56,60

2012-03-22 00:00:00.000 58,25                 56,45                 56,60

2012-03-21 00:00:00.000 58,80                 56,45                 56,60

2012-03-20 00:00:00.000 59,30                 56,45                 56,60

2012-03-19 00:00:00.000 62,40                 56,45                 56,60

2012-03-16 00:00:00.000 63,95                 56,45                 56,60

The last 4 functions PERCENTILE_CONT, PERCENTILE_DISC, PERCENT_RANK, CUME_DIST are Window Distribution Functions used to do statistical computations. Coming up with an example about how to use those will take too long, but the syntax is similar to the other functions, and if you know the math behind, they are straightforward to use.

Conclusion

But to sum up. The conclusion is the same as in my previous blog post in this series. The new analytic functions a very nice addition to the new SQL server, and we can now do more advanced analytics in a much simpler way.

Monitor the process of long running SQL queries like DBCC SHRINKDATABASE

When a query runs a little longer than I would expect, I sometimes starts to get nervous. Especially in the cases, where there is no status telling me when it will finish; Like for instance when I run a shrink database cmd.

You can get information about the queries running like this.


SELECT 
T.text, 
R.Status, 
R.Command, 
DatabaseName = db_name(R.database_id) , 
R.cpu_time, 
R.total_elapsed_time, 
R.percent_complete
FROM sys.dm_exec_requests R 
CROSS APPLY sys.dm_exec_sql_text(R.sql_handle) T


Especially the percent_complete is nice.

Unboxing MS SQL Server 2012 – The new window functions

SQL server 2012 comes with a new set of window functions and analytically possibilities, which I will look into in this blog post.

Prior to 2012
Prior to the 2012 version, windows functions were also possible. The syntax looked like this:

Ranking Window Functions
< OVER_CLAUSE > :: =
OVER ( [ PARTITION BY value_expression , … [ n ] ]
<ORDER BY_Clause> )

Aggregate Window Functions
< OVER_CLAUSE > :: =
OVER ( [ PARTITION BY value_expression , … [ n ] ] )

Where the over clause creates a window for the entire result set, and the PARTITION BY divide the window to a lot of smaller distinct windows. This can be used, when you look at a row with a value, for instance, to a product and day, and you want to compare it to the value for the same product to month or year.

To demonstrate both the new and old version I’ll introduce a dataset.

Let’s look at the Vestas share prices. I’ll create a table only with the share prices for Vestas, like so

CREATE TABLE [dbo].[Stock]
(
[StockID] [int] IDENTITY(1,1) NOT NULL,
[Name] [nvarchar](50) NULL,
[Date] [datetime] NULL,
[OpeningPrice] [money] NULL,
[High] [money] NULL,
[Low] [money] NULL,
[Last] [money] NULL,
[Volume] [float] NULL
)

Then I import the data for the last couple of years from http://www.euroinvestor.dk/. Now we have a realistic dataset.

Let’s look at the closing price for the last 100 days like this:

SELECT top 100 StockID, Name, Date, last
FROM dbo.Stock
ORDER BY Date Desc

This yield the result

Using the window functions prior to the 2012 version, we could use the window functions like this

Giving this result set

You can use these values to do more intensive analytics etc.

Of course, the aggregate functions could also be achieved with a subselect, or a join to the same SQL statement grouped on the same as it is partitioned by in window functions, but this would for obvious reasons perform worse and yield a more complicated SQL.

The aggregate functions also provide the same as what will be performed by analysis services, when you are build a data mart with aggregated values.

The Row number is often used in data collection scenarios where you want to filter out duplicates.

The 2012 version

So, that is how window functions worked prior to the 2012 edition. So lets look at some of the things they have improved. First of all, the syntax has been extended as shown below:

OVER (
[ <PARTITION BY clause> ]
[ <ORDER BY clause> ]
[ <ROW or RANGE clause> ]
)

<PARTITION BY clause> ::=
PARTITION BY value_expression , … [ n ]

<ORDER BY clause> ::=
ORDER BY order_by_expression
[ COLLATE collation_name ]
[ ASC | DESC ]
[ ,…n ]

<ROW or RANGE clause> ::=
{ ROWS | RANGE } <window frame extent>

<window frame extent> ::=
{ <window frame preceding>
| <window frame between>
}

<window frame between> ::=
BETWEEN <window frame bound> AND <window frame bound>

<window frame bound> ::=
{ <window frame preceding>
| <window frame following>
}

<window frame preceding> ::=
{
UNBOUNDED PRECEDING
| <unsigned_value_specification> PRECEDING
| CURRENT ROW
}

<window frame following> ::=
{
UNBOUNDED FOLLOWING
| <unsigned_value_specification> FOLLOWING
| CURRENT ROW
}

<unsigned value specification> ::=
{ <unsigned integer literal> }

As you may have noticed, one of the short comings prior to the 2012 version is that you can’t add an ‘order by’ to the aggregate functions, hence you can’t calculate a running AVG or SUM. That has been changed in the new version. Let’s try to add an order by to the previous SQL. And let’s just delete the collumns that makes no sense. So the SQL will look like so:

Then we have a running AVG in each month.

All of my examples probably don’t make much sense to a stock broker. I’m only looking at the SQL functions, and not the data. One thing I do know, is however, that the running AVG won’t make much sense, starting all over from the first each month. Luckily there has also been added a sliding window function, so we can look at the previous/next n rows when doing our calculations.

This can now be done like so.

Where the AVG now is the AVG of the current row, and the 2 previous rows. The extension of the ORDER BY, now overrides the default partitioning.  The result set:

Conclusion

I have looked into some of the new window functions, and it is safe to say, that they have been significantly improved, and can now be used to do more advanced analytics in a simpler way.
A new addition to the window functions is also some new analytic functions that can be used. This is: CUME_DIST, LEAD, FIRST_VALUE, PERCENTILE_CONT, LAG, PERCENTILE_DISC, LAST_VALUE, PERCENT_RANK.

I’ll cover those in the next blog post.

 

Unboxing MS SQL Server 2012 – Installing the database server and the Adventure works databases

The new MS SQL server version 2012 was released not so long ago. With this article series, I will log my first experiences with it, regarding installation and setup, new features, look and feel etc.

First things first. Let’s install it. If you don’t have a developer edition in you company, download a trial version from MS’s website.

Just download the ISO, and install it from there.

Figure 1 – download it

Figure 1 – download it

 

Figure 2 - Double click setup.exe

Figure 2 - Double click setup.exe

 

Figure 3 – installation process looks the same as previous version -> Clik New SQL server stand alone installation

Figure 3 – installation process looks the same as previous version -> Clik New SQL server stand alone installation

 

Figure 4 – If you downloaded the Trial version as me, select the evaluation version

Figure 4 – If you downloaded the Trial version as me, select the evaluation version

 

Figure 5 – Let’s just select SQL server feature installation

Figure 5 – Let’s just select SQL server feature installation

 

After this, just click through with the default setting and start the installation.

After the installation, you can log on to the database server with SQL Server Management Studio (SSMS). Not much to see now, except from a more pretty interface, that looks as though it is part of Visual Studio 2010.

All development is now done visual studio 2010, and there are a lot of improvements, but more of that in a later article. Let’s first get some databases to work with. Let’s Install the adventureworks databases.

Apparently in 2012, there is no install file to install the databases. Instead you can download the mdf files separately from codeplex, and attach them. Do It here: http://msftdbprodsamples.codeplex.com/

Let’s just start with the two relational databases.

Figure 6 – Right click on databases->Select attach database, and select the downloaded mdf files. The OLTP  and the DWH databases.

Figure 6 – Right click on databases->Select attach database, and select the downloaded mdf files. The OLTP and the DWH databases.

 

We are now ready to explore the new SQL features. We’ll start with that in the next blog post.