Claus on Code

A data dudes random musings about code, bugs, products, life etc.


Unboxing MS SQL Server 2012 – The new analytic functions

As mentioned in my previous blot post, SQL server 2012 comes with 8 new analytic functions:

  • LEAD
  • LAG
  • LAST_VALUE
  • FIRST_VALUE
  • PERCENTILE_CONT
  • PERCENTILE_DISC
  • PERCENT_RANK
  • CUME_DIST

All the new analytic functions work with the window functions I described earlier in this blog series, and calculate an aggregate in the given window.
Let’s start by looking at the LAG and LEAD functions. The syntax looks like this

LAG/LEAD (scalar_expression [,offset] [,default])
OVER ( [ partition_by_clause ] order_by_clause )

Let’s add them to the query and dataset I introduced earlier:


SELECT TOP 10
[Date] ,
[Last] ,
LEAD([Last],1) OVER
(PARTITION  BY YEAR(Date), MONTH(Date), Name ORDER BY Date DESC ) AS 'Lead 1' ,
LAG([Last],1) OVER
(PARTITION  BY YEAR(Date), MONTH(Date), Name ORDER BY Date DESC ) AS 'Lag 1' ,
LAG([Last],2) OVER
(PARTITION  BY YEAR(Date), MONTH(Date), Name ORDER BY Date DESC ) AS 'Lag 2'
FROM dbo.Stock ORDER BY DATE DESC

Which yield the result:


Date                    Last                  Lead 1                Lag 1                 Lag 2

----------------------- --------------------- --------------------- --------------------- ---------------------

2012-04-19 00:00:00.000 51,45                 51,65                 NULL                  NULL

2012-04-18 00:00:00.000 51,65                 53,35                 51,45                 NULL

2012-04-17 00:00:00.000 53,35                 55,55                 51,65                 51,45

2012-04-16 00:00:00.000 55,55                 48,95                 53,35                 51,65

2012-04-13 00:00:00.000 48,95                 50,55                 55,55                 53,35

2012-04-12 00:00:00.000 50,55                 50,65                 48,95                 55,55

2012-04-11 00:00:00.000 50,65                 50,20                 50,55                 48,95

2012-04-10 00:00:00.000 50,20                 52,50                 50,65                 50,55

2012-04-09 00:00:00.000 52,50                 52,50                 50,20                 50,65

2012-04-06 00:00:00.000 52,50                 52,50                 52,50                 50,20

The LAG and LEAD functions are pretty self-explaining, but it is a feature that I often need in my work, and this new way of getting those values are a great addition to the SQL Server.

The FIRST/LAST_Value works pretty much the same way, and are just as self-explaining. The syntax looks like this

FIRST/LAST_VALUE ( [scalar_expression )     OVER ( [ partition_by_clause ] order_by_clause rows_range_clause )

Pay attention to the rows_range clause, if you want to make sure that the first/last row function doesn’t stop at the current row. Let’s look at the SQL:

SELECT TOP 50
[Date] ,
[Last] ,
FIRST_VALUE([Last]) OVER
        (PARTITION  BY YEAR(Date), MONTH(Date), Name ORDER BY Date ASC )
        AS 'First value in month' ,
LAST_VALUE([Last]) OVER
        (PARTITION  BY YEAR(Date), MONTH(Date), Name ORDER BY Date ASC
        RANGE BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING )
        AS 'Last Value in month'
FROM dbo.Stock
ORDER BY DATE DESC

Result:

Date                    Last                  First value in month  Last Value in month

----------------------- --------------------- --------------------- ---------------------

2012-04-19 00:00:00.000 51,45                 55,30                 51,45

2012-04-18 00:00:00.000 51,65                 55,30                 51,45

2012-04-17 00:00:00.000 53,35                 55,30                 51,45

2012-04-16 00:00:00.000 55,55                 55,30                 51,45

2012-04-13 00:00:00.000 48,95                 55,30                 51,45

2012-04-12 00:00:00.000 50,55                 55,30                 51,45

2012-04-11 00:00:00.000 50,65                 55,30                 51,45

2012-04-10 00:00:00.000 50,20                 55,30                 51,45

2012-04-09 00:00:00.000 52,50                 55,30                 51,45

2012-04-06 00:00:00.000 52,50                 55,30                 51,45

2012-04-05 00:00:00.000 52,50                 55,30                 51,45

2012-04-04 00:00:00.000 52,50                 55,30                 51,45

2012-04-03 00:00:00.000 54,95                 55,30                 51,45

2012-04-02 00:00:00.000 55,30                 55,30                 51,45

2012-03-30 00:00:00.000 56,60                 56,45                 56,60

2012-03-29 00:00:00.000 55,60                 56,45                 56,60

2012-03-28 00:00:00.000 58,20                 56,45                 56,60

2012-03-27 00:00:00.000 58,15                 56,45                 56,60

2012-03-26 00:00:00.000 60,95                 56,45                 56,60

2012-03-23 00:00:00.000 60,00                 56,45                 56,60

2012-03-22 00:00:00.000 58,25                 56,45                 56,60

2012-03-21 00:00:00.000 58,80                 56,45                 56,60

2012-03-20 00:00:00.000 59,30                 56,45                 56,60

2012-03-19 00:00:00.000 62,40                 56,45                 56,60

2012-03-16 00:00:00.000 63,95                 56,45                 56,60

The last 4 functions PERCENTILE_CONT, PERCENTILE_DISC, PERCENT_RANK, CUME_DIST are Window Distribution Functions used to do statistical computations. Coming up with an example about how to use those will take too long, but the syntax is similar to the other functions, and if you know the math behind, they are straightforward to use.

Conclusion

But to sum up. The conclusion is the same as in my previous blog post in this series. The new analytic functions a very nice addition to the new SQL server, and we can now do more advanced analytics in a much simpler way.



Leave a Reply

Your email address will not be published. Required fields are marked *