My GOTO program

I have not completely decided on my GOTO program yet, but here is a list of some of the sessions I’m looking forward to 🙂  Some of them are at the same time, so I really hope that they will be available on VOD afterwards.

Don’t be religious about the Internet of Things!
As an owner of a BI company, I have noticed that all of my competitors are talking about the internet of things, and yet, I still do not know what it is 🙂 So yes, I’m looking very much forward to this session, where I’m hoping to get at good introduction to the subject.

I have a feeling that it is somehow, someones view on big data, but from another perspective. I have tried to take a look at the comic book by The Alexandra Institute, but I’m still not completely sure about what it is.

The “Making sense of data” track
Well, the job description of every business intelligence developer, is properly to “make sense of data”, so I’m guessing this is a must 🙂

The track “Open Data / eGov“
A part of big data (In my line of work, it is important to say big data a lot these days), is to enrich a companies data warehouses with all kinds of market data. So all information about how to do that is welcome.

The NoSQL, MySQL and Big data track(There is that word again)
There is a trend at the current time, that make developers use no SQL databases. As a consumer of the data the developers applications create, I have to know there mindset behind this, so sessions about this, is also of big interest to me.

Well I guess that is it. Else I’ll just go with the flow, and get inspired by the people I talk to at the conference.

Attending GOTO as a Business intelligence developer

Back in my .NET developer days, like 100 years or so ago, I went to my first GOTO conference(Or JAOO as it was called back then). Back then, I loved .NET and anything that related to it, but that wasn’t the reason I went to GOTO.

If I wanted to know more about the changes from 1.0 to 1.1, it was proberly a MS conference I should have went to.

The reason for attending to GOTO, was to get inspired. Get inspired by all the different tracks, learning about what was going on with all the other languages and trends.

So, fast forward approx. a 100 year and a technology lane shift to a BI developer. Why should I go now??? Well, the same reason. If I want to know more about how the new indexes works in MS SQL server, or for instance the new in memory technologies from MS, other conferences offer that.

But my world changes fast, and a lot of new technologies are emerging, and even as a BI developer, GOTO still offers a lot of inspiration. All the next big things in my line of work, like Big data, the internet of things, no sql, open data, and a lot of other things, all has a track at GOTO.

BTW, If I haven’t mentioned it, I’m attending GOTO this year, and if you are like me, I guess I’ll see you there 🙂

Error code: 0x80040E21 Reading from SSAS with SSIS

So, I upgraded a big project for a customer to 2012 from 2008R2. Everything worked fine, except the last part, where some datamarts was loaded with SSIS from SSAS. They had worked with no problem before, but suddenly, I received the above error code. It made no sense to me, and it was hard to find out, what precisely had happen. The worst, was, that I was able to see the preview in the datasource, but not run the package.

A workaround was to create a linked server to the cube, and then just select from that like so

SELECT * FROM openquery(<ServerName>, ‘<MDX>’)

And that was, properly they way I would have implemented it myself. But having hundreds of packages, rewriting them, was not the way to go.
Instead, I found an old post on google, where a person had the same problem, in an older SQL server. The solution was, to open the connection, click the all botton, and type in “(Format=Tabular)” in the extended properties field. Se the image bellow.

So, yeah…., YIEAKS. But hey, if you find yourself having the same problem, you now know what the solution is 🙂

Partitioning revisited

When building big data warehouses(DWH), partitioning tables is something you want to consider.

There is some good examples/explanations in BOL, but with this blog post I just want to give som easy hands on examples, so you can get started right away.

What
Partitioning tables is splitting up the data in a table, so the data is stored sepearatly, and can be managed seperatly. The SQL user can use the table without knowing the table is partitioned.

Why
Why do we want to partition big tables in for instance an enterprise DWH?
Well some benefits are listed on the SQL CAT(See resources) page, and they say it all

  • Partitioning can provide manageability and often performance benefits.
  • Faster, more granular index maintenance.
  • More flexible backup / restore options.
  • Faster data loading and deleting

How
Partitions can be managed both via management studio and with raw SQL. It is actually much easier to understand when using SQL, so I will only use SQL in this blog post.

Lets just start completely from scratch, by creating the partition function. It is simply used for defining the ranges in the partitions.

CREATE PARTITION FUNCTION myRangePF1 (datetime)
AS RANGE LEFT FOR VALUES ('2013.01.01', '2013.02.01', '2013.04.01') ;

After that, we want to create the partition scheme, which is used for mapping the partitions to a filegroup. For the case of simplicity we just use the primary filegroup In this example , but you don’t want to do that in production.

CREATE PARTITION SCHEME myRangePS1
AS PARTITION myRangePF1
TO ([PRIMARY], [PRIMARY], [PRIMARY], [PRIMARY]) ;

Next, let’s create som partitioned tables

CREATE TABLE FactSomething (DW_TS DATETIME, factValue int)
ON myRangePS1 (DW_TS) ;

--Create an archieve table, where we want to store old values
CREATE TABLE FactSomethingArchive (DW_TS DATETIME, factValue int)
ON myRangePS1 (DW_TS) ;

Then let us insert some test data, so we have some data to work with.

INSERT INTO [dbo].[FactSomething] ([DW_TS] ,[factValue])
VALUES ('2010.01.01',1)

INSERT INTO [dbo].[FactSomething] ([DW_TS] ,[factValue])
VALUES ('2010.01.01',1)

INSERT INTO [dbo].[FactSomething] ([DW_TS] ,[factValue])
VALUES ('2010.01.01',1)

INSERT INTO [dbo].[FactSomething] ([DW_TS] ,[factValue])
VALUES ('2013.01.01',1)

INSERT INTO [dbo].[FactSomething] ([DW_TS] ,[factValue])
VALUES ('2013.02.02',1)

INSERT INTO [dbo].[FactSomething] ([DW_TS] ,[factValue])
VALUES ('2014.01.01',1)

You can use this select stm. to get all the necessary information about the partitions:

--Long SQL with all the information regarding the partition table
SELECT t.name AS TableName,p.rows,
p.partition_number,
p.partition_id,
f.function_id,
f.type_desc,
r.boundary_id,
r.value AS BoundaryValue
FROM sys.tables AS t
JOIN sys.indexes AS i
ON t.object_id = i.object_id
JOIN sys.partitions AS p
ON i.object_id = p.object_id AND i.index_id = p.index_id
JOIN sys.partition_schemes AS s
ON i.data_space_id = s.data_space_id
JOIN sys.partition_functions AS f
ON s.function_id = f.function_id
LEFT JOIN sys.partition_range_values AS r
ON f.function_id = r.function_id and r.boundary_id = p.partition_number
WHERE t.name = 'FactSomething' -- AND i.type <= 1
ORDER BY p.partition_number

Which in our case will show the following result.

Where the rows column show the number of rows in the partitions. Lets say we want to archive the oldest partition. Then we just switch the partition like so

ALTER TABLE dbo.FactSomething
SWITCH PARTITION 1
TO dbo.FactSomethingArchive PARTITION 1

If we run the partition info SQL again, we now get the following result

Where you can see, that we now have 0 rows in the first partition. If you do a select * from the archieve table, we will now have 4 rows there.

Lets say, that we don’t want to have the old partitions in our fact table anymore, we just run this merge stm.

ALTER PARTITION FUNCTION myRangePF1()
MERGE RANGE ('2013-01-01')

partition info SQL will now return only 3 rows as shown below.

To sum up 🙂

I have shown how to create partitioned tables, and how to use them. Next step will be managing there indexes, and how to make sliding window partitioning. Maybe there will be a post about those subjects here later 🙂

Resource

http://sqlcat.com/sqlcat/b/top10lists/archive/2008/02/06/top-10-best-practices-for-building-a-large-scale-relational-data-warehouse.aspx

http://msdn.microsoft.com/en-us/library/ms190787.aspx

SQL TIP-> Display seconds as HH:MM:SS

In many BI application you have stored a duration as seconds. To display that on the format HH:MM:SS just use the following SQL(Replace 100 with your column name):

SELECT
convert(char(8),dateadd(second,100,0),108) as [Duration]

Wich yield the result

00:01:40

As in my previous blogpost where I did the same in analysis services, this only works if the duration is below 24 hours, else you have to do some additional logic.

Fun with SQL: Joining on column name and row field value

Ok, it is that time of the day where I’m starting to get high on caffeine 🙂

So this is the scenario. I have an application that is automatically generating a table in my database. It can add columns on the fly(But thankfully not delete).

It could look like this:

CREATE TABLE Cube.AppGenereatedTable
(
x1value float,
x2value float,
ID_Region INT
)

We can now query information about that table like so:

select column_name,ordinal_position,DATA_TYPE , *  from information_schema.columns
where table_name = 'AppGenereatedTable'
AND TABLE_SCHEMA='cube'

The problem is, that right now, I do not know what to do with that data, and when I’m not at the customer anymore, more fields can be added by the application. What I would like, is a place where the customer can add more information about that column, and decide if that column should be included in some logic. To do that, I add a “metadata” table.

CREATE TABLE Dim.SomeMetadata
(
ID_SomeMetadata INT,
ColName nvarchar(100),
friendlyName nvarchar(100),
includeInETLLogig bit
)

Now I want to put it all in a fact table, where I have 1 row / column value. It could look like this:

CREATE TABLE Fact.SomeFactTable
(
Value float,
ID_Metadata INT
)

Ehhh, Ok, so know I will do what the title is saying. To load the data, I need to create the sql automaticly like so:

--Use the colnr to reference all collumns
DECLARE @colnr AS INT
DECLARE @sql AS NVARCHAR(MAX)
SET @colnr = 1
--Do for as many collumns you expect there ever will be. Do some automation here if you like
WHILE (@colnr < 50)
BEGIN
SET  @sql =
'INSERT INTO Fact.SomeFactTable
(Value, ID_Metadata)

--And now we do the SQL with the join. Pay attention 

SELECT '+(select column_name from information_schema.columns
where table_name = 'AppGenereatedTable'
AND TABLE_SCHEMA='cube'
AND ordinal_position=@colnr AND DATA_TYPE = 'float')+' as value
,someMetaData.ID_SomeMetadata
FROM Cube.AppGenereatedTable appGenereatedTable
INNER JOIN Dim.SomeMetadata someMetaData on someMetaData.ColName='''+(select
column_name from information_schema.columns
where table_name = 'AppGenereatedTable'
AND TABLE_SCHEMA='cube'
AND ordinal_position=@colnr AND DATA_TYPE = 'float')+''''

print @colnr
print @sql
BEGIN TRY
exec( @sql )
END TRY
BEGIN CATCH
--Catch error. Do nothing
END CATCH
SET @colnr = @colnr + 1
END

Well, I don’t know if there is other than me, who can use this for something. Next post will be about precisely about what I’m using this to 🙂

SQL Convert datetime to Salesforce date format

If anyone, as I, suffers the pain of generating SOQL to salesforce, here’s a little tip 🙂 If you have a datetime field, which you have to use in the SOQL, then convert it like so:

SELECT CONVERT(NVARCHAR,GETDATE(),126)

Then you get it in the format

YYYY-MM-DDTHH:MM:SS.XXX

As salesforce wants it 🙂

SSAS Tip -> Format seconds as HHHH:MM:SS

A common way of showing durations in SSAS, is to format them as HH:MM:SS. A common scenario is to store the durations in seconds. To display them in the HH:MM:SS format, first we have to convert the duration to hours, and then put on the format, like so

CREATE MEMBER CURRENTCUBE.[MEASURES].[someMeasure] AS

[Measures].[someMesureInSeconds]/86400,FORMAT_STRING = “HH:MM:SS”,…

However, this only works if the duration is less than 24 hours.

If it is more, then there is no out the box way to show it in the correct format. To get around that problem, you have to create a mearure as a string. First calc the hours, and then add the minutes and seconds formated as above, like so:

CREATE MEMBER CURRENTCUBE.[Measures].[MesureX]
AS
CSTR(INT([Measures].[LongDurationInSec]/(60*60))) + “:”
+ FORMAT(CDATE([Measures].[LongDurationInSec]/86400
– INT([Measures].[LongDurationInSec]/86400)), “mm:ss”)
FORMAT_STRING = “[HH]:MM:SS”, …

SQL TIP ->Other uses of the SQL GO CMD

As most developers know, the SQL GO CMD signals the end of a batch of SQL statements.

However, another use of it, is to use it to execute a SQL statement multiple times. Just add a number after it. For instance:

SELECT GETDATE() as now
GO 5

Which will execute the Select statement 5 times.
Or, something maybe more useful, create a table with integers from 1-5:

CREATE TABLE #numbers(number int)
GO --> Execute the first batch
INSERT INTO #numbers
VALUES((SELECT COALESCE(MAX(number),0) FROM #numbers) +1)
GO 5-->Do 5 times
SELECT * FROM #numbers

Which will give you the result:

Batch execution completed 5 times.
number
-----------
1
2
3
4
5
(5 row(s) affected)

Well, that was it. Another use of the go cmd. Have fun 🙂

SQL TIP -> Recursive CTEs

This morning I  had to do some recursive common table experessions. It is acctually relativly simple to do.

A common scenario:

We have a parent child relation in a selfreferencing table. That is common when we have an employee manager relationship or organisation relationship.

Let’s just look at  the  employee manager relationship.

Let’s create an example tmp table like so

CREATE TABLE #Employee (
EmployeeID  INT,
EmployeeName NVARCHAR(50),
ManagerID INT
)

And let’s insert some test data into the table

INSERT INTO #Employee(EmployeeID, EmployeeName,ManagerID) VALUES (1,'Mr. Manager',NULL)
INSERT INTO #Employee(EmployeeID, EmployeeName,ManagerID) VALUES (2,'Jensen',1)
INSERT INTO #Employee(EmployeeID, EmployeeName,ManagerID) VALUES (3,'Hansen',1)
INSERT INTO #Employee(EmployeeID, EmployeeName,ManagerID) VALUES (4,'Clausen',2)

Where, as you can see,  Mr. Manager is at the top, and Clausen is at the bottom.

We now want to use as recursive common table expression(CTE) to show what level a person is on, in the employee hierarchy.

To do that, we start with the highest level(Where managerID is Null). Then we do a union all with a select which is referencing it self, like so:

WITH EmployeeCTE (EmployeeID,EmployeeName, ManagerID , Level) AS
(
SELECT EmployeeID, EmployeeName,ManagerID, 0 AS Level
FROM #Employee WHERE
ManagerID IS NULL
UNION ALL
SELECT e.EmployeeID, e.EmployeeName,e.ManagerID, Level+1
FROM #Employee   e
INNER JOIN EmployeeCTE  cte ON cte.EmployeeID=e.ManagerID
)
SELECT * FROM EmployeeCTE

And the result can be seen below. This SQL shows what level an employee is on.

EmployeeID  EmployeeName ManagerID   Level
1           Mr. Manager NULL        0
2           Jensen      1           1
3           Hansen      1           1
4           Clausen     2           2

If we want to see an employees way to the top, we can select from the same CTE like so:

SELECT  lvl0.EmployeeName as lvl0Name,  lvl1.EmployeeName AS lvl1Name,      lvl2.EmployeeName AS lvl2Name FROM EmployeeCTE lvl2
INNER JOIN EmployeeCTE lvl1 ON lvl1.EmployeeID=lvl2.ManagerID
INNER JOIN EmployeeCTE lvl0 ON lvl0.EmployeeID=lvl1.ManagerID
WHERE lvl2.Level=2

And the result:

lvl0Name     lvl1Name     lvl2Name
Mr. Manager Jensen        Clausen

Now you know how to do recursive CTE 🙂

Have fun