Claus on Code

A data dudes random musings about code, bugs, products, life etc.


Partitioning revisited

When building big data warehouses(DWH), partitioning tables is something you want to consider.

There is some good examples/explanations in BOL, but with this blog post I just want to give som easy hands on examples, so you can get started right away.

What
Partitioning tables is splitting up the data in a table, so the data is stored sepearatly, and can be managed seperatly. The SQL user can use the table without knowing the table is partitioned.

Why
Why do we want to partition big tables in for instance an enterprise DWH?
Well some benefits are listed on the SQL CAT(See resources) page, and they say it all

  • Partitioning can provide manageability and often performance benefits.
  • Faster, more granular index maintenance.
  • More flexible backup / restore options.
  • Faster data loading and deleting

How
Partitions can be managed both via management studio and with raw SQL. It is actually much easier to understand when using SQL, so I will only use SQL in this blog post.

Lets just start completely from scratch, by creating the partition function. It is simply used for defining the ranges in the partitions.

CREATE PARTITION FUNCTION myRangePF1 (datetime)
AS RANGE LEFT FOR VALUES ('2013.01.01', '2013.02.01', '2013.04.01') ;

After that, we want to create the partition scheme, which is used for mapping the partitions to a filegroup. For the case of simplicity we just use the primary filegroup In this example , but you don’t want to do that in production.

CREATE PARTITION SCHEME myRangePS1
AS PARTITION myRangePF1
TO ([PRIMARY], [PRIMARY], [PRIMARY], [PRIMARY]) ;

Next, let’s create som partitioned tables

CREATE TABLE FactSomething (DW_TS DATETIME, factValue int)
ON myRangePS1 (DW_TS) ;

--Create an archieve table, where we want to store old values
CREATE TABLE FactSomethingArchive (DW_TS DATETIME, factValue int)
ON myRangePS1 (DW_TS) ;

Then let us insert some test data, so we have some data to work with.

INSERT INTO [dbo].[FactSomething] ([DW_TS] ,[factValue])
VALUES ('2010.01.01',1)

INSERT INTO [dbo].[FactSomething] ([DW_TS] ,[factValue])
VALUES ('2010.01.01',1)

INSERT INTO [dbo].[FactSomething] ([DW_TS] ,[factValue])
VALUES ('2010.01.01',1)

INSERT INTO [dbo].[FactSomething] ([DW_TS] ,[factValue])
VALUES ('2013.01.01',1)

INSERT INTO [dbo].[FactSomething] ([DW_TS] ,[factValue])
VALUES ('2013.02.02',1)

INSERT INTO [dbo].[FactSomething] ([DW_TS] ,[factValue])
VALUES ('2014.01.01',1)

You can use this select stm. to get all the necessary information about the partitions:

--Long SQL with all the information regarding the partition table
SELECT t.name AS TableName,p.rows,
p.partition_number,
p.partition_id,
f.function_id,
f.type_desc,
r.boundary_id,
r.value AS BoundaryValue
FROM sys.tables AS t
JOIN sys.indexes AS i
ON t.object_id = i.object_id
JOIN sys.partitions AS p
ON i.object_id = p.object_id AND i.index_id = p.index_id
JOIN sys.partition_schemes AS s
ON i.data_space_id = s.data_space_id
JOIN sys.partition_functions AS f
ON s.function_id = f.function_id
LEFT JOIN sys.partition_range_values AS r
ON f.function_id = r.function_id and r.boundary_id = p.partition_number
WHERE t.name = 'FactSomething' -- AND i.type <= 1
ORDER BY p.partition_number

Which in our case will show the following result.

Where the rows column show the number of rows in the partitions. Lets say we want to archive the oldest partition. Then we just switch the partition like so

ALTER TABLE dbo.FactSomething
SWITCH PARTITION 1
TO dbo.FactSomethingArchive PARTITION 1

If we run the partition info SQL again, we now get the following result

Where you can see, that we now have 0 rows in the first partition. If you do a select * from the archieve table, we will now have 4 rows there.

Lets say, that we don’t want to have the old partitions in our fact table anymore, we just run this merge stm.

ALTER PARTITION FUNCTION myRangePF1()
MERGE RANGE ('2013-01-01')

partition info SQL will now return only 3 rows as shown below.

To sum up 🙂

I have shown how to create partitioned tables, and how to use them. Next step will be managing there indexes, and how to make sliding window partitioning. Maybe there will be a post about those subjects here later 🙂

Resource

http://sqlcat.com/sqlcat/b/top10lists/archive/2008/02/06/top-10-best-practices-for-building-a-large-scale-relational-data-warehouse.aspx

http://msdn.microsoft.com/en-us/library/ms190787.aspx



One response to “Partitioning revisited”

  1. Why does the partition cable have to be considered?

Leave a Reply

Your email address will not be published. Required fields are marked *