Archive

Posts Tagged ‘TSQL’

T-SQL: Using common table expressions (CTE) to generate dates

January 11th, 2010 Kasper de Jonge

This is something I have used many times, and I always had to look it up on the internet. I'm designing a new data warehouse and have to populate my date dimension. In SQL Server 2008 we have common table expressions (CTEs), which we can use to generate a sequence of dates from a start date to an end date. This blog post explains how you can generate dates: http://smehrozalam.wordpress.com/2009/06/09/t-sql-using-common-table-expressions-cte-to-generate-sequences/

You can set your regional date / time language with SET LANGUAGE Dutch (where Dutch is your regional setting). This controls, among other things, the day and month names that the date functions return.
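As a rough illustration of the idea (not the code from the linked post), a recursive CTE can walk from a start date to an end date one day at a time. The variable names, the CTE name DateSequence, and the 2010 date range below are made up for this sketch.

```sql
-- Day and month names returned by DATENAME follow the session language.
SET LANGUAGE Dutch;

-- Hypothetical range for the date dimension; adjust as needed.
DECLARE @StartDate date = '20100101';
DECLARE @EndDate   date = '20101231';

WITH DateSequence (CalendarDate) AS
(
    -- Anchor member: the first date of the range.
    SELECT @StartDate
    UNION ALL
    -- Recursive member: add one day until the end date is reached.
    SELECT DATEADD(day, 1, CalendarDate)
    FROM DateSequence
    WHERE CalendarDate < @EndDate
)
SELECT CalendarDate,
       DATENAME(weekday, CalendarDate) AS DayName,
       DATENAME(month, CalendarDate)   AS MonthName
FROM DateSequence
-- The default recursion limit is 100 levels; 0 removes the limit.
OPTION (MAXRECURSION 0);
```

Without the MAXRECURSION hint, the query stops with an error once the sequence goes past 100 recursion levels, so any range longer than about three months needs it.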

Categories: SQL Server

How to trace the execution of a stored procedure?

August 27th, 2009 Kasper de Jonge

I was at a client that had some performance issues when calling stored procedures from code. They were unable to find the SQL statement responsible for the performance degradation.

You can use SQL Profiler to show the T-SQL statements inside your stored procedure together with their trace data. To do this, go to Trace Properties and check "Show all events". You will then be able to select two events (SP:StmtStarting and SP:StmtCompleted, under Stored Procedures) that look inside the stored procedures and show you all the T-SQL statements.


Categories: SQL Server

Remove duplicates from table using SQL Rank() function

August 27th, 2009 Kasper de Jonge

Having to de-duplicate rows from source tables is a very common task in the data integration world. We had been trying all kinds of T-SQL to filter out the duplicate values, because we needed to pick the row to keep based on certain criteria. On the internet I found the T-SQL function Rank().

It works like this:

No matter where the source data is stored, as a part of the ETL workflow, we will stage source data in a SQL Server 2005 table.

This is the staging table:

create table SourceWithDuplicates
(SurrogateKey int identity(1,1),
Code varchar(12),
Description varchar(20),
OtherAttribute1 int,
OtherAttribute2 Varchar(10),
TieBreakerColumn Varchar(12))

As a general rule I always add a surrogate key column to my staging tables (that makes some DBAs happy); and in this case it actually helps in the de-duplication process.

Some sample data:

Insert into SourceWithDuplicates values('Abc', 'Row Description 1', 1, 'Other1', '1')
Insert into SourceWithDuplicates values('Abc', 'Row Description 1', 1, 'Other1', '1')
Insert into SourceWithDuplicates values('Abc', 'Row Description 1', 1, 'Other2', '2')
Insert into SourceWithDuplicates values('Def', 'Row Description 2', 1, 'Other4', 'a')
Insert into SourceWithDuplicates values('Def', 'Row Description 2', 1, 'Other5', 'a')
Insert into SourceWithDuplicates values('Ghi', 'Row Description 3', 1, 'Other5', 'a')
Insert into SourceWithDuplicates values('Jkl', 'Row Description 4', 1, 'Other5', 'a')

The requirements:

Code and Description values need to be unique in the target table; where they are duplicated, choose the row with the highest value in the tie breaker column. If the value in the tie breaker column is also duplicated, then choose any of the rows.

So the data looks like:

SurrogateKey Code         Description          OtherAttribute1 OtherAttribute2 TieBreakerColumn
------------ ------------ -------------------- --------------- --------------- ----------------
1            Abc          Row Description 1    1               Other1          1
2            Abc          Row Description 1    1               Other1          1
3            Abc          Row Description 1    1               Other2          2
4            Def          Row Description 2    1               Other4          a
5            Def          Row Description 2    1               Other5          a
6            Ghi          Row Description 3    1               Other5          a
7            Jkl          Row Description 4    1               Other5          a

Following the requirements, only 4 rows should be kept: those with surrogate keys 3, 4 or 5, 6 and 7. Row 3 wins within the Abc group because it has the highest tie breaker value.
This is the query using the Rank() function that resolves our issue:

Select *,
rank() over (Partition by Code, description order by TieBreakerColumn desc, SurrogateKey) MyRank
from SourceWithDuplicates

and this is its output:

SurrogateKey Code         Description          MyRank
------------ ------------ -------------------- --------------------
3            Abc          Row Description 1    1
1            Abc          Row Description 1    2
2            Abc          Row Description 1    3
4            Def          Row Description 2    1
5            Def          Row Description 2    2
6            Ghi          Row Description 3    1
7            Jkl          Row Description 4    1

Notice that if we filter the result from the previous query using WHERE MyRank=1 we will get the rows to be kept.
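One detail worth spelling out: a window function such as rank() cannot be referenced directly in a WHERE clause, so the ranked query has to be wrapped first. A minimal sketch of that filter, using a CTE whose name (RankedRows) is made up for this example:

```sql
WITH RankedRows AS
(
    -- Same ranking as the query above: best tie breaker first,
    -- then the lowest surrogate key to break any remaining tie.
    SELECT *,
           RANK() OVER (PARTITION BY Code, Description
                        ORDER BY TieBreakerColumn DESC, SurrogateKey) AS MyRank
    FROM SourceWithDuplicates
)
SELECT SurrogateKey, Code, Description,
       OtherAttribute1, OtherAttribute2, TieBreakerColumn
FROM RankedRows
WHERE MyRank = 1;  -- keep one row per Code / Description pair
```

With the sample data above this returns the rows with surrogate keys 3, 4, 6 and 7. Because SurrogateKey is unique, including it in the ORDER BY guarantees that exactly one row per group gets rank 1.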

Very nice! I found this at:

http://rafael-salas.blogspot.com/2007/04/remove-duplicates-using-t-sql-rank.html

Categories: SQL Server