Archive

Posts Tagged ‘SSIS’

Developing SSIS, Visual studio not responsive

September 28th, 2009 Kasper de Jonge No comments

Last week i was with a customer who had trouble developing his package because visual studio was very slow responding. This happens because SSIS Designer tries to connect to each data source that is used by your package to validate the metadata associated with sources and destinations. This causes validation errors when the data sources are not available or slow to respond when there are a lot of query’s (multiple dataflows) or big query’s.

I made a blog post a while ago that will make sure that the validation is postponed or not done at all:
Open SSIS packages without validation using the properties
. This will speed things up a lot.

Categories: SSIS Tags:

Optimizing SSIS Package Performance

September 28th, 2009 Kasper de Jonge No comments

Duane Douglas (also of the great twitter aggregator http://twitter.com/SSISBI) had made two greatblog posts about “Optimizing SSIS Package Performance”:

Needless to say, SSIS packages should run as fast as possible.  However, accomplishing this is easier said than done.  Performance tuning SSIS packages isn’t for the faint of heart — it involves a lot of work and and a low-level understanding of what happens under-the-hood when a package is executing.

Very much worth the read:

Optimizing SSIS Package Performance – Part 1
Optimizing SSIS Package Performance – Part 2: The Data Flow Components

And while you’re at it the Top 10 SQL Server Integration Services Best Practices could gain you another great insight under the hood of SSIS.

Categories: SSIS Tags:

How we did it: Near real time BI with SSIS, SSRS, ASP .NET 3.5 hosted in the Cloud

September 2nd, 2009 Kasper de Jonge No comments

We recently released the first version of an operational, near real time, BI platform for monitoring packages in a package sorter for a dutch company. This blog post will describe how we solved their problem using Reporting Services, Integration Services, ASP .Net, hosted in the cloud using Amazon Elastic Compute Cloud.

The scenario

There are 4 sorting centers in the Netherlands where parcels are collected from the region where it is in. Parcels receives a unique code (Bar code) on collection.  The sorting centre then has to determine by postal code the destination of a parcel, parcels are eventually delivered by regional delivery stations. The Parcels are grouped by the sorting centre and during the night shipped to the delivery stations, the delivery station that is the farthest away gets its parcels first. When a parcel destination is closer then another sorting center it is send to that sorting center first, so parcel not within the range of the sorting center are coarsely sorted by region and parcels that are in range are finely sorted by street. Parcels are loaded upon a assembly line of a sorter machine in the sorting center, the postal code is scanned and translated to a digital postal code that will be attached to the bar code. The sorter then determines what the destination assembly line is, called chutes.

The current information is brought by reports full of numbers that are hard to interpret and an excel sheet full of VBA code that is mailed around the organisation. Each sorting centers hosts it’s own server running Reporting services 2000 and a Visual Basic service.

The challenge

The challenge the client had for us was:

  • Consolidate data from different sources on one report
  • Replace local machines with one central machine
  • Bring the right data to the right people, data from sorting center to a manager, data from a team to a team lead
  • Show the data as soon as it is available, near real time BI
  • The data has to be extendable with data from other sources
  • New ad-hoc reports have to be created in a short development cycle

How we did it: Project Startup

The project started with some brown paper session with different users from the organisation.  In these sessions we determined the essential data needed to manage the sorting centers. Focus of the first release was replacing the current reports and excel sheet with the new reporting  environment. The data will be delivered to our system by text files, every minute new files will be created and have to be imported and transformed by the application, estimate is 4 files per minute.

After designing reports for the different user groups, determining the MoSCow and analyzing the different input files we decided for the following architecture:

  • Loading and transforming data with SSIS
  • Database with SQL Server 2008, data in report tables with different aggregation levels, some data is needed at the lowest grain, but most data is over a larger time dimension.
  • Reports with Reporting Services 2008
  • Reporting portal with ASP.Net 3.5, reportviewer control and ASP.Net dynamic data for master data tables

While brainstorming with the client it appeared they would like to host the system outside their own infrastructure, we already had some experience with hosting at the Amazon Elastic Compute Cloud and decided to try to determine the feasibility of using Amazon for our bi system. Since the data is delivered in small files every minute a solution was quickly found. Servers hosted in the cloud files can fetch files as easily as local servers.  The server in the cloud fetches data using SFTP. The Amazon cloud is very flexible that let you quickly scale capacity, both up and down, as your computing requirements change.

Implementation

Below is a graphical representation of the architecture used:

how we did it

The application consists of the following steps:

  • Data from the system is enriched with describing master data, this describing data is stored in a relational datamodel that can be maintained by pages that are generated by ASP.Net dynamic data
  • Files are transferred to the system by a SSIS package that calls a SFTP application that will move (copy and remove) the files to the BI server, this package is scheduled every minute.
  • Files are loaded by two SSIS packages per filetype. One package fetches all files currently in a configured directory using a for loop that traverses the directory. These files are passed through to a package that will import the file into the system. The data from the file is loaded into the SSIS memory and enriched using the lookup component, when describing data is not found these rows will be written to an error table. The remaining data will be written to a table, with data on minute level. These filewatching packages are scheduled to run every minute. The current system performs excellent with 30 files in one minute.
  • A lot of data is added to the lowest grain table so to keep performance acceptable some aggregation is applied. Every minute a SSIS package is scheduled that selects the last 2 hours of data from minute table into a temporary table grouped by hour and summing the measures. The data is then merged into an Hour aggregation table using the SQL Server 2008 MERGE statement,  with an update of the measure when the row existed and insert when not. This made sure the data is incrementally increased by each run.
  • Reports also show live data from the last 15 minutes so a package is run every minute that fetches data from the last 15 minutes and places it into a table, this table always show 15 minutes of data.
  • There are different reports, one for each user in the organisation. The sorting center manager gets a report showing all teams at work, he can then click on a team and drill down to a report with data of that team.
  • Users who have opened the report want to see the new data loaded, we used the AutoRefresh function in RDL to automatically refresh the report every minute

Finally a screenshot of one of the SSRS reports hosted in the ASP.Net application:

Miss

Overall a great project to work on and we are very pleased with the result :) I hope you get a good idea of how we used the MS BI stack to create this operation BI tool.

Categories: Cloud, SQL Server, SSIS Tags: , ,

Open SSIS packages without validation using the properties

August 26th, 2009 Kasper de Jonge No comments

Today i was working with a SSIS package that has to load 22 million rows of data, it takes a Stored Procedure as source. The problems starts when you put the sp into the query window. SSIS immediately starts validating the columns, but he can only do it after loading the SP, which takes a while.

To disable this checking you can turn off ValidateExternalMetadata on the data source or DelayValidation on the data flow level. It now only checks once on run time not on design time. The downsite of using this is that the component is not aware of changes to the metadata of external data sources.

You can set DelayValidation to True on package elements whose configuration is not valid at design time to prevent validation errors. For example, you may have a Data Flow task that uses a destination table that does not exist until an Execute SQL task creates the table at run time. The DelayValidation property can be enabled at the package level, or at the level of the individual tasks and containers that the package includes. Normally you must leave this property set to True on the same package elements when you deploy the package, to prevent the same validation errors at run time.

Categories: BI Technical, SSIS Tags:

SSIS: Using a Stored procedure inside a Ole DB Source

August 21st, 2009 Kasper de Jonge 1 comment

I wantend to use a stored procedure into a OLE DB Source, but this appeared not to work, with an error in SSIS “No Column information was returned by the sql Command” hmm, strange.

Thanks to the SSIS junkie blog i found an answer in the comments, it appears you to put

 SET FMTONLY OFF

 SET NOCOUNT ON

In the top of your sproc, SSIS then finds the necessary metadata to pass the columns of your result to SSIS.

Categories: SSIS Tags:

SSIS: Put the current connectionstring in a variable

August 21st, 2009 Kasper de Jonge No comments

I’m creating a SSIS package that will call a console application, and as a argument i want pass the current connection string. As i’m using a shared XML config file over multiple packages that sets my connection, i am unable to let the xml config file fill a variable.

You can use a Script task to get the connection string directly from the connection manager into a variable:

public void Main()
        {
            ConnectionManager conMgr = Dts.Connections["Miss"];
            //No connection string available, fail this step.
            if (String.IsNullOrEmpty(conMgr.ConnectionString))
            {
                Dts.TaskResult = (int)ScriptResults.Failure;
            }
            else
            { //Else fill the variable and return succes
                Dts.Variables["ConnectieString"].Value = conMgr.ConnectionString;
                Dts.TaskResult = (int)ScriptResults.Success;
            }
        }

Categories: BI Technical, SSIS Tags:

Import Excel in SSIS and get Null values in stead of Numeric values

July 8th, 2009 Kasper de Jonge No comments

I regulary have to load some Excel sheets into a datawarehouse, what happens a lot is that some values in an Excel column may be returned as Null when the underlying value is not a Null. This typically occurs when numeric and text datatypes are intermixed within the same Excel column.

This appears to be a strange behavior of the Excel ISAM driver, when you load a column it reads the first 8 rows and tries to determine what it should load the data values as. Now if your data has a mix of characters and numbers, you need to make sure you reflect this in the first 8 rows or it will not load correctly. The official explanation and workarounds can be found at ms support: http://support.microsoft.com/default.aspx?scid=kb;en-us;194124

Categories: BI Technical, SSIS Tags:

Assigning surrogate keys to early arriving facts using Integration Services

May 14th, 2009 Kasper de Jonge No comments

SQLCAT has created a great guid on how to  Assigning surrogate keys to early arriving facts using Integration Services, in data warehouses, it is quite common that fact records arrive with a source system key that has not yet been loaded in the dimension tables. This phenomena is known as “late arriving dimensions” or “early arriving facts” in Kimball terminology.

Read the entire article here:  http://sqlcat.com/msdnmirror/archive/2009/05/13/assigning-surrogate-keys-to-early-arriving-facts-using-integration-services.aspx

Categories: BI Technical, SSIS Tags: ,

Handling and Logging Sproc errors with SSIS

May 12th, 2009 Kasper de Jonge No comments

I frequently use SSIS to schedule some maintence tasks using stored procedures. I want to use transactions to control my data and want to use SSIS to control my work flow. To accomplish this I use try..catch and Raiserror in the stored procedure and SSIS logging to keep track of the errors.

For example I use this sproc :

CREATE PROCEDURE MoveData
AS
INSERT INTO History.ReportTable
SELECT * FROM Reporting.ReportTable
WHERE reportdate < cast(dateadd(DAY,-2,GETDATE())as date);
 
 DELETE FROM Reporting.ReportTable
 WHERE reportdate < cast(dateadd(DAY,-2,GETDATE())as date);

But I want to control the data so all or nothing is commited to the data, since SQL Server 2005 you can use the try..catch function and use transactions with that, so adding that maken:

CREATE PROCEDURE MoveData
AS
BEGIN TRANSACTION;
BEGIN TRY

INSERT INTO History.ReportTable
SELECT * FROM Reporting.ReportTable
WHERE reportdate < cast(dateadd(DAY,-2,GETDATE())as date);
 
 DELETE FROM Reporting.ReportTable
 WHERE reportdate < cast(dateadd(DAY,-2,GETDATE())as date);

END TRY
BEGIN CATCH

 IF @@TRANCOUNT > 0
        ROLLBACK TRANSACTION;
END CATCH;
IF @@TRANCOUNT > 0
    COMMIT TRANSACTION;

This stored procedure makes sure that when the insert of delete fails for any reason it will be rollbacked. But my sproc will just run and not return a error, my SSIS execute SQL task won’t know anything went wrong. That’s were RAISERROR comes in. Adding that to the sproc makes:

CREATE PROCEDURE MoveData
AS
BEGIN TRANSACTION;
BEGIN TRY

INSERT INTO History.ReportTable
SELECT * FROM Reporting.ReportTable
WHERE reportdate < cast(dateadd(DAY,-2,GETDATE())as date);
 
 DELETE FROM Reporting.ReportTable
 WHERE reportdate < cast(dateadd(DAY,-2,GETDATE())as date);

END TRY
BEGIN CATCH
  DECLARE @ErrorMessage NVARCHAR(4000);
    DECLARE @ErrorSeverity INT;
    DECLARE @ErrorState INT;

– Get the current error
SELECT @ErrorMessage = ERROR_MESSAGE(),
           @ErrorSeverity = ERROR_SEVERITY(),
           @ErrorState = ERROR_STATE();

    — Use RAISERROR inside the CATCH block to return
    — error information about the original error that
    — caused execution to jump to the CATCH block.
    RAISERROR (@ErrorMessage, — Message text.
               @ErrorSeverity, — Severity.
               @ErrorState — State.
               );
 IF @@TRANCOUNT > 0
        ROLLBACK TRANSACTION;
END CATCH;
IF @@TRANCOUNT > 0
    COMMIT TRANSACTION;

You now can create your own error handling, whether in a workflow or use the SSIS logging to monitor and log your errors, like I do.

Categories: SQL Server, SSIS Tags:

SSIS Logging when callig sub packages

April 17th, 2009 Kasper de Jonge No comments

I have one parent package which calls one sub package with the execute package task. I put on OnError logging to a database on both packages but it seems the parent package logs alle events in the sub package as well, so disable the logging on the sub package or you get the message twice which will give a rather strange view.

Categories: BI Technical, SSIS Tags: ,