Gemini Does Cloud!
Andrew Brust wrote a great blog post about opening an Azure database with Gemini:
http://www.brustblog.com/archive/2009/09/03/gemini-does-cloud.aspx
We recently released the first version of an operational, near-real-time BI platform for monitoring packages in a package sorter for a Dutch company. This blog post describes how we solved their problem using Reporting Services, Integration Services, and ASP.NET, hosted in the cloud on Amazon Elastic Compute Cloud.
The scenario
There are four sorting centers in the Netherlands, each collecting parcels from its own region. Each parcel receives a unique code (a bar code) on collection. The sorting center then has to determine the destination of a parcel from its postal code; parcels are eventually delivered by regional delivery stations. The parcels are grouped by the sorting center and shipped to the delivery stations during the night, and the delivery station that is farthest away gets its parcels first. When a parcel's destination is closer to another sorting center, it is sent to that sorting center first. So parcels outside the range of the sorting center are coarsely sorted by region, and parcels within range are finely sorted by street. Parcels are loaded onto an assembly line of a sorter machine in the sorting center, where the postal code is scanned and translated to a digital postal code that is attached to the bar code. The sorter then determines the destination assembly line; these destination lines are called chutes.
The current information is delivered through reports full of numbers that are hard to interpret and an Excel sheet full of VBA code that is mailed around the organisation. Each sorting center hosts its own server running Reporting Services 2000 and a Visual Basic service.
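To make the coarse versus fine sorting rule concrete, here is a minimal Python sketch. It is only an illustration: the center names, the postal-code prefixes in `FINE_SORT_PREFIXES`, and the chute label format are all hypothetical, since the real range tables and chute layout are not described in this post.

```python
from dataclasses import dataclass

# Hypothetical range table: the postal-code prefixes each sorting center
# sorts finely. The real ranges are not given in the post.
FINE_SORT_PREFIXES = {
    "center_a": {"10", "11", "12"},
    "center_b": {"30", "31"},
}


@dataclass(frozen=True)
class Parcel:
    barcode: str
    postal_code: str  # Dutch format, e.g. "1012AB"
    street: str


def assign_chute(parcel, center):
    """Return an illustrative chute label for a parcel at a sorting center."""
    prefix = parcel.postal_code[:2]
    if prefix in FINE_SORT_PREFIXES[center]:
        # In range of this center: fine sort, down to street level.
        return f"fine:{parcel.postal_code[:4]}:{parcel.street}"
    # Out of range: coarse sort by region, i.e. forward the parcel to the
    # sorting center that serves the destination.
    for other, prefixes in FINE_SORT_PREFIXES.items():
        if prefix in prefixes:
            return f"coarse:{other}"
    return "coarse:unknown"
```

A parcel for postal code 1012AB would be finely sorted at `center_a`, while the same parcel arriving at `center_b` would only be assigned to the coarse chute for `center_a`.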
The challenge
The challenge the client had for us was:
How we did it: Project Startup
The project started with some brown paper sessions with different users from the organisation. In these sessions we determined the essential data needed to manage the sorting centers. The focus of the first release was replacing the current reports and Excel sheet with the new reporting environment. The data is delivered to our system as text files: every minute new files are created that have to be imported and transformed by the application. The estimate is 4 files per minute.
After designing reports for the different user groups, determining the MoSCoW priorities, and analyzing the different input files, we decided on the following architecture:
While brainstorming with the client, it emerged that they would like to host the system outside their own infrastructure. We already had some experience with hosting on the Amazon Elastic Compute Cloud, so we decided to assess the feasibility of using Amazon for our BI system. Since the data is delivered in small files every minute, a solution was quickly found: servers hosted in the cloud can fetch files as easily as local servers, so the server in the cloud fetches the data using SFTP. The Amazon cloud is also very flexible: it lets you quickly scale capacity, both up and down, as your computing requirements change.
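The minute-by-minute file import described above can be sketched roughly as follows. This is not the actual implementation (which used Integration Services); it is a small Python illustration of the "import each delivered file exactly once" pattern. The directory layout, the `.txt` extension, the semicolon-delimited format, and the `load_row` callback are all assumptions made for the example, and the SFTP fetch itself is left out.

```python
import csv
from pathlib import Path


def import_new_files(incoming, processed, load_row):
    """Import every file waiting in `incoming` once, then archive it.

    `load_row` is a hypothetical callback that stores one parsed row,
    for example by inserting it into a staging table.
    Returns the number of files imported.
    """
    incoming, processed = Path(incoming), Path(processed)
    processed.mkdir(parents=True, exist_ok=True)
    imported = 0
    # Sort by name so files are handled in a stable order.
    for path in sorted(incoming.glob("*.txt")):
        with path.open(newline="") as handle:
            # Assumed format: one record per line, fields separated by ";".
            for row in csv.reader(handle, delimiter=";"):
                load_row(row)
        # Moving the file out of `incoming` prevents a second import.
        path.rename(processed / path.name)
        imported += 1
    return imported
```

In a setup like ours, a function of this kind would be run by a scheduler once a minute, right after the files have been fetched from the SFTP server.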
Implementation
Below is a graphical representation of the architecture used:

The application consists of the following steps:
Finally a screenshot of one of the SSRS reports hosted in the ASP.Net application:

Overall, this was a great project to work on, and we are very pleased with the result.
I hope this gives you a good idea of how we used the MS BI stack to create this operational BI tool.
Microsoft announced earlier this week that a CEP/stream processing product will be included in SQL Server 2008 R2. Complex Event Processing, or CEP, is primarily an event processing concept that deals with the task of processing multiple events with the goal of identifying the meaningful events within the event cloud. CEP employs techniques such as detection of complex patterns of many events, event correlation and abstraction, event hierarchies, relationships between events such as causality, membership, and timing, and event-driven processes.
Microsoft called out four reasons to me why CEP might be needed in addition to ordinary database processing. Two are the standard reasons for data reduction:
1. Without CEP, you can’t bang the data into the database fast enough.
2. You don’t want to keep most of the data past a short time window anyway.
The other two are also fairly standard reasons for using CEP:
3. Standard SQL isn’t all that great for time series anyway.
4. CEP use cases often call for incremental processing and/or parameterization of queries, something CEP engines are commonly better designed for than are DBMS.
However, Microsoft seems to be taking a somewhat different approach to time-based SQL extensions than some other vendors. To quote an email Microsoft sent today:
Microsoft Research (MSR) introduced the temporal extensions to relational algebra based upon a notion of application time that is independent of system time. It matters when the event originated instead of when they arrived at the processing system. Further it treats each event as being associated with an interval of time as opposed to a point in time. This helps in modeling certain real life phenomenon naturally. [StreamBase et al.] also reason about multiple streams. Both the approaches are extensions to relational algebra. The MSR approach took the algebra as the starting point while StreamBase took an existing language over the algebra – SQL as the starting point. The MSR approach consequently avoids having to rework other elements of the SQL surface. The primary language extensions through which this algebra will be exposed initially is LINQ.
What are the implications of this? Can we use CEP to monitor real-time data from the cloud and extract only the necessary data to our data warehouse? Or am I going too far with this?
Found at: http://www.dbms2.com/2009/05/13/microsoft-announced-cep-this-week-too/
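To give a feel for the kind of data reduction asked about above, here is a small Python sketch. It is not Microsoft's CEP engine or its LINQ surface; it only illustrates two ideas from the quoted text under my own assumptions: grouping events by application time (when the scan happened, not when it arrived) and treating an event as an interval rather than a point. The `ScanEvent` and `IntervalEvent` types and the 60-second window width are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ScanEvent:
    chute: str
    app_time: float  # seconds; when the scan happened, not when it arrived


@dataclass(frozen=True)
class IntervalEvent:
    key: str
    start: float  # application time at which the event becomes valid
    end: float    # application time at which it stops being valid (exclusive)


def window_counts(events, width=60.0):
    """Count events per (window start, chute), using application time.

    Arrival order does not matter: a late event still lands in the
    window its own timestamp belongs to.
    """
    counts = Counter()
    for event in events:
        window_start = (event.app_time // width) * width
        counts[(window_start, event.chute)] += 1
    return dict(counts)


def active_at(events, at):
    """Return the keys of interval events that are valid at time `at`."""
    return [e.key for e in events if e.start <= at < e.end]
```

Only the per-window counts would then need to be loaded into the data warehouse, instead of every individual scan.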