Wednesday, July 15, 2009

A Close Look at Look Up


Pre-requisite: SQL Server Integration Services (SSIS)

Are you new to SSIS’ Look-Up?
Bing it! Or visit, http://msdn.microsoft.com/en-us/library/ms141821.aspx.

Presented is the SSIS internal about the Look-Up component.

• Memory: The Look-Up component operates in the Full Cache mode, i.e. while executing it caches the entire reference table/dataset into memory, this is done in the pre-execute phase and the database is not contacted again. The other modes being Partial Cache or No Cache.

To read more on the caching this would be a good read.
http://www.sql-server-performance.com/articles/biz/SSIS_New_Features_in_SQL_Server_2008_Part1_p1.aspx

• Case-Sensitivity: The Look-Up component is Case Sensitive. Hence, it has to be appropriately utilized based on the data. We can use the upper () or lower () functions.

Performance Tuning:Suggested are some methods that would help us use the Look-Up Transformation Component efficiently.

1. Being Choosy with Columns:While configuring the Look-Up transformation we have an option to choose either a “Table or View” or “Use Result of a SQL Query” as the Reference Data Set. It is advisable to use the latter with the desired reference columns.

Reason: If the Look-Up operates in the Full Cache mode, it is an unnecessary effort expended by SSIS (by fetching non-looked up columns) and we would be saving on the memory. :)

2. Being Choosy with Rows:If appropriate filters are applied on the SQL query, the records in the Referenced data set reduce and hence the take away would be:
o Input Data Set would be compared against a smaller result set.
o Memory required would be reduced again.
Hence, gaining an edge on the time front!! :)



3. Restrict Memory: Performing look-up on a large (reference) data set; there can be cases where the Memory Buffer Manager runs out of memory (Full Cache Mode). This may hamper the execution of the SSIS package.

The Advanced tab in the “Look-Up Transformation Editor” enables us to restrict the memory to be allocated for Look-Up. On restricting the memory, two things happen:
o The amount of memory that the Lookup is allowed to use is limited. A cache policy is used and new rows are added to the cache on demand.
o The new rows are added to the cache individually; i.e., SSIS will query the database whenever data for a row cannot be located in the internal cache kept by SSIS. These are singleton (single-row) queries, unlike the large set-based table queries that occur when memory restriction is not enabled.

Advantage: Look-Up can be performed on Large (reference) data.

Disadvantage: The cost is that singleton queries are used, which are slower than a single table query.

Suggestion: The above setting is subject to every project case. It is advisable to trial run with appropriate memory setting! :)

4. Think Re-location:
Certain resource consuming operations can be pushed to the Source component itself. For Instance:
i. Look-Up operation can be performed at the source (while reading from the table).
ii. Data Conversion tasks can be performed at the source (while reading from the table).
iii. Upper () and Lower () functions can be performed using a Derived Column
Transformation component.

Wassup with SQL Server 2008 (Code Named: Katmai)????
Cache Transformation - Introduced in SQL Server 2008, aids in performing look-up better!
To be read as home work ;)

So, wishing you luck with development and customer WoWs!! :)

References: http://blogs.msdn.com/sqlperf/archive/2007/04/24/getting-optimal-performance-with-integration-services-lookups.aspx

Add to Technorati Favorites

Monday, July 13, 2009

Your First SSIS Package

Having written about SSIS Environment; It's now time for Dosage2.....

SSIS Packages basically are made of components, variables and connection managers (which helps in establishing a connection to a database / SMTP server / file).

Let’s begin with a simple package, to extract data from a flat (.csv) file and load it into an Oracle Table. Achieve this in 10 Easy steps:

1) Choose the “Data Flow Task” component from the Toolbox and drag it on to the Control Flow Area.

2) Double Click the “Data Flow Task” (Note: you’d be in the Data Flow Area)
OR
Click the “Data Flow Task” and click on the “Data Flow” Tab.

3) Choose “Flat File Source” from the Data Flow Sources present within the toolbox.
(Note: The components in the Control Flow and Data Flow are not the same)


4) Double Click the component to configure the same:
i. Creation of Flat File Connection Manager: Click on the button “New” from the Dialog box. (The reason is we are interested in creating a Flat File Connection manager, who would help us, read data from the “.csv “ file)

ii. Give it a name and provide a description (Good Practice: Name it on its functionality Test_Flat_File_Source_Conn) and browse the “.csv” file.

iii. The Format is delimited and can be retained. The Check box can be checked if the (.csv) file contains the header.

iv. Click Columns on the left side tab and you would be able to preview the data.

v. Click OK. (Note: A connection manager with the name “Test_Flat_File_Source_Conn” would be seen in the Connection Managers area).

5) Choose “OLEDB Destination” from the Data Flow Destinations in the Toolbox and drag it on to the Data Flow Area.

6) Click on the “Flat File Source” component and drag the green connector to the “OLEDB Destination” (This would make the data available as input to the OLEDB Destination).

7) Double Click on the component to configure the same.
i. Creation of an OLEDB Connection Manager: on the button “New” from the Dialog box.

ii. Click on the button “New” from the second Dialog box.

iii. Choose the appropriate Provider from the drop down (In our case Microsoft OLEDB Provider for Oracle).

iv. Enter the TNS Name / Server name and provide the credentials for the schema.

v. Click OK.

vi. Returning back to dialog box #1, from you can choose the desired table from the drop down list.

vii. Click on the Mappings Tab in the pane on the left hand side; establish the desired mapping.

viii. Click OK.


8) If the above steps were performed appropriately, there shouldn’t be any errors or warning!

9) Save your work, and Choose Debug -> Start Debugging and there goes your maiden packages run.

10) The graphics provided by SSIS would make the process easy to comprehend. To Stop Debugging choose Debug -> Stop Debugging.

Add to Technorati Favorites

Tour Around SSIS

What to expect from this essay?

Beginner: You get a free ride to SSIS
Others: Get to debug the article!!

So let’s get started…

SQL Server Integration Services (SSIS) is a Data Integration tool; it comes bundled with SQL Server. And is the successor of DTS (Data Transformation Services).

After the Installation of SQL Server you need to do the usual ritual

Start -> Programs -> Microsoft SQL Server -> SQL Server Business Intelligence Development Studio (BIDS). File ->New Project (Choose Integration Services Project) from the dialog box that pops up.

Name it and save it in the desired location.

First Look:

For Minds acquainted with Visual Studio, SSIS would be the very same, for the rest here’s what you’d get to see…





On the left hand side

1) Tool Box
This guy is our palette wherein, all our components (which enable us to create an SSIS Package) are present.

2) Variables Window
All system and user defined variables are housed here. The users can add and delete variables. The window is not showcased in the above fig., we need to click the Variables tab.

On the right hand side

3) Properties Window
This window contains all properties pertaining to the component chosen. This is very similar to what is seen in the Visual Basic IDE!

4) Solution Explorer Window
Helps the developer to navigate and manage his/her solution. The definition goes like: a solution can contain many projects which in turn is comprised of one or more SSIS packages!

At the bottom

5) Connection Managers Area
All the connection managers to various source and destination would be housed in this area.


In the middle are

6) Control Flow Area
Where in all task/components (found in Toolbox) that aid in the movement of data from the Source to the destination. This pretty much serves likes a Work Flow

7) Data Flow Tab
Contains the components (found in Toolbox) to connect to the source, destination, define mappings, transformation etc., the gate way to this tab is the Data Flow Task from the Control Flow area.

8) Event Handler Tab
Helps user to perform tasks during specific events viz, OnError, PostExecute etc.,

Congrats!! You now are on your way to develop your hello world project. :)

Add to Technorati Favorites