Another BI & Programming Blog - Jason Yousef: July 2011

Wednesday, July 20, 2011

SSIS Performance considerations

Performance is the key factor for any successful data integration project, knowing the type of transformation that you’re using is the first step on optimizing the SSIS flow performance, by utilizing the correct transformation or the design alternative technique you could gain much more performance out of your SSIS flow.

The recommendations and suggestions below represents real life experience and excerpts from the Microsoft Technet article : http://technet.microsoft.com/en-us/library/cc966529.aspx along with influence from the other links referred to at the bottom of this Article. While this Article presents a summary and major points to focus on from those resources, you are encouraged to follow the links for more specific and in depth discussion.

SSIS (SQL SERVER INTEGRATION SERVICES) is using a buffer-oriented architecture to efficiently load and manipulate datasets in memory, the benefits is avoiding the I/O to the disk and physically copying the data to the disk, and therefore it’s transferring the data from the source to the destination through the pipeline without touching the disk.

As this data flows through the pipeline, SSIS attempts to reuse data from prior buffers as much as possible when additional operations are performed. How buffers are used and reused depend on the type of transformations that you use in a pipeline.

There’re 3 types of transformations types:

1-Row Transformation (synchronous transformation) also known as (Non-blocking transformations):

These process the data in Row-by-row basis, Row transformations have the advantage of reusing existing buffers and do not require data to be copied to a new buffer to complete the transformation.

• Do not block data flow in the pipeline.
• Data is not copied around, only pointers.

Examples:

• Audit
• Cache Transform
• Character Map
• Conditional Split
• Copy Column
• Data Conversion
• Derived Column
• Export Column
• Import Column
• Lookup
• Multicast
• OLE DB Command
• Percentage Sampling
• Script Component
• Slowly Changing Dimension

2-Partially Blocking Transformation (asynchronous transformation):

Are often used to combine several datasets. They tend to have multiple data inputs. As a result, their output may have the same, greater, or fewer records than the total number of input records.

• Introduces new buffers in memory layout.
• Transformed data is copied into new buffers.

Examples:

• Data Mining
• Merge
• Merge Join
• Pivot
• Unpivot
• Term Lookup

3-Blocking Transformation (asynchronous transformation):

Blocking transformations must read and process all input records before creating any output records. Of all of the transformation types, these transformations perform the most work and can have the greatest impact on available resources.

• Must see all data before passing on rows.
• Blocks the data flow – can be heavy on memory
• May also use “private buffers” to assist with transforming data.

Examples:

• Aggregate
• Fuzzy grouping
• Fuzzy lookup
• Row Sampling
• Sort
• Term Extraction

Evaluating the performance:

1-You could monitor and observe the performance using the PROGRESS tab, while running the package, it’ll tell you when the task started, ended and how long it’s taking to finish, which would be a great way of discovering which tasks is pulling the performance down.

progressTab.png (42 KB) (File Type Details)

Progress Tab

2-You could also use a free add-on for BIDS (Business Intelligence Development Studio) called BIDS HELPER, can be downloaded from http://bidshelper.codeplex.com/

It offers whole new features for the BIDS environment which would help you such as deploying your packages, dtsConfig file formatting and SSIS performance Visualization which I’ll cover now.

Using the SSIS performance Visualization is simple, you just right click on your package after installing the BIDS HELPER and you’ll have new options, you need to select “Execute and Visualize Performance”.

bidsw.png (30 KB) (File Type Details)

Bids Helper

You’ll be introduced to another screen with Performance Tabs, and you choose to visualize the performance by looking at Gantt Bars, Statistics Grid or Statistics Trend, which compares the performance from one execution to another. It adds a new column for each execution and highlights whether the duration of that piece of the package was faster or slower than before.

SSISPerformanceTrendGrid.png (16 KB) (File Type Details)

Performance Trend

3-Using Perfmon (formerly known as Windows Performance Monitor and synonymously referred to as Perfmon)

There're several counters to monitor SSIS performance, such :

SQLServer:SSIS Service:

SSIS Package Instances - Total number of simultaneous SSIS Packages running

SQLServer:SSIS Pipeline:

BLOB bytes read - Total bytes read from binary large objects during the monitoring period.
BLOB bytes written - Total bytes written to binary large objects during the monitoring period.
BLOB files in use - Number of binary large objects files used during the data flow task during the monitoring period.
Buffer memory - The amount of physical or virtual memory used by the data flow task during the monitoring period.
Buffers in use - The number of buffers in use during the data flow task during the monitoring period.
Buffers spooled - The number of buffers written to disk during the data flow task during the monitoring period.
Flat buffer memory - The total number of blocks of memory in use by the data flow task during the monitoring period.
Flat buffers in use - The number of blocks of memory in use by the data flow task at a point in time.
Private buffer memory - The total amount of physical or virtual memory used by data transformation tasks in the data flow engine during the monitoring period.
Private buffers in use - The number of blocks of memory in use by the transformations in the data flow task at a point in time.
Rows read - Total number of input rows in use by the data flow task at a point in time.
Rows written - Total number of output rows in use by the data flow task at a point in time.

But basically you need to monitor the 3 famous kinds of memory buffers:

"Buffers"
"Private Buffers"
"Flat Buffers"

Read More....

Evaluating Design and performance considerations:

As I always say, everything is life could be done in many different ways; it’s just YOU who needs to decide which approach is better for you as it really depends on your environment and business requirements.

I’m going to point some points that you’ll need to do more research and also included some links so you can look it up in more depth.

1-Remember is that synchronous components reuse buffers and therefore are generally faster than asynchronous components, that needs a new buffer.

2-Break complex ETL tasks into logically distinct smaller packages. More...

3-Maximize Parallelism. Utilizes the available resources as much as you can. More ...

4-Maximum insert commit size and Rows per batch has a great effect for not treating the whole incoming rows as one patch.

5-Minimize staging as possible otherwise use RawFiles. More ...

6- Eliminate redundant columns:

• Use SELECT statements as opposed to selecting the tables dropdown.
• Be Picky! Select only what columns you need VS SELECT *

7-Minimize blocking as much as you can:

• Sort your query in the engine, or even using the SQL command in OLE DB Source instead of using Sort transformation.
• Merge transform requires Sort but not Union All, so use Union All wherever you can. More ...

8- Optimize the BufferTempStoragePath and BLOBTempStoragePath wisely. More ...

9-For Flat File Source use FAST PARSE option for columns of integer and date data types. More ...

10- If the SQL server is local, consider using SQL Server destination instead of OLE DB Destination.

11-Consider indexes fragmentation and performance when inserting high volume of data.

12-Use “Fast Load” when using OLEDB, it uses BULK INSERT and would be faster.

13-Optimize the packet size in your connection managers to 32K (32767) and that’s the fastest option.

More resources on the subject:

Top 10 SQL Server Integration Services Best Practices

Designing and Tuning for Performance your SSIS packages in the Enterprise (SQL Video Series)

Integration Services: Performance Tuning Techniques

That's it...I hope my article helps someone gaining more performance from their ETL solution, and feel free to add more performance considerations

Tuesday, July 19, 2011

Last Day of the Month SSRS Expression....

A question came on one of the fourms about how to get the last day of the month using months as intgers, the person has 1,2,3....etc in his column and need to convert that to month and get the display only the last day of the month in the MM/DD/YYYY format....

Use that for January (1)

=Left(DateSerial(Year(Now()), 1, "1").AddMonths(1).AddDays(-1) , 10)

Use that for February (2)

=Left(DateSerial(Year(Now()), 2, "1").AddMonths(1).AddDays(-1) , 10)

Hope that helps...

Friday, July 15, 2011

Derived Column to convert string to DB Date.

I have a date field in string format "0MMDDYY" and some value has "0000000".
I need to convert the string to normal db date and also replace "0000000" with NULL.

I'm using a derived column, and assuming that the column name is [Column].

[Column] == "0000000" ? "NULL" : (DT_WSTR,2)20 + (DT_WSTR,2)RIGHT([Column],2) + "-" + (DT_WSTR,2)SUBSTRING([Column],2,2) + "-" + (DT_WSTR,2)SUBSTRING([Column],4,2)

and if it was an INT field....

[Column] == (DT_WSTR,7)0000000 ? "NULL" : (DT_WSTR,2)20 + (DT_WSTR,2)RIGHT([Column],2) + "-" + (DT_WSTR,2)SUBSTRING([Column],2,2) + "-" + (DT_WSTR,2)SUBSTRING([Column],4,2)

That's assuming that your dates are in the year 2000 and above...

What if the String has a date such "0110540"

The above code will give you 2040-11-05

so we'll have to check and fix the year to 1940

@[User::Column] == "0000000" ? "NULL" : ( (DT_I4)(DT_WSTR,2) RIGHT(@[User::Column],2) <= 11 ? ((DT_WSTR,2)20 + (DT_WSTR,2)RIGHT(@[User::Column],2) + "-" + (DT_WSTR,2)SUBSTRING(@[User::Column],2,2) + "-" + (DT_WSTR,2)SUBSTRING(@[User::Column],4,2) ) : ( (DT_WSTR,2)19 + (DT_WSTR,2)RIGHT(@[User::Column],2) + "-" + (DT_WSTR,2)SUBSTRING(@[User::Column],2,2) + "-" + (DT_WSTR,2)SUBSTRING(@[User::Column],4,2) ))

That's the only way since we have only 2 digits for the year, plus i'm calculating patient's age, so I don't think any of them over 100 years old !

I'm open to discussion if you have a better way of doing it..

Hope that helps..

Sunday, July 10, 2011

SSIS "MM/DD/YYYY" Format expression.

(DT_DBTIMESTAMP)(SUBSTRING([date string], 7, 4) + "-" SUBSTRING([date string], 1, 2) + "-" + SUBSTRING([date string], 4, 2))

Friday, July 8, 2011

FTP files list to text file using SSIS.

I got a requirment to dump all the FTP file listing for all folders and subfolders to a text file.
I'm using here a free FTP server which you could use too
ftp://ftp.secureftp-test.com/ User:"test" Password: "test")

what SSIS FTP Task can you do nativly?

The FTP Task, supports, sending, receiving, deleting folders and directories, but doesn’t support watching for a specific files or getting a list of the FTP directory listing.

Get list of files and folders to a text file.

A.Drag a script task

B. Select “Microsoft Visual Basic 2008” as the scripting language, then click “Edit Script”

C. Add the following script

And the result will be at D:\FTPList.txt

Hope that helps!!

Download the package from http://www.box.net/shared/77tqeitp99czj6feefxv

Imports System

Imports System.Data
Imports System.Math
Imports Microsoft.SqlServer.Dts.Runtime
Imports System.Collections.Generic
Imports System.Text
Imports System.Net
Imports System.IO

<System.AddIn.AddIn("ScriptMain", Version:="1.0", Publisher:="", Description:="")> _

<System.CLSCompliantAttribute(False)> _
Partial Public Class ScriptMain
Inherits Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTARTScriptObjectModelBase

Enum ScriptResults

  Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success
  Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure
End Enum

    Public Sub Main()
        Dim FilePath As String = "D:\FTPList.txt"      'save to location - You can use a variable here.

'delete if file exists, Comment out if you need to append to the existing TXT file

        If File.Exists(FilePath) Then
            File.Delete(FilePath)
        End If

Try

            'Create the connection to the ftp server
            Dim cm As ConnectionManager = Dts.Connections.Add("FTP")
            Dim Folders As String()
            Dim Files As String()
            Dim FileName As String
            Dim Folder As String
            Dim Dir As String

'Set the properties like username & password

            cm.Properties("ServerName").SetValue(cm, "ftp://ftp.secureftp-test.com/")          'Server address
            cm.Properties("ServerUserName").SetValue(cm, "test")                        'user name
            cm.Properties("ServerPassword").SetValue(cm, "test")                        'password

Dim ftp As FtpClientConnection = New FtpClientConnection(cm.AcquireConnection(Nothing))

'Connects to the root of the ftp server,

            Dir = "/"                                'root directory
            ftp://ftp.connect/()
            ftp://ftp.setworkingdirectory(dir/)
            ftp://ftp.getlisting(folders/, Files)

Dim SW As System.IO.StreamWriter 'Writer to write to the file

SW = System.IO.File.AppendText(FilePath)

'Write date time header (getting fancy)

SW.WriteLine(Now().ToString + vbCrLf + vbCrLf + vbCrLf)

'write root file list

            SW.WriteLine("Folder : " + Dir + vbCrLf + "----------------------" + vbCrLf)
            If Not Files Is Nothing Then 'checking for no files
                For Each FileName In Files
                    SW.WriteLine(FileName)
                Next
                SW.WriteLine(vbCrLf)
            End If

'write other folders and their files' list

            If Not Folders Is Nothing Then 'checking for no other folders
                For Each Folder In Folders
                    SW.WriteLine("Folder : " + Folder + vbCrLf + "----------------------" + vbCrLf)
                    Dir = "/" + Folder
                    ftp://ftp.setworkingdirectory(dir/)
                    ftp://ftp.getlisting(folders/, Files)

If Not Files Is Nothing Then 'checking for no files

                        For Each FileName In Files
                            SW.WriteLine(FileName)
                        Next
                    End If
                    SW.WriteLine(vbCrLf)

            End If
            'Close ftp connection
            ftp://ftp.close/()
            'Close writer
            SW.Close()

Catch ex As Exception 'Catch errors, make up your own error as you can see.

Dts.Events.FireError(911, "", "Errrrrrrrrrrrrrrrrrror, something bad happened!!!", "", -1)
Dts.TaskResult = ScriptResults.Failure 'reprot failure

End Try

Dts.TaskResult = ScriptResults.Success 'report success
End Sub
End Class

Wednesday, July 6, 2011

SQL SERVER – Interview Questions and Answers – Frequently Asked Questions – Complete Downloadable List

Pinal Dave has a huge list of qestions for the DBA interview questions.
http://blog.sqlauthority.com/2011/07/01/sql-server-interview-questions-and-answers-frequently-asked-questions-complete-downloadable-list-day-0-of-31/

Tuesday, July 5, 2011

Failure is not my option.. Control SSIS flow success or failure options..

How to control the success and failure of the package and execute it successfully? Sometimes the requirement is to continue the execution even if an error has raised, might be a package that has nondependent tasks.

There are 3 ways to do that…

1- Setting up ‘MaximumErrorCount’ property of the container as ‘0’ instead of the default ‘1’.
2- Setting up ‘ForceExecutionResult’ property to ‘success’ for the task properties instead of the default ‘None’.

3- In tasks sequence, we can set ‘On Completion’ precedence constraint instead of ‘Success’.

Hope that helps.