Monday, December 3, 2012

SSIS - Split records and export to flat files

Today's issue that we've a source database Sybase ASE, and old version is being used as a replica, I'm trying to extract some records, it's almost 750K to 1 Mil records and it needs to be exported to flat files and sent somewhere for further processing.

The files has to be no more than 10K records a file and it needs to get a specific naming convention...

I've done it differently than what I'm showing in the demo here... and I'll explain why...


the main component in the package is the "For Loop Container", then inside the container I'll have a "Data Flow Task".

I'll create 4 variables as shown below..

 
 
 
Add a "For Loop Container", configure as attached.. to use the variables and set the limit to 100K here


 
 
 
 Basically the first task, is a "Script Task", I'm only using it to debug my variables in a message box, enable or disable to debug, you could use "BreakPoints" which I've used in my original package and here only it's for demo.

 
 The way I'm debugging is so easy, just to display a message with the current variables' value to check if I'm heading in the right way!  again use breakpoints, add a watch and check your variables values...


 
 The data flow task...

I'm using a data generator task, to create my input rows....you can use your own source tables of course, I was reading from a sybase ASE database, an old version which doesn't support the Row_Number() function, so I couldn't build a column for the row numbers, and yes...I thought about a SQL server staging table...yes but it wasn't an option !


 
 I'm generating 520,000! yes weired number but I want to show the rest of the 20K records will get into a file of their own!

 

Just a row count of all the rows , please note that it'll run 6 times and it's not efficient...
You could use an "Execute sql task" to get you count(*) in the "Control Flow" and that will run 1 time, but here I'm using the data generator task....


 
Here it's my optional "Script component" to create a Row Number column, again I'm using that because I'm using the data generator task..or if you're reading from a flat file directly or if Row_number is not an option and no staging SQL table either !


 
 Conditional split to filter which rows where exactly need to write to our destination...


And here I'm setting the flat file destination..


 


 




I've an expression over the connection string to get me the file name dynamically to reflect which records in the file.

 

The last step here is a script task to update the variables' values...I'm not really sure why I've it in the control flow, but you can do it in the data flow task !
 
 
 
 
That's it! the folder when it's empty...
 
 
The folder when it got the files !!
 
 
Wrap up... again.. you don't need to debug your variables
use the data generator task...
use the row number task
 
or even the split task ..Yes !!  if you're reading from your source table and not from a data generator task, you might build your query to use the 2 variables and get only rows where the RN between these 2 variables... I didn't illustrate that here for simplicity.
 
it'd look like that...
 
 And yes I've an index on the RN column ...


The last screen shot here is not meant for people with weak hearts !! it's scary!!



 
 
Hope I've explained it clearly... and hit me up if you have a better way of doing it...