Tuesday, February 14, 2012

Can a Data Flow Task mimic Bulk Insert?

Hi Guys,

The way I understand a Data Flow Task is that it inserts the rows from the source to destination one by one. Is there a way to make it act like a bulk insert task? We have been experiencing performance issues when inserting a lot of rows from one table to another. If there's no way to actually do it, can a bulk insert task functionality be scripted? Coz what I need is a table to table insert, and the bulk insert task only accepts data files as sources.

Thanks!

Kervy

What destination object are you using? The OLEDB Destination and the SQL Server destination both enable bulk insert options for SQL Server.

How many rows are you inserting?

Donald

|||

Oh you mean to say, by default, an OLEDB source inserts to an OLEDB Destination in bulk? We are only inserting a couple of hundred thousand rows, but the source table is in the US. destinations are in Europe and Asia Pacific, so we need to optimize this as much as we can given the proximity of the servers.

We are using OLEDB source and OLEDB Destination

Thanks

|||

If you need to performance tune your data-flow then you should read a whitepaper that I have mentioned here: http://blogs.conchango.com/jamiethomson/archive/2006/04/09/3594.aspx

There's also more useful information here: http://blogs.conchango.com/jamiethomson/archive/2005/04/20/1319.aspx

Note that SSIS does not insert records one at a time - that would be a performance killer. The SSIS pipeline processes data in buffers and the contents of each buffer is inserted at the same time! A large part of performance tuning your application is finding the optimum buffer size.

Note that in your scenario the bottleneck is likely to be the georgraphic disparity of the source and destination, not how fast SSIS can process it. Try and reduce the amount of data that is coming over the wire if possible - i.e. Filter the data at source and only select columns that you require.

-Jamie

|||

hi Jamie,

Thanks for that response. I will read more on how to performance tune my data flow. One accepted cause for the horrible execution time is the distance between servers (destinations are in 2 different continents!), that is why we are trying to figure out if there are other ways to improve on this. We tried testing with a destination server in the same area, and it took about 9 minutes.. compared to the 16 hours it took when the target was Asia Pacific.. we may have to accept that performance issue for now..

total number of rows to be inserted = 200,000

total number of columns = 167

|||

KervyChoa wrote:

hi Jamie,

Thanks for that response. I will read more on how to performance tune my data flow. One accepted cause for the horrible execution time is the distance between servers (destinations are in 2 different continents!), that is why we are trying to figure out if there are other ways to improve on this. We tried testing with a destination server in the same area, and it took about 9 minutes.. compared to the 16 hours it took when the target was Asia Pacific.. we may have to accept that performance issue for now..

total number of rows to be inserted = 200,000

total number of columns = 167

Exactly, hence my previous comment "Try and reduce the amount of data that is coming over the wire if possible - i.e. Filter the data at source and only select columns that you require."

The point here is to reduce the amount of data coming over the wire because that is quite obviously the bottleneck in your scenario.

-Jamie

|||I tried using perfmon to see what can be done, but I can't find the SQL Server: SSIS Pipeline Performance object in perfmon.. Am I missing something here?|||

KervyChoa wrote:

I tried using perfmon to see what can be done, but I can't find the SQL Server: SSIS Pipeline Performance object in perfmon.. Am I missing something here?

As we have already determined (twice) the bottleneck is not within the SSIS pipeline so looking at this perf counter is not going to help you.

Unless you pick one of the machines up and move it to a different continent, or improve the speed of the link between them, you're not going to see any eprformance improvement.

-Jamie

No comments:

Post a Comment