SSIS, Raw Files and Local IO

Our architecture places SSIS packages on their own servers, away from our SQL Servers.  This reduces contention on the SQL Server boxes themselves and makes the SSIS servers independent, so their resources can be adjusted separately.  There are a few processes where we use SSIS 'Raw Files' to improve performance.  Originally we put the raw files on a remote file server and performance was acceptable.  The idea was floated that putting the raw files locally on the SSIS server should improve performance by reducing network traffic.  It turns out that performance went down by over 80%.

A little investigation determined that the cause was the load on the SSIS box.  The SSIS box has 2 dual-core CPUs (4 cores in total) and 8 GB of memory.  The packages we run use 8 Data Flow Tasks (DFTs), twice the number of cores installed on the machine.  The reason for this is that the SQL we run takes some time (anywhere from a few seconds to a few minutes), so not all 8 DFTs will be active at exactly the same time.  There are also times when multiple SSIS jobs are running at once, so the machine is almost always at full capacity.  That brings us back to why performance for local IO is worse than for networked IO.

It seems as though connecting to a remote file share enables the server to offload the IO requests to the network cards (2x1GB) instead of servicing the requests locally.  While this might seem counterintuitive, it makes some sense.  With a remote share, the network card handles the IO that is being sent, so the server simply queues it up and off it goes.  A local IO request requires processing power on the SSIS box itself, since the OS needs to satisfy the request.  That local request needs CPU time, which takes away from the SSIS processes on a machine that is already at full capacity.
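
One rough way to check this explanation is to write the same amount of data to a local folder and to the remote share, and compare how much CPU time the writing process is charged in each case.  The following is a minimal sketch in Python, not something we ran as part of the original investigation; the function name measure_write and its parameters are made up for illustration.  Note that time.process_time only counts CPU charged to this one process, so system-wide counters (for example from Performance Monitor) would still be needed to see the whole picture.

```python
import os
import tempfile
import time


def measure_write(directory, total_mb=64, chunk_kb=256):
    """Write total_mb of data into directory and report wall and CPU time.

    Hypothetical helper: 'directory' can be a local folder or a mounted
    or UNC path to a remote share.
    """
    chunk = os.urandom(chunk_kb * 1024)
    chunk_count = (total_mb * 1024) // chunk_kb

    # Create a scratch file inside the target directory.
    fd, path = tempfile.mkstemp(dir=directory, suffix=".raw")
    try:
        wall_start = time.perf_counter()
        cpu_start = time.process_time()
        with os.fdopen(fd, "wb") as handle:
            for _ in range(chunk_count):
                handle.write(chunk)
            # Push the data out of Python's buffer and ask the OS to
            # commit it, so the timing includes the actual IO.
            handle.flush()
            os.fsync(handle.fileno())
        wall = time.perf_counter() - wall_start
        cpu = time.process_time() - cpu_start
    finally:
        os.remove(path)

    return {
        "wall_s": wall,
        "cpu_s": cpu,
        "mb_per_s": total_mb / max(wall, 1e-9),
    }
```

Calling measure_write once with a local directory and once with the path of the file share, while the SSIS jobs are running, gives two sets of numbers to compare.  If the theory above holds, the local run should show noticeably more CPU time per MB written than the remote run.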