Parallel processing in SQL Server

There have been manyoccasionswhere I've needed to run multiple copies of the same process for performance reasons. Let's say for example that a stored procedure can process 10 records/sec, but I have 1,000,000 records to process. It would take 27.7 hours to process at the previous rate. If I could break up this process into discrete pieces I could run multiple copies and reduce the total run time in a linear fashion (that is if it takes 10 hours to run 1 copy, it would take 5 hours to run 2 copies...1 hour to run 10 copies). There is a computer operation known as, Modulo.

The Modulo operation, available in nearly all programming languages (% or Mod), returns the remainder of a division rounded up to the nearest whole number. Here are some examples:

0%3=0
1%3=1
2%3=2
3%3=0
4%3=1
5%3=2
6%3=0

If you look at the above pattern carefully, you will see that it repeats and that is the handy part of the modulo operation.

Let's go back to the example above and try to break up the task of working through a 1,000,000 records. I will create a table as follows:

Create table Base
(First_Name varchar(30),
Last_Name varchar(100),
workidint identity(1,1))

After you enter 1,000,000 records into the table Base,workidwill have values ranging from 1-1,000,000. Now you have a stored procedure which needs to take the first/last name combination and do some elaborate work on it (so much so that you can only do 10 records/second). Within the stored procedure you will no longer just do this:

select First_Name,Last_Name
from Base

You will do this:

select First_Name,Last_Name
from Base
where workid%x=y

To solve for and ywe need to figure out how many simultaneous processes. I say 5. That means x=5 and would range from 0-4. The simple formula would be, the range of y=0 through x-1.

select First_name, Last_Name
from Base
where workid%5=0

select First_name, Last_Name
from Base
where workid%5=1
......

How does this help? The first procedure to run, would pick up records workid%5=0or any multiple of 5 (5,10,15,20, etc). The second procedure would pick up records workid%5=1 ,records 1,6,11,etc. The final procedure would pick up workid%5=4 (4,9,14,etc). Work is evenly divided and never duplicated no matter how many or how few you use.

Make sure you understand the above technique and those in both previous (cryptographic hashes) and future posts. I will start combining many of these base techniques into more complex scenarios that I use in many of my SSIS processes.