Welcome to SQL-Us Advanced SSIS course. This week long course will provide you with some advance techniques to help you tackle those complex ETL issues. Today's topic is the issue with having multiple paths within a Data Flow component.
The most complex ETL I have is the Document Load package. This package is the example I use for over 85% of posts and is the one that has pushed me to the limit of my knowledge. Pictured below is the core data flow for the Document Load package.
I have a single source (yellow) and 4 destinations (green and magenta). The Data Flow is responsible for processing a single document at a time, using my ETF (Enhanced Threading Framework). I will go over this later in the week. Now, for any given document there is only one valid path. These paths are split in half, with the left side (green and magenta) for invoices and the right side for statements. This seemed simple enough, but it turned out to be a nightmare.
What I found out was that all of the components are run, even if there is no path to them. In other words, although a document might take the path of a new invoice, which would be the green destination on the left, SSIS evaluates and runs all components. Yes, I know they all turn green, but I wasn't aware that they were run. I thought it was more like an If Statement, where it logically follows a path.
Now to be honest, I don't know if I would have noticed this, had I been using standard destination components, which take this into account. I am sure you have noticed that you can have a source that retrieves 0 records and your destination turns green, but there nothing was transfered. The component turns green, because it has a mechanism that handles the incoming data flow, even if there are 0 records. The issue with the custom destinations, is not the inability to handle 0 records, but how they handled 0 records. We will see in the upcoming days how this, coupled with interaction with package variables works and what to be aware of.
Today was intentionally brief and used to set up the next 4 days, which will include how to make a custom source component from a stored procedure, a custom destination components (to obtain an identity column from an insert), how improve importing multiple files from multiple sources using a custom script(s), and finally the ETF, which will explain how to improve package design by using 'Engines'.
I'll see you all tomorrow where we will dive right in with Sourcing a Stored Procedure.