I have been working for a few months now on refining a data extraction process. This process goes against a third party scheduling tool and the myriad of tables they have. The reason for this is rather simple, the vendor's canned reports are poor and provide maybe 5% of what is needed by the business. The 'pay-to-play' reports are evidently expensive and provide little more than the free ones. It was during a Eureka moment that I figured out how to actually take their 'pseudo' parent child tables and unfurl it.
Let me take some time to explain how the scheduling jobs are set up and and the complications it causes. In this system I create a job that acts as a header, a sort of container. It is not scheduled and takes no resources. Under that header I can have dozens or hundreds of jobs that can run simultaneously or be linked together.
Here is a simplified drawing of how the graphical interface would look:
In the table that maintains this relationship I would have something like this:
If you take a close look, I can see the parent of Step1a, which will be Step1. I really would like to see the parent of Step1a, be Test Job Stream, because that is what it will ultimately effect. I tried using CTE to walk up the list, but some of these go very deep and I would hit depth limits and a whole bunch more nastiness. I decided to extract the information and place it in a '|' delimited flat file which I would later consume and present as a table with Parent,Child records. This layout would show Step1, Step1a, Step2 as children of Test Job Stream. I have other tables which show the dependencies, which is what Step1 and Step1a share.
I decided to create a table with 2 columns (parent int, child_lineage varchar(max)). Taking the above example, that I create 1 entry in a holding table for each of the top level parents ( those with a parentid of null). Then I would walk the entire table and build the lineage, by adding a '| between each child and associate it with the parent, but in a unique way. Given the above image,I would have a record at the beginning:
Then I find the children of 444 and link that as such:
Here is the procedure which does this unwinding:
While figuring out the how to unwind this was not an easy task, figuring out how to get SSIS to work on a varchar(max), which it interprets as a BLOB took me about 12 hours and immense frustration to change and work smart. I decided to pushed the table to a flat file and then read in the file, parse it and create a new file which can easily be pushed into a table. Here is the flat file created by the result of the stored procedure (output of the table):
The above file is torn apart and rebuilt with this script (C#)
The new file built by this script looks like this:
I then use a simple flat file import to pull that into a table with two columns (parent_id,child_id). Thinking outside the box (outside SQL Server) presents simple solutions to complex problems.