Script Component for creating Digital Fingerprint

I thought I would share the component I use to create the digital fingerprint.

[Figure: DigitalFingerprint.png]

It uses the native .NET libraries and runs quickly. The Row object is how you interact with the incoming stream. The output column, Fingerprint, is defined on the output tab as DT_BYTES with a length of 20, and FileData is defined as a DT_IMAGE input column. You have to add Fingerprint to the output before you can reference it within the script component.
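For readers outside SSIS, the per-row logic can be sketched in Python. This is an analogue, not the component itself (which runs .NET code inside an SSIS script component): a dictionary stands in for the Row object, its `FileData` and `Fingerprint` keys mirror the columns described above, and `process_row` is a hypothetical name. Mapping a NULL input to a NULL fingerprint is an assumption of this sketch.

```python
import hashlib


def process_row(row: dict) -> dict:
    """Hypothetical per-row step: hash the FileData bytes into a 20-byte Fingerprint."""
    file_data = row.get("FileData")
    if file_data is None:
        # Assumption: a NULL input column yields a NULL fingerprint instead of hashing nothing.
        row["Fingerprint"] = None
    else:
        # SHA-1 always yields 20 bytes, matching the DT_BYTES output column of length 20.
        row["Fingerprint"] = hashlib.sha1(file_data).digest()
    return row


row = process_row({"FileData": b"%PDF-1.4 example bytes"})
print(len(row["Fingerprint"]))  # 20
```

Because the digest length is fixed, the output column can be declared once with a constant size regardless of how large each incoming document is.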

Cryptographic Hash Functions

Cryptographic hash functions take an arbitrary number of input bytes and reduce them to a fixed-size digest. The digest size depends on the function you use: MD4 and MD5 produce 16 bytes (128 bits), while SHA-1 produces 20 bytes (160 bits), which is why the Fingerprint column is 20 bytes long. I prefer SHA-1 since it is the most secure of these three. There are many other cryptographic hash functions, but they are not included in all libraries.
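The fixed-size property is easy to check with Python's standard `hashlib` module. This sketch uses MD5 and SHA-1 from the paragraph above, plus SHA-256 as one example of the "other" functions; MD4 is left out because `hashlib` does not guarantee it is available. The helper name `digest_sizes` is hypothetical.

```python
import hashlib


def digest_sizes(data: bytes) -> dict:
    """Return the digest length in bytes for a few common hash functions."""
    return {name: len(hashlib.new(name, data).digest()) for name in ("md5", "sha1", "sha256")}


# A 1-byte input and an ~80 KB input produce digests of identical length.
print(digest_sizes(b"x"))           # {'md5': 16, 'sha1': 20, 'sha256': 32}
print(digest_sizes(b"x" * 80_000))  # {'md5': 16, 'sha1': 20, 'sha256': 32}
```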

One-Way Hashing, HashBytes, and SSIS

I helped develop an application that stores documents in SQL Server 2005, from which a web-based front end retrieves them. My task was co-developing the back-end pieces: table structures, indexes, relationships, and an SSIS package to manage the load. The design is straightforward: documents are stored in a varbinary column, with ancillary data stored in additional tables to facilitate lookups. Each document is currently a PDF of around 80 KB. The current logic states that we keep only one copy of a given document, and the first copy we get is the one we keep. This is easy to maintain: before loading a "new" PDF, I do a lookup in the ancillary tables, and if the document already exists, I don't bother loading the new one. Then came the curve.
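The "first copy wins" rule can be sketched in a few lines of Python. Two substitutions are assumptions of this sketch and not the design described above: an in-memory dictionary stands in for the varbinary table, and the SHA-1 fingerprint of the document bytes is used as the lookup key in place of the ancillary-table lookup. `DocumentStore` and `load` are hypothetical names.

```python
import hashlib


class DocumentStore:
    """Hypothetical in-memory stand-in for the document table: the first copy of a document wins."""

    def __init__(self) -> None:
        self._docs = {}  # 20-byte SHA-1 fingerprint -> document bytes

    def load(self, pdf_bytes: bytes) -> bool:
        """Store the document unless an identical one exists; return True if it was stored."""
        fingerprint = hashlib.sha1(pdf_bytes).digest()
        if fingerprint in self._docs:
            # Lookup hit: keep the copy we already have and skip the new one.
            return False
        self._docs[fingerprint] = pdf_bytes
        return True

    def __len__(self) -> int:
        return len(self._docs)


store = DocumentStore()
print(store.load(b"%PDF report v1"))  # True  (first copy kept)
print(store.load(b"%PDF report v1"))  # False (duplicate skipped)
print(len(store))                     # 1
```

Keying on a 20-byte digest keeps the lookup small and constant-size, whereas comparing the ~80 KB documents byte for byte would not.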