How to Generate a Hash in SSIS

It’s quite common to have a requirement to generate a SHA1 or MD5 Hash in SSIS to help with comparing non-key columns for “delta” management.

One question that comes up is “What’s the fastest way to generate a Hash”. I just did a little test to compare two methods:

Method a) SQL Servers HashBytes(‘SHA1’, Column1+ coalesce(Column2,’’)) function

Method b) In SSIS Using the MultipleHash Component http://ssismhash.codeplex.com/

The target data set was a 10 million row table based on the Adventure Works Sales.SaleOrderHeader table scaled up on my 64 bit laptop with an 8 core i7 CPU.

MethodTime (Secs)Avg %Processor TimeDTEXEC Working Set (MB)
  Method A – SQL HashBytes function on source1045047
Method B – SSIS MultipleHash Component644107156

Conclusions

Not sure if anyone else has got better performance from generating the Hash within SSIS as opposed to SQL Server but my conclusion so far is:

  • SQL Server HashBytes function is a lot faster than doing the Hash calculation in the Dataflow (x6 in my test).
  • There are other factors to choose which method is appropriate:
    • The SSIS Custom Component is easier to use (point and click)
    • The SSIS Custom Component allows you to hash columns which are  “introduced” in the pipeline
    • The SSIS component required the shipping of a DLL (bad)
    • The HashBytes function does not support nulls so needs a lot of casting to strings and checking for nulls.
    • A checksum function is faster again, but carries more risk of not detecting a change.

Leave a Reply