Hashing Algorithm Roundup

As a follow-up to yesterday's roundup of encryption algorithms, I bring you a roundup of popular hashing algorithms.  Unlike encryption, which is generally a 2-way function (encrypt, then decrypt), hashing is generally one-way.  There is no way to determine the original text based on a hash (except brute force) if it is properly implemented.  So what's the point of that?  Well, primarily it's used to verify the integrity of a given bit of data.  Say, for instance, you are doing forensics on a computer system, and you generate hashes of all the files on the file system.  You can demonstrate later, in a court of law perhaps, that those files have not changed since.  Because if even one bit had changed the resulting hash would be different now than it was when you started your investigation.

Hashing algorithms are also useful for storing passwords, because the system doesn't need to record what your password is.  It only needs to be able to verify that the hash for the password you are giving the system now matches the hash in your user profile.  Since hashes are one-way, even if someone did manage to get your password hash out of the database there is no way for them to reverse that hash into a usable password.

A word about collisions:
Theoretically, hashing functions should produce a unique result (digest) for a given piece of content being hashed.  If two pieces of content produce the same hash digest it's referred to as a "collision".  Collisions are bad, because it casts doubt on the integrity of the hash.

All hash functions have potential for collisions.  Let me say that again:  All hash functions have potential for collisions.  There is no way you can produce a fixed-length digest from a variable length bit of data and not have potential for collisions.

The goal of a good hash function is not to eliminate collisions, which is impossible.  The goal is to make collisions so astronomically infrequent that it is computationally infeasible to find them.

OK, with that said, I give you The List: 

Popular Hashing Algorithms

Algorithm

Digest Length (bits)

Passes

Cracked?

HAVAL 128-256 3, 4 or 5 128-bit/3-pass cracked
HAVAL-160 160 4x20 No
MD2 128 1 Yes
MD4 128 1 Yes
MD5 128 1 Yes
Panama 256
Yes
RadioGatun Variable Variable No
RIPEMD 128 4x16 Yes
RIPEMD-160 160 5x16 No
RIPEMD-320 320 4x16 No
SHA-1 160 80 Theoretical
SHA-2 Family 224-512 64 or 80 No
SMASH 256
Yes
Whirlpool 512 10 No


HAVAL and HAVAL-160: Introduced in 1992, HAVAL showed a lot of promise with its variable digest length and number of passes.  However, HAVAL has some fatal flaws, which have been largely fixed by its successor, HAVAL-160, which uses a fixed length digest and a set number of passes.  HAVAL-160 has no known weaknesses at this time and is considered secure.

MD Family (MD2, MD4, MD5): Designed in the late 80s and early 90s by Ron Rivest, the MD-series of algorithms were some of the earliest, and MD5 is to this day the most popular.  MD5, the most secure of the bunch, has been shown to be abnormally susceptible to collisions, and its use is now actively discouraged.  Some consider all current hash functions to be some strengthened variant of MD4.

Panama and Radio-Gatun: Panama was introduced in 1998, and has influenced a number of other popular designs.  Unfortunately, in 2007, researches demonstrated a practical attack on Panama that demonstrated an unacceptable collision frequency.  Luckily, in 2006, a variant of Panama was introduced, called Radio-Gatun (where do they get all these cool names!), that does not have the weaknesses that were exploited in its parent algorithm.  Panama is considered broken, and Radio-Gatun is considered secure at this time.

RIPEMD Family: The original RIPEMD was based on MD4, and unfortunately inherited MD4's fundamental flaws.  It is considered broken.  The successors, RIPEMD-160, RIPEMD-256 and RIPEMD-320, are considered secure.  There was also a RIPEMD-128, but that was found to have similar flaws as its parent, and is not in widespread use any longer.

SHA-1: Designed by the NSA, and published by NIST, SHA-1 was the darling of the federal government for quite a while, and as a result it's probably the most widely used hashing algorithm in place today aside from MD5.  While no attacks have been reported in SHA-1, research has discovered a number of theoretical flaws in the algorithm, and NIST has mandated that all federal agencies must stop using it by 2010.  It is being replaced by its successor, aptly named "SHA-2"

SHA-2 family: The SHA-2 family is actually 4 separate algorithms, distinguished by the length of their resulting digests.  SHA-224, SHA-256, SHA-384 and SHA-512 are all technically SHA-2 algorithms.  Along with SHA-1, these algorithms are the hashing algorithms currently recommended by NIST.  Unfortunately, since SHA-2 is mathematically similar to SHA-1, there are concerns that it may share some of its weaknesses.  A search for its successor, to be called SHA-3, is currently underway at NIST, using the same process that brought us the wildly popular AES encryption algorithm.  The winning SHA-3 algorithm will be crowned in 2012.  In the meantime, NIST is still recommending the SHA-2 family of algorithms for all purposes.

SMASH: Debuting in 2005, SMASH showed a lot of promise, and was a refreshing departure from the usual MD4 pedigree.  Unfortunately, it was quickly broken.

Whirlpool:  Released in 2000, Whirlpool actually harnesses the AES algorithm in its computations.  Which isn't all that surprising when you consider that it was designed by one of Rijndael's original co-authors.  There have been 3 releases, but even the early versions have stood up well to attack.  Subsequent releases have improved the algorithm's speed, taking advantage of advances in processor technologies.  It has been adopted by ISO for its 10118-3 standard.

Which to choose?
My current preference for hashing algorithm, due to its widespread adoption, is the SHA family, either SHA-1 of a variant of SHA-2 if possible.  If I can get it, I'll use Whirlpool.

Print | posted @ Tuesday, January 27, 2009 10:13 PM

Comments on this entry:

Gravatar # Hashing Algorithm Roundup
by DotNetShoutout at 1/28/2009 4:50 PM

Thank you for submitting this cool story - Trackback from DotNetShoutout
  
Gravatar # Reflective Perspective - Chris Alcock » The Morning Brew #275
by Pingback/TrackBack at 1/29/2009 5:20 AM

Reflective Perspective - Chris Alcock » The Morning Brew #275
  
Gravatar # re: Hashing Algorithm Roundup
by Jon von Gillern at 1/29/2009 12:42 PM

Hi Beau, thanks for the post. I think you might want to clarify that purpose of creating the hash is important in deciding which hashing algorithm to choose. Sometimes it is ok to pick an algorithim that makes it easy to find collisions, if speed is a more important concern. i.e. You probably don't want to use SHA-512 to compute a hash when overriding GetHashCode on an object, because the integrity of the hash is not that important to a hashtable, speed is. Most of the time, you'll be just fine if you do a simple XOR of hashes of the object's fields.
  

Your comment:

Title:
Name:
Email:
Website:
 
Italic Underline Blockquote Hyperlink
 
 
Please add 4 and 2 and type the answer here: