Day: June 15, 2013

Which hash should I use?

As far as secure message hashes go, MD5 and SHA1 are considered broken. MD5 is very broken, while SHA1 is quite damaged. In either case they are not good choices for secure message hashes in uses such as digital signatures. But digital signatures is only one use of hashes. There are other situations where the security implications of a hash are not important.

If, for example, you wish to use a hash to verify data transfer then in most cases any of the hash algorithms will be sufficient. The weaknesses in MD5 and SHA1 require a malicious adversary to expend a lot of computer effort in creating a hash collision. If you are simply trying to verify data has not been corrupted while transferred over a medium, then this is not a problem. Take for example a protocol where a data buffer is transmitted followed by a hash of that buffer. There is nothing in this protocol that prevents a malicious person interfering with the data to change it what they want. They can simply change the hash value as well as the data. The only thing the protocol is protecting is accidental corruption of the data, such as line noise. The odds of random data corruption producing an identical hash are so astronomically unlikely that it can be consider 100% guarantee of message integrity for all for all intents and purposes. In which case the best choice of hash algorithm is most likely to be the one that is most efficient and fast.

As MD5, SHA1, and SHA256 all use 32 bit operations, with each successive hash algorithm being more complicated than the preceding one, it seems likely that MD5 will be the fastest and SHA256 the slowest. SHA512 uses 64 bit operations and therefore processes twice as much data at a time. Rather than just guessing which one is faster, it is fairly easy to write a speed testing program. Remember its not just the algorithm itself, its the implementation that affects the speed. Two MD5 hash implementations can be very different in speed.

I have written a speed test for the four hashes I have now.

HashSpeedTest.zip – Full C source code and x64 binaries for OSX and Windows.

I measure the speed of the hash algorithms in two ways. First the number of hashes per second of small data (that is data that only requires on hash calculation). This measures the rate that would be used in something like the Proof of Work program. The second measurement is the amount of data per second that can be hashed on large data buffers.

Depending on the application either measurement might be more relevant. For example when hashing files, in general it will be the amount of data per second that is relevant for affecting overall speed. But in some applications, such as Bitcoin hashes, then it is the number of hashes per second that is relevant.

Here is the output running the OSX version on a 2.6 GHz Intel i7

MD5     6.498 MHashes/s   463.16 MBytes/s
SHA1    2.409 MHashes/s   338.21 MBytes/s
SHA256  2.254 MHashes/s   156.83 MBytes/s
SHA512  1.520 MHashes/s   244.93 MBytes/s

Note, this is single threaded. Depending on the application (such as proof of work in Bitcoin) the rate can be increased by the number of processors available. However for calculating the hash of a single file the calculations can not be performed in parallel as it is a serial operation.

It can be seen that MD5 is clearly the winner of speed in both counts. So if your application does not require secure hashes then MD5 could be a good choice. Remember these results are just from this implementation compiled with Xcode. While looking for MD5 algorithms I first was using a different implementation that was half the speed of this one.

Intrestingly the otimisation of the compiler plays a big role in the result. The difference between Xcode and MS Visual Studio is interesting in that it is not consistent between algorithms. Because they optimise in different ways some operations are faster in one system, but others faster in the other. Running the Windows version of the program on the same computer gave the following results:

MD5      5.995 MHashes/s    401.29 MBytes/s
SHA1     3.317 MHashes/s    390.18 MBytes/s
SHA256   2.562 MHashes/s    184.98 MBytes/s
SHA512   1.618 MHashes/s    289.01 MBytes/s

While MD5 performed worse when compiled with MS Visual Studio, SHA1 was better.

Despite it being 12 years since SHA2 came out, MD5 and SHA1 are still the main standards used. And even when SHA3 arrives in the near future (the algorithm has already been picked) its likely we’ll still see MD5 and SHA1 for quite some time in the future.

Advertisement

SHA-2

One of the goals I have with this blog is to build up a collection of public domain C libraries for various cryptographic functions. Already I have MD5 and SHA1, the next obvious one is SHA2. Once again I have made an example program and compiled for x64 on OSX and Windows that calculates SHA256 and SHA512 hashes of a string from the command line.

Sha2String.zip – Full source code and Windows and OSX x64 binaries.

SHA1 became a replacement for MD5 as MD5 had weaknesses. Unfortunately SHA1 also has weaknesses, but it took longer before they came apparent. SHA2 was designed to replace SHA1 and was created in 2001. Unfortunately SHA2 was designed by committee, and as often is the case when designing a replacement the end result is far more complicated than required to fix the original problem.

MD5 produces 128 bit hashes. SHA1 produces 160 bit hashes. It would seem logical SHA2 would be an algorithm that produced a fixed hash length, most likely 256 bit. Instead SHA2 is a set of functions producing six different hashes, with four different sizes.

SHA256 is the name given to a SHA2 function that produces a 256 bit hash. Like MD5 and SHA1 this function uses 32 bit operations and works well on a 32 bit processor. The same algorithm is used but with 64 bit operations instead of 32 bit. This is SHA512. It produces a 512 bit hash. There are a few other differences such as the number of rounds and the initial values.

In my opinion if SHA2 had to have more than one hash size it should have stopped here with two functions, a 256 bit one and a 512 bit one. Besides if you need a smaller size you can simply truncate the final output of the hash to a smaller size. However the SHA2 committee decided that two additional sizes were required, 224 bit and 384 bit. Rather than just truncating the output of the larger hashes, they also changed the initial values used in the hash calculations. So SHA224 is a truncated version of SHA256 but with different initial values and the same goes with SHA384 being a truncated version of SHA512. The different initial values don’t add anything to the strength of the hashes, they just make it impossible to create a SHA224 hash from a SHA256 hash, you have to perform the entire calculation again.

To me the 224 and 384 variations are worthless. If you have an application that absolutely needs 384 bits then use a SHA512 and truncate the result to 384 bits. Having a different official standard doesn’t offer anything useful. So now there are four SHA2 functions: SHA224, SHA256, SHA384, and SHA512. Unfortunately it gets worse, there are actually two more.

Because SHA512 works on 64 bit values, it is more efficient than SHA256 on a 64 bit processor as it works through the data in larger chunks at a time. However on a 32 bit processor the code takes about four times as long. When SHA2 was invented 64bit processors were not common place in desktop computers, however now they are standard. On a 64 bit processor a SHA512 calculation is faster than a SHA256 because of the higher data throughput. In 2012 two more functions were added to the SHA2 family: SHA-512/224 and SHA-512/256. Both these functions use the SHA-512 function, but again with different initial values, and the final output truncated to 224 or 256 bits.

As of this writing all SHA2 functions are considered secure, and not broken.

As I personally only like SHA256 and SHA512, I will ignore the other four variations and not add them to my hash library. I have written SHA256 and SHA512 as independent modules. You can take either one without requiring the other.

The functions are defined with the same prototypes as MD5 and SHA1 for easy compatibility. Once again I have produce string to hash programs. In this case two programs: Sha256String and Sha512String which can be used to verify the hash functions.

Sha2String.zip – Source code and binaries.

This is free and unencumbered software released into the public domain.