Monthly Archives: June 2013

SHA-1

MD5 is a widely used hash system; it was invented in 1991. In 1996 a flaw was found in the system making it vulnerable to certain types of attacks, although it wasn’t until 2004 that a more serious flaw was found and the system declared “broken”. SHA-1 was created as a replacement for MD5. It uses similar principles, but is a more conservative design. It outputs a 160 bit hash rather than 128 bit. Due to the flaws found in MD5, SHA-1 was promoted. Over the years weaknesses in SHA-1 have been found, however it is still better than MD5. As of current MD5 is considered completely broken, whereas SHA-1 is still reasonable against the most concentrated attacks. The best know attack requires 2^61 SHA-1 calculations. The computing resources required to perform this are still enormous.

MD5 and SHA-1 are probably the most common hashes systems in use. SHA-1 having replaced MD5 in many cases. The larger hash size alone makes it better as the chances of collisions are further reduced. SHA-1 does come at a cost though. It takes more work to calculate a SHA-1 hash than an MD5 hash.Also the output hash takes up more storage. However if only 128 bits are wanted of a hash, it is acceptable to take the first (or in fact any) 128 bits of a SHA-1 hash. This is still better than using MD5.

I wrote the Proof-Of-Work program using MD5 as it was fast. However I want to build up a library of public domain C source code of several hash functions. SHA-1 is the next obvious one. So I present here source code for SHA-1 hashes, including a program to calculate a SHA-1 hash from the command line.

Sha1String.zip – Contains full source code for SHA-1 and the program Sha1String. Includes x64 binaries for OSX and Windows.

The functions and types in LibSha1.h match the ones in LibMd5.h presented earlier. This makes it very easy to substitute one system with the other. When I add more hash algorithms in the future, they will also follow the same format.

This is free and unencumbered software released into the public domain.

 

Advertisements

Which hash should I use?

As far as secure message hashes go, MD5 and SHA1 are considered broken. MD5 is very broken, while SHA1 is quite damaged. In either case they are not good choices for secure message hashes in uses such as digital signatures. But digital signatures is only one use of hashes. There are other situations where the security implications of a hash are not important.

If, for example, you wish to use a hash to verify data transfer then in most cases any of the hash algorithms will be sufficient. The weaknesses in MD5 and SHA1 require a malicious adversary to expend a lot of computer effort in creating a hash collision. If you are simply trying to verify data has not been corrupted while transferred over a medium, then this is not a problem. Take for example a protocol where a data buffer is transmitted followed by a hash of that buffer. There is nothing in this protocol that prevents a malicious person interfering with the data to change it what they want. They can simply change the hash value as well as the data. The only thing the protocol is protecting is accidental corruption of the data, such as line noise. The odds of random data corruption producing an identical hash are so astronomically unlikely that it can be consider 100% guarantee of message integrity for all for all intents and purposes. In which case the best choice of hash algorithm is most likely to be the one that is most efficient and fast.

As MD5, SHA1, and SHA256 all use 32 bit operations, with each successive hash algorithm being more complicated than the preceding one, it seems likely that MD5 will be the fastest and SHA256 the slowest. SHA512 uses 64 bit operations and therefore processes twice as much data at a time. Rather than just guessing which one is faster, it is fairly easy to write a speed testing program. Remember its not just the algorithm itself, its the implementation that affects the speed. Two MD5 hash implementations can be very different in speed.

I have written a speed test for the four hashes I have now.

HashSpeedTest.zip – Full C source code and x64 binaries for OSX and Windows.

I measure the speed of the hash algorithms in two ways. First the number of hashes per second of small data (that is data that only requires on hash calculation). This measures the rate that would be used in something like the Proof of Work program. The second measurement is the amount of data per second that can be hashed on large data buffers.

Depending on the application either measurement might be more relevant. For example when hashing files, in general it will be the amount of data per second that is relevant for affecting overall speed. But in some applications, such as Bitcoin hashes, then it is the number of hashes per second that is relevant.

Here is the output running the OSX version on a 2.6 GHz Intel i7

MD5     6.498 MHashes/s   463.16 MBytes/s
SHA1    2.409 MHashes/s   338.21 MBytes/s
SHA256  2.254 MHashes/s   156.83 MBytes/s
SHA512  1.520 MHashes/s   244.93 MBytes/s

Note, this is single threaded. Depending on the application (such as proof of work in Bitcoin) the rate can be increased by the number of processors available. However for calculating the hash of a single file the calculations can not be performed in parallel as it is a serial operation.

It can be seen that MD5 is clearly the winner of speed in both counts. So if your application does not require secure hashes then MD5 could be a good choice. Remember these results are just from this implementation compiled with Xcode. While looking for MD5 algorithms I first was using a different implementation that was half the speed of this one.

Intrestingly the otimisation of the compiler plays a big role in the result. The difference between Xcode and MS Visual Studio is interesting in that it is not consistent between algorithms. Because they optimise in different ways some operations are faster in one system, but others faster in the other. Running the Windows version of the program on the same computer gave the following results:

MD5      5.995 MHashes/s    401.29 MBytes/s
SHA1     3.317 MHashes/s    390.18 MBytes/s
SHA256   2.562 MHashes/s    184.98 MBytes/s
SHA512   1.618 MHashes/s    289.01 MBytes/s

While MD5 performed worse when compiled with MS Visual Studio, SHA1 was better.

Despite it being 12 years since SHA2 came out, MD5 and SHA1 are still the main standards used. And even when SHA3 arrives in the near future (the algorithm has already been picked) its likely we’ll still see MD5 and SHA1 for quite some time in the future.

SHA-2

One of the goals I have with this blog is to build up a collection of public domain C libraries for various cryptographic functions. Already I have MD5 and SHA1, the next obvious one is SHA2. Once again I have made an example program and compiled for x64 on OSX and Windows that calculates SHA256 and SHA512 hashes of a string from the command line.

Sha2String.zip – Full source code and Windows and OSX x64 binaries.

SHA1 became a replacement for MD5 as MD5 had weaknesses. Unfortunately SHA1 also has weaknesses, but it took longer before they came apparent. SHA2 was designed to replace SHA1 and was created in 2001. Unfortunately SHA2 was designed by committee, and as often is the case when designing a replacement the end result is far more complicated than required to fix the original problem.

MD5 produces 128 bit hashes. SHA1 produces 160 bit hashes. It would seem logical SHA2 would be an algorithm that produced a fixed hash length, most likely 256 bit. Instead SHA2 is a set of functions producing six different hashes, with four different sizes.

SHA256 is the name given to a SHA2 function that produces a 256 bit hash. Like MD5 and SHA1 this function uses 32 bit operations and works well on a 32 bit processor. The same algorithm is used but with 64 bit operations instead of 32 bit. This is SHA512. It produces a 512 bit hash. There are a few other differences such as the number of rounds and the initial values.

In my opinion if SHA2 had to have more than one hash size it should have stopped here with two functions, a 256 bit one and a 512 bit one. Besides if you need a smaller size you can simply truncate the final output of the hash to a smaller size. However the SHA2 committee decided that two additional sizes were required, 224 bit and 384 bit. Rather than just truncating the output of the larger hashes, they also changed the initial values used in the hash calculations. So SHA224 is a truncated version of SHA256 but with different initial values and the same goes with SHA384 being a truncated version of SHA512. The different initial values don’t add anything to the strength of the hashes, they just make it impossible to create a SHA224 hash from a SHA256 hash, you have to perform the entire calculation again.

To me the 224 and 384 variations are worthless. If you have an application that absolutely needs 384 bits then use a SHA512 and truncate the result to 384 bits. Having a different official standard doesn’t offer anything useful. So now there are four SHA2 functions: SHA224, SHA256, SHA384, and SHA512. Unfortunately it gets worse, there are actually two more.

Because SHA512 works on 64 bit values, it is more efficient than SHA256 on a 64 bit processor as it works through the data in larger chunks at a time. However on a 32 bit processor the code takes about four times as long. When SHA2 was invented 64bit processors were not common place in desktop computers, however now they are standard. On a 64 bit processor a SHA512 calculation is faster than a SHA256 because of the higher data throughput. In 2012 two more functions were added to the SHA2 family: SHA-512/224 and SHA-512/256. Both these functions use the SHA-512 function, but again with different initial values, and the final output truncated to 224 or 256 bits.

As of this writing all SHA2 functions are considered secure, and not broken.

As I personally only like SHA256 and SHA512, I will ignore the other four variations and not add them to my hash library. I have written SHA256 and SHA512 as independent modules. You can take either one without requiring the other.

The functions are defined with the same prototypes as MD5 and SHA1 for easy compatibility. Once again I have produce string to hash programs. In this case two programs: Sha256String and Sha512String which can be used to verify the hash functions.

Sha2String.zip – Source code and binaries.

This is free and unencumbered software released into the public domain.

Threading and synchronisation

Not that many years ago most desktop computers had single core processors. This means the processor can execute just one instruction at a time. Of course operating systems have been multi-threaded for many years to allow users to run more than one program at once. A lot of programs written in Windows were multi threaded, as it is a convenient method for programming interactive GUI applications. However if you were writing a program to perform long calculations, you could quite reasonably write it as a single threaded application. After all you will get close to 100% CPU usage out of your single thread.

Most computers these days have multi-core proce parallized ssors, or multiple processors. This means if you run your same single threaded application that used to take 100% on a uniprocessor system, it will now only get a fraction of the total computing power. On an 8 core system a single thread can only get 12.5% of the CPU. In order to get full performance you need as many threads as there are cores or processors. You will also need to have a task that can be broken up into parallel components. For example calculating a MD5 of a single file is a serial task. You can not calculate the next part of the hash until you have performed the calculation of the previous part. If instead your task was to calculate the MD5 sums of each block of the file, then this is can easily be performed in parallel. As each block calculation is independent of each other.

If two tasks are entirely independent of each other they can be performed in separate threads with no interaction between them. In general though tasks will be related and at some points need to interact with each other. Some data may need to be shared between the threads. At this point synchronisation becomes important. With a single threaded program you know only one thing is changing any data at any one time. With multiple threads working on the same data set then one thread can corrupt the results of another thread unless care is taken to keep them synchronised.

The simplest form of thread synchronisation is the spinlock. A spinlock is a shared data value that can be compared and updated with an atomic instruction. An atomic instruction is one that has on intermediate states that another thread could see. The common atomic instruction used to implement spinlocks is the compare and exchange instruction. The x86 family of processors has the cmpxchg instruction, which when used with the lock prefix becomes an atomic instruction. This means while one thread is performing this instruction, no other thread will be see of change the value of the spinlock.

C does not provide any threading or synchronisation systems as part of the language. In unix base systems there is pthreads which implement threads and synchronisation methods, while Windows has its own system. I wrote a very simple module that provides basic threads and spin locks that works on Windows and OSX.

The spinlock is defined as typedef volatile long THREADSPINLOCK; 

The volatile keyword instructs the C compiler not to make any optimisation assumptions about the state of the variable, and to check it every time it is requested.

A spinlock is created an initialised simply by declaring the value and assigning it 0. A spinlock will usually be a global value as it has to be seen by more than one thread. I have one function for acquiring the spinlock.

////////////////////////////////////////////////////////////////////////////////
//  ThreadSpinLockAcquire
//
//  Acquires the thread spinlock. This will wait (spin) until the lock is
//  acquired. Use this to protect a resource from multi thread access. Do not
//  hold the spinlock for long periods of time.
////////////////////////////////////////////////////////////////////////////////
void
    ThreadSpinLockAcquire
    (
        THREADSPINLOCK*     SpinLockPtr
    )
{
#if defined( _WIN32 )
    while( InterlockedCompareExchange( SpinLockPtr, 1, 0 ) )
    {
        Sleep( 0 );
    }
#elif defined( __APPLE__ )
    while( !__sync_bool_compare_and_swap( SpinLockPtr, 0, 1 ) )
    {
        pthread_yield_np( );
    }
#endif
}

In Windows I use the InterlockedCompareExchange function, while in OSX I use __sync_bool_compare_and_swap. These functions perform the same task, though inconveniently take the parameters in a different order.

The spinlock acquire function runs in a loop. In an atomic operation it compares the value of the spinlock with 0 and if it is 0 it changes the value to 1, otherwise it leaves it alone. If the value was 0, it is now 1 and the loop exits. If the value is 1 to start with the loop then yields execution for the thread and then tries again. It keeps on trying until the value becomes 0, then it “acquires” it by setting it 1 and leaving the loop.

Whenever a thread returns from the ThreadSpinLockAcquire function it is the only thread that currently posses the spinlock. All other threads attempting to acquire it will spin until it is released.

The release function is simply setting the value back to zero

//////////////////////////////////////////////////////////////////////////////////
//  ThreadSpinLockRelease
//
//  Releases the thread spinlock. Do not call this unless you have acquired the
//  spinlock.
//////////////////////////////////////////////////////////////////////////////////
void
    ThreadSpinLockRelease
    (
        THREADSPINLOCK*     SpinLockPtr
    )
{
    *SpinLockPtr = 0;
}

The spinlock mechanism is very simple to implement, it also does not require code to initialise unlike most synchronisation mechanisms. It is however inefficient. While other threads are waiting to acquire the spinlock they “spin”. Meaning they are constantly running and checking the value. To improve performance we put a thread yield instruction in there. When there are multiple threads waiting on a spinlock, it is just luck which thread will get it once the lock has been released. OS provided synchronisation methods will generally have a better scheduling system to determine who gets the lock next. Also the threads don’t get scheduled in until they are ready to receive the lock. So for highly contented locks, a system provided mechanism such as a Mutex should be used. But the spinlock is simpler and if used right can be sufficient.

The higher the contention of a lock the more inefficient it becomes. If you have 8 threads spending most of their time waiting on a lock, then you haven’t gained much with multiprocessing. Ideally a lock is rarely contended, which means thread execution is not interrupted. A spinlock should be held for the shortest amount of time possible. Perform all the operations you can before acquiring it. For example if you were entering an element into a linked list. It is best to allocate the new element first, fill in all its entries. Then acquire the spinlock, attach the element to the global linked list, and then release. Don’t perform the allocation and the setting of the entries within the spinlock as that is holding up other threads unnecessarily.

A C program starts (usually in its main function) running one thread. In order to become multithreaded you have to create more threads. This is is platform specific. Unix like platforms use pthreads, Windows uses its own thing (CreateThread). There are a lot of functions relating to threads. Once you create one, you can monitor them, wait for them to close etc. For my task I just simple want to create threads and let them run until the program ends. I wrote a simple wrapped for Window’s CreateThread and pthread’s pthread_create

////////////////////////////////////////////////////////////////////////////////
//  ThreadLaunchNewThread
//
//  Launches a new thread. Passes the pointer Context through to the new thread
//  function. Returns zero value on failure, or 1 if success
////////////////////////////////////////////////////////////////////////////////
int
    ThreadLaunchNewThread
    (
        ThreadStartFunction     StartFunction,
        void*                   Context
    )
{
#if defined( _WIN32 )
    HANDLE                          threadHandle = 0;
#elif defined( __APPLE__ )
    pthread_t                       threadHandle = 0;
    pthread_attr_t                  attributes;
    int                             errors;
#endif
    uint32_t                        success = 0;
    InternalThreadStartParameters   startParams = {0};

    startParams.StartFunction = StartFunction;
    startParams.Context = Context;
    ThreadSpinLockAcquire( &startParams.StartSpinLock );

#if defined( _WIN32 )
    threadHandle = CreateThread(
        NULL,
        0,
        InternalThreadStart,
        &startParams,
        0,
        NULL );
    if( NULL != threadHandle )
#elif defined( __APPLE__ )
    pthread_attr_init( &attributes );
    errors = pthread_create(
        &threadHandle,
        &attributes,
        InternalThreadStart,
        &startParams );
    if( 0 == errors )
#endif
    {
        // The thread will release the spinlock, so attempt to reacquire it. This will mean we
        // will be in sync.
        ThreadSpinLockAcquire( &startParams.StartSpinLock );

        // Close the handle as we do not need it anymore. Thread has been created
#if defined( _WIN32 )
        CloseHandle( threadHandle );
#elif defined( __APPLE__ )
        pthread_attr_destroy( &attributes );
#endif
        success = 1;
    }
    else
    {
        // Failed to create thread
        success = 0;
    }

    return success;
}

////////////////////////////////////////////////////////////////////////////////
//  InternalThreadStart
//
//  This is the function stub that is called by The windows CreateThread
//  function. This will launch our thread.
////////////////////////////////////////////////////////////////////////////////
#if defined( _WIN32 )
static
DWORD
WINAPI
    InternalThreadStart
    (
        LPVOID          Parameter
    )
#elif defined( __APPLE__ )
static
void*
    InternalThreadStart
    (
        void*            Parameter
    )
#endif
{
    InternalThreadStartParameters*      parameters
        = (InternalThreadStartParameters*)Parameter;
    ThreadStartFunction                 startFunction = 0;
    void*                               context = 0;

    startFunction = parameters->StartFunction;
    context = parameters->Context;

    // Release spinlock, After this we can no longer access parameters.
    ThreadSpinLockRelease( &parameters->StartSpinLock );

    // Call start function
    startFunction( context );

    // Thread ends    
    return 0;
}

This function ThreadLaunchNewThread takes two parameters. The first is the thread function you want the thread to start with. The second is an arbitrary pointer that will be passed to the thread function. The thread function has a simple prototype:

typedef
void
    (*ThreadStartFunction)
    (
        void*       Context
    );

The thread will run until this function returns.

I now have all the components I need for my article on Proof-of-Work. I added just one more function to the thread module. A function that lowers the priority of the program. I want my multithreaded program to use the maximum performance of the computer. I don’t however want to disable my computer from any other use. So I want to set the priority low so that all other programs get priority over it. That way it just uses up the CPU cycles that are otherwise not being used, rather than hogging the CPU itself.

//////////////////////////////////////////////////////////////////////////////////
//  ThreadSetProcessPriorityToBackground
//
//  Sets the process priority to Background mode
//////////////////////////////////////////////////////////////////////////////////
void
    ThreadSetProcessPriorityToBackground
    (
        void
    )
{
#if defined( _WIN32 )
    // Set priority of process to background
    SetPriorityClass( GetCurrentProcess(), PROCESS_MODE_BACKGROUND_BEGIN );
#elif defined( __APPLE__ )
    setpriority( PRIO_PROCESS, 0, 10 );
#endif
}

This function will lower the priority of the the process. In Windows it uses a PROCESS_MODE_BACKGROUND_BEGIN which is only available since Vista. For XP and earlier you can use BELOW_NORMAL_PRIORITY_CLASS or IDLE_PRIORITY_CLASS. In OSX this applies “nice 10″ to the process.

In the next article I will put this together with the MD5 and RC4 routines to create my Proof-of-Work example program.

 

Proof of Work

This article is about the concept of Proof of Work. I have written a program in C demonstrating the concept. It is downloadable here: ProofOfWork.zip

The term Proof of Work in computing is an interesting concept. It means producing some output that other people can verify would “have taken a certain amount of work” to have accomplished. The important part is that the verification is easy, while the actual work is hard.

There are many computational things that take a long time to perform, but easier to verify the result. For example finding a prime number. It is harder to find a prime number than it is to verify that a given number is prime. So a proof of work system could be worked using this. The required work could be to find a random prime number greater than some large threshold. In order to “prove” you have performed the work you present the prime number. I can verify that it is a prime number without as much work as it took to find it. Therefore “proving” that you probably spent a measurable amount of work finding it. This would not make a very good scheme, for one the threshold would have to be changed all the time to avoid repeats, also verifying a number is prime is still relatively expensive and becomes more difficult as the numbers get larger.

A better proof of work system is one involving hash calculations. The nature of a good hash algorithm is that it is impractical to influence the output of the hash value. Basically any slight change in the input will cause a completely random-like hash value. A hash value can be considered a large integer value. For example MD5 produces 128 bit hashes, while SHA2 typically produces 256 or 512 bit hashes. For the remainder of the article I am going to use MD5, but note MD5 is NOT a good hash algorithm. However it is fast and still demonstrates my point. Though for an actual system, the hash would have to be replaced with a more secure hash such as SHA2.

A hash looks like a random number. If we hash any arbitrary data  we will get a 128 bit number that appears to be random. If we change the original data a bit and rehash we will get another 128 bit random looking number. Assuming the hash is good (MD5 isn’t, but lets ignore that for the moment) there is no way I can manipulate the data in order to produce a hash value that I want. Basically any change I make will cause a new random hash. Pretty much like rolling a 2^128 sided die.

In the case of a regular six sided die, fair throwing of the die will produce a random value from 1 to 6. If the “desired” outcome of a die throw is a 6, then I have to keep throwing the die until I get a 6. I might get it on the first outcome, I might have to throw it 1000 times before I get a 6. But probability theory suggests that there is a bit over 50% chance that I will get it will get it within 3 throws. Taken another way, if we have thousands of people in a room who have to roll a die until it is a 6, and then report the number of throws they did, then we will see a nice statistical distribution of the numbers of throws required. About 1/6 of the people will throw a 6 on the first try. The remaining 5/6 people will have to throw it more times, out of which 1/6 will get it in there next try, this is 5/36 (or 13.9%) of all the people who took exactly two throws. The following table shows the proportions of people getting the throw after n throws, and also lists the accumulation (ie people who got it within n throws)

Throws     % On this throw     % By this throw
1          16.7%               16.7%
2          13.9%               30.6%
3          11.6%               42.1%
4          9.6%                51.8%
5          8.0%                59.8%
6          6.7%                66.5%
7          5.6%                72.1%
8          4.7%                76.7%
9          3.9%                80.6%
10         3.2%                83.8%

As a rather contrived example, lets make a proof of work system based on rolling a die and trying to get a six. The “work” is rolling the die until it is a six. The “verification” is looking at the number on the die and seeing that it is a six. If someone presents a die rolled as 6 I can verify it easily without much work, I can also make some assumptions based on probability of how much work was required to get that. Although you may have only had to roll it once, it is more likely that you had to roll it more times. There is an almost 70% chance that it took you at least three throws to achieve it. Obviously the die rolling scheme is not particularly good as the verification system is rather flawed. Unless I actually watched you throw the die, I can’t tell that you didn’t just present me a die with the six side up and didn’t throw it at all. However in the case of our 2^128 sided die (aka the hash of a random value) I don’t need to watch the generation, as I can simply verify it from the original input.

In the case of the hash proof of work we can make a system similar to the die roll, but instead of rolling a die cube, a random value is hashed and the result of the hash is used as the die value. Instead of just six outcomes there are 2^128 outcomes (That is 340282366920938463463374607431768211456 which is a lot). Its rather unreasonable to specify a single value as the required result as it is incredibly unlikely that anyone would ever get it within the life of the universe. However we can set an arbitrary threshold. We specify a number that the hash must be greater than. For example we could set a threshold being a number where the first 40 bits must be all 1s (or 10 Fs when looking at it in hexadecimal). The work required is to keep on hashing random data until a hash is produced that is larger than this threshold (ie starts with 10 Fs). I can verify with one hash operation that the data you claim achieves this hash does in fact do so. I can also be relatively confident that it took a while to generate, as there would have had to be a great deal of hashes generated in order to find on that reached the threshold.

50% of all hashes start with the first bit being 1. 25% of all hashes start with the first 2 bits being 1, and 12.5% have the first 3. Just like the die roll calculation on the six sided die we can do the same for the hash outputs. Of course you could have come up with the hash that reached the threshold on the very first try, but the odds are against you. If you produce some data that has a hash value of the first 40 bits being 1s then I can feel reasonably confident that it probably took you 10s or 100s of billions of hash values in order to find that one. For example the MD5 hash of the string Xnpbsz2zBw  is fffffffff62bf72e14b9ee1b8b923835. This has the first 36 bits all 1s. It took my computer over 10 billion attempts before reaching that threshold, however it can be verified by a single hash operation now by anyone else.

The Bitcoin system uses a similar system of proof of work in order to control the generation of bitcoins. Bitcoin uses 256 bit SHA2 for its hash function, and the threshold is a value that the hash must be lower than. In other words starting with several 0s. As of the time of writing the current bitcoin threshold requires a hash with at least the first 55 bits zero.

Of course just having the requirement that any random data produces a hash that meets the threshold is not good enough for a proof of work system, as there is nothing that proves when the data was generated, or if its been used before. In the case of bitcoin, it is the current transaction block that must be hashed along with some random data. This means that once that block has been done, there is no use trying to reuse that value.

There is a system called HashCash which was devised as a method to slow spam. The concept was that each email sent would require a hashcash token which proves a certain amount of computer work was performed. The threshold was low enough so that it wouldn’t disrupt normal users, but would be a hinderance to spammers attempting to send millions of emails at once. Anti spam filters could verify the hashcash was good with just one single SHA1 calculation.

The whole concept of Proof-of-work fascinates me, and so I wrote a program to demonstrate it. This program requires the components I have talked about in the previous articles. Namely MD5 for the Hashing, RC4 for Random number generation, and Threading in order to get maximum performance out of a multi core CPU.

ProofOfWork.zip contains the full source code and also a Windows x64 executable and an OSX x64 executable for generating proof of work hashes.

The program is simple to operate

Syntax:
    ProofOfWork: <string> <numCharsAdded> <numThreads>

The first parameter is the base string. The second parameter specifies the number of characters that will be added to the base string to feed into the hash function. And the third parameters specifies how many threads will be launched to perform the calculations. To get maximum CPU usage, the number of threads should equal the number of CPU cores on the system.

Each thread seeds a RC4 output stream with some time values and thread addresses. The RC4 streams are used to produce random values that get appended to the string. The values are restricted to a 55 character alphabet, containing letters and numbers without certain confusing characters such as 0 and o and 1 and l etc. The program maintains two thresholds, a current lowest hash and a current highest hash. Everytime a hash is generated that reaches a threshold, the hash is printed and that becomes the new threshold.

Here is some sample output from the program

>ProofOfWork waterjuice.org: 8 8
waterjuice.org:vWJz6CBN  000000e099ef29e46a15bf1e9734ebc3
  Is Smaller (24.19)  0.036 GHashes  in 00:00:02 (h:m:s)
waterjuice.org:eE2yDHW6  0000009f3f1f7cc5b2038bbfc9ca5dec
  Is Smaller (24.68)  0.040 GHashes  in 00:00:02 (h:m:s)
waterjuice.org:cz4J7ams  ffffff1af55a568e5287ca52d5d007af
  Is Larger  (24.16)  0.054 GHashes  in 00:00:03 (h:m:s)
waterjuice.org:24874LFj  fffffffe5408c859e30a1f5e9b6b8ca4
  Is Larger  (31.26)  0.062 GHashes  in 00:00:04 (h:m:s)
waterjuice.org:dczbepJq  0000001a57ec47e8433f0042d1dedad7
  Is Smaller (27.28)  0.088 GHashes  in 00:00:06 (h:m:s)
waterjuice.org:L7tP3Cdk  000000096a5572842de52782e4b78ff1 
  Is Smaller (28.76)  0.420 GHashes  in 00:00:42 (h:m:s)
waterjuice.org:fmsWH6fL  000000030ae1ecdc25feafeb2cc2de53
  Is Smaller (30.39)  0.642 GHashes  in 00:01:05 (h:m:s)
waterjuice.org:BkB88FY6  00000002517de480c6b9c095027d18eb
  Is Smaller (30.79)  1.265 GHashes  in 00:02:11 (h:m:s)
waterjuice.org:H42UeFbU  ffffffffab3ec2782e3918082b39f579
  Is Larger  (33.59)  1.523 GHashes  in 00:02:38 (h:m:s)

Each line displays the text used to create the hash followed by the MD5 hash. It indicates if the hash was smaller or larger than the current threshold. It displays the number of GHashes (Billions of hash calculations) performed before the hash was found, along with the time taken. The program quickly puts out the first results, but as the thresholds get harder the time taken takes much longer.

The value in brackets is a measure of difficulty. The integer part of that value indicates how many of the first bits are all 0s or all 1s.

I have also experimented with various other thresholds besides just having a “large” or “small” hash value. For example a hash with the fewest or greatest number of 1 bits (ie the hamming weight of the hash). Or a hash with only numeral digits or only letter (a-f) digits.

I found the whole exercice a lot of fun, and maybe someone else will too. I have put my program and all its source code in the public domain. Feel free to use it how you wish and experiment with other it.

Download source and binaries: ProofOfWork.zip

This is free and unencumbered software released into the public domain. 

 

RC4

In 1985 I got a book out from the school library. I can’t remember the title, but it was something to do with code making. I remember it talked about Pigpen cipher which interested me. Ever since reading that book I have been fascinated with all things cryptographic. In 2001 I read a book which is my favourite of all time “Applied Cryptography Second Edition” by Bruce Schneier. Prior to reading this I had done what many people thing is an easy task, invented my own cipher system. I later discovered how rubbish this system was. In fact there is something like 10 standard terrible systems that almost everyone who thinks cryptography is easy ends up “inventing”. This book explained a very important truth: cryptography is hard.

Its very easy to assume that its easy to make a good crypt system, after all it appears your algorithm completely scrambles up the original message, so how could anyone ever hope to recover it? Well Arthur Scherbius obviously thought this when he created the Enigma machine, but history tells us how that was broken and Winston Churchill told King George VI that it was due to the success of breaking Enigma that World War 2 was won. The lesson to be learnt is that there are some incredibly smart people who study these things and can find weaknesses. Therefore unless you are equally as smart, you probably shouldn’t try inventing your own. Of course it depends what you want the cryptography for. If its just for fun, then of course feel free to make up your own system. Just make sure it doesn’t end up being used to “secure” some important system in the future.

A lot of cipher systems have quite complex code to look at. In fact, not only is it recommended not to invent your own cipher, it is usually recommended against creating your own implementation of an existing cipher, because it can be easy to make a mistake. However one cipher stands out as an exception to this: RC4.

RC4 was created by Ron Rivest in 1987. It was originally a trade secret, back in the days when people thought obscurity made things more secure. It was leaked in 1994, and has become common place in systems throughout the world. The beauty of RC4 is its simplicity. The code is incredibly small, and yet it produces a very good pseudo random number stream. Because it is some small, its fairly hard to mess up when implementing, and its not hard to test the output anyway against some known values.

RC4 is known as a stream-cipher, or otherwise a pseudo-random generation algorithm (PRGA). stream ciphers have a simple mode of operation. You initialise a generator with a key, then you can output as many bytes of the stream as you wish. A stream cipher does not actually encrypt anything on its own, it simply outputs a stream of bytes. Typically this stream of bytes is XORed onto the plain text to produce the cipher text. As many bytes as there are in the plain text is produced by the stream. Basically you are using the stream to create a one-time pad, which is applied to the plain text.

RC4 has always been my favourite cipher because of its simplicity. It is also very fast. Unfortunately it tends to have a bad reputation as being a broken cipher. This is quite unfair. RC4 is still a good system. It has a few weaknesses that are easily overcome. Other than that it just has to be used correctly. The inventors of WEP encryption for WiFi have caused the most damage to RC4′s name. The significant thing about RC4 is that it is a stream cipher, which is essentially a tool for generating one-time pads. The critical thing about one-time pads is that you only ever use them once. In the case of WEP they effectively use the same pads over and over again.

The poor use of RC4 in WEP is not a weakness in RC4, and would have applied with any stream cipher being used in the same manner. RC4 does have some intrinsic faults of its own though. In particular the first few bytes of the stream are slightly biased. This is easily overcome by discarding the first few bytes of the stream after initialising it. This system is known as RC4-dropN. Where N is the number of bytes to discard. RC4-drop256 is a commonly recommended value. Also RC4 output can be distinguished from random after about 1 GByte. If using RC4 to encrypt long streams, it would be good to rekey the system after about 1 Gbyte of output.

The key schedule of RC4 is not as strong as it could be, in particular related keys are not great for use. So implementing a system where you have a fixed portion of key, and then append on an index value for the complete key is not a good idea. The solution to this is don’t do that. If you need a system with has related keys, then perform a hash of the key first then feed it into the RC4 generator.

So in summary, if you follow the following rules, then RC4 can be good for you.

  1. Never use the same stream (ie generated from same key) more than once. EVER.
  2. Discard at least the first 256 bytes after generating the stream.
  3. Do not use the key for a stream more than 1Gbyte of output.
  4. Do not use related keys. Preferably hash the key first.

Rc4Output

I have written a module in C that initialises an RC4 stream and outputs the stream bytes. It consists of two functions: Rc4Initialise and Rc4Output, and it uses a context type calledRc4Context.

////////////////////////////////////////////////////////////////////////////////
//  Rc4Initialise
//  
//  Initialises an RC4 cipher and discards the specified number of first bytes.
////////////////////////////////////////////////////////////////////////////////
void
    Rc4Initialise
    (
        Rc4Context*     Context,
        void*           Key,
        uint32_t        KeySize,
        uint32_t        DropN
    );

////////////////////////////////////////////////////////////////////////////////
//  Rc4Output
//  
//  Outputs the requested number of bytes from the RC4 stream
////////////////////////////////////////////////////////////////////////////////
void
    Rc4Output
    (
        Rc4Context*     Context,
        void*           Buffer,
        uint32_t        Size
    );

I have a sample program that uses it called Rc4Output which I have written for Windows and OSX. This is a simple command line tool that outputs (in hex) a specified number of bytes of an RC4-dropN stream that has been initialised with the provided ascii key. This can be used to verify the output against known values.

The following example outputs the first 16 bytes of an RC4 stream (with no drop) keyed with the ascii value “waterjuice.org”

>Rc4Output.exe waterjuice.org 16 0
3a32db5844a2bc37491a6108aec488e8

C source code and binaries for RC4

Rc4Output.zip contains the source code for the above program, including the RC4 module. This contains workspace for Visual Studio 10 for Windows, and Xcode 4.6 for OSX.

This is free and unencumbered software released into the public domain.

 

Public Domain MD5 C Source code

In this article I will present complete code for performing a MD5 hash of a string provided on a command line. For the Proof of Work project I need an MD5 hash library. I could use any hash, but chose to use MD5 for its simplicity and speed.

Full source code and Windows x64 and OSX x64 binaries are provided: Md5String.zip

The MD5 code is in a file called LibMd5.c, and it along with its header LibMd5.h contain all the code required to perform MD5 hash calculations. There are three functions. The header file is as follows, and describes the operation.

//////////////////////////////////////////////////////////////////////////////////
//  LibMd5
//  
//  Implementation of MD5 hash function. Originally written by Alexander Peslyak.
//  Modified by WaterJuice retaining 
//  Public Domain license.
//  
//  License: Entered into the Public Domain by WaterJuice - June 2013
//////////////////////////////////////////////////////////////////////////////////

#ifndef _LibMd5_h_
#define _LibMd5_h_

//////////////////////////////////////////////////////////////////////////////////
//  IMPORTS
//////////////////////////////////////////////////////////////////////////////////

#include <stdint.h>
#include <stdio.h>

//////////////////////////////////////////////////////////////////////////////////
//  TYPES
//////////////////////////////////////////////////////////////////////////////////

// Md5Context - This must be initialised using Md5Initialised. Do not modify the 
// contents of this structure directly.
typedef struct 
{
    uint32_t     lo;
    uint32_t     hi;
    uint32_t     a;
    uint32_t     b;
    uint32_t     c;
    uint32_t     d;
    uint8_t      buffer[64];
    uint32_t     block[16];
} Md5Context;

#define MD5_HASH_SIZE           ( 128 / 8 )

typedef struct
{
    uint8_t      bytes [MD5_HASH_SIZE];
} MD5_HASH;

//////////////////////////////////////////////////////////////////////////////////
//  PUBLIC FUNCTIONS
//////////////////////////////////////////////////////////////////////////////////

//////////////////////////////////////////////////////////////////////////////////
//  Md5Initialise
//  
//  Initialises an MD5 Context. Use this to initialise/reset a context.
//////////////////////////////////////////////////////////////////////////////////
void 
    Md5Initialise
    (
        Md5Context*     Context
    );

//////////////////////////////////////////////////////////////////////////////////
//  Md5Update
//
//  Adds data to the MD5 context. This will process the data and update the 
//  internal state of the context. Keep on calling this function until all the
//  data has been added. Then call Md5Finalise to calculate the 
//////////////////////////////////////////////////////////////////////////////////
void 
    Md5Update
    (
        Md5Context*         Context,
        void*               Buffer, 
        size_t              BufferSize
    );

//////////////////////////////////////////////////////////////////////////////////
//  Md5Finalise
//
//  Performs the final calculation of the hash and returns the digest (16 byte 
//  buffer containing 128bit hash). After calling this, Md5Initialised must be 
//  used to reuse the context.
//////////////////////////////////////////////////////////////////////////////////
void 
    Md5Finalise
    (
        Md5Context*         Context,
        MD5_HASH*           Digest
    );

//////////////////////////////////////////////////////////////////////////////////
#endif //_LibMd5_h_

Md5String

The project Md5String is a simple command line program that takes a string on the command line and calculates the MD5 hash and prints it out in hex. This demonstrates how to use LibMd5

int
    main
    (
        int             ArgC,
        char**          ArgV
    )
{
    char*           string;
    Md5Context      md5Context;
    MD5_HASH        md5Hash;
    uint16_t        i;

    if( 2 != ArgC )
    {
        printf( 
            "Syntax\n"
            "   Md5String \n" );
        return 1;
    }

    string = ArgV[1];

    Md5Initialise( &md5Context );
    Md5Update( &md5Context, string, strlen(string) );
    Md5Finalise( &md5Context, &md5Hash );

    for( i=0; i<sizeof(md5Hash); i++ )
    {
        printf( "%2.2x", md5Hash.bytes[i] );
    }
    printf( "\n" );

    return 0;
}

I have compiled a Windows x64 binary and an OSX x64 binary which I am including with the source code. Note to provide a string with spaces you need to surround the string with quotes.

At the end of this article is the link to the source code and binaries.

Testing MD5

When using any cryptographic functions it is important to test them so you can be confident they actaully work. I have a test project that tests the MD5 hashes against a set of known test vectors. I have not provided it here though as that would not be sufficient to provide assurance. I may have put the wrong hashes into the test code, so how would you know they are actaully correct. So instead you should test it yourself. You can use Md5String itself to test it (as long as its just ascii strings you provide).

There are various sources of test vectors. NIST provide a small set of informal MD5 test vectors at http://www.nsrl.nist.gov/testdata/. RFC1321 is the official description of MD5 and contains some test vectors at the end. These can be found at http://tools.ietf.org/html/rfc1321

Download

Md5String.zip – This contains the source code of Md5String. This can be compiled as is with Visual Studio 2010, and Xcode 4.6. Additionally a precompiled x64 binaries are included for both platforms. Note The Windows executable is compiled for Windows Vista and above, while the OSX ones is compiled for Lion (10.7) and above.

License

All source code presented in WaterJuice is public domain. Either written by myself, or taken from other public domain sources. I dislike the numerous licenses that are attached to most open source projects. I could sum up the license I wish in one simple sentence “This is public domain, do what ever you want with it, don’t blame me”. Unforunately the world is run by lawyers and lawsuits, so to avoid any complications I release this to public domain using the “unlicense” legalise

This is free and unencumbered software released into the public domain.

Anyone is free to copy, modify, publish, use, compile, sell, or distribute this software, either in source code form or as a compiled binary, for any purpose, commercial or non-commercial, and by any means.

In jurisdictions that recognize copyright laws, the author or authors of this software dedicate any and all copyright interest in the software to the public domain. We make this dedication for the benefit of the public at large and to the detriment of our heirs and successors. We intend this dedication to be an overt act of relinquishment in perpetuity of all present and future rights to this software under copyright law.

THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

For more information, please refer to http://unlicense.org/