Day: June 16, 2013

Type 3 and 5 GUIDs

This article talks about GUIDs, and in particular type/version 3 and 5 GUIDs (or UUIDs).

UrlGuid.zip – Public domain C source code for generating type/version 3 and 5 GUIDs. Includes x64 executables for OSX and Windows. This is free and unencumbered software released into the public domain.

I love the concept of GUIDs (or UUIDs). They are 128 bit IDs in a standardised format that are unique across all namespaces. It is somewhat unclear as to whether the name is actually GUID (Globally Unique Identifier) or UUID (Universally Unique Identifier). The best explanation of the difference is that Microsoft use the term GUID, and generally UUID used else where. But they both refer to the format as defined by RFC4122 (“A Universally Unique IDentifier (UUID) URN Namespace“). Microsofts main contribution seems to have been to add the curly braces around the string that is commonly seen. I am going to use the term GUID from now on.

Most people have probably seen a GUID, they look like this: {855273c6-bca6-439e-8880-7a875c4394ed}. Sometimes they don’t have curly braces, and sometimes they are uppercase, and occasionally mixed case. In all cases they are just a string representing a 128 bit number in big endian format. Due to the fact it was designed by committee it has a lot more irrelevant complexity than is required. They could have simply displayed the GUID as one long hex string, instead they decided to break it up into groups of 4, 2, 2, 2, and 6 bytes. RFC4122 defines the 128 bit structure into various elements. However the fields really only make sense in a type 1 GUID. For the most part no one cares about any of the fields and it should be treated as one single 128 bit number.

The interesting thing about the GUID is that every GUID created should be unique without requiring a central registration authority. It gets this property mainly from the fact that 128 bits make up a large space and that if you randomly pick a 128 bit number it is incredibly unlikely anyone else will pick the same. The most common GUIDs seen are type (or version) 4 GUIDs. 122 bits of a type 4 GUID are purely randomly generated. There are 4 bits set as the type number, and another 2 bits set as the RFC4122 variant. The version/type number of every GUID can easily be seen, it is the 13th hex digit (or the 1st digit in the 3rd group). As you can see from the example GUID I showed before {855273c6-bca6-439e-8880-7a875c4394ed}, the version value is 4.

Type 1 GUIDs, identified by a 1 value in the 13th digit, have a more structured format. Rather than use pure random to hope that the same GUID is not generated twice, a more defined procedure is used. The last block of 12 digits is set as the 48 bit MAC address of the computer that generated the GUID. As MAC addresses are supposed to be unique, this partitions the space into unique groups for each computer. Additionally a high precision time value is used to partition the space based on time. Then there are some random bits as well as sequence values to ensure the same computer doesn’t accidentally generate the same GUID twice even if its clock gets messed up.

While Type 1 GUIDs use a good scheme to avoid collisions, they are unpopular because they give away information. In particular they give away when the GUID was generated and, more significantly, the MAC address of the computer that generated it. In fact it was the use of Type 1 GUIDs in Microsoft Office documents that helped lead to the arrest of the “Melissa Worm” author over a decade ago. That was a high profile case and made people very uneasy about the use of GUIDs which could track people. For example I may use my computer for two distinctively different roles and not wish to have things produced for one role in any way relatable to the second role.

After the Melissa Worm incident MS changed to using Type 4 GUIDs. As these are completely randomly generated, it can not be determined if two GUIDs were created by the same computer or person.

The most common use of GUIDs is to have a unique ID for some new item just created. It could be a database entry, it could be a ticket number, or anything. For this use a Type 1 or Type 4 GUID is perfect, and almost all the literature on the Internet refers only two these two types. However there are cases where you want a unique, but repeatable, identifier for an object. For example URLs.

A URL is a unique Id. However because its size is variable, it may not be convenient to use as an UD. The obvious thing to do is to take a hash of the URL and use that as an ID value, now you have a fixed size ID. There are two versions of GUIDs that do exactly this. The advantage of using these over straight hashes is that it remains in the same namespace as the rest of GUIDs, while guaranteeing not to clash with Type 1 and 4 ones. The two versions are 3 and 5.

To create a version 3 or 5 GUID you first need a namespace ID. This is a GUID itself. You can generate your own GUID to represent your name space. This is to prevent items in different name spaces but having the same name generating the same GUID. For example I might want a namespace which has colours as the values. eg my name space consists values from the set { red, blue, green, yellow }. Another name space may be that of all English words, which clearly would contain the values in my colour set. The same GUID should only be generated for “red” when used in the same namespace.

The 128 bits of the name space ID is prepended to the named value (eg “red”) and either an MD5 or a SHA1 hash performed. Note the named value does not have to be text. It is whatever the name space ID represents, it can be of any form including a hash itself, or a GUID. The output of the hash is used mostly as the GUID. However only 122 bits of the GUID value can be arbitrarily set. The 13th hex digit of the hash is replaced by the version/type value. If the hash is MD5 then the value 3 is used, if the hash is SHA1 then the value 5 is used. Additionally the top two bits of the 17th hex digit (1st digit in 4th group) are set to 10 (in binary). This defines the “variant” of the GUID as RFC4122 compliant. This means that the 17th hex digit of any RFC4122 GUID (ie almost all of them) will only have 4 possible values: 8, 9, a, b.

RFC4122 defines four standard name space IDs.

NameSpace_DNS = {6ba7b810-9dad-11d1-80b4-00c04fd430c8}
NameSpace_URL = {6ba7b811-9dad-11d1-80b4-00c04fd430c8}
NameSpace_OID = {6ba7b812-9dad-11d1-80b4-00c04fd430c8}
NameSpace_X500 = {6ba7b814-9dad-11d1-80b4-00c04fd430c8}

So for example, to make a type 3 GUID from the dns name waterjuice.org, the 128 bit value of NameSpace_DNS (in big endian) is prepended to the ascii string “waterjuice.org” and then a MD5 hash calculation made. This results in the MD5 hash: bed9e1fe7b92c85c82aa104c8c45bccf. The 13th digit is replaced with a 3, and the top 2 bits of the 17th digit are set to 10 binary (In this case those bits already were set that way). The final result, displayed as a GUID, is:

{bed9e1fe-7b92-385c-82aa-104c8c45bccf}

Using SHA1 instead of MD5 the following type 5 GUID represents waterjuice.org:

{c1a7f72b-f2ee-5455-990c-308ae1450a0e}

Unfortunately RFC4122 contains an error in the example it provides for creating a type 3 GUID. Appendix B gives an example of a type 3 GUID from the DNS name “www.widgets.com”. It incorrectly displays the result as {e902893a-9d22-3c7e-a7b8-d6e313b71d9f}. This was corrected in errata 1428. The correct result is {3d813cbb-47fb-32ba-91df-831e1593ac29}. Unfortunately the incorrect value is often requoted on other sites on the internet.

There are countless programs that generate type 1 and type 4 GUIDS. There are many online generators that you can use. Additionally OSX comes with uuidgen which is a command line tool to create a type 4 GUID. While Visual Studio has a uuidgen.exe which will generate type 1 or type 4 GUIDs. There are, however, very few examples of generators for type 3 and type 5. So I have written a program in C to create type 3 and type 5 GUIDs from either a DNS name or a URL.

UrlGuid.zip – Public domain C source code for generating type/version 3 and 5 GUIDs (UUID). Includes x64 executables for OSX and Windows.

This is free and unencumbered software released into the public domain.

Example usage:

>UrlGuid.exe http://waterjuice.org/2013/06/type-3-and-5-guids/
{cf0aff65-65c3-35a2-b04f-923e1f8bf1f3}
{3a8df732-bc01-57ab-bf7f-94461ee1057b}

As there are few examples on the internet (and some of them are wrong) I will include some type 3 and 5 GUIDs for some common DNS names and URLs.

google.com

    {9a74c83e-2c09-3513-a74b-91d679be82b8}
    {64ee70a4-8cc1-5d25-abf2-dea6c79a09c8}

www.wikipedia.org

    {ea68c8dd-6a38-39ac-8fa4-d55bfb7018fd}
    {02c295f5-54a8-5d29-8d1f-b619216b20c0}

unlicense.org

    {a209c2da-4d39-395e-97b3-38e58d800777}
    {d89fa53a-36cb-5b1c-819a-8b3327182ced}

http://en.wikipedia.org/wiki/Globally_unique_identifier

    {734d5d9c-ea54-3ef9-8d40-fa7db2281e24}
    {d18e9b50-d196-558d-ade3-6f54da5608fb}

https://www.google.com/webhp?hl=en&tab=ww

    {5404423a-b81e-323c-be9b-d3446d606b35}
    {c2b80bf8-7e38-54d2-b0d7-d842eb3e173d}

http://unlicense.org/UNLICENSE

    {456e7994-3b66-316f-839c-68b4b30b5b85}
    {b889eec7-f11a-592e-899b-382086d69067}
Advertisement

SHA-1

MD5 is a widely used hash system; it was invented in 1991. In 1996 a flaw was found in the system making it vulnerable to certain types of attacks, although it wasn’t until 2004 that a more serious flaw was found and the system declared “broken”. SHA-1 was created as a replacement for MD5. It uses similar principles, but is a more conservative design. It outputs a 160 bit hash rather than 128 bit. Due to the flaws found in MD5, SHA-1 was promoted. Over the years weaknesses in SHA-1 have been found, however it is still better than MD5. As of current MD5 is considered completely broken, whereas SHA-1 is still reasonable against the most concentrated attacks. The best know attack requires 2^61 SHA-1 calculations. The computing resources required to perform this are still enormous.

MD5 and SHA-1 are probably the most common hashes systems in use. SHA-1 having replaced MD5 in many cases. The larger hash size alone makes it better as the chances of collisions are further reduced. SHA-1 does come at a cost though. It takes more work to calculate a SHA-1 hash than an MD5 hash.Also the output hash takes up more storage. However if only 128 bits are wanted of a hash, it is acceptable to take the first (or in fact any) 128 bits of a SHA-1 hash. This is still better than using MD5.

I wrote the Proof-Of-Work program using MD5 as it was fast. However I want to build up a library of public domain C source code of several hash functions. SHA-1 is the next obvious one. So I present here source code for SHA-1 hashes, including a program to calculate a SHA-1 hash from the command line.

Sha1String.zip – Contains full source code for SHA-1 and the program Sha1String. Includes x64 binaries for OSX and Windows.

The functions and types in LibSha1.h match the ones in LibMd5.h presented earlier. This makes it very easy to substitute one system with the other. When I add more hash algorithms in the future, they will also follow the same format.

This is free and unencumbered software released into the public domain.