2. Basic Cryptography: Hash Function

Lawrence E. Hughes (Frisco, TX, USA)

There is another type of cryptographic algorithm that resembles encryption but differs from it in several important ways. The technical term for it is hash function. It is sometimes also called a message digest algorithm. Strictly speaking, though, the algorithm is a hash function, and the output of that function is a message digest.

Here, the object is not to hide information by scrambling it reversibly, but to characterize a digital document of any length with a single large number (e.g., 224 to 512 bits long). Because the result has a fixed length for a given hash algorithm, it cannot contain the entire source file the way ciphertext does. One of the goals is to make it extremely difficult to recover any of the source document from the message digest: encryption must be reversible given the key, whereas a message digest must not be reversible at all.

Unlike encryption, there is no key. A given hash function (e.g., SHA-256) will always produce exactly the same digest from a given source document, no matter who computes it or when. On the other hand, it is very sensitive to changes in the source document. Even a single-bit change in a ten-megabyte source document will result in a completely different message digest.
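Both properties described above (same input, same digest; tiny change, very different digest) can be checked with a short Python sketch. It assumes the standard library hashlib module; the sample message and the helper names sha256_hex and bit_difference are made up for this illustration.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def bit_difference(a: bytes, b: bytes) -> int:
    """Count the bit positions in which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


# Hypothetical sample document; any bytes would do.
original = b"The quick brown fox jumps over the lazy dog"

# Flip only the lowest bit of the first byte.
flipped = bytes([original[0] ^ 0x01]) + original[1:]

d1 = hashlib.sha256(original).digest()
d2 = hashlib.sha256(flipped).digest()

print(sha256_hex(original))  # identical on every run and every machine
print(sha256_hex(flipped))   # bears no visible relation to the line above
print(bit_difference(d1, d2), "of 256 digest bits differ")
```

For a well-designed hash function, the last line typically reports a number somewhere near 128, that is, about half of the digest bits.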

There were early attempts at creating good hash algorithms, such as MD2, MD4, and MD5 (128 bits), and more recently SHA-1 (160 bits). SHA stands for Secure Hash Algorithm. Those have all been deprecated (found to be weak and no longer recommended for use). Currently, two families of message digest algorithms are approved for use, called SHA-2 and SHA-3. Both have versions with 224-, 256-, 384-, and 512-bit digests. The longer the digest, the stronger its resistance to collisions (longer digests can also cost more computing power for a given source document, and they always take more space to store). SHA-2 with a 256-bit digest is the most commonly used message digest today. It is usually written SHA-256 or SHA256. SHA-3 is built on a different internal design and serves as a backup in the event that SHA-2 is broken.
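The digest lengths listed above can be confirmed with a minimal Python sketch. It assumes the algorithm names used by the standard library hashlib module (sha224, sha3_256, and so on); the input b"example" and the dictionary name sizes are arbitrary choices for this illustration.

```python
import hashlib

# Algorithm names as spelled by Python's hashlib (an assumption of this sketch).
names = (
    "sha224", "sha256", "sha384", "sha512",          # SHA-2 family
    "sha3_224", "sha3_256", "sha3_384", "sha3_512",  # SHA-3 family
)

# Map each algorithm name to its digest length in bits.
sizes = {name: hashlib.new(name, b"example").digest_size * 8 for name in names}

for name, bits in sizes.items():
    print(f"{name:10s} {bits} bits")
```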

Characteristics of a Good Message Digest Algorithm

Good cryptographic dispersion, so that tiny changes are amplified – ideally, every bit of the message affects the final digest.
It is built from many one-way transformations (compare with encryption, where every transformation must be one-to-one and onto, that is, reversible). There should be no practical way to recover any part of the message given the digest.
It should be extremely difficult or impossible for someone to make changes to a file and then make offsetting changes and still produce the same digest (this can easily be done with simpler schemes like a checksum or even CRC-32).
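The point about offsetting changes can be made concrete with a small Python sketch. It assumes a deliberately simple 8-bit additive checksum (the hypothetical helper checksum8), not CRC-32, and an invented payment message; swapping two digits is an offsetting change that the checksum cannot see but SHA-256 can.

```python
import hashlib


def checksum8(data: bytes) -> int:
    """Toy checksum for illustration: the sum of all bytes modulo 256."""
    return sum(data) % 256


original = b"PAY 19 TO BOB"
tampered = b"PAY 91 TO BOB"  # same bytes in a different order, so the sum is unchanged

print(checksum8(original) == checksum8(tampered))  # True: the checksum misses the change
print(hashlib.sha256(original).hexdigest())
print(hashlib.sha256(tampered).hexdigest())        # differs from the line above
```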

While there are a very large number of 160-bit digests (2^160, or roughly 1.46 × 10^48), there are far more possible messages (most of which are total gibberish). It is therefore possible for two different emails or books to produce the same digest, but it is very, very difficult to make that happen on purpose. Such an event is called a collision, and finding a way to cause collisions with a message digest algorithm is one way to “break” it.

The total number of books ever published, or even the total number of email messages ever sent, is vanishingly small compared to the number of possible 160-bit (let alone 256-, 384-, or 512-bit) digests. An accidental collision between any two of them is therefore not impossible, just very, very unlikely.
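Why digest length matters for collisions can be seen by shortening a digest on purpose. The Python sketch below keeps only the top few bits of a SHA-256 digest and searches for two different messages that share them. The helper names truncated_digest and find_collision and the "message-N" inputs are invented for this illustration. With an n-bit digest, a collision is expected after roughly 2^(n/2) attempts (the birthday bound), which is quick for 20 bits and far out of reach for 256.

```python
import hashlib


def truncated_digest(data: bytes, bits: int) -> int:
    """Return only the top `bits` bits of the SHA-256 digest, as an integer."""
    full = int.from_bytes(hashlib.sha256(data).digest(), "big")
    return full >> (256 - bits)


def find_collision(bits: int):
    """Hash distinct messages until two share the same truncated digest."""
    seen = {}  # truncated digest -> first message that produced it
    counter = 0
    while True:
        message = f"message-{counter}".encode()
        tag = truncated_digest(message, bits)
        if tag in seen:
            return seen[tag], message, counter + 1
        seen[tag] = message
        counter += 1


first, second, attempts = find_collision(20)
print(first, second, "collide on 20 bits after", attempts, "attempts")
```

This says nothing about the strength of full-length SHA-256. It only shows how quickly the work grows with digest length.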

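A related attack does not break the hash algorithm itself but targets short or guessable inputs such as passwords: the attacker precomputes a very large number of hashes, stores them in rainbow tables, and then looks up a captured digest to find the input that produced it.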
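The precomputation idea can be sketched in Python with a plain lookup table. This is an assumption-laden simplification: a real rainbow table stores chains of hashes to trade computing time for memory, whereas this sketch stores every digest outright. The four-digit PIN space and the names lookup and recover_pin are hypothetical.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of a text string as hex."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Precompute, once, the digest of every possible 4-digit PIN (0000 to 9999).
lookup = {sha256_hex(f"{pin:04d}"): f"{pin:04d}" for pin in range(10_000)}


def recover_pin(digest_hex: str):
    """Return the PIN that produced the digest, or None if it is not in the table."""
    return lookup.get(digest_hex)


leaked = sha256_hex("4821")  # pretend this digest was found in a stolen database
print(recover_pin(leaked))   # prints 4821
```

The lookup succeeds only because the input space is tiny. The same table is useless against a digest of a long, unpredictable document.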

Conceptual Representations

You can think of a message digest as the output of a mathematical function or transform:

MD = SHA(message)

Or for those more visually oriented, refer to Figure 2-1.

Figure 2-1
Visual of a message digest as a mathematical function or transform

Primary Uses

The main use today for message digests is in digital signatures.

To digitally sign a message, you produce a message digest of it and then encrypt that digest with an asymmetric algorithm and your own private key.

To verify a signature, you decrypt the signature with the signer’s public key (from their digital certificate) to recover the original digest and then produce a new digest of the message. If those match, the signature is valid. This lets you know two things:

Message integrity – the message has not changed in any way since it was signed.
Signer authentication – only the owner of the private key corresponding to the certificate used to validate the signature could have created such a signature.

If signature verification fails, then one or both of the following are true (there is no way to know which, but in either event, you should not trust the message):

Something has changed in the file since it was signed (could be malicious or from a transmission or storage error).
It was signed by some other person than the one whose certificate you are using to validate the signature.
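The sign and verify steps described above can be sketched in Python as a toy example. Everything about the key here is an assumption made for illustration: it uses textbook RSA built from two publicly known Mersenne primes, reduces the digest modulo the key, and applies no padding, so it is not secure. Real systems rely on a vetted cryptographic library and a padding scheme. The names P, Q, N, E, D, sign, and verify are hypothetical.

```python
import hashlib

# Toy key material for illustration only. These primes are publicly known.
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q                          # public modulus
E = 65537                          # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (needs Python 3.8+)


def digest_as_int(message: bytes) -> int:
    """SHA-256 digest of the message as an integer, reduced to fit the toy key."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


def sign(message: bytes) -> int:
    """Signer: hash the message, then transform the digest with the private key."""
    return pow(digest_as_int(message), D, N)


def verify(message: bytes, signature: int) -> bool:
    """Verifier: recover the digest with the public key and compare to a fresh one."""
    return pow(signature, E, N) == digest_as_int(message)


message = b"Transfer 100 to Alice"
signature = sign(message)

print(verify(message, signature))                   # True: message is unchanged
print(verify(b"Transfer 900 to Alice", signature))  # False: message was altered
```

Note that only the fixed-size digest goes through the slow asymmetric operation, which is one reason signatures are computed over a digest instead of over the whole message.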
