To view parent comment, click here.

To read all comments associated with this story, please click here.

This is the part which I don't understand. I don't get how it is possible to create a hash function and publicly distribute it, in a way that mathematicians are not able to find collisions just by studying the form of the hash function itself.

Well, I have a course on it later this school year (jan-feb 2012), so I can send you lecture notes if you want. It's targeted towards physicists, though, so maybe you would experience a feeling of unbalanced complexity, the mathematical part looking over-explained while the physical part would look under-explained.

*Edited 2011-09-26 19:32 UTC*

Neolander,

"I don't get how it is possible to create a hash function and publicly distribute it, in a way that mathematicians are not able to find collisions just by studying the form of the hash function itself."

This has to do with "diffusion".

It is actually rather easy to correlate bits for a single SHA1/2 round, such that one can derive the internal state of the hash function from the output bits. Hash/crypto functions are routinely cracked for a limited number of rounds.

But when the process is repeated a sufficient number of rounds, there is no record of which round(s) are responsible for changing a bit. All traces of the original bits are diffused and all that remains is unintelligible entropy. Algebraic solutions become exponentially complex and offer no benefit over brute force scanning. (As always, we're assuming the hash has no mathematically exploitable weaknesses).

This may be similar (or not) to dropping a pebble in a pool and then backtracking the point at which the pebble was dropped by observing the waves. As the waves bounce against the edge of the water, they become more and more diffused until one can no longer determine the point of origin.

"Well, I have a course on it later this school year (jan-feb 2012), so I can send you lecture notes if you want."

I'm curious at a high level, but I don't really feel like reading long papers. Whereas I used to buy computer books and read them through and through, today I can hardly bother to open the cover. I can't explain it, maybe it's the influence of the real world after college. I learned all this cool & interesting theory, but haven't much chance to really apply it in my jobs, and no ROI.

"It's targeted towards physicists...the mathematical part looking over-explained while the physical part would look under-explained."

I'd still need to learn the mathematics anyways.

*Edited 2011-09-26 22:20 UTC*

Member since:

2011-01-28

Neolander,

"I'm also wondering if it's possible to mathematically reverse a hash algorithm in a way that provides..."

Lets go on the assumption that our function is an ideal hash function with no mathematical weaknesses. Since we already know that a broken hash function will limit the scope of search. An idea hash function means that the only way to find a collision is to brute force various inputs until we generate a collision.

sha256sum(x1) = y

sha256sum(x2) = y

How would we find x2, such that it produces the same hash as x1? This can be as simple as taking a known payload, and modifying it with a nonce until we generate the collision we're looking for. It's trivial, and it's been done with MD5. However, this task becomes exponentially more difficult as bits are added to the hash.

Let's reduce the difficulty of the problem:

1 bit hash function

sha256sum(x1) & 0x1 = y

sha256sum(x2) & 0x1 = y

Here, y can only be 0 or 1, therefor every other X value will produce a collision.

2 bit hash function:

sha256sum(x1) & 0x3 = y

sha256sum(x2) & 0x3 = y

Now, y can be 0,1,2,3, every 4th X value will produce a collision, twice as much work as 1 bit.

3 bit hash function:

sha256sum(x1) & 0x7 = y

sha256sum(x2) & 0x7 = y

Every 8th X produces a hash collision, twice as much work as 2 bits.

So with this exponential growth, a 256 bit hash function would collide every 1157920892373161954235709850086879078532699846656405640394575840079131 29639936 X values on average.

Assuming we have 1 billion computers, each able to forward hash 1 billion X values every second, then we might expect a collision every 3764568028158688209515806576697354474006124657512762 years on average (double check my math).

This is if we stick to classical computing, quantum computing introduces yet a whole new dimension to the problem. It's too bad quantum computing was not offered at my university, since I don't know that much about it.

Edit: I'd be happy to leave my cheap web development clients to work on this stuff instead, if anyone's willing to pay me to do it.

Edited 2011-09-26 19:24 UTC