To view parent comment, click here.

To read all comments associated with this story, please click here.

Neolander,

"I'm also wondering if it's possible to mathematically reverse a hash algorithm in a way that provides..."

Lets go on the assumption that our function is an ideal hash function with no mathematical weaknesses. Since we already know that a broken hash function will limit the scope of search. An idea hash function means that the only way to find a collision is to brute force various inputs until we generate a collision.

sha256sum(x1) = y

sha256sum(x2) = y

How would we find x2, such that it produces the same hash as x1? This can be as simple as taking a known payload, and modifying it with a nonce until we generate the collision we're looking for. It's trivial, and it's been done with MD5. However, this task becomes exponentially more difficult as bits are added to the hash.

Let's reduce the difficulty of the problem:

1 bit hash function

sha256sum(x1) & 0x1 = y

sha256sum(x2) & 0x1 = y

Here, y can only be 0 or 1, therefor every other X value will produce a collision.

2 bit hash function:

sha256sum(x1) & 0x3 = y

sha256sum(x2) & 0x3 = y

Now, y can be 0,1,2,3, every 4th X value will produce a collision, twice as much work as 1 bit.

3 bit hash function:

sha256sum(x1) & 0x7 = y

sha256sum(x2) & 0x7 = y

Every 8th X produces a hash collision, twice as much work as 2 bits.

So with this exponential growth, a 256 bit hash function would collide every 1157920892373161954235709850086879078532699846656405640394575840079131 29639936 X values on average.

Assuming we have 1 billion computers, each able to forward hash 1 billion X values every second, then we might expect a collision every 3764568028158688209515806576697354474006124657512762 years on average (double check my math).

This is if we stick to classical computing, quantum computing introduces yet a whole new dimension to the problem. It's too bad quantum computing was not offered at my university, since I don't know that much about it.

Edit: I'd be happy to leave my cheap web development clients to work on this stuff instead, if anyone's willing to pay me to do it.

*Edited 2011-09-26 19:24 UTC*

This is the part which I don't understand. I don't get how it is possible to create a hash function and publicly distribute it, in a way that mathematicians are not able to find collisions just by studying the form of the hash function itself.

Well, I have a course on it later this school year (jan-feb 2012), so I can send you lecture notes if you want. It's targeted towards physicists, though, so maybe you would experience a feeling of unbalanced complexity, the mathematical part looking over-explained while the physical part would look under-explained.

*Edited 2011-09-26 19:32 UTC*

Member since:

2010-03-08

I'm also wondering if it's possible to mathematically reverse a hash algorithm in a way that provides, say, all files within a size range between X and Y that result in a given hash Z when going through the hashing algorithm. This operation could be followed by analyzing those files in the hope of finding one which has some desirable properties (like, in the context of secure boot, some simple code which is able to load other code)

And if it is, whether such a "clever" approach would have the potential to beat brute force random data injection to a tampered binary until it gets the same hash, in terms of execution speed.

Edited 2011-09-26 17:41 UTC