64 bits ought to be enough for anybody!

Thom Holwerda 2019-12-02 Privacy, Security 8 Comments

How quickly can we use brute force to guess a 64-bit number? The short answer is, it all depends on what resources are available. So we’re going to examine this problem starting with the most naive approach and then expand to other techniques involving parallelization.
We’ll discuss parallelization at the CPU level with SIMD instructions, then via multiple cores, GPUs, and cloud computing. Along the way we’ll touch on a variety of topics about microprocessors and some interesting discoveries, e.g., adding more cores isn’t always an improvement, and not all cloud vCPUs are equivalent.

About The Author

Thom Holwerda

Follow me on Mastodon @[email protected]

8 Comments

2019-12-02 7:50 pm

Alfman verbose=1
The difference between CPU and GPU computation for this problem is so dramatic (a lone V100 is about 18.5x as fast as 16 high-CPU cores) that it doesn’t make sense to use CPUs. Adding another GPU will always be a better use of resources than relying on CPU computation. To show how bad CPUs are at this problems, I made this handy graph:

Searching large spaces becomes easier if you compress that space with hashing or binary search, etc. His analysis only considered the most naive & trivial algorithm and disregarded high level algorithmic optimizations that a real programmer would employ to improve performance dramatically. Normally one does not do low level optimization without first optimizing the high level algorithm, but he only did the former, so I’m a bit iffy about the way he approached this.

I don’t want to miss his point, I know he disregarded better search algorithms in order to deliberately force it to be a brute force solution, but I kind of wish he had picked a more meaningful/realistic problem, even if he made it up himself. It could have been interesting to see how various types of GPUs/clusters perform then rather than an extremely subpar counting algorithm to begin with. Something that fits the bill could be factoring primes, or searching for patterns in digits of pi. Oh well.

It’s interesting to see in the graph how closely digital ocean’s and google’s virtual cpu units perform, though I don’t know why he omitted the macbook and core i3 from the graph. He should have plotted them even though they did not scale well. Anyways, these nitpicks aside, it does highlight just how much of an advantage GPUs have over CPUs for massively parallel workloads. You’d need hundreds of CPU cores to approach a GPU’s performance, and that would be nowhere close to the GPU’s efficiency. FPGAs could be interesting too.

2019-12-04 3:24 pm

JLF65
The whole point of the article was to take the most trivial brute-force approach to a problem and then show how even that can defeat the problem if you use the proper equipment. Not even rare equipment, but common equipment.

2019-12-04 4:02 pm

Alfman verbose=1
JLF65,

The whole point of the article was to take the most trivial brute-force approach to a problem and then show how even that can defeat the problem if you use the proper equipment. Not even rare equipment, but common equipment.

I’d give him half credit for that, for full credit he would have needed an algorithm that genuinely needed to be brute forced rather than only needed to be brute forced because it was sub-optimal 🙂

I’m not just making the distinction to be difficult, but rather because a real world a developer would have solved the problem faster and more efficiently than his fastest GPU solution, which throws his conclusions out the window. His conclusions only hold if we disallow optimization, which is unrealistic, and hence the push-back.

2019-12-02 11:52 pm

Brendan
His analysis only considered the most naive & trivial algorithm and disregarded high level algorithmic optimizations that a real programmer would employ to improve performance dramatically.

Yes. The first step would be changing the “eval()” function to return 1 (greater), 0 (equal) or -1 (smaller). Then it effectively becomes a binary search that will complete in less than 64 iterations.

2019-12-03 4:49 pm

Alfman verbose=1
Brendan,

Yes. The first step would be changing the “eval()” function to return 1 (greater), 0 (equal) or -1 (smaller). Then it effectively becomes a binary search that will complete in less than 64 iterations.

I wonder if there’s ever been a contest to take a simple problem and find the least efficient algorithm to solve it. It would need to provably make progress at every step, and do so elegantly, just inefficiently.

Hey Thom, maybe you could have this contest on osnews. Something like the obfuscated C contest, but instead of the goal being to sacrifice clarity, it would be to sacrifice performance in the most novel ways possible. 🙂
2019-12-04 10:22 pm

JLF65
That doesn’t work if the only result you get back is match or don’t match, which I’d HOPE would be the case for a key. Providing hints like “warmer” or “colder” would kinda defeat the whole purpose of the key. 😀

2019-12-05 1:57 am

Alfman verbose=1
JLF65,

That doesn’t work if the only result you get back is match or don’t match, which I’d HOPE would be the case for a key. Providing hints like “warmer” or “colder” would kinda defeat the whole purpose of the key.

The author didn’t mention cryptography at all, but that would be a valid brute force application.

I should point out that even with crypto applications under sufficient mathematical analysis you often CAN get hints like warmer or colder to effectively compresses the search space (such that you don’t have to brute force the entire space).

https://en.wikipedia.org/wiki/Key_size
https://en.wikipedia.org/wiki/Advanced_Encryption_Standard

For example a 2048bit RSA and 224bit ECC have about 112 bits of attack complexity.
A 168 bit triple DES key only has 112bits of search space.
AES128 is currently known to have 126 bits of attack complexity.
AES256 has 254.3bits of complexity.

Over time, attacks tend to improve and we don’t actually know the lower bound for symmetric or PKI crypto problems. The fact of whether P==NP is an open question, which means it’s theoretically possible that mathematical advancements could break PKI entirely.

https://news.mit.edu/2009/explainer-pnp

768bits ought to be enough for anybody!
https://en.wikipedia.org/wiki/RSA_numbers

RSA-768 has 232 decimal digits (768 bits), and was factored on December 12, 2009 over the span of two years

High level optimizations can significantly cut down on search space over naive brute force algorithms, we can’t overlook this when we ask ourselves how tractable a problem is… the gory details that make up a problem really do matter and it will make a difference. Anyways, sorry if I’m beating a dead horse at this point, haha.

2019-12-05 8:04 am

Darkmage
The most bitter disappointment with clicking this article was after seeing V100 it was an Nvidia Volta not a Voodoo VSA 100 GPU.