2022-08-30
Roughly seven years ago, Partha Ranganathan realized Moore’s law was dead. That was a pretty big problem for the Google engineering vice president: He had come to expect chip performance to double every 18 months without cost increases and had helped organize purchasing plans for the tens of billions of dollars Google spends on computing infrastructure each year around that idea.
But now Ranganathan was getting a chip twice as good every four years, and it looked like that gap was going to stretch out even further in the not-too-distant future.
So he and Google decided to do something about it. The company had already committed hundreds of millions of dollars to design its own custom chips for AI, called tensor processing units, or TPUs. Google has now launched more than four generations of the TPU, and the technology has given the company’s AI efforts a leg up over its rivals.
Google uses all kinds of custom hardware throughout its operations, but you rarely hear about it. This article provides some insight into the custom hardware Google uses for YouTube transcoding.
There is a very common misconception about Moore’s Law:
https://en.wikipedia.org/wiki/Moore%27s_law#/media/File:Moore's_Law_Transistor_Count_1970-2020.png
It is not about the speed, but rather about the number of transistors on chips. And those keep doubling.
The side effect was technology getting cheaper at the same time. If you can fit 2x transistors in the same chip, you can sell the older chip for less, or make a much better chip at the same price.
The problem is, new chip printing processes are much more expensive, hence there are no more cost savings passed to consumers.
Threadripper, for example, sells for $4,000+, while the originals were less than a quarter of that price. If they had actually followed the side effect of Moore’s Law (more computing for the same price), we would all be visiting OSNews on 64-core machines right now.
sukru,
When thinking of that many cores, I always question the utility for normal consumers. (Relatively) few people need 16 cores, much less 64. In the past, having more transistors paved the way for more registers, larger registers, longer pipelines, etc., which brought general purpose benefits. But there are diminishing returns for all of these things. More often than not, transistors are used to implement extra cores and special purpose application accelerators that will spend most of the time going unused.
While I think it’s awesome to have more parallelism in hardware, most general purpose software is not well optimized for 64 cores on a CPU. Very frequently when I complain about software performance bottlenecks in everyday desktop software (take GIMP, LibreOffice, etc.), I look at the CPU utilization only to see that one core is spiking at 100% while the others are almost idle. Doh. Games may be getting a little better about this, but many still don’t scale with CPU cores because either they’re GPU bound or they use a highly single threaded game loop.
So until we solve the software parallelism problem, the benefit of massively parallel hardware will remain modest for the masses.
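To put a rough number on that, here is a minimal sketch using Amdahl’s law, which is the usual textbook model for this and not something measured on any of the applications above. The function name `amdahl_speedup` and the parallel fractions are made up purely for illustration, and the model ignores memory bandwidth, scheduling overhead, and GPU limits.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Ideal speedup when only `parallel_fraction` of the work scales with cores.

    Amdahl's law: speedup = 1 / ((1 - p) + p / n).
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # Hypothetical workloads: half, 90%, and 99% of the runtime parallelizable.
    for p in (0.5, 0.9, 0.99):
        row = ", ".join(
            f"{n} cores: {amdahl_speedup(p, n):.1f}x" for n in (4, 16, 64)
        )
        print(f"parallel fraction {p:.2f} -> {row}")
```

Under this model, a program that is only half parallelizable gets less than a 2x speedup from 64 cores, and even at 90% parallel code the ideal speedup stays below 9x.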
Alfman,
You are right about scalability of parallelism, especially for general purpose application code. Only specific applications will benefit from 64 AMD64 cores (video processing, compiling, simulations, …)
However, we can now have different kinds of cores: GPUs, tensor cores, DSPs, FPGAs, and whatnot.
But then, of course, there will be a period when they are not highly utilized. APIs and standards will take years to reach maturity.
A Video Conference application would benefit a lot from Tensor cores. But is there a Web API for Tensor cores? (… checking …) turns out, it is not there yet, but there are early extensions: https://www.secondstate.io/articles/wasi-tensorflow/
Going back, I would also be okay with the other benefit: same performance at half the cost. But that is not happening either.