Linked by Thom Holwerda on Fri 11th Aug 2017 19:46 UTC
AMD

In this review we've covered several important topics surrounding CPUs with large numbers of cores: power, frequency, and the need to feed the beast. Running a CPU is like the inverse of a diet - you need to put all the data in to get any data out. The more pie that can be fed in, the better the utilization of what you have under the hood.

AMD and Intel take different approaches to this. We have a multi-die solution compared to a monolithic solution. We have core complexes and Infinity Fabric compared to a MoDe-X based mesh. We have unified memory access compared to non-uniform memory access. Both are going hard against frequency and both are battling against power consumption. AMD supports ECC and more PCIe lanes, while Intel provides a more complete chipset and specialist AVX-512 instructions. Both are competing in the high-end prosumer and workstation markets, promoting high-throughput multi-tasking scenarios as the key to unlocking the potential of their processors.

As always, AnandTech's the only review you'll need, but there's also the Ars review and the Tom's Hardware review.

I really want to build a Threadripper machine, even though I just built a very expensive (custom watercooling is pricey) new machine a few months ago, and honestly, I have no need for a processor like this - but the little kid in me loves the idea of two dies fused together, providing all this power. Let's hope this renewed emphasis on high core and thread counts pushes operating system engineers and application developers to make more and better use of all the threads they're given.

RE: Threads
by dpJudas on Sat 12th Aug 2017 09:57 UTC in reply to "Threads"

More use? No. Better use? Yes. Programs should definitely be thread-agnostic, and thus structured (layered) for usage patterns like work-stealing queues.

The problem is that once NUMA enters the picture, it becomes much more difficult to be thread-agnostic. A generic thread pool doesn't know what memory accesses each work task is going to do, for example.
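
To illustrate the point, here is a minimal sketch of a thread-agnostic submission interface (standard C++ only; built on std::async rather than a real work-stealing pool, and the helper name is hypothetical). Nothing in the signature tells the pool where a task's working set lives.

#include <future>
#include <iostream>
#include <string>
#include <utility>
#include <vector>

// A trivial "pool" built on std::async: callers hand over opaque callables.
// Nothing in this interface says where a task's memory was allocated,
// which is exactly the NUMA-blindness described above.
template <typename Fn>
std::future<void> submit(Fn fn)
{
    return std::async(std::launch::async, std::move(fn));
}

int main()
{
    std::vector<std::future<void>> pending;
    for (int task = 0; task < 8; ++task)
    {
        // The pool sees only a callable; it cannot tell whether this
        // task touches memory local to the core it lands on.
        pending.push_back(submit([task] {
            std::cout << "task " + std::to_string(task) + " ran\n";
        }));
    }
    for (auto &f : pending)
        f.wait();
}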

Score: 3

RE[2]: Threads
by Alfman on Sat 12th Aug 2017 14:55 UTC in reply to "RE: Threads"

dpJudas,

"The problem is that once NUMA enters the picture, it becomes much more difficult to be thread-agnostic. A generic thread pool doesn't know what memory accesses each work task is going to do, for example."

I agree, multithreaded code can quickly reach a point of diminishing returns (and even negative returns). NUMA overcomes those bottlenecks by isolating the different cores from each other's work, but then obviously not all threads can be equal, and code that assumes they are will be penalized. These are intrinsic limitations that cannot really be fixed in hardware, so personally I think we should be designing operating systems that treat NUMA nodes as clusters instead of as a uniform pool of threads. And our software should be programmed to scale across clusters rather than merely across threads.

The benefit of the cluster approach is that software can scale to many more NUMA cores than purely multithreaded software can. And without shared-memory constraints across the entire set of threads, we can potentially scale the same software across NUMA nodes or across additional computers on a network.
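
As a minimal sketch of that cluster-style structure within a single process (worker count and sizes are hypothetical): each worker owns its own partition, and nothing is shared until the final combine, so the same shape ports from threads to processes to machines.

#include <iostream>
#include <numeric>
#include <thread>
#include <vector>

int main()
{
    const std::size_t workers = 4;
    const std::size_t items_per_worker = 1000000;

    // Shared-nothing: each worker gets its own private buffer and its own
    // result slot. No cross-worker memory traffic happens until the final
    // combine, so the layout maps naturally onto NUMA nodes - or onto
    // separate machines, if the combine step becomes a network message.
    std::vector<long long> partial(workers, 0);
    std::vector<std::thread> pool;
    for (std::size_t w = 0; w < workers; ++w)
    {
        pool.emplace_back([w, items_per_worker, &partial] {
            // Worker-local allocation; under Linux's first-touch policy
            // these pages land on the node the worker runs on.
            std::vector<int> local(items_per_worker, 1);
            partial[w] = std::accumulate(local.begin(), local.end(), 0LL);
        });
    }
    for (auto &t : pool)
        t.join();

    std::cout << "total = "
              << std::accumulate(partial.begin(), partial.end(), 0LL) << "\n";
}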

Score: 4

RE[3]: Threads
by FortranMan on Sat 12th Aug 2017 17:34 UTC in reply to "RE[2]: Threads"

This is really why I still use MPI for parallel execution, even when running on a single node. The approach has the added benefit of scaling up to small computer clusters without much extra effort.

I mostly write engineering simulation codes though, so I'm pretty sure this does not make sense for entire classes of program.
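
For reference, a minimal sketch of that pattern (MPI's C API called from C++; the problem size and launch command are hypothetical): each rank owns a private slice of the work, and only the final reduction crosses process boundaries, so the same binary runs unchanged under, say, mpirun -np 16 on one node or across a small cluster.

#include <mpi.h>
#include <iostream>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank = 0, size = 1;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    // Each rank strides over its own share of the iterations; all working
    // memory is private to the process, so NUMA placement needs no special
    // handling inside the program.
    long long local = 0;
    for (long long i = rank; i < 100000000LL; i += size)
        local += i;

    // Only this reduction crosses process (or machine) boundaries.
    long long total = 0;
    MPI_Reduce(&local, &total, 1, MPI_LONG_LONG, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        std::cout << "total = " << total << " across " << size << " ranks\n";

    MPI_Finalize();
    return 0;
}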

Score: 4

RE[3]: Threads
by tylerdurden on Sun 13th Aug 2017 00:04 UTC in reply to "RE[2]: Threads"

It seems like you want the worst of both worlds, without getting the benefits of NUMA.

You can simply pin threads if you're that concerned about NUMA latencies. Otherwise, let the scheduler and memory controller deal with it.
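
As a sketch of that pinning approach (Linux/glibc only - pthread_setaffinity_np is non-portable, and the one-thread-per-core layout is just an illustration): a pinned worker stays on its core, so the pages it touches first stay on that core's node.

#include <pthread.h>
#include <sched.h>
#include <iostream>
#include <string>
#include <thread>
#include <vector>

// Pin the calling thread to a single CPU (Linux/glibc only).
static void pin_to_cpu(unsigned cpu)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
}

int main()
{
    unsigned cores = std::thread::hardware_concurrency();
    std::vector<std::thread> pool;
    for (unsigned cpu = 0; cpu < cores; ++cpu)
    {
        pool.emplace_back([cpu] {
            pin_to_cpu(cpu);  // from here on, the scheduler keeps this
                              // thread (and its first-touched pages) put
            std::cout << "worker pinned to cpu " + std::to_string(cpu) + "\n";
        });
    }
    for (auto &t : pool)
        t.join();
}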


Score: 2