So has AMD done the unthinkable? Beaten Intel by such a large margin that there is no contest? For now, based on our preliminary testing, that is the case. The launch of AMD’s second generation EPYC processors is nothing short of historic, beating the competition by a large margin in almost every metric: performance, performance per watt and performance per dollar.
Industry analysts have stated that AMD expects to double its share of the server market by Q2 2020, and there is every reason to believe that AMD will succeed. The AMD EPYC is an extremely attractive server platform with an unbeatable performance-per-dollar ratio.
This is one stunning processor family.
I love it, that’s an insane number of cores 🙂
I kind of wish they hadn’t rushed the benchmarks, though; it put Intel at an artificial disadvantage. They should have compared the best tech from each company.
Anyway, I’d love to have one! This is the future of CPU scalability. To support this many cores, they needed to split into more NUMA regions. For data center servers, NUMA gets you much higher parallelism for tasks that don’t need IPC between the regions. This is perfect for VMs and application gateways, where processes execute mostly independently of each other. However, for heavily multithreaded processes, NUMA creates bottlenecks for SMP software. All of the shared data access and synchronization primitives used by multithreaded algorithms can quickly saturate a distributed NUMA memory architecture, and AnandTech’s benchmarks show a fairly severe latency cost for the 128-core 2x AMD EPYC 7742 system. For this reason, I think this type of hardware is best for data center usage, which is good because it costs a lot, haha. While I wouldn’t predict very high scores for gaming, it’d be fun to see those benchmarks nevertheless 🙂
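For anything latency-sensitive, the trick is keeping each worker’s threads and memory on the same node. A minimal sketch of what I mean, assuming Linux with libnuma installed (build with cc -lnuma); the choice of node 0 is purely illustrative:

```c
/* Sketch only: keep a worker's thread and memory on one NUMA node so it
 * never pays the cross-socket latency. Assumes Linux with libnuma
 * installed; node 0 is illustrative. */
#include <numa.h>
#include <stdio.h>
#include <string.h>

int main(void) {
    if (numa_available() < 0) {
        fprintf(stderr, "no NUMA support on this system\n");
        return 1;
    }

    /* Run this thread only on node 0's CPUs... */
    if (numa_run_on_node(0) != 0) {
        perror("numa_run_on_node");
        return 1;
    }

    /* ...and allocate its working set from node 0's memory, so nothing
     * crosses the inter-socket link. */
    size_t len = 64UL * 1024 * 1024;
    char *buf = numa_alloc_onnode(len, 0);
    if (buf == NULL) {
        perror("numa_alloc_onnode");
        return 1;
    }
    memset(buf, 0, len);  /* touch the pages so they are actually placed */

    printf("pinned to node 0 of %d nodes\n", numa_max_node() + 1);
    numa_free(buf, len);
    return 0;
}
```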
But it has fewer NUMA regions… That is the thing they fixed. Dual-socket EPYC used to have 8 NUMA domains and 3 different paths with different latencies; now there are only 2 domains/2 paths.
Did I misunderstand something?
Nvm… I’m dumb. I didn’t see the bit about the NPS tuning. Now I get what you were saying. Sorry.
I think it would make sense to treat these kinds of configurations more as a “virtual cluster” than as a high-core-count SMP system. That way, rather than ending up with high-latency SMP, it could be treated as a low-latency cluster! Maybe we could use containers and process pinning (something short of full virtualization) to achieve this? There must be someone doing this already. If it weren’t above my pay grade, I’d love to work on this stuff myself.
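Roughly what I’m picturing, as a sketch only (assumes Linux; the claim that node 0 owns CPUs 0-15 is made up, you’d read the real list from /sys/devices/system/node/node0/cpulist):

```c
/* Hypothetical sketch of the "virtual cluster" idea: confine a worker
 * to one node's CPUs before exec()ing the real workload. Assumes Linux;
 * the mapping "node 0 = CPUs 0-15" is invented for illustration. */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <unistd.h>

int main(void) {
    cpu_set_t set;
    CPU_ZERO(&set);

    for (int cpu = 0; cpu < 16; cpu++)  /* pretend node 0 = CPUs 0-15 */
        CPU_SET(cpu, &set);

    /* pid 0 means "this process"; the mask is inherited across fork()
     * and preserved across exec(), so the whole worker stays put. */
    if (sched_setaffinity(0, sizeof(set), &set) != 0) {
        perror("sched_setaffinity");
        return 1;
    }

    printf("pid %d confined to node 0's CPUs\n", (int)getpid());
    /* execv("/path/to/worker", ...) would go here in a real setup */
    return 0;
}
```

Containers get you the same effect declaratively through cpuset cgroups; Docker exposes it as --cpuset-cpus and --cpuset-mems.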
tuned and numad are supposed to help with this. They’ll dynamically adjust process placement to get better performance. Of course, building a smarter scheduler into the OS would be a better solution.
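Under the hood, the kind of move numad automates looks something like this sketch (not numad’s actual code; assumes libnuma, and the pid is made up):

```c
/* Rough sketch of the kind of move numad automates (not numad's actual
 * code): push a process's pages from node 1 over to node 0 after
 * deciding the process should live there. Assumes libnuma (-lnuma);
 * the pid 1234 is made up. */
#include <numa.h>
#include <stdio.h>

int main(void) {
    if (numa_available() < 0) {
        fprintf(stderr, "no NUMA support\n");
        return 1;
    }

    struct bitmask *from = numa_allocate_nodemask();
    struct bitmask *to   = numa_allocate_nodemask();
    numa_bitmask_setbit(from, 1);  /* pages currently sitting on node 1 */
    numa_bitmask_setbit(to, 0);    /* move them next to node 0's CPUs */

    /* Thin wrapper around the migrate_pages(2) syscall; needs the
     * privileges to touch the target process. */
    if (numa_migrate_pages(1234 /* made-up pid */, from, to) < 0)
        perror("numa_migrate_pages");

    numa_free_nodemask(from);
    numa_free_nodemask(to);
    return 0;
}
```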
Maybe do some asymmetric multiprocessing, something like ARM’s big.LITTLE architecture: each package has a small quad-core cluster for the OS and big cores for jobs. This might only be possible once RAM starts ending up on-die.
Another thought is building a hardware hypervisor into the platform. Old Sun systems could carve a high-core-count SPARC system into multiple domains (I don’t remember the actual name of the technology), and each one would be a bespoke server sharing bare-metal hardware. It was part of the firmware that bootstrapped the system, so the equivalent today would be to build something into UEFI.
So many thoughts, and old examples.
Flatland_Spider,
Indeed, I’m not sure I’ll ever get to work on this, but I’ll bookmark and try to remember it for future reference, thanks!
https://linux.die.net/man/8/numad
This reminds me of the PS3 Cell Processor
https://arcb.csc.ncsu.edu/~mueller/cluster/ps3/doc/CellProgrammingTutorial/BasicsOfCellArchitecture.html
It had a great deal of potential, but developers rarely used it to its full potential. I’m not sure developers will ever warm up to extremely asymmetric hardware designs, even though they can help to address hardware problems.
That would be awesome! It could look just like a real blade server, without requiring all those physical blade modules.
Depends on what system builders do. CPU benchmarks are great, but especially for servers, it really depends on the server configurations that HP, Dell, Supermicro, etc. put together and how they price them. Last time, with the Opterons, the per-CPU pricing was great, but there weren’t a lot of options when it came time to purchase the servers. Also, I’m kind of concerned that for many companies, owning physical servers isn’t something they want. There are still needs for them, but it’s not the default obvious choice anymore to have one’s own hardware.
I love this. In computing terms, I’m of the Athlon XP generation. I saw a market where two companies battled it out for x86, while the G4 chips and SPARC were keeping the market honest.
Now we are seeing that building again, with ARM keeping them on their toes. Again, loving it!