We have been teased with AMD’s next generation processor products for over a year. The new chiplet design has been heralded as a significant breakthrough in driving performance and scalability, especially as it becomes increasingly difficult to create large silicon with high frequencies on smaller and smaller process nodes. AMD is expected to deploy its chiplet paradigm across its processor line, through Ryzen and EPYC, with those chiplets each having eight next-generation Zen 2 cores. Today AMD went into more detail about the Zen 2 core, providing justification for the +15% clock-for-clock performance increase over the previous generation that the company presented at Computex last week.
The 16c/32t Ryzen 9 3950X looks quite attainable at $750 – a price that is sure to come down after launch.
There goes the value of old server Xeon chips:
https://www.reddit.com/r/PleX/comments/78ae67/plex_server_build_recommendation_750_20core_40/
They were very useful for getting many cores (10, or 20 in multi-socket configurations) at a very low price compared to desktop processors, as long as you were willing to go a few generations older.
However, a $750 processor with 16 cores now offers good value of its own.
For example, a dual Xeon E5-2670v2 would cost:
* 2x ~$110 for CPU: https://www.ebay.com/itm/Intel-Xeon-E5-2670v2-10-Core-2-5GHz-3-3GHz-LGA2011-CPU-SR1A7/283497487283?hash=item4201c45bb3:g:5zsAAOSwtmdc6xrT
* $150 for MOBO: https://www.ebay.com/itm/Supermicro-Dual-Socket-Server-Motherboard-X9DR3-F-LGA-2011-Socket-R/254261728690?epid=1800261072&hash=item3b332e15b2:g:da8AAOSwgmVdAO7Y
That sounds better in terms of price. However, it is an apples-to-oranges comparison, since:
* The Xeon has a 2.5GHz base frequency (up to 3.3GHz with Turbo Boost), while Ryzen easily goes over 4GHz
* The Xeons would draw more than 100W per socket, while Ryzen would run the entire thing at the same power level
* The Xeon setup would have 2×10 cores, but with inefficiencies due to the NUMA architecture
* The Xeons would support ECC RAM, which is useful, but it would be the slower, older DDR3 generation, and more expensive
* The Xeon motherboard has more features (like remote management), but it would not support PCIe 4 for the GPU (PCIe 3 at most)
So we finally have affordable enthusiast workstation offerings. It will be fun to watch this space.
(NOTE: I’m currently typing this on a self-built V2 Xeon workstation)
Going the (semi-old) Xeon route has a number of upsides and some downsides.
Additional Xeon upsides:
1. Memory bandwidth.
2. Supports up to 768GB / 1.5TB of RAM (single / dual socket).
3. Stability. My previous workstation, an 8.5-year-old (!) dual Xeon X5680 (w/ 36GB ECC RAM), is still working 24/7 with zero issues.
Additional Xeon downsides:
1. Lack of Spectre / Meltdown hardware mitigations (the software mitigations drop performance by 10-15% on v2 CPUs).
2. Power usage. (Though if you buy v2 or v3 L-series 65W CPUs, power usage should be OK.)
– Gilboa
Refurbished workstations are a good budget option if you need compute performance, even compared to a Ryzen build. I got an HP Z420 workstation for $400 on Newegg two years ago. It’s been great, and it was my first 8-core system. I’ve since built a Ryzen 7 2700 system for gaming, but the HP is nearly as fast and was significantly cheaper. It’s quite loud though!
sukru,
I certainly welcome it. However, I think typical consumers are seeing far fewer gains now that CPU upgrades have gone from significantly improving base frequencies to increasing core counts instead. I understand why manufacturers have done this, and it’s exciting to see so many cores on paper, but in practice most desktop workloads will not benefit from the extra cores at all, because it’s rare for desktop software to actually take advantage of them. Even with games, benchmarks rarely show an advantage to having extra cores. To make matters worse, these high-core-count processors often compromise on single-core performance.
These CPUs do have tons of potential, but the desktop software itself is usually the limiting factor. For example, GIMP and Inkscape are both excellent candidates for SMP acceleration, yet their graphics stacks are limited to a single thread, so most cores sit idle while the user waits. Making SMP hardware effective requires software developers to match the investments that CPU manufacturers are making, but so far the uptake outside of servers has been lousy IMHO.
I agree with you about desktop use cases. The law of diminishing returns is really felt on that front. I vividly remember my first P4 with Hyper-Threading. To someone who was used to a single processor, it was revolutionary to see the computer not stall whenever I compressed a large file or applied a filter in Photoshop.
The first true dual-core processor I used, an Athlon X2, brought some more responsiveness overall, but nothing as revolutionary as the earlier P4.
My now pretty old quasi-4-core laptop i5 is sure nice to have when a tab or two of the browser is stuck in some JavaScript f.-up, but essentially it is just an incremental update over 2 cores. And I have yet to jump on the 8-core train, but when I do, I doubt I’ll feel any different, given my ordinary use case of a browser and office software, with Winamp (yeah, still Winamp) playing in the background.
> To make matters worse, these high-core-count processors often compromise on single-core performance.
Not so for Zen 2, as AMD’s Robert Hallock explained on Reddit, cf:
https://i.redd.it/j4w3wywu9y331.png
https://www.reddit.com/r/Amd/comments/bzt50i/amd_robert_hallock_not_single_core_boost/
https://www.reddit.com/r/Amd/comments/b8a5ft/amd_ryzen_processor_features_defined/
TL;DR:
“Ryzen doesn’t really have a ‘single core turbo’ clock. Our boost algorithm pursues the highest possible clocks on as many cores as possible until you hit some sort of limit: socket power, core temps, VRM electrical limit, VRM thermal limit, max clockspeed, etc.”
Also note that for AMD Ryzen (even the very first generation) it’s the highest-core-count CPUs that get the highest turbo clocks, most likely because they need that as a selling point AND because AMD needs to cherry-pick its best-binned chips for these SKUs.
Gargyle,
The result isn’t all that different; from a high level, it’s an opportunistic boost. On an Intel i9 processor, I’ve seen 4 cores momentarily boost to 5.0GHz, but the 5.0GHz speed is only sustainable on 2 cores; otherwise they back off toward the base frequency. Interestingly, the behavior of mine changed after a BIOS update (turbo was less aggressive before the update), so it may vary by system.
I can’t say what the future holds, but take a look at the fastest CPUs listed for every core count among the current Threadripper 2 products…
https://www.amd.com/en/products/ryzen-threadripper
The 24-core 2970WX has a lower clock frequency than the 16-core 2950X. Furthermore, despite having fewer cores, the 2950X overwhelmingly wins the benchmarks for typical games and applications. Look at the GIMP benchmark (which does not take advantage of many cores)… the fastest AMD CPU is also the cheapest one with the fewest cores!!
https://www.anandtech.com/show/13516/the-amd-threadripper-2-cpu-review-pt2-2970wx-2920x/5
In some cases the decrease in performance is severe. Look at Far Cry and GTA for example:
https://www.anandtech.com/show/13516/the-amd-threadripper-2-cpu-review-pt2-2970wx-2920x/17
The AMD TR2 2970WX is the second most expensive processor on that list, but by far the worst performer.
https://www.tomshardware.com/reviews/amd-ryzen-threadripper-2-2990wx-2950x,5725-10.html
So it’s really not as simple as ‘more cores == better performance’, for AMD or Intel. There are some pretty major obstacles to high performance at large core counts. Memory doesn’t scale up very well (it’s a bottleneck even at lower core counts). NUMA attempts to address this by creating local and remote memory regions for each processor. For servers running unrelated tasks concurrently, this works out well. But for highly multi-threaded code in a single process that naively creates threads and uses memory without regard to NUMA regions, the result is abysmal performance. AMD has a fix for this called “Game Mode”, but it involves disabling half the cores that you paid a premium to get in the first place 🙁
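To make that concrete, here’s a minimal sketch of what NUMA-aware threading looks like with libnuma on Linux. The libnuma calls are the real API, but the worker and buffer sizes are made up for illustration; the naive version would just spawn threads and malloc, letting pages land on whichever node happens to touch them first.

// NUMA-aware sketch (Linux, build with: g++ -pthread numa_demo.cpp -lnuma).
// Pin each worker to a node and allocate its working set on that same
// node, instead of letting threads chew on memory across the interconnect.
#include <numa.h>
#include <cstring>
#include <thread>
#include <vector>

static void worker(int node, size_t bytes) {
    numa_run_on_node(node);                     // keep this thread on "node"
    void *buf = numa_alloc_onnode(bytes, node); // memory local to "node"
    std::memset(buf, 0, bytes);                 // first touch stays local
    // ... the actual per-node work would go here ...
    numa_free(buf, bytes);
}

int main() {
    if (numa_available() < 0) return 1;         // kernel lacks NUMA support
    int nodes = numa_max_node() + 1;
    std::vector<std::thread> pool;
    for (int n = 0; n < nodes; ++n)
        pool.emplace_back(worker, n, size_t{64} << 20); // 64MB per node
    for (auto &t : pool) t.join();
    return 0;
}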
I’m not blaming AMD at all, but the truth is we can’t “hardware” our way out of this; it’s going to require the software industry to evolve and embrace new ways of programming to truly take advantage of the hardware. Otherwise the performance will continue to be disappointing.
I’m sorry, my point that a higher core count doesn’t necessarily mean lower single-threaded performance was implicitly limited to the Ryzen line-up, so not including Threadripper or Epyc.
For example: the top-tier (highest core count) CPU within the Ryzen line-up also always had the highest clocks, i.e. 1500X vs 1600X vs 1800X, or 2600X vs 2700X, and now 3600X vs 3800X.
Granted, the base clock of the 3950X is down from 3.9GHz to 3.5GHz, but given that it has the same thermal restrictions as the 3800X with double the number of cores, that’s not surprising. Still, the maximum boost clock of the 3950X is 200MHz higher than that of the 3800X, and since boost isn’t dependent on the total number of cores or other fixed parameters, but rather on the variable parameters explained in my previous post, it may well have the highest single-thread performance of all Zen 2 based Ryzen CPUs.
The crux is the (relatively low) power limit imposed on a high-core-count CPU: boost clocks drop significantly once more than a few cores are loaded, to keep the total power consumption in check; cf. the Average Clock Frequency vs Thread Count chart in the following link:
https://www.techpowerup.com/reviews/AMD/Ryzen_5_2600/16.html
Gargyle,
That’s kind of the point though: adding cores often results in lower real-world performance for various reasons. There are diminishing returns, and even negative returns. It doesn’t surprise me, but it might surprise others, so it’s worth pointing out.
For desktop use cases, unless your software is specifically programmed to take advantage of high core counts, it’s quite likely that a cheaper CPU with fewer cores will actually perform better. The reason I linked to benchmarks is that many people wouldn’t believe it otherwise; it’s somewhat counter-intuitive.
The sweet spot for a desktop today is probably in the 4-8 core range. Admittedly AMD’s strategy has been to push up the number of cores dramatically; however, they aren’t especially competitive with Intel at the lower core counts, which have higher single-threaded performance.
> Admittedly AMD’s strategy has been to push up the number of cores dramatically; however, they aren’t especially competitive with Intel at the lower core counts, which have higher single-threaded performance.
That statement is fortunately becoming more and more obsolete, especially since A/ Intel is seeing its performance diminish each time one of its security flaws is mitigated through software fixes, and B/ AMD is about to start selling its Zen 2 CPUs, which should be at least as fast on single-threaded loads as the newest Intels. That is marvellous news imo, as it will finally break up the market somewhat, which is a good thing for us consumers.
Gargyle,
I would like to support AMD, really, but the performance benchmarks consistently put Intel ahead in single-threaded performance (i.e., what’s typical of desktop applications). If you want to disagree, that’s fine, but please provide empirical data rather than just asserting it. All of the independent benchmarks that I am seeing still favor Intel. Can you find any credible independent benchmark showing AMD beating Intel on single-threaded performance? If not, then what is your basis for suggesting they do? I think this is a fair request.
In AMD’s defense, I agree with sukru that by and large most processors these days are plenty good enough for most things desktop users are doing, regardless of whether they can beat Intel or not.
I agree that “current” desktop apps do not benefit from it. In fact graphics, which is supposed to be the ideal candidate for multi-core optimization, often depends on older single-threaded code.
However things are getting better.
– Browsers, for example, benefit from running tasks in the background
– Games are already there, with separate threads for AI / procedural generation / etc.
– Some video encoders are starting to take advantage (albeit very slowly)
– 7zip / WinRAR can use multiple threads (see the sketch below)
And if all else fails, having at least 2-3 cores prevents most desktop stutters / slowdowns in regular use.
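As a rough illustration of that archiver-style pattern (this is generic C++, not 7-Zip’s or WinRAR’s actual code, and process_chunk is a made-up placeholder for compressing a block): split the input into chunks and hand each chunk to its own task.

// Chunked parallelism sketch: one task per hardware thread, each working
// on its own slice of the input.
#include <algorithm>
#include <cstdint>
#include <future>
#include <iostream>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical per-chunk work; a real archiver would compress the slice.
static uint64_t process_chunk(const std::vector<uint8_t> &data,
                              size_t begin, size_t end) {
    return std::accumulate(data.begin() + begin, data.begin() + end,
                           uint64_t{0});
}

int main() {
    std::vector<uint8_t> data(size_t{1} << 24, 1);  // 16MB of fake input
    unsigned n = std::max(1u, std::thread::hardware_concurrency());
    size_t chunk = data.size() / n;

    std::vector<std::future<uint64_t>> tasks;
    for (unsigned i = 0; i < n; ++i) {
        size_t begin = i * chunk;
        size_t end = (i + 1 == n) ? data.size() : begin + chunk;
        tasks.push_back(std::async(std::launch::async, process_chunk,
                                   std::cref(data), begin, end));
    }
    uint64_t total = 0;
    for (auto &t : tasks) total += t.get();         // join and combine
    std::cout << total << "\n";
    return 0;
}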
All that aside, I don’t remember having a “slow” desktop in recent years, be it 2GHz or 5GHz, as long as there is at least 8GB of RAM and some sort of SSD. Even my older machines upgraded to these specs delivered for a long while. So yes, we are at “diminishing returns” for desktop / office work. However, there is still a lot that can be done in specialized workloads (especially multimedia).
There are also benefits for specific users. A guy using Facebook isn’t going to benefit much, but gamers, programmers, researchers, etc. will benefit a great deal from at least an 8-core system. Consider gamers. While games themselves are often threaded crudely, with just a dedicated thread per task (sound processing, video, AI, etc.), many gamers are now streaming their matches on Twitch or other services. Those folks have extra compute power to encode the video, run Twitch, etc. with the additional cores. Some of us are also adding extras like Philips Hue Play lighting, whirlwind devices (fans with heat sources for games) and other things that will require some background processing power. Game programmers will eventually have to learn how to use frameworks to do threading. libdispatch is quite useful in C, for instance.
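For instance, a minimal libdispatch sketch (the dispatch calls are the real C API, available on macOS and FreeBSD; encode_frame is a hypothetical stand-in for feeding a stream encoder):

// libdispatch sketch: push background work onto a system-managed
// concurrent queue so it spreads across cores.
#include <dispatch/dispatch.h>
#include <stdio.h>

static void encode_frame(void *ctx) {
    (void)ctx;
    puts("encoding a captured frame in the background");
}

int main(void) {
    dispatch_queue_t bg =
        dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);
    dispatch_group_t group = dispatch_group_create();

    for (int i = 0; i < 8; ++i)                    // queue 8 demo tasks
        dispatch_group_async_f(group, bg, NULL, encode_frame);

    // A game loop would keep running here; we just wait for the demo.
    dispatch_group_wait(group, DISPATCH_TIME_FOREVER);
    dispatch_release(group);
    return 0;
}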
Anyone working on machine learning or AI can benefit from both CPU and GPU improvements. Regular programmers can compile in parallel, which can speed up build times significantly for some folks.
Then there are the specialized cases. The 16-core CPU is enticing for me, as I frequently build 3000+ packages for my open source BSD project and need horsepower for that. I currently use a Core i7-7700 running VMware ESXi, as well as an HP Z420 workstation with an 8-core CPU and a Ryzen 7 2700, for these builds. Building on the Core i7 takes 3 days by itself. The Ryzen system can do it in a day if I use a memory disk to allow full utilization of the CPU. With a 16-core and an extra 32GB of RAM, I could potentially get a build done in hours.
A new system based on this 16-core Ryzen is likely faster than the new Mac Pro as well, at likely half the cost to acquire.
laffer1,
> …many gamers are now streaming their matches on Twitch or other services.
That’s a great point. I believe most games today are tuned to run on ~4 cores to maximize their market base, but background streaming could benefit from having more cores (though I’m not sure how much of the encoding is offloaded to the GPU).
This example made me laugh, because the amount of processing power these would need is negligible in light of the processing power available over the last several decades. Nevertheless, I can see the sales rep at Best Buy trying to promote a high-core-count computer based on these features. Why not; a customer buying these accessories probably has the money for more cores too, even if it doesn’t make a difference.
Your needs are similar to mine. Building software in parallel is a pretty good use case. My only gripe is that it addresses the symptoms rather than the root cause. The underlying reason compiling C code is so slow is the pre-processor and include files: the amount of code amplification caused by recursive include files is absurd. If we could get away from such bad language designs, that would do away with the need for many cores to tackle slow compilation times.
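A quick way to see the amplification for yourself (the line counts are ballpark and vary by compiler version):

// amplify.cpp: three innocuous lines of user code.
#include <iostream>
int main() { std::cout << "hello\n"; }

// Run only the preprocessor and count what the compiler actually parses:
//   g++ -E amplify.cpp | wc -l
// On a typical toolchain this expands to tens of thousands of lines, and
// every translation unit in the project repeats that parsing work.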
Precompiled headers, and not putting absolutely everything into the headers really speed up C and C++ builds.
C++ is really bad about putting almost the entire library into the header though.
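As a sketch of what that advice looks like in practice (the Widget and Renderer names are invented for illustration), forward declarations plus the pimpl idiom keep the header cheap to include:

// widget.h -- lean interface: forward declarations instead of includes,
// implementation details hidden behind a pointer (the pimpl idiom).
#pragma once
#include <memory>

class Renderer;             // forward declaration: no heavy header pulled in

class Widget {
public:
    Widget();
    ~Widget();              // defined in widget.cpp, where Impl is complete
    void draw(Renderer &r);
private:
    struct Impl;            // details live in widget.cpp only
    std::unique_ptr<Impl> impl;
};

// widget.cpp -- the only file that needs the heavy includes.
// #include "widget.h"
// #include "renderer.h"    // heavy dependency confined to this one file
struct Widget::Impl { int state = 0; };
Widget::Widget() : impl(std::make_unique<Impl>()) {}
Widget::~Widget() = default;
void Widget::draw(Renderer &) { /* real work goes here */ }

Consumers of widget.h then recompile only when the interface changes, not when Widget’s internals or Renderer’s header do.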
Even with C++ it can get pretty fast. I recently discovered “extern template”, which tells the compiler that a compiled instantiation of that specialization will be provided elsewhere. That can really speed things up.
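A minimal example of the mechanism (Matrix is an invented stand-in for a heavily used template):

// matrix.h -- a template included all over the codebase.
#pragma once
#include <cstddef>
#include <vector>

template <typename T>
class Matrix {
public:
    Matrix(int r, int c) : cols(c), data(static_cast<std::size_t>(r) * c) {}
    T &at(int r, int c) { return data[static_cast<std::size_t>(r) * cols + c]; }
private:
    int cols;
    std::vector<T> data;
};

// Every includer sees this and skips instantiating Matrix<float> itself:
extern template class Matrix<float>;

// matrix.cpp -- the one translation unit that actually compiles it:
// #include "matrix.h"
template class Matrix<float>;

With that in place, the hundreds of files that include matrix.h no longer each re-instantiate Matrix<float> just so the linker can throw the duplicates away.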
Zan Lynx,
I’m having trouble finding the reference now, but Google published a study about how bad C++ was in terms of code amplification in their builds. What we really need is to get rid of header files completely and replace them with proper modules. Not only would code compile faster, it would also save developers from writing the tedious boilerplate headers that C requires. We have alternative languages that are up to the task, such as D:
https://dlang.org/articles/pretod.html
C/C++ have a monopoly on system code, but for better or worse it’s not about merit so much as popularity. Everyone, including myself, uses C simply because everyone else is using it and it’s the best-supported language.