Benchmarks Archive
The Sophon SG2042 is the world’s first commodity 64-core RISC-V CPU for high performance workloads, and an important question is whether the SG2042 has the potential to encourage the HPC community to embrace RISC-V. In this paper we undertake a performance exploration of the SG2042 against existing RISC-V hardware and high performance x86 CPUs in use by modern supercomputers. Leveraging the RAJAPerf benchmarking suite, we discover that on average, the SG2042 delivers, per core, between five and ten times the performance compared to the nearest widely available RISC-V hardware. We found that, on average, the x86 high performance CPUs under test outperform the SG2042 by between four and eight times for multi-threaded workloads, although some individual kernels do perform faster on the SG2042. The result of this work is a performance study that not only contrasts this new RISC-V CPU against existing technologies, but furthermore shares performance best practice. ↫ Nick Brown, Maurice Jamieson, Joseph Lee, Paul Wang The Sophon SG2042 is the RISC-V processor found in the Milk-V Pioneer workstation, which was recently featured on LTT as well, for the video crowd among us. There’s definitely still a way to go for RISC-V, but the gains over the past few years are clear, and if this keeps progressing this way, it won’t be long before RISC-V becomes a valid, competitive architecture.
Out of the 72 benchmarks run in total on both operating systems with the Lenovo ThinkPad P14s Gen 4, Ubuntu 23.10 was the fastest about 64% of the time. Taking the geometric mean of all the benchmark results, Ubuntu 23.10 comes out 10% faster than the stock Windows 11 Pro install as shipped by Lenovo for this AMD Ryzen 7 PRO 7840U laptop. I recently bought a laptop, and the stock Windows installation – free of OEM crapware, which was a welcome surprise – opened applications and loaded webpages considerably slower than Fedora KDE did. This has not always been the case, and I’m pleasantly surprised that while the desktop Linux world has focused a lot on performance, Microsoft was busy making Windows even less pleasant than it already was. I wouldn’t be surprised if across all price/performance levels, Linux is faster and snappier than Windows – except maybe at the absolute brand-new high-end, since AMD, Intel, and NVIDIA entirely understandably focus on Windows performance first.
Putting the Raspberry Pi Zero against the MangoPi MQ Pro was something I’d wanted to do since seeing the MQ Pro’s announcement and specifications. It just seemed to make sense. On paper, they’re largely similar, with 1GHz single-core CPUs, and 512MB of RAM. A 1GB MQ Pro is also available and is what I’ll be using here so your mileage may vary slightly if you have the 512MB version. So what happens when you pit these relatively similar single-board computers against each other? That also ignores the price side of things. The Raspberry Pi Zero W retails for around £10 (keep an eye out on rpilocator if you’re currently in the market) in the UK through authorised retailers whereas the Mango Pi MQ Pro 1GB model tested here will run you around £23 if you manage to get one through the official store when they have stock (these prices both include GST/VAT at 20%). At £23 I still think it’s worth it to get your hands on a small RISC-V based board that offers twice as much (and faster) RAM and better performance in a lot of areas, but if you’re purely interested in the price this may not appeal to you. RISC-V is steadily progressing, and if relatively low-performance boards like this aren’t your thing, there are more powerful boards incoming, such as Pine64’s Star64.
On the competitive landscape, Ampere is carving out its niche for the moment, but what happens once AMD or Intel increase their core counts as well? A 50% increase in core counts for next-gen Genoa should be sufficient for AMD to catch up with the M128 in raw throughput, and technologies such as V-cache should make sure the HPC segment is fully covered as well, a segment Ampere appears to have no interest in. Intel now has an extremely impressive smaller core in the form of Gracemont, and they could easily make a large core-count server chip to attack the very segment Ampere is focusing on. Only time will tell if Ampere’s gamble on hyper-focusing on certain workloads and market segments pays off. For now, the new Altra Max is an interesting and very competent chip, but it’s certainly not for everyone. Admit it. You too want a 128-core ARM processor on your desk.
ARM has introduced the Neoverse N1 platform, the blueprint for creating power-efficient processors licensed to institutions that can customize the original design to meet their specific requirements. Ampere licensed the Neoverse N1 platform to create the Ampere Altra, a processor that allows companies that own and manage their own fleet of servers, like ourselves, to take advantage of the expanding ARM ecosystem. We have been working with Ampere to determine whether Altra is the right processor to power our first generation of ARM edge servers. The AWS Graviton2 is the only other Neoverse N1-based processor publicly accessible, but only made available through Amazon’s cloud product portfolio. We wanted to understand the differences between the two, so we compared Ampere’s single-socket server, named Mt. Snow, equipped with the Ampere Altra Q80-30 against an EC2 instance of the AWS Graviton2. Cloudflare compared these two ARM server platforms and benchmarked them, and they give a ton of detail about them, too. Give it a few more years, and ARM will be a decidedly normal sight within data centres all over the world.
In 2016, through a series of joint ventures and created companies, AMD licensed the design of its first generation Zen x86 processors to be sold into China. The goal of this was two-fold: China wanted a ‘home grown’ solution for high-performance x86 compute, and AMD at the time needed a cash injection. The outcome of this web of businesses was the Hygon Dhyana range of processors, which ranged from commercial to server use. Due to the Zen 1 design on which it was based, it has been assumed that the performance was in line with Ryzen 1000 and Naples EPYC, and no-one in the west has publicly tested the hardware. Thanks to a collaboration with our friend Wendell Wilson over at YouTube channel Level1Techs, we now have the first full review of the Hygon CPUs. This is such an intriguing story. This specific joint venture – underreported and unknown to many in the west – may prove invaluable to China’s own tech sector for years to come.
AnandTech benchmarks the two nearly identical Surface Laptop 3s from Microsoft – one with AMD’s latest mobile processor and GPU, and the other one with Intel’s. They conclude: There aren’t too many ways to sugar coat the results of this showdown though. AMD’s Picasso platform, featuring its Zen+ cores and coupled with a Vega iGPU, has been a tremendous improvement for AMD. But Intel’s Ice Lake platform runs circles around it. Sunny Cove cores coupled with the larger Gen 11 graphics have proven to be too much to handle. That being said, much like the first desktop Ryzen processors being a huge leap forward for AMD without closing the gap with Intel at the time, the Picasso platform seems to repeat the feat in laptops. It was fantastic to see AMD get a design win in a premium laptop this year, and the Surface Laptop 3 is going to turn a lot of heads over the next year. AMD has long needed a top-tier partner to really help its mobile efforts shine, and they now have that strong partner in Microsoft, with the two of them in a great place to make things even better for future designs. Overall AMD has made tremendous gains in their laptop chips with the Ryzen launch, but the company has been focusing more on the desktop and server space, especially with the Zen 2 launch earlier this year. For AMD, the move to Zen 2 in the laptop space can’t come soon enough, and will hopefully bring much closer power parity to Intel’s offerings as well. I can’t wait to see what AMD can offer consumers in the laptop space over the coming years. If it’s going to be a repeat of the desktop space, we’re going to be in for some seriously good times.
Today something happened that many may not have seen. Intel published a set of benchmarks showing the advantage of a dual Intel Xeon Platinum 9282 system over the AMD EPYC 7742. Vendors present benchmarks to show that their products are good from time to time. There is one difference in this case: we checked Intel’s work and found that they presented a number to intentionally mislead would-be buyers as to the company’s relative performance versus AMD. Intel is desperate, and it’s really starting to show.
A library that I work on often these days, meshoptimizer, has changed over time to use fewer and fewer C++ library features, up until the current state where the code closely resembles C even though it uses some C++ features. There have been many reasons behind the changes – dropping C++11 requirement allowed me to make sure anybody can compile the library on any platform, removing std::vector substantially improved performance of unoptimized builds, removing algorithm includes sped up compilation. However, I’ve never quite taken the leap all the way to C with this codebase. Today we’ll explore the gamut of possible C++ implementations for one specific algorithm, mesh simplifier, henceforth known as simplifier.cpp, and see if going all the way to C is worthwhile.
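To make the trade-off concrete, here is a minimal sketch (purely illustrative, not meshoptimizer’s actual code) of the kind of std::vector replacement such a C-leaning library can get away with for trivially copyable data like indices and vertex attributes: a tiny growable buffer with no allocator indirection, no exception paths, and no <vector> or <algorithm> includes, which is exactly what helps unoptimized-build performance and compile times.

```cpp
// Illustrative sketch only -- not meshoptimizer's code. Valid only for
// trivially copyable element types (indices, vertex attributes), since
// growth uses realloc and elements are never constructed or destroyed.
#include <stddef.h>
#include <stdlib.h>

template <typename T>
struct Buffer
{
    T* data;
    size_t size;
    size_t capacity;

    Buffer() : data(0), size(0), capacity(0) {}
    ~Buffer() { free(data); }

    void push_back(const T& value)
    {
        if (size == capacity)
        {
            capacity = capacity ? capacity * 2 : 16;
            T* grown = static_cast<T*>(realloc(data, capacity * sizeof(T)));
            if (!grown)
                abort(); // sketch: treat allocation failure as fatal
            data = grown;
        }
        data[size++] = value;
    }

    T& operator[](size_t i) { return data[i]; }

private:
    Buffer(const Buffer&);            // non-copyable: copying would double-free
    Buffer& operator=(const Buffer&);
};
```

In a debug build, every operation above is a handful of instructions with no layers of checked iterators or allocator forwarding to inline away, which is where the unoptimized-build wins come from.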
AnandTech benchmarked the new RTX graphics cards, and concludes:
So where does that leave things? For traditional performance, both RTX cards line up with current NVIDIA offerings, giving a straightforward point-of-reference for gamers. The observed performance delta between the RTX 2080 Founders Edition and GTX 1080 Ti Founders Edition is at a level achievable by the Titan Xp or overclocked custom GTX 1080 Ti’s. Meanwhile, NVIDIA mentioned that the RTX 2080 Ti should be equal to or faster than the Titan V, and while we currently do not have the card on hand to confirm this, the performance difference from when we did review that card is in-line with NVIDIA's statements.
The easier takeaway is that these cards would not be a good buy for GTX 1080 Ti owners, as the RTX 2080 would be a sidegrade and the RTX 2080 Ti would be offering 37% more performance for $1200, a performance difference akin to upgrading to a GTX 1080 Ti from a GTX 1080. For prospective buyers in general, it largely depends on how long the GTX 1080 Ti will be on shelves, because as it stands, the RTX 2080 is around $90 more expensive and less likely to be in stock. Looking to the RTX 2080 Ti, diminishing returns start to kick in, where paying 43% or 50% more gets you 27-28% more performance.
Neither of the two new RTX cards seems to be a particularly smart purchase at this point - the 2080 barely performs any better than a 1080 Ti, and while the 2080 Ti does offer a decent performance improvement over the 1080 Ti, it's also $1200. You might want to wait to see if NVIDIA's raytracing efforts pay off and get adopted in video games, and if said raytracing features don't suck too much performance.
Does anyone remember our articles regarding unscrupulous benchmark behavior back in 2013? At the time we called the industry out on the fact that most vendors were increasing thermal and power limits to boost their scores in common benchmark software. Fast forward to 2018, and it is happening again.
Companies lie. They lie all the time. As with anything related to performance measurements and comparisons: wait for trusted third-party benchmarks from places like AnandTech and GamersNexus. Company-provided figures are anything from unrealistic best-case scenarios at best to downright lies at worst.
It's well-known that you should measure the performance of your code, and not rely only on the opcode's "cycle counts".
But how fast is an IBM PC 5150 compared to a PCjr? Or to a Tandy 1000? Or how fast is the Tandy 1000 HX in fast mode (7.16 MHz) compared to the slow mode (4.77 MHz)? Or how fast is a nop compared to a cwd?
I created a test (perf.asm) that measures the performance of different opcodes and ran it on different Intel 8088 machines. I ran the test multiple times just to make sure the results were stable enough. All interrupts were disabled, except the timer (of course). And on the PCjr the NMI was disabled as well.
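perf.asm itself is 8088 assembly, but the shape of the measurement carries over to any platform: time a large block of the instruction under test, repeat the whole run several times, and keep the best (most stable) figure. A rough, hypothetical C++ sketch of that harness is below (assuming GCC or Clang on x86 for the inline nop; on a modern pipelined CPU the loop overhead is part of what gets measured, so treat the result as the cost of the loop body, not an isolated opcode timing):

```cpp
// Hypothetical modern analogue of the perf.asm methodology, not the
// original code: time a big block of the instruction under test,
// repeat the run several times, and keep the minimum as the stable result.
#include <chrono>
#include <cstdio>

static const long ITERATIONS = 100000000;
static const int RUNS = 5;

static double ns_per_iteration()
{
    std::chrono::steady_clock::time_point start = std::chrono::steady_clock::now();
    for (long i = 0; i < ITERATIONS; ++i)
        asm volatile("nop"); // the "opcode" under test (GCC/Clang inline asm)
    std::chrono::steady_clock::time_point end = std::chrono::steady_clock::now();
    double total_ns = std::chrono::duration<double, std::nano>(end - start).count();
    return total_ns / ITERATIONS;
}

int main()
{
    double best = 1e30;
    for (int run = 0; run < RUNS; ++run)
    {
        double ns = ns_per_iteration();
        std::printf("run %d: %.3f ns per iteration\n", run + 1, ns);
        if (ns < best)
            best = ns;
    }
    std::printf("best (most stable): %.3f ns per iteration\n", best);
    return 0;
}
```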
There's no point in any of these benchmarks, but that doesn't make them any less interesting.
The "Bionic" part in the name of Apple's A11 Bionic chip isn't just marketing speak. It's the most powerful processor ever put in a mobile phone. We've put this chip to the test in both synthetic benchmarks and some real-world speed trials, and it obliterates every Android phone we tested.
As far as SoCs go, Apple is incredibly far ahead of Qualcomm and Samsung. These companies have some serious soul-searching to do.
I can't wait for AnandTech to dive into the A11 Bionic, so we can get some more details than just people comparing GeekBench scores.
Intel's latest 10-core, high-end desktop (HEDT) chip - the Core i9-7900X - costs £900/$1000. That's £500/$500 less than its predecessor, the i7-6950X. In previous years, such cost-cutting would have been regarded as generous. You might, at a stretch, even call it good value. But that was at a time when Intel's monopoly on the CPU market was at its strongest, before a resurgent AMD laid waste to the idea that a chip with more than four cores should be reserved for those with the fattest wallets.
AMD's Ryzen is far from perfect. But when you can buy eight cores that serve even the heaviest of multitaskers and content creators for well under half the price of an Intel HEDT chip, i9 and X299 are a hard sell (except, perhaps, to fussy gamers that demand a no-compromises system).
The question is: Are you willing to pay a premium for the best performing silicon on the market? Or is Ryzen, gaming foibles and all, good enough?
I've said this countless times, but I want to keep bringing this one home: this is what competition does. It lowers prices, improves performance, and makes Intel look like a stumbling fool. And what better day to celebrate the benefits of competition than today?
Cheers, America. Party safe!
So there you have it. As of October 4, Google Now has a clear lead in terms of the sheer volume of queries addressed, and more complete accuracy with its queries than either Siri or Cortana. All three parties will keep investing in this type of technology, but the cold hard facts are that Google is progressing the fastest on all fronts.
Not surprising, really, considering Google's huge information lead. Still, I have yet to find much use for these personal assistants - I essentially only use Google Now to set alarms and do simple Google queries, but even then only the English ones that do not contain complicated names.
With the exception of Apple and Motorola, literally every single OEM we've worked with ships (or has shipped) at least one device that runs this silly CPU optimization. It's possible that older Motorola devices might've done the same thing, but none of the newer devices we have on hand exhibited the behavior. It's a systemic problem that seems to have surfaced over the last two years, and one that extends far beyond Samsung.
Pathetic, but this has been going on in the wider industry for as long as I can remember - graphics chip makers come to mind, for instance. Still, this is clearly scumbag behaviour designed to mislead consumers.
On the other hand, if you buy a phone based on silly artificial benchmark scores, you deserve to be cheated.
"During the 4th Semester of my studies I wrote a small 3d spaceship deathmatch shooter with the D-Programming language. It was created within 3 Months time and allows multiple players to play deathmatch over local area network. All of the code was written with a garbage collector in mind and made wide usage of the D standard library phobos. After the project was finished I noticed how much time is spend every frame for garbage collection, so I decided to create a version of the game
which does not use a GC, to improve performance."
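The project is written in D, but the underlying pattern is language-agnostic: instead of allocating short-lived objects during the frame (which is what feeds the garbage collector), allocate a pool once up front and recycle slots. A rough sketch of that idea, in C++ purely for illustration (hypothetical, not the project's code):

```cpp
// Hypothetical sketch of the "no allocation per frame" pattern: preallocate
// a fixed pool of objects and recycle them, so per-frame cost no longer
// includes allocator (or GC) work.
#include <cstddef>
#include <vector>

struct Projectile
{
    float x, y, z;
    float vx, vy, vz;
    bool alive;
};

class ProjectilePool
{
public:
    explicit ProjectilePool(std::size_t capacity) : items(capacity) {}

    // Returns a recycled slot, or null if the pool is exhausted.
    // Linear scan keeps the sketch simple; a free list would be faster.
    Projectile* spawn()
    {
        for (std::size_t i = 0; i < items.size(); ++i)
            if (!items[i].alive)
            {
                items[i].alive = true;
                return &items[i];
            }
        return nullptr;
    }

    void despawn(Projectile* p) { p->alive = false; }

private:
    std::vector<Projectile> items; // allocated once, outside the frame loop
};
```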
The latest browser benchmarks are in... again - seems like there's a new one every week. This is one of the best "browser battle" articles though. Chrome 13, Firefox 6, IE9, Opera 11.50, and Safari 5.1 are put through 40-something tests on both Windows 7 and Mac OS X Lion. As a PC guy I was pretty impressed with the performance of Safari on OS X, and the reader feature looks awesome too. The author also uncovered a nasty Catalyst bug that makes IE9 render pages improperly and freeze up under heavy loads of tabs. The tables at the end pinpoint the strengths and weaknesses of each browser, which is nicer than a 1-10 or star rating. Good article, and thorough.
Phoronix has conducted some preliminary benchmarks, comparing Debian GNU/Hurd to Debian GNU/Linux. "There was only a handful of tests that could be successfully run under Debian GNU/Hurd and in those results the numbers were generally close, though Debian GNU/Linux was running about 4% faster in some and with the MP3 encoding the Linux OS was nearly 20% faster. Debian GNU/Hurd is an interesting project but for now its support is still in shambles, the hardware support is vastly outdated, and there is also no SMP support at this time. Regardless, it will be interesting to see how Debian GNU/Hurd turns out for the 7.0 Wheezy milestone."
"
Google has released a
research paper that suggests C++ is the best-performing programming language in the market. The internet giant implemented a compact algorithm in four languages - C++, Java, Scala and its own programming language Go - and then benchmarked results to find 'factors of difference'."