If you use an Apple silicon Mac I’m sure you have been impressed by its performance. Whether you’re working with images, audio, video or building software, we’ve enjoyed a new turn of speed since the M1 on day 1. While most attribute this to their Performance cores, as it goes with the name, much is in truth the result of the unsung Efficiency cores, and how they keep background tasks where they should be.
↫ Howard Oakley
While both Intel and AMD are making gains on Apple, there’s simply no denying the reality that Apple’s M series of chips are leading the pack in mobile computing (the picture is different in desktops). There are probably hundreds of reasons why Apple has had this lead for so many years now, but the way macOS distributes background and foreground tasks across the two types of cores in M series chips is an important one.
Still, I wonder how the various other processors that use power and efficiency cores fare in this regard. You’d think they would provide a similar level of benefit, but I wouldn’t be surprised if the way Windows or Linux handles such cores and the distribution of tasks is simply not as optimised or strict as it is in macOS. Apple often vastly overstates the benefits of its “vertical integration”, but I think the tight coupling between macOS and Apple’s own processors is definitely a case where they’re being entirely truthful.

Thom Holwerda,
The author failed to make this point, but I’m glad you did 🙂
Desktop CPUs are too power hungry for portable applications.
One important reason Mx CPUs do well with things like video editing is that Apple builds hardware acceleration for some high-intensity tasks instead of relying on general-purpose CPU cores to run an equivalent software algorithm. Hardware acceleration provides high performance, but hard-coding accelerators into silicon has cons too.
I don’t trust Apple’s claims until they’ve been independently verified by third parties. Too often their marketing is deceptive, and they rely heavily on controlled media, rewarding and punishing reviewers based on the substance of their reviews, which is gross behavior from Apple. IMHO Apple is doing well enough on the merits that they shouldn’t need to do this.
The author makes a few statements that I can’t agree with, for example…
It’s simply not true: CPU cores running a 100% background load are not generally a problem on Intel CPUs. As long as other cores are available, foreground software is still going to be responsive. The bottleneck has less to do with the CPU and more to do with software threading models. You get highly responsive interfaces by using dedicated UI threads; these stay responsive even under load. By contrast, an application that performs task execution on the UI thread will become laggy even though CPU cores are left unused. And the truth is it would be just as laggy running on Apple’s Mx CPUs.
Not to dismiss P/E core tuning altogether, but if users are experiencing any kind of UI latency, I honestly think software design is going to be the main culprit. Disable all the performance cores and well-designed software should still behave responsively on the efficiency cores alone.
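A minimal sketch of the dedicated-UI-thread pattern described above (Python here purely for illustration; the names and sizes are made up): the heavy work runs on a worker thread, and the “UI loop” only polls a result queue, so it stays responsive even while the worker saturates a core.

```python
import queue
import threading

def heavy_task(n: int) -> int:
    # Stand-in for CPU-heavy work (e.g. an export or encode job).
    return sum(i * i for i in range(n))

results: "queue.Queue[int]" = queue.Queue()

def worker() -> None:
    # Runs off the UI thread; the UI never blocks on this computation.
    results.put(heavy_task(1_000_000))

threading.Thread(target=worker, daemon=True).start()

# The "UI loop": poll with a tiny timeout instead of blocking, so a real
# UI could keep repainting and handling input between polls.
while True:
    try:
        value = results.get(timeout=0.01)
        break
    except queue.Empty:
        pass  # the UI stays responsive here

print(value)
```

Run the same `heavy_task` call directly on the UI thread instead, and the interface freezes for the duration, no matter how many cores sit idle.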
Asahi Linux is trying to work with this too. As best I remember, they use uclamp [1] to schedule PipeWire [2][3] only on E-cores. The Linux scheduler is becoming energy-aware too, but keeping the best performance on servers as well as on phones with the same scheduler is, IMO, a big ask. I’d be interested in what kind of scheduling magic Apple does to enforce E-core usage. Of course, controlling the entire OS makes it easy.
Both Intel’s and AMD’s big.LITTLE-style hybrid CPUs could definitely benefit from this too.
[1] https://www.kernel.org/doc/html/latest/scheduler/sched-util-clamp.html
[2] https://asahilinux.org/2024/01/fedora-asahi-new/
[3] https://github.com/AsahiLinux/asahi-audio/commit/ccd5286dcd51501aea1acce352227eefa199a0e6
Serafean,
I don’t think it’s that important that the same scheduler be used between a phone and a server. However, the idea of prioritizing interactive tasks doesn’t seem all that different. The problem for a scheduler is that most software doesn’t declare this information to the OS; most applications just naively dispatch threads, and the scheduler is left to heuristically guess which threads are important. Technically, task priorities are available for software to use, but sometimes administrators need to manually “nice” heavy CPU tasks, such as on servers that have big jobs to run but are still responsible for serving web requests.
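A program can also demote itself instead of waiting for an administrator to “nice” it. A small sketch of the programmatic equivalent, using Python’s `os.nice` wrapper around the POSIX call (the increment of 5 is just an example value):

```python
import os

# os.nice(increment) raises this process's niceness (i.e. lowers its
# priority) and returns the new value. Unprivileged processes can only
# become nicer, never less nice.
before = os.nice(0)   # an increment of 0 just reads the current niceness
after = os.nice(5)    # e.g. a batch job demoting itself before heavy work
print(before, after)  # niceness caps out at 19 on Linux
```

From the shell, the equivalents are `nice -n 5 command` for a new process and `renice` for a running one.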
It’s really imperative to get the priorities right when all the cores are loaded, and things like large compile jobs, Blender simulations, LLMs, etc. can do exactly that.
I actually don’t think foreground tasks necessarily need P-cores, though. E-cores can be perfectly suited to UIs, where low latency is much more important than raw horsepower.
Weirdly (and I only just made the connection), I always thought a computer designed similarly to this would be a game changer. However (and bear in mind I came up with this over a decade ago), the idea was to pair a low-power ARM CPU to run background tasks with a high-end Intel CPU to run applications and foreground tasks. I considered that the architecture difference would make the engineering difficult, but not impossible.
I never considered at the time using the /same/ architecture for this. But thinking back on it, the low-power Atoms or those tiny x86 parallel-processing cores (Phi), plus a plentiful-core i7, would have done exactly the same thing, without the hassle of trying to run code for two different CPU architectures.
Turns out that was the end-game all along, except the P-cores and E-cores are integrated onto the same CPU package. And the performance gains turned out to be mostly achievable.
The123king,
What a horrendous monster that would be…I want one 🙂
A dual-socket approach might make it possible to build one with off-the-shelf CPUs. But if you’re thinking about combining them on the same custom chip, then I imagine it could be quite difficult to get the IP rights to do that, engineering effort aside.
big.LITTLE scheduling has been worked on for ages.
Windows’ scheduler for hybrid architectures, especially with Intel Thread Director, has traditionally been worse than both macOS/iOS and Linux/Android.
In Windows-land there are some added complications compared to Linux/Mac: Windows uses a complex, reactive heuristic system to guess which threads belong on P vs E cores. For x86 hybrid architectures, Linux provides more predictable and fair scheduling.
FWIW, macOS supports declaring semantic intent via its QoS model: the programmer can tell the OS exactly how critical their threads are. This allows the scheduler to make more accurate decisions about which cores to use. Windows, in contrast, tries to infer that after the fact.
Windows can have higher background service usage, which can confuse the scheduler into placing background tasks on P-cores. Wintel also has to maintain backwards compatibility with non-hybrid hardware, which can hinder optimization for Alder Lake, Raptor Lake, et al. Incidentally, Windows on ARM has much better overall scheduling performance and power efficiency, tracking macOS on Qualcomm Snapdragon X Elite SKUs.
Overall, Linux exhibits the lowest latency and the highest multi-core efficiency, outperforming both Windows and macOS.