On the CPU side, doubling up on the performance cores is an obvious way to increase performance – the competition does the same with some of their designs. Where Apple differs is that it scaled not only the CPU cores but everything surrounding them. It’s not just 4 additional performance cores; it’s a whole new performance cluster with its own L2. On the memory side, Apple has scaled its memory subsystem to never-before-seen dimensions, which allows the M1 Pro & Max to achieve performance figures that simply weren’t considered possible in a laptop chip. These chips don’t just outclass any competing laptop design, they also compete against the best desktop systems out there – you’d have to bring out server-class hardware to get ahead of the M1 Max. It’s just generally absurd.
On the GPU side of things, Apple’s gains are also straightforward. The M1 Pro is essentially 2x the M1, and the M1 Max is 4x the M1 in terms of performance. Games are still in a very weird place on macOS and in its ecosystem – maybe it’s a chicken-and-egg situation, maybe gaming is still something of a niche, and it will take a long time before games make use of the GPU performance the new chips can provide. What’s clearer is that the new GPU does allow immense leaps in performance for content creation and productivity workloads which rely on GPU acceleration.
These are excellent processors and GPUs, especially when taking their power consumption into account. Sure, a lot of it is optimised only for Apple’s approved frameworks and applications, but if you’re deep into the Apple ecosystem, these are simply no-brainer machines for any creator.
I’m in a very niche segment here, really aiming to get one of these processors. I do a ton of HPC, and I have backends for our code using CUDA, OpenCL and Metal. Our GPU code is at least two orders of magnitude faster than a top-class Xeon processor. The investment in Metal came precisely when Apple started to announce plans to leave OpenCL behind. At the time the transition to Apple Silicon wasn’t fully clear, but as Mac systems are important in my work, we did the work to support Metal on AMD GPUs, and the results were quite good. An AMD Pro W6800 using accelerated computing with Metal is neck and neck in performance with an Nvidia RTX A6000 with CUDA. The RTX A6000 edges out the W6800 in that it has 48 GB of RAM vs 32 GB on the W6800. In my work, every GB on the GPU counts dramatically. When the M1 was announced, we were quite excited as it became apparent those systems would eventually move to high-end versions, increasing the memory available to the GPU.
Early benchmarks place the M1 Max in the ballpark of the RTX 3070–3080, which I’d take any day as long as I have more memory, and that is where the 64 GB version comes into play. I’d take RTX 20XX-class performance just to have more memory. So I already ordered a system last Monday, seconds after the Apple Store started taking orders. If the M1 Max is truly in the ballpark of the RTX 3070–3080… that is a huge accomplishment for Apple. And I can’t wait to see whatever processor is in the pipeline for the Mac Pro. Beyond video specialists who will benefit out of the box, I can see the scientific community losing their minds over the possibilities this new generation of CPU+GPU may bring. If I were Nvidia, I’d be quite nervous that Apple may get into their GPU-based HPC market, where Nvidia has pretty much total dominance but uses insanely expensive and complex technology to interface their GPUs and provide more memory, while still depending on a high-end CPU to sync all the GPU work and memory transfers. Exciting times.
I don’t think NVIDIA is losing any sleep over Apple in regards to HPC.
It’s just not a market Apple is interested in at all. Plus, CUDA is too entrenched in HPC and the datacenter.
The amount of power Apple is packing into their SoCs is disruptive, IMO, but mostly for creative applications. For video editors working with ProRes, I read the M1 Max can do several concurrent 8K timelines, which is crazy in a mobile form factor. Basically you have the editing right next to where the content is created.
I’m interested in seeing where Apple is going with the Mac Pro. They can still double the die area in the current 5nm process, and they can put 2 of those huge dies on the same package using the new TSMC interposer. So you could have a desktop SoC with 40 CPU cores, 64 GPU cores, and 256 GB of on-package memory. Which would be nuts.
I’m VERY interested in whatever happens to the Mac Pro. Not because I’m going to buy one, but purely from an implementation standpoint. I’d like to see it maybe built like the older backplane-based machines of the ’60s and ’70s, with slide-in modules (kinda like “blades”) with fully upgradable CPU/RAM modules, as well as, of course, a plethora of PCIe slots. I think standard DIMM modules are dead for Macs, but it would be very nice if they had a standard chassis where all the actual heavy-lifting modules could be swapped in and out on demand and upgraded when newer hardware is released.
That was one of the greatest features of the QBUS PDP-11s. It was incredibly easy to swap in a newer, more powerful CPU, swap in faster and more RAM, and add peripheral controllers for things like disk, tape and serial/parallel connections. This kind of modularity is long gone in the computing world, but it really made it easy to upgrade an existing system piecemeal, or roll your own system with only the components you required.
If Apple came out with a similar modular chassis and backplane setup, with every component modular and replaceable, I think they’d have a very long-lived system architecture that thousands of content creators would flock to. Even better if they made it a (semi) open standard, allowing third parties to create hardware modules for it as well.
Hahaha, that will NEVER happen.
A man can dream 🙂
Most people don’t need more processing or GPU power. You may as well offload all of that to an open standard interface running on an external box, distributed network, or remote provider such as another business with spare capacity or a “cloud” provider.
Of course they won’t do that, because then, like washing machines, your laptop/desktop would last 10-20 years or more.
I kind of see it the other way around; with the M1 Pro/Max chips these laptops should (as long as the other components hold up) last at least 10 years or more and still be viable. We’re approaching the limits of what is possible with silicon, and that is evident in the fact that the new M1 chips are effectively just doubled/quadrupled original M1 chips with a wider memory bus. They are throwing more cores at an existing design on the edge of the silicon cliff, and there isn’t much more innovation left to squeeze out of the design.
Until we’ve found a substrate that allows for another huge leap in performance, Apple will continue to throw more cores and memory at the problem so we end up with a Mac Pro sporting 8 or 16 M1 chips worth of silicon, with 128 or more GPU cores and 256+ GB of RAM. That will end up being your “external box” for truly massive computing requirements, and the laptops and desktops with the Pro/Max line will handle the everyday tasks. The fact that they can already handle seven streams of 8K video simultaneously, outperforming the top spec $12k Mac Pro [1], is simply mind boggling.
Unless Intel and AMD can ramp up their silicon tech in the next couple of years, they will be left behind for the foreseeable future. I really hope they do step up, because Apple shouldn’t be the only choice when it comes to beyond HPC levels of performance in consumer products.
[1] https://appleinsider.com/articles/21/10/24/apple-execs-excited-about-m1-max-macbook-pro-video-editing-capabilities
Different users have different needs. This is why you need to look at things from the point of view of what problem you are trying to solve. This also explains why I’m taking a position on API support, extensibility, and compatibility.
I don’t play games, nor do I need to process demanding assets like mathematical models or art assets – and neither do most businesses. Now, I appreciate some people do both, from individuals through to massive corporate efforts, but for most regular people this is nothing more than dick measuring. I’ve noticed a few people in this thread pile on with embracing spec bumps, but they forget they are seeing it from the point of view of their professional needs. We’re talking probably 1% of the population.
Apart from browsing web sites, watching videos and occasionally processing photos and videos, I’m not doing anything with my computer that a Windows 95 era computer couldn’t do. Almost any modern dual-core CPU is good enough. I have a camera which will do 4K, but this is wasted, as for final production I wouldn’t target higher than 720p (1080p at a push, but it’s not necessary). For prints, mega resolution is also wasted, though I appreciate there are artists who need 100 MPx.
What’s really happening is a form of capitalistic socialism where a small number of people who want more and more are relying on spreading the cost across the rest of the market.
No, it’s called “progress”. If everyone had the view that current technology was “good enough”, computers would still be built from valves and use Williams Tubes for RAM.
Yeah, I used to have that opinion as well, but after more reflection, I think there is still more innovation to be had even with the same CPU core. The M1s are systems-on-a-chip, which not only include things like the GPU and memory on die, but also allow for more experimentation with how those parts interact with each other. Of course, there is always room for refinement of the CPU core itself, but that takes longer. Intel was doing it well with the whole tick-tock cycle, but I’m not sure what the development lifecycles are going to be for Mac desktop processors. They clearly have a hit with the M1 that everyone loves, but in terms of profit, they may be better off increasing iPhone sales instead. I remain skeptical of their long-term dedication to traditional computer forms (iMac, Mac mini, the laptops, the Pro).
Apple had their moment to drive change in the industry and shift things away from Windows and towards open standards. There was a lot of anticipation and goodwill there after the shift to Intel, but Jobs blew it. Tim Cook the bean counter is all mouth and no trousers.
Measured by sales Apple today are no more than a media conglomerate who sell hardware as a dongle. Will they or won’t they do an electric car? I really couldn’t care.
We have to move towards modular systems and repairability and open standards both for the environment and to de-carbonise. That is a hard target and every single day which passes makes it harder to change.
The fact that I have “flagship” 4G phones (with replaceable batteries) which have the hardware – and, depending on which way the wind is blowing, the firmware – to do VoLTE, but for UK telecom company and Samsung greed, is really irking me.
Bill Shooter of Bul,
For better or worse, the M1 is being used to push apple’s vertical integration. Other computer builders can’t have it, and you as a consumer can’t have it either unless you buy into the apple ecosystem, which many people aren’t interested in. Even if I were an apple user, I would be disappointed that this ecosystem is slowly being locked down by apple to benefit themselves.
One of the catch-22s in computing is that we’re so dependent on corporations to build our computers, yet their profit incentives lead them to add restrictions that make their platforms worse. This is why I prefer linux these days. It’s not free from controversy by any means and there are problems, but I do think the project’s incentives are much less misaligned with the community. I have more faith in Linux distros than in microsoft and apple platforms to not seek to restrict what owners can do on their computers.
Vertical integration can be great and lead to some really cool things, as the M1 has shown, but of course it can lead to silly situations where competition is really locked out. I think that’s where we are right now. The M1 is nice, but how can OEMs work with similar SoCs? There should be a non-Mac standard for how to connect an SoC like the M1 to otherwise standard PCs. There are x86-based SoCs out there as well, built for the gaming consoles, but to get them to work with windows, they had to have a more conventional PC-ish design.
Bill Shooter of Bul,
I don’t see vertical integration being that important. I mean, the same M1 CPUs would sell well in mass quantities to other ARM device makers and nothing about it needs to be tethered to apple products. To me that’s a business decision rather than a technical one.
I agree that would be nice, but it’s doubtful that apple would be willing to share the M1 with competing device makers. For better or worse, I don’t know that any other ARM device makers have the resources to attempt to compete with apple. Apple likely has priority at TSMC, and it’s even possible they’ve negotiated some exclusive terms for their latest CPU manufacturing processes.
I don’t think SoC is all that important actually. I suspect even apple is going to end up changing course and separating the GPU and CPU in future designs in order to make competitive GPUs.
The question is the API.
The GPU is really powerful on paper. And I am sure they will integrate well with Adobe products.
On the other hand, OpenGL and OpenCL were officially deprecated a few years ago: https://appleinsider.com/articles/18/06/04/opengl-opencl-deprecated-in-favor-of-metal-2-in-macos-1014-mojave
And obviously CUDA and DirectX are not even in the picture. And Vulkan only comes via a third party: https://www.phoronix.com/scan.php?page=news_item&px=Apple-Silicon-Vulkan-MoltenVK
And Proton/Wine will only work through emulation: https://www.reddit.com/r/macgaming/comments/lyv35d/future_of_steam_on_the_m1/
That leaves out many opportunities for open programming on that platform. Yes, “Metal” is available, but will people really spend the additional effort to optimize on that platform? Or will they just port the software to bare minimum essentials?
I am once again on the brink of ordering an Apple product, and once again I’ve decided it is not worth the effort. No proper Linux, no more Windows, and no public APIs mean I would be pretty much locked in to whatever Apple decides to give us.
The only hope is that this would be the proper kick for other platforms to actually clean up their acts.
As someone who went down the rabbit hole of Metal for compute kernels: if you are versed in CUDA and OpenCL, the differences in programming are very minimal for regular kernels. Programmatically, all three (CUDA, OpenCL and Metal) are just variants on “how to prepare data for the GPU, copy data back and forth to the GPU, and run a kernel”. The biggest innovation of Metal on Apple Silicon is the opportunity to eliminate the data transfers completely. It’s still to be confirmed whether a Metal compute kernel can directly access memory coming, for example, from a NumPy array – that would be awesome. There are a couple of examples of how you can interface Swift+Metal from Python, and from there you have a ton of possibilities. The big missing pieces in Metal are more advanced libraries – things not even AMD offers for OpenCL, such as the FFT support where CUDA really shines.

There are some important considerations, but if you have worked with OpenCL and CUDA you will know the devil is in the details. For example, some OpenCL driver implementations can’t address more than 32 bits of memory, which is silly when cards have more than that these days. (Note this 32-bit limitation is only for OpenCL kernels, not for other GPU operations.) I have that problem with OpenCL on Macs, which is quite annoying, but that limitation is not present with Metal: you can access all 32 GB of RAM on a GPU fully dedicated to computing in an external enclosure that isn’t displaying anything. Metal does have a weird limitation that you get up to 32 input buffers, each no larger than 2.5 GB (go ask Steve Jobs at his grave why that odd limit exists); still, 32 x 2.5 GB = 80 GB of buffer memory for GPU calculations, which is more than the M1 Max has in total memory. Metal (like CUDA back in its early days) has had important upgrades over its first iterations, so I wouldn’t be surprised if these limitations were relaxed as newer generations of Apple Silicon evolve.

One limitation so far is that Apple hasn’t invested as much as Nvidia has with CUDA in providing more advanced compute examples for Metal. But I can see the open-source community jumping on board now that there is real motivation to put in the effort. I can see a pyMetal coming a mile away, sitting next to pyOpenCL and pyCUDA, and from there it is going to be serious fun.
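To give an idea of what that host-side plumbing looks like, here is a minimal sketch in Swift of dispatching a Metal compute kernel. It is purely illustrative – the add_arrays kernel name and the buffer sizes are made up for the example, not taken from our actual code:

```swift
import Metal

// Minimal Metal compute dispatch: two input buffers, one (assumed)
// "add_arrays" kernel from the app's default .metallib, one output buffer.
let device = MTLCreateSystemDefaultDevice()!
let queue = device.makeCommandQueue()!
let library = device.makeDefaultLibrary()!
let pipeline = try! device.makeComputePipelineState(
    function: library.makeFunction(name: "add_arrays")!)

let count = 1 << 20
var a = [Float](repeating: 1.0, count: count)
var b = [Float](repeating: 2.0, count: count)
let byteCount = count * MemoryLayout<Float>.stride

// .storageModeShared: on Apple Silicon the CPU and GPU see the same memory,
// so there are no explicit host<->device copies.
let bufA = device.makeBuffer(bytes: &a, length: byteCount, options: .storageModeShared)!
let bufB = device.makeBuffer(bytes: &b, length: byteCount, options: .storageModeShared)!
let bufOut = device.makeBuffer(length: byteCount, options: .storageModeShared)!

let cmd = queue.makeCommandBuffer()!
let enc = cmd.makeComputeCommandEncoder()!
enc.setComputePipelineState(pipeline)
enc.setBuffer(bufA, offset: 0, index: 0)
enc.setBuffer(bufB, offset: 0, index: 1)
enc.setBuffer(bufOut, offset: 0, index: 2)
enc.dispatchThreads(MTLSize(width: count, height: 1, depth: 1),
                    threadsPerThreadgroup: MTLSize(width: pipeline.threadExecutionWidth,
                                                   height: 1, depth: 1))
enc.endEncoding()
cmd.commit()
cmd.waitUntilCompleted()

let result = bufOut.contents().bindMemory(to: Float.self, capacity: count)
print(result[0])  // 3.0, assuming add_arrays computes out[i] = a[i] + b[i]
```

The shape is the same as a CUDA or OpenCL host program – create a queue, create buffers, bind arguments, dispatch, synchronize – which is why moving between the three backends is mostly a matter of plumbing.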
SamAskani,
I’ve done both CUDA and OpenCL. While they’re variations on the same ideas, they’re not really the same, and there’s a lot of nuance. Unless you stick to high-level abstraction libraries (or write your own), I’d agree with sukru that it’s not necessarily a trivial port, especially if you already have a lot of code invested in one platform. Unlike portable C code, you’re now looking at writing, debugging, optimizing, and supporting bug reports for multiple targets. It’s not like a normal software port that can use the same code everywhere. And because these targets support different sets of features, it may not end up being a one-to-one code mapping, meaning the code may end up being vastly different. I guess you can use the greatest common denominator, but then you may be missing out on some of the interesting GPGPU features.
While the M1 max can offer good specs for a laptop, there’s no upgrade path today if you want to transfer your GPGPU work into a desktop with more cores and more powerful GPUs. In the past apple users could plug in eGPUs and get more performance that way, but apple does not support this with any M1 macs. So for the moment at least it seems like the M1 max is the highest you can go. With any other platform there are upgrade paths with higher performing GPUs for consumers and even enterprise.
I’m still skeptical because the benchmarks in the article still had a lot of gaps. Beyond the synthetics, only a couple of real-world cross-platform applications were benchmarked, and they didn’t perform anywhere near as well as the specs or synthetic benchmarks suggested. We don’t know whether these are representative of real-world performance in general, or whether they’re just anomalies. Time will tell as we get more numbers in.
Still, I agree that we need more competition. It pushes everyone to try harder. I heard that when the RTX 3080 Ti was nearing market, nvidia was close to releasing it with 20GB of RAM – they had working prototypes and everything, anticipating a breakthrough by AMD that didn’t happen, so they lowered the specs.
https://www.tweaktown.com/news/81517/nvidia-geforce-rtx-3080-ti-with-20gb-is-real-100mh-mining-power/index.html
This is why we need competition.
“While the M1 max can offer good specs for a laptop, there’s no upgrade path today if you want to transfer your GPGPU work into a desktop with more cores and more powerful GPUs. In the past apple users could plug in eGPUs and get more performance that way, but apple does not support this with any M1 macs. So for the moment at least it seems like the M1 max is the highest you can go. With any other platform there are upgrade paths with higher performing GPUs for consumers and even enterprise.”
Apple is JUST STARTING this transition to Apple silicon. While there may not be eGPU support now, as time goes on I’m pretty sure there will be.
As for the M1 Max being the highest you can go – you are trying to make us laugh, right?
They haven’t introduced the rest of their medium tier computers yet with M1 chips. There are still all the iMacs and the upper end Mac Minis. These are all still to come. Maybe an iMac Pro. Then the rumored small Mac Pro and the large Mac Pro.
Only the lowest hanging fruit has been moved to M series chips. Within the next 18 months we will see the rest of the computers getting their chips and then they will start dipping (sic) into things like eGPUs and so forth later.
Right now the top end is 10 CPU cores, up to 32 GPU cores and up to 64 GB of RAM. This is still the LOW-hanging fruit.
I expect the 30″ Macs (that’s what I’m predicting) will have this as the low end, and probably go up to 20 CPU cores with 64 GPU cores and 128 GB of RAM, with a Pro version with 40 CPU cores, 128 GPU cores and 256 GB of RAM or more.
Then there is the Mac Pro mini and the Mac Pro Maxi and maybe a Mac Pro Extreme! That could have 50 CPU cores or more, maybe 256 GPU cores and 8 TB of RAM … OR MORE!
I expect Apple to go full King Kong with the Mac Pro Extreme (I don’t know what they are going to call it), but I think they are going to really put a stamp down on what they can do with their most powerful computer. It might cost $40,000 and only be used by gaming companies with massive video files/graphics to render. But I think it will make Intel look like their best computer is a Game Boy in comparison.
Or do they hold back? It’s going to be interesting – I really think it will be. As for the limitations you think you’ll be stuck with on Apple, I think they will be coming out with something where it will take years before any eGPU can keep up with the Mac Pro. It will have more RAM and power than you will know what to do with for a while.
And games? Apple just might want all gamers to convert to Apple computers.
Okay. So this is a fantasy. But it’s based on something that is possible – right now – with what Apple can produce. The question is whether they will want to make a huge statement or not.
Alfman,
I did not know that nVidia had such a prototype. However, they are known to artificially cripple their hardware, sometimes in drivers (why won’t they run under virtualization at all?). And they managed to sell more GPUs to miners by reducing the profitability, but not eliminating it.
Yet, the M1 Pro/Max is still good news. Even if I won’t buy the hardware today, one of these things becomes likely:
1) As Sabon hopes, there will be a lot of demand for the hardware, compelling Apple to open up the APIs to the public
2) Other manufacturers will finally produce better competing products, which are already open
Of course, this could go backwards and everyone could start producing proprietary systems.
sukru,
To me that seems too contrary to apple’s DNA. It would probably require a change in leadership and I really don’t think they’re interested in open software or hardware standards. I mean they had a good opportunity with vulkan and they declined. So I’d be surprised for this to happen.
I do expect competition to improve, which is good news.
The growing influence of proprietary platforms is something I worry about. The importance of openness can fall on deaf ears in the broader community. I think my kids’ generation are going to grow up in a world where proprietary, restricted platforms are the norm.
Sabon,
Perhaps, but where does this leave M1 max users who cannot plug into higher power GPUs today? Many won’t need to, but if you do it creates a bit of a dilemma. Future options are completely unknown.
Most of your post is about speculation about the future. But there’s only limited value in speculating. Back with the original m1 there was speculation that the imac was going to offer much more ram and more cores, yet it didn’t. Of course for a laptop the specs are pretty darn good regardless, but if you need more than a laptop, suitable for desktop or even enterprise, we don’t know when apple will be able to deliver something better.
Nobody knows. Apple could break ahead, or maybe the idea of scaling everything in a SoC will run its course. The entire industry is always changing.
All thunderbolt eGPUs are already sub-par today because thunderbolt is a limiting factor.
“Right now” based on what? We can only judge a race up to this point in time, apple isn’t the only one moving forwards. It’s good to have competition.
I never said there is a one-to-one replacement between the three backends, but we have managed to use macros to work out common kernel code that is the same for OpenMP, CUDA, OpenCL and Metal, so it is feasible for certain problems that can be solved roughly equally well by all backends. For us, the critical part was to maintain the same kernels no matter what backend we use. The details of how to run on each backend have their own #ifdef path, but it is completely feasible to have a project that remains agnostic. Our project produces binaries for each of the backends available on the platform (OpenMP, CUDA and OpenCL for Windows and Linux; OpenCL and Metal for macOS). This is only feasible for kernels that use operations such as integrals over large domains or FDTD stencils. If you need more advanced API-based functions such as FFT, then you are stuck with Nvidia for GPUs, and FFTW or Intel MKL-FFT for regular x64.
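Just to make the idea concrete, here is a toy sketch (in Swift with conditional compilation, rather than the C preprocessor #ifdefs we actually use): the numerical core is written once, and only the backend plumbing is selected at build time, e.g. with -D METAL_BACKEND.

```swift
import Foundation
#if METAL_BACKEND
import Metal
#endif

// The kernel body, written once: y[i] = a * x[i] + y[i].
@inline(__always)
func saxpyElement(_ a: Float, _ x: Float, _ y: Float) -> Float {
    a * x + y
}

// Backend plumbing selected at compile time (swiftc -D METAL_BACKEND ...).
func saxpy(_ a: Float, _ x: [Float], _ y: [Float]) -> [Float] {
#if METAL_BACKEND
    // Here you would wrap the same expression in a Metal compute kernel
    // and dispatch it (see the earlier Metal sketch); omitted for brevity.
    fatalError("Metal path not shown in this sketch")
#else
    // Plain CPU fallback, standing in for the OpenMP path.
    return zip(x, y).map { saxpyElement(a, $0.0, $0.1) }
#endif
}

print(saxpy(2, [1, 2, 3], [10, 10, 10]))  // [12.0, 14.0, 16.0]
```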
SamAskani,
The problem is that some features just don’t work the same way at either the source level or the API level. It would be like writing a win32 client, a gtkmm client, and a javascript client. You can get them to do the same things in principle, but at the same time the code is different because the APIs and languages are different.
In a similar way with GPGPU you’ve got cuda using C++, opencl using C, metal using objective-c. I’m not claiming one is better or worse than another since that comes down to personal preference, but if you’re a mobile warrior and have been coding for M1 metal API on the road and then come home and want to load your model on a more powerful desktop GPU, you’ll have some porting work to do using ifdefs, wrappers, and artificial constraints on what features can be used. Once you spent time optimizing your code for one API, it may not necessarily be optimal for another. So you’ve got more work to see how things can be optimized and then you may have to debug the changes to make sure you get the same results under both APIs, etc.
My point is that this is more effort and less ideal than just using the same language and API across your desktop and laptop, such that you just rebuild or even reuse the same binaries unmodified. I don’t expect my laptop to crunch numbers faster than my bigger machines at home, but I’d strongly prefer to use the same API wherever I am.
I’m glad you found a way that works for you, but I still think for some people porting between APIs is a con.
The easiest way to explain why Apple’s computers (with M1, M1Pro & M1Max) are so fast is:
On most computers, every part of the computer (CPU, GPU, etc.) is literally a separate piece on the motherboard, with wires connecting them all together and to the RAM in the computer. Anytime the CPU decides that the GPU needs to be used, everything about that file has to move from normal RAM to VRAM (video RAM), and the same is true for machine learning and everything else. Everything has its own memory, and as things get moved around, that takes a lot of power, which is why computers, for the most part, get so hot. The less efficient the computer, the hotter it gets.
Meanwhile, with the new Apple computers, once something is placed in memory, it never moves while it is being used. It stays in what is called “Unified Memory”, which just means you don’t need to move files from one type of memory to another. Since the main thing that creates heat in a computer no longer happens, the computer runs a lot cooler. It also makes things much faster, since you don’t have to wait for part or all of a file, especially gigantic files, to finish moving. The computer just tells the program the address of where the file is, and it goes there and uses it.
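For anyone who wants to see what that looks like in code, here is a tiny, hypothetical Metal snippet in Swift showing the idea: one shared buffer that both the CPU and the GPU use, with no separate VRAM copy to manage.

```swift
import Metal

// Hypothetical illustration of "unified memory" on Apple Silicon:
// a single .storageModeShared MTLBuffer is visible to both CPU and GPU,
// so there is no separate VRAM copy to create or keep in sync.
let device = MTLCreateSystemDefaultDevice()!
let count = 1024
let buffer = device.makeBuffer(length: count * MemoryLayout<Float>.stride,
                               options: .storageModeShared)!

// The CPU writes straight into the buffer...
let ptr = buffer.contents().bindMemory(to: Float.self, capacity: count)
for i in 0..<count { ptr[i] = Float(i) }

// ...and a compute encoder can bind the very same buffer for the GPU
// (dispatch omitted – see the earlier sketch in this thread). There is no
// host-to-device transfer step at all. On a discrete GPU you would instead
// copy this data into .storageModePrivate memory (VRAM) before using it.
```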
Another way of putting it would be like this.
Let’s say you have 7 brothers and sisters. It doesn’t matter how many are which type. Just remember that they all do different things.
Now, if you need all 7 siblings to do something, the non-Apple way would be to grab the boxes that the stuff is in – let’s say there are 80 boxes, which is enough to make having to physically do this seem very horrible.
As you physically move the 80 boxes from one room to another you get very tired from exerting a LOT of energy which makes you all hot and sweaty.
Wouldn’t it just be a LOT smarter to leave the boxes in one place and have all your brothers and sisters come and go through the boxes and do whatever they need to do without having to move any of them? They can move stuff around inside the boxes, they just wouldn’t move the boxes as a whole unless they were moving them to a USB drive or something.
Nobody but nobody would get hot and sweaty in a room with plenty of space and fans blowing at the perfect speed to keep you nice and cool and happy and non-sweaty since you aren’t moving a whole lot of big boxes from room to room to room.
You are kind-of correct and kind-of wrong at the same time.
PCs have had something called the Graphics Aperture since the days of AGP, and completely VRAM-less GPUs have existed in PCs for decades. The problem is that PCI-E imposes rigid bandwidth restrictions, so a GPU reading everything from RAM causes performance issues. Which is the reason performance-oriented PCI-E GPUs need to have big caches in the form of VRAM and then also do some management between the Graphics Aperture and VRAM. With an SoC, on the other hand, you can have access lines to the memory as beefy as you want for the GPU (and the CPU for that matter – let’s not forget that the DDR specs can also be a bottleneck for CPUs).
So, PC laptops have to move to an SoC architecture, preferably with built-in RAM. However, where does this leave Nvidia? Will they leave the laptop segment? Also, built-in RAM means giving up any concept of RAM upgradeability.
Generally, the PC’s biggest strength so far (that the main components such as the CPU, GPU and RAM were made by different manufacturers, encouraging competition) has turned into its biggest drawback. I expect Intel and AMD to start moving everything into their chips, even RAM, if they want to compete.
Sabon,
You’re right about the M1 having unified or shared memory, but that doesn’t necessarily translate to higher performance. Shared memory can actually slow down both the CPU and GPU because they’re competing over a shared resource. Also, the fact that CPU and GPU memory can be shared doesn’t automatically imply there’s a benefit to doing so. This is why high performance GPUs tend to use dedicated memory.
Many applications tend to send relatively small jobs to the GPU (in terms of bandwidth) and then the GPU uses a whole lot of bandwidth internally to do its job. Consider a typical game engine. A GPU is typically preloaded with many gigabytes of textures, models, and scene information. Then the CPU just tells the GPU where entities are and sends the GPU on its way to work off its own RAM. Or consider the GPU being used for video effects one frame at a time. The GPU can run very intensive convolutions over the data, where data may have to be read or written repeatedly to process effects, but the frame only has to be sent once from the host, which is a net reduction in host bandwidth.
Incidentally this is the reason that PCIe4 over PCIe3 is often so marginal, and sometimes even using half the PCI lanes is still ok.
“PCIe 4.0 vs PCIe 3.0 RTX 3070 + Ryzen 5 5600X | Test in 8 Games”
http://www.youtube.com/watch?v=i622qEj9-Lo
Modern discrete GPUs can use shared memory today, it’s easy to do, but if there’s any chance the data will need to be accessed over and over, it’s usually better to copy it into dedicated GPU memory, where the RAM is much faster and there’s no contention for shared host resources. If the GPU has to access large textures over and over again, that’s no problem at all using dedicated memory, but it will eat away at the CPU’s bandwidth with shared memory.
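To make that concrete, here is a rough Metal-flavoured sketch in Swift (hypothetical sizes and names) of the “copy it once into dedicated GPU memory” pattern: a CPU-visible staging buffer is blitted into a private buffer, and from then on the GPU reuses its own copy without touching host memory again.

```swift
import Metal

// Hypothetical sketch of the "copy once into dedicated GPU memory" pattern.
// On a machine with a discrete GPU, a .storageModePrivate buffer lives in
// VRAM; we stage the data in a CPU-visible buffer and blit it across once,
// after which the GPU reuses the private copy without host traffic.
let device = MTLCreateSystemDefaultDevice()!
let queue = device.makeCommandQueue()!

var assetData = [UInt8](repeating: 0, count: 16 * 1024 * 1024)  // stand-in asset
let staging = device.makeBuffer(bytes: &assetData,
                                length: assetData.count,
                                options: .storageModeShared)!
let gpuOnly = device.makeBuffer(length: assetData.count,
                                options: .storageModePrivate)!

let cmd = queue.makeCommandBuffer()!
let blit = cmd.makeBlitCommandEncoder()!
blit.copy(from: staging, sourceOffset: 0,
          to: gpuOnly, destinationOffset: 0,
          size: assetData.count)
blit.endEncoding()
cmd.commit()
cmd.waitUntilCompleted()
// From here on, render/compute passes read `gpuOnly` repeatedly without
// consuming PCIe or host memory bandwidth – the trade-off discussed above.
```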
The worst case scenario for the PCI bus would be when the GPU is being loaded; however, PCIe is unlikely to be the limiting factor compared to things like the SSD. The M1 Max’s SSD is said to run at 7.4 GB/s…
https://macperformanceguide.com/blog/2021/20211022_1830-Apple-MacBookProM1X-SSD.html
…assuming it’s accurate, it’s still only a tiny fraction of the bandwidth the PCIe4 standard is capable of at 32GB/s for a PCIe4 x16 GPU.
http://www.trentonsystems.com/blog/pcie-gen4-vs-gen3-slots-speeds
So IMHO dedicated memory will continue to play an important role for high performance GPUs.
@Alfman
My eGPU sits in a box in a cupboard 99.9% of the time. However, people would be surprised what can be run with a 2.5GHz dual core and an eGPU over a single-lane PCI bus (via an ExpressCard slot). Depending on which GPU card you get, a solid 60Hz or higher can be achieved. It’s also good enough to use as an OpenCL co-processor, etc., for running DaVinci Resolve on an old laptop.
To some degree I think people are getting too drawn in by raw tech specs. After years of brainwashing by marketers and a propensity for some to measure dick length, it’s no surprise. OpenGL doesn’t care whether you have a unified memory architecture (board level like SGI or component level like Apple) or an external GPU with its own memory. Metal was a dick move by Apple. Sony patenting back-end transparency, so you could invisibly scale from local to remote distributed GPU processing, was also a dick move. Aside from Microsoft’s various attempts to monopolise the API space, Microsoft keeping to itself the OS features it uses to run its cloud platforms – features which allow a user to scale from a single host to multiple and distributed hosts – is another dick move.
Once you have the display, processing power, keyboard and a couple of ports locally you have everything you need to keep going for industrial level timescales i.e. 20 years plus. Everything else can be shunted off to an external module either a slot in a dock or via a cable to a local box or a remote system.
All these companies have pretty much monopolised the OS/hardware/service space. Annually this is billions flooding out of Europe and other countries to the US. It has crippled local business and is in some respects part of why European IT has pretty much imploded, hence the EU’s Horizon programme… and why Russia wants to develop an “off” switch for the internet, and why China is insisting on domestic production. Americans are just too greedy.
Is there a source on the back end transparency patent thing?
Most patents relating to graphics were issued in the ’90s and have expired by now, so I would be surprised if the patent is still active. For example, the last patent for S3 Texture Compression (an officially optional but de facto mandatory extension of OpenGL) expired in 2018.
I mentioned the Sony patent thing a few weeks ago and said it was ridiculous then. I read an article which mentioned the patent being granted a couple of months ago. Apparently it’s a feature in the PS5 which enables local processing or offloading to the cloud. Like I said… ridiculous.
Are there any benchmarks for the GPU? I am very skeptical of RTX3070~RTX3080 performance levels.
I mean, let’s see both the CPU and the GPU run at full speed in an ARM-native game, and see how the cooling system copes. The M1 Max is an impressive chip, but an RTX3080 and even an RTX3070 has TDP on its side. Some laptops even have dedicated cooling just for the GPU.
You have it backwards: the RTX has TDP against it, since it adds to a heat capacity of the device that is already taxed by the CPU’s power dissipation.
It doesn’t if it’s in a proper gaming laptop with dual cooling (separate set of heatpipes and fan for CPU and GPU respectively).
Physics doesn’t work that way. Heat pipes are not magical multipliers of heat capacity. You have to increase the system’s volume to accommodate the extra heat-pipe dissipation surface.
That’s why those gaming laptops tend to be bigger.
Then there’s the whole issue that more power consumption is bad when it comes to, you know, battery life.
Yes, a thick-and-heavy gaming laptop with dual set of heatpipes and dual fans and crappy battery life, aka what you really need to properly accommodate a mobile RTX3080, will smoke the M1 Max.
kurkosdr,
That’s the thing, which is better depends on one’s priorities: efficiency vs performance. On the CPU side, the M1 exhibits both high performance and efficiency, so this dilemma doesn’t really come up. x86 has some disadvantages but it’s only now that there’s serious competition that architectural advantages may start coming into play. The lead may be determined by only a few percentage points.
On the GPU side, early benchmarks so far suggest that the M1 Max’s GPU may not be as competitive on performance in real applications. If this is indeed a trend and not an isolated benchmark anomaly, it will mean that consumers evaluating the M1 Max for GPU applications will need to decide which is more important: GPU efficiency versus performance. If a mobile 3080 has a substantial performance lead over the M1 Max, then some users will overlook its higher power demands (and vice versa).
Edit: In reality for many people all these performance considerations are secondary factors. Being able to run one’s platform of choice is probably the most important thing for most users, including me. Personally I demand good linux support.
kurkosdr,
We don’t have comprehensive benchmarks yet to measure the M1 against competitors. There’s been a lot of hype made in the absence of data. In the coming weeks though many more people will finally be able to benchmark it and we’ll be better positioned to make conclusions based on data and not just opinions.
I’ve caught comments on Slashdot saying the Anandtech reviewer didn’t do proper comparisons and was recycling old Ryzen benchmarks.
It’s going to be difficult to compare the M1 Max and RTX since they’re targeting different APIs (Metal vs DX12/Vulkan).
Raw FP32, the M1 Max seems to match the laptop version of the RTX 3080 (both at 10-ish TFLOPS).
There are some early gfx benches
https://www.google.com/amp/s/amp.hothardware.com/news/apple-m1-max-alienware-rtx-3080-laptop-adobe-benchmark
javiercero1,
You are right, the metal API makes straightforward benchmark comparisons more difficult. I actually wonder if this may have been a goal for apple in choosing not to support the open vulkan standard. Obviously it wouldn’t look as good if they supported vulkan and the performance wasn’t competitive. Anyway, it’s going to take time to benchmark a wide range of application samples to get an idea of average performance in the field.
So far the M1 max’s CPU benchmarks seem to be outstanding for a laptop, but the same cannot be said of the GPU judging by the early benchmarks that have been published so far.
Yes, the two benchmarks that your link refers to are the GeekBench compute benchmark where both mobile 3060 and 3080 crush the m1 max score by 52% and 100% respectively
https://hothardware.com/news/apple-m1-max-crushes-first-gen-m1-in-leaked-benchmarks
And the second one is PugetBench’s adobe premiere benchmark where the GPU score for 3080 is 3.2% higher than the M1 max. I agree with their assessment…
It’s possible the GeekBench result is merely an outlier, and maybe something’s wrong with it, but then the anandtech article paints a similar picture with M1 max exhibiting stellar laptop CPU performance, contrasted with the GPU being a slouch next to a 3080 besting it by over 100%.
However, I do consider it problematic that anandtech’s only two game benchmarks were run under Rosetta. Theoretically the GPU subsystem may not be affected by that, but it still feels like an unfair test. So in short, I’d say we don’t have enough data yet and we need a lot more real-world application benchmarking to observe trends and determine whether these early GPU results are representative or merely outliers. If they are representative, though, it’s kind of a bummer that the GPU doesn’t perform as well as it does on paper.
If you want some real fun, compare Apple’s smartphone camera output across their range versus a proper camera. Apple fakes image quality algorithmically, adding a lot of processing and quick-and-dirty shortcuts before writing the final image. Look closely and you’ll see bad artifacts.
Nvidia have traditionally done better in benchmarks because they have tended to run their GPUs hotter and cheat better than AMD. To some degree all GPU IHVs do the same, especially for games, where you wouldn’t notice, to be fair. I’m guessing this is why some workplaces used to insist on Quadro drivers. If your GPU is driving a medical application, you wouldn’t want artifacts.
https://wccftech.com/intel-alder-lake-mobility-cpu-benchmarks-leaked-faster-than-the-apple-m1-max-smokes-amd-5980hx-11980hk/