Overall though, it’s no denying that Intel is now in the thick of it, or if I were to argue, the market leader. The nuances of the hybrid architecture are still nascent, so it will take time to discover where benefits will come, especially when we get to the laptop variants of Alder Lake. At a retail price of around $650, the Core i9-12900K ends up being competitive between the two Ryzen 9 processors, each with their good points. The only serious downside for Intel though is cost of switching to DDR5, and users learning Windows 11. That’s not necessarily on Intel, but it’s a few more hoops than we regularly jump through.
Competition is amazing.
I’m not that impressed with Alder Lake. It gets Intel back in the game with AMD probably earlier than most suspected they would, and manages to just beat the year-old 5950X at the expense of massively more power usage. The Gracemont cores are pretty impressive for their size.
More performance (and more efficiency) is always nice; but I’m more interested in the “hybrid P + E cores” aspect (as it’s the first time I know of that mainstream 80×86 attempted it). Specifically; what Intel’s “thread director” actually is (the manuals don’t describe any new hardware at all despite news articles claiming it’s new hardware); and whether there’s any way an OS (that supports it) can “undisable” AVX-512 on the P cores.
AVX512 is fused off in the P-cores.
As it stands there is no support for different revisions of the x86 ISA on the same system.
Apparently not (Intel said they were fused off, but were wrong). See: https://www.anandtech.com/show/17047/the-intel-12th-gen-core-i912900k-review-hybrid-performance-brings-hybrid-complexity/2
The thing is, the firmware isn’t necessarily special, and if the firmware can enable AVX-512 (when E cores are disabled) then maybe an OS can use the same method to enable AVX-512 (maybe when E cores aren’t disabled).
Being pedantic (sorry); it’s not that simple. Multi-socket systems have always had the potential to support slightly different chips; and Intel’s ancient (late 1990s) MultiProcessor Specification even gave explicit warnings/advice to OS developers (“Operating system writers should factor in processor variations, such as processor type, family, model, and features, to arrive at a configuration that maximizes overall system performance. At a minimum, the MP operating system should remain operational and should support the common features of unequal processors.” in section “B.8 Supporting Unequal Processors”).
The problem was always that operating systems mostly didn’t bother supporting it (even though hardware allowed it in some cases); and motherboard manufacturers weren’t too enthusiastic about validating all the potential permutations either.
Of course I’m not saying it’d be trivial for an OS to do more than the bare minimum (“support the common features of unequal processors”). For “with or without AVX-512” there’d be a major difference in the task state area used for switching between tasks that (at least for normal operating systems like Windows and Linux) would complicate task switching and migrating tasks from one class of CPU to another. However; recently (for Linux 5.15) Linux developers added support for dissimilar CPUs (mostly systems where only some ARM cores support 32-bit), so they’re already part of the way towards supporting a hypothetical “only some 80×86 cores support AVX512” situation.
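To give a rough sense of the task state difference mentioned above, here is a minimal sketch that sums the architectural register sizes an OS would save per task with and without AVX-512. The function and dictionary names are made up for illustration, and the totals ignore padding and any other state components; a real OS would query CPUID leaf 0Dh for the actual layout.

```python
# Architectural sizes (in bytes) of the register state saved per task.
# Illustration only: padding and unrelated components are ignored.
BASE_COMPONENTS = {
    "legacy x87/SSE area": 512,
    "XSAVE header": 64,
    "AVX (upper halves of YMM0-15)": 16 * 16,
}

AVX512_COMPONENTS = {
    "opmask registers (k0-k7)": 8 * 8,
    "ZMM_Hi256 (upper halves of ZMM0-15)": 16 * 32,
    "Hi16_ZMM (ZMM16-31)": 16 * 64,
}


def state_bytes(with_avx512):
    """Return the summed register state size for one task."""
    total = sum(BASE_COMPONENTS.values())
    if with_avx512:
        total += sum(AVX512_COMPONENTS.values())
    return total


if __name__ == "__main__":
    print("without AVX-512:", state_bytes(False), "bytes")
    print("with AVX-512:   ", state_bytes(True), "bytes")
```

Under these assumptions the per-task save area roughly triples, which is why a task that has touched AVX-512 state cannot simply be dropped onto a core class that has no place to restore it.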
Brendan,
Yes, that’s interesting. I don’t think many will want to develop features around functionality that is officially unsupported, but in principle it shouldn’t be that difficult to achieve.
What normally happens when an unsupported opcode is detected is that the CPU raises an invalid opcode exception. The OS can then handle it however it needs to. It could terminate the program with an error, but it doesn’t have to if it corrects the fault and executes the instruction again. In this case it could automatically reschedule the thread on a regular full performance core that supports AVX-512.
The page you linked to kind of alludes to this, and it’s probably what Intel themselves were planning on doing before deciding to shut off AVX-512 for whatever reason. It’s kind of a mystery why Intel would decide to drop something that’s evidently working already such that motherboard manufacturers are already enabling it. The article seems to suggest the decision was not technical but politics or marketing.
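The fault-and-retry idea can be sketched as a toy simulation. Everything here (the `Core` and `Thread` classes, the `run` loop, the feature names) is hypothetical and only models the control flow; it is not how any real kernel implements its invalid opcode handler.

```python
from dataclasses import dataclass


class InvalidOpcode(Exception):
    """Stands in for the fault a core raises on an unsupported instruction."""


@dataclass(frozen=True)
class Core:
    name: str
    features: frozenset


@dataclass
class Thread:
    program: list  # list of (instruction name, required feature or None)
    core: Core
    pc: int = 0


def execute_one(thread):
    """Execute the instruction at pc, or fault if the core lacks the feature."""
    name, needed = thread.program[thread.pc]
    if needed is not None and needed not in thread.core.features:
        raise InvalidOpcode(name)
    thread.pc += 1


def run(thread, cores):
    """Run to completion, migrating on a fault; return the migration count."""
    migrations = 0
    while thread.pc < len(thread.program):
        try:
            execute_one(thread)
        except InvalidOpcode:
            needed = thread.program[thread.pc][1]
            capable = [c for c in cores if needed in c.features]
            if not capable:
                raise  # no core can run it: terminate the program
            thread.core = capable[0]  # pc is unchanged, so the instruction is retried
            migrations += 1
    return migrations


if __name__ == "__main__":
    e_core = Core("E0", frozenset({"avx2"}))
    p_core = Core("P0", frozenset({"avx2", "avx512f"}))
    t = Thread([("add", None), ("vaddps_zmm", "avx512f"), ("ret", None)], e_core)
    print(run(t, [e_core, p_core]), "migration(s); finished on", t.core.name)
```

Note that the sketch never moves the thread back to an E core afterwards; a real scheduler would need some policy for that.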
For better or worse, a lot of Windows software (especially if compiled with Intel’s ICC) does “dynamic dispatch”. What this means is that if the executable is started on an E core its initialization code will detect the E core’s features, find that AVX512 isn’t supported, and then auto-configure itself (set function pointers, etc) to not use AVX512.
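A minimal sketch of that kind of dynamic dispatch, with the CPU feature set passed in as a plain Python set rather than read from CPUID. The function names and the `Library` class are invented for illustration, and both implementations compute the same thing; the point is only that the choice is made once, at startup.

```python
def sum_squares_avx2(values):
    """Stand-in for a code path compiled for AVX2."""
    return sum(v * v for v in values)


def sum_squares_avx512(values):
    """Stand-in for a code path compiled for AVX-512 (same result)."""
    return sum(v * v for v in values)


def select_impl(features):
    """Pick an implementation from the features seen at startup."""
    if "avx512f" in features:
        return sum_squares_avx512
    return sum_squares_avx2


class Library:
    def __init__(self, startup_core_features):
        # The "function pointer" is set once and never re-evaluated,
        # even if the thread later moves to a different class of core.
        self.sum_squares = select_impl(startup_core_features)


if __name__ == "__main__":
    lib = Library({"avx2"})  # process happened to start on an E core
    print(lib.sum_squares.__name__)
```

If the process starts on an E core, it stays on the AVX2 path for its whole lifetime; if it starts on a P core and is later moved to an E core, the cached AVX-512 path would fault.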
With this in mind; for Windows; I think the best approach (in theory) would be for the executable file’s header to include 2 sets of flags – one indicating which CPU features the executable requires and the other indicating which CPU features the executable can benefit from (but doesn’t require). That way the OS can decide if the process should use P cores or E cores when the process is first started (and avoid “wrong dynamic dispatch” and avoid the cost of migrating all threads when one thread uses an unsupported instruction; while also allowing OS to detect that no CPU supports the executable when the program is installed).
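The two-flag-set idea could look something like the following sketch. The function name, the feature strings, and the tie-breaking rule are all assumptions made for illustration; no existing executable format carries these flags.

```python
def choose_core_class(required, beneficial, core_classes):
    """Pick a core class for a new process from its header flags.

    required:     features the executable cannot run without
    beneficial:   features it can use but does not need
    core_classes: dict mapping class name -> set of supported features
    Returns a class name, or None if no class meets the requirements
    (which an installer could report before the program is ever run).
    """
    candidates = {
        name: feats for name, feats in core_classes.items() if required <= feats
    }
    if not candidates:
        return None
    # Prefer the class offering the most "beneficial" features.
    # On a tie, max() keeps the first candidate in dict order.
    return max(candidates, key=lambda name: len(beneficial & candidates[name]))


if __name__ == "__main__":
    classes = {
        "E": {"sse2", "avx2"},
        "P": {"sse2", "avx2", "avx512f"},
    }
    print(choose_core_class({"avx2"}, {"avx512f"}, classes))
    print(choose_core_class({"avx2"}, set(), classes))
    print(choose_core_class({"amx"}, set(), classes))
```

Listing the E class first means that processes which gain nothing from the P cores’ extra features default to the efficient cores; that ordering is a design choice of this sketch, not something the original proposal specifies.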
I think it’s a combination of Intel needing Windows to add special support for their chip (and “more support” making it harder for Intel to convince Microsoft to make the changes before release day); and Intel wanting parallelizable code to use all the cores in parallel (and AVX2) for efficiency rather than only using the P cores (and AVX512) to get less work done less efficiently.
Brendan,
Certainly, although my point was that the OS can auto-detect when an existing thread uses it in real time even without being told.
Even so though, what Microsoft does is on Microsoft. If they don’t support it, then that’s their loss. I don’t see why Intel would be pressured to deny the feature’s existence because of that, there is a world that exists beyond Microsoft after all. Anyways even though I don’t follow their behind-closed-doors reasoning, it makes for interesting trivia that it’s present in the BIOS & silicon!
I should have been more specific: by “system” I meant the actual silicon part, not the OS.
Some BIOSes seem to support AVX512 when E-cores are disabled. The tests I’ve seen were people being able to see the wider registers manually, but I don’t know if precompiled codepaths were getting a CPUID result that allowed them to use the full AVX512.
I’ve been told the issue seems to be at the memory and fabric controllers level, it doesn’t seem to support asymmetric capability operation on a per-core granularity.
javiercero1,
Are you talking about asymmetry between completely different CPUs? The type of asymmetry we’re talking about here is some cores supporting AVX512 and others not on the same CPU, which shouldn’t be impacted by the memory fabric.
Asymmetric/heterogeneous load/store requests impact fundamentally the ring fabric, memory controller esp the shared L3.
I assume Alder Lake either doesn’t implement the functionality or the HW is too buggy in this release.
javiercero1,
If you can dig up evidence for this, then I’d have to accept it. But as of right now it seems to be working already in terms of AVX512 and it may only be a matter of scheduling the binaries to execute on the right cores, just like the author discusses.
It’s only working with the E-cores disabled. Thus there’s no supported heterogeneous ISA within the chip.
The ring/mem controller is a mostly HW scheduling problem, not OS. In the L3 there seems to be no core tags, so passing a wide stride load/store request to a core that does not support it seems to be a consistency risk. So at least in this iteration the controller doesn’t seem to support asymmetric operation.
javiercero1,
It seems like they disabled those cores to compensate for OS limitations rather than hardware ones. While I could be wrong, my gut feeling is that E cores don’t care what the P cores are doing. AVX512 uses the full cache line mechanics that the CPU is already using even when it’s not executing AVX512 instructions. So it would be reasonable to think that the E cores are already “compatible” with AVX512 instructions running on neighboring cores. Nevertheless obviously it needs to be tested before we can say anything conclusive about it.
The E-cores are not compatible with AVX512, that’s the whole point. And that’s why asymmetric operation is not currently supported in this HW revision.
It’s both a HW and SW issue. Even if the SW scheduler is aware of the ISA affinity for each cluster, the wide strides are live on the ring and L3. So a Gracemont read on a Golden Cove commit could be problematic, since the Gracemont controller does not implement that type of packet. Or if it does, it’s probably too buggy for Intel to support it on this release.
As it seems Intel is not officially supporting AVX512 on Alder Lake.
javiercero1,
You are misconstruing what I said: “it would be reasonable to think that the E cores are already ‘compatible’ with AVX512 instructions running on neighboring cores.”
The author has already proven that AVX512 is compatible with those neighboring cores. Those AVX512 instructions may never reach the E core decoder such that it would even be aware other cores are running AVX512 instructions. If the AVX512 algorithm only touches the other compatible cores, it’s quite plausible that the E core will happily chug along running its own code completely oblivious of the fact that an AVX512 algorithm is running on other cores. So I will not be surprised if someone manages to fix the OS scheduler and get it working.
Given the author’s reporting, it may still work despite not being “supported”. I wish I had one to test with here to gather evidence one way or the other. I concede that I could be wrong, but given that you face the same lack of evidence you should concede that your assumptions could be wrong as well. So until we get more evidence it doesn’t seem like we can definitely answer the question here.
Yes, we can answer the question easily: Alder Lake doesn’t support AVX512 in hybrid mode. I’m simply giving you the reasons from the HW side of things why Intel may not be supporting it.
javiercero1,
Where is your evidence though? Because if you don’t have any then it’s an open question.
The evidence is in the article.
javiercero1,
The article doesn’t back what you are saying though. The author explains why the E cores cannot run AVX-512 and we are all in agreement that they can’t. But there’s no evidence to suggest that, with proper BIOS & OS support, AVX-512 applications could not run on the P cores while having other non-AVX-512 applications running on the E cores.
If you disagree, then please cite the specific text that proves that there’s a fundamental incompatibility between E & P cores even when AVX512 applications are restricted to P cores by the OS.
Not for nothing, but the article claims that Intel engineers actually did have AVX512 working since the beginning and Intel only later disabled AVX512 in the BIOS to their annoyance. It seems quite plausible that E cores and P cores were engineered to be fully compatible at the cache & memory interface. Hypothetically it might just be a matter of changing the OS so that AVX512 applications are not scheduled to run on cores that don’t support it.
Without new evidence one way or the other, it remains an open question.
No. The article only claims that AVX512 is present on the P-cores, and it’s not fused off as it was initially claimed. There’s no mention of AVX512 being supported on a hybrid system with the E-cores enabled.
The evidence is that the shipping systems do not support hybrid ISA operation. Once again we’re stuck in a debate cycle in which you don’t consider reality as representative of what is going on.
javiercero1,
I know, that’s why it’s an open question, sheesh.
That’s just an assertion, where is your evidence though?!? The author did not test AVX512 with E cores enabled because disabling them was an easy way to limit AVX512 to the P cores that support it. That does NOT prove it could not work with proper BIOS & OS support. This was what Brendan’s posts were about and I’m in agreement with him that it’s possible that it could work.
Either you have the evidence or you do not. I am genuinely curious but so far you’ve only provided conjecture with no evidence. It makes the discussion extremely tedious if you’re going to continue not providing evidence while still maintaining that you have it. Open that damn box Schrödinger and let’s see what you’ve got 🙂
You can’t prove a negative.
All that we know is that Intel does not support Hybrid ISA operation on this system. The evidence is in the actual systems.
I’m simply adding extra context on why it is not just a simple OS scheduling issue, since the controllers for the ring fabric, L3 and LS queues are managed by HW and mostly opaque to the programmer. AVX affects the stride and therefore also the consistency, so that HW has to be aware. So the best we can guess is that either the HW support is not there or it is not validated for this iteration.
Really nothing more. Just a polite contextualization that some may or may not find helpful. But as usual, you have to make it weird.
javiercero1,
Ah good, now we’re getting somewhere. I agree it’s a guess, that’s a great way to put it. Testing of the actual hardware should reveal more answers. I hear what you’re saying about the fabric, but I still think that intel engineers who didn’t know that AVX512 was going to get disabled would have made sure the core’s IO interfaces were compatible regardless of AVX512 instructions on the P cores. I’m happy to concede that it’s just a guess and that we need more evidence!
I was already there, you’re the one catching up dude.
The POR for Alder Lake was set in stone well before it was taped out. There is no way the AVX issue was an afterthought that caught any of the engineering teams off guard, or created any drama.
The engineering teams for Golden Cove were doing the features as requested, since it is a core being shared with the Xeon lines which expects to have complete AVX functionality.
This happens all the time. I’ve worked on IPs where their functionality was only used partially in different designs. Validation these days takes longer than design. So more and more of the original functionality is enabled after several iterations of the road map, as more of it is validated.
Doing asymmetric HW and SW is too much of a PITA for a niche feature like AVX512. So it makes sense that at least in this iteration Intel is not bothering with it. Even if most of the HW is there, it’s probably not validated as that would have delayed the roadmap.
On top of that, Alder Lake even without AVX512 is hitting 220+ Watts when all clusters are on full turbo. So with AVX512 on + E-Cores it would be an even bigger thermal disaster. Which is not a good thing IMO.
AMD is kicking Intel’s butt in the consumer space without supporting AVX512. So Intel just wants to get their big.LITTLE arch out there ASAP before Zen4.
Perhaps for Raptor Lake the HW for full asymmetric operation will be validated and enabled. But right now, the fact that you have to turn off the E-Cores is a clear indication that asymmetric operation is either not implemented or not fully validated.
javiercero1,
I don’t care how we got here, it’s just good that we’re in agreement that the hardware itself is the ultimate evidence. Intel will say what it wants officially, but it’s interesting that early testing is finding support for AVX512 that intel didn’t want us to see.
Well, according to the article engineers had the feature working early on and it’s only disabled in BIOS, it’s not unrealistic to think this feature is already working in silicon and intel’s marketing wanted to take it away in order to help distinguish it from a future sku. Again, we’re just guessing like you said.
You’re still not understanding my point.
AVX512 is not working in hybrid mode. That’s all.
javiercero1
That’s fine, but you’re still assuming that it won’t work. It’s quite possible if not likely that intel has code to make this work internally. It would not be a stretch to think a linux hacker could fix the OS to support it. So I think it is completely fair to ask about what evidence exists for the assumption. It’s totally fine that you don’t have evidence that it can’t work, after all I have no evidence that it does work either! That’s my point, until we get more evidence, we cannot definitively say whether the hardware can do this or not.
It’s impossible to prove a negative.
I’m simply telling you that it’s not just an OS scheduling issue, as HW support is also needed for asymmetric operation.
Since Intel has stated they don’t intend to support that feature in this product, we can have an educated guess that the feature is either not implemented or not fully validated.
javiercero1,
It depends. When you have all the data you can prove that there are no positives through the process of elimination. Mathematically and algorithmically we do this. Sometimes this is not practical. In this case Intel may have already let the cat out of the bag with its board partners who are unlocking features that Intel wanted them to keep secret.
I expect it’s not validated. However that won’t necessarily stop people from doing it, just like overclocking. Some people stick to supported configurations, others don’t care. To each their own.
Scheduler is probably ready to schedule processes on capable cores. The process needs AVX? It gets the P-core.
I wrote “probably” because I last heard about this capability a year ago: https://lwn.net/Articles/838339/. The article is about ARM systems, but it’s a generic feature for every platform.
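As a rough user-space illustration of “the process needs AVX, it gets the P-core”, the sketch below parses text in the x86 `/proc/cpuinfo` format and returns the logical CPUs whose `flags` line lists a given feature. The function name is made up, and on a shipping Alder Lake system in hybrid mode no CPU would be expected to report `avx512f`, so this shows the mechanism rather than a working recipe.

```python
def cpus_with_flag(cpuinfo_text, flag):
    """Return the set of logical CPU numbers whose flags include `flag`.

    Assumes the x86 /proc/cpuinfo layout: a "processor : N" line
    followed later by a "flags : ..." line for that CPU.
    """
    cpus = set()
    current = None
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank separator lines between CPUs
        key = key.strip()
        if key == "processor":
            current = int(value)
        elif key == "flags" and current is not None:
            if flag in value.split():
                cpus.add(current)
    return cpus


if __name__ == "__main__":
    sample = (
        "processor\t: 0\n"
        "flags\t\t: fpu sse2 avx2 avx512f\n"
        "\n"
        "processor\t: 1\n"
        "flags\t\t: fpu sse2 avx2\n"
    )
    print(cpus_with_flag(sample, "avx512f"))
```

On Linux the resulting set, if non-empty, could be handed to `os.sched_setaffinity(0, cpus)` to pin the current process to those CPUs, which is a manual version of what a feature-aware scheduler would do automatically.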
Thom Holwerda,
Yes it is. We now have 3 fairly strong CPU vendors between Intel, AMD, and Apple, and that’s great news! The whole industry is constantly improving and competition is what drives it. In healthy markets it’s very normal and expected for competitors to leapfrog each other’s specs because each vendor will be adjusting their targets to be competitive in relation to the rest of the market. This is why it’s best to keep a cool head and not get caught up in the news hyperbole that doesn’t fit reality. OSnews is guilty of this too. It was only 18 days ago you said that nothing Intel or AMD even comes close, well here we are less than 3 weeks later and M1 Max’s ST performance is beaten by 14% and MT performance is beaten by even more because it doesn’t have enough cores. This is why it’s more important to look at the trends than the fact that vendors are leapfrogging each other every product generation, which shouldn’t be surprising when they’re close.
Looking at the trends, we see that ARM and M1 processors may not have the best performance on the market every quarter, but what they do have is exceptional power efficiency. Apple’s presence is probably what motivated intel to invest in hybrid P+E cores. Regardless of which companies one prefers, everyone should agree this is great for choice and competition compared to a few years ago when x86 was the only serious contender.
It’s AMD’s turn to show what they have for us next 🙂
None of this diverts me from questions about abuse of market power and forced obsolescence and unnecessary environmental waste among other things. Standard modular and repairable and upgradeable designs and power efficiency and longevity of use interest me more than feature creep and gimmicks. I don’t go shopping in a Ferrari.
“and users learning Windows 11”.
It is early days but it seems to run pretty good using Linux. So Linux users can stick with what they know and get the benefits of the hybrid cores.
https://www.phoronix.com/scan.php?page=article&item=intel-12600k-12900k&num=1
Linux, via Android, has been doing big.LITTLE scheduling in ARMland for years now.