In the world of today’s high-performance CPUs, major architectural changes don’t happen often. Iterating on a proven base is safer, cheaper, and faster than attempting to massively rework the basics of how a CPU fetches and executes instructions. But more than 20 years ago, things hadn’t settled down yet. Intel made two attempts to replace its solid but aging P6 microarchitecture with something completely different. One was Itanium, which avoided the complexity associated with out-of-order execution and variable-length decode to deliver very wide in-order execution.
Pentium 4 was the other, and we’ll be taking a look at it in this article. Its microarchitecture, called NetBurst, targeted very high clock speeds using a long pipeline. Alongside that headline feature, it brought a wide range of architectural innovations. As we all know, it didn’t quite pan out the way Intel would have liked. But the architecture was an important learning experience for Intel, and was arguably key to the company’s later success.
The Pentium 4 era was wild, with insane promises Intel could not fulfill, but at the same time, an era of innovation and progress that would help Intel in later years. A fascinating time.
I remember those times, and NetBurst was such a failure that they had to go back to P6 for the Pentium M, then used that as the basis for the original Core architecture. Not much remained of the NetBurst architecture. It was a failure, and so was Itanium. It was AMD who really saved Intel’s butt, by cross-licensing x64 and multi-core designs.
BluenoseJake,
There are pros and cons to AMD having done that. I kind of would have liked x86 to fall by the wayside to make room for new alternatives. I guess there’s still a possibility for that to happen, we’ll have to wait and see what happens in the future, but we may have gotten there 20 years sooner if AMD hadn’t given x86 a brand-new lease on life.
What might you have wanted to take its place, if I might ask? There weren’t that many true 64-bit architectures at the time that could be scaled for office and personal use: Alpha, PowerPC, POWER*, SPARC, perhaps MIPS, or do you prefer some more obscure ones? I was quite partial to the floating-point speed of the Alpha chips, which at 833 MHz were THREE times faster than the Pentium 3 at 1000 MHz. Itanium and VLIW CPUs like the Transmeta Crusoe might have had some advantages, but they were not that fast (although decent in most tasks).
NaGERST,
That’s a good question. Intel had a fab advantage over most competitors. I can only speculate about which architectures would have succeeded x86 if customers had realized there was no x86 upgrade path. Given a viable market opening in the consumer space, there were lots of competitors at the time that could have evolved to fill the gap. AMD clearly demonstrated technical ability, but who knows if they would have been able to crack the market with an alternative. IMHO PPC would have had a good chance. ARM hadn’t yet transitioned to 64-bit because it was targeting mobile (although this might have been accelerated under different circumstances). Alpha may have been a contender; I never got to play with one personally, and Intel was already phasing it out in favor of Itanium. Itanium is an interesting architecture but was too far ahead of its time IMHO. VLIW has lots of merit, and explicit parallelism is incredibly powerful as GPGPUs are proving, but Itanium suffered because those computers ended up competing with x86 to run ordinary sequential general-purpose software rather than VLIW-appropriate workloads. If Itanium had been designed as a coprocessor (akin to CUDA) it might have had a different reception. I wanted one but it was not affordable at all.
It would have been interesting to see what DEC could have done with the Alpha architecture. It really was ahead of anything from Intel, especially with SMP and the point-to-point connections to other CPUs and peripherals instead of using the front-side bus.
Best thing that ever happened to AMD was hiring many of the Alpha devs. HyperTransport in the original Athlon / AthlonMP was based on the Alpha bus architecture. And a lot of the foundation for Athlon64 and the amd64 extensions to x86 came from the Alpha.
@Phoenix
HyperTransport came from academia and a bunch of vendors (AMD, IBM, Sun, et al.) as a scalable point-to-point switched network protocol for connecting internal components. It took elements from a bunch of previous approaches like the Transputer, SGI’s NUMAlink, etc.
The Alpha Bus (used by AMD in the original Athlon) used DDR signaling, but it was still a shared FSB.
Both AMD and Intel benefited immensely from hiring DEC’s former Alpha architecture group. AMD got Keller et al., who designed the original K8 and then came back for the K12.
Interestingly enough, most of Intel’s uArchs in the 00s started life in performance simulations as “Alphas,” since a lot of the architectural simulators they used in-house came from DEC and executed AXP ISA binaries.
The only viable successor to x86 was Itanium, due to its ability to run x86 programs with only a moderate performance hit. And it was all officially supported, first by Itanium’s hardware and then by Intel software. It’s why Intel kept pushing Itanium even after it proved to be a performance dud. Want more than 4GB of RAM? Join our marvelous Itanium monopoly today! Remember, no cross-license agreements with AMD for Itanium.
Then AMD came up with AMD64, and the world was spared Itanium and the Intel monopoly that came with it.
So, I am glad x86 didn’t fall by the wayside, because the successor was awful in more than one way.
kurkosdr,
I agree about x86 compatibility making it onto Itanium’s feature list; however, the use of die space for that purpose created opportunity costs, and moreover it performed rather poorly. As you allude to, Itanium 2 ditched that in favor of better-optimized software emulation.
https://en.wikipedia.org/wiki/IA-32_Execution_Layer
https://www.informationweek.com/it-life/intel-sees-a-32-bit-hole-in-itanium
I don’t deny your assertion that x86 compatibility was a barrier to alternatives, but when push comes to shove most software is written in high-level languages that can be recompiled for new architectures, so it’s not out of the question that non-x86 could have achieved critical mass. Apple’s done it with emulation to bridge the gap; Windows probably could have survived a well-coordinated transition. Absent x86, the opportunity would have been there for someone else to fill the gap in the market, perhaps AMD with a non-x86 product. As for Itanium being the successor, I won’t rule it out, but I do have a hard time envisioning it gaining more traction than it did, given its notoriously bad price/performance ratio. Of course nobody’s right or wrong since this is just a hypothetical thought exercise 🙂
The macOS crowd is a different kind of crowd compared to the Windows crowd (I wish Redmond understood this too, btw). Most macOS users run a small collection of software (most of it from the OS vendor and Adobe) which they can update. It’s why Apple can afford to offer them a slow emulation layer that gets taken away after a few OS versions every time they change ISAs. Windows users have large collections of software and games. So, anything that doesn’t offer the ability to run x86 with a moderate performance hit at worst is a non-starter. Itanium was actually designed to make translation from x86 easier (in hardware or software).
Heck, Windows Vista maintained full binary compatibility (with the exception of software that did silly things like underhandedly patching the kernel), and people were screaming that their old drivers and their games’ StarForce DRM didn’t work on the 32-bit version of Vista (and how the 64-bit version didn’t load 32-bit drivers). And we are talking about ISA changes? Really? Even Itanium was a stretch.
It’s why I think all those ARM Windows laptops are silly, btw. Are you telling me that Qualcomm’s engineering prowess is so huge that they can design a core that outperforms Intel’s x86 cores, much less one that outperforms Intel’s x86 cores when accounting for the translation overhead?
kurkosdr,
I agree there would be criticism, but I think Windows would ultimately survive a well-executed transition anyway. If you recall, Apple faced criticism for switching to x86 too. I think most users would fall in line as long as the transition wasn’t completely botched (see Windows RT).
Well, yes, switching to ARM just to run/emulate x86 is silly (IIRC there was about a 15-30% emulation cost for running x86 on the M1), but that’s obviously only a transitional step. For general office use and browsing, ARM should be OK on day 1. Older applications naturally tend to have lower CPU requirements, and these are the bulk of Windows applications; many ran on 10+ year old systems, which makes them better candidates for emulation. Newer applications are much more likely to still be supported by the publisher, such that emulation may not even be necessary. And DirectX/OpenGL games are often GPU-bound more than CPU-bound, so many of those would probably be OK.
I do appreciate the validity of your point, but I don’t think it’s necessarily a showstopper for typical consumers.
For me personally I’ve wanted an open unlocked ARM laptop for a long time to run linux on ARM natively. I concede the binary blob/proprietary driver situation really puts a damper on it though.
Apple had total control over the hardware, though. Want your 68k/PPC/Intel code to work on the new platform? Here’s a compatibility layer which you can use until you port your stuff.
With Windows and x86, however, any competing architecture will always play second fiddle to x86. This is because Intel and AMD have absolutely no vested interest in killing off x86 (if either abandoned it, the other would find itself with a profitable monopoly), and x86 has such a large install base that software will always be designed x86-first, with any other architectures being an afterthought. As such, any x86 emulation on another architecture will just perpetuate x86 dominance, as developers will see no point in developing native ports when their software already runs under the x86 emulation layer.
Unless there is a serious vested interest in developing native non-x86 code on a Windows platform (such as significantly better performance, or hardware trickery impossible under x86), Windows will be predominantly x86-based until the day it’s discontinued.
The123king,
So does Microsoft, though; they literally dictate Windows hardware certification requirements for both x86 and ARM (and other architectures they’ve supported in the past). This can be good (e.g. requiring ARM devices to have a standard UEFI BIOS), and it can be bad (e.g. requiring ARM devices to be secure-boot locked). In any case, Microsoft gets a big say over hardware.
I understand that, but the same was true of x86 Macs too. The choice of what hardware to support is ultimately Apple’s and Microsoft’s to make.
Don’t get me wrong, I don’t expect it to happen, and there would be grumbling if it did, but for the sake of argument they could continue to put out x86 versions, like the $6-30k Mac Pro, while pushing normal consumers onto ARM, which is what Apple has done.
At the end of the day the interest in developing for any given platform is financial. If a dominant platform is able to convince the industry that hardware-X is the future while migrating masses of consumers over, then that in itself is the motivation for developers to support it. The motivation to support particular hardware has next to nothing to do with hardware specifics and just about everything to do with that platform having tens/hundreds of millions of active customers.
So while I think MS could manage it, it hinges on how serious they are about switching. What would Microsoft gain from a switchover? It’s hard to imagine, unless they wanted to take over for the OEMs, and that would be a huge shift.
Alpha had FX!32.
I wouldn’t be so quick to label it a failure.
Certainly it was technically horrible compared to P6.
But the marketing is what mattered. After AMD hit 1 GHz first, and with P6 unable to scale its clock up to compete, Intel needed a long-pipeline architecture expressly to be able to pump the clock speed up so they could compete with AMD on GHz labels.
Of course it was meaningless technically – but the marketing mattered.
I remember falling for the marketing. I bought a 3200 MHz Northwood with an Intel chipset motherboard as an upgrade to my Barton 3200+ (nForce2 chipset), because a friend had told me that my Barton only ran at 2200 MHz.
The damn Pentium 4 felt like a cripple; it was slower at almost everything, and in games it was VERY noticeably slower. In operations like compression and decompression the Pentium 4 kept up with the Barton rather well, but never beat it by enough to make it worth it. I managed to return the Pentium 4 machine and just kept using the Barton until I got my 2x dual-core Opteron 275 a while later. Now THAT was an upgrade beyond what I thought possible at the time.
NetBurst was a failure considering Intel hoped to scale it to 10 GHz:
https://web.archive.org/web/20000819011344/http://www.zdnet.com/zdnn/stories/news/0,4586,2601717,00.html
Basically, the idea was that long pipelines may be inefficient due to the proportionally long “pipeline bubbles” generated when a branch prediction fails, but it doesn’t matter if you take advantage of Moore’s Law to bump up the frequency 10x.
Basically, the idea was that IPC improvements cannot be anticipated and rely on unpredictable innovation (it’s the reason why most CPUs today are multi-core instead of one big core with insane single-thread performance: we have no core design that can fill all those transistors), but Moore’s Law and pipeline lengthening can be anticipated. What nobody anticipated was the explosion in TDP when you increase the clock frequency.
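To make the tradeoff concrete, here is a toy model in Python (a minimal sketch with made-up illustrative numbers, not Intel’s actual figures): performance scales with frequency divided by average cycles per instruction, and each mispredicted branch inserts a bubble roughly as long as the pipeline is deep.

# Toy model of the NetBurst bet: a deeper pipeline enables a higher clock,
# but every branch mispredict costs roughly pipeline-depth cycles.
# All numbers below are illustrative assumptions, not measured values.

def instr_per_ns(freq_ghz, base_cpi, mispredicts_per_instr, pipeline_depth):
    """Instructions retired per nanosecond under a simple CPI model."""
    # Extra cycles per instruction lost to branch-mispredict bubbles.
    bubble_cpi = mispredicts_per_instr * pipeline_depth
    return freq_ghz / (base_cpi + bubble_cpi)

# P6-style core: ~12-stage pipeline at 1.0 GHz.
p6 = instr_per_ns(1.0, base_cpi=1.0, mispredicts_per_instr=0.01, pipeline_depth=12)

# NetBurst-style core: ~20-stage pipeline (Willamette), clocked higher.
p4 = instr_per_ns(1.5, base_cpi=1.0, mispredicts_per_instr=0.01, pipeline_depth=20)

print(f"P6-ish: {p6:.2f} instructions/ns")  # ~0.89
print(f"P4-ish: {p4:.2f} instructions/ns")  # ~1.25

The bet only pays off while frequency climbs faster than the bubble penalty grows; plug in the hoped-for 10 GHz and the model looks great, precisely because it omits the dynamic power term (roughly voltage squared times frequency) that ended the scaling in practice.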
There is one additional detail lost to time: Prescott wasn’t the quick update to Northwood that many believe it was; Intel had planned it well in advance and hoped to scale it to anywhere from 5 to 6 GHz.
The story is not as straightforward as people think; there were lots of structures from P68 that made it into Banias. And Banias wasn’t just a dusted-off P6.
P4 and Itanium came from a set of tragically wrong expectations that some research teams within Intel came up with in the mid/late 90s:
1. Single-thread performance was going to be dominant for decades to come on the desktop
2. Their process tech group could increase transistor speed almost linearly, with density being a secondary concern
3. Memory technologies like RAMBUS were going to deliver on their promises
4. Static analysis and software-based thread-level speculation would match dynamic out-of-order HW schedulers
5. 64-bit addressing had a higher value proposition than x86 software compatibility.
Luckily for Intel, they have had a somewhat “paranoid” management approach, so they always tend to have plan B/C teams working in parallel. They got so many things wrong that, if they hadn’t had backup plans at the ready, they would not have made it.
Some might say that “Only the Paranoid Survive”