Windows is one of the most versatile and flexible operating systems out there, running on a variety of machine architectures and available in multiple SKUs. It currently supports x86, x64, ARM and ARM64 architectures. Windows used to support Itanium, PowerPC, DEC Alpha, and MIPS. In addition, Windows supports a variety of SKUs that run in a multitude of environments, from data centers, laptops, Xbox, and phones to embedded IoT devices such as ATM machines.
The most amazing aspect of all this is that the core of Windows, its kernel, remains virtually unchanged on all these architectures and SKUs. The Windows kernel scales dynamically depending on the architecture and the processor that it’s run on to exploit the full power of the hardware. There is of course some architecture-specific code in the Windows kernel; however, this is kept to a minimum to allow Windows to run on a variety of architectures.
In this blog post, I will talk about the evolution of the core pieces of the Windows kernel that allow it to transparently scale across a low power NVidia Tegra chip on the Surface RT from 2012, to the giant behemoths that power Azure data centers today.
This is a fun article to read, written by Hari Pulapaka, member of the Windows Kernel Team at Microsoft. I feel like in our focus on Microsoft as a company and Windows as a whole – either the good or the bad parts of both – we tend to forget that there’s also a lot of interesting and fascinating technology happening underneath it all. The Linux kernel obviously gets a lot of well-deserved attention, but you rarely read about the NT kernel, mostly because it isn’t open source so nobody can really look at the nitty-gritty.
I hope we’ll be getting more up-to-date articles like this.


How I despise the use of “virtually” to mean just anything from small to gigantic differences.
The Windows kernel currently supports only ARM and x86, which is not a lot. Itanium is likely as obsolete as MIPS or PowerPC.
All multiplatform kernels, from NetBSD to Linux to RTEMS or QNX, are “virtually” unchanged.
All major GUIs are virtually unchanged since the first Macintosh.
The wheel has been virtually unchanged for 5000 years.
Edited 2018-10-26 22:46 UTC
Well, the primary definition is “almost entirely,” as given by Merriam-Webster, or “Nearly, almost” as given by Oxford.
And, reading the article, it seems that, indeed, the Windows kernel is almost entirely unchanged between currently supported architectures.
Linux today is completely different from Linux 2.6 which was a big departure from 2.4 and so on. Indeed, the Kernel-Userland APIs are mostly unchanged, just with additions, but that doesn’t mean that the kernel hasn’t changed. SELinux, CGroups, KVM, Tickless, Hotplug mechanisms, Kexec, KSplice, /sys, different classes of devices (WPan, 802.11, RDM, PV Hardware, MTD, DTD, hotplug devices, etc.) are very different from the 2.6 of 2003. Just because you don’t use the new features of the kernel doesn’t mean they aren’t there. All of them required major architectural changes and the similarities between the original releases and the current ones are very limited.
This was discussed here recently. See http://www.osnews.com/permalink?660387 .
Note that UNIX allows deletion of in-use files in the sense of removing them from the namespace, but existing opens still refer to the old copy of the file and its old data. Trying to modify code in place is obviously a bad idea too. Reboots on Windows aren’t really driven by in-use files, but by a policy choice to ensure that nothing is still running the old, unpatched code. Linux just provides a lot more tools and delegates policy – I have an OpenSUSE VM whose updater tells me exactly which services are using files that have been updated by a patch and lets me decide what to do, for example.
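The unlink behaviour described above can be illustrated with a minimal Python sketch. It assumes a POSIX system (on Windows the unlink call would typically fail while the file is still open), and the helper name read_after_unlink is made up for this example.

```python
import os
import tempfile


def read_after_unlink(payload):
    """Write payload to a temp file, unlink it, then read it back via the open fd.

    Returns (data_read, name_still_exists). Assumes POSIX unlink semantics.
    """
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, payload)
        # Remove the name from the filesystem namespace while the fd is open.
        os.unlink(path)
        name_still_exists = os.path.exists(path)
        # The open descriptor still refers to the old data.
        os.lseek(fd, 0, os.SEEK_SET)
        data = os.read(fd, len(payload))
    finally:
        os.close(fd)
    return data, name_still_exists


if __name__ == "__main__":
    data, listed = read_after_unlink(b"old library code")
    print("read back:", data, "| name still visible:", listed)
```

The storage is only released once the last open descriptor is closed, which is why a running process can keep using a library that an update has already replaced on disk.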
I actually can see why they did this though. It essentially means that they don’t have to redefine anything about the API except the calling conventions (and even those were mostly just changes to register usage). I’m not really saying it’s a good design choice (or a bad one), but I can at least understand it.
Also, PPC and MIPS are bi-endian architectures; it’s the rest of the platform that dictates whether they are primarily big or little endian. Same for ARM (though essentially nothing has ever used big-endian ARM) and IA-64 (which is weird and can switch at run-time and have different endianness for code and data).
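For anyone unsure what the endianness choice changes in practice, here is a small Python sketch of the two byte orders for the same 32-bit value. It only uses the standard struct module; the function name byte_layouts and the sample value are arbitrary choices for illustration.

```python
import struct


def byte_layouts(value):
    """Return the big- and little-endian byte layouts of a 32-bit unsigned value."""
    return {
        "big": struct.pack(">I", value),     # most significant byte first
        "little": struct.pack("<I", value),  # least significant byte first
    }


if __name__ == "__main__":
    for order, raw in byte_layouts(0x0A0B0C0D).items():
        print(order, raw.hex())
```

The value is the same in both cases; only the order of bytes in memory differs, which is what the platform (firmware, OS, ABI) has to agree on for a bi-endian CPU.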
Or maybe since Xerox Star/Alto?
Also, Xbox 360 is kinda still supported, new games still come out for it, so the PowerPC version of NT is not yet entirely obsolete.
Xbox runs a custom operating system built from the ground up.
https://blogs.msdn.microsoft.com/xboxteam/2006/02/17/the-xbox-operat…
Hm, from comments there:
(d3vi1 here also mentions PPC64_LE…) Oh well, I stand corrected.
I’m really curious – who sells 896-core x86 systems that support Windows?
Hi,
You’re not the only one that’s curious. I found this:
https://www.anandtech.com/show/13522/896-xeon-cores-in-one-pc-micros…
..which doesn’t have any actual information (speculation based on what the screenshot said – e.g. 32 of Intel’s 28-core “Xeon Platinum 8180” chips).
– Brendan
I know it’s not as impressive but you can get 128 vCPU instances on Azure.
Blade servers acting as “one” machine?
But that would require SSI clustering, which (as far as I know) Windows can’t do.
“I’m really curious – who sells 896-core x86 systems that support Windows?”
The HPE SuperDome Flex
Max 896 cores and 48TB of RAM:
https://community.hpe.com/t5/HPE-Storage-Tech-Insiders/A-Look-Under-…
https://www.hpe.com/us/en/product-catalog/servers/mission-critical-x…
This is a descendant of the SGI UV line.
HPE acquired SGI in 2016.
Edited 2018-10-27 16:48 UTC
Microsoft, like other large cloud vendors, builds its own hardware.
I’m sorry, compared to what?
Linux and the *BSDs run on everything, and you can strip them down much smaller and swap out higher layers much more freely than with Windows, so Windows is basically the second least versatile and flexible operating system after macOS.
Though come to think of it, the same kernel and runtime are on iOS, Apple TV, Apple Watch, and even the new touchbar. Are those SKUs? Are SKUs supposed to be good? They’re a licensing distinction, not a technical one. Imagine how poorly it would reflect on the kernel’s flexibility and versatility if there actually had to be a real, technical difference between Windows™ and Windows for n+1 Cores™.
If you consider an SKU to be analogous to a distribution, then the Linux kernel has many more of them especially including all the embedded ones and android.
Also the extra code they talked about to support multiple SKUs sounds like unnecessary extra complexity where bugs could be introduced.
Reading their post it just looks like they are always just playing catchup to linux. Windows 7 was the first version to support >64 cpus, and they discovered it didn’t scale well? SGI were building linux machines with up to 512 processors several years before that.
Also as you point out, they are not only one of the least flexible platforms currently available but most of what little flexibility they have isn’t actually available to users, who can’t really customise it like they can with linux/bsd and only have a small set of prebuilt options to choose from.
That is mostly a moot point since the SGI Linux machines were not actually running stock Linux. It was a highly customized port. Limits like this exist even in modern OSs such as Solaris. Solaris, even in its latest incarnation, has a limit of 256 CPUs. This is particularly relevant since their newest dual-cpu machines actually have 512 vcpus (2 cpus x 32 cores x 8 threads = 512 vcpus). And they also have 4-way machines with 1024 vcpus. That means that the newest Solaris (still an incredibly solid and feature-rich OS) can only address the total threads of a single latest-generation SPARC socket, and if you have multiple sockets you need virtualization to use them all.
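The thread-count arithmetic above is easy to check with a short Python sketch. The 256-CPU limit and the 32 cores x 8 threads per socket are taken from the comment itself, not independently verified; the names are made up for the example.

```python
# Limit quoted in the comment above (an assumption, not a verified Solaris figure).
QUOTED_CPU_LIMIT = 256


def hardware_threads(sockets, cores_per_socket=32, threads_per_core=8):
    """Total hardware threads (vcpus) for a machine with the given layout."""
    return sockets * cores_per_socket * threads_per_core


if __name__ == "__main__":
    for sockets in (1, 2, 4):
        total = hardware_threads(sockets)
        fits = "fits" if total <= QUOTED_CPU_LIMIT else "exceeds the limit"
        print(f"{sockets} socket(s): {total} vcpus ({fits})")
```

Under those figures only the single-socket configuration stays within the quoted limit, which is the point being made about needing virtualization on multi-socket machines.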
Microsoft’s relationship with SMP is actually interesting, because in the early NT 3.1 days, the magazine articles were quite critical of their decision to exclude asymmetrical multiprocessing and only focus on SMP. Windows NT actually had very wide limits for the time; it was the bootloader/firmware/HAL that limited the actual possibilities of the OS. With a few minor bootloader hacks you can boot NT 3.1 i386 on systems with 4GB of RAM or multiprocessor systems. If the bootloader can describe them and the HAL can initialize them (like the Intel MPS HAL), all Windows NT systems (3.x and 4.x) can use up to 32 CPUs and up to 4GB of RAM for 32-bit. Those were astronomical limits in the era of the Intel 486 and 16MB of RAM.
When you talk about scalability, you must distinguish between two different scalabilities:
-Horizontal scale-out (many PCs on a fast switch) scalability. Here we have SGI UV3000, Altix, ScaleMP servers. These clusters can have as many as 10,000s of cores and 100s of cpus. Supercomputers can have 100,000s of cpus. All these are clusters and are mainly used for HPC number crunching computations.
-Vertical scale-up (one single large server) scalability. Here we have Mainframes and RISC servers, such as the POWER-based IBM P795 or the SPARC M10-4S and M8 servers. These large servers typically have 16 or even 32 cpus, and weigh 1,000 kg or more. These servers are mainly used for business workloads, such as SAP, OLTP databases, ERP, etc.
.
Here is the main difference: Clusters cannot run business workloads. There is no company that runs OLTP databases or SAP on a cluster because it would be too slow. Try to google and see if you find anyone running SAP on a SGI UV3000 server. There are none. Why? Let me explain.
.
Linux has always excelled on clustered workloads, i.e. HPC number crunching. These workloads are embarrassingly parallel, i.e. easy to run in parallel. HPC typically runs an equation solver on a small grid, over and over again, for instance CFD scientific computations solving the Navier-Stokes differential equations. HPC workloads can run separately on each cpu just fine. Not much communication is going on between the CPUs. This workload is excellent for clusters. Typically, these large clusters serve one engineer who starts up a task that runs for days or weeks. The whole cluster is dedicated to one user.
.
However, clusters cannot run Business Enterprise workloads such as SAP, OLTP Databases, ERP, etc. The reason is that business servers serve 1000s of clients simultaneously. One is doing accounting, another payroll, etc. All this data from 1000s of users does not fit into the cpu cache, so you need to go out to RAM all the time. Another huge problem is that the business source code branches heavily, payroll to accounting to… So the cpus must communicate a lot, with mutexes and locks, so the business database is correctly updated. So business workloads must synchronize a lot.
Because the business software goes out to RAM all the time (data from 1000s of clients does not fit into cpu cache), RAM latency dominates, and it is typically around 100 ns. That corresponds to a 10 MHz cpu. Remember these? Ouch. They are slow. This assumes all the data is collected into one large server. Even worse, if the data is spread out across a cluster then reaching another PC has worse latency, maybe 500 ns? That would correspond to a 2 MHz cpu. Nobody can run critical business workloads on a 2 MHz cpu. This is the reason clusters cannot run business workloads.
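The MHz figures above come from treating one memory access as one clock cycle, so the equivalent frequency is just the reciprocal of the latency. A tiny Python sketch of that conversion follows; the function name is made up, and the one-access-per-cycle model is the comment's simplification, not a real performance model.

```python
def equivalent_mhz(latency_ns):
    """Frequency in MHz of a clock whose period equals the given latency.

    f = 1 / t, and 1 / (1 ns) = 1000 MHz, so f_MHz = 1000 / t_ns.
    """
    return 1000.0 / latency_ns


if __name__ == "__main__":
    for label, latency in (("local RAM", 100), ("remote node (guess)", 500)):
        print(f"{label}: {latency} ns -> {equivalent_mhz(latency):g} MHz equivalent")
```

This ignores caches, prefetching and overlapping of memory requests, so it is an upper bound on the pessimism rather than a measurement.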
.
All earlier 512-cpu Linux servers (SGI Altix) are clusters. They are only fit for parallel HPC workloads. Until last year, there were no large scale-up business servers for Linux. The largest scale-up business servers for Linux were ordinary 8-cpu Xeon servers. If you needed to run large business workloads, you had no choice other than Mainframes or RISC. Today Fujitsu sells one 64-cpu SPARC business server, the M10-4S.
So when people say that “Linux does not scale” they talk about scale-up business servers. Linux has always scaled well on scale-out clusters. Supercomputers all run Linux. Until last year, the largest Linux scale-up business server had 16 cpus: the Bullion x86 server, which had bad performance. So how could Linux be optimized for large 32-cpu servers when they did not exist? It is impossible for Linux to scale well on large business servers, because such large servers did not exist.
All SAP benchmarks belong to SPARC, just check the top list. The Linux servers come far below.
Let’s look at some scale-up business servers in more detail. If we look at the Bullion 16-cpu Linux server, we see it has bad topology:
https://deinoscloud.files.wordpress.com/2012/10/bullions-bcs.png
If you need to access data from one red cpu on another red cpu, it takes as many as 3 hops. That introduces a lot of latency. This is only for 16 cpus.
If you look at the latest Intel 8-cpu Platinum topology, the worst case is 2 hops. Not all cpus are directly connected to each other; you need to go through another cpu.
So if you tried to do a 32-cpu x86 server, it would be 4 hops, or maybe even 5 hops? That would be very bad. Contrast that to a SPARC M6 32-cpu server. In the worst case, it is one hop. That is very good, and that is why SPARC is so fast for business workloads:
https://regmedia.co.uk/2013/08/28/oracle_sparc_m6_bixby_interconnect…
And if you need a 96-cpu SPARC server that is also one hop in worst case:
https://images.techhive.com/images/idgnsImport/2013/08/id-2047607-sp…
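The "worst case hops" figure discussed above is the diameter of the interconnect graph, which can be computed with a breadth-first search from every cpu. The Python sketch below uses two made-up topologies (a fully connected set and a simple ring), not the real Bullion, Intel or SPARC layouts, and it assumes the graph is connected.

```python
from collections import deque


def worst_case_hops(links):
    """Largest shortest-path hop count between any two nodes (graph diameter).

    links maps each node to the list of nodes it is directly connected to.
    """
    worst = 0
    for start in links:
        dist = {start: 0}
        queue = deque([start])
        while queue:  # plain breadth-first search from this start node
            node = queue.popleft()
            for neighbour in links[node]:
                if neighbour not in dist:
                    dist[neighbour] = dist[node] + 1
                    queue.append(neighbour)
        worst = max(worst, max(dist.values()))
    return worst


def fully_connected(n):
    """Hypothetical topology: every cpu has a direct link to every other cpu."""
    return {i: [j for j in range(n) if j != i] for i in range(n)}


def ring(n):
    """Hypothetical topology: each cpu is linked only to its two neighbours."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}


if __name__ == "__main__":
    print("fully connected, 4 cpus:", worst_case_hops(fully_connected(4)))
    print("ring, 8 cpus:", worst_case_hops(ring(8)))
    print("ring, 16 cpus:", worst_case_hops(ring(16)))
```

A fully connected layout stays at one hop no matter how many cpus there are, but needs a link per pair; sparser layouts save links and pay for it in hops as the cpu count grows.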
This is the reason RISC servers scale much better than Linux on x86 servers – we are talking about business servers. Scale-up. Unix has scaled to 32 cpus for decades and has had a long time to optimize for such huge servers. Linux has not had time to mature on large 32-cpu servers. It takes decades.
Consider the latest and shiniest HPE 32-cpu server with 896 cores. The datasheet (check the links) says that the server is built up from “4-cpu building blocks”. So if you need more cpus, you connect more 4-cpu blocks. I.e. it is a cluster built from 4-cpu blocks. That means there are a lot of hops from one 4-cpu block to another. So scalability is bad, and SAP is still fastest on SPARC. This HPE server is no threat to RISC servers when running business workloads.
One SAP installation can cost several hundred million dollars. The IBM P595 with 32 cpus used for the old TPC-C record cost $35 million. No typo. One single 32-cpu server! So business servers are very lucrative, and more expensive than a cluster. Everybody wants to build large business servers. Linux is nonexistent in the large scale-up workloads market segment. The large business server segment belongs exclusively to Mainframes and RISC servers. Linux has 0% market penetration there.
BTW, SAP HANA is exclusively for analytics; it is designed to run on a cluster. It is not SAP OLTP business. SAP HANA is only for analytics, embarrassingly parallel workloads.
You realize, don’t you, that there is more than just Windows, Linux, BSD, and MacOS out there? That are widely used?
In that context, it is “one of the most” flexible. There isn’t a lot out there that can power both a smart phone and a massive server with hundreds of CPUs, and there are a ton of actively used, actively developed kernels out there.
And, just because Microsoft doesn’t sell a super-scaled down version of Windows doesn’t mean it can’t be scaled down to minuscule size. Microsoft has indeed demoed a 22MB system image serving webpages.
When it comes to “widely used” there’s not much else, maybe QNX and VxWorks in the embedded space… Everything else is pretty much dead/unsupported these days, or extremely niche.
And 22MB is not what I’d consider small, and this system was a demo rather than anything useful. On the other hand, there are many embedded devices out there with less than 22MB of flash which manage to host a web server (typically with an interactive UI for management).
There’s also Solaris, AIX, zOS…
Currently, FreeBSD supports up to 256 CPUs I believe, and is only fully supported on x86/amd64. It is Tier-2 on ARM, and should be Tier-1 fairly soon (I think that’s the goal for 12.0).
NetBSD runs on a wide range of small hardware, but doesn’t scale up to large systems, and even if it boots on, say, 128 CPUs, isn’t going to run well.
OpenBSD runs on a wide variety of hardware, but isn’t going to scale up to more than a handful of cores, and even then, poorly.
AIX is widely used (Not nearly as widely as Linux, but it isn’t obscure). It runs on POWER-based servers, but nothing else.
zOS is still widely used in its market, but it is just the one, small (yet very profitable) market. Its appearance in datacenters isn’t uncommon.
Solaris is something certainly widely used in the datacenter, but pretty much server only. Yeah, it runs on amd64 desktops, but nobody does that anymore.
No, they don’t.
Linux officially supports a total of 22 architectures right now (counting AArch64 and 32-bit ARM as one and ignoring User-Mode Linux). I can still name a dozen it doesn’t run on, including stuff that’s still ‘current’. Of those that it does support, most distributions only support 5-6 (typically a subset of x86, ARM, PPC, S/390, SPARC, MIPS, and IA-64) or even fewer than that.
BSD is even more restricted. NetBSD has the best cross-platform support, but only supports 9 architectures (counting variant forms as equivalent), with only 5 of those having full support (which puts the full-support list in line with what FreeBSD does). There are some special variants of BSD based on much older code that run on other platforms (Microchip PIC MCUs, for example).
Yes, they do have far better cross-platform support than Windows does, but saying they run on everything is just plain false.
ahferroin7,
I don’t know that it’d ever be practical for real projects today, but it’s quite impressive anyway.
Being efficient on microcontrollers these days requires skills that were more common among mainstream programmers many decades ago. Technically, Arduino makes it easy for modern programmers to program them, but the performance of the built-in abstractions (like analogRead) is terrible. The underlying ATmega chips are remarkably fast, but you need to cut through the library bloat and program the registers directly.
Yep, people used to think I was crazy back when I actually worked with them because I preferentially used hand-written assembly for my projects. They generally changed their minds when I showed them an Arduino doing things they thought were impossible on one. Of course, I always found this even funnier because AVR assembly is one of the easier assembly languages to learn.
ahferroin7,
I tried to find somewhere to buy hardware, but all the links I’m finding seem to be broken or gone, is it just me?
To be honest, I believe most embedded projects where Linux is warranted are better off using cheap ARM/MIPS SBCs, where you can get the real deal with fewer compromises and better support.
For highly embedded stuff, there are also always OSes like Contiki …which at least has an IPv6 stack (this will become more and more important as the years go by)
It’s been almost 20 years… expecting links to buy it to work would be kinda like expecting, in 1999, offers selling Commodore PET to be still valid.
And isn’t RTEMS, in contrast to RetroBSD, like, Real Time?
The distinction is actually pretty small. RetroBSD should work just fine for soft real-time applications. For hard real-time RTEMS is probably going to continue to be the preferred option, but not everyone needs hard real-time.
… and Windows is one of the last big operating systems, where the kernel is closed source!
Even macOS and iOS have an OpenSource kernel (xnu), tools which are based around it (darwin) and the compiler, which compiles it (llvm/clang). All OpenSource.
And the other systems, like Linux, *BSD, Haiku, etc are completely opensource.
Only Windows is completely closed.
That is also a security problem, because security by obscurity doesn’t exist.
The bad guys are decompiling Windows and finding security holes in it, which they can use to write viruses or worms.
But the good guys are prevented by the license from looking at it.
I know Microsoft is doing a lot in open source: Visual Studio Code, .NET Core, etc. And all of it runs on different operating systems.
The funny thing is, that the tool ProcDump for Linux is opensource
https://github.com/microsoft/procdump-for-linux
where the Windows version – like all Windows SysInternals – doesn’t allow reverse engineering, decompiling, making too many copies of the program, publishing the software for others, etc.
So, there are good reasons to use some Microsoft products like VSCode, .NET Core, etc. But there are no good reasons to use them on Windows.
As I said, Windows is one of the few operating systems where neither the kernel nor the compiler with which it was built is open.
Three years ago, it was “definitely possible”.
https://www.wired.com/2015/04/microsoft-open-source-windows-definite…
And today?
Greetings
theuserbl
Edited 2018-10-27 17:46 UTC
For the record, the iOS kernel is not opensource.
While it is XNU, just as macOS is, you can’t use the public XNU sources to build the iOS kernel, and patches and modifications made to XNU for iOS are also not public.
And what about news like this?
https://techcrunch.com/2017/10/01/apple-open-sourced-the-kernel-of-i…
https://appleinsider.com/articles/17/10/02/apple-posts-arm-compatibl…
There is no macOS for ARM. So the ARM part is for iOS.
I stand corrected.
Windows NT surpasses all current Linux kernels in support, and it cleanly isolates its kernel and hardware from software space… whereas Linux is not secure when it gets attacked through CPU execution from the software layer, as with those 2 bugs… such an attack would never work against NT, QNX, macOS, or DragonFly BSD… They can patch it, but they know it is impossible to expose the CPU to a software layer that has no access to the hardware kernel… because the operating system is secure by design… That is one ultimate benefit in the long run… Make a microkernel with a monolithic software layer, or run multiple kernels like QNX does, and it still gets the job done… while Linux is chaos by design, because execution can’t be separated by the OS… Linus designed it that way and long ago ignored… the dangers of a rogue developer.