“With chip makers continuing to increase the number of cores they include on each new generation of their processors, perhaps it’s time to rethink the basic architecture of today’s operating systems, suggested Dave Probert, a kernel architect within the Windows core operating systems division at Microsoft.”
“Probert is on the team working on the next generation of Windows, though he said the ideas in this talk did not represent any actual work his team is doing for Microsoft. In fact, he noted that many of the other architects on the Windows kernel development team don’t even agree with his views.”
Makes you wonder what those other architects are thinking…
…create a broader and more powerful instruction set that allows an operating system to off-load more of its logic directly to the hardware. Heck, the chips are becoming more and more powerful every time we look… why not begin blurring the line a bit? I'm not suggesting embedding the entire OS, just common OS logic, maybe even data structures and the typical methods used in implementing that logic (some already exist, right?)
And they should add an instruction to calculate polynomials, too!
Think about it tho’… as the technology becomes more complex, miniaturization continues… why not? Why not give the hardware more intelligence, removing software layers that do the same?
You missed the joke. The VAX had an instruction to calculate polynomials back in the late 70s/80s. It was the king of the CISC architecture model. CISC is now considered dead. Even Intel has moved away from the CISC approach. It’s much easier for the hardware, and the software, to have simple instruction sets that can be executed efficiently than complex instruction sets that may or may not end up being useful, but waste a lot of die space and may cause a reduction in overall efficiency (just to be able to support the advanced instructions).
Remember also that it’s easy to upgrade software. Not so much with hardware. The intelligence should be in the software.
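For anyone else who missed the reference: the VAX POLY instruction took an argument, a degree, and a coefficient table and evaluated the polynomial via Horner's rule in a single, heavily micro-coded instruction. A rough software equivalent, as a sketch (the coefficient ordering here is illustrative, not the exact VAX operand format):

```c
#include <stddef.h>

/* Horner's rule: evaluates c[0]*x^(n-1) + c[1]*x^(n-2) + ... + c[n-1].
   This is roughly what the VAX POLY instruction did in microcode. */
double poly(double x, const double *c, size_t n)
{
    double r = 0.0;
    for (size_t i = 0; i < n; i++)
        r = r * x + c[i];
    return r;
}
```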
I did miss the joke.
I agree and disagree. If something is so complex that it might change, or the nature of the operation itself might change, then I agree, it belongs in software. But if you can manage to move those eternal algorithms to the die, then why not? Especially given the constant process shrinks… sooner or later we will be cramming transistors into cell-sized spaces. If we have the ability to cram so much into so little space, would it really be wasting space to add functionality? Sure, it could be in some form factors, but in general computing?
But I see your point. Faster, cleaner pipelines enable complex software to run faster. I just think at some point the trend will reverse itself and “smarter” hardware is going to make a comeback.
Do you remember when VMS was ported to the Alpha? Ho boy! There was some trouble. Customers who relied on the exceptional math computation speed of the VAX were furious when they discovered that, despite the faster Alpha chip, math algorithms took several times longer to compute because they were implemented in software, not hardware (which had to do with the word size moving from 32 to 64 bits).
I remember that fairly well, processes that took a couple hours to run were taking days. Ouch. I believe that was fixed, eventually.
And yeah, that was all a little off-topic. Sorry.
they did, it’s called SPARC
I am remembering my days at DEC. The VAX had a rich, extended instruction set that allowed the computers (running VMS) to perform very well.
ah yes, those were the days
I would like OS kernels to make use of GPU computation power. Modern GPUs provide massive parallel computation power, and nearly all of it goes unused outside of 3D applications.
GPUs are specialized for doing parallel arithmetic operations. Specifically, a GPU can perform a vector sum in one cycle, whereas the CPU would require more cycles. The problem with your suggestion is that the kernel doesn’t spend most of its time doing vector sums, but instead things like interrupt handling and context switching.
I think that when we watch Flash videos on YouTube, listen to MP3s, play movies, or use compression or cryptography software, we make heavy use of SIMD instructions and vector arithmetic. So why not use some GPU muscle?
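For a concrete illustration of the kind of data-parallel work being talked about, here's a minimal sketch in C using SSE intrinsics (it assumes an x86 CPU with SSE, aligned pointers, and a length that's a multiple of 4; a GPU would run the same idea across hundreds of lanes at once):

```c
#include <stddef.h>
#include <xmmintrin.h>  /* SSE intrinsics */

/* Add two float arrays four elements at a time.
   Assumes n is a multiple of 4 and the pointers are 16-byte aligned. */
void vadd(const float *a, const float *b, float *out, size_t n)
{
    for (size_t i = 0; i < n; i += 4) {
        __m128 va = _mm_load_ps(&a[i]);
        __m128 vb = _mm_load_ps(&b[i]);
        _mm_store_ps(&out[i], _mm_add_ps(va, vb));
    }
}
```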
True. We should have a model where people can offload that work to the GPU. We're starting down that road (DXVA for video, DirectCompute for more generic stuff). We're doing our best.
Adobe is doing just that, starting with Nvidia chipsets. It’s Windows only for now (as far as I know anyway), but it looks promising.
http://www.nvidia.com/object/io_1243934217700.html
Isn't OpenCL a standard for doing exactly that, and Snow Leopard the first OS to implement it?
The approach described (making the OS more like a hypervisor, with apps performing their own resource management) sounds like the Exokernel / vertically structured OS (e.g. Nemesis) research from a while back. The idea of partitioning different kinds of code onto different CPUs also resembles the Piglet asymmetric multiprocessing OS prototype (IIRC that was a Linux that could dedicate a CPU to kernel stuff and a CPU to applications; something like that, anyhow).
It’s too bad that no one ever thought about writing an operating system that was specifically designed for multiple processors, that was pervasively multi-threaded and was super responsive under load. That would Be a great idea. It would be neat, too, if there were a free (as in beer and speech) version that anyone could download and try out. What might a name for an operating system like that be? Maybe something vaguely poetic, implying nature, simplicity and compactness…
Haiku was designed for relatively few cores; it'll hit scaling problems as the number of cores increases, and it will have no idea how to deal with non-cache-coherent systems.
I know of only one “real” OS designed to solve this: DragonFly BSD.
To clarify my previous point, the article is about scaling beyond the relatively small number of cores we have now and going into the so-called “many-core” area.
The hardware is also likely to end up quite different from what we have now. Today, systems are kept in sync by cache coherence, but this itself has scaling problems, so in the future we'll see non-cache-coherent systems; Intel's Single-chip Cloud Computer is an example.
Desktop OSs simply aren't designed for this sort of hardware, and as far as I know DragonFly BSD is the only one working on this for the desktop/small server. There are other OSs, but they are big-iron or research OSs (e.g. Barrelfish).
Interestingly BeOS/Haiku does have one of the key elements in place already – the API uses message passing. So it’s probably a lot better placed for future systems than most OSs.
What would be really interesting is if you were to combine the DragonFly kernel with the Haiku user land. That’d give you a highly scalable, truly desktop OS.
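To illustrate what "the API uses message passing" means in practice, here's a minimal, generic sketch of a message port in C with pthreads. This is not the actual BeOS/Haiku BMessage API, just the shape of the idea: components exchange copies of small messages instead of sharing mutable state, which maps much more naturally onto cores that don't share a coherent cache.

```c
#include <pthread.h>
#include <string.h>

#define PORT_CAP 64

typedef struct {
    int  what;        /* message code, e.g. "draw", "quit" */
    char data[56];    /* small payload, copied by value */
} msg_t;

typedef struct {
    msg_t           buf[PORT_CAP];
    int             head, tail, count;
    pthread_mutex_t lock;
    pthread_cond_t  nonempty;
} port_t;

void port_init(port_t *p)
{
    memset(p, 0, sizeof *p);
    pthread_mutex_init(&p->lock, NULL);
    pthread_cond_init(&p->nonempty, NULL);
}

void port_send(port_t *p, const msg_t *m)   /* never blocks; drops if full (sketch) */
{
    pthread_mutex_lock(&p->lock);
    if (p->count < PORT_CAP) {
        p->buf[p->tail] = *m;               /* copy; no shared ownership */
        p->tail = (p->tail + 1) % PORT_CAP;
        p->count++;
        pthread_cond_signal(&p->nonempty);
    }
    pthread_mutex_unlock(&p->lock);
}

msg_t port_receive(port_t *p)               /* blocks until a message arrives */
{
    pthread_mutex_lock(&p->lock);
    while (p->count == 0)
        pthread_cond_wait(&p->nonempty, &p->lock);
    msg_t m = p->buf[p->head];
    p->head = (p->head + 1) % PORT_CAP;
    p->count--;
    pthread_mutex_unlock(&p->lock);
    return m;
}
```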
Oops.
It certainly would Be a great idea! Wink wink….nudge nudge!
A new approach to OS design is needed to exploit parallelism. Hm.
“Responsiveness really is king,” he said. “This is what people want.” Hm.
Flashback to March 2000 and BeOS R5 anybody? Or maybe even earlier with R4.5?
I say “Well if ya hadn’t killed it in its crib it would be a teenager by now, what do ya think of that, smarty-pants?”
There is nothing new under the sun.
Also, regarding GPUs, the original BeBox included a DSP, which is not the same, but is very similar…
I still have my R4.5 CDs and BeOS Bible. I don’t have anything it will install on, but I still have it.
Experiencing BeOS was bordering on a religious experience back then. It was doing things with my crappy hardware that no other operating system could do, and all the other desktop OS vendors started to try to emulate Be.
Unfortunately it's been kind of forgotten about. Microsoft kind of started in that direction, and then they produced Vista and Windows 7. Apple added Spotlight, and the Unix/Linux crowd couldn't really care less.
It’s kind of scary to think of what we could have had, isn’t it?
“The programs, or runtimes as Probert called them, themselves would take on many of the duties of resource management. The OS could assign an application a CPU and some memory, and the program itself, using metadata generated by the compiler, would best know how to use these resources.”
Ummm….this sounds exactly like the kind of resource management that’s been used in Mainframe programming for decades. Some high-end business applications would grab chunks of memory and CPU from the OS and then dole it out internally. I may be wrong, but I believe this is what CICS does since the OS was just too slow to keep up with the transaction volume.
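That pattern is easy to sketch: the application asks the OS/runtime for one large block up front and then doles it out internally with its own allocator. A minimal illustrative arena in C (the names and the policy here are made up for the example, not taken from CICS or anything else):

```c
#include <stddef.h>
#include <stdlib.h>

/* Application-level arena: one big block from the OS, doled out internally. */
typedef struct {
    char  *base;
    size_t size;
    size_t used;
} arena_t;

int arena_init(arena_t *a, size_t size)
{
    a->base = malloc(size);          /* single request to the OS/runtime */
    a->size = size;
    a->used = 0;
    return a->base ? 0 : -1;
}

void *arena_alloc(arena_t *a, size_t n)
{
    n = (n + 15) & ~(size_t)15;      /* keep allocations 16-byte aligned */
    if (a->used + n > a->size)
        return NULL;                 /* the app, not the kernel, decides what to do now */
    void *p = a->base + a->used;
    a->used += n;
    return p;
}
```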
This doesn’t address the main problem, which is that user software is not pervasively threaded.
Assigning each processor to a process doesn’t fix the single-thread performance ceiling, and it doesn’t let single-threaded processes utilize more than one processor when those resources are available.
I don’t see what problem this approach intends to solve. Next to user programs, modern OS kernels tend to be comparatively brilliant at multi-threading (not that there’s no room for improvement), and nothing in this proposal endows existing user programs with shiny new powers of parallelism.
Welcome back, after a long hiatus! I agree pretty strongly with you.
I think the problem, though, isn't even the apps. You want to run the typical app on as few cores as possible anyway, because you're using less power that way. And you still have to have acceptable performance when you're running on the machines of today (often netbooks), so why code for the super-fancy quad-core as well? What do you get out of it as an ISV?
To really make use of the extra CPUs, we need to change our vision of what we do with computers. Multicore isn't worth it if it doesn't improve someone's life. For instance, if we had a highly parallel application that could do image processing, or voice recognition, or machine learning, and save someone some time or entertain them in some way, this compute power would be worth something. But you also have to factor in the power consumed to achieve that.
Large parallelism is an obvious win in the server space, where there are usually a lot of independent pieces of work to do from many users, but so far it has been hard to translate down to the client, where there's only one user, except in gaming/graphics applications.
Programs aren't pervasively multithreaded, because the hardware and software platforms cause diminishing returns for the hours a programmer puts into the work.
So, we shouldn’t make it easier to use resources in a more parallel fashion, because programmers aren’t already doing it well.
Isn’t that a bit of a chicken-and-egg situation?
By making it easier for developers to make use of those resources, writing parallel applications becomes easier, and we'd see more of them being made. This idea is one of many for tackling that problem, this time by just getting out of the way.
Ideally, you won't end up needing to code for a specific number of cores; instead, the whole dev stack, from the bottom up, will make it easier to use many processes and threads than not to, so that gains from having more cores become as automatic as gains from a faster CPU are today… but without having to use functional languages everywhere to do it.
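Today that fork/join still has to be spelled out by hand. A minimal pthreads sketch of splitting a loop across cores (NTHREADS and the per-element work are placeholders), i.e. the sort of thing the tooling would ideally do for you:

```c
#include <pthread.h>
#include <stdio.h>

#define NTHREADS 4
#define N 1000000

static double data[N];

struct slice { size_t begin, end; };

static void *work(void *arg)
{
    struct slice *s = arg;
    for (size_t i = s->begin; i < s->end; i++)
        data[i] = data[i] * 2.0;   /* placeholder per-element work */
    return NULL;
}

int main(void)
{
    pthread_t tid[NTHREADS];
    struct slice slices[NTHREADS];

    /* Split the range into one contiguous slice per thread. */
    for (int t = 0; t < NTHREADS; t++) {
        slices[t].begin = (size_t)t * N / NTHREADS;
        slices[t].end   = (size_t)(t + 1) * N / NTHREADS;
        pthread_create(&tid[t], NULL, work, &slices[t]);
    }
    for (int t = 0; t < NTHREADS; t++)
        pthread_join(tid[t], NULL);

    printf("done\n");
    return 0;
}
```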
Hey! Nobody uses pipes these days? I do – quite often. If I have several programs in a pipe, they can run simultaneously on multiple cores. So the feature has been there since… well, before I was born.
This kind of scalability mixes very well with the Unix approach: one application for one task; to perform more complex tasks, combine applications. Running many applications leads to many simultaneous processes and better utilization of multicore CPUs.
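And that shell feature really is plain OS machinery underneath. A rough sketch of what the shell does for a two-stage pipeline like `seq 1000000 | wc -l`, using POSIX pipe/fork/exec: each stage is its own process, so the scheduler is free to run them on different cores at the same time.

```c
#include <unistd.h>
#include <sys/wait.h>

int main(void)
{
    int fd[2];
    if (pipe(fd) == -1)
        return 1;

    if (fork() == 0) {              /* producer, like the first command in the pipe */
        dup2(fd[1], STDOUT_FILENO);
        close(fd[0]); close(fd[1]);
        execlp("seq", "seq", "1000000", (char *)NULL);
        _exit(127);
    }
    if (fork() == 0) {              /* consumer, like the second command */
        dup2(fd[0], STDIN_FILENO);
        close(fd[0]); close(fd[1]);
        execlp("wc", "wc", "-l", (char *)NULL);
        _exit(127);
    }
    close(fd[0]); close(fd[1]);
    wait(NULL); wait(NULL);         /* both stages ran concurrently */
    return 0;
}
```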