Neat
Hopefully that'll mean that in the future one can create a truly robust operating system where, if one kernel crashes on a massive multicore monster, the isolation will mean the rest keep running without a hitch. I do, however, wonder what the overhead is, how complex it would be to use for a general purpose operating system, and whether such a concept can be retrofitted to an existing operating system.
IIRC, "Message Passing" is Latin for "Slower than the January Molasses".
All that hardware bandwidth. All that potential for fast, low-latency IPC mechanisms. And it gets wasted, killed by latency, passing messages back and forth.
I always knew that the fantastically powerful computers of the future, running the software of the future, would perform significantly more poorly than what we have today. And this concept may just be a glimpse of how that future is to unfold.
You don't think the hardware will adapt?
I doubt it, seems logical that new processors will be designed with pipelines facilitating nanosecond IPC.
Microsoft just provided very costly R&D to the IT community free of charge, and is signaling to their partners that theoretical technology is now practical to some extent ....
... and in essence communicating that they should plan accordingly!
It certainly was very costly in terms of time, effort and the skilled labor to produce it.
It's safe to say this end result (the downloadable code) cost at least a million USD to produce.
It served both R&D and PR purposes ... 'cheap' is a matter of perspective, though.
If there is a cadre of hobbyists who have released an alternative, I would love a link!
Most of the work was done by ETH, I doubt it would have cost even a fraction of that to Microsoft. It seems a mostly academic effort, but good to see MS involved.
Even at a million dollars, cheaper than a superbowl ad.
http://www.kernel.org :-)
I see now that you can't help diminishing the effort, and any unique value it has, seemingly because it is either from a commercial source or from Microsoft specifically.
I can't understand the point of the constant negativity, it doesn't help anyone!
It's silly to debate the overall cost, but let's just say it was far more than zero, and no one did it first, to this extent, for free.
Well, free in the sense it's not encumbered by GPL limitations, but MS limitations - free nonetheless.
edit: typed less instead of 'more'
Edited 2009-09-26 06:58 UTC
Not at all! It is great that companies like Microsoft contribute to pure OS research. More companies should take their lead.
Can it, while remaining compatible with the existing architecture? Intel/AMD aren't going to break the backwards compatibility of the x86 arch without a darn good reason. Keep in mind that backwards compatibility is largely why x86 has been so successful.
And if this were easy, wouldn't it already have been done by now? After all, message-passing is also at the heart of the microkernel concept, an idea that has been around for decades but has gone absolutely nowhere because of, as the GP sarcastically pointed out, bad performance.
I suspect they're going to need some hard proof of really dramatic improvements before Intel/AMD will pay attention to them.
I never mentioned x86, you did.
As you can see in the very graphic inlined in the article, this technology was developed for heterogeneous architectures.
Obviously, non-x86 only.
My point that some hardware could well evolve to take advantage of this technology stands unmolested.
It was mentioned down-thread that the dev platform for this was an AMD system.
I don't know which inlined graphic you're referring to, but now having read parts of this:
http://www.barrelfish.org/barrelfish_sosp09.pdf
the 64-bit flavor of x86 is clearly a prime focus of their work (they also mention Sun's "Niagara" system, but I don't know what that is). They explicitly mention AMD Opteron and Intel Nehalem. Barrelfish currently only works on x86-64 systems, though an ARM port is also in the works (they say).
I don't think you understand correctly what they mean by "heterogeneous" here. They're also referring to systems with cores of the same ISA, but just running at different speeds for example.
They seem to be claiming that the hardware is headed in this direction anyway, driven largely by the needs of multicore *server* systems. Section 2 of that pdf is a very interesting read. High-end server hardware is not something I'm familiar with, so my original post may be wrong in assuming that these changes would require incompatible changes to the x86 ISA.
> I doubt it, seems logical that new processors will be designed with pipelines facilitating nanosecond IPC.
> Microsoft just provided very costly R&D to the IT community free of charge, and is signaling to their partners that theoretical technology is now practical to some extent ....
> ... and in essence communicating that they should plan accordingly!
hardware will never adapt. microsoft doesn't make an OS that doesn't require new hardware.
another point, microsoft will never use this in any of their products or they wouldn't have given it away. being message based means it's slow, so they realize it's useless and they're trying to look good by giving away something they've appeared to work on for the last few years (but gave up on in reality because it's useless)
> IIRC, "Message Passing" is Latin for "Slower than the January Molasses".
FYI, their paper does argue that message passing on a multicore architecture is significantly faster than shared memory access on the same machine.
But then they explain they have made the "OS structure hardware-neutral" in 3.2.
So in other words: Let's use message passing since it is fast on our AMD development machine, but if it is too slow on the next gen hardware, we will switch to something else.
Not exactly solving the problem, IMHO.
Edited 2009-09-26 05:11 UTC
> So in other words: Let's use message passing since it is fast on our AMD development machine, but if it is too slow on the next gen hardware, we will switch to something else.
> Not exactly solving the problem, IMHO.
That's not really an actual portrayal of what they said.
Their basic conclusion is that as the number of cores increases, the cost of cache coherency will increase such that updates that span multiple cache lines will be slower than passing a message to each core and letting the update occur locally. There's no real way around this problem, so assuming that core counts continue to increase, a message passing approach like the one they took here will make sense (it already does on large machines; there doesn't seem to be much of an advantage on 4-core machines).
What is architecture specific is the most efficient message passing method. From what I gathered from the paper, a lot of this is handled by the system knowledge base, but even if a future piece of hardware requires a fundamentally different message passing mechanism (like the addition of a dedicated inter-core messaging) it won't require a fundamental change in the organization of the OS.
> IIRC, "Message Passing" is Latin for "Slower than the January Molasses".
> All that hardware bandwidth. All that potential for fast, low-latency IPC mechanisms. And it gets wasted, killed by latency, passing messages back and forth.
> I always knew that the fantastically powerful computers of the future, running the software of the future, would perform significantly more poorly than what we have today. And this concept may just be a glimpse of how that future is to unfold.
All communication, basically, involves messages. It all depends on the sender and receiver. Memory can be viewed as a service that handles read and write requests (messages).
In multi-core systems inter-core communication must go through memory, except for atomic operation coordination, which obviously has to be core-to-core. This results in multiple messages going back and forth for a simple exchange of information:
1. Sender: write data to memory (write msg)
2. Sender: inform receiver of new data (read/write, core-to-core msgs).
3. Receiver: read data from memory (read msg)
4. Receiver: inform sender of reception (read/write, core-to-core msgs).
I have left out all the nasty synchronization details in #2 and #4, but it usually involves atomic updates of a memory address, which can cause core-to-core sync messages, depending on cache state. Also, cache coherency in general might cause lots of messages.
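To make the four steps above concrete, here is a minimal sketch in Python, with threads standing in for cores and `threading.Event` standing in for the notification in #2 and #4. The `Mailbox`, `send`, `receive` and `exchange` names are made up for illustration, and nothing here models real cache behaviour; it only shows the shape of the handshake.

```python
import threading


class Mailbox:
    """Toy shared-memory mailbox (hypothetical): one slot plus two flags."""

    def __init__(self):
        self.slot = None                     # stands in for the shared memory location
        self.data_ready = threading.Event()  # sender -> receiver notification (step 2)
        self.ack = threading.Event()         # receiver -> sender notification (step 4)


def send(box, payload):
    box.slot = payload       # 1. sender writes data to memory
    box.data_ready.set()     # 2. sender informs receiver of new data
    box.ack.wait()           # sender blocks until the receiver has acknowledged
    box.ack.clear()


def receive(box):
    box.data_ready.wait()    # receiver waits for the notification from step 2
    box.data_ready.clear()
    payload = box.slot       # 3. receiver reads data from memory
    box.ack.set()            # 4. receiver informs sender of reception
    return payload


def exchange(payload):
    """Run one full sender/receiver round trip and return what was received."""
    box = Mailbox()
    received = []
    receiver = threading.Thread(target=lambda: received.append(receive(box)))
    receiver.start()
    send(box, payload)
    receiver.join()
    return received[0]
```

Even in this toy version, a single logical message needs a data write, a data read and two notifications, which is the point being made above.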
It is easy to imagine that this could be done faster and in fewer steps if low-level core-to-core communication were provided. All the hardware is already point-to-point.
My point is, that it is not the message passing in u-kernels that gives the overhead. In fact, it is the extra protection (a long story).
Also, shared memory as a programming platform doesn't scale if you code programs the obvious way. You have to know what's going on underneath. It's like cache optimization. You have to know the cache (Lx) and line sizes before you can do a good job. The non-uniform in NUMA does make things better.
I think we have to make memory a high-level abstraction and give OS and middleware programmers more control of what is communicated where.
Hi,
> IIRC, "Message Passing" is Latin for "Slower than the January Molasses".
> All that hardware bandwidth. All that potential for fast, low-latency IPC mechanisms. And it gets wasted, killed by latency, passing messages back and forth.
If you compare 16 separate single-core computers running 16 separate OSs communicating via networking, to 16 separate CPUs (in a single computer) running 16 separate OSs communicating via IPC, then I think you'll find that IPC is extremely fast compared to any form of networking.
If you compare 16 CPUs (in a single computer) running 16 separate OSs using IPC, to 16 CPUs (in a single computer) running one OS, then will the overhead of IPC be more or less than the overhead of mutexes, semaphores, "cache-line ping-pong", scheduler efficiency, and other scalability problems? In this case, my guess is that IPC has less overhead (especially when there are lots of CPUs) and is easier to get right (e.g. without subtle race conditions, etc); but the approach itself is going to have some major new scalability problems of its own (e.g. writing distributed applications that are capable of keeping all those OSs/CPUs busy will be a challenge).
-Brendan
Yes, good points. I would presume that this model would only take down applications that were currently running on the failed core. However, you would have to deal with messages in flight to the running core, so there would be unknown state to clean up. I bet you could easily cycle/reset the core into a known state. So, greater up-time in the long run.
As far as overhead is concerned, they say that native IPC was 420 cycles and the similar message passing implementation cost 757 cycles. That's about 150ns vs 270ns on the 2.8GHz chips they were testing on. However, by breaking the current synchronous approach and using a user RPC mechanism they dropped the message passing cost to 450 cycles on die, and 532 cycles for one hop, with two hops only costing tens of cycles more, which is really starting to become negligible. So, it does cost, but where they excelled was multi-core shared memory updates. But, to get back to your comments, that really is not general purpose computing as of today, as most applications on my Linux box are single threaded. The few apps that aren't single threaded, ffmpeg and Id's Doom3 engine, most likely aren't synchronizing shared memory updates; rather I think they isolate memory access to certain threads and pass commands around via a dispatcher thread. So, it is a pretty specific type of application that excels on Barrelfish. I think they are targeting Google's MapReduce and Microsoft's Dryad.
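For anyone who wants to check the cycle-to-time arithmetic, a quick sketch: at a given clock rate in GHz, time in nanoseconds is just cycles divided by that rate. The cycle counts and the 2.8GHz clock are the figures quoted above; the function and dictionary names are mine.

```python
def cycles_to_ns(cycles, clock_ghz):
    """Convert a cycle count to nanoseconds (1 GHz = 1 cycle per ns)."""
    return cycles / clock_ghz


CLOCK_GHZ = 2.8  # clock speed of the test machine, as quoted above

# Cycle counts as quoted above; the labels are only descriptive.
QUOTED_CYCLES = {
    "native IPC": 420,
    "message passing (synchronous)": 757,
    "user RPC, on die": 450,
    "user RPC, one hop": 532,
}

LATENCY_NS = {name: cycles_to_ns(c, CLOCK_GHZ) for name, c in QUOTED_CYCLES.items()}

if __name__ == "__main__":
    for name, ns in LATENCY_NS.items():
        print(f"{name}: {ns:.0f} ns")
```

The on-die user RPC figure comes out only about 10ns above native IPC, which is why the remaining gap looks negligible.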
Finally, it's important to notice that HW is moving to a message passing type architecture as well. AMD had implemented HyperTransport and Intel now has the QuickPath Interconnect. So, in Barrelfish, the implementation of the message passing on AMD's cpus is based on cache lines being routed via HT. In other words, hardware accelerated message passing. They isolated the transport mechanism from the message passing API, so I believe they could swap in different accelerated transport implementations depending on the architecture it's currently running on.
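As a rough illustration of what "isolating the transport from the message passing API" could look like, here is a sketch in Python. None of these class names come from Barrelfish; `Transport`, `SharedBufferTransport`, `DedicatedLinkTransport` and `MessageAPI` are all hypothetical, and the two transports are plain in-memory queues standing in for real hardware mechanisms.

```python
from abc import ABC, abstractmethod
from collections import deque


class Transport(ABC):
    """Hypothetical interface: how a message physically reaches another core."""

    @abstractmethod
    def push(self, dest_core, msg): ...

    @abstractmethod
    def pop(self, dest_core): ...


class SharedBufferTransport(Transport):
    """Stand-in for a transport built on shared cache lines."""

    def __init__(self):
        self.queues = {}  # one FIFO per destination core, created on demand

    def push(self, dest_core, msg):
        self.queues.setdefault(dest_core, deque()).append(msg)

    def pop(self, dest_core):
        queue = self.queues.get(dest_core)
        return queue.popleft() if queue else None


class DedicatedLinkTransport(Transport):
    """Stand-in for imagined dedicated inter-core messaging hardware."""

    def __init__(self, n_cores):
        self.links = [deque() for _ in range(n_cores)]  # fixed link per core

    def push(self, dest_core, msg):
        self.links[dest_core].append(msg)

    def pop(self, dest_core):
        link = self.links[dest_core]
        return link.popleft() if link else None


class MessageAPI:
    """What the rest of the OS would call; it never sees the transport details."""

    def __init__(self, transport):
        self.transport = transport

    def send(self, dest_core, msg):
        self.transport.push(dest_core, msg)

    def receive(self, core):
        return self.transport.pop(core)


def demo(transport):
    """Send two messages to core 1, then drain its queue (third read is empty)."""
    api = MessageAPI(transport)
    api.send(1, "update page table")
    api.send(1, "flush TLB")
    return [api.receive(1), api.receive(1), api.receive(1)]
```

Calling `demo(SharedBufferTransport())` and `demo(DedicatedLinkTransport(4))` gives the same result, which is the whole idea: callers of `send` and `receive` don't change when the transport underneath does.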
I'd assume that with multiple kernels there would be a thin virtualisation layer sitting on top which makes the 'cluster' of kernels appear as a single image. The result would be the same as when a single machine in a cluster goes offline: there is maybe a momentary stall while the processing is either retried or a failure notice is issued to the end users, the preferable scenario being the former rather than the latter.
> Finally, it's important to notice that HW is moving to a message passing type architecture as well. AMD had implemented HyperTransport and Intel now has the QuickPath Interconnect. So, in Barrelfish, the implementation of the message passing on AMD's cpus is based on cache lines being routed via HT. In other words, hardware accelerated message passing. They isolated the transport mechanism from the message passing API, so I believe they could swap in different accelerated transport implementations depending on the architecture it's currently running on.
Well, it is the same argument I remember having about micro-kernels and the apparent slowness. Although everyone desires the absolute maximum performance, I would sooner sacrifice some speed and have slightly more overhead if the net result is a more stable and secure operating system. Internet Explorer 8, for example, has a higher overhead because of process isolation and separation, but it is a very small price to pay if the net result is a more stable and secure piece of software.
It therefore concerns me when I hear on osnews.com the number of people who decry an increase in specifications off the back of improved security and stability (with a slight performance penalty). Hopefully if such an idea were to get off the ground there wouldn't be a similar backlash because the last thing I want to see is yet another technology that comes in half baked simply to keep the ricers happy that their system is 'teh max speed'. They did it with Windows when moving the whole graphics layer into the kernel, I hope that the same compromise isn't made when it comes to delivering this idea to the real world.
*blink*
Wait a sec, I don't use Windows anymore, so someone please tell me, can you actually run a Win3.1 app today on Win7? Back when I was using XP, I had a number of older Win'98 games that wouldn't work no matter what I did, so it seems MS has been breaking compatibility all along whenever they wanted/needed to.
If Win20 is a bloated monster, I doubt it will be because of compatibility, more likely it'll be due to the size and complexity of that monstrosity to be known as dotNET15.
Yes, as long as you are using the 32-bit version. The 64-bit version cannot run 16-bit software due to processor architecture limitations.
Try running Java of the time, and you'll see that .Net 15 is not that slow after all.
As I am studying "computer science" I will have to disagree with you. Writing software can be science since writing software includes creating algorithms, and algorithms are a fundamental part of computer science, no matter whether the field is artificial intelligence, image analysis or database optimization.
Relax, this is most likely the view of one of the purists who define science as "Acquiring knowledge by applying Scientific Method".
Too bad that definition leaves Math out of it, that's why I refuse to accept such definition, but most people I know gladly accept it. So, it's simply a matter of semantics, I guess.
Please note that I don't mean the word "purist" in an offensive way, more like describing a specific line of thought.
RE[5]: Whats the whole point?
I beg to differ. Algorithmic science is very real, and even if the science is more abstract than, for example, biology or physics, it is a well established science with theories and observations in actual implementations.
You may not agree with me on the definition of science, but I would still argue that computer science should be considered a science and hence writing software is a very fundamental part of exploring that science.
Edited 2009-09-27 16:05 UTC
That's not unfortunate, that is the way it should have been done in the first place. If your OS is so bloated because it has to support older legacy applications, then using a virtual machine instead frees your resources, since for the most part all you have to do is make sure you include a virtual machine to run the apps. It's what Apple did with OS9, and when OSX finally got to a usable state they dropped OS9. MS has to learn how to push 3rd party developers and customers to the latest technology without being afraid of stepping on some toes. There is no obligation for them to support somebody else's old code, and yet they do.
Oeh! Oeh! I know this one!
They have an obligation to shareholders. Windows' backwards compatibility is a major boon for a LOT of people. Ripping it out - as much as I want them to - is not a valid option, money-wise.
People turn on a computer to launch applications, not marvel at the OS or underlying architecture.
But I can see you are one of those people that would break Win32 compatibility and cause hundreds of billions in unneeded migration costs just to have everyone run a system that you think is neat-o.
well, the concept is at least closely related to microkernels.
they split the kernel into these monitors.
I only skimmed parts of it but they're essentially saying that as your system looks more and more like a distributed system you should have an OS that acts accordingly.
They also say the existing mechanisms won't scale whereas message passing does. Once you get to more than ~13 cores traditional IPC is actually slower.
> Most of us are probably aware of Singularity,
> a research operating system out of Microsoft
> Research which explored a number of new ideas,
> which is available as open source software.
The difference on the license side between Singularity and Barrelfish is that Singularity is open source
http://www.codeplex.com/singularity/license
but Barrelfish is OpenSource (as in the OSI definition, or free software as in the FSF definition):
http://www.barrelfish.org/release_20090914.html
From the technical point of view, the two operating systems are very different and cannot be compared to each other.
But from the license side, Barrelfish is a lot better! :-)
I find it odd that Microsoft, of all companies, is giving this away. I bet they never plan to use it in any of their products - likely they consider it a failure and are using it for PR.
Message based? slower than molasses in january - rendered on a 486 with 8mb of ram with 3DStudio Max
To which I just had to give this a -1. It also gives the rest of us the power to mod down and disappear imbecile trolls, you know like yourself. Enjoy, have a nice day.
p.s. Was that ad hominem calling you an imbecile? How does jackass work?
It helps if you think of Microsoft Research as our time's Xerox Palo Alto Research. Lots of good ideas and interesting research that is being largely ignored by the company and not put into any of their products.


