Linked by Thom Holwerda on Fri 25th Sep 2009 23:12 UTC, submitted by Still Lynn
Most of us are probably aware of Singularity, a research operating system out of Microsoft Research which explored a number of new ideas and is available as open source software. Singularity isn't the only research OS out of Microsoft; they recently released the first snapshot of a new operating system called Barrelfish. It introduces the concept of the multikernel, which treats a multicore system as a network of independent cores, using ideas from distributed systems.
Thread beginning with comment 386376
RE: Comment by kaiwai
by sbergman27 on Sat 26th Sep 2009 04:52 UTC in reply to "Comment by kaiwai"
sbergman27
Member since:
2005-07-24

I do, however, wonder what the overhead is,

IIRC, "Message Passing" is Latin for "Slower than the January Molasses".

All that hardware bandwidth. All that potential for fast, low-latency IPC mechanisms. And it gets wasted, killed by latency, passing messages back and forth.

I always knew that the fantastically powerful computers of the future, running the software of the future, would perform significantly more poorly than what we have today. And this concept may just be a glimpse of how that future is to unfold.

Reply Parent Score: 5

RE[2]: Comment by kaiwai
by kad77 on Sat 26th Sep 2009 05:07 in reply to "RE: Comment by kaiwai"
kad77 Member since:
2007-03-20

You don't think the hardware will adapt?

I doubt it; it seems logical that new processors will be designed with pipelines facilitating nanosecond IPC.

Microsoft just provided very costly R&D to the IT community free of charge, and is signaling to their partners that theoretical technology is now practical to some extent...

... and in essence communicating that they should plan accordingly!

Reply Parent Score: 2

RE[3]: Comment by kaiwai
by tobyv on Sat 26th Sep 2009 05:14 in reply to "RE[2]: Comment by kaiwai"
tobyv Member since:
2008-08-25

Microsoft just provided very costly R&D to the IT community free of charge


Very costly R&D, or very cheap PR?

Reply Parent Score: 2

RE[3]: Comment by kaiwai
by Ed W. Cogburn on Sat 26th Sep 2009 07:00 in reply to "RE[2]: Comment by kaiwai"
Ed W. Cogburn Member since:
2009-07-24

You don't think the hardware will adapt?


Can it, while remaining compatible with the existing architecture? Intel/AMD aren't going to break the backwards compatibility of the x86 arch without a darn good reason. Keep in mind that backwards compatibility is largely why x86 has been so successful.

And if this were easy, wouldn't it already have been done by now? After all, message-passing is also at the heart of the microkernel concept, an idea that has been around for decades but has gone absolutely nowhere because of, as the GP sarcastically pointed out, bad performance.

I suspect they're going to need some hard proof of really dramatic improvements before Intel/AMD will pay attention to them.

Reply Parent Score: 3

RE[3]: Comment by kaiwai
by Ender2070 on Sun 27th Sep 2009 04:18 in reply to "RE[2]: Comment by kaiwai"
Ender2070 Member since:
2009-07-24

You don't think the hardware will adapt?

I doubt it; it seems logical that new processors will be designed with pipelines facilitating nanosecond IPC.

Microsoft just provided very costly R&D to the IT community free of charge, and is signaling to their partners that theoretical technology is now practical to some extent...

... and in essence communicating that they should plan accordingly!


Hardware will never adapt; Microsoft doesn't make an OS that doesn't require new hardware.

Another point: Microsoft will never use this in any of their products, or they wouldn't have given it away. Being message-based means it's slow, so they realize it's useless and are trying to look good by giving away something they've appeared to work on for the last few years (but in reality gave up on because it's useless).

Reply Parent Score: 2

RE[2]: Comment by kaiwai
by tobyv on Sat 26th Sep 2009 05:10 in reply to "RE: Comment by kaiwai"
tobyv Member since:
2008-08-25

I do, however, wonder what the overhead is,
IIRC, "Message Passing" is Latin for "Slower than the January Molasses".


FYI, their paper does argue that message passing on a multicore architecture is significantly faster than shared memory access on the same machine.

But then, in section 3.2, they explain that they have made the "OS structure hardware-neutral".

So in other words: Let's use message passing since it is fast on our AMD development machine, but if it is too slow on the next-gen hardware, we will switch to something else.

Not exactly solving the problem, IMHO.

Edited 2009-09-26 05:11 UTC

Reply Parent Score: 1

RE[3]: Comment by kaiwai
by Mike Pavone on Sat 26th Sep 2009 15:04 in reply to "RE[2]: Comment by kaiwai"
Mike Pavone Member since:
2006-06-26

But then they explain they have made the "OS structure hardware-neutral" in 3.2.

So in other words: Let's use message passing since it is fast on our AMD development machine, but if it is too slow on the next gen hardware, we will switch to something else.

Not exactly solving the problem, IMHO.

That's not really an accurate portrayal of what they said.

Their basic conclusion is that as the number of cores increases, the cost of cache coherency will increase to the point where updates that span multiple cache lines will be slower than passing a message to each core and letting the update occur locally. There's no real way around this problem, so assuming core counts continue to increase, a message-passing approach like the one they took here will make sense (it already does on large machines; there doesn't seem to be much of an advantage on 4-core machines).
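
To make that concrete, here is a minimal toy sketch (plain C with pthreads, purely illustrative and not anything resembling the Barrelfish code): instead of every core locking and writing one shared table, the update is sent as a small message to each core's mailbox and applied to a core-local replica.

    /* Toy sketch: per-core replicas updated by messages, instead of
     * one shared structure touched by every core. Threads stand in
     * for cores; the mailbox names and sizes are made up. */
    #include <pthread.h>
    #include <stdio.h>

    #define CORES 4

    struct msg { int key; int value; };        /* the "update" message       */

    struct core {
        int replica[16];                       /* core-local copy of a table */
        struct msg mbox;                       /* one-slot mailbox           */
        int has_msg;
        pthread_mutex_t lock;                  /* protects only this mailbox */
        pthread_cond_t ready;
    } cores[CORES];

    /* Each "core" waits for a message and applies the update locally,
     * so the write stays in that core's own cache. */
    static void *core_loop(void *arg)
    {
        struct core *c = arg;
        pthread_mutex_lock(&c->lock);
        while (!c->has_msg)
            pthread_cond_wait(&c->ready, &c->lock);
        c->replica[c->mbox.key] = c->mbox.value;
        c->has_msg = 0;
        pthread_mutex_unlock(&c->lock);
        return NULL;
    }

    /* The updater broadcasts one small message per core; there is no
     * globally shared, globally locked data structure. */
    static void broadcast_update(int key, int value)
    {
        for (int i = 0; i < CORES; i++) {
            pthread_mutex_lock(&cores[i].lock);
            cores[i].mbox = (struct msg){ key, value };
            cores[i].has_msg = 1;
            pthread_cond_signal(&cores[i].ready);
            pthread_mutex_unlock(&cores[i].lock);
        }
    }

    int main(void)
    {
        pthread_t t[CORES];
        for (int i = 0; i < CORES; i++) {
            pthread_mutex_init(&cores[i].lock, NULL);
            pthread_cond_init(&cores[i].ready, NULL);
            pthread_create(&t[i], NULL, core_loop, &cores[i]);
        }
        broadcast_update(3, 42);               /* one logical update         */
        for (int i = 0; i < CORES; i++)
            pthread_join(t[i], NULL);
        printf("core 0 sees %d\n", cores[0].replica[3]);
        return 0;
    }

Whether that actually beats one locked shared structure depends on core count and update size, which is the trade-off being discussed here.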

What is architecture-specific is the most efficient message-passing method. From what I gathered from the paper, a lot of this is handled by the system knowledge base, but even if a future piece of hardware requires a fundamentally different message-passing mechanism (like the addition of dedicated inter-core messaging), it won't require a fundamental change in the organization of the OS.

Reply Parent Score: 4

RE[2]: Comment by kaiwai
by Aussie_Bear on Sat 26th Sep 2009 07:32 in reply to "RE: Comment by kaiwai"
Aussie_Bear Member since:
2006-01-12

An Intel Engineer once said it best:
=> "What Intel Giveth, Microsoft Taketh Away"

Reply Parent Score: 1

RE[3]: Comment by kaiwai
by 3rdalbum on Sat 26th Sep 2009 09:17 in reply to "RE[2]: Comment by kaiwai"
3rdalbum Member since:
2008-05-26

An Intel Engineer once said it best:
=> "What Intel Giveth, Microsoft Taketh Away"


That's not completely fair - it's more like "What Intel Giveth, Symantec Taketh Away"...

Reply Parent Score: 11

RE[2]: Comment by kaiwai
by happe on Sun 27th Sep 2009 04:19 in reply to "RE: Comment by kaiwai"
happe Member since:
2009-06-09

"I do, however, wonder what the overhead is,

IIRC, "Message Passing" is Latin for "Slower than the January Molasses".

All that hardware bandwidth. All that potential for fast, low-latency IPC mechanisms. And it gets wasted, killed by latency, passing messages back and forth.

I always knew that the fantastically powerful computers of the future, running the software of the future, would perform significantly more poorly than what we have today. And this concept may just be a glimpse of how that future is to unfold.
"

All communication basically involves messages; it all depends on the sender and receiver. Memory can be viewed as a service that handles read and write requests (messages).

In multi-core systems, inter-core communication must go through memory, except for atomic operation coordination, which obviously has to be core-to-core. This results in multiple messages going back and forth for a simple exchange of information:

1. Sender: write data to memory (write msg)
2. Sender: inform receiver of new data (read/write, core-to-core msgs).
3. Receiver: read data from memory (read msg)
4. Receiver: inform sender of reception (read/write, core-to-core msgs).

I have left out all the nasty synchronization details in #2 and #4, but they usually involve atomic updates of a memory address, which can cause core-to-core sync messages depending on cache state. Also, cache coherency in general might cause lots of messages.
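
As a rough illustration (a toy sketch with two threads standing in for two cores, not anything from the paper), steps 1-4 look something like this in plain C, with C11 atomic flags playing the role of the "inform" messages; every load, store and flag update below turns into coherence traffic between the cores:

    /* Toy sketch of the shared-memory exchange described above.
     * The flags are the "inform" steps (2) and (4); the data itself
     * is the write/read in steps (1) and (3). */
    #include <pthread.h>
    #include <stdatomic.h>
    #include <stdio.h>

    static int        shared_data;              /* the payload                    */
    static atomic_int data_ready;               /* step 2: sender -> receiver     */
    static atomic_int data_acked;               /* step 4: receiver -> sender     */

    static void *sender(void *arg)
    {
        (void)arg;
        shared_data = 42;                       /* 1. write data to memory        */
        atomic_store(&data_ready, 1);           /* 2. inform receiver of new data */
        while (!atomic_load(&data_acked))       /* 4. wait for the ack            */
            ;                                   /*    (spinning = yet more traffic) */
        return NULL;
    }

    static void *receiver(void *arg)
    {
        (void)arg;
        while (!atomic_load(&data_ready))       /* 2. poll for the notification   */
            ;
        printf("received %d\n", shared_data);   /* 3. read data from memory       */
        atomic_store(&data_acked, 1);           /* 4. inform sender of reception  */
        return NULL;
    }

    int main(void)
    {
        pthread_t s, r;
        pthread_create(&s, NULL, sender, NULL);
        pthread_create(&r, NULL, receiver, NULL);
        pthread_join(s, NULL);
        pthread_join(r, NULL);
        return 0;
    }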

It is easy to imagine that this could be done faster and in fewer steps if low-level core-to-core communication were provided. All the hardware is already point-to-point.

My point is that it is not the message passing in u-kernels that causes the overhead; in fact, it is the extra protection (a long story).

Also, shared memory as a programming platform doesn't scale if you code programs the obvious way. You have to know what's going on underneath. It's like cache optimization: you have to know the cache (Lx) and line sizes before you can do a good job. The non-uniformity in NUMA does make things better.
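
As a small example of the line-size point (a hypothetical sketch assuming 64-byte lines, nothing from the paper): the two threads below increment two different counters, yet without the padding both counters would sit on one cache line and that line would ping-pong between the cores on every increment.

    /* Toy illustration of false sharing. Each thread touches only its
     * own counter, but the memory layout decides whether the counters
     * share a cache line. The 64-byte line size is an assumption. */
    #include <pthread.h>
    #include <stdio.h>

    #define LINE 64                            /* assumed cache-line size    */

    struct counter {
        long n;
        char pad[LINE - sizeof(long)];         /* remove this padding and the
                                                  two counters share a line  */
    };

    static _Alignas(LINE) struct counter counters[2];

    static void *work(void *arg)
    {
        struct counter *c = arg;
        for (long i = 0; i < 100000000L; i++)
            c->n++;                            /* thread-private data        */
        return NULL;
    }

    int main(void)
    {
        pthread_t t[2];
        for (int i = 0; i < 2; i++)
            pthread_create(&t[i], NULL, work, &counters[i]);
        for (int i = 0; i < 2; i++)
            pthread_join(t[i], NULL);
        printf("%ld %ld\n", counters[0].n, counters[1].n);
        return 0;
    }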

I think we have to make memory a high-level abstraction and give OS and middleware programmers more control of what is communicated where.

Reply Parent Score: 2

RE[2]: Comment by kaiwai
by Brendan on Mon 28th Sep 2009 06:44 in reply to "RE: Comment by kaiwai"
Brendan Member since:
2005-11-16

Hi,

"I do, however, wonder what the overhead is,

IIRC, "Message Passing" is Latin for "Slower than the January Molasses".

All that hardware bandwidth. All that potential for fast, low-latency IPC mechanisms. And it gets wasted, killed by latency, passing messages back and forth.
"

If you compare 16 separate single-core computers running 16 separate OSs communicating via networking, to 16 separate CPUs (in a single computer) running 16 separate OSs communicating via IPC, then I think you'll find that IPC is extremely fast compared to any form of networking.

If you compare 16 CPUs (in a single computer) running 16 separate OSs using IPC, to 16 CPUs (in a single computer) running one OS, then will the overhead of IPC be more or less than the overhead of mutexes, semaphores, "cache-line ping-pong", scheduler efficiency, and other scalability problems? In this case, my guess is that IPC has less overhead (especially when there are lots of CPUs) and is easier to get right (e.g. without subtle race conditions); but the approach itself is going to have some major new scalability problems of its own (e.g. writing distributed applications that are capable of keeping all those OSs/CPUs busy will be a challenge).
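
Local IPC latency is at least easy to put a rough number on. Here is a hypothetical little measurement sketch (plain C, a pair of pipes between a parent and a forked child, nothing to do with Barrelfish's channels) that times ping-pong round trips; running the same loop over a loopback TCP connection instead gives a feel for the gap between IPC and networking described above.

    /* Rough sketch: measure round-trip latency of one-byte messages
     * over a pair of pipes. Error handling kept minimal on purpose. */
    #include <stdio.h>
    #include <time.h>
    #include <unistd.h>

    #define ROUNDS 100000

    int main(void)
    {
        int to_child[2], to_parent[2];
        char byte = 'x';

        if (pipe(to_child) || pipe(to_parent))
            return 1;

        if (fork() == 0) {                     /* child: echo every byte back */
            for (int i = 0; i < ROUNDS; i++) {
                read(to_child[0], &byte, 1);
                write(to_parent[1], &byte, 1);
            }
            _exit(0);
        }

        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (int i = 0; i < ROUNDS; i++) {     /* parent: ping, wait for pong */
            write(to_child[1], &byte, 1);
            read(to_parent[0], &byte, 1);
        }
        clock_gettime(CLOCK_MONOTONIC, &t1);

        double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
        printf("%.0f ns per round trip\n", ns / ROUNDS);
        return 0;
    }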

-Brendan

Reply Parent Score: 2