Linked by Thom Holwerda on Tue 20th May 2014 21:23 UTC, submitted by BloopFloop

Arrakis is a research operating system from the University of Washington, built as a fork of Barrelfish.

In Arrakis, we ask the question whether we can remove the OS entirely from normal application execution. The OS only sets up the execution environment and interacts with an application in rare cases where resources need to be reallocated or name conflicts need to be resolved. The application gets the full power of the unmediated hardware, through an application-specific library linked into the application address space. This allows for unprecedented OS customizability, reliability and performance.

The first public version of Arrakis has been released recently, and the code is hosted on GitHub.
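
To make the "unmediated hardware" idea above a bit more concrete, here is a rough C sketch of what a kernel-bypass data path can look like from the application's point of view. This is not Arrakis's actual API; the device node, descriptor layout, and ring size are invented for illustration - the point is only that after a one-time setup the application polls device memory directly, with no system call per packet.

    /*
     * Hypothetical sketch of the library-OS / kernel-bypass model:
     * the kernel only sets up the mapping; the data path is a user-space
     * poll loop over a memory-mapped receive ring. The device name,
     * descriptor format and ring size are invented for this sketch.
     */
    #include <fcntl.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <sys/mman.h>

    #define RING_SLOTS 256

    struct rx_desc {                 /* invented descriptor format */
        volatile uint32_t ready;     /* set by the NIC when a packet lands */
        uint32_t len;
        uint8_t  payload[2048];
    };

    int main(void)
    {
        /* One-time setup: the kernel maps the device ring into our address
         * space, then gets out of the way. "/dev/fastnic0" is hypothetical. */
        int fd = open("/dev/fastnic0", O_RDWR);
        if (fd < 0) { perror("open"); return 1; }

        struct rx_desc *ring = mmap(NULL, RING_SLOTS * sizeof(*ring),
                                    PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (ring == MAP_FAILED) { perror("mmap"); return 1; }

        /* Data path: no system calls, just polling the mapped ring forever. */
        for (unsigned slot = 0;; slot = (slot + 1) % RING_SLOTS) {
            while (!ring[slot].ready)
                ;                              /* spin until the NIC fills it */
            printf("got %u-byte packet in slot %u\n", ring[slot].len, slot);
            ring[slot].ready = 0;              /* hand the slot back to the NIC */
        }
    }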

Thread beginning with comment 589286
RE[2]: This is awesome
by Megol on Wed 21st May 2014 15:08 UTC in reply to "RE: This is awesome"
Member since: 2011-04-11

thesunnyk,

"I've always been toying with the idea of forking and continuing with Barrelfish. The idea, if you're not aware, is to have a kernel per CPU core. This allows you to think about your computer in an inherently more distributed sense, pushing computation out over the network or otherwise having your computer "span" devices or even the internet.

"

QNX has some support for this, using a network as the "bus" layer.
Other systems have been designed to support a distributed single system image. Limiting the support to kernel design isn't the best way towards that; many other system layers need to be adapted to handle variable latency and link failures properly.
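
As a toy illustration of the point about variable latency and link failures: a remote operation in a distributed single-system-image setup cannot simply block the way a local call does; it needs a deadline, a retry policy, and an explicit failure path. The rpc_send_recv() stub below is hypothetical - it merely stands in for whatever transport such a system would actually use, and simulates a couple of timeouts.

    /* Toy sketch: a remote operation in a distributed single-system-image
     * setup needs a deadline and an explicit failure path, unlike a local
     * call. rpc_send_recv() is a hypothetical stand-in for the transport. */
    #include <errno.h>
    #include <stdio.h>
    #include <time.h>

    /* Pretend transport: returns 0 on success, -1 with errno == ETIMEDOUT
     * when the link is slow or down. Here it simply fails the first tries. */
    static int rpc_send_recv(const char *req, char *resp, long timeout_ms)
    {
        static int attempts;
        (void)req; (void)timeout_ms;
        if (++attempts < 3) { errno = ETIMEDOUT; return -1; }
        snprintf(resp, 64, "ok");
        return 0;
    }

    int main(void)
    {
        char resp[64];
        long timeout_ms = 50;

        for (int try = 1; try <= 5; try++) {
            if (rpc_send_recv("read page 42", resp, timeout_ms) == 0) {
                printf("reply after %d tries: %s\n", try, resp);
                return 0;
            }
            fprintf(stderr, "try %d failed (%s), backing off\n",
                    try, errno == ETIMEDOUT ? "timeout" : "error");
            struct timespec wait = { 0, timeout_ms * 1000000L };
            nanosleep(&wait, NULL);          /* back off before retrying */
            timeout_ms *= 2;                 /* exponential backoff */
        }
        fprintf(stderr, "giving up: treat the link as failed\n");
        return 1;
    }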


"I like this idea as well! Not sure if it'd be useful for normal people, but what it offers is kind of an alternative to a VPS, with dedicated resources at less than the cost of a dedicated server. This model makes a lot of sense, especially with NUMA systems, which are inherently more scalable than uniform memory access due to the overhead of the cache coherency that x86 mandates."


Do you know that NUMA was first used in systems without x86 processors? Do you realize that much of the work on scalable coherency protocols has been done on RISC systems?
In short: this isn't something x86-specific; it's common to all systems following the von Neumann design.

Reply Parent Score: 2

RE[3]: This is awesome
by Alfman on Wed 21st May 2014 19:14 UTC in reply to "RE[2]: This is awesome"
Member since: 2011-01-28

Megol,

"QNX has some support for this, using a network as the "bus" layer. Other systems have been designed to support a distributed single system image. Limiting the support to kernel design isn't the best way towards that; many other system layers need to be adapted to handle variable latency and link failures properly."


Well, the trouble with this is that NUMA was designed to solve some inherent scalability problems of shared-memory systems. And although you can apply some hacks to SMP operating systems to better support NUMA, generic SMP/MT software concepts are hampered by shared-memory design patterns that fundamentally cannot scale. In other words, they reach diminishing returns that cannot be overcome by simply adding more silicon.
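
A minimal sketch of the kind of shared-memory pattern that stops scaling (the thread and iteration counts below are arbitrary): every thread updating one shared atomic counter forces its cache line to bounce between cores, while per-thread counters that are only merged at the end generate no coherency traffic at all.

    /* Sketch of a shared-memory pattern that hits diminishing returns:
     * N threads hammering one shared atomic counter bounce its cache line
     * between cores, while per-thread counters merged at the end stay
     * core-local. Build with -pthread; counts are arbitrary. */
    #include <pthread.h>
    #include <stdatomic.h>
    #include <stdio.h>
    #include <time.h>

    #define NTHREADS 8
    #define ITERS    10000000L

    static atomic_long shared_counter;                       /* one contended cache line */
    static volatile long local_counter[NTHREADS][8];         /* crude padding: one line per thread */

    static void *contended(void *arg)
    {
        (void)arg;
        for (long i = 0; i < ITERS; i++)
            atomic_fetch_add(&shared_counter, 1);            /* every add is coherency traffic */
        return NULL;
    }

    static void *sharded(void *arg)
    {
        long id = (long)arg;
        for (long i = 0; i < ITERS; i++)
            local_counter[id][0]++;                          /* stays in this core's cache */
        return NULL;
    }

    static void run(void *(*fn)(void *), const char *name)
    {
        pthread_t t[NTHREADS];
        struct timespec a, b;

        atomic_store(&shared_counter, 0);
        for (int i = 0; i < NTHREADS; i++)
            local_counter[i][0] = 0;

        clock_gettime(CLOCK_MONOTONIC, &a);
        for (long i = 0; i < NTHREADS; i++)
            pthread_create(&t[i], NULL, fn, (void *)i);
        for (int i = 0; i < NTHREADS; i++)
            pthread_join(t[i], NULL);
        clock_gettime(CLOCK_MONOTONIC, &b);

        long total = atomic_load(&shared_counter);
        for (int i = 0; i < NTHREADS; i++)
            total += local_counter[i][0];
        printf("%-21s total %ld in %.2fs\n", name, total,
               (b.tv_sec - a.tv_sec) + (b.tv_nsec - a.tv_nsec) / 1e9);
    }

    int main(void)
    {
        run(contended, "one shared counter:");   /* gets slower as cores are added */
        run(sharded,   "per-thread counters:");  /* scales roughly linearly */
        return 0;
    }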

I'm only vaguely familiar with Barrelfish, but one of its goals is to do away with the design patterns that imply serialization bottlenecks, which are common in conventional operating systems today. In theory all operating systems could do away with the serial bottlenecks too, but not without "limiting the support to kernel design", as you said. Physics is eventually going to force us to adopt variations of this model if we are to continue scaling.
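
For a flavour of what doing away with those serialization bottlenecks can look like in code (nothing Barrelfish-specific; the message format and ring below are invented, and two threads stand in for two cores): instead of cores locking shared state, one core owns the state outright and the other asks for changes by sending messages over a lock-free channel.

    /* Stripped-down sketch of the multikernel / message-passing idea:
     * the "server core" owns its state exclusively; the "client core" asks
     * for changes over a lock-free single-producer/single-consumer ring
     * instead of touching shared state directly. Two threads stand in
     * for two cores; the message format is invented for the sketch. */
    #include <pthread.h>
    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stdio.h>

    #define RING_SIZE 64
    #define NMSGS     1000

    struct msg { int add; bool stop; };

    static struct msg ring[RING_SIZE];
    static atomic_uint head, tail;          /* producer advances head, consumer advances tail */

    static void send_msg(struct msg m)      /* runs on the "client core" */
    {
        unsigned h = atomic_load(&head);
        while (h - atomic_load(&tail) == RING_SIZE)
            ;                               /* ring full: wait for the other core */
        ring[h % RING_SIZE] = m;
        atomic_store(&head, h + 1);         /* publish the message */
    }

    static void *server_core(void *arg)     /* owns the counter exclusively */
    {
        long counter = 0;
        (void)arg;
        for (;;) {
            unsigned t = atomic_load(&tail);
            while (t == atomic_load(&head))
                ;                           /* no messages yet: poll */
            struct msg m = ring[t % RING_SIZE];
            atomic_store(&tail, t + 1);
            if (m.stop) break;
            counter += m.add;               /* no lock needed: single owner */
        }
        printf("server core final counter: %ld\n", counter);
        return NULL;
    }

    int main(void)
    {
        pthread_t server;
        pthread_create(&server, NULL, server_core, NULL);
        for (int i = 0; i < NMSGS; i++)
            send_msg((struct msg){ .add = 1, .stop = false });
        send_msg((struct msg){ .add = 0, .stop = true });
        pthread_join(server, NULL);
        return 0;
    }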

Edited 2014-05-21 19:14 UTC

Reply Parent Score: 3

RE[4]: This is awesome
by Kebabbert on Thu 22nd May 2014 22:57 in reply to "RE[3]: This is awesome"
Member since: 2007-07-27

"...Well, the trouble with this is that NUMA is being designed to solve some inherent scalability problems of shared memory systems. And although you can apply some hacks to SMP operating systems to better support NUMA, generic SMP / MT software concepts are flawed by shared memory design patterns that fundamentally cannot scale. In other words they reach diminishing returns that cannot be overcome by simply adding more silicon..."

And this is exactly why the largest SMP servers on the market have 32 sockets, like the IBM P795 Unix server. Some IBM mainframes have 32 sockets as well, and Fujitsu even has a 64-socket Solaris server, the M10-4S - so the largest SMP servers top out at 32 sockets, with a few (only one?) reaching 64. Sure, these are not really true SMP servers; they have some NUMA characteristics as well. But in effect they behave like true SMP servers: look at the bottom picture in the link below and you will see that each CPU reaches every other CPU in at most 2-3 hops, which is essentially like a true SMP server:
http://www.theregister.co.uk/2013/08/28/oracle_sparc_m6_bixby_inter...

OTOH, a NUMA cluster (all NUMA servers are clusters):
http://en.wikipedia.org/wiki/Non-uniform_memory_access#NUMA_vs._clu...
like the SGI Altix or UV2000, or the ScaleMP servers - these scale to 100,000 cores and hundreds of TB of RAM. All of these servers have awfully bad latency when a CPU tries to reach CPUs far away - the true hallmark of a NUMA cluster. These Linux clusters cannot be used to run SMP workloads; they are only fit for parallel HPC workloads. Typical SMP workloads are large enterprise business systems, databases in large configurations, etc., where each CPU needs to talk to every other CPU frequently, so there is a lot of traffic and communication. HPC workloads, in contrast, run on separate nodes with little communication between them. All servers on the market with more than 32 sockets are HPC clusters - the Linux machines with tens of thousands or even 100,000 cores are all clusters, not a single one of them a fat SMP server.
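
The latency penalty being described here is easy to measure on an ordinary two-socket Linux box with libnuma: pin a thread to one node, then time pointer chasing through memory allocated on the local node versus a remote one. A rough sketch - it assumes at least two NUMA nodes, the numactl development headers, and linking with -lnuma:

    /* Sketch: measure local vs. remote memory latency on a NUMA Linux box.
     * Needs libnuma (link with -lnuma) and at least two NUMA nodes.
     * Pointer chasing through a single random cycle defeats the prefetcher,
     * so the loop time is dominated by memory latency. */
    #include <numa.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    #define ENTRIES (64 * 1024 * 1024 / sizeof(size_t))   /* 64 MiB of indices */

    static double chase(size_t *buf)
    {
        /* Sattolo shuffle: builds one big cycle through the buffer. */
        for (size_t i = 0; i < ENTRIES; i++)
            buf[i] = i;
        for (size_t i = ENTRIES - 1; i > 0; i--) {
            size_t j = (size_t)rand() % i;
            size_t tmp = buf[i]; buf[i] = buf[j]; buf[j] = tmp;
        }
        struct timespec a, b;
        clock_gettime(CLOCK_MONOTONIC, &a);
        size_t idx = 0;
        for (size_t i = 0; i < ENTRIES; i++)
            idx = buf[idx];                 /* each step is one dependent load */
        clock_gettime(CLOCK_MONOTONIC, &b);
        if (idx == (size_t)-1) puts("");    /* keep the loop from being optimized away */
        return ((b.tv_sec - a.tv_sec) * 1e9 + (b.tv_nsec - a.tv_nsec)) / ENTRIES;
    }

    int main(void)
    {
        if (numa_available() < 0 || numa_max_node() < 1) {
            fprintf(stderr, "need a NUMA system with at least two nodes\n");
            return 1;
        }
        numa_run_on_node(0);                /* pin this thread to node 0 */

        for (int node = 0; node <= 1; node++) {
            size_t *buf = numa_alloc_onnode(ENTRIES * sizeof(size_t), node);
            if (!buf) { perror("numa_alloc_onnode"); return 1; }
            printf("node 0 -> memory on node %d: %.1f ns per access\n",
                   node, chase(buf));
            numa_free(buf, ENTRIES * sizeof(size_t));
        }
        return 0;
    }

On a big NUMA machine the same experiment between distant nodes shows the multi-hop latencies being discussed, just with larger numbers.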


Regarding the huge ScaleMP Linux server with thousands of cores and gobs of terabytes of RAM: yes, it is a cluster that is tricked into believing it is a single huge fat SMP server running a single-image Linux kernel. It cannot run SMP workloads, only the easier HPC number crunching:
http://www.theregister.co.uk/2011/09/20/scalemp_supports_amd_optero...


Since its founding in 2003, ScaleMP has tried a different approach. Instead of using special ASICs and interconnection protocols to lash together multiple server nodes into an SMP shared memory system, ScaleMP cooked up a special software hypervisor layer, called vSMP, that rides atop the x64 processors, memory controllers, and I/O controllers in multiple server nodes. ...vSMP takes multiple physical servers and – using InfiniBand as a backplane interconnect – makes them look like a giant virtual SMP server with a shared memory space. vSMP has its limits.
...
The vSMP hypervisor that glues systems together is not for every workload, but it suits workloads where there is a lot of message passing between server nodes – financial modeling, supercomputing, data analytics, and similar parallel workloads. Shai Fultheim, the company's founder and chief executive officer, says ScaleMP has over 300 customers now. "We focused on HPC as the low-hanging fruit..."
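
For contrast, this is roughly what the "lot of message passing between server nodes" workloads that vSMP targets look like in code: each rank works on its own slice of the problem and only exchanges explicit messages, so interconnect latency is tolerable. This is plain MPI, nothing ScaleMP-specific; build with mpicc and run under mpirun.

    /* Roughly what an HPC-style message-passing workload looks like:
     * each rank computes over its own slice and results are combined
     * with explicit messages, so interconnect latency is tolerable.
     * Plain MPI, nothing ScaleMP-specific. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        int rank, nprocs;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

        /* Each rank computes a partial sum over its own slice of the work... */
        long local = 0;
        for (long i = rank; i < 1000000; i += nprocs)
            local += i;

        /* ...and the results are combined in one collective message exchange. */
        long total = 0;
        MPI_Reduce(&local, &total, 1, MPI_LONG, MPI_SUM, 0, MPI_COMM_WORLD);

        if (rank == 0)
            printf("sum of 0..999999 = %ld\n", total);

        MPI_Finalize();
        return 0;
    }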



And regarding the SGI Altix and UV1000 Linux servers with thousands of cores and gobs of RAM: they too are HPC number-crunching servers - they are not used for SMP workloads, because they do not scale well enough to handle such difficult workloads. SGI says explicitly that their Linux servers are for HPC only, and not for SMP:
http://www.realworldtech.com/sgi-interview/6/
The success of Altix systems in the high performance computing market is a very positive sign for both Linux and Itanium. Clearly, the popularity of large processor count Altix systems dispels any notions of whether Linux is a scalable OS for scientific applications. Linux is quite popular for HPC and will continue to remain so in the future.
...
However, scientific applications (HPC) have very different operating characteristics from commercial applications (SMP). Typically, much of the work in scientific code is done inside loops, whereas commercial applications, such as database or ERP software, are far more branch intensive. This makes the memory hierarchy more important, particularly the latency to main memory. Whether Linux can scale well with an SMP workload is an open question. However, there is no doubt that with each passing month, the scalability in such environments will improve. Unfortunately, SGI has no plans to move into this SMP market, at this point in time.
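
The "loops versus branches" distinction in that quote can be shown in miniature: the first loop below streams through an array the way scientific kernels do, so the hardware prefetcher hides memory latency, while the second chases pointers the way database or ERP code tends to, paying close to full memory latency on every access. The sizes are arbitrary.

    /* Miniature version of the HPC-vs-commercial distinction in the quote:
     * a streaming loop (scientific code) lets the hardware prefetcher hide
     * memory latency; pointer chasing (database/ERP-like access) pays the
     * full latency on every step. Sizes are arbitrary. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    #define N (16 * 1024 * 1024)

    static double seconds(struct timespec a, struct timespec b)
    {
        return (b.tv_sec - a.tv_sec) + (b.tv_nsec - a.tv_nsec) / 1e9;
    }

    int main(void)
    {
        long   *data = malloc(N * sizeof(long));
        size_t *next = malloc(N * sizeof(size_t));
        if (!data || !next) return 1;

        for (size_t i = 0; i < N; i++) data[i] = i;

        /* Sattolo shuffle: one big cycle, so the chase visits every element. */
        for (size_t i = 0; i < N; i++) next[i] = i;
        for (size_t i = N - 1; i > 0; i--) {
            size_t j = (size_t)rand() % i;
            size_t t = next[i]; next[i] = next[j]; next[j] = t;
        }

        struct timespec a, b;
        long sum = 0;

        /* "HPC-style": sequential streaming, prefetch-friendly. */
        clock_gettime(CLOCK_MONOTONIC, &a);
        for (size_t i = 0; i < N; i++) sum += data[i];
        clock_gettime(CLOCK_MONOTONIC, &b);
        printf("streaming loop:  %.3fs (sum %ld)\n", seconds(a, b), sum);

        /* "Commercial-style": dependent, unpredictable accesses. */
        clock_gettime(CLOCK_MONOTONIC, &a);
        size_t idx = 0;
        for (size_t i = 0; i < N; i++) { idx = next[idx]; sum += data[idx]; }
        clock_gettime(CLOCK_MONOTONIC, &b);
        printf("pointer chasing: %.3fs (sum %ld)\n", seconds(a, b), sum);

        free(data); free(next);
        return 0;
    }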

Reply Parent Score: 2