OSv is a cloud-based operating system with build images for Xen, KVM and Amazon Web Services, and VMware support coming up. It is written from scratch, “designed from the ground up to execute a single application on top of a hypervisor, resulting in superior performance and effortless management”. Linux.com ran an interview with some of the developers behind OSv.
The operating system will be optimized for Java applications “by exposing OS internals and direct access for page tables, scheduling decisions and the raw IO layer”. It will not be restricted to Java applications, though, and will also run JavaScript, Scala, Clojure, JRuby, Jython and more on the JVM. Surprisingly, C is also supported.
OSv promises “Zero OS management” with “no need for administration, template management, configuration and tuning”. Common Java framework integration consists of “frameworks such as Tomcat, JBoss, SpringSource […]. Common open source technologies such as Hadoop and NoSQL are being optimized and integrated to run on top of OSv.”
So, in a few years maybe, when they succeed in making the virtualization layer vanish completely, they’ll finally arrive at the destination where Solaris has been for the last ~8 years: lightweight application containers (http://en.wikipedia.org/wiki/Solaris_Containers).
Actually, Linux already had that even longer:
http://linux-vserver.org/
And later:
http://openvz.org/
Wasn’t it IBM that did it first, in one of their mainframe systems?
What is unfortunate is that in Linux it was always ‘out-of-tree’.
A couple of years ago, people at Parallels (OpenVZ), IBM, Google and others started adding patches to the mainline Linux kernel.
They created cgroups (http://en.wikipedia.org/wiki/Cgroups) and namespaces (http://lwn.net/Articles/531114/), which together form Linux Containers: http://linuxcontainers.org/
I wouldn’t say they are done yet, but it has been considered secure for a few kernel releases now (since 3.9, I think), and they’ve been picking up more and more steam lately.
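To make the cgroups-plus-namespaces idea a bit more concrete, here is a minimal sketch in C – not how LXC is actually implemented, and the cgroup path and names are just assumptions for illustration (cgroup v1 layout, a pre-created cgroup directory, root privileges required): clone() a child into new PID/UTS/mount namespaces and drop its PID into a memory cgroup.

```c
/* Minimal namespace + cgroup sketch (illustrative only, needs root).
 * Assumes a pre-created cgroup directory: /sys/fs/cgroup/memory/demo */
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/wait.h>

static char child_stack[1024 * 1024];

static int child_main(void *arg)
{
    (void)arg;
    /* This hostname is private to the new UTS namespace. */
    sethostname("mini-container", strlen("mini-container"));
    execlp("/bin/sh", "sh", (char *)NULL);   /* a shell "inside" the container */
    perror("execlp");
    return 1;
}

int main(void)
{
    /* New PID, UTS and mount namespaces; CLONE_NEWNET etc. could be added. */
    pid_t pid = clone(child_main, child_stack + sizeof(child_stack),
                      CLONE_NEWPID | CLONE_NEWUTS | CLONE_NEWNS | SIGCHLD, NULL);
    if (pid < 0) { perror("clone"); return 1; }

    /* Place the child in the (assumed, pre-created) memory cgroup so the
     * kernel can account for and limit everything running inside it. */
    FILE *f = fopen("/sys/fs/cgroup/memory/demo/tasks", "w");
    if (f) { fprintf(f, "%d\n", (int)pid); fclose(f); }

    waitpid(pid, NULL, 0);
    return 0;
}
```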
One thing that was still missing a year or two ago was support for checkpoint and restart of processes and process trees, with or without namespaces. The OpenVZ guys are working on that and also support live migration now:
http://criu.org/Main_Page
Support for HA-containers (and VMs) also exists:
http://clusterlabs.org/doc/en-US/Pacemaker/1.1/html-single/Pacemake…
And some other people have already built other tricks on top of Linux Containers for running application containers:
http://www.docker.io/
Then other people built on top of that:
http://coreos.com/ – an operating system specifically built for running Docker application containers.
Linux Containers in general and Docker are also supported by OpenStack: http://www.openstack.org/
Mesos, which can distribute applications across thousands of machines, also supports LXC:
http://mesos.apache.org/
And there is a Mesos Docker project too:
https://github.com/mesosphere/mesos-docker
And yes, for those that want it, systemd also uses a lot of pieces from CGroups and LXC:
http://freedesktop.org/wiki/Software/systemd/
Did you know Ubuntu for Android also uses an LXC container to run Ubuntu on Android? http://www.ubuntu.com/phone/ubuntu-for-android
So, what is the biggest missing piece, beyond improving all the current code?
A mature Btrfs. Btrfs already supports send/receive. ZFS on Linux supports it too, of course (but that feels a bit like cheating and isn’t as well integrated).
So clearly, people haven’t forgotten about containers.
Hell, Google and Facebook use it to run basically all their applications.
I forgot to mention that OpenShift (https://www.openshift.com/), the PaaS system by Red Hat, also does process isolation, but they use SELinux.
The PaaS people at Heroku also use LXC for their separation: https://devcenter.heroku.com/articles/dynos
Now a whole bunch of people from eBay, Red Hat, OpenStack and Docker are working on an even more integrated PaaS solution: http://solum.io/
And one correction/clarification:
Google only uses cgroups I believe, not full on LXC.
I wasn’t trying to say that Solaris was the first platform, or the only one, that had this. They even openly admitted that they were looking at FreeBSD jails and wanted to take them to their logical extreme.
What I do think, however, is that Solaris’ (and particularly its open-source descendants’) implementation of OS virtualization is one of the most complete and elegant solutions. No out-of-tree patching, no semi-functional tools that break between versions, etc. It all just works. Moreover, since it was developed as one coherent whole, different parts of the OS were tuned to the presence of zones. For instance, zones integrate seamlessly with the network virtualization layer (Crossbow), so you can create virtual switches, rate-limit ports, bind them to VLANs, etc. The package manager also allows you to descend into zones where necessary – this was recently improved further in Solaris 11.1.
So overall I find Solaris zones one of the nicest implementations to use.
While I’ve never used them, that is also what I understood about how Solaris zones work.
My main point was: clearly, containers aren’t going out of style. It’s actually very stylish these days to use them, especially since we now have things like Docker, and people are trying to build ‘cloud ready’ server applications that try to be as stateless as possible.
And I think systemd also wants to get into that space of seamless management of host and containers. Just look at how the CoreOS people are also using systemd.
Yeah, these OSv guys are an example of containers appreciating in “stylish value”. What bugs me, though, is that they are sort of trying to reinvent the wheel. Rather than lay the system out as simply as they can (e.g. use a shared kernel with application containers), they come at it from the traditional fat-hypervisor space, where legacy application support and consolidation were the driving requirements, and are “rediscovering” functionality that was already here a long, long time ago.
Their model is (from Wikipedia): it does not support a notion of users (it’s not a multiuser system) or processes – everything runs in the kernel address space.
Yeah, I don’t think they get it. They are reinventing the wheel. They are using KVM on Linux – why not go with Linux containers?
VMs have a few nice aspects – you can move them between servers (without restarting them, even), they’re host-OS independent, and the (security) isolation is arguably stronger.
Still, it does sound like they’re reinventing processes (with the VM+host handling memory protection and multitasking).
Are you saying you can’t migrate containers?
I don’t know what is and isn’t possible with Solaris Zones, but OpenVZ and the CRIU project can move containers with TCP connections intact:
http://criu.org/Main_Page
The CRIU project can do live migration of processes in general, whether or not they are part of a container (checkpoint/restart with memory tracking, just like with a VM).
You can even migrate them more quickly, because there is less memory and storage to copy (you only need to copy the storage if you are not using central storage, of course).
1) When you’re talking about cloud computing (which is their primary market here), vMotion is almost completely unused. Your VMs typically sit behind a load balancer and you spin up instances as you need them. Also, vMotion is not free – you need shared storage, which is a huge performance hit for many (most) of the workloads where cloud computing excels.
2) Host-OS independence is meaningless, because it’s the app that needs to be tailor-made for OSv (plus, currently their only supported runtime is Java anyway).
3) Security: yeah… http://www.cvedetails.com/vulnerability-list/vendor_id-7506/Qemu.ht…
Modern hardware virtualization is a far cry from the imagined ideal of a “thin hypervisor”. E.g. the number of kittens that have to be slaughtered every time you just want to send a network packet from a VM boggles the mind…
Only because EC2 doesn’t support migration, and that’s the model people developed to work around it. VMware vMotion (and DRS) is an absolute killer feature that has certainly made my life far easier when I’ve worked on VMware-based platforms. Other cloud platforms are working on adding live migration, and even straight KVM has supported migration for years now.
You don’t have to use shared storage; you just have to sync the storage prior to the migration.
Unless your application is a children’s toy that is happy with just running a single instance, you’re always going to need load balancers. Moreover, you’ll need to deal with host failures, so resiliency to sudden loss of services is already a must. Lastly, since OS containers don’t really need to boot, you can be up and running in a few seconds – vmotion also needs a short suspend period, so in that regard, they’re almost equivalent.
Then you trade the cost at runtime for the cost at migration time, which becomes significant. You’ll essentially be catching up to the running state of the machine, which can result in non-trivial latency spikes for everybody on the network. Now that’s not to say that it can’t be done, or that it can’t be done right – sure it can – but it creates more fragility. For instance, what happens if the host with the VM data goes down? Your VM is now stuck on it and you can’t migrate away (you can’t just take outdated databases online and put them on the network).
What’s that got to do with it? It doesn’t invalidate the fact that there are plenty of instances where being able to migrate a VM is advantageous, and simple horizontal scaling is either not an option or very difficult.
If anything it’s the inverse: you don’t care about the ability to migrate and maintain state if your platform is “a child’s toy”. “Real” applications care about state.
Depends on your platform; vSphere supports fault tolerance, for example.
Real cloud platforms have real networks. If syncing a single instance’s virtual disk across two machines creates a noticeable spike in network utilisation, then boy are you doing it wrong.
Sure, but that’s considered a failure scenario. On platforms that don’t support migration at all, that’s considered normal!
Sure, and those instances are not “cloud computing”. You’re thinking traditional HVM, and that’s not up for discussion here (besides, OSv is not meant for those applications).
Not in cloud computing, where you care mostly about performance. VMs are paper cups you spin up and shut down all the time during the day, depending on load (and you’re billed per hour). You have a couple of back-end database units that hold all of the datasets and then you spin up front-end caches that each hold shards of the data and load-balancers direct clients to the most current cached copy.
Have you ever worked with the thing? FT’s cost in performance and flexibility is *huge*. It’s limited to 1 vCPU, introduces huge delays (the lockstepping between hosts is synchronous), and requires that the primary and backup run on exactly the same CPU type and ESXi patch level and have sufficiently fat pipes between them to allow rerouting all network traffic.
Do you understand the difference between throughput, latency and jitter?
If you don’t plan for host failures in your clustered application, then your cluster is shit. Machines/networks fail and the more you have of them, the more often they fail. Something makes me think you’ve never actually had to manage a clustered application.
Again, only because EC2 doesn’t support that use model. Again, other cloud providers are adding support for live migration.
YOU might care about performance, but there are people who care about state. YOU might not want to put those kinds of workloads onto a cloud, but there are other people who do.
That’s ONE model, yes. Again, mostly born from the fact that EC2 doesn’t support the ability to migrate instances…
Right. And? You can still use FT to mitigate host failure. That’s what it’s FOR. Just because it’s limited doesn’t make it useless.
Sure, but that doesn’t mean that fewer options to build fault tolerance are somehow BETTER, and it doesn’t change the fact that some applications cluster poorly. Nagios is a great example: it stores all its state, including check results, in memory at runtime. How the hell are you supposed to cluster that?
Ah yes, the old “Your experience and use cases are different to mine therefore your argument is invalid” defence.
You’re confusing cloud computing with workload consolidation and traditional virtualization. I know it’s modern to call everything and the kitchen sink cloud computing, but that’s simply not the case. Always consider OSv when you’re thinking about apps. Ask yourself: does this thing have zero state and needs to scale up and down as workload arrives? If yes, then you’re possibly looking at a candidate app that makes sense to run in a “cloud computing” environment. If not, then it’s most likely a traditional HVM virtualization case.
Right. And? You can still use FT to mitigate host failure. That’s what it’s FOR. Just because it’s limited doesn’t make it useless.
VMware FT is for when VMs are high-value, like in workload consolidation and legacy systems maintenance, not when they are low-value and need (mostly network) performance above everything else, like in cloud computing. If you just want a VM to keep your e-mail in, then that’s not cloud computing and you’re not a potential target for OSv either.
Sure, but that doesn’t mean that fewer options to build fault tolerance are somehow BETTER…
It means that if your system can’t tolerate failure, then it cannot be relied upon for mission-critical applications. It doesn’t matter if the downtime is once a year for maintenance, or once every 3 years from hardware or network failure (and you’ll know from experience that unexpected failures are anything but uncommon).
1) Nagios isn’t a cloud computing app, it won’t run on OSv, so the example is moot.
2) If you absolutely need migration, use something like OpenVZ which can migrate containers.
3) If you *must* run a monitoring system in a cloud computing environment, use something that clusters well: http://zabbix.com/
Always remember the topic of this article: a single-process Java-focused web-centric scalable custom OS kernel with no persistent state whatsoever and just minimal support for persistent state for the app. *That’s* cloud computing. It’s not for running your company’s Exchange server…
There are reasons why I think you don’t:
1) You have huge misconceptions on what cloud computing actually is and confuse it with workload consolidation. They’re not the same. Cloud computing apps are designed to scale on demand and cluster well and that’s the topic of this article.
2) You consider VMware FT suitable for cloud computing, whereas it’s actually a tool to protect a few high-value low-performance VMs that don’t do a lot of networking but need uptime (think: a small CRM, traffic controllers, apps that don’t handle reconnects well, etc.)
3) I’ve actually written a fair amount of clustered code with 24/7 uptime requirements. It doesn’t matter whether a node failure is due to planned downtime, hardware failure or intermittent network failures (in fact, dealing with planned downtime is trivial by comparison).
I mean you no disrespect when I say “I don’t think you’ve ever had to manage a cluster”, it’s just my honest impression, because you seem completely oblivious to the core principles of cluster design.
My previous job was working for HP Cloud Services: myself and my colleague personally stood up the first hardware and Operations platform that now runs their production storage & compute systems. I worked very closely with their compute (Nova) team, who were in the same office as me.
a) I know what a cloud platform is.
b) I know what the TRADITIONAL definition of cloud WAS.
c) I know what the CURRENT definition of cloud IS.
I also know the types of workloads that people tend to run on clouds: they’ve been moving more towards an on-demand hosted HVM platform for years now. The quaint notion that a cloud is just for low-value, on-demand “spin ’em up and knock ’em down” VMs is over. More and more people want to use it as a hosted HVM solution.
No. That was your misconception. Go back and read what I actually wrote.
As have I. I’ve also built out critical operations infrastructure with 24/7 uptime (DNS resolvers don’t get to “sometimes be down”, for example). Which is why I know that the more tools I have at my disposal (like, say, the ability to migrate an instance and retain state), the better, and why I understand that a layered approach to redundancy is always more robust than just assuming that your cluster and LB will save you.
Again, this is the “My experiences are different to yours so yours are invalid” defence. It’s also massively condescending and obnoxious.
Recent versions of VMware vSphere can vMotion without shared storage, although shared storage is still preferred.
Hyper-V 3.0 as implemented in Windows Server 2012 has “shared-nothing” live migration.
But if you don’t have shared storage, you’ll need a really fast network.
Again, the main difference is that OSv neither is nor aims to be “OS virtualization”. It is a guest OS designed as one piece of the puzzle in a complete hardware virtualization solution. That it resembles OS virtualization in some respects is a happy “accident” of good design.
Maybe, but a technology is a lot more than its working description. Although the end goal is very similar to containers, what’s behind it makes all the difference in the world – not only in terms of performance, but also in terms of what can be achieved.
OSv does what it does using the hardware virtualization primitives provided by the processor, so not only can it be faster while being more isolated (the processor is helping you, after all), but it can also expose hardware features directly to the application in a safe manner – again, because Intel VMX is multiplexing them for you.
There is a post I wrote on our G+ page that deals with this in more detail: http://bit.ly/1aae8z2
There must be a use for KVM / virtualization even on Solaris. Look at what Joyent is doing. They (most of them Solaris developers from Sun/Oracle) have forked Solaris and ported KVM to it.
Sure there is. It’s for running apps that are platform-specific, although lately there’s been a push to get Linux-branded zone support back:
http://www.listbox.com/member/archive/182179/2014/01/sort/time_rev/…
However, the topic here was OSv, which is a completely different OS, so you gotta compare apples to apples.
This was inevitable, and good.
General-purpose OSes (Windows, Linux, etc.) have way too much functionality, and too many layers, between the hardware/hypervisor and the application a user is interested in. And yes, that includes those same operating systems with a “server” sticker slapped on the box.
Given how many brain cycles and how much blood, sweat and tears are expended trying to get general-purpose, mostly personal-PC operating systems running well in a massively virtualised platform-as-a-service, this direction is right.
It recognises that the common runtime is not a general-purpose OS – it is the JVM or something like it (Python App Engine?).
I wish it well. Goodbye, full-fat OS virtualisers.
It’s good to see real innovation and rethinking of orthodoxies.
Good set of slides explaining the real differences with traditional hypervisors:
https://docs.google.com/presentation/d/11mxUl8PBDQ3C4QyeHBT8BcMPGzqk…
And already done a long time ago:
http://en.wikipedia.org/wiki/FreeBSD_jail
http://en.wikipedia.org/wiki/Solaris_Containers
http://en.wikipedia.org/wiki/Openvz
What does OSv add to the mix that the above don’t already do more easily and faster?
Containers are not necessarily faster. We have the hardware on our side, and containers do not.
Because of that, we can also expose hardware functionality directly to applications – something containers won’t do.
The opposite is also true: there are certain things that containers easily do that we don’t even try.
I implemented a lot of the recent container features for Linux and spent most of the past ~2 years giving talks about containers at key Linux conferences. So it is not that I think they don’t have a place – they do – but these are quite different things, really, despite how much they may look alike on the surface.
Please take a look at this G+ post of ours for more info: http://bit.ly/1aae8z2
This is why I keep coming back to OSNews. Once in a while there is a gem like this that I wouldn’t hear about in the other places where I read tech news. Now I hope someone can write up a nice comparison with jails/containers; some benchmarks and stability reports would be appreciated, and I am curious whether this really is single-application only (“designed from the ground up to execute a single application”).
I posted too quickly. After reading a bit more, it became clear that it isn’t a normal operating system. It cannot run on bare metal; it can only run on a hypervisor (specifically KVM/Linux). So basically it is a mini guest OS that can deliver great performance for one application because the guest OS is simple/small/fast (although no proof of this performance can be found yet).
They basically shift most of the normal OS-tasks (talking to the hardware) to the hypervisor so they only have to support running on the hypervisor. Smart, but that also means that you are depending entirely on the quality of the hypervisor for performance.
What I don’t understand is how they can claim “translated directly to capex saving by reduction of the number of OS instances”. If you can only run one app per guest, that basically means that to run 5 apps you need:
1 host (Linux/KVM)
5 guests (OSv, lightweight)
5 runtimes (JVM)
5 apps
While with other setups you would only need:
1 host (Linux/KVM)
1 guest (Linux, middleweight)
1 runtime (JVM)
5 apps
Also, this quote seems to be most realistic about what OSv actually is:
Can OSv run on top of Linux?
Avi: Yes, Linux acts as the hypervisor with KVM. So OSv runs on it. If you have a cloud that is based on KVM, then you have OSv running on Linux.
Glauber: If you take the hypervisor as the layer for granted, then by all means OSv is an operating system. But if you look at the whole stack and you’re running KVM, which is essentially Linux, OSv is basically a library that you attach your application to and you can boot directly on KVM. You’re booting that application and using KVM as a containing mechanism.
KVM has also been ported to SmartOS (http://en.wikipedia.org/wiki/SmartOS) by Joyent, so in theory it might run on that too.
You are right, but the key to understanding OSv is that we are not proposing that people take their bare-metal applications and OSv-ify them. We are targeting people who are already in the cloud, where hypervisors are a given, and providing them with a better option.
It is the same number of components, but your management cost does not come from the number of components; it comes from how much maintenance you need to throw at them. OSv aims at reducing that both in the guest OS layer and in the JVM layer. The former is easy to understand; the latter could benefit from an example: our newly released version will include a feature that automatically calculates the heap size for you. We can safely do that because there are no competing applications inside the VM, and this is something that people in the Java world put quite considerable effort into getting right.
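The general idea is roughly this (a purely illustrative sketch, not OSv’s actual code – the 256 MB reserve and the 64 MB floor are made-up numbers): with a single JVM per VM, the maximum heap can simply be derived from total memory minus a small reserve for the kernel and runtime.

```c
/* Illustrative heap-sizing sketch: derive a JVM -Xmx value from the memory
 * visible to a single-application VM. Reserve and floor are invented values. */
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    /* Total physical memory visible to this VM, in MiB. */
    long long total_mb =
        (long long)sysconf(_SC_PHYS_PAGES) * sysconf(_SC_PAGESIZE) / (1024 * 1024);

    long long reserve_mb = 256;            /* assumed kernel/runtime reserve */
    long long heap_mb = total_mb - reserve_mb;
    if (heap_mb < 64)
        heap_mb = 64;                      /* arbitrary floor */

    /* The computed value would be handed to the JVM as its maximum heap. */
    printf("java -Xmx%lldm -jar app.jar\n", heap_mb);
    return 0;
}
```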
There is a video on our YouTube channel where I explain more about this functionality: http://www.youtube.com/watch?v=gXHdhkTVM6o
And we have more in our pipeline.
Java is now and always will be “Just Another Vulnerability Announcement”
When will these people learn?
On my list of things to play with – thanks!
From the article:
The reason is any application written for the Java VM is automatically compatible with OSv.
So it must run Android apps on it.
Android apps do not run on a Java VM. You can’t take an Android app and run it on a standard Oracle Java VM. Doesn’t work.
Android apps are written in Java, but they aren’t compiled to Java bytecode and won’t run on standard JVMs.
I am confident that the so-called library OSes are going to see increasing usage.
Although one can do process isolation and separation using some form of containers in HP-UX, Tru64, BSD/Linux distributions and Windows Hyper-V, among others, there are still quite a few OS layers that are useless, especially given the fact that many cloud deployments use one server type per VM (DB, web server and so forth).
Another set of library OSes, for those interested:
Drawbridge from Microsoft Research
http://research.microsoft.com/en-us/projects/drawbridge/
MirageOS from Cisco/Xen Foundation
http://www.openmirage.org/
http://www.infoq.com/presentations/mirage-os?utm_source=infoq&utm_m…
I just don’t see the advantage of a “library OS” over a traditional container (aka OS-level virtualization).
Performance.
As in: worse performance is better? OS-level virtualization has no performance degradation at all.
Better performance and more available memory.
From what I understand of how library OSes are being designed, a pico-hypervisor is way more lightweight than a full OS kernel, even one with containers.
You just get the bare minimum for doing virtualization of the existing hardware and VM time slices.
Everything else – drivers, thread management, the network stack and so on – is implemented at the language runtime level.
This trims the fat of needless OS layers for the virtualized application and also minimizes the number of context switches between the pico-hypervisor and the application.
You seem to be misunderstanding what I’m saying. OS-level virtualization means you have a single kernel, running directly on top of real hardware, that simply partitions the userspace processes into isolated containers. From the kernel’s point of view your application is just another process running on the OS, with the added limitation of not being allowed to reach outside its container.
So how do you allow applications inside OS-level containers to bypass the whole OS stack and talk directly to virtualized hardware, the way library OSes allow?
They don’t, that’s the trick. There isn’t any virtualized hardware. In fact, there isn’t any virtualization or hypervisor. It’s just userland processes sitting in isolated islands on the host kernel. Think horizontal partitioning, not vertical.
There is no free lunch. There is a cost to implementing OS-level virtualization, and it is very far from zero. The cost goes up the more isolation you want to guarantee.
“The kernel is the same” does not mean you don’t duplicate resources; it only means that the kernel now needs to use more complex algorithms, and in many cases the way to do that is precisely by duplicating resources. It’s just that they are inside the kernel, so you don’t easily see them.
What? Have you ever actually looked inside a container implementation? In the vast majority of cases the isolation comes down to a single == check on whether the request is coming from the right container. More complex things, like separate IP stacks, are still extremely cheap (compared to the enormous cost of doing things like softintrs and interprocessor interrupts in HVM), since all you do is allocate a few more data structures (a little extra memory); your overall algorithmic complexity doesn’t increase by much. In fact, a hypervisor still has to perform all of the extra operations an OS container does, plus it needs to pull expensive strings in HVM to get the operation completed. E.g. to deliver a network packet it must take a hardware interrupt, do an ARP lookup, raise a softintr and do a VM enter, and then the guest will again take a second (virtualized) hardware interrupt, do an ARP lookup and enter the packet into its networking stack. VMDq cuts the cruft down somewhat, but that requires hardware support, whereas OS-level virtualization disposes of the whole virtualization nonsense altogether and simply handles the packet at the first opportunity.
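For illustration, that “single ==” point can be pictured with a tiny C sketch – the names and structures here are invented, nothing like the real Linux or zones data structures: the caller and the resource each carry a container tag, and the isolation decision is just a comparison.

```c
/* Toy model of an OS-level isolation check (invented types for illustration). */
#include <stdbool.h>
#include <stdio.h>

struct container { int id; };                        /* one per container/zone */
struct resource  { struct container *owner; };       /* e.g. an IP stack       */
struct task      { struct container *container; };   /* the calling process    */

/* The entire isolation decision on this path is one pointer comparison. */
static bool may_access(const struct task *t, const struct resource *r)
{
    return t->container == r->owner;
}

int main(void)
{
    struct container c1 = {1}, c2 = {2};
    struct resource  net = { &c1 };                  /* network stack owned by c1 */
    struct task      a = { &c1 }, b = { &c2 };

    printf("task a -> net: %s\n", may_access(&a, &net) ? "allowed" : "denied");
    printf("task b -> net: %s\n", may_access(&b, &net) ? "allowed" : "denied");
    return 0;
}
```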
On an IBM mainframe, you might choose to run z/VM, which is a dedicated OS for hosting virtual machines. Inside each VM, you might choose to run CMS – which is meant as a light single-user OS. CMS used to be able to run on raw hardware, but it was never really meant to, and the current versions are entirely dependent on a hypervisor.
The idea is to use z/VM almost as a HAL, with separate VMs for each task, running CMS to host user programs, or other single-task OSes to provide databases/IPC/authentication/etc. (These days you can also host linux VMs.)
It started life as CP/CMS in the late 60s, morphing into something close to its current form with VM/370 around 1972. Everything old is new again.
Another bit of mainframe stuff from IBM that I also find interesting is OS/400, nowadays IBM i.
Kernel developed initially in Assembly/Modula-2, nowadays mostly C++.
All userspace applications are bytecode-based (TIMI) and are AOT-compiled at installation time.