“According to George Cook of WVNET this cluster has been up for over 10 years. WVNET is the West Virginia Network, a dynamic service organization providing telecommunications and computing services within West Virginia. WVNET was created in 1975 to provide central computing facilities and wide-area network communications linking its ‘central site’ computing resources in Morgantown with the campus computing systems at most of the colleges and universities throughout the state. The cluster consists of an Alpha 4100 (with four 533MHz CPUs) running VMS 7.3-2; a VAX 6000-630 running VMS 7.3; and four DEC 3000 workstations running VMS 7.3-2.”
BSDs outdone by VMS.
10 years is impressive, although I have no idea how other servers/clusters compare.
They probably keep that cluster around only as a toy with a huge uptime. You won’t be able to run on it any of the tasks people usually associate with the word ‘cluster’. Impressive software and hardware, though.
“They probably keep that cluster around only as a toy with a huge uptime.”
Did you even READ the article?
“The cluster provides a variety of services:
– Runs the Banner student information system (on Oracle) for one of the member institutions of higher education.
– Runs multiple Oracle/Banner test databases to support Banner running on VMS at some of the other member institutions.
– Is the Usenet news server (running DNEWS).
– Is one of the Domain Name (DNS) servers for several hundred domains.
– Provides DHCP, Bootp, Kerberos and print services
– Hosts the VMS Mosaic web site and ftp server.”
I went to college in West Virginia, and this cluster is definitely not a toy. All students have access and must learn to use VMS on it. Students also have to write and compile programs on the cluster.
I’ve been working with OpenVMS machines for years. Usually what brings them down is a bad fan. I don’t know why but I’ve seen it a bunch of times.
They don’t call it non-stop networking for nothing. Our only real outage in 3.5 years of use as a back-end trading system was a common flash tank burnout; it cost us about 8 hours and triggered an SEC investigation, though. Did they ever get over the speed-of-light thing? The only real drawback I recall was that you couldn’t spread the replicating halves of a cluster very far apart. That, and no kids wanted to learn VMS.
To quote from the History page of the (on-going) Coyotos research OS:
http://www.coyotos.org/history.html
“The last GNOSIS/KeyKOS system was turned off circa 2000 after running without interruption for 17 years.”
(KeyKOS being a predecessor of the EROS project, and EROS being re-done as Coyotos.) I’m not entirely sure whether that means ‘on 24/7 for 17 years’ or really ‘17 years of operation without a single reboot’, but it sure reads as the latter.
While nothing to sneeze at, reading ‘cluster’ (= redundancy, rebooting/swapping individual nodes possible) made me less impressed. Personally I am more impressed by a long uptime on an individual, non-redundant system. And one that is doing real work, I might add. Maybe some small, easy, insignificant job, but still: providing some real service to the outside world. On a side note: the best I’ve done myself (a FreeSCO/Linux 2.0.x based home router/NAT box) is only around 3 months (limited by cheap, crappy hardware and brownouts/power failures/lack of a UPS).
Given a well-engineered OS, hardware is really the limiting factor. Software can reach 100% reliability (at least in theory), but the real-world hardware it runs on never can. Redundancy, error correction, a UPS and such only help push up the number of 9’s in your % uptime/availability.
(BTW: I know uptime != availability, but I don’t care much, since I don’t happen to be in the 99.99999% business :-) )
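For what it’s worth, the arithmetic behind those 9’s is simple; here is a quick sketch in plain Python (just unit conversion, nothing OS-specific):

    # Downtime budget per year for a given number of nines of availability.
    SECONDS_PER_YEAR = 365.25 * 24 * 3600

    for nines in range(2, 8):
        unavailability = 10 ** -nines          # e.g. 5 nines -> 0.00001
        downtime = unavailability * SECONDS_PER_YEAR
        print(f"{nines} nines: {downtime:9.1f} seconds of downtime per year")

    # 3 nines is roughly 8.8 hours a year, 5 nines about 5 minutes,
    # and 7 nines barely 3 seconds -- which is why redundancy and a UPS
    # can only push the number of 9's, never reach 100%.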
Do they mean to say that the cluster as a whole has not gone down for 10 years or have all the machines really not been booted for that long?
Uptime usually means a period of time during which a computer has been running uninterrupted.
If the machine hadn’t been booted that would be downtime.
VMS makes computers into more of an appliance that you just plug in, like a refrigerator. Too bad engineers nowadays don’t possess the desire to create new, modern, always-on computer systems like these.
Somehow, reading this makes my personal record of ~20 months (On a crappy old P133 Linux email server that died due to a PSU failure…) look pretty flimsy.
Baaah!
Reading some of your posts, I think … *grunt grunt* Big uptime, big dick!
It’s pretty amusing. Uptime is a measure of ePenis size among *nix fans I’ve noticed. Cue the “Because Windows can’t stay up for more than 3 days!!1!1!” jokes.
Wow, you just have to take a stab at every pro-Linux/Unix article and comment, don’t you?
“Reading some of your posts, I think … *grunt grunt* Big uptime, big dick!”
Either you’re envious, or you need to see a psychologist.
Long uptime on a machine that gets a lot of use usually means reliability; if the software crashes and needs a reboot, the uptime count has to restart too. A system crash is a pain in the arse for the people using the computer at the time, and work is lost; that is why people like reliability.
“Cue the “Because Windows can’t stay up for more than 3 days!!1!1!” jokes.”
Why would we?
I don’t have a bone to pick with Windows fans, I leave them alone and don’t post flamebait in articles and threads of theirs. You on the other hand see Linux or Unix mentioned (or you bring it up yourself if it isn’t) and you immediately start taking a dump on it.
For those really needing uptime, a known-good Linux or BSD kernel and a good UPS is what you need. I needed 24/7 availability on a server with no budget, and 2.0.something fit the bill. Three and a half years under extremely heavy load (network and disk I/O plus bzip2), and what finally shut it down was an extended power outage.
A well-made OS should never crash and should rarely need a reboot, since remote vulnerabilities that would force one are very rare nowadays.
So 10 years on VMS, probably with a lot of care, is not really a feat, just a sign that the system was well made (and that we already knew).
I’ve recently had to start using VMS for a new work assignment, and I’d say there is a good reason people don’t want to learn it. DCL is such a nasty command-line interface to use. It’s just such fun typing weird things like “set def [.foo]” instead of “cd foo”. Don’t even get me started on the monkey business required to remove a directory. The automatic file versioning is interesting, but I don’t have a use for it, so I either have to purge the old versions or put up with cluttered directory listings. There are just too many weird things and not enough benefits over *nix for it to be worth the effort. Oh, and shipping a program called “vi” that isn’t the famous text editor is just an evil thing to do :-)
I’ve always actually liked VMS. I use the Deathrow cluster some; I really need to spend more time with it…
People… OpenVMS clustering is just that good. These clusters are used to get real work done, not to run seti@home to keep the CPUs busy (though some might do that in the off hours). Don’t even think you can reach the same levels of guaranteed availability with “a Linux box and a UPS”.
Read:
http://www.openvms.org/stories.php?story=03/11/28/7758863
http://uptimes.hostingwired.com/stats.php?op=all&os=OpenVMSCluster
http://www.openvms.org/stories.php?story=03/11/21/6963920
Or, in general, read something about OpenVMS (clustering) before even attempting to argue that you can do better.
I had 24/7 availability with off-the-shelf, tested x86 parts running the dorm network, while the AlphaServer (an 1100, if I remember correctly) running VMS was down time and again, probably because the admin was too lazy to do “normal” admin things.
Sorry, my experience is that a good Linux/BSD install is at least as good as VMS for clustering. There must be a reason why recent animated movies are rendered on Linux/x86. If the Alpha were less expensive, and still alive, the situation would change, but right now an OpenVMS cluster is only a way of saying “I have too much money”.
The one you submitted statistics on was the cluster version. Take a look at these for a more fair comparison:
http://uptimes.hostingwired.com/stats.php?op=all&os=OpenVMS
http://uptimes.hostingwired.com/stats.php?op=all&os=Linux
If I’m not mistaken, it was a cluster reaching the reported uptime. A single-node OpenVMS machine has about as much chance of going down as a single-node BSD/Linux/whatever, even *with* a UPS. My original… rant was more about someone seeming to be under the mistaken impression that a single-node Linux box behind a UPS is somehow a substitute for a cluster with guaranteed availability.
And still, OpenVMS clustering really is that good. The first link I posted is a pretty good example of how to implement and use OpenVMS clustering; compared to that, any HA hack implemented in Linux pales. The uptime stuff… bleh, that’s just an ego-boosting dick-length contest anyway.
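To put rough numbers on that single-node-versus-cluster point, here is a back-of-envelope sketch (it assumes independent node failures and instant, transparent failover, which real clusters only approximate):

    # Availability of one node vs. a two-node pair that is only down when
    # both nodes are down at the same time (idealised, independent failures).
    node = 0.999                                      # one decent box: ~8.8 hours down per year

    single_node = node
    two_node_cluster = 1 - (1 - node) ** 2

    print(f"single node   : {single_node:.4%}")       # 99.9000%
    print(f"2-node cluster: {two_node_cluster:.6%}")  # 99.9999%

Correlated failures (shared power, the same software bug, the same admin) eat into that quickly, which is part of why proper cluster design and administration matter more than just racking a second box.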
Anyway, regarding some of the other comments:
HA != computing clusters (the animation-shop argument); besides, rendering is a pretty specific sport at which I think RISC CPUs aren’t the best choice anyway, not to mention expensive. Oh, and OpenVMS didn’t die with the Alpha: it runs fine on Itanic^Hum too.
And in agreement with Rev.Tig: indeed, it is sad that throwing more redundancy into the mix is nowadays considered a substitute for proper system design, implementation and administration. I guess it’s the throwaway culture we live in that dictates that cluster nodes have an economic lifetime measured in months instead of decades.
There is a lot to be said for kit that runs this long and is still kicking. I do think the cut-throat nature of the x86 market has unduly hampered modern “server class” products; they just don’t have the same build quality anymore (cue flames). On the other hand, gone are the days when five-nines uptime was only possible for those with huge budgets; there are lots of open source projects working towards this at the moment: CARP (fantastic), MySQL Cluster (well, it is working, ish), the Linux heartbeat HA projects, and so on. As soon as you can distribute your service in software and make seamless failovers, you are no longer at the whim of failing hardware, which is where we (TINW, and I am speaking in the third person and only in financially preferable terms) use RAID… A Redundant Array of Inexpensive Dells.
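The “distribute your service in software and fail over seamlessly” idea those projects implement boils down to something like this toy sketch (the address and the promote step are made up for illustration; real tools such as heartbeat or CARP do this properly, with shared addresses and quorum):

    # Toy heartbeat monitor: poll the primary, promote the standby when it
    # stops answering. Purely illustrative, not how any real HA tool is configured.
    import socket
    import time

    PRIMARY = ("192.0.2.10", 80)   # hypothetical address (TEST-NET range)
    CHECK_INTERVAL = 2             # seconds between heartbeats
    MISSES_ALLOWED = 3             # consecutive misses before failing over

    def alive(addr):
        try:
            with socket.create_connection(addr, timeout=1):
                return True
        except OSError:
            return False

    def promote_standby():
        # In a real setup: claim the shared service IP, start the service,
        # and page a human.
        print("primary unreachable -- standby taking over")

    misses = 0
    while True:
        misses = 0 if alive(PRIMARY) else misses + 1
        if misses >= MISSES_ALLOWED:
            promote_standby()
            break
        time.sleep(CHECK_INTERVAL)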
In my humble experience, modern software will bugger up before the hardware does. So if you are saving on software, buy more hardware.
As for build quality: after two years you recycle the production machines to less critical tasks (you allowed for hardware failure anyway when you allowed for software failure). Besides, technology has moved on, and you will be able to get hardware with twice the grunt for the same price this year…
Of course this is a gross simplification, as we have not gone into other factors such as power, connectivity, natural disasters, or building your datacentre next to a fuel depot that subsequently blows up… (you know who you are…)
So good luck to WVNET for the next 10 years; the sad part is that the cluster will probably be replaced for political reasons rather than for its ability to do the job.
Apologies for the dull ramble…