Linked by Thom Holwerda on Tue 17th Sep 2013 22:04 UTC, submitted by garyd
General Development

ZFS is the world's most advanced filesystem, in active development for over a decade. Recent development has continued in the open, and OpenZFS is the new formal name for this open community of developers, users, and companies improving, using, and building on ZFS. Founded by members of the Linux, FreeBSD, Mac OS X, and illumos communities, including Matt Ahrens, one of the two original authors of ZFS, the OpenZFS community brings together over a hundred software developers from these platforms.

ZFS plays a major role in Solaris, of course, but beyond that, has it found other major homes? In fact, while we're at it, how is Solaris doing anyway?

RE[4]: Solaris is doing well
by Kebabbert on Sat 21st Sep 2013 11:05 UTC in reply to "RE[3]: Solaris is doing well"

"Do you know if it's *really* a linux problem instead of an x86 SMP scalability problem? I honestly don't think x86 can scale efficiently beyond 8 cores under any OS."

You mean to say x86 does not scale beyond 8 sockets, not 8 cores. Sure, there are no x86 SMP servers larger than 8 sockets today, and there never have been. The SGI UV1000 is actually a NUMA cluster with hundreds of x86 sockets, but it is an HPC cluster, so it is ruled out of this discussion because we are discussing SMP servers, not HPC clusters.

Regarding the scalability of Linux: if you look at SAP benchmarks using 2.8GHz Opteron CPUs and fast RAM sticks, Linux gets 87% CPU utilization on an 8-socket server, which is quite bad. Solaris on the same Opteron family, but clocked lower at 2.6GHz and with slower RAM sticks, gets 99% CPU utilization and beats Linux on the SAP benchmarks. So Solaris uses slower hardware and still beats Linux. Eight sockets is the maximum Linux has been tested on, and Linux does not handle 8 sockets well. On the 64-socket Big Tux, HP's experimental benchmarks showed roughly 40% CPU utilization, so HP could not sell it. So at 8 sockets Linux had 87% CPU utilization, and at 64 sockets roughly 40%. I would guess that at 16 sockets Linux would sit around 70% and fall off rapidly from there, because Linux has not been tested or optimized for 16 sockets - how could Linux scale to 16 sockets well when the hardware does not exist?
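
To put those utilization figures in perspective, here is a trivial C sketch (purely illustrative - it just restates the numbers quoted above and assumes utilization maps directly onto useful work, which is a simplification) that converts them into "effective sockets":

/* Purely illustrative: turn the CPU utilization figures quoted above
 * into "effective sockets", assuming utilization maps directly onto
 * useful work (a simplification). */
#include <stdio.h>

int main(void) {
    struct { int sockets; double util; const char *what; } samples[] = {
        {  8, 0.87, "Linux, SAP benchmark figure quoted above" },
        {  8, 0.99, "Solaris on slower hardware"               },
        { 64, 0.40, "Linux on HP Big Tux"                      },
    };
    for (int i = 0; i < 3; i++)
        printf("%2d sockets at %3.0f%% utilization ~ %4.1f effective sockets (%s)\n",
               samples[i].sockets, samples[i].util * 100.0,
               samples[i].sockets * samples[i].util, samples[i].what);
    return 0;
}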


"My understanding is that linux will run on the same Sparc architectures that Solaris does:"

Yes, it will. But how well? HP tried Linux on their 64-socket server, with awful results. I believe Linux would stutter and perform very badly on 96-socket SPARC servers.


"Do you have a benchmark of an apples to apples comparison between solaris and linux on the same processors (ignoring that such processors are not being sold with linux)?"

There are benchmarks with Linux and Solaris on the same or similar x86 hardware. With few CPUs, Linux tends to win; on larger configurations, Solaris wins. That is expected, because Linux kernel devs mostly sit with 1-2 socket PCs at home, and not many of them have access to 8-socket servers. Linux vs Solaris on the same hardware:
https://blogs.oracle.com/jimlaurent/entry/solaris_11_outperforms_rhe...

https://blogs.oracle.com/jimlaurent/entry/solaris_11_provides_smooth...



"Mind you solaris *could* be better than linux for high end deployments."

High-end Linux deployments do not exist, and never have. For high-end deployments you have no choice but to go to Unix servers with 32 sockets or more, from IBM, Oracle or HP. So I would be very, very surprised if Solaris were not better than Linux. Unix kernel devs have spent decades testing and tailoring their kernels for 32 sockets and above - of course Unix must handle large servers better?


"I'm genuinely curious about it, and if you have any evidence (benchmarks & case studies) that would be very informative to me."

People have routinely run Unix on large 32-socket (or larger) servers for decades. So Unix should be comfortable running large servers without effort, I suspect.


"For that matter, I'm very curious about the scalability of 64 core shared memory systems in general regardless of OS. Correct me if I'm wrong, but it seems to me that it would scale badly unless it were NUMA (or it had so much cache that it could effectively be used as NUMA)."

The canonical example of a large SMP workload is running databases in large configurations. Here a kernel developer explains and discusses NUMA, SMP, HPC, etc.:
http://gdamore.blogspot.se/2010/02/scalability-fud.html
"....First, one must consider the typical environment and problems that are dealt with in the HPC arena. In HPC (High Performance Computing), scientific problems are considered that are usually fully compute bound. That is to say, they spend a huge majority of their time in "user" and only a minuscule tiny amount of time in "sys". I'd expect to find very, very few calls to inter-thread synchronization (like mutex locking) in such applications...

...Consider a massive non-clustered database. (Note that these days many databases are designed for clustered operation.) In this situation, there will be some kind of central coordinator for locking and table access, and such, plus a vast number of I/O operations to storage, and a vast number of hits against common memory. These kinds of systems spend a lot more time doing work in the operating system kernel. This situation is going to exercise the kernel a lot more fully, and give a much truer picture of "kernel scalability"...."
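
A minimal way to see that user/sys split for yourself is a toy program like the one below (my own sketch, not one of the benchmarks discussed here; the loop counts are arbitrary): getrusage() reports how much CPU time the process spent in user code versus inside the kernel, and it is the kernel share that actually exercises "kernel scalability".

/* Toy sketch: a compute-bound (HPC-style) loop is almost all "user"
 * time, while a syscall-heavy loop shifts the balance toward "sys".
 * Loop counts are arbitrary. */
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/resource.h>

static void report(const char *label) {
    struct rusage ru;
    getrusage(RUSAGE_SELF, &ru);
    printf("%-18s user %ld.%06lds  sys %ld.%06lds\n", label,
           (long)ru.ru_utime.tv_sec, (long)ru.ru_utime.tv_usec,
           (long)ru.ru_stime.tv_sec, (long)ru.ru_stime.tv_usec);
}

int main(void) {
    volatile double x = 0.0;
    for (long i = 0; i < 200000000L; i++)   /* compute-bound: mostly "user" */
        x += i * 0.5;
    report("after compute loop");

    int fd = open("/dev/null", O_WRONLY);
    char byte = 0;
    for (int i = 0; i < 2000000; i++)       /* syscall-heavy: "sys" grows */
        if (write(fd, &byte, 1) < 0)
            break;
    close(fd);
    report("after syscall loop");
    return 0;
}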


RE[5]: Solaris is doing well
by Alfman on Sat 21st Sep 2013 16:00 in reply to "RE[4]: Solaris is doing well"

Kebabbert,

"You mean to say x86 does not scale beyond 8 sockets, not 8 cores. Sure, there are no larger SMP servers than 8 sockets x86 today, and has never been."

Actually I meant this in the context of SMP versus NUMA. You said "All 32 socket Unix servers share some NUMA features, but they have very good RAM latency, so you treat them all as a true SMP server". I'd really like to know the difference between x86 NUMA and "Unix server true SMP", since as far as I know SMP requires NUMA in order to scale efficiently above 4-8 cores without very high memory contention. Saying that Solaris servers are different sounds an awful lot like marketing speak, but maybe I'm wrong. Can you point out a tangible technical difference?
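
To make concrete the kind of memory contention I mean, here is a toy sketch (illustrative only - the thread and iteration counts are arbitrary, and it only shows cache-line contention between cores, not full NUMA effects): several threads hammering one shared counter, so a single cache line bounces between cores/sockets, versus each thread updating its own padded counter.

/* Toy contention demo: time THREADS threads incrementing one shared
 * atomic counter versus each thread incrementing its own padded
 * counter. On multi-socket machines the shared version degrades
 * sharply as the thread count grows. Compile with -pthread. */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <time.h>

#define THREADS 8
#define ITERS   5000000L

static atomic_long shared_counter;                              /* contended line */
static struct { atomic_long v; char pad[56]; } percpu[THREADS]; /* padded         */

static void *hammer(void *arg) {
    atomic_long *c = arg;
    for (long i = 0; i < ITERS; i++)
        atomic_fetch_add(c, 1);
    return NULL;
}

static double run(int use_shared) {
    pthread_t t[THREADS];
    struct timespec a, b;
    clock_gettime(CLOCK_MONOTONIC, &a);
    for (int i = 0; i < THREADS; i++)
        pthread_create(&t[i], NULL, hammer,
                       use_shared ? &shared_counter : &percpu[i].v);
    for (int i = 0; i < THREADS; i++)
        pthread_join(t[i], NULL);
    clock_gettime(CLOCK_MONOTONIC, &b);
    return (b.tv_sec - a.tv_sec) + (b.tv_nsec - a.tv_nsec) / 1e9;
}

int main(void) {
    printf("shared counter : %.3fs\n", run(1));
    printf("padded counters: %.3fs\n", run(0));
    return 0;
}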

"There are benchmarks with Linux and Solaris on same or similar x86 hardware. On few cpus, Linux tends to win. On larger configurations, Solaris wins."

"Linux vs Solaris on same hardware:"


I thank you for looking these up. I really wish they were using *identical* hardware and only switching a single variable between tests (instead of switching the OS AND the hardware vendor).

Still, the benchmarks are interesting.

This chart appears to show a glaring scalability problem with RHL - at least, we're left to infer one by comparing it with the Solaris chart on the same page.
http://blogs.oracle.com/jimlaurent/resource/HPDL980Chart.jpg

However another chart on a different blog post (on different hardware) doesn't show the scalability problem under RHL.
http://blogs.oracle.com/jimlaurent/resource/HPML350Chart.jpg

So was the problem Red Hat Linux itself, or the hardware, the OS, the software, the number of cores? We really don't know. Surely any employee worth his salt would have run the benchmarks in an apples-to-apples hardware/software configuration, so why weren't those results posted?

As before, I'm not asserting that Solaris isn't better - it very well may be - but it would be naive to take Oracle sources at face value.


"Consider a massive non-clustered database. In this situation, there will be some kind of central coordinator for locking and table access, and such, plus a vast number of I/O operations to storage, and a vast number of hits against common memory."

I'd think this design is suboptimal for scalability. A scalable design would NOT have a single central coordinator; there should be many (i.e. one per table or shard) running in parallel even though it's not clustered. To be optimal on NUMA, software should be specifically coded for it; however, you are probably right that vendors are treating these machines as clustered nodes instead and haven't gotten around to rewriting their database engines to take advantage of NUMA specifically.
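
Something along these lines is what I mean by "many coordinators" - a minimal sketch using striped locks, where each key hashes onto one of N independent locks so unrelated transactions rarely collide. All the names and the stripe count here are hypothetical, just to illustrate the idea; real database engines are obviously far more elaborate.

/* Minimal striped-locking sketch: hash a key onto one of STRIPES
 * mutexes instead of taking a single global lock. Hypothetical names,
 * for illustration only. */
#include <pthread.h>
#include <stdint.h>

#define STRIPES 64

static pthread_mutex_t stripe_lock[STRIPES];

static void init_stripes(void) {
    for (int i = 0; i < STRIPES; i++)
        pthread_mutex_init(&stripe_lock[i], NULL);
}

static unsigned stripe_for(uint64_t key) {
    return (unsigned)((key * 0x9E3779B97F4A7C15ULL) >> 58); /* top 6 bits -> 0..63 */
}

static void touch_row(uint64_t key) {
    unsigned s = stripe_for(key);
    pthread_mutex_lock(&stripe_lock[s]);
    /* ... read or update the row that `key` maps to ... */
    pthread_mutex_unlock(&stripe_lock[s]);
}

int main(void) {
    init_stripes();
    touch_row(42);      /* independent keys usually land on different stripes */
    touch_row(12345);
    return 0;
}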



Can you disclose whether you are connected to Oracle?



RE[6]: Solaris is doing well
by Kebabbert on Sun 22nd Sep 2013 14:20 in reply to "RE[5]: Solaris is doing well"

"Actually I meant this in the context of SMP versus NUMA. You said "All 32 socket Unix servers share some NUMA features, but they have very good RAM latency, so you treat them all as a true SMP server". I'd really like to know the difference between x86 NUMA and "Unix server true SMP", since as far as I know SMP requires NUMA in order to scale efficiently above 4-8 cores without very high memory contention. Saying that Solaris servers are different sounds an awful lot like marketing speak, but maybe I'm wrong. Can you point out a tangible technical difference?"

Here is some information on these "different Solaris servers". I do mean they are different, because they are well built and minimize memory latency. Look at the last picture at the bottom:
http://www.theregister.co.uk/2012/09/04/oracle_sparc_t5_processor/

"...Turullols said you need one hop between sockets to scale. It usually takes two hops to get to eight-way NUMA given current designs, so this is where that near linear scalability is coming from...."

You see that in this SPARC T5 8-socket server every CPU is connected to every other CPU, via 28 lanes in total, and a CPU can reach any memory cell in at most one hop - which means latency is very low. This is the reason this 8-socket server scales nearly linearly. There are many 8-socket servers where you effectively only get 5 CPUs out of 8, or so; they scale badly because of the many hops.
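
(The lane counts simply follow from a full mesh: every pair of sockets needs its own link, so n sockets need n(n-1)/2 links - 8*7/2 = 28 for the T5, and 96*95/2 = 4560 if you tried the same thing with 96 sockets.)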

Now imagine the Oracle M6 server with 96 CPUs all connected directly to each other: there would be 4560 lanes. That is too many, and messy. So how do you build a 96-socket server that scales well? Look at the bottom picture of the coming M6 server:
http://www.theregister.co.uk/2013/08/28/oracle_sparc_m6_bixby_inter...

Bronek:"...If you look at the picture carefully you will find that all CPUs can connect to others directly (7 cores), via single BX (12 cores) or a BX and a CPU (i.e. single hop, remaining 12 cores). This all with 4Tb/s bandwidth to maintain cache coherency across sockets - I think that's some really nice engineering..."

So this M6 server has every CPU connected to every other in only a few hops in the worst case. It looks like the latency will be a few hundred nanoseconds at worst. An HPC cluster, on the other hand, has a worst-case latency on the order of 10,000ns - which makes it only usable for parallel workloads where you don't need to access data far away.

This M6 server is for running huge non-clustered database configurations entirely from memory. Oracle is concerned about the SAP HANA in-memory database, and this is Oracle's answer: a huge SMP-like server capable of running everything from RAM. So SAP's HANA in-memory database is not a threat to Oracle's database - or so Larry Ellison thinks.

This M6 server is very intricately built, as we can all see. As far as I know, no vendors other than Oracle and Fujitsu (with the new 64-socket SPARC64 M10-4S server) are building large database SMP servers with more than 32 sockets. HP has a 64-socket server, but it is old and not updated; I don't know if it is even sold any longer.

Anyway, you will not see a Linux NUMA cluster running non-clustered databases.




"I thank you for looking these up. I really wish they were using *identical* hardware and only switching a single variable between tests (instead of switching the OS AND the hardware vendor)."

Here are the 8-socket Solaris vs Linux SAP benchmarks I talked about. They use very similar hardware - Opteron CPUs of almost the same model, but the Linux system is clocked higher. The Linux system has 128GB RAM and the Solaris system 256GB, because HP's Linux benchmarking team wanted to use faster RAM sticks and therefore had to settle for 128GB; the Solaris system uses slower memory sticks.
download.sap.com/download.epd?context=B1FEF26EB0CC34664FC7E80B933FCCAC80DD88CBFAF48C8D126FB65D80D09E988311DE75E0922A14

download.sap.com/download.epd?context=40E2D9D5E00EEF7CCDB0588464276DE2F0B2EC7F6C1CB666ECFCA652F4AD1B4C




"This chart appears to show a glaring scalability problem with RHL - at least, we're left to infer one by comparing it with the Solaris chart on the same page.
http://blogs.oracle.com/jimlaurent/resource/HPDL980Chart.jpg

However another chart on a different blog post (on different hardware) doesn't show the scalability problem under RHL.
http://blogs.oracle.com/jimlaurent/resource/HPML350Chart.jpg"

The second chart does not show the same scalability problem, but it has other problems: the Linux graph is very jagged and not smooth. Linux struggles with the workload and stutters; Solaris does not.



"As before, I'm not asserting that Solaris isn't better - it very well may be - but it would be naive to take Oracle sources at face value."

Linux has never been tested on servers larger than 8 sockets, so I would be very surprised if it could scale well. But yes, I agree you need to be careful with Oracle marketing too. I prefer independent benchmarks; if they don't exist, there is nothing we can do. Still, the Oracle benchmarks show huge performance advantages over every other CPU and OS. I expect Oracle could tweak benchmarks slightly, but not completely - it should not be possible to make a lousy CPU look great, should it?



"Can you disclose whether you are connected to Oracle?"

Sure. I am not connected to Oracle in any way. I work in finance, not IT. I just happen to be a geek who likes the best tech out there, and I admire good tech. I liked the IBM POWER7 when it was released because it was the best back then, better than SPARC, and I said so in posts; I acknowledged the superiority of POWER7 at the time. I am also a fan of Plan 9 - in my opinion it might be the most innovative OS of them all. I prefer Go to Java, etc. I just like the best tech, and it does not really matter who makes it: OpenBSD for security, Solaris for being the most innovative Unix, SPARC for the fastest CPU, ZFS for the safest filesystem, etc. If Btrfs were better than ZFS, I would switch and not look back. I am pragmatic and prefer the best tech.

But I don't like lies and FUD. Those I react to, and I want to dispel FUD:
-IBM mainframes have very weak CPUs; they are not strong, no matter what IBM says.

-Linux scales quite badly, no matter what Linus Torvalds says.

-Linux code quality is not optimal, which Torvalds and other kernel devs agree on. Here is what Con Kolivas, the well-known Linux kernel developer, says when he compares the source code quality of Solaris to Linux:
http://ck-hack.blogspot.se/2010/10/other-schedulers-illumos.html

http://www.forbes.com/2005/06/16/linux-bsd-unix-cz_dl_0616theo.html

http://www.theregister.co.uk/2009/09/22/linus_torvalds_linux_bloate...
