Linked by Thom Holwerda on Tue 17th Sep 2013 22:04 UTC, submitted by garyd
General Development

ZFS is the world's most advanced filesystem, in active development for over a decade. Recent development has continued in the open, and OpenZFS is the new formal name for this open community of developers, users, and companies improving, using, and building on ZFS. Founded by members of the Linux, FreeBSD, Mac OS X, and illumos communities, including Matt Ahrens, one of the two original authors of ZFS, the OpenZFS community brings together over a hundred software developers from these platforms.

ZFS plays a major role in Solaris, of course, but beyond that, has it found other major homes? In fact, now that we're at it, how is Solaris doing anyway?

RE[5]: Solaris is doing well
by Alfman on Sat 21st Sep 2013 16:00 UTC in reply to "RE[4]: Solaris is doing well"
Alfman Member since:
2011-01-28

Kebabbert,

"You mean to say x86 does not scale beyond 8 sockets, not 8 cores. Sure, there are no larger SMP servers than 8 sockets x86 today, and has never been."

Actually I meant this in the context of SMP versus NUMA. You said "All 32 socket Unix servers share some NUMA features, but they have very good RAM latency, so you treat them all as a true SMP server". I'd really like to know the difference between x86 NUMA and "Unix server true SMP", since as far as I know SMP requires NUMA in order to scale efficiently above 4-8 cores without very high memory contention. Saying that Solaris servers are different sounds an awful lot like marketing speak, but maybe I'm wrong. Can you point out a tangible technical difference?

"There are benchmarks with Linux and Solaris on same or similar x86 hardware. On few cpus, Linux tends to win. On larger configurations, Solaris wins."

"Linux vs Solaris on same hardware:"


I thank you for looking these up. I really wish they were using *identical* hardware and only switching a single variable between tests (instead of switching the OS AND the hardware vendor).

Still, the benchmarks are interesting.

This shows a glaring scalability problem with RHL. We're left to infer that RHL has a scalability problem compared to the Solaris chart on the same page.
http://blogs.oracle.com/jimlaurent/resource/HPDL980Chart.jpg

However another chart on a different blog post (on different hardware) doesn't show the scalability problem under RHL.
http://blogs.oracle.com/jimlaurent/resource/HPML350Chart.jpg

So was the problem with Red Hat Linux, or was it the hardware, the OS, the software, the number of cores? We really don't know. Surely any employee worth his salt would have performed the benchmarks in an apples-to-apples hardware/software configuration, so why weren't those results posted?

As before, I'm not asserting that Solaris isn't better, it very well may be, but it would be naive to trust Oracle sources at face value.


"Consider a massive non-clustered database. In this situation, there will be some kind of central coordinator for locking and table access, and such, plus a vast number of I/O operations to storage, and a vast number of hits against common memory."

I'd think this design is suboptimal for scalability. A scalable design would NOT have a single central coordinator; there should be many (i.e. one per table or shard) running in parallel, even though it's not clustered. To be optimal on NUMA, software should be specifically coded to use it; however, you are probably right that vendors are treating it as clustered nodes instead. They haven't gotten around to rewriting the database engines to take advantage of NUMA specifically.
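As a rough illustration of the "one coordinator per table or shard" idea, here is a minimal Python sketch. It is not how any real database engine does locking; the class name `ShardedCounterStore`, the shard count, and the use of CRC32 to pick a shard are all assumptions made up for this example. The point is only that two updates touching different shards never wait on the same lock.

```python
import threading
import zlib


class ShardedCounterStore:
    """Hypothetical store with one lock per shard instead of one global lock."""

    def __init__(self, shard_count: int = 8) -> None:
        # Each shard owns its own dict and its own lock (its own "coordinator").
        self._shards = [({}, threading.Lock()) for _ in range(shard_count)]

    def _shard(self, key: str):
        # crc32 gives a stable shard index; the built-in hash() of a str
        # is randomized per process, so it is avoided here.
        index = zlib.crc32(key.encode("utf-8")) % len(self._shards)
        return self._shards[index]

    def add(self, key: str, delta: int) -> None:
        data, lock = self._shard(key)
        with lock:  # only writers to the same shard contend here
            data[key] = data.get(key, 0) + delta

    def get(self, key: str) -> int:
        data, lock = self._shard(key)
        with lock:
            return data.get(key, 0)


def worker(store: ShardedCounterStore, updates: int) -> None:
    # Spread the updates over five hypothetical table names.
    for i in range(updates):
        store.add(f"table{i % 5}", 1)


if __name__ == "__main__":
    store = ShardedCounterStore(shard_count=4)
    threads = [threading.Thread(target=worker, args=(store, 1000)) for _ in range(8)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print({f"table{i}": store.get(f"table{i}") for i in range(5)})
```

Note that in standard CPython the global interpreter lock limits real parallelism, so this only shows the lock granularity, not an actual speedup. On a NUMA machine one would additionally want each shard's memory and threads placed on the same node, which this sketch does not attempt.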



Can you disclose whether you are connected to Oracle?

Edited 2013-09-21 16:06 UTC


RE[6]: Solaris is doing well
by Kebabbert on Sun 22nd Sep 2013 14:20 in reply to "RE[5]: Solaris is doing well"
Kebabbert Member since:
2007-07-27

"Actually I meant this in the context of SMP versus NUMA. You said 'All 32 socket Unix servers share some NUMA features, but they have very good RAM latency, so you treat them all as a true SMP server'. I'd really like to know the difference between x86 NUMA and 'Unix server true SMP', since as far as I know SMP requires NUMA in order to scale efficiently above 4-8 cores without very high memory contention. Saying that Solaris servers are different sounds an awful lot like marketing speak, but maybe I'm wrong. Can you point out a tangible technical difference?"

Here is some information on these "different Solaris servers". I mean they are different, because they are well built, minimizing memory latency. Look at the last picture at the bottom:
http://www.theregister.co.uk/2012/09/04/oracle_sparc_t5_processor/

"...Turullols said you need one hop between sockets to scale. It usually takes two hops to get to eight-way NUMA given current designs, so this is where that near linear scalability is coming from...."

You see that in this SPARC T5 8-socket server, every cpu is connected to every other cpu, via 28 lanes. And a cpu can reach any memory cell in at most one hop, which means latency is very low. This is the reason this 8-socket server scales linearly. There are many 8-socket servers where you only get the performance of about 5 cpus out of 8. They scale badly, with many hops.

Now imagine the Oracle M6 server with 96 cpus all directly connected to each other: there would be 4560 lanes. That is too many, and messy. So how to build a 96 socket server that scales well? Look at the bottom picture on the coming M6 server:
http://www.theregister.co.uk/2013/08/28/oracle_sparc_m6_bixby_inter...

Bronek:"...If you look at the picture carefully you will find that all CPUs can connect to others directly (7 cores), via single BX (12 cores) or a BX and a CPU (i.e. single hop, remaining 12 cores). This all with 4Tb/s bandwidth to maintain cache coherency across sockets - I think that's some really nice engineering..."

So, this M6 server has all cpus connected to each other in only a few hops in the worst case. It looks like the latency will be a few 100ns at worst. On the other hand, an HPC cluster has a worst-case latency of 10,000ns, which makes it usable only for parallel workloads where you don't need to access data far away.
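The lane counts quoted above (28 for 8 sockets, 4560 for 96) are what you get from connecting every pair of sockets directly, i.e. n(n-1)/2 links. The short Python sketch below checks that arithmetic and also computes the worst-case hop count for two toy topologies. The `full_mesh` and `ring` layouts are assumptions chosen only for illustration; they are not the actual T5 or M6 interconnects, and "hop" here simply means one link traversed.

```python
from collections import deque


def full_mesh_links(sockets: int) -> int:
    """Links needed to connect every pair of sockets directly: n(n-1)/2."""
    return sockets * (sockets - 1) // 2


def full_mesh(n: int) -> dict[int, set[int]]:
    """Toy topology: every socket has a direct link to every other socket."""
    return {i: {j for j in range(n) if j != i} for i in range(n)}


def ring(n: int) -> dict[int, set[int]]:
    """Toy topology: each socket is linked only to its two neighbours."""
    return {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}


def worst_case_hops(adjacency: dict[int, set[int]]) -> int:
    """Longest shortest path, in links traversed, between any two sockets.

    Assumes the topology is connected. Uses a breadth-first search
    from every socket.
    """
    worst = 0
    for start in adjacency:
        dist = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbour in adjacency[node]:
                if neighbour not in dist:
                    dist[neighbour] = dist[node] + 1
                    queue.append(neighbour)
        worst = max(worst, max(dist.values()))
    return worst


if __name__ == "__main__":
    print(full_mesh_links(8))    # 8 * 7 / 2
    print(full_mesh_links(96))   # 96 * 95 / 2
    print(worst_case_hops(full_mesh(8)))
    print(worst_case_hops(ring(8)))
```

The trade-off this shows is the one described in the post: a full mesh keeps every socket one link away but the link count grows quadratically, while sparser topologies need fewer links at the cost of more hops.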

This M6 server is for running huge non-clustered database configurations, all from memory. Oracle is concerned about the SAP Hana in-memory database, and this is Oracle's answer: a huge SMP-like server capable of running everything from RAM. So SAP's Hana RAM database is not a threat to Oracle's database, or so Larry Ellison thinks.

This M6 server is very intricately built, as we all can see. As far as I know, no vendors other than Oracle and Fujitsu (with the new 64-socket SPARC64 server, the M10-4S) are building large database SMP servers with more than 32 sockets. HP has a 64-socket server, but it is old and not updated. I don't know if it is still sold.

Anyway, you will not see a Linux NUMA cluster server running non-clustered databases.




"I thank you for looking these up. I really wish they were using *identical* hardware and only switching a single variable between tests (instead of switching the OS AND the hardware vendor)."

Here are the 8-socket Solaris vs Linux SAP benchmarks I talked about. They use very similar hardware: Opteron cpus of almost the same model, but the ones in the Linux system are clocked higher. Linux has 128GB RAM and Solaris 256GB, because the HP team benchmarking Linux wanted to use faster memory sticks, so they had to use 128GB RAM. Solaris uses slower memory sticks.
download.sap.com/download.epd?context=B1FEF26EB0CC34664FC7E80B933FCCAC 80DD88CBFAF48C8D126FB65D80D09E988311DE75E0922A14

download.sap.com/download.epd?context=40E2D9D5E00EEF7CCDB0588464276DE2 F0B2EC7F6C1CB666ECFCA652F4AD1B4C




"This shows a glaring scalability problem with RHL. We're left to infer that RHL has a scalability problem compared to the Solaris chart on the same page."
http://blogs.oracle.com/jimlaurent/resource/HPDL980Chart.jpg

"However another chart on a different blog post (on different hardware) doesn't show the scalability problem under RHL."
http://blogs.oracle.com/jimlaurent/resource/HPML350Chart.jpg

There is not the same scalability problem, but there are other problems. The Linux graph is very jagged, not smooth. Linux struggles with the workload and stutters a lot. Solaris does not.



"As before, I'm not asserting that Solaris isn't better, it very well may be, but it would be naive to trust Oracle sources at face value."

Linux has never been tested on larger servers than 8-sockets, so I would be very surprised if Linux could scale well. But yes, I agree you need to be careful with Oracle marketing, too. I prefer independent benchmarks. If they don't exist, we can do nothing. But still, the Oracle benchmarks show huge performance advantages over any other cpu or OS. I expect Oracle could tweak benchmarks slightly, but not completely? It should not be possible to make a lousy cpu look great? Or?



"Can you disclose whether you are connected to Oracle?"

Sure. I am not connected to Oracle in any way. I work in finance, not IT. I just happen to be a geek who likes the best tech out there. I admire good tech. I liked the IBM POWER7 when it was released because it was the best back then, better than SPARC. And I said so in posts, yes: I acknowledged the superiority of POWER7 back then. I am also a fan of Plan 9. In my opinion it might be the most innovative OS of them all. I prefer Go to Java, etc. I just like the best tech. It does not really matter who is making it. OpenBSD for security. Solaris for being the most innovative Unix. SPARC for the fastest cpu. ZFS for the safest filesystem. Etc. If BTRFS were better than ZFS, I would switch and not look back. I am pragmatic and prefer the best tech.

But I don't like lies and FUD. To that I react and I want to dispel FUD.
-IBM Mainframes have very weak cpus, they are not strong. No matter what IBM says.

-Linux scales quite badly. No matter what Linus Torvalds says.

-Linux code quality is non-optimal, which Torvalds and other kernel devs agree on. Here is what Con Kolivas, the famous Linux kernel developer, says when he compares the source code quality of Solaris to that of Linux:
http://ck-hack.blogspot.se/2010/10/other-schedulers-illumos.html

http://www.forbes.com/2005/06/16/linux-bsd-unix-cz_dl_0616theo.html

http://www.theregister.co.uk/2009/09/22/linus_torvalds_linux_bloate...


RE[7]: Solaris is doing well
by Alfman on Sun 22nd Sep 2013 17:08 in reply to "RE[6]: Solaris is doing well"
Alfman Member since:
2011-01-28

Kebabbert,


"Linux has never been tested on larger servers than 8-sockets, so I would be very surprised if Linux could scale well."

"But I don't like lies and FUD. To that I react and I want to dispel FUD."

"-IBM Mainframes have very weak cpus, they are not strong. No matter what IBM says."


So you admit that you've never seen the data? Neither have I. This is the problem; everything you or I say is mere speculation.

Surely it must have been tested by Oracle in an apples-to-apples comparison, but we just aren't allowed to see the results they don't approve of. Off the top of my head, VMware does the same thing, and it's annoying as hell that their marketing heads say one thing and third parties aren't even allowed to publish contradictory evidence.

I use and like Oracle's database products, they're top notch, but censoring benchmarks sure is fishy. You are clearly taking everything Oracle says at face value, and if I were to take everything they said at face value, then you would be right: Linux scales poorly.

However I frankly wouldn't put it past them to employ the same marketing FUD and bias that you are accusing their competitors of. Understand that I'm just trying to explain why I'm skeptical, not trying to persuade you that you are wrong. Like you, I just don't have the data that would settle the question in factual terms.

"-Linux scales quite badly. No matter what Linus Torvalds says."

"-Linux code quality is non-optimal, which Torvalds and other kernel devs agree on."

I actually agree with you, the code quality isn't great and IMHO the kernel abstractions are poor. Linus refuses to have stable API/ABIs and consequently individual pieces of code are rarely stable even if they don't have bugs in them. I have many gripes with Linux and am not a blind fanboy, however none of this actually speaks against Linux scalability on SMP.

In order to form my opinion on the matter, I'd want more impartial data, and ideally some evidence that the system is correctly configured.
