G5 vs x86 Performance: The real culprit…

Guest post by Micah Bartell 2005-06-03 Benchmarks 99 Comments

Anandtech did an excellent job benchmarking some of the latest CPUs from IBM, Intel, and AMD. The article is primarily aimed at server performance, but some interesting details surrounding Mac OS X surfaced.


2005-06-03 8:58 pm
Anonymous
Pity OS X is slow for server applications. I mean really slow.
2005-06-03 8:59 pm
Anonymous
The processors seemed pretty much on par with each other, each having strengths. I’m curious how the benchmarks would have looked if they had used the best compilers possible for each architecture, instead of GCC 3.3.
It would also have been nice if they had run the server tests using YDL on the PowerMac. That would have given some data points that did not involve the lousy OS X thread handling.
I would not, however, say this is restricted to server applications. I’ve certainly seen OS X’s lousy thread handling affect Firefox with a lot of tabs, especially when the thread count exceeds 15-20.
– Kelson
2005-06-03 9:05 pm
Anonymous
There are at least a few factually incorrect statements in the article.
They disqualify Xserve systems as viable servers without comparing the same operating system on all machines.
They even point out the fact that when you clock the machines equally and the software is optimized properly it runs faster on the G5.
I have mixed feelings about this article…
2005-06-03 9:05 pm
Anonymous
I liked the “Apple’s G5 versus x86, Mac OS X versus Linux” review,
I thought it was well done and had lots of good information and didn’t
seem to have a bias one way or the other. I felt the evaluation process was fair. I would like to see performance benchmarks for Sybase and Oracle on OS X and on Linux 2.4 and 2.6 kernels in the future. I know some folks would probably like to see how OS X does with IBM’s compilers, XL C/C++ Advanced Edition (6.0) for OS X and XL C/C++ Advanced Edition (7.0) for Linux, versus Intel’s compilers.
This is interesting and useful information since I’ll be deploying OS X server with ARB over the summer for distributing software, file sharing, hosting a basic web server and, if I have time, experimenting with Xgrid. Guess services like MySQL are better served on the Linux platform.
2005-06-03 9:06 pm
Anonymous
Flops.c G5 beat the Xeon (can’t see the opteron)
Queens G5 ties Xeon, both lose to opteron
Povray G5 beats Xeon, both lose to opteron
Lightwave raytrace All 3 about tied
Lightwave radiosity G5 loses
NOTE: Anandtech says “Aftereffects and Final cut optimized for G5, we did not use. Lightwave *meticulously* optimized for SSE-2, we did use it.”
Cinema 4D G5 ties Opteron, both beat Xeon
So, this seems to show the G5 beating the Xeon in overall performance review, and at least comparable with the Opteron.
2005-06-03 9:06 pm
Anonymous
I would say maybe the Opteron was the best X86 chip.
2005-06-03 9:16 pm
Anonymous
Unfortunately, it was really a GNU/Linux on x86/x86-64 vs. MacOS X on PPC comparison.
I’m rather surprised the author wrote “I am no operating system expert, but with the data that we have today, I think that a PowerPC optimised Linux such as Yellow Dog is a better idea for the Xserve than Mac OS X server” when he didn’t actually test such. GNU/Linux distros and NetBSD and FreeBSD may or may not run better (as server OSes) on an Xserve than OS X Server does, but that statement is not backed up by the actual benchmarks presented.
2005-06-03 9:23 pm
Anonymous
Shawn: What was factually incorrect? The points you made weren’t factually incorrect statements.
There are two ways to deal with a non-flattering review (as this was for Mac OS X as far as server threading performance goes). You can try to discredit the article, attacking minute details or methodology issues that are largely irrelevant, or you (or rather Apple) can acknowledge it, and go back to the drawing board and try to figure out a solution.
It seems like Apple has a serious performance problem, one worthy of further investigation. Of course, it wouldn’t affect an application like Apache 1.3 or PostgreSQL since they aren’t thread dependent, so it’s not entirely *dire* per se, but it is serious.
2005-06-03 9:38 pm
Anonymous
@TonyB
There are two ways to deal with a non-flattering review (as this was for Mac OS X as far as server threading performance goes). You can try to discredit the article, attacking minute details or methodology issues that are largely irrelevant, or you (or rather Apple) can acknowledge it, and go back to the drawing board and try to figure out a solution.
I don’t own one single piece of Apple Equipment software, hardware, or otherwise, not even an iPod. So, I’m not exactly a fanboy. I just feel it would have been more fair to compare the same applications on the same “basic” operating system on all the machines, instead of a mix and match.
I also think that pronouncing doom and gloom because a specific application performs poorly is rather poor taste. Some applications work better for certain access patterns. There are many threaded applications that work fine on OS X. Obviously MySQL is doing something wrong. That doesn’t mean that OS X’s threading performance can’t be improved, performance can almost always be improved on a system.
Taking one application and then pronouncing that all applications that are threaded must logically run slow because of an obvious threading performance issue seems a fallacy at best. A fairer comparison would have been an in-depth pragmatic analysis of custom threading software or software specifically designed to analyze threading performance of various OS’s.
I also don’t see the point of comparing a 2.5 or 2.7 ghz processor based system to a 3.06 or higher ghz system. It’s like “duh” of course a faster clocked processor is going to be faster at some things and slower at others. I’ll be the first to say that I never believed Apple’s claims of being faster than a 3 ghz.
In the end, you’re free to believe whatever you want to, and I’m free to believe what I want to. That doesn’t make either of us right or wrong. It just makes us people with an opinion.
2005-06-03 9:42 pm
Anonymous
Ease of use is where it’s at, and Linux doesn’t have any of that.
2005-06-03 9:45 pm
Anonymous
They conclude at the end of the article that Yellow Dog Linux would be better on an Xserve than Mac OS X Server. Note that they never benchmarked an Xserve anywhere in the article.
2005-06-03 9:46 pm
Anonymous
@TonyB
Another problem I had is the use of Apache 1.3 instead of Apache 2. There are a lot of production sites that are starting to use Apache 2. My company is because of the dramatically increased performance for what we’re using it for.
Additionally, the article stated itself that the version of GCC they used was poor at vectorizing critical areas of the software they compiled.
The article also espouses several views of the internal kernel structure of OS X, views I would question the accuracy of. I would only be happy if they were direct quotes from an Apple engineer instead of pure speculation or educated guesses.
I would also question their claims that it’s unsuitable for server applications, since the xserve raid boxes that Apple sells have been noted for very good performance, price, and reliability. They have been very popular in environments demanding high levels of performance, such as video capture server farms, virtual hosting, etc.
As I said, I have mixed feelings about this article. I never said it was a complete piece of trash or that it was horrible (because it’s not) or that all of the information was inaccurate (because most of it is accurate). I just don’t think it’s as balanced or reasonable as it could be.
2005-06-03 9:48 pm
Anonymous
@[city]
They conclude at the end of the article that Yellow Dog Linux would be better on an Xserve than Mac OS X Server.
Yet they never benchmarked it in this “oh so thorough” article. I just love wild speculation…
Note that they never benchmarked an Xserve anywhere in the article.
Which is also interesting to me.
2005-06-03 9:51 pm
Anonymous
The G5 did fairly well (comparable to Xeon in most cases) in the workstation type of benchmarks.
It was Apache & MySQL where it was *more* than 10 times slower.
Can’t be the hardware that is at fault, because all 3 systems are kind of close (cpu powerwise).
So, it looks like OSX is not made for running servers.
I was disappointed that Linux was not installed on the G5 for comparison.
Anandtech was able to put Linux 2.4 or 2.6 on the Opteron & Xeon. Why was it so tough to do that on the G5? Or aren’t these benchmarking apps available for G5 in Linux?
2005-06-03 9:56 pm
Anonymous
I wonder if they will actually try a PPC based linux on that dual G5 system or not? I’m very curious about those numbers.
This review is far more useful comparing Xeon vs Opteron running Linux than about looking at OS-X.
Mach itself is an ancient piece of OS code, it tries to be a microkernel but just has too much baggage it carries along with.
It is unfortunate Apple didn’t buy out BeOS.
2005-06-03 9:58 pm
Anonymous
And I assume it is, since Mach is known for dreadful threading (L4 and others are not). If so, Apple has a serious problem on their hands, which explains a lot of problem areas in the UI as well.
Given that UIs rely a lot on threading, I am not sure Apple is treating the right end of the problem with Quartz 2D Extreme.
The main problem I see is that, with better threading, problem areas like resizing could probably be improved tremendously (the speed hit from resizing operations is caused by the redraws, which themselves are probably not linear but thread based).
A typical example of a huge general performance boost from better threading is Linux 2.6 compared to 2.4: the better threading system and the lower latency that came with the new scheduler increased speeds to 2-3 times what they used to be.
Apple probably really should look into the issue, but I am not sure the problem is fixable at all without dumping Mach in favor of something else, or breaking the APIs significantly to move threading out of Mach!
2005-06-03 9:59 pm
Anonymous
Yes, but you can make some inferences about the Xserve based on the PowerMac. The processors are about 200 MHz slower, and they use ECC memory, which is a bit slower. Other than that, it’s pretty much the same. There is no significant architectural detail that is going to make OS X or Linux behave drastically differently than on a PowerMac.
The issue with OS X that they found to be the culprit would exist in the same form and fashion on an Xserve or PowerMac with the same impact.
Shawn:
You will note at the beginning of the article, they give special thanks to 3-4 engineers from Apple Europe. There is a specific issue w/ OS X and its ability to handle threading well. This is also documented by John Siracusa from Ars.
– Kelson
2005-06-03 10:02 pm
Anonymous
The server performance of the Apple platform is, however, catastrophic. When we asked Apple for a reaction, they told us that some database vendors, Sybase and Oracle, have found a way around the threading problems. We’ll try Sybase later, but frankly, we are very sceptical. The whole “multi-threaded Mach microkernel trapped inside a monolithic FreeBSD cocoon with several threading wrappers and coarse-grained threading access to the kernel”, with a “backwards compatibility” millstone around its neck sounds like a bad fusion recipe for performance.
Workstation apps will hardly mind, but the performance of server applications depends greatly on the threading, signalling and locking engine. I am no operating system expert, but with the data that we have today, I think that a PowerPC optimised Linux such as Yellow Dog is a better idea for the Xserve than Mac OS X server.
———————————————————-
I guess he says it all, need I say anything? Ok, I will:
1) Is he benching hardware?
2) Is he benching OS
He could have benched OS X on the G5 and then benched it with Linux and Darwin and used the same GCC. Now if anybody tells me that I should perform the benchmark, then I will be gracious enough to accept your new G5 donation.
P.O. Box pending your response. 😉
PS: I am not a Mac user. And yes, there are millions of ways to do benchmarks and in any case, no one is ever satisfied. Yes, I know this but I still feel the need to complain. 🙂
Enjoy
2005-06-03 10:05 pm
Anonymous
Workstation apps mind a lot. Graphics apps probably won’t, because most of them need vector units but not good multitasking.
But Java, for instance, relies heavily on threads, and so does pretty much every GUI outside of Java that tries to push speed in the UI department. The general feel of a UI can be improved, even at the Aqua level, if tile painting or font rendering can be parallelized into threads.
The workaround for these problems in the UI seems to be pushing as much as possible into AltiVec, or pushing the GPU more (which only works on Macs with GPUs). That will help but will not ultimately solve the problem. I assume the problem is not really solvable; the Mach foundation might be too problematic.
2005-06-03 10:06 pm
Anonymous
As others have said, it would be interesting to see how Linux/PPC fares in the comparison.
2005-06-03 10:17 pm
Anonymous
I have to say I’m rather impressed with AMD. I remember back in the day, AMD had FPU performance so bad it was ridiculous. Now, they’re the FPU leader for mainstream CPUs. Sounds like hiring those ex-DEC Alpha guys paid off.
One thing Anand didn’t pick up on, though. The G5 has about half the transistors of an Opteron, due mostly to the cache. One wonders how it would perform if IBM hadn’t been so damn slow to improve it since its release. Back when the G5 was released, it was common for a desktop CPU to have 512K of cache. These days, even fairly cheap Athlon64s come with 1MB, and the P4s are coming with 1-2MB. Plus, at 90nm, IBM should easily be able to double the transistor count with no problems. Not to say anything of clockspeed (it’s interesting that Anand noted the same “Apple overvolting” issue), which has stagnated.
2005-06-03 10:36 pm
Anonymous
Stop complaining about how this is some GNU conspiracy to discredit Apple. They clearly pointed out OSX/Server performance hits were due to Mach kernel design and GCC’s problem with full Altivec optimization. I would like to see Apple fix their kernel to a more modern BSD design and GCC to make a good ppc release.
2005-06-03 10:41 pm
Anonymous
With Linux being the standard Unix for the PC, and with OS X being the standard Unix on a Mac, the comparison is fair.
2005-06-03 10:43 pm
Anonymous
I would have also been interested in seeing YellowDog data for comparison, but I do have to ask how likely is one to find Linux on Macs in real-world professional deployments? Most people who buy Apple hardware do so with the intention of using OSX. I’m not even sure what Apple’s support policy is for hardware with alternative operating systems installed.
2005-06-03 10:46 pm
Anonymous
the comparison is fair enough, i agree, but nonetheless it would be interesting to see how much of a difference the OS actually makes.
2005-06-03 10:46 pm
Anonymous
I disagree. I would have liked to see both OSes run on the same machine. That way you can conclude if it was MySQL/Apache, or the MacOS X threading issue that was the problem. Testing on different systems can’t bring you to that conclusion.
2005-06-03 10:55 pm
Anonymous
5 – Posted on Jun 3, 2005 at 9:01 AM by wessonality
What about installing Yellow Dog Linux on the XServe?
9 – Posted on Jun 3, 2005 at 9:21 AM by JohanAnandtech
Wessonality: Our next project if we can keep the G5 long enough in the labs.
2005-06-03 11:06 pm
Anonymous
I also think that pronouncing doom and gloom because a specific application performs poorly is rather poor taste. Some applications work better for certain access patterns. There are many threaded applications that work fine on OS X. Obviously MySQL is doing something wrong. That doesn’t mean that OS X’s threading performance can’t be improved, performance can almost always be improved on a system.
As for functionality, their tests of MySQL worked “fine” on Mac OS X, it just worked slower (a lot slower) than on other operating systems. Functionality wasn’t the concern, the concern was performance. They also did some micro benchmarks on thread creation if you’ll note, and that also shows a significant slowdown when compared to other operating systems, further giving support to their conclusions.
And given that MySQL does so well in other tests, MySQL isn’t “obviously” doing anything wrong (you take the article to task about its well researched conclusions, but you make conclusions with little support). MySQL depends on the threading library (libpthread) on the various operating systems for thread creation. MySQL is showing a weakness of the Mac OS X threading library. Now there could be work-arounds to add to MySQL to try to bring performance up, but MySQL itself isn’t doing anything wrong.
I wouldn’t characterize what they’re saying about Apple as doom and gloom, but it’s certainly not sunny for the server side. They were very positive about the FP performance and workstation performance.
Taking one application and then pronouncing that all applications that are threaded must logically run slow because of an obvious threading performance issue seems a fallacy at best. A fairer comparison would have been an in-depth pragmatic analysis of custom threading software or software specifically designed to analyze threading performance of various OS’s.
They showed that thread creation was markedly slower, so it’s logical to assume that applications that rely on fast thread creation would run slower than on other operating systems.
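For readers who want a feel for what a thread-creation micro benchmark measures, here is a minimal Python sketch. It is not the benchmark the article ran; the function name `time_thread_creation` and the default of 200 threads are assumptions chosen for illustration. Interpreter overhead is included in the timing, so the absolute figure only means something when the same script is run on different operating systems on comparable hardware.

```python
import threading
import time


def time_thread_creation(count=200):
    """Return the average seconds needed to create, start and join one no-op thread."""

    def noop():
        # The thread body does nothing, so the timing is dominated by
        # thread setup and teardown rather than by useful work.
        pass

    start = time.perf_counter()
    for _ in range(count):
        worker = threading.Thread(target=noop)
        worker.start()
        worker.join()
    elapsed = time.perf_counter() - start
    return elapsed / count


if __name__ == "__main__":
    avg = time_thread_creation()
    print(f"average create+join cost: {avg * 1e6:.1f} microseconds per thread")
```

Running the same file on two systems and comparing the printed averages gives a rough, hedged indication of relative thread-creation cost, nothing more.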
I also don’t see the point of comparing a 2.5 or 2.7 ghz processor based system to a 3.06 or higher ghz system. It’s like “duh” of course a faster clocked processor is going to be faster at some things and slower at others. I’ll be the first to say that I never believed Apple’s claims of being faster than a 3 ghz.
Even Intel has abandoned the “greater the GHz, the faster the processor” marketing mantra (either directly or implied). While clock speed is certainly part of a processor’s raw power, there can be (and are) processors 1 GHz slower in clock speed yet more powerful. Just take a look at the SPEC website. (http://www.spec.org)
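The arithmetic behind that point can be sketched in a few lines of Python. This is a deliberately crude model, assuming throughput is simply clock speed times average instructions completed per cycle; the chip names and every number below are hypothetical and are not measurements of any real processor.

```python
def relative_throughput(clock_ghz, instructions_per_cycle):
    """Crude model: billions of instructions per second = GHz x average IPC."""
    return clock_ghz * instructions_per_cycle


# Hypothetical chips, made up purely for illustration.
chip_a = relative_throughput(clock_ghz=3.6, instructions_per_cycle=1.0)
chip_b = relative_throughput(clock_ghz=2.6, instructions_per_cycle=1.5)

if __name__ == "__main__":
    print(f"chip A (3.6 GHz, IPC 1.0): {chip_a:.2f}")
    print(f"chip B (2.6 GHz, IPC 1.5): {chip_b:.2f}")
    print("slower-clocked chip wins" if chip_b > chip_a else "faster-clocked chip wins")
```

Under these made-up figures the chip clocked 1 GHz lower still comes out ahead, which is the whole argument: clock speed is only one factor in the product.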
Another problem I had is the use of Apache 1.3 instead of Apache 2. There are a lot of production sites that are starting to use Apache 2. My company is because of the dramatically increased performance for what we’re using it for.
What difference does that make, since the Linux tests were also on 1.3? It’s an old benchmark nullification technique: “You should have used X application/version instead of Y”. I believe more people still use 1.3 as well, because of the PHP thread-safe issue (some PHP libs aren’t thread-safe).
Besides, performance could easily have been worse, since Apache 2.0 is threaded, versus 1.3’s use of processes, although that would be an interesting test.
Additionally, the article stated itself that the version of GCC they used was poor at vectorizing critical areas of the software they compiled.
Since the vast majority of applications are compiled on GCC (especially open source apps) for Mac OS X (it comes with GCC), that’s just part of the operating system characteristics. Also, vectorization isn’t likely the root cause of the poor threading and process creation performance. It’s fair game.
As for the x86/64 side, the same is true there as well, although GCC is better at optimizing x86 code, attributed at least in part to it being in such wide use.
The article also espouses several views of the internal kernel structure of OS X, views I would question the accuracy of. I would only be happy if they were direct quotes from an Apple engineer instead of pure speculation or educated guesses.
Why would you question the accuracy of it, just because it’s not flattering?
I would also question their claims that it’s unsuitable for server applications, since the xserve raid boxes that Apple sells have been noted for very good performance, price, and reliability. They have been very popular in environments demanding high levels of performance, such as video capture server farms, virtual hosting, etc.
They quantified their results in terms of performance, not popularity, which is much harder to quantify. Also, video capture farms depend on data throughput, which wasn’t measured here. Virtual hosting would be
As I said, I have mixed feelings about this article. I never said it was a complete piece of trash or that it was horrible (because it’s not) or that all of the information was inaccurate (because most of it is accurate). I just don’t think it’s as balanced or reasonable as it could be.
You said there were many factual mistakes, but you haven’t actually pointed out any.
I apologize if what I said implied that you thought the total article was crap. My comments were more in general to the responses that this article was likely to get, that being that people will try to discredit it simply because they don’t like the results, rather than the methodology.
It had some good things to say about the PPC chip and Mac OS X, and some bad things to say about Mac OS X.
I would also have liked to see Linux on PPC, to see what the difference is, but I agree it’s perfectly fair to compare Linux on Intel and Mac OS X on PPC.
2005-06-03 11:09 pm
Anonymous
Know what you’re benchmarking. The author knows what they’re benchmarking, a few people here don’t seem to.
They’re benchmarking a system, in this case the system is inclusive of the hardware and software.
You may not agree with the methodology, but that doesn’t in any way invalidate the results or conclusions, save that little YDL comment, which was stupid.
2005-06-03 11:11 pm
Anonymous
correct me if i’m wrong but kernel being used in FreeBSD is pretty much the same they use in OSX (or isn’t it?). yet i think i have read a few comparisons and afair FreeBSD did on par with linux (i mean there might have been some differences but the magnitude was the same, FreeBSD was not 10 times slower)
2005-06-03 11:15 pm
Anonymous
@kelson
You will note at the beginning of the article, they give special thanks to 3-4 engineers from Apple Europe. There is a specific issue w/ OS X and its ability to handle threading well.
Please show me a quote from the article or a sentence that states “Apple engineers say that threading performance has serious issues.” That’s not what I saw, what I saw was a quote from Apple saying that companies like Oracle were able to get good threading performance anyway. This tells me that MySQL could be doing things better, even if it isn’t doing something “wrong” in general. What’s right for one platform is not right for all platforms.
I could write an article without consulting any apple engineers at all, and then thank Apple and specific people at the end of the article. That doesn’t mean anything, it certainly doesn’t prove that I actually talked to those people. Direct quotes and citation. That’s how real research is done.
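The article does not say what Oracle or Sybase actually did, so the following is only a generic illustration of how an application might reduce its dependence on fast thread creation: start a fixed set of worker threads once and reuse them for every request. The names `handle_request` and `serve_with_pool` are hypothetical, and the squaring is a stand-in for real query work.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def handle_request(n):
    """Hypothetical stand-in for serving one query; also reports the serving thread."""
    return (n * n, threading.current_thread().name)


def serve_with_pool(requests, workers=4):
    """Serve every request on a small pool of reused threads.

    At most `workers` threads are ever created, no matter how many requests
    arrive, so the per-thread creation cost is paid only a handful of times.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(handle_request, requests))


if __name__ == "__main__":
    results = serve_with_pool(range(20))
    thread_names = {name for _, name in results}
    print(f"served {len(results)} requests on {len(thread_names)} thread(s)")
```

Whether a change like this would help MySQL on OS X is an open question; the sketch only shows why an application's threading strategy and the OS's thread-creation cost interact.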
2005-06-03 11:19 pm
Anonymous
correct me if i’m wrong but kernel being used in FreeBSD is pretty much the same they use in OSX (or isn’t it?)
the differences are explained in the article. the freebsd stuff is running on top of mach 3.0.
2005-06-03 11:22 pm
Anonymous
They wrote a pretty interesting article. Sure, they didn’t use the best compiler, nor maybe an updated one, and maybe some comparisons weren’t very homogeneous, but in the end it was a comparison very close to real-world situations. Real-world developers tend to mix and match sometimes. But I’m talking about the workstation part of the analysis.
When they got to the server part, they had a fundamental problem: they had no OS X Server to do a full test. Now, if the results were fairly close to each other, one could try a wild guess and suppose they are accurate. But honestly I cannot believe Apple doesn’t know that their server counterparts are 3-4-5 times faster. I cannot believe they aren’t aware. Since Apple is proud of OS X’s Unix roots and since they sell such roots as a value point, I simply cannot believe they’re trying to sell a server system which is so slow.
So I believe that OS X Server is different from its workstation counterpart. Either because they became aware that it was so slow or, probably, (hint) because they don’t want to allow Joe Average to turn their desktop system into a full blazing fast server system… could that be?
2005-06-03 11:23 pm
Anonymous
A performance comparison I would like to see is Linux, probably Gentoo vs. OS X on a G5.
2005-06-03 11:30 pm
Anonymous
Not surprising really. Linux is a very high-performance server operating system. Very fast system calls. User-space locking is very fast with futexes. The processors seemed well-matched, although again it doesn’t surprise me that the Opteron is the fastest.
2005-06-03 11:31 pm
Anonymous
May not mean anything, but I’ve noticed in trying to compile apps on my G5 powerbook, using fink and such, the compile time is dreadfully slow. And I mean even in comparison to say my old 1.2 Ghz Athlon.
2005-06-03 11:40 pm
Anonymous
hey
how did you get G5 powerbook?
i want one too!
2005-06-03 11:46 pm
Anonymous
FTA (page 1): “The 64 bit Apple Machines were running OS X Server 10.3 (Panther) and OS X Server 10.4.1 (Tiger)”
2005-06-03 11:49 pm
Anonymous
@Shawn: This tells me that MySQL could be doing things better, even if it isn’t doing something “wrong” in general. What’s right for one platform is not right for all platforms.
Yay for passing the buck. That’s the sharpest way to rephrase “Oracle had to hack around the brokenness of Darwin” that I’ve yet seen.
@pokryfka: OS X does not use anything resembling a FreeBSD kernel. The best way to explain Mach is to tell the history. NeXTStep was based on Mach 2.5 and 4.3BSD. Ie: Mach 2.5 was the kernel, and 4.3BSD was a userspace server providing OS functionality. When Apple got ahold of OS X (might have been earlier at NeXT, I’m not sure), they updated the OS to Mach 3.0 and 4.4BSD-Lite2 (the last Berkeley release). In the process, they shoved BSD back into kernel mode, replaced message-passing with function calls, and added some of their own stuff (like the IOKit). They also took the FreeBSD userland, such as all the system utilities and the C libraries, and ran it on top of the underlying Mach/4.4BSD kernel. This is what Apple speaks of when they say “based on FreeBSD 5.0”. Over time, OS X has incorporated significant subsystems from other BSDs. I believe the UFS implementation in there now is FreeBSD’s, while the IP stack is from NetBSD. However, a lot of the core performance-critical stuff (the VM, threading, and block I/O subsystems), is still based on 4.4BSD-Lite2 + Mach 3.0 + NeXT-developed updates to both.
2005-06-03 11:53 pm
Anonymous
You mean a G4 powerbook or a G5 tower? The compile times depend on what you’re building. Want to be more specific?
It was a fun read. Lots of people would like to see the linux on the G5 vs the Darwin on the G5. Plus it would be interesting to see the newest compilers with the proper flags and such just for fun. And other crossplatform database software just for the heck of it.
Again anandtech has given a good review.
2005-06-04 12:03 am
Anonymous
The OS X kernel is composed of bits of the FreeBSD kernel on top of the Mach micro-kernel (which was long ago derived from BSD). The FreeBSD kernel doesn’t have a problem with kernel-level threads (apart from the giant-lock, but that’s not relevant), but Mach does.
2005-06-04 12:04 am
Anonymous
I’m running ydl on my pbook and thinking about throwing it on an xserve I use for web/mysql/etc. However, I am currently on freenode #yellowdog (as per their web site’s instructions) and I see: “Topic for #yellowdog set by UdeS at Sun Mar 27 00:17:18 2005” It’s a ghost town.
Anyone running y-hdc on their cluster xserves?
2005-06-04 12:15 am
Anonymous
I wonder how many of these problems have been fixed but have just not made it up to the application level?
The non-autovectorising in gcc is already known, Tiger comes with gcc 4.0 which includes it, why didn’t they test with it?
I’ve seen other tests where the IBM compiler was thrown into the mix, it made quite a difference.
The kernel interfaces in OS X have been frozen for the foreseeable future, I believe this is to allow them to fix things like the threading problem.
I know the latency was a problem on the G5 but had never seen real figures, this was also an issue in POWER4 but they fixed it in the POWER5 (with a corresponding performance boost), I wonder how well the 970GX/MP will do?
I’ve always said the G5 is a good CPU held back by compiler and software issues, taking much longer than I thought to fix them though.
I wonder what’ll happen when the 350* makes it to the market, it’ll be much more dependent on the compiler than even the G5.
*350 (if that’s what it’ll be called) is from the same lineage as the PPE in the Cell and the XBox 360’s cores.
2005-06-04 12:24 am
Anonymous
Quote: “What’s right for one platform is not right for all platforms. ”
Obviously not. But – if you have 4 platforms and mysql works fine in 3 of them, then one should ask the question, “why is it not working properly in this 4th platform?”
It does appear that there is a major threading issue with OS X and the mach microkernel/freebsd 5 kernel, that is NOT present in the other operating systems.
You state that we should ignore it because Apple has not officially announced an issue. Come on! Are you that gullible? For Apple to officially recognise this problem would tell people that there is a design/implementation problem with their operating system kernel. That means they lose face, and they lose sales. Apple is first and foremost a PR company – as an ex employee I can honestly tell you that staff are drummed to be positive about Apple products and NOT to publicly condemn a product, even if there are *serious* problems.
At some time in the past, Apple tested the Linux kernel. We can all make suppositions as to why Apple didn’t use the Linux kernel (read: they don’t like the GPL and having to return back to the community, which is why they picked the BSD license).
I’d like to see a Linux kernel with Aqua running on top of it, with the sexy Apple hardware/looks and an optimised G5 CPU. I think it’d be kickass, for workstations and servers. Of course, Apple would never do such a thing because of the licensing issue, and the fact that they like to sponge off the open source community while giving little in return. Or making returns that are as worthless as ‘tits on a bull’ as the saying goes.
Now if only Apple swallowed some pride (well, Steve Jobs maybe), became a *true* open source supporter, not a half cocked PR supporter that they currently are and used a different kernel they’d have a kickass system that would make their users happy, increase sales, increase share holder profits, and eat into the crap that is Microsoft Windows. To the benefit of everyone.
Dave
2005-06-04 12:28 am
Anonymous
thanks for answers. i always thought osx is heavily based on FreeBSD. anyway since much (most?) of the userland is GNU it should be pretty much the same. ok in case of *BSD they often have different origins, still a company like apple can choose the best implementation out there (and if they don’t like GPL they can pick up one from Free/Net/OpenBSD) and since they all are supposed to be POSIX compliant it should not make much difference (or am i wrong?). also (following the same reasoning, though i’m not sure if correct) would it be very difficult to change a kernel? to write/add some system calls. (?) i mean if i happen to use unix (say solaris or unix) on a university account it doesn’t make much difference to me really: though i’m not really familiar with them most of the userland tools are the same. also POSIX and system v. (though of course there are some differences, for instance shared libraries work a bit differently)
2005-06-04 12:48 am
Anonymous
“It is unfortunate Apple didn’t buy out BeOS.”
Be wanted too much! Plus who cares now, Apple has Be engineers working on OS X.
Nice article, really shows the muscles of the G5 970FX.
2005-06-04 12:53 am
Anonymous
Please show me a quote from the article or a sentence that states “Apple engineers say that threading performance has serious issues.”
What you got was something far, far worse. Buck-passing, backside-covering and “Oh, but we think Oracle and Sybase may have got around the issues”. That’s a serious admission in my book. The performance they showed there is truly, truly bad and nothing I would ever want to see from an OS I was looking at. I wouldn’t expect it from Windows.
The moral of this story? Never, ever use Mac OS X as a server that does any kind of serious work, which rules it out of quite a bit. It would be interesting to see Linux running on the G5 as we’re simply not going to know how good it actually is with Mac OS, but quite rightly, they were looking at real-world usage.
Apple has a far more serious problem with their new-found OS X toy than I had ever imagined. They make a distinction between the server and workstation in this article, but a serious and responsive workstation and desktop relies a lot on the performance of threading. It can’t really be underestimated.
Oh yer – and AMD continues to rule so much it brings tears to your eyes :-). Even their budget offerings in the Duron, and now the Sempron, are shocking value for money given their performance. They’ve come a long, long way since the < K6 days and I know of no one for many, many years now who even has a vague idea who doesn’t buy AMD. Some of the guys there produced enviable stuff with DEC/Alpha and it looks like they’re worth their weight in gold again. They’re building that brand and reputation for quality that the Alphas had and it will put AMD in an even better position in times to come.
2005-06-04 1:05 am
Anonymous
Not up to Anandtech’s usual standards, that’s for sure. The GCC compiler is known to be a poor test for PPCs. As everyone else said, to claim Yellow Dog is a better choice without testing it is silly. BTW, if you don’t claim to be an operating system expert then don’t make definitive statements about them based on your assumptions.
2005-06-04 1:12 am
Anonymous
That has got to be some sort of joke. Why on earth would Apple do that? Do you think their developers want to port everything over, so soon after the OS X transition? Plus, why move to Intel.
And that damn article gets OS X’s lineage wrong again. It’s a “FreeBSD variant” now, is it? The fact that 4.4BSD and Mach 3.0 already ran on x86 (and already run on x86 in the form of Darwin), that couldn’t have anything to do with it? I’m so god damn tired of the media now it’s not even funny.
2005-06-04 1:12 am
Anonymous
@ pokryfka
OS X is based off of BSD, but doesn’t work exactly the same way (changes made to it).
It uses a microkernel whereas BSD, Linux, Windows use monolithic kernels.
Microkernels – software interacts with servers (adds a layer) and these (servers) interact with the kernel.
Monolithic kernels – software interacts directly with kernel.
The problem is:
“Mach basically failed to address the sum of the issues that microkernels were meant to solve.”

2005-06-04 1:19 am

Anonymous
“It is unfortunate Apple didn’t buy out BeOS.”
I’m happy Apple didn’t buy them, because now we have Zeta for x86 hardware. Had Apple done so we wouldn’t have a BeOS based OS for x86 (well maybe someone, Haiku?, would have made a clone from scratch).
I guess though it is bad news for Mac users, because the only choices seem to be OS X (which many seem to love anyway) or Linux.
BeOS was great & hopefully will be once again with Zeta.

2005-06-04 1:30 am

Anonymous
These benchmarks are great. Since we’re debating Apple/Linux at work now these are just at the right time!
Course it’d be nice to see intel compiled stuff compared to ibm compiled stuff.

2005-06-04 1:41 am

Anonymous
It is unfortunate Apple didn’t buy out BeOS.
i agree. if the time and r&d spent on next was instead spent on beos, apple would be lightyears ahead of wintel

2005-06-04 1:43 am

Anonymous
In reference to
http://www.anandtech.com/mac/showdoc.aspx?i=2436&p=3
According to http://www.amd.com/us-en/Processors/ProductInformation/0,,30_118_88…
Opterons actually have an FSB, i.e. 11.2 – 20.8 GB/s (@ core).

2005-06-04 1:44 am

Anonymous
“At some time in the past, Apple tested the Linux kernel. We can all make suppositions as to why Apple didn’t use the Linux kernel (read: they don’t like the GPL and having to return back to the community, which is why they picked the BSD license). ”
Actually, Jobs offered (sorta) Linus a job back in like ’96. It’s in “Just For Fun” if you need a reference (it’s a short book, but I don’t know which chapter, sorry).

2005-06-04 1:50 am

Anonymous
Opteron 250 (and 252) has been superseded by dual core Opterons…

2005-06-04 2:13 am

Anonymous
Not up to Anandtech’s usual standards, that’s for sure. The GCC compiler is known to be a poor test for PPCs. As everyone else said, to claim Yellow Dog is a better choice without testing it is silly. BTW, if you don’t claim to be an operating system expert then don’t make definitive statements about them based on your assumptions.
And why isn’t it up to their standards? Because you didn’t like the results? It was an extremely comprehensive, very thorough review.
Mac OS X is the operating system most people use when running Mac servers, so it’s perfectly valid. And the GCC issue would still be there if they used Linux. It’d be nice to see Linux yeah, but that doesn’t invalidate the article.
Yeah, GCC isn’t as hot on PPC as it is on x86, but GCC is what most people use, except for a few high-end proprietary software companies. And even then, the GCC problem isn’t directly related to the threading problem.
Also, on x86, GCC specific optimizations don’t really make much of a difference for MySQL, since it’s heavily I/O, threading, and general system call dependent.
It’s really obnoxious when people trash a benchmark only because they don’t like the results. Operating systems don’t get better when you pretend there’s not a problem, and shoot the messenger.

2005-06-04 3:48 am

Anonymous
typo, no G5 for me, G4 here…
of course, if the article above this one’s true, looks like there NEVER will be a G5 powerbook from apple

2005-06-04 4:06 am

Anonymous
As soon as I read “The RISC ISA, which is quite complex and can hardly be called “Reduced” (The R of RISC)” I knew I should take everything that followed with a very very large grain of salt. Naturally, I was not disappointed.
For a supposedly knowledgeable fellow, Johan has no credibility if he somehow relates the number of registers or instructions with the R in RISC. RISC stands for Reduced Instruction Set Computer. There are other issues people have pointed out with his testing methodology, but I will keep on track here.
Read that definition again. Reduced Instruction SET Computer. It does not mean reduced number of registers (I’d like to see that be possible as compared to x86), reduced number of instructions (again, a common mistake), it means reduced SETS of instructions.
For example, hardly anyone uses BCD instructions (Binary Coded Decimal) anymore. They got axed. So too did all the unique and exciting addressing modes on 680x0 which were necessary, but made redundant after massive numbers of registers were introduced. Ditto for all those string operations in favor of register loading operations which were better anyhow.
Did Johan understand this at all? Does he know that the in-flight operation of x86 processors that he’s so proud of is a kludge, a hack to make backwards compatibility faster? I notice he’s whipping the G5 for having a massive pipeline, but no real mention of the ridiculously large pipeline on the Pentium 4? If he wants to get realistic, he should look to the Cell processor for the future of how computing shall be.

2005-06-04 4:32 am

Anonymous
There’s a reason you’re commenting in a forum, and people are publishing his work:
1.) He was likely joking. People often make fun of the number of instructions on PPC. And it is a lot for a RISC system (which does stand for Reduced Instruction Set Computer).
2.) RISC is better described as load-store. Which he didn’t mention, but did talk about how it has more general purpose registers than x64. This obviously makes load-store easier, because you can have more in-processor values.
3.) RISC also implies that you will attempt to make all instructions complete in the same number of cycles. CISC processors were historically known to mix complex and simple instructions, whereas RISC designers believed compilers were smart enough to build good complex sequences out of simple instructions.
4.) The internal design of Intel’s chips is said to be RISC, with a heavy frontend (which cost them dearly in performance on Pentium I) to sort-of translate.
What are you calling a hack? If it actually improves performance these days without causing errors it’s hardly a hack. Anything seems to be up for grabs to improve performance these days. Is there some way for it to be pure and elegant that separates it from hacks?

2005-06-04 5:12 am

Anonymous
Is this really surprising? It is old news that Linux is a better, more scalable, more portable and more efficient kernel than whatever Apple patched up under OS X. Apple’s forte has never been any of the aforementioned attributes. However, they shine when it comes to user engineering, eye candy and marketing. Why is this so shocking? Anybody using OS X as a server is just being a passionate fanatic. That’s fine.

2005-06-04 8:16 am

Anonymous
> 1.) He was likely joking. People often make fun of the number of instructions on PPC. And it is a lot for a RISC system (which does stand for Reduced Instruction Set Computer).
I’m sorry, but joking about a core distinction between two types of computing styles is not a reliable way to go without explaining or referencing it correctly (which he didn’t).
>What are you calling a hack? If it actually improves performance these days without causing errors it’s hardly a hack.
I’m calling it exactly as I observe; that is, a kludge or a hack, in reference to the in-flight operation system of the x86 processors. Look to Cell with the in-order execution which does away with the uncertainty, which ultimately over the long term is a better, more aesthetically pleasing system. x86 grew from a simple accumulator chip to what it is today, and the amount of extra baggage that entails is crippling it, slowly but surely.

2005-06-04 8:56 am

Anonymous
Regarding MacOS performance on G5/PPC970FX: I have noticed a couple of quirks relating to performance.
1) When using the internet/network, I find that my connection quickly becomes flooded and it takes an incredibly long time for connections to be killed off. For example, I’ll exit out of Bittorrent, and there is this massive lag between quitting the application and the eventual killing off of connections, meaning the internet is almost unusable until those connections have been killed off.
2) Performance quirkiness – sudden stalls when doing this; I can’t really isolate it down to anything. It would be nice, however, if Apple (and the powers that be) could fix these quirks.
Just relating to the article, I think it would be rather short-sighted to assume that Linux would be the panacea for PowerPC performance woes. The underlying fact of the matter is that no matter which OS you choose to run underneath the benchmark, it’ll be compiled by GCC.
Now, what I would like to see, hopefully, is a greater effort by Apple to push releases back to 2 1/2 years, split the development group in two and get the kernel side/foundation side of the operation working solely on cleaning up the bottom layer, finely grain the kernel, clean up the scheduler, make the code from FreeBSD and Mach 3.0 integrate nicely rather than having two of everything, and optimise the code by eventually compiling the operating system with IBM’s PowerPC compiler, and work with IBM to bring Objective C and C++ to their compiler; licence the compiler off IBM, maintain it themselves and bundle it free with MacOS X, paying a royalty of say, $10 per unit, to IBM.

2005-06-04 9:06 am

Anonymous
Now, what I would like to see, hopefully, is a greater effort by Apple to push releases back to 2 1/2 years, split the development group in two and get the kernel side/foundation side of the operation working solely on cleaning up the bottom layer, finely grain the kernel, clean up the scheduler, make the code from FreeBSD and Mach 3.0 integrate nicely rather than having two of everything, and optimise the code by eventually compiling the operating system with IBM’s PowerPC compiler, and work with IBM to bring Objective C and C++ to their compiler; licence the compiler off IBM, maintain it themselves and bundle it free with MacOS X, paying a royalty of say, $10 per unit, to IBM.
IBM is making a major push with their Power processors, with the pServer line, Linux on POWER, and so forth. I would think that it would behove IBM to either open source/freeware their compiler, or have their engineers work on GCC to get the optimizations going for GCC (which would benefit both Linux and Mac OS X).

2005-06-04 9:41 am

Anonymous
True.
Let’s be completely and utterly optimistic for a second (yes, this is a first for me, it’s completely new territory, so bear with me).
Let’s assume that IBM completely open-standardises the POWER architecture, and makes it completely royalty-free and open standards, just like SPARC is (the current version is V9 of the SPARC ISA).
Assuming that the *WHOLE* thing is open-standards based, it would be no loss for IBM to completely open-source the whole PowerPC compiler collection and simply charge for the higher-up layers – the IDE etc. etc.
I’m not hopeful that they’d open-standardise it like SPARC has been, but it would be nice if they did; if they were able to share the POWER costs among more companies, the overall costs should decrease, and if they also start working with Sun to get Solaris moving over to POWER as well, the opportunity for IBM to cut costs and drive up mind share would be huge.
Heck, if IBM came out tomorrow and said that in 6 months time they’re going to sell a POWER based workstation with 1MB Level2 cache, PCI Express graphics card + Solaris 10/JDS, and sell the whole thing for US$1500-US$1800, I would be more than happy to purchase such a machine.
You’d have a nicely designed machine, top class performance, and a rock solid operating system. You’d get all the perks of the POWER plus the added bonus of running a mature UNIX without the massive price tag attached.

2005-06-04 12:52 pm

Anonymous
Looks like in the Mac OS X versus Linux competition the winner is Linux, and in the x86 versus G5 competition the winner is x86. As I’m using x86 and Linux at home, I must say I’m happy to see this result.

2005-06-04 12:59 pm

Anonymous
Can you smell that ‘Alpha’ breeze coming from those nice Opteron chips: lowest clock of the bunch, fastest in 95% of the tests.
Well, looks like AMD IS the real winner in all this, and I’m glad I’m running on those babies!

2005-06-04 2:54 pm

Anonymous
If you have 2 variables and only one equation, you’re screwed. That’s what the article is: screwed. Fix one of the variables and you can argue over an appropriate conclusion, but testing different hardware and software and then developing conclusions about just the software or just the hardware is just sad to see in a supposedly technical article.
“In science when testing, when doing the experiment, it must be a controlled experiment. The scientist must contrast an “experimental group” with a “control group”. The two groups are treated EXACTLY alike except for the ONE variable being tested.”
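To make the "two variables, one equation" complaint concrete, here is a minimal sketch. It assumes a simple multiplicative model (slowdown = hardware effect × OS effect), and every number in it is hypothetical rather than taken from the Anandtech results.

```python
# Toy illustration only: the model and all numbers are made up for the example.

def predicted_ratio(hw_factor, os_factor):
    """Slowdown of (G5 + OS X) relative to (x86 + Linux) under a simple
    multiplicative model: ratio = hardware effect * OS effect."""
    return hw_factor * os_factor

observed = 2.0  # hypothetical: the G5/OS X box takes twice as long

# Two very different explanations that fit the single observation equally well.
explanations = {
    "blame the hardware": (2.0, 1.0),
    "blame the OS": (1.0, 2.0),
}

for name, (hw, os_effect) in explanations.items():
    fits = abs(predicted_ratio(hw, os_effect) - observed) < 1e-9
    print(f"{name}: hw={hw} os={os_effect} fits={fits}")
```

Both explanations reproduce the one measurement, which is why a single hardware-plus-OS comparison cannot say which factor is responsible.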

2005-06-04 5:02 pm

Anonymous
How does the RISC discussion (one that has no bearing whatsoever on the actual results) invalidate the results of those tests? People like PC love to troll message boards, picking apart other people’s work, perhaps because they think it makes them seem in-the-know, authoritative, smart.
Oh? Did you see me complaining about the test results? It’s already been picked apart on /. how Johan short changed himself with “well, we didn’t get the software” “nobody uses that compiler in the real world” and other excuses which blew his credibility out of the water IMHO. I don’t really want to get into that side of things, to be honest.
What I’m discussing is someone who should know better who made a serious factual error pretty much up front, then brazenly misdirects everyone with correct information that was out of context. In fact, this happened a few times where Johan blithely throws pseudo-technical information with a few benchmarks and tries to nail it down as “more is better” or “less is faster” without actually thinking about it. If you’re willing to email me I could go into more depth with some specific examples.
But they never say how they would do the tests, let alone endeavour to put the work necessary to conduct a better set of benchmarks and tests. Why? … someone like them will fill the comments section up with the same pseudo-critical claptrap, and they’re afraid.
Oh really now? I’ll make an example of some of the benchmarks:
Micro CPU Benchmarks: isolating the FPU
First off, I would find the leading FPU benchmarking program per CPU. FLOPs is not exactly a reliable way to go, and using a program written in 1992 (yes, that’s flops.c. Find it for yourself to see!) is certainly not a good way to go. I’d start with a recognised benchmark like SPEC, then followed by Matlab running standard operations (such as FFT) or commonly used cross-platform apps (i.e., 3D renderers), followed by some hand-cranked C code turned into assembly to see how much it can be tuned up per CPU to find the bottlenecks and list them with caveats. (Also, make sure the vectorizer stays out of it. Vector != FPU, remember?)
If I really wanted to mess around, I could use Acovea to plug some standard code through gcc to get the best results, and better still, get an eval version of the Intel compiler and the IBM G5 compilers and let them go at it.
Micro CPU benchmarks: isolating the Branch Predictor
This one’s pretty much the same as above – find a decent program that exercises the branch prediction capabilities of the processor. That’s pretty difficult as there doesn’t seem to be an easy way to do it conclusively. I probably wouldn’t use Queens unless I couldn’t find a better program. In any case, tools like MONster (on PowerPC) can nail this to a tee fairly easily, and there are programs on Intel/AMD that read the specific performance registers to give a numerical value to the number of branches taken, successfully predicted and mispredicted.
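As a rough picture of what the "successfully predicted and mispredicted" counts mean, the sketch below simulates a textbook two-bit saturating counter for a single branch. This is an illustrative assumption, not a description of how the 970FX, Opteron or Pentium 4 actually predict branches, and it is no substitute for reading the real performance registers.

```python
def simulate_two_bit_predictor(outcomes, initial_state=2):
    """Count correct and wrong predictions for one branch.

    Textbook 2-bit saturating counter: states 0-1 predict "not taken",
    states 2-3 predict "taken". `outcomes` holds booleans (True = taken).
    """
    state = initial_state
    predicted_ok = 0
    mispredicted = 0
    for taken in outcomes:
        prediction = state >= 2
        if prediction == taken:
            predicted_ok += 1
        else:
            mispredicted += 1
        # Move the counter towards the actual outcome, saturating at 0 and 3.
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return predicted_ok, mispredicted


# A loop branch: taken 9 times, then falls through once, repeated 100 times.
loop_branch = ([True] * 9 + [False]) * 100
# A branch that flips direction on every execution.
alternating = [True, False] * 500

print(simulate_two_bit_predictor(loop_branch))   # (900, 100)
print(simulate_two_bit_predictor(alternating))   # (500, 500)
```

The loop-style branch is predicted well while the alternating one defeats this simple scheme half the time, which is why a branch benchmark has to say which patterns it exercises.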
It takes a lot of guts to publish a benchmarking series, especially if it’s going to get a lot of attention. It’s likely to tick a lot of people off, simply because they don’t like the results.
Benchmarking done well with respect for the scientific way is OK. Johan did not apply a rigorous discipline to his benchmarks or have it refereed to make sure there were no errors (hint: his list of references is pathetic – nobody should accept that, not even for a 1st year CompSci paper). Instead, it’s pseudo-technobabble to confuse the masses into thinking he’s good, and a lot of excuses why he can’t test properly. Let’s just say, if I did it, I’d be passing it by as many people as I could to make sure there are no oversights.
Perhaps I may do a benchmarking paper… I shall add it to the pile, right now I’m dealing with means of conclusively testing hard disk media and then after that a means of conclusively testing battery capacity, so I’m pretty busy for the foreseeable future. (I consider hard disk media tests more important than a bunch of benchmarks. Benchmarks don’t lose data, for starters).
Then on the other hand, how about you do your own benchmarks? Or are you scared of having it exposed to the harsh light of peer review like you say I am exposing?

2005-06-04 5:11 pm

Anonymous
… Testing different hardware and software and then developing conclusions about just the software or just the hardware is just sad to see in a supposedly technical article.
Yes, that’s right, it is sad. What is worse is that I see Johan making amateur mistakes and glossing over the lack of substance and it fools 99% of the people reading it into not seeing the real faults in the article – lack of scientific discipline (or even an attempt at it).
In science when testing, when doing the experiment, it must be a controlled experiment.
Exactly. Pretty tough on computers, though. There’s tons of variables hiding out there, someone who works only at the application and “throw C code through gcc with -O2” level won’t nail them all (or anywhere close to it).

2005-06-04 5:50 pm

Anonymous
Benchmarks don’t lose data, for starters).
No, HDs do. However, that’s not really an issue when you have a RAID 5 configuration and you do incremental back-ups regularly.

2005-06-04 5:54 pm

Anonymous
Do you have any idea of Johan’s background? Have you seen any of the incredibly in-depth reviews on cpu architecture, memory technology, etc. on both aceshardware.com and anandtech.com?
Johan probably knows 100x more than you about the advantages and disadvantages of backwards compatible x86 architecture as well as the promises of the new future Cell processors. Hell, he has been able to play around and benchmark one, and anandtech has a very nice article about the Cell, have you even seen one in person?
Also, your description for how you would perform benchmarking sounds rather nice, however how long do you think it would take to finish a benchmark suite like you are suggesting? Months? Years? And you would really consider your benchmarking suite more ‘real world’ than his? He makes points several times that he uses certain approaches because those are the most common uses, i.e. ‘real world’ usage.
Sorry for the long rant, I just get sick of so many armchair quarterbacks who like to insult Johan when he has given much more to the hardware community than any of these people have or probably EVER WILL!

2005-06-04 8:35 pm

Anonymous
i think that they should compare x86 server processors (opteron and xeon are server processors) to REAL ppc server processors like ibm’s power5. i mean ppc970fx is a workstation processor and it does an impressive job compared to opteron/xeon. i mean opteron has 1mb l2 cache and xeon has.. 2mbyte?
i want to say that a single ppc processor can never beat ALL the x86s that are available on the market, because ibm can’t come up with new processors (like the ibm ppc970gx with 1mbyte l2 cache) fast enough. but let’s compare power5 to xeon/opteron. even a 1.65ghz power5 kicks opteron’s/xeon’s ass (and itanium 2’s too). not to speak of the 1.9ghz version of power5. it’s still the fastest processor out there.

2005-06-04 8:39 pm

Anonymous
This review was adequate, but too incomplete to draw a definitive conclusion. It should have included Linux or BSD on the Macintosh.
People (like myself) are a little upset that he was able to load Linux on the x86 hardware, but didn’t bother doing the same on the ppc systems.
So, he tested the Opteron & Xeon with Linux OSes & the G5 with Mac OSX. You got different hardware (variable 1) & different OSes (variable 2). Only the hardware should be different, not the OSes.
Example – Ok, say for instance I have an AMD 2400 system running Gentoo Linux vs a P4 3.6GHz with Windows 98. I run my benchmarks, get my results & compare. Do you think the results will be fair? Both are x86 hardware, but I used different OSes to bench them. Now, if I bench both systems on Linux (or Windows 98) my findings will be fair.
It is kind of what happened here. OS X is different from Linux (Linux uses monolithic kernel, & OS X microkernel, + other differences). Not accurate to say if the performance differences are because of the OS or hardware.
I understand that most people with a G5 will run OS X on it so that could be the main explanation for using it as only OS, but I was disappointed to not see the results with Linux on the G5. Just for comparison – so I could see ppc with Mac OS X vs ppc with Linux (vs x86 with Linux). This way I could also compare performance of OS X to Linux on G5 (see which OS is better on ppc).
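A small sketch of why the missing G5 + Linux run matters: with that third measurement the hardware effect and the OS effect can be read off separately. The function name and the run times below are hypothetical, picked only to show the arithmetic.

```python
def split_effects(x86_linux, g5_linux, g5_osx):
    """Return (hardware_ratio, os_ratio) from three run times in seconds.

    hardware_ratio compares the two machines under the same OS (Linux);
    os_ratio compares the two OSes on the same machine (the G5).
    """
    hardware_ratio = g5_linux / x86_linux
    os_ratio = g5_osx / g5_linux
    return hardware_ratio, os_ratio


# Hypothetical run times, not measurements from the review.
hw, os_effect = split_effects(x86_linux=10.0, g5_linux=12.0, g5_osx=30.0)
print(f"hardware costs {hw:.2f}x, OS costs {os_effect:.2f}x")
```

With only the x86 + Linux and G5 + OS X runs, just the product of the two ratios is known, so neither can be pinned down on its own.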
The 2 best computer hardware sites are Tomshardware & Anandtech & I enjoy reading the reviews they do. They’ve kept me well informed on technology. The review was good, but was just lacking this 1 important item.

2005-06-04 10:05 pm

Anonymous
I respect what you are saying (except for the part about tomshardware, but to each their own), however if you read the third paragraph of the article (emphasis added by me), I think it shows exactly what the purpose of this article was:
“This article is written solely from the frustration that I could not get a clear picture on what the G5 and Mac OS X are capable of. So, be warned; this is not an all-round review. It is definitely the worst buyer’s guide that you can imagine. This article cares about speed, performance, and nothing else! No comments on how well designed the internals are, no elaborate discussions about user friendliness, out-of-the-box experience and other subjective subjects. But we think that you should have a decent insight to where the G5/Mac OS X combination positions itself when compared to the Intel & AMD world at the end of this article. ”
I too would like to see Linux on PPC (mainly because I expect it to smoke OS X from a server standpoint), however I think the author clearly stated what the purpose of this article was and Linux on PPC is outside this scope. He also mentioned in the comments section that as long as they had enough time with the G5 they were hoping to do Linux on PPC.

2005-06-05 12:31 am

Anonymous
Here’s an Apple Engineer’s take on this review. It looks like the reviewer is a bit hazy on some details of OS X.
http://ridiculousfish.com/blog/?p=17
In particular, be sure to follow the link to this mailing list post by Dominic Giampaolo, the filesystems guru:
http://lists.apple.com/archives/darwin-dev/2005/Feb/msg00072.html

2005-06-05 12:40 am

Anonymous
Oh, here’s some additional info to support the bogus fsync() claim that the blog post I linked to above makes.
http://hardware.slashdot.org/article.pl?sid=05/05/13/0529252
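For readers who don't follow the links: as I understand it, the fsync() point is that on Mac OS X a plain fsync() hands data to the drive but does not force the drive to empty its own write cache, while the separate F_FULLFSYNC fcntl does, so a database that asks for the stronger guarantee pays for it in throughput. The sketch below is only an illustration of that difference; the helper name is made up, and it assumes a Unix-like system where Python's fcntl module is available.

```python
import fcntl
import os


def durable_write(path, data):
    """Write bytes to `path` and ask the OS to push them to stable storage.

    Hypothetical helper for illustration. A production version would also
    loop on os.write() in case of a short write.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        os.write(fd, data)
        # Portable step: flush the file's data from the OS to the device.
        os.fsync(fd)
        # Mac OS X only: additionally ask the drive to flush its write cache.
        # The constant is simply absent from the fcntl module elsewhere.
        if hasattr(fcntl, "F_FULLFSYNC"):
            fcntl.fcntl(fd, fcntl.F_FULLFSYNC)
    finally:
        os.close(fd)
```

Timing this helper with and without the F_FULLFSYNC step on the same Mac would be one way to see how much of a database benchmark gap could come from flush semantics rather than from threading.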

2005-06-05 2:29 am

Anonymous
No, HDs do. However, that’s not really an issue when you have a RAID 5 configuration and you do incremental back-ups regularly.
I’d like to say the majority of people do that, but you and I both know that Average Joe does not have 3 hard drives in a computer (let alone a laptop) or does incremental backups regularly. I should be working on some software soon that reports and tests HDD reliability working with SMART to warn people early on if a HDD is about to fail so they can get it looked at before it goes critical.
I had 4 HDD’s from 4 separate people in the space of 4 weeks where I work all fail. SMART had detected the errors, yet they continued on using the HDD’s despite numerous symptomatic warning signs. I managed to get the data back in all of the cases (1 was sort of OK, it was a real mess), but it all could have been avoided. None had backups. Neither Windows, MacOS X or the BIOS in the machines reported the SMART errors.

2005-06-05 3:51 am

Anonymous
Do you have any idea of Johan’s background? Have you seen any of the incredibly in-depth reviews on cpu architecture, memory technology, etc…
No, and I don’t care [that much]. I’ve been in computing so long now that I take “background” with a grain of salt. I’ve dealt with incredibly arrogant academics in CS who should know better. I’ve dealt with incredibly smart people without a degree to their name. I’ve dealt with people who know nothing about CS or computing, yet their insights are spot on and can put a so called expert to shame. If Johan revises his benchmarks I’d have more respect for him, so we shall see…
I did have a browse, it seems to me that Johan is a little preoccupied with faster, bigger and better. This is a trap if ever there was one, he would do well to start evaluating embedded systems and Mini-ITX so he can round out his knowledge and start getting wise about things. It seems he likes to write to the speed fan base a lot, they’re easy to fool and seem unable to think about things in depth.
Johan probably knows 100x more than you about… backwards compatible x86 architecture…
I doubt it. Maybe 5x-10x, if that. Johan is knowledgeable, that is observable, however I doubt he is wise yet. Knowledge != Wisdom. Wisdom = Applied Knowledge. Think about those statements carefully.
… He has been able to play around and benchmark one … Anandtech has a … article about the Cell, have you even seen one in person?
I had a look around, I don’t see any mention of Johan benchmarking Cell. Enlighten me please by sending a URL of this. Last time I checked, Anand himself did that article on the Cell, not Johan. To be fair, I haven’t seen a Cell CPU in person, but then again, it costs about $3000 Au each time for me to travel over to USA + 4 days worth of travel time, so it’s not exactly easy for me to gander over whenever I like to check stuff out. I’ll probably buy a PS3, boot Linux up on it, work my way out of emulation and start experimenting if I can (I own a PS2 Linux kit, before anyone asks).
Also, your description for how you would perform benchmarking sounds rather nice…
“Sounds rather nice”. That sounds like you don’t understand that there are hardware performance counters in modern CPU’s that give you information on a very precise basis. Sounds like Johan doesn’t either, if he throws a few programs together and uses rough statistics. How about you search up on “hardware performance counters” and let me know if you think they’re more accurate or not.
… How long do you think it would take to finish a benchmark suite like you are suggesting?
At the rate I work, probably years. Not surprisingly, this is a tough job to do. SPEC (who work on these things) typically spend 5 years working out benchmark tests. (They released in 1995, 2000 and soon to get 2005 out). Again, I mention that I have more important things to work on right now which require my attention more urgently than basic benchmarking.
… Would really consider your benchmarking suite more ‘real world’ than his?
Depends. I’d have to make two benchmarks. Theoretical and Real World. Theoretical is to test the absolute max speed, then Real World to see what you actually get on certain apps to see how close you get to your Theoretical benchmarks. Naturally, you can’t test every single app that everyone uses, so for some people, it wouldn’t be all that useful. However, I’d document it pretty clearly with references so someone else could do exactly what I did and get the same results, which is what is required.
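A minimal sketch of the "Theoretical vs Real World" bookkeeping described here, under stated assumptions: the function names are hypothetical, wall-clock timing with time.perf_counter stands in for proper hardware counters, and the peak rate has to come from the vendor's documentation.

```python
import statistics
import time


def time_runs(func, repeats=5):
    """Run `func` several times and summarise the wall-clock samples."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        samples.append(time.perf_counter() - start)
    return {
        "runs": repeats,
        "median_s": statistics.median(samples),
        "min_s": min(samples),
        "max_s": max(samples),
    }


def efficiency(real_world_rate, theoretical_peak_rate):
    """Fraction of the theoretical peak that the real workload achieved."""
    return real_world_rate / theoretical_peak_rate


# Example workload: a placeholder, not a meaningful benchmark.
summary = time_runs(lambda: sum(i * i for i in range(100_000)), repeats=5)
print(summary)
print(efficiency(real_world_rate=2.0, theoretical_peak_rate=8.0))  # 0.25
```

Reporting the spread alongside the median, together with the exact commands used, is what lets someone else repeat the run and check whether they get the same numbers.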
… I just get sick of so many armchair quarterbacks who like to insult Johan…
And how about you? I notice you’re not even offering a real email address, unlike myself, and I haven’t noticed any emails coming in from my offers to explain my position further. If I were really an armchair quarterback, I’d be posting anonymously and whining away without any justification whatsoever, let alone making any sort of effort to refute or expand my earlier claims.
Is pointing out factual errors “insulting” now? If anything, you should be insulted that you were being led along by the nose by a so-called expert who has made some serious errors in his benchmarks, then as soon as someone queries it everyone objects and says they’re “insulting” the expert. Whatever happened to independent research and critical thinking to analyse what they’re saying? You insult your own intelligence by not thinking about it critically.
As a side note, a few others have spotted what I have seen and pointed out other serious factual errors which should have been picked up ages ago (along with all the other comments on /., OSNews, etc) – http://ridiculousfish.com/blog/?p=17 I should mention that I have respect for Johan, but this recent set of benchmarks raises serious doubts in my mind that his research, test and benchmark methodology is complete and executed in a disciplined manner.

2005-06-05 6:07 am

Anonymous
>Is pointing out factual errors “insulting” now?
Your First post:
Title: “Sigh… another wannabe expert…”
– “I notice he’s whipping the G5 for having a massive pipeline, but no real mention of the ridiculously large pipeline on the Pentium 4?”
FALSE! (from the second page of the article) :
– “The 970FX is deeply pipelined, quite a bit deeper than the Athlon 64 or Opteron. While the Opteron has a 12 stage pipeline for integer calculations, the 970FX goes deeper and ends up with 16 stages … [which] might make you think that the 970FX is close to a Pentium 4 Northwood, but you should remember that the Pentium 4 also had 8 stages in front of the trace cache. The 20 stages were counted from the trace cache. So, the Pentium 4 has to do less work in those 20 stages than what the 970FX performs in those 16 or 21 stages. When it comes to branch prediction penalties, the 970FX penalty will be closer to the Pentium 4 (Northwood). But when it comes to frequency headroom, the 970FX should do – in theory – better than the Opteron, but does not come close to the “old” Pentium 4.”
– “The design philosophy of the 970FX is very aggressive. It is not only a deeply pipelined processor, but it is also a very wide superscalar CPU that can theoretically sustain up to 5 instructions (4+ 1 branch) per clock cycle. The Opteron can sustain 3 at most; the Pentium 4’s trace cache bandwidth “limits” the P4 to about 2 x86 instruction per clock cycle.”
Seems to me like you are throwing around quite a few insults NOT pointing out factual errors. And at the same time are blasting the article for things that aren’t really there. If you read the second page of the article, overall he makes many good notes about the architecture of the G5 and notes several advantages it may have over both Opteron and P4.

2005-06-05 8:39 am

Anonymous
>Seems to me like you are throwing around quite a few insults NOT pointing out factual errors. And at the same time are blasting the article for things that aren’t really there.
As opposed to just accepting whatever I read? No, you are correct to challenge what I say for further clarification. If you observe Page 3 (http://www.anandtech.com/mac/showdoc.aspx?i=2436&p=3) I couldn’t help but notice the 31-39 stage pipeline for the Xeon/P4 Irwindale, which he barely mentions, yet he sees fit to say something like “it’s OK, 8 stages are before the trace cache” while not listing the Pentium 4 Northwood, and then implies that longer pipeline = bad, shorter pipeline = good with respect to the AMD Opteron and how its Integer/FP performance rocks. So, he compares the G5 to an older P4 (Northwood), which he doesn’t list, then doesn’t bother comparing it to the Xeon/P4 (Irwindale), which he does list. How does that figure? I will say I should have specified which Pentium 4 I was talking about; I apologize for the confusion, and I give you credit for pointing it out.
OK, let’s keep on going, since you want examples.
Johan then talks about grouping, where the Athlon 64 (notice, no mention of it on Page 3) has a group of 3 instructions as compared to the G5 with 5, then claims that the G5 is inferior as its groups are being NOP’ed or resource restricted all the time, with no supporting evidence whatsoever, while blithely claiming “… but there the compilers should help the CPU with getting the slots filled.” for the Itanium, implying the Itanium is A-OK with that while the G5 is disadvantaged, and that IBM hasn’t got a clue about instruction grouping and scheduling whereas Intel does. Given that Johan never even used the Intel C compiler, by his own admission, and never researched it properly, I find that pretty suspicious.
Then Johan paints the G5 as being crippled as its RAM latency is 135 ns. Opteron is 60 ns, and 100-115 ns with the P4. Yet, in his chart it shows the Xeon at about 150 ns (not mentioned), the G5 at 300 ns, which is way off base, and the Opteron at 133 ns (again, way off base). Now, I don’t know about you, but if in the space of 1 page I say it should be x ns and then it shows up to be 2x ns, don’t you think there’s a problem? Or at least explain it better? I know what he’s implying, but he should do a better job of it.
He then conveniently forgets to mention that a dual G5 actually has 21.6 GB/s of bandwidth, as each processor runs on its own 10.8 GB/s FSB, and makes no mention of the Xeon, Opteron or anything else in a dual processor configuration (which is what he’s testing on) – http://www.apple.com/powermac/architecture.html, for reference. Instead, it’s all about memory latency, since the Opteron, Xeon and G5 all have the same memory bandwidth from the controller to the memory, in order to paint the G5 as being worse off, followed by a dismissive “Enough theory.” to avoid getting into specifics.
Shall I continue?
“The results are quite interesting. First of all, the gcc compiler isn’t very good in vectorizing. With vectorizing, we mean generating SIMD (SSE, Altivec) code. From the numbers, it seems like gcc was only capable of using Altivec in one test, the third one. In this test, the G5 really shows superiority compared to the Opteron and especially the Xeons.”
I’m sorry to say, testing AltiVec and SSE is not the same as testing FPU performance. Johan confuses the two while managing to use code (flops.c) that was written 13 years ago and hasn’t been updated since. I’m curious to know how GCC 3.3.3 somehow autovectorized flops.c, since that functionality was supposedly introduced in GCC 4.0.0 for testing and is expected to become more mainstream in GCC 4.1.x.
Did Johan download 3.3.3 and compile it both for x86 and the G5? Or is he using Apple’s default 4.0.0 on the G5 under MacOS X 10.4.1? Isn’t that suspicious? Don’t you think using a hand-tuned GCC vs a bog standard one might throw results off a bit? I should point out the default compiler on MacOS X 10.4.1 is actually a modified GCC 4.0.0, dated 20041026, and that GCC 3.3 dated 20030304 is optional on MacOS X 10.4.1. Did Johan use gcc_select at all? Maybe he was using GCC 4.0.0 all along and “forgot” to mention it, throwing his results out of whack.
He then goes on about how the results were odd when he disabled SSE-2 and makes some general broad assumptions about how GCC generates code without checking the disassembly, then says he should use the Intel compiler, but hey, hardly anyone uses that in the real world, so he won’t bother. When someone says “the funny thing” and “It seems” they’re making assumptions. Not good.
>If you read the second page of the article, overall he makes many good notes about the architecture of the G5 and notes several advantages it may have over both Opteron and P4.
True, but the amount of wriggle room in the examples above still leaves me doubtful that Johan knew 100% what he was doing. The above sounds insulting, but if you research it yourself you’ll see that there are still large gaps in Johan’s benchmarks that make no sense, which is what I observe. Again, if he redoes it properly, then I will have no complaints.
How about that Johan benchmarks Cell URL you mentioned. I’m still waiting. I’m giving you an easy way to back your words up, but instead you decide that me not telling you everything that I found in Johan’s article is “… me … throwing around quite a few insults …” like I have an obligation to tell you all I know otherwise I’m “insulting”. Have you researched up on those hardware performance counters yet? Or are you “insulting” me by not doing so? Or am I “insulting” you by not telling you all about it?
Again, if you do so, feel free to email me to discuss further. I welcome intelligent discussion on issues, and you’re free to disagree with what I say. However, don’t say I didn’t make an effort to answer to your requests and that I’m an armchair quarterback who likes to cut people down for no good reason.

2005-06-05 11:52 am

Anonymous
Always consider PRICE VS PERFORMANCE.
$$$ will always get you “the best performer.” But sometimes you’re a small business without an unlimited budget for IT.

2005-06-05 12:59 pm

Anonymous
Good arguments. I’ve always respected Johan and his work at Aces. Him, Chris Rijk and Brian Neal write some of the best and most enjoyable hardware reviews I’ve ever read.
That said, your analysis of the article does raise some interesting issues that he really should address, such as autovectorizing in GCC 3.3 (?!?) and other idiosyncrasies.

2005-06-05 5:46 pm

Anonymous
I’ve seen lots of interesting responses, but no one seems to be checking the factual accuracy of the original posting.
I’m not in a position to duplicate their mysql tests, but I just tried the following apache tests on my desktop, a dual 1.8 G5. This is not an Xserve, but if there’s a serious OS issue, it should show up. Note that this is a significantly slower machine than the one they used. I used the default apache configuration. Unfortunately they didn’t give enough information to know what options they used. Here are tests with a range of options:
using localhost on the same machine, with 5754 byte file
10000 no concurrency 1516/sec
10000 -c 20 2070/sec
10000 -c 20 with 71 byte file 2212/sec
10000 -k -c 20 with 71 byte file 3818/sec
from a very old Sun workstation on the same 100 M network
10000 no concurrency 427/sec
10000 -c 20 706/sec
10000 -c 20 with 71 byte file 969/sec
10000 -c 6 with 71 byte file 983/sec
10000 -k -c 20 with 71 byte file 3754/sec
An indication that the Sun client may be limiting is that when I tried 10000 -c 20 with 71 byte file from a faster Sun across campus I got 1223/sec rather than 969/sec. However at that point the network distance is an issue. So I’d say 1223 is the minimum, but with a fast local machine I might do substantially better.
At any rate, I’d say the most conservative number that is likely to compare with the article’s 200 or so is 1223. That uses a test across the network, without turning on keepalive (which sends several requests on the same connection). That’s still slower than the Linux 3776, but my system is a dual 1.8 GHz compared to the dual Xeon 3.6 GHz. So it may not be a speed demon, but it isn’t the disaster claimed. And if they used keepalive, or if they tested on localhost (which is very common when publishing ab benchmarks) then my numbers look very good indeed.
With most OS’s, the critical item for apache performance is networking. Quite often it involves details of how connections are opened and closed. (In fact some benchmarks are done with parameters modified to avoid the full close protocol.) Both ends are critical for this, so the tuning of the Sun (from which the tests were done) is critical as well. I note in passing that I got significantly slower results doing the same tests to Sun’s commercial web server running on a 4-processor Sparc. So far I haven’t found a server on campus other than my Mac that does better than 669/sec, using the same test from the same system on which mine does 1223/sec.
So I ask: forget interpretation, are the benchmarks in the article right? The Apache number is so far off that I wonder whether he’s got a network problem, e.g. a mismatch between configuration of his computer and the switch port to which it is attached.
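For what it’s worth, here is the arithmetic behind that comparison as a few lines of Python. The request rates are the ones quoted above; dividing by clock speed is only my own crude assumption for putting a 1.8 GHz and a 3.6 GHz box on the same scale, since web serving rarely scales linearly with clock.

```python
# Requests/sec quoted in this comment and attributed to the article.
MY_G5_NETWORK = 1223            # dual 1.8 GHz G5, across the network, no keepalive
ARTICLE_LINUX = 3776            # article's Linux figure (dual 3.6 GHz Xeon)
ARTICLE_OSX_APPROX = 200        # article's OS X figure, "200 or so"
LOCALHOST_NO_KEEPALIVE = 2212   # -c 20, 71 byte file
LOCALHOST_KEEPALIVE = 3818      # -k -c 20, 71 byte file


def ratio(a, b):
    """How many times larger a is than b."""
    return a / b


def per_ghz(requests_per_sec, clock_ghz):
    """Crude normalisation by clock speed (an assumption, not a law)."""
    return requests_per_sec / clock_ghz


if __name__ == "__main__":
    print(f"my G5 vs article's OS X number: {ratio(MY_G5_NETWORK, ARTICLE_OSX_APPROX):.1f}x")
    print(f"article's Linux vs my G5:       {ratio(ARTICLE_LINUX, MY_G5_NETWORK):.1f}x")
    print(f"per GHz, my G5:                 {per_ghz(MY_G5_NETWORK, 1.8):.0f} req/s")
    print(f"per GHz, article's Xeon:        {per_ghz(ARTICLE_LINUX, 3.6):.0f} req/s")
    print(f"keepalive gain on localhost:    {ratio(LOCALHOST_KEEPALIVE, LOCALHOST_NO_KEEPALIVE):.2f}x")
```

Even on that crude per-GHz scale the Linux box stays ahead, but the gap is nowhere near the one the article reports.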

2005-06-05 7:11 pm

Anonymous
>So I ask: forget interpretation, are the benchmarks in the article right? The Apache number is so far off that I wonder whether he’s got a network problem, e.g. a mismatch between configuration of his computer and the switch port to which it is attached.
That “mismatch” would also have occurred when they ran the other operating systems on the same machine during the tests. On the other hand, I can only assume they switched cables just to be sure the attached hardware wasn’t skewing the test results. But is that true? I don’t know, I wasn’t there. The fact is the final word hasn’t been said yet, and there’s still plenty to be proven.
>This is not an Xserve, but if there’s a serious OS issue, it should show up.
Does that apply the other way around when regarding the threading tests?

2005-06-05 7:15 pm

Anonymous
By the way, about threading: who says this isn’t exactly the problem that has Apple maybe seeking cooperation with Intel? If it’s a hardware issue, that is. We will not know from the article, since they didn’t test, for example, SLES 9 on all the test machines.

2005-06-05 7:39 pm

Anonymous
>Does that apply the other way around when regarding the threading tests?
I have no direct knowledge of the difference between workstation and server kernels. But you’d think underlying mechanisms such as threading would be the same. However, the original article didn’t do any threading tests. They checked signals and fork/exec. I would like to see fork/exec overheads low. But because fork is fairly expensive in many versions of Unix, server software tends to be built to avoid forks. E.g. Apache 1.3 maintains a pool of processes, to avoid forking for each operation, and Apache 2.0 uses threads instead of forks. (Despite the wording, the article didn’t test threading, and even if it had, because 1.3 doesn’t use threads, it wouldn’t matter to the Apache tests.)
Where fork/exec would matter is if you have some application that actually created a new process to process each transaction. An example would be the original CGI mechanism, or a PHP/perl script that used “exec”. Most coders avoid that approach where possible, because it’s comparatively expensive even on OS’s with a lower fork overhead than OS X.
Again, the Apache benchmark ought to test primarily networking. But certainly other facilities are used, including I/O and signaling. To be sure you understand what’s going on requires a level of profiling that the original article doesn’t do. That’s assuming that there’s a problem in the first place, which I’m dubious about.
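To illustrate the distinction, here is a rough Python sketch (Unix only) that times the two operations separately: creating and joining a thread, and forking and reaping a process. It is not what lmbench, Apache or MySQL does, the function names are mine, and interpreter overhead dominates the absolute numbers. The only point is that a fork measurement and a thread measurement are different measurements.

```python
import os
import threading
import time


def time_threads(n):
    """Average seconds to create, start and join one thread."""
    start = time.perf_counter()
    for _ in range(n):
        t = threading.Thread(target=lambda: None)
        t.start()
        t.join()
    return (time.perf_counter() - start) / n


def time_forks(n):
    """Average seconds to fork a child that exits at once, and reap it."""
    start = time.perf_counter()
    for _ in range(n):
        pid = os.fork()
        if pid == 0:
            os._exit(0)       # child: leave immediately, skipping cleanup handlers
        os.waitpid(pid, 0)    # parent: wait for the child to finish
    return (time.perf_counter() - start) / n


if __name__ == "__main__":
    n = 200
    print(f"thread create+join: {time_threads(n) * 1e6:8.1f} us")
    print(f"fork+wait:          {time_forks(n) * 1e6:8.1f} us")
```

A server that keeps a pool of processes or threads pays neither cost per request, which is why these numbers say little about Apache on their own.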

2005-06-05 9:03 pm

Anonymous
Here’s an Apple Engineer’s take on this review. It looks like the reviewer is a bit hazy on some details of OS X.

2005-06-06 7:25 am

Anonymous
I am sorry to say it, but the guys who wrote this do not really seem to understand what they are talking about.
– First, why are they talking about processes, and why did they measure the time to call fork() and exec()? fork() creates a new process, and MySQL does not. MySQL deals with threads, not processes, so what are they talking about here? How can they claim to comment on thread performance if they do not know what a thread is, or if they confuse threads and processes? MySQL creates threads as it’s running, so NEITHER fork() NOR exec() IS CALLED WHEN MYSQL IS RUNNING. So their results from LmBench tell us NOTHING!
So how can they say anything about the performance of creating threads without profiling MySQL and trying to find out what is happening while the application is running? Just by guessing? We are not here to guess; if they want to state something, they have to prove it properly. Their misunderstanding of the difference between processes and threads, and of what fork() does exactly, is simply unacceptable.
– Then the authors wrote:
“Another problem is the way threads could/can get access to the kernel. In the early versions of Mac OS X, only one thread could lock onto the kernel at once. This doesn’t mean only one thread can run, but that only one thread could access the kernel at a given time. So, a rendering calculation (no kernel interaction) together with a network access (kernel access) could run well. But many threads demanding access to the memory or network subsystem would result in one thread getting access, and all others waiting.
This “kernel locked bottleneck” situation has improved in Tiger, but kernel locking is still very coarse. So, while there is a very fine grained multi-threading system (The Mach kernel) inside that monolithic kernel, it is not available to the outside world.”
What, … what does this mean? Do they really understand what they are writing? It seems that they really do not know how Darwin is designed. Let me explain in a few words, because so many people confuse everything.
Darwin is built upon two entities: the BSD layer and the Mach kernel, which are combined into a kernel called xnu. Yes, xnu is a monolithic kernel, and yes, Mach itself is a microkernel. But in xnu, Mach is implemented together with the BSD layer monolithically. Why did Apple decide to implement it like this? Well, since Mac OS X was not intended to work as a multi-server, and a crash of a BSD server was equivalent to a system crash from a user perspective, the advantages of protecting Mach from BSD were negligible, and therefore message passing was short-circuited by having BSD directly call Mach functions. However, the abstractions are maintained within the kernel at source level. xnu exports both Mach 3.0 and BSD interfaces for userland applications to use. That’s how Darwin is built.
What about the locking strategy implemented by Darwin that the author referred to? Well, prior to Tiger the OS X kernel used so-called funnels to serialize access to the BSD portion of the kernel. Funnels are built on top of Mach mutexes, and there used to be two of them: one for networking and the second for everything else, mainly the file system. Funnels acted much like the FreeBSD Giant lock, which controls access to kernel resources for threads wishing to enter the kernel. As in BSD, the funnels were a less than effective way of allowing multiple threads to exist inside the kernel, and moreover they did not scale to large multi-processor machines.
In Tiger, Apple managed to get rid of the funnels; the networking and file system code implement fine-grained locking for access to those parts of the kernel. The VM was already thread-safe as it was running inside Mach, and Mach has provided a thread-safe VM for a while. In Tiger the kernel implements full fine-grained locking where it needs to, so saying something like “This “kernel locked bottleneck” situation has improved in Tiger, but kernel locking is still very coarse. So, while there is a very fine grained multi-threading system (The Mach kernel) inside that monolithic kernel, it is not available to the outside world.” simply does not make sense. They are not even aware of what has been done in Tiger, which greatly improves networking and file system tasks on multi-processor systems. Moreover, implementing fine-grained locking is fundamental to achieving high performance on dual core processors, and by doing it, Apple has opened the door to dual core processors in their machines.
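As a toy illustration of the difference (my own simplification in Python, not xnu code): below, the “funnel” model has one lock for networking and one for everything else, while the “fine-grained” model gives each resource its own lock. The class names, the four example resources and the one-lock-per-resource granularity are all assumptions made up for the sketch.

```python
import threading


class FunnelKernel:
    """Toy pre-Tiger model: two coarse locks, networking and everything else."""

    def __init__(self):
        network_funnel = threading.Lock()
        kernel_funnel = threading.Lock()
        self.locks = {
            "socket_a": network_funnel,
            "socket_b": network_funnel,
            "file_a": kernel_funnel,
            "file_b": kernel_funnel,
        }


class FineGrainedKernel:
    """Toy Tiger-style model: every resource has its own lock."""

    def __init__(self):
        names = ("socket_a", "socket_b", "file_a", "file_b")
        self.locks = {name: threading.Lock() for name in names}


def can_overlap(kernel, first, second):
    """Hold the lock for `first`; report whether `second` could proceed meanwhile."""
    with kernel.locks[first]:
        got = kernel.locks[second].acquire(blocking=False)
        if got:
            kernel.locks[second].release()
        return got


if __name__ == "__main__":
    for kernel in (FunnelKernel(), FineGrainedKernel()):
        name = type(kernel).__name__
        print(name, "file_a + file_b  :", can_overlap(kernel, "file_a", "file_b"))
        print(name, "file_a + socket_a:", can_overlap(kernel, "file_a", "socket_a"))
```

Under the funnel model two file operations can never overlap even on separate processors, although a file operation and a network operation can; with per-resource locks nothing waits unless it touches the same resource.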
And I strongly disagree with the author’s statement that OS X does not use kernel threads to implement user threads. Of course it does: every thread created in userland maps directly to a Mach thread. What are the “several threading wrappers” that they are talking about? What does that mean?
– About the micro benchmarks of the FPU, we can read:
“The results are quite interesting. First of all, the gcc compiler isn’t very good in vectorizing. With vectorizing, we mean generating SIMD (SSE, Altivec) code. From the numbers, it seems like gcc was only capable of using Altivec in one test, the third one. In this test, the G5 really shows superiority compared to the Opteron and especially the Xeons.”
Heyyyyyyyyy……!!!! What is going on here? GCC 3.3.3 does not generate vector code. Where did they get this? Auto-vectorization was introduced in GCC 4.0, so GCC 3.3.3 does not generate any vectorized code, no way. Please, someone tell them that they are completely off base. Apple introduced GCC 4.0 with Xcode, and it is able to generate vectorized code for AltiVec; this auto-vectorization should produce very good speed bumps for code that handles a lot of array data, but of course it will not provide the same level of optimisation as hand-tuned AltiVec code. Moreover, they seem to confuse AltiVec and SSE, which are not implemented in the same way in the two architectures.
And I am surprised that some guys out there believe they can get a good picture of the FPU performance of a processor (x86 or G5) by testing an application that was coded in 1992 and was not designed to run on an architecture like the G5’s.
They conclude the FPU tests like this: “The normal FPU is rather mediocre though.” Well, can I believe this, when I know that the G5’s FPUs are taken directly from the beefy Power4, which was a leader in floating point performance? And why do G5 systems always give better results than any x86 system on the Linpack test, which is a recognized floating point test suite in the industry?
So finally, the authors of this article come up with conclusions that are completely unverified, using tests that are not relevant at all. Their analysis of the thread implementation is simply ridiculous, based only on guesses about things that they don’t even understand completely.
If there is a performance penalty on OS X when running MySQL, I don’t think it comes from the thread implementation in OS X, and the authors completely failed to prove that it does, as they are testing different things with LmBench. The reason noted here
http://ridiculousfish.com/blog/?p=17
seems to be valid, because any MySQL query implies read and write operations.
Anyway, more investigation is needed, and coming up with guesses is simply not acceptable……

2005-06-06 9:08 am

Anonymous
Look here, it’s a review of the Xserve G5 published by PC Magazine. They tested the performance of Apache running on OS X and the G5 with their WebBench test, and the performance is very strong, something very different from the results of the Apache test shown by AnandTech. So what????????
http://www.pcmag.com/article2/0,1759,1630329,00.asp

2005-06-06 12:05 pm

Anonymous
For fun, I’ve just tried to compile flops.c with GCC 4. For some reason, the GCC 4 that Apple ships doesn’t provide information on which loops get vectorized (-ftree-vectorizer-verbose=5 does nothing), so I compiled it on Linux. Here is what GCC said.
flops.c:240: note: not vectorized: nested loop.
flops.c:249: note: not vectorized: number of iterations cannot be computed.
flops.c:269: note: not vectorized: number of iterations cannot be computed.
flops.c:308: note: not vectorized: number of iterations cannot be computed.
flops.c:325: note: not vectorized: number of iterations cannot be computed.
flops.c:365: note: not vectorized: number of iterations cannot be computed.
flops.c:405: note: not vectorized: number of iterations cannot be computed.
flops.c:445: note: not vectorized: number of iterations cannot be computed.
flops.c:486: note: not vectorized: number of iterations cannot be computed.
flops.c:531: note: not vectorized: number of iterations cannot be computed.
flops.c:574: note: not vectorized: number of iterations cannot be computed.
flops.c:174: note: vectorized 0 loops in function.
From this, we can see that even GCC 4 couldn’t vectorize that code. It is highly unlikely that GCC 3.3 did.

2005-06-06 12:20 pm

Anonymous
And anyway the authors are even more wrong when you look at the source code of flops: you will see that the application uses double precision floating point numbers. And AltiVec does not support double precision floating point numbers, only single precision. So even GCC 4.0 cannot vectorize anything for AltiVec within this code, which makes me believe that the authors of this article only talk bullshit; they clearly don’t understand what they are talking about.

2005-06-06 12:25 pm

Anonymous
“Anandtech did an excellent job benchmarking some of the latest CPU’s from “……………………well i really don’t think so!!!!!!

2005-06-06 12:44 pm

Anonymous
Yes, I notice these issues popped up in AnandTech’s comments as well as here (see my earlier post). I am glad that other people out there are thinking critically about these things and actually trying stuff out rather than taking everything on hearsay. Hopefully, we may see an updated article and benchmarks soon to clear things up…

2005-06-06 1:50 pm

Anonymous
Hi,
These benchmarks are interesting for showing whether configuration xyz performs well or not. But nothing isolates the hardware. It would be good to see where it goes wrong.
Since I’m lucky and had both the time and the hardware, I’ve benchmarked an iBook G4 1333 MHz with both Linux and Mac OS X 10.3.5, to see what the “real world” performance would be on the same hardware.
Here are the results:
L M B E N C H  2 . 0   S U M M A R Y
------------------------------------
Basic system parameters
----------------------------------------------------
Host      OS            Description              Mhz
--------- ------------- ----------------------- ----
Medusa    Linux 2.4.18- i686-pc-linux-gnu        652
localhost Darwin 7.5.1  powerpc-apple-darwin7.5 1331
localhost Linux 2.6.8-1 powerpc-linux-gnu       1331
Processor, Processes - times in microseconds - smaller is better
----------------------------------------------------------------
Host      OS             Mhz null null      open selct  sig  sig fork exec   sh
                             call  I/O stat clos   TCP inst hndl proc proc proc
--------- ------------- ---- ---- ---- ---- ---- ----- ---- ---- ---- ---- ----
Medusa    Linux 2.4.18-  652 0.48 0.73 2.66 3.82  24.0 1.34 3.99 173. 1277 5823
localhost Darwin 7.5.1  1331 1.01 1.76 5.51 7.69  17.3 2.13 7.28 1458 4050 8444
localhost Linux 2.6.8-1 1331 0.15 0.28 2.52 2.85  10.2 0.60 2.65 184. 665. 3135
Context switching - times in microseconds - smaller is better
-------------------------------------------------------------
Host      OS            2p/0K 2p/16K 2p/64K 8p/16K 8p/64K 16p/16K 16p/64K
                        ctxsw  ctxsw  ctxsw  ctxsw  ctxsw   ctxsw   ctxsw
--------- ------------- ----- ------ ------ ------ ------ ------- -------
Medusa    Linux 2.4.18- 1.130 5.6400   16.3 7.5200  130.7    29.1   130.7
localhost Darwin 7.5.1  3.220 6.6700   18.7 9.1700   92.0    18.7   190.8
localhost Linux 2.6.8-1 0.700 2.0700   14.6 4.6600   69.0 7.23000   163.2
*Local* Communication latencies in microseconds – smaller is better
——————————————————————-
Host OS 2p/0K Pipe AF UDP RPC/ TCP RPC/ TCP
ctxsw UNIX UDP TCP conn
——— ————- —– —– —- —– —– —– —– —-
Medusa Linux 2.4.18- 1.130 5.731 9.52 30.8 59.9 43.9 81.6 299K
localhost Darwin 7.5.1 3.220 17.9 18.0 37.6 43.8 109.
localhost Linux 2.6.8-1 0.700 2.905 5.89 12.7 25.0 17.3 30.8 63.1
File & VM system latencies in microseconds - smaller is better
--------------------------------------------------------------
Host      OS               0K File      10K File       Mmap  Prot    Page
                        Create Delete Create Delete Latency Fault   Fault
--------- ------------- ------ ------ ------ ------ ------- ----- -------
Medusa    Linux 2.4.18-   44.8 6.9790  142.6   15.4   276.0 0.999 2.00000
localhost Darwin 7.5.1   181.2  269.8 2178.6  476.4  2311.0  22.9  5567.0
localhost Linux 2.6.8-1   15.4 4.7750   47.8   12.2    62.0 0.153 7.00000
*Local* Communication bandwidths in MB/s - bigger is better
-----------------------------------------------------------
Host      OS            Pipe   AF  TCP   File   Mmap  Bcopy  Bcopy  Mem   Mem
                             UNIX      reread reread (libc) (hand) read write
--------- ------------- ---- ---- ---- ------ ------ ------ ------ ---- -----
Medusa    Linux 2.4.18- 554. 281. 84.3  352.7  449.9  140.0  129.9 449. 177.5
localhost Darwin 7.5.1  344. 343. 214.  342.8  338.8  332.7  215.6 339. 690.3
localhost Linux 2.6.8-1 512. 638. 284.  472.4  337.1  204.5  213.9 339. 676.6
Memory latencies in nanoseconds - smaller is better
(WARNING - may not be correct, check graphs)
---------------------------------------------------
Host      OS             Mhz  L1 $   L2 $ Main mem Guesses
--------- ------------- ---- ----- ------ -------- -------
Medusa    Linux 2.4.18-  652 4.599   19.9    132.0
localhost Darwin 7.5.1  1331 2.254 7.5130    128.8
localhost Linux 2.6.8-1 1331 2.254 7.5200    127.4
An Intel P3 650 MHz with 384 MB, GNU/Linux and gcc 2.95 has been added for reference. This shows how slow Darwin 7.5.1 can be, or that Apple’s gcc 3.3 produces “slow code”.
Where do those latencies come from?
Maybe the BSD kernel? I’ve seen similarities with other BSD-based OSes.
That was YDL Orion 4.0 vs Mac OS X 10.3.5, using gcc 3.3 on both with no special CFLAGS.
To me it is fairly clear that there is a performance gap.
This could easily be transposed to G5 performance.
Running Linux, the G5 would have shown far better performance. Maybe better than some contenders? Who knows?
Are there any lucky guys with spare time and hardware?
I will try Darwin on x86, and maybe on ppc too. Always comparing like for like, same hardware or same OS; I’m eager to see the differences.
Stop speculating, start benchmarking….
Charles 😉

2005-06-06 2:26 pm

Anonymous
Thanks for the lmbench results, but they don’t tell us anything we don’t already know. The problem with the benchmarks in the Anandtech article is that they aren’t benchmarking what the reviewer thinks they’re benchmarking (whoa… mouthful).
lmbench is well and good, but it measures the time it takes to fork() and exec() on your respective OS. These functions deal with process creation and execution (look up man pages) and Johan then proceeds to extrapolate that this is why _threads_ in MySQL are slow.
Linux is heavily optimized for process creation and has only recently had a decent threads implementation with NPTL. LMBench shows this. What we’re interested in is why the performance on MySQL is so bad and LMBench adds nothing to the argument.

2005-06-06 5:56 pm

Anonymous
@Charles – Thanks for your benchmarks
Processor, Processes - times in microseconds - smaller is better
----------------------------------------------------------------
Host      OS             Mhz null null      open selct  sig  sig fork exec   sh
                             call  I/O stat clos   TCP inst hndl proc proc proc
--------- ------------- ---- ---- ---- ---- ---- ----- ---- ---- ---- ---- ----
Medusa    Linux 2.4.18-  652 0.48 0.73 2.66 3.82  24.0 1.34 3.99  173 1277 5823
localhost Darwin 7.5.1  1331 1.01 1.76 5.51 7.69  17.3 2.13 7.28 1458 4050 8444
localhost Linux 2.6.8-1 1331 0.15 0.28 2.52 2.85  10.2 0.60 2.65  184  665 3135
File & VM system latencies in microseconds - smaller is better
--------------------------------------------------------------
Host      OS               0K File      10K File       Mmap  Prot    Page
                        Create Delete Create Delete Latency Fault   Fault
--------- ------------- ------ ------ ------ ------ ------- ----- -------
Medusa    Linux 2.4.18-   44.8  6.979  142.6   15.4   276.0 0.999     2.0
localhost Darwin 7.5.1   181.2  269.8 2178.6  476.4  2311.0  22.9  5567.0
localhost Linux 2.6.8-1   15.4  4.775   47.8   12.2    62.0 0.153     7.0
The results that really stood out are above.
Take a look at the results for Darwin 7.5.1 (OS X) vs Linux 2.6.8-1 (YDL). Just by comparing them you can see there is a problem.
i.e.:
1st table: 1458 vs 184, 4050 vs 665, 8444 vs 3135
2nd table: all of the values.
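To put numbers on it, here is a quick Python sketch that turns those pairs into slowdown factors. The figures are copied from Charles’s iBook G4 tables (microseconds, smaller is better); the dictionary and function names are just mine.

```python
# (Darwin 7.5.1, Linux 2.6.8-1) on the same iBook G4, in microseconds.
RESULTS = {
    "fork proc": (1458, 184),
    "exec proc": (4050, 665),
    "sh proc": (8444, 3135),
    "0K file create": (181.2, 15.4),
    "0K file delete": (269.8, 4.775),
    "10K file create": (2178.6, 47.8),
    "10K file delete": (476.4, 12.2),
    "mmap latency": (2311.0, 62.0),
    "prot fault": (22.9, 0.153),
    "page fault": (5567.0, 7.0),
}


def slowdown(name):
    """How many times longer Darwin took than Linux for one test."""
    darwin, linux = RESULTS[name]
    return darwin / linux


if __name__ == "__main__":
    for name in RESULTS:
        print(f"{name:16s} {slowdown(name):7.1f}x")
```

These are microbenchmarks on a single laptop, so they show where the OS-level gaps are, not how any particular server application will behave.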
Looking at Charles’s benchmarks I can definitely say OS X wasn’t programmed all that well (from a performance perspective). Also, there is a good chance that YDL (or another Linux variant) on ppc would have obtained performance close to the x86 hardware (for MySQL & Apache), if only it had been included in the test.
Everyone can believe what they want, but only when Anandtech adds Linux on ppc benchmarks to the review will it be fully complete & thorough. And it will probably come to the same conclusions I have stated.