GeekPatrol uses its GeekBench tool to compare Windows XP and OS X, both running on MacBook Pros. “Overall, there are areas where the Windows XP MacBook Pro was faster, areas where the Mac OS X MacBook Pro was faster, and areas where they were both roughly the same. Looking at these results, it’s hard to say which configuration comes out on top, although I think you could make a convincing argument for Windows XP (with Visual C++) being a bit faster overall than Mac OS X (with GCC).”
The benchmark doesn’t really compare the two OSes: it compiles the same code on each operating system, but with two different compilers. Why not compile the Windows version with GCC too (using Cygwin)? As it stands, this is actually a benchmark of VC++ versus GCC, and it tells us nothing new!
Because Windows programs aren’t usually compiled with GCC.
They could have tried the Absoft compilers on the Mac. I don’t know if they’d have gotten permission, though (they’re only free for educational use).
But why compare two OSes while changing such an important factor? Everybody knows that GCC is not the fastest compiler out there; the reason it’s the default on most OSes is its open-source nature, not its performance. If VC++ weren’t a Microsoft product, GCC would probably be the default on Windows XP, too. With the money required to buy Windows XP’s default compiler (VC++), someone could easily buy a faster-than-GCC compiler for Mac OS X, too.
For me, this was just another GCC vs. VC++ comparison (not Windows XP vs. Mac OS X), and it produced nothing new.
“With the money required to buy Windows XP’s default compiler (VC++), someone could easily buy a faster-than-GCC compiler for Mac OS X, too.”
Except for one thing, the VC++ compiler is free, and has been for some time: http://msdn.microsoft.com/visualc/vctoolkit2003/
It’s a more realistic test of how the platforms actually get used.
It is possible for non-scientific benchmarks to approximate more real-world scenarios.
It’s the default because it’s flexible, familiar to many developers, and compiles more languages than anything else. And yeah, it’s free ($$) and modifiable too.
But I really think using the typical compiler + OS gives a more realistic benchmark of the _PLATFORM_. I suppose they should talk about it that way then.
Actually, VC++ is free these days (as in beer). In any case, everyone compiles with VC++ on Windows, even though they could spend money on a better compiler (Intel C++). The same will likely be true of GCC on OS X. Also note that only GCC supports Obj-C on OS X, so it’s pretty much the only reasonable compiler consideration for most OS X apps.
The operating systems aren’t really relevant in these benchmarks, except as noted (the standard library implementations). Only a very bad operating system would have an adverse effect on what is basically a bunch of CPU benchmarks. What is relevant, however, is the platform. Simply put, GCC is part of the OS X platform, while Visual C++ is part of the Windows platform. GCC-code performance on Windows is more or less irrelevant, as is non-GCC code performance on OS X.
Great… so this benchmark is saying “if you are doing Windows programming, use VC++, not GCC”
wow… what was the point of this test then?
“wow… what was the point of this test then?”
To annoy Mac fanatics – I mean fanatics as in people who, with religious fervor, automatically deny anything that has the slightest negative implication for their platform
I couldn’t help it. It’s not horribly surprising, especially on the stdlib scores.
All in all I think they’re pretty close in _most_ of the benches. But it’s neat to finally be able to compare them on the exact same hardware. Imagine telling your dad this 15 years ago! Or if you’re old enough, telling yourself!
And some memory benchmarks do depend on the OS, if they call malloc/free n times, where n scales with the length of the benchmark. But I think most of these don’t do that, and definitely not bzip2.
And it also shouldn’t show much difference if the mallocs are all the same size.
… And I’ll say it again — YOU DO NOT PUBLISH BENCHMARK SCORES SUBMITTED BY READERS AS CONCLUSIVE.
Seriously.
Just to name a few things off the top of my head … you don’t know what kind of tweaks/optimizations have been done to either installation. You don’t know what other programs are running during the benchmark. You don’t know if someone has disabled DEP on their XP installation. You don’t know if the tests were run repeatedly to iron out abnormalities, or whether abnormal scores for a few tests were submitted.
And so on, and so forth.
I appreciate what these guys are doing in terms of writing a neat little benchmark, but they’re going about it completely incorrectly. Oh, and on the topic of compiler optimizations …
MSVC and GCC are *very* different compilers. Most seasoned developers will provide custom optimization flags for each specific benchmark source file, knowing which optimizations are beneficial to that specific code. You can’t just use a few generic flags for everything. Each of these optimizations is also very specific to the version of the compiler being used, let alone to a different compiler.
This is all just very silly. They should borrow a MacBook Pro from a friend and run their benchmark in a consistent manner, documenting all of the settings used for both OSes.
If I didn’t type something in wrong, the geometric mean of all the WinXP results is 1.2177. With the four extreme outliers removed, the geometric mean is 1.1873. So basically, we’re talking about a 20% advantage for Windows. Not hugely significant, but not peanuts either.
“If I didn’t type something in wrong, the geometric mean of all the WinXP results is 1.2177. With the four extreme outliers removed, the geometric mean is 1.1873. So basically, we’re talking about a 20% advantage for Windows. Not hugely significant, but not peanuts either.”
Well, either you typed something in wrong, I did, or “geometric mean” means something different to you than it does to me.
Removing the four extreme values (the 2 highest & 2 lowest) gives me a geometric mean of 1.0606 – that is, a 6% advantage.
Going a bit further, if I do the obvious thing and separate the data into single-threaded vs. multi-threaded tests, taking out the lowest & highest values, the results are a 4% advantage in the multi-threaded tests and 6.7% in the single-threaded ones (geometric means of 1.039 & 1.067, respectively).
Of course, “geometric mean” has NO MEANING in this case to begin with, since each value measures a different thing; neither would the arithmetic mean, the mode, etc.
A true test, if you want to compare the performance of both systems, is running real-world application(s) X(YZ) on system A vs. the same application(s) on system B.
Heh. I know jack about statistics. However, according to Wikipedia, the geometric mean of a series f is sum(f, 1, n)^(1/n). What definition of geometric mean are you using?
Excel’s geomean. That is, (value1*value2*…*valueN)^(1/N)
http://en.wikipedia.org/wiki/Geometric_mean
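For what it’s worth, that product-form definition is easy to write down in a few lines of C (the function name and the log-space summation are my own illustration, not anything from Geekbench or Excel):

#include <math.h>
#include <stddef.h>

/* Geometric mean of n positive ratios: the n-th root of their product.
   Summing logs instead of multiplying directly avoids overflow for large n. */
double geometric_mean(const double *ratios, size_t n)
{
    double log_sum = 0.0;
    for (size_t i = 0; i < n; i++)
        log_sum += log(ratios[i]);
    return exp(log_sum / (double)n);
}

Feeding the per-test WinXP/OS X ratios through something like this is all Excel’s GEOMEAN does.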
Doh! My eyes saw PI, but my brain saw SIGMA. Consider my previous comment rescinded.
Given what a total DOG GCC is when targeting x86 compared to MS VC++, the results are hardly a surprise.
Haven’t seen anything this pointless on OSNews since the Linux vs. Windows / SATA vs. ATA-133 benchmarks that used drives of differing sizes, cache, and manufacturer.
Hmm. I couldn’t call a 20% performance difference dog-like. In fact, given VC++’s limited focus, and GCC’s much broader one (both in language and platform support), a delta of 20% on this particular benchmark seems quite good to me. When you consider that GCC’s current register allocator is quite primitive, and that its SSA infrastructure has yet to mature, future improvements could easily wipe out that difference and even turn it into an advantage. These improvements are going to come one way or another, either via the integration of LLVM’s SSA framework, or via alternate proposals that have cropped up.
In the end, a 20% difference means a lot when you’re developing applications. Not to take anything away from GCC, but if you’re developing for Windows with C++, there really is almost no reason to use it.
This is one of several benchmarks that show the stdlib memory allocation functions as being woefully slow. Apple really needs to do something about that. A 35x speed hit in this benchmark, and similarly (though not equally) bad results on other memory tests, lead me to think Apple needs to concentrate their optimization folks on that part of the OS.
The 35x number on this benchmark really should raise some alarm bells about the validity of the test. Based on the documentation I’ve found, malloc() in both OS X and Windows is thread-safe. There are only so many ways to skin the malloc() cat (particularly the thread-safe version), so I find it really hard to believe that OS X’s malloc() is that much slower than Windows’. Looking at the OS X libc code, it appears that the malloc implementation is one of the few parts of the stdlib portion that are not based on FreeBSD code. The OS X malloc implementation appears to have been written to replace the previous (presumably 4.4BSD-derived) implementation in 1999, and it looks fairly sophisticated.
Therefore, I find it far more likely that Geekbench is either hitting some sort of pathological case in OS X’s malloc(), or hitting some fast-case optimization in Windows’. It would be illustrative to see how the stdlib test performs on Linux, which uses glibc’s very good malloc() implementation.
Actually, after using ObjectAlloc to look at Geekbench’s allocation profile, I have to say the benchmark sucks. Basically, it repeatedly allocates and then immediately frees a 32KB memory block. This case is trivial to optimize for. All the malloc code has to do is cache the last freed block and its size, and if the new allocation size matches the old one, return the cached block. Windows is full of such “do nothing case” optimizations, but their advantage for real-world code is minimal at best.
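To make the point concrete, here’s a toy sketch (entirely made up, not taken from Windows, OS X, or any real allocator) of the kind of last-freed-block cache that turns this allocate/free-the-same-size pattern into almost no work:

#include <stdlib.h>

/* Toy illustration only: remember the most recently freed block and hand it
   straight back if the next request is for the same size. A real allocator
   would track the block size itself rather than taking it as a parameter. */
static void  *cached_block = NULL;
static size_t cached_size  = 0;

void *toy_malloc(size_t size)
{
    if (cached_block != NULL && cached_size == size) {
        void *p = cached_block;
        cached_block = NULL;      /* hand the cached block back untouched */
        return p;
    }
    return malloc(size);
}

void toy_free(void *p, size_t size)
{
    if (cached_block == NULL) {
        cached_block = p;         /* keep it around for the next request */
        cached_size  = size;
        return;
    }
    free(p);
}

With something like that sitting in front of the real allocator, a loop that allocates and immediately frees a single 32KB block hits the cached path on every iteration after the first.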
A far better benchmark would randomize both the allocation size and whether an allocation, a free, or both happen on any given iteration. It would certainly use a mix of small, odd-sized allocations (think strings in a program) and large allocations that are multiples of the page size (think I/O buffers). This would test the allocator’s ability to handle varying block sizes, its ability to manage fragmentation within pages, and exercise it with a non-trivial load pattern with multiple outstanding allocations.
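A rough sketch of what such a loop might look like (the sizes, counts, function name, and use of rand() are placeholders for illustration, not a proposal for Geekbench’s actual code):

#include <stdlib.h>

#define MAX_LIVE  256      /* how many allocations can be outstanding at once */
#define PAGE_SIZE 4096

void mixed_alloc_benchmark(unsigned iterations)
{
    void *live[MAX_LIVE] = { 0 };

    for (unsigned i = 0; i < iterations; i++) {
        unsigned slot = (unsigned)rand() % MAX_LIVE;

        /* Sometimes free an outstanding block... */
        if (live[slot] != NULL && (rand() & 1)) {
            free(live[slot]);
            live[slot] = NULL;
        }

        /* ...and sometimes allocate a new one of a random size: small and
           odd-sized (strings), or a multiple of the page size (I/O buffers). */
        if (live[slot] == NULL && (rand() & 1)) {
            size_t size = (rand() & 1)
                ? (size_t)(rand() % 509 + 3)
                : (size_t)((rand() % 8 + 1) * PAGE_SIZE);
            live[slot] = malloc(size);
        }
    }

    for (unsigned i = 0; i < MAX_LIVE; i++)   /* clean up whatever is left */
        free(live[i]);
}

The point is that the allocator has to juggle many live blocks of mixed sizes at once, so a trivial one-block cache can’t short-circuit the test.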
The asm code for the stdlib.allocate benchmark follows:
00016068 lwz r3,0x2c(r31)
0001606c addi r30,r30,0x1
00016070 bl 0x1b8c0 ; symbol stub for: _malloc
00016074 bl 0x1b930 ; symbol stub for: _free
00016078 lwz r0,0x30(r31)
0001607c cmplw cr7,r30,r0
00016080 blt cr7,0x16068
Nota bene:
malloc() has the signature int -> pointer
free() has the signature pointer -> nil
On PowerPC, the first 4-byte integer or pointer argument is passed in GPR3. The return value, if it is a 4-byte integer or pointer, is stored in GPR3.
On PowerPC, large constant values (in this case, 32768 and the loop maximum) must be loaded from memory, as immediate values have a limited size. That’s what the lwz’s are for.
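In other words, the disassembled loop is roughly equivalent to this C (the iteration count is whatever the benchmark loads from memory into r0; the function name here is just for illustration):

#include <stdlib.h>

/* Allocate a 32 KB block and immediately free it, over and over.
   This is the entire inner loop of stdlib.allocate. */
void stdlib_allocate_loop(unsigned long iterations)
{
    for (unsigned long i = 0; i < iterations; i++)
        free(malloc(32768));
}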
stdlib.allocate on Linux is about twice as fast as stdlib.allocate on Win32 on the same hardware (and with Linux running under VMWare to boot).
Thanks for the comments on the benchmark itself. I’ll look into re-working the benchmark so that it’s not quite as synthetic as the one in Preview 2.
Thanks Rayiner. Informative and to the point as usual.
Well slap my butt and call me sally, a constructive post. Well done sir, well done indeed.
You guys don’t have to post anything other than a statement of:
You are not comparing Apple to Apple!
The Mach base of OS X definitely doesn’t help it much.
Basically you’ve got one “hack” OS (XP) compared with another OS using an antiquated 1980s core. Neither of them is well equipped to handle multiple cores.
Apple needs a kick in the nads for not adopting BeOS. Forward-looking design for its time. And it would have been cheaper than acquiring NeXT & Steve Jobs.
OS X did much more poorly than I would have anticipated, but it is “early days” – and the numbers certainly aren’t too shabby.
Regarding the alleged shortcomings of Mach: Avie Tevanian has been let go; he was a serious Mach booster. A kernel overhaul may be high on Apple’s list of priorities.
Anyone know how much trouble replacing the kernel would cause software developers? I am not a developer, but I suppose it would be pretty transparent.
Avie Tevanian wasn’t “let go”, he left the company to pursue other interests. In any case, he’s been in a managerial position since 2003, so if replacing Mach was on Apple’s list of priorities, Apple could’ve done it by now. The reason they haven’t, and likely won’t, is because fixing Mach’s limitations is likely to be much easier than ripping Mach out and replacing it. The BSD component of XNU is quite intimately tied to Mach, as is IOKit (and by extension all OS X drivers), and even some of the userspace. Replacing Mach would require a lot of time and effort on Apple’s part, and would hardly be transparent to developers.
On the other hand, if Apple spends some time working on Mach’s threading limitations, and continuing the locking work they’ve already started for Tiger, they can probably get XNU into pretty decent shape. It’ll never be in the same league as FreeBSD, Linux, or Solaris, in that it’ll probably never be a good fit for 64 CPUs, handle 10,000 threads per machine, or gracefully handle a process select()’ing 1,000 file descriptors, but to be honest, for Apple’s purposes, it really doesn’t need to. Apple isn’t in the high-end server business, it’s in the workstation business, and for such apps all you want is an OS that’s good at getting out of the way, which XNU does adequately.
All that aside, Mach really has nothing to do with these benchmark results. None of these benchmarks should spend a non-trivial amount of time in the kernel.