Frank Schoondermark has posted an overview of compiler performance with the FLOPS benchmark. FLOPS is a synthetic benchmark to measure floating-point performance in isolation. According to the test, the best compiler for full Pentium4/SSE2 optimizations is by far Intel’s ICC 7.1.
Is it really a surprise that Intel manages to optimize its compiler the most, given that they know all the published and unpublished internals, such as pipelining details?
Still, for people developing engineering number cruncher software, this is a really great product.
I think everyone else doesn’t need speed improvement by a small constant factor, when orders of magnitude are lost already by disk access and the user!
Check the tests: ICC is twice as fast as any other compiler when SSE2 is turned on!
This explains why Apple didn’t use ICC for its G5 SPEC tests and used gcc instead.
I think this is a great example of how open source software benefits the world at large. Intel would probably not be as dedicated to improving compiler performance if GCC wasn’t such a contender, and so widely used. Borland hasn’t released any C compilers for Linux (I think Kylix is just C++), so Intel and GCC are the only game in town for C on Linux on Intel chips.
The really warm fuzzy is that all platforms the Intel compiler is released for benefit from this.
This would really be interesting if ANOTHER compiler managed to beat ICC. I mean, this is Intel’s compiler being the best on Intel’s architecture. Heck, AMD recommended using ICC for best performance on the Athlon for a while. This will likely change as AMD64 matures.
Still, it would be incredibly cool if a lot of the better compiler and optimization technology weren’t locked up by patents. GCC would likely have improved a lot more quickly if more research were in the public domain, or at least royalty-free.
Ah well, the GCC 3.x series is a huge overall improvement over egcs and the 2.xx series so I’m actually quite happy with what I can get for free.
–JM
This was really silly. I agree with Ulrich Hobelmann. I mean wow, Intel can make the best compiler for Intel chips, wow, stop the presses. This is real news.
I’m kind of thinking I liked it better when the PC architecture was a lot faster than the Mac. Now that the G5 has caught up with (surpassed?) the speed of PCs, we are going to have ALLLLLLL of these debates all over again.
True, I was on the PC side last time. But now there is something called OS X. It crushes Windows with its eyes shut. It’s open and ready for tweaking, whereas Windows is really just a pain for any of that. To me, the Mac will always be faster than an Intel CPU because, well, an Intel CPU can’t run OS X. So the G5, G4, and G3 all stomp it.
Raw CPU strength is pointless these days. Regular users couldn’t use it up if they tried. There are a couple of fields where a super CPU is good: 3D, video, audio, and so on. Not average user stuff. I think the argument needs to go to the software now. And OS X has got that beat: they have the strength to RUN the software. Where do you wanna edit video? On a Mac or on a PC? Where do you wanna do audio? On a Mac or a PC? We all know the shortcomings of the PC architecture for realtime audio anyway, soooo……
I was really hoping this argument wasn’t going to come up, but as soon as the G5 was announced people have been trying to nullify it. So Apple has another bad ass CPU? So what? Why do people need to put it down? Maybe it’s the fact that they have a BAD ASS Operating System to back it up with now. So maybe those who put down Apple should read a little more, and maybe admit that they are wrong, and tell the world how MUCH they want to run a Mac and have Photoshop AND a unix shell open on the same desktop. Jealous?
> I mean wow, Intel can make the best compiler for Intel chips, wow, stop the presses. This is real news.
It actually is. Even AMD uses ICC instead of GCC when publishing SPEC results for their own chips. ICC is so much better that it works best even for AMD.
. . . and the Sun compilers generate better code for SPARCs. . . surprising news.
> and the Sun compilers generate better code for SPARCs. . . surprising news.
It is funny how defensive people get when they can’t have their favorite app [insert favorite/used to use OSS application here] always win. They will then use reverse psychology to show us that this is just not news, that it is just normal and not worth posting in the future.
Sorry guys, doesn’t work that way on OSNews. We report and support closed source software just as much as your favorite toys [not speaking of gcc specifically, which is otherwise a very-very useful app].
That’s quite fair!
Intel makes the P4, so who can beat Intel? MS? NO.
I think it’s interesting why GCC and the other compilers are slower than Intel’s. If you read the little bit at the bottom from the GCC people, they say that Intel hasn’t told them (or, based on the results, anyone else) enough to be able to write a good instruction scheduler for the P4. That said, I think it’s quite impressive that GCC is pretty close until SSE2 is enabled. That means we’re doing quite well in the OSS world. I think it would benefit Intel a lot to help the GCC folks, but we all know that won’t happen since Intel makes and SELLS their own compiler. If it wasn’t for that, I bet they would help.
Also, this does show just how important SSE2 is. Let’s face it, the P4 wasn’t that great a chip (the Athlons were winning for a while) until SSE2 started to show up more and really showed the P4’s true ability. That and the clock speed ramping have really helped the P4.
Makes sense…
The funny thing is everyone on the PC side was complaining about Apple using the GCC compiler. Did you notice that the GCC compiler did better than MSFT’s compilers?
The PC fans were complaining that no one used GCC under Windows and thus it wasn’t a good number! Well, most Windows programs are compiled using MSFT’s compilers and they’re slower than GCC; thus, Apple’s numbers were actually making the PC look better! I think Apple should have used the MSFT compiler and run the SPEC2000 tests under Windows; that way the G5 would look even better (damn Apple, why did they use the middle grade compiler instead of the worst).
I don’t think anyone is getting defensive, they are just making a statement of the DUH factor.
I mean really, if Intel can’t make a compiler that can make THE BEST binaries for their architecture, then I say they are in trouble.
I hear all these complaints from the PC fanboys that Apple was not being truthful back in the G4 days because they were using an Altivec optimised Adobe and the Windows version did not have that…..so, why when SSE2 is turned on for benchmarks is it “WOW ICC ROCKS!!!” and “WOW, the p4 kicks butt!!!!”
Ok, given that the chip makers are perhaps best qualified to make the highest performing compilers for their CPU, what’s the timeline on IBM releasing patches for GCC and the G5?
I mean, IBM is all fuzzy with Linux and Open Source. They should be motivated to have their chips do That Much Better than whatever GCC is doing today. Then, of course, Apple can leverage those mods and inflate their benchmarks that much more.
Does IBM sell a native, high performance compiler suite for their Power architecture?
Once they get a better compiler, to get the current offerings more on par with Intel, then they can start pushing the P4 to match clock ratings with Intel and brutishly pummel them that much farther into the ground.
The main reason for ICC development is to support the Itanium architectural features (speculation & predication). They’re pushing hard because one of the arguments against Itanium was “It’s difficult to write compilers for the EPIC architecture”.
However, that’s what makes Itanium amazing: it leaves all parallelism-related complications to the compiler. We don’t usually care how long compilation of our software is going to take, so we can spend more time creating a binary which works as efficiently as possible. Trying to solve the parallelism problem at the hardware level in real time (what current processors try to do) increases the amount of circuitry and is not as efficient. Thus, in theory, all the circuitry in Itanium serves purely parallelized code. But designing an efficient compiler becomes an issue.
When it comes to Itanium, which is ICC’s main strength, gcc is way behind. And Intel is migrating to Itanium soon. I remember hearing that their last x86 design is taped out (design complete, ready for testing then production).
That would make icc the preferred compiler even for open source projects in a couple of years. I wish they would port icc to the BSDs soon too and get it to the point where one could compile a kernel. I remember there were ports for FreeBSD but it wasn’t working seamlessly last time I checked (it couldn’t generate native binaries by itself and one needed to use gcc after the objects were created). AFAIK it’s not mature enough for kernel compilation even on the Linux platform, which Intel supports…
“Does IBM sell a native, high performance compiler suite for their Power architecture?”
Yes they do, but they are only supported under AIX (and maybe OS400). IBM makes excellent development tools, not just compilers.
“Does IBM sell a native, high performance compiler suite for their Power architecture?”
Isn’t that why Apple developed Xcode?
http://www.apple.com/macosx/panther/xcode.html
AS/400, AIX, and zOS!
IBM and Apple are both pushing patches into GCC. SUSE (I think that’s the right company) is also under contract with Apple to improve GCC for the PPC.
Apple used to have their own compilers (MrC and MrCPP) which were based on the MOT compilers from back in the 68K days. They don’t seem to be able to donate any of that code to the GCC project (I’m thinking MOT contracts). I also noticed that the Apple compiler team is no longer an active part of Apple; I’m not sure where they went but they’re not working on compilers anymore.
IBM patents everything they do. Thus, they won’t open their compilers to Apple or the GCC folks. This leaves Apple having to do their own compiler (a lot of work and money) or working with the GCC team (costs less and makes you look good with the OSS folks).
around the time of the Pentium 90/100 launch. IIRC they had a modified egcs optimized for the Pentium around that time. They went on to develop their own compiler afterwards.
Of course it helps to optimize a compiler for a specific architecture if you happen to also be the sole manufacturer of that same CPU, own all the patents and have all the engineers around.
OTOH when the gcc people get down to optimizing SSE2 usage for floating point I guess we can expect that gcc will catch up with icc, to a certain point.
I can understand why some people are surprised with the results in that article, but those that have been following gcc development and/or have done some benchmarking with different compilers are not really seeing anything that they were not aware of before.
BTW these differences in performance of compiled code are only relevant for floating point code that can be vectorized. That’s 0.01% of the code in a normal Linux distribution, if that much.
So, yes, if you have a specific application that makes heavy use of vectorizable floating point code, AND you are using a P4, AND the precision provided by the SSE2 unit is good enough for you, AND you happen to have icc installed on your hard disk, why not use it?
More power to the masses, that’s what I say. ;^)
“I also noticed that the Apple compiler team is no longer an active part of Apple”
You might want to look at Xcode
http://www.apple.com/macosx/panther/xcode.html
People here confuse the current http://www.aceshardware.com GCC 3.2 tests with the ones that Apple did for G5 with GCC 3.3…
Let’s see http://www.aceshardware.com report with GCC 3.3 instead of GCC 3.2 then flame each other (not) about G5 vs P4 GCC tests!
Until then, calm down people! :p
–> I didn’t want to double post! I wanted to edit my post but I couldn’t find such a thing…
Good compiler writers usually separate the backend of the compiler from the frontend (actually, it is a bit more than that) so that they can easily port it to different architectures/platforms. I doubt Intel cares much about this, and not having to split up your compiler to remain portable enables you to perform optimizations not available to gcc.
BTW these differences in performance of compiled code are only relevant for floating point code that can be vectorized. That’s 0.01% of the code in a normal Linux distribution, if that much.
Your kernel, desktop – most of what you touch will not be affected.
“Does IBM sell a native, high performance compiler suite for their Power architecture?”
Yes they do, but they are only supported under AIX (and maybe OS400). IBM makes excellent development tools, not just compilers.
Yeah, and I bet if Apple had used AIX as the OS on their tests people would have cried foul!
Can AIX even run on a PowerMac G5 I wonder??
I believe the Intel compiler uses the Edison Design Group front end, leaving Intel to focus on tuning the back-end for high performance.
http://edg.com/
ICC was around a long time before the Itanium. Intel has needed their own compiler to work hand-in-hand with VTune. Together, VTune and ICC make it possible to optimize an application better than any other tools available on the Intel platform.
XCode is the replacement for Project Builder. It still uses the GCC compiler. Apple has placed their own modifications to it.
Anonymous: I’ve been looking at the patches for GCC coming out of Apple and I haven’t seen any of the old MrC/MrCPP people listed.
If Apple has truly added partial compiling to GCC, then I can’t wait for it to be backported into GCC.
Question: Please note that this is a serious question and not meant to be “leading”, though I am certain that some will take it that way. Is there any way of being sure that there is no code in the ICC compiler which is designed to make it look better for this kind of synthetic benchmark? Similar to the “Quake” optimizations with the (was it nVidia?) video driver?
For example, this code could be tweaked in a few different ways to accomplish the same result, such that if the compiler were looking for certain patterns to determine that a synthetic benchmark was being compiled, such tweaking would break them, leading to dramatically different results? Or another test would be to do an “app test” level where a bunch of full applications which made heavy use of SSE would be compiled with each compiler and then tested in more real-world conditions? Such a test would legitimize the results because in a more real-world test it’s harder to cheat, and of course, if you really do get better results in a real world app, it’s not cheating anyway!
Erik
Unlike the previous Coyote Gulch benchmarks, this one is too simplistic for its own good. It’s a valid benchmark, but it measures a very specific situation: low-level C code churning through (mostly) streamable floating point operations. Thus, there are some important things to keep in mind:
1) The 2x performance increase of the ICC compiler in SSE2 mode is entirely due to the fact that ICC is the only C++ compiler on Intel that auto-vectorizes code. In other words, it detects which loops can use SSE2 to process multiple pieces of data in parallel, and automatically generates the appropriate code. None of the other compilers are really using SSE2 properly. When SSE2 support is enabled, GCC and Visual C++ will generate SSE2 instructions, but will only use the SSE unit to work on one operand at a time. To really utilize SSE2 with these other compilers, you have to use an intrinsics library. Intel’s intrinsics library, for example, defines a vector type that contains inline assembly to use SSE2 instructions for certain operations.
2) This is a low-level C benchmark, and doesn’t test the compiler’s ability to do high-level optimizations, like inlining and whatnot. These days, very little code is written that closely to the metal, and a good high-level optimizer is very important.
3) This is an FPU benchmark, while most code people run is integer in nature. Historically, GCC’s integer performance is a lot better than its floating-point performance.
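To make point 1 concrete, here is a minimal sketch in C of the difference between scalar code and "packed" SSE2 code written by hand with intrinsics. The function names madd_scalar and madd_packed are made up for illustration, the arrays are assumed not to overlap, and the packed path is only compiled when the compiler defines __SSE2__ (otherwise it falls back to the scalar loop). It is not a claim about what ICC actually emits, only the general shape of what an auto-vectorizer produces for a loop like this.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics: __m128d, _mm_mul_pd, ... */
#endif

/* Scalar version: one multiply and one add per loop iteration. */
void madd_scalar(double *a, const double *b, const double *c, size_t n)
{
    assert(a && b && c);
    for (size_t i = 0; i < n; i++)
        a[i] = a[i] + b[i] * c[i];
}

/* Packed version: each SSE2 instruction works on two doubles at once.
   Assumes a, b and c do not overlap (hypothetical helper, not from the post). */
void madd_packed(double *a, const double *b, const double *c, size_t n)
{
    assert(a && b && c);
    size_t i = 0;
#if defined(__SSE2__)
    for (; i + 2 <= n; i += 2) {
        __m128d va = _mm_loadu_pd(a + i);        /* load a[i], a[i+1] */
        __m128d vb = _mm_loadu_pd(b + i);
        __m128d vc = _mm_loadu_pd(c + i);
        va = _mm_add_pd(va, _mm_mul_pd(vb, vc)); /* two multiply-adds */
        _mm_storeu_pd(a + i, va);
    }
#endif
    /* Leftover element when n is odd, or the whole loop without SSE2. */
    for (; i < n; i++)
        a[i] = a[i] + b[i] * c[i];
}
```

Both functions compute the same values; the packed one simply needs about half as many arithmetic instructions, which is consistent with the roughly 2x gap discussed above.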
That said, ICC is an excellent compiler all around. It’s got good error messages (a god-send when debugging template code), a very good high-level optimizer, and a good integer code generator. I use both ICC and GCC a lot, and I must say that in real world code, the two are a lot closer than you’d think from this benchmark. ICC on the whole is faster, but GCC often surprises me. For example, I was benchmarking a memory allocation algorithm recently. GCC was consistently 50% faster than ICC, no matter what I did. Further, libstdc++ (GCC’s implementation of the C++ Standard Library) seems to have a performance edge over the Dinkumware library that comes with Intel C++.
Actually, most current compilers are quite good at detecting synthetic code. The optimizers (particularly the powerful whole-program optimizer in ICC) can often detect when no useful work is being done (the results of computations are being ignored) and just remove the dead code. Usually when that happens, though, you can tell because your benchmark finishes in 0.01 seconds. Of course, this doesn’t seem to be the case here, since Intel C++’s results (as I said) are entirely consistent with it using the SSE2 unit in vector mode, while the other compilers are using it in scalar mode.
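A rough sketch of the dead-code situation, in C. The names work, bench_discarded, bench_kept and sink are hypothetical, and whether a given compiler really drops the first call depends on the compiler and the optimization level, so treat this as an illustration of the idea rather than a statement about any specific product.

```c
#include <assert.h>

/* Some floating-point busywork: sum of i * 0.5 for i in [0, n). */
double work(int n)
{
    assert(n >= 0);
    double sum = 0.0;
    for (int i = 0; i < n; i++)
        sum += i * 0.5;
    return sum;
}

/* The result is thrown away, so an optimizer that can see into work()
   is allowed to remove the computation entirely. */
void bench_discarded(int n)
{
    (void)work(n);
}

/* Writes to a volatile object may not be removed, so the result has to
   be produced somehow before it is stored here. */
volatile double sink;

void bench_kept(int n)
{
    sink = work(n);
}
```

A benchmark built like bench_discarded is the kind that can "finish in 0.01 seconds". Storing the result somewhere the compiler cannot ignore, as in bench_kept, is one common way to keep the measured work alive, although a compiler may still precompute it when n is a known constant.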
Now, what I really want to see is a benchmark of the P4, Itanium, Itanium2, AthlonXP, Athlon64, and Opteron using the Portland Group compilers:
http://www.pgroup.com
One thing that has not been shown is the Opteron using a nicely optimizing compiler in 64-bit mode. The PGI compilers (version 5.0) support Opteron in 64-bit mode with SSE/SSE2/etc. Anyone wanna’ bite the bullet and do it?
AIX will run on the PPC 970. There will be a line of workstations from IBM using it.
Just a note to you people compiling for speed..
Use -march=yourcpu and -fomit-frame-pointer (renders debugging impossible on x86, but gives you another register). Use -O2; -O3 usually doesn’t speed things up any further, and optimization bugs are often found at -O3. Use -mfpmath=sse,387 if you have an SSE-capable processor; this greatly speeds things up if you are doing lots of floating-point fiddling.
What the heck are you gonna do with ICC compiled SPEC benchmarks, while you’re running dumb slow lame Windows on your P4 and all of your software was compiled with m$vc???
This is lame!
I think GCC’s doing pretty damn well, seeing as the tests were done on the platforms made by the same people as the other compilers (Win / Pentium, MS / Intel). Eugenia, don’t get so uppity when people mention this. It’s not some psychological thing, not a defense mechanism. Just think about it. It’d REALLY be news if ICC WASN’T the fastest compiler. That’d be a bad show on Intel’s part. That’s all people are saying. Are you saying it’s noteworthy that Intel makes the fastest compiler for their own chips / the chips whose basic architecture they made from the ground up 25 years ago? I wouldn’t.
However, I do think it’s interesting to see the difference between GCC and Visual Studio.
Yes, icc is very good at optimizing small applications that can be vectorized.
But on my project, less than 1% is vectorized, and the result for my software is either a segfault (using STL slist) or code that is slow as hell (replacing slist with STL list).
This code has been tested with the MIPSPro C++ compiler and with all gcc versions since 2.95. It runs perfectly in both cases.
“is funny how defensive people get when they can’t have their favorite app [insert favorite/used to use OSS application here]”
And that includes you Eugenia.
One really has to remember with benchmarks such as these that if you are really doing any work that uses these types of optimizations, stability is the most important aspect.
And stability is still not altogether there on mac/pc/windows/linux platforms without a good deal of tweaking.
I have been using Intel’s C++ compiler for building my crypto apps and I can say that ICC’s (or ICL’s) optimization can lead to UNFAIR and inaccurate results.
In Schoondermark’s benchmark test, the following ICL equivalent options should have been used with gcc (also applicable with MinGW):
-O3 -march=pentium4 -ffast-math -mfpmath=sse -fforce-addr -fomit-frame-pointer -funroll-loops -frerun-cse-after-loop -frerun-loop-opt -malign-functions=4
ICL/ICC does a lot of optimizations behind your back, some even alter the logic of your code, which is UNFAIR when used in benchmarks. Currently when using -O3 with ICL 7.1, the following “unfair” optimization is enabled:
-Qparallel
The Qparallel option actually creates parallel threaded code. For example:
Original code as it should be:
for (i=1; i<100; i++)
{
    a[i] = a[i] + b[i] * c[i];
}
With the -Qparallel option, the following happens:
/* Thread 1 */
for (i=1; i<50; i++)
{
    a[i] = a[i] + b[i] * c[i];
}
/* Thread 2 */
for (i=50; i<100; i++)
{
    a[i] = a[i] + b[i] * c[i];
}
The -Qparallel option causes the compiler to generate multi-threaded code on your behalf. The threaded code will provide speed improvements as it runs in parallel. A definite boost for HyperThreading and SMP systems.
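For what it is worth, the split shown above can be checked with a small C sketch. The helper names madd_range, madd_whole and madd_split are hypothetical, and to stay self-contained the two halves are run one after the other instead of on real threads. The point it illustrates is only that each iteration touches its own a[i], so cutting the index range in two gives the same values as the single loop.

```c
#include <assert.h>

#define N 100

/* Applies a[i] = a[i] + b[i] * c[i] for i in [lo, hi). */
void madd_range(double *a, const double *b, const double *c, int lo, int hi)
{
    assert(0 <= lo && lo <= hi && hi <= N);
    for (int i = lo; i < hi; i++)
        a[i] = a[i] + b[i] * c[i];
}

/* The loop as posted: i runs from 1 to 99. */
void madd_whole(double *a, const double *b, const double *c)
{
    madd_range(a, b, c, 1, N);
}

/* The same work cut into the two halves from the post. Here they run
   back to back; a parallelizing compiler would give each half to its
   own thread. */
void madd_split(double *a, const double *b, const double *c)
{
    madd_range(a, b, c, 1, 50);
    madd_range(a, b, c, 50, N);
}
```

A loop where iteration i reads a value written by iteration i-1 (a running sum, say) could not be split this way without changing the result, which is why a compiler has to prove independence before it parallelizes.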
I find that this option is turned on by default in ICC/ICL 7.x and not in previous versions. I believe that Intel decided to turn it on by default to improve HyperThreading and SMP performance. No compiler on the market does this, except for Intel’s own compiler! Thus it is UNFAIR to have this option turned on (even though unknowingly) during benchmarks! The author should be aware of options that should be turned off in order to provide fair results!
A fair benchmark between the G5 and P4 should be done using Metrowerks’ C/C++ compilers (CodeWarrior v8.3 for MacOSX and Windows). It is well designed and takes good advantage of the CPU on both platforms without altering code logic! It supports SSE/SSE2/3DNOW2/ALTIVEC!
I find the compiler debate lame when people don’t realise that GCC is designed to generate reliable and accurate code, whereas ICC/ICL is designed to generate faster code that makes unacceptable sacrifices at times.
For example, STLport works reliably in GCC with all optimizations turned on. For ICC/ICL, this isn’t always the case!
For a properly done benchmark check out the following links:
http://www.willus.com/ccomp_benchmark.shtml
http://www.coyotegulch.com/reviews/intel_comp/intel_gcc_bench2.html
Just my two ringgits.
-Magg
I was really hoping this argument wasn’t going to come up, but as soon as the G5 was announced people have been trying to nullify it. So Apple has another bad ass CPU? So what? Why do people need to put it down?
Most people I know who can maintain an objective and rational perspective on this topic (admittedly few in number) don’t try to “put it down”, they merely take issue (and sometimes downright offense) at Apple’s deceptive marketing methods. Like running Bytemark binaries compiled for a 486 on a Pentium 2, or picking and choosing Photoshop filters and parameters with disproportionately better performance on G4s to make a general performance claim, or the current dodgily made SPEC results. Or this gem from Apple’s G5 propaganda page:
“The Power Mac G5 is the world’s fastest personal computer and the first with a 64-bit processor — which means it breaks the 4 gigabyte barrier and can use up to 8 gigabytes of main memory.”
The new G5 Macs are undoubtedly fast – no arguments there. Although their 64-bitness is more “wow” than “useful” at this point. What bothers me is the practically insulting way Apple makes its performance and feature claims. This is not the hyperbole of “makes the internet faster”, it’s the deception of “these numbers demonstrate that our machines are X times faster”, nearly always with no details about the actual benefit. I thought the SPEC comparison was going to signify a change in Apple’s marketing methodology until I read the details of how it had been performed.
That’s the difference between Intel and Apple. I read Intel’s marketing and I just feel sceptical or have a bit of a chuckle. I read Apple’s marketing and I feel insulted.
So maybe those who put down Apple should read a little more, and maybe admit that they are wrong, and tell the world how MUCH they want to run a Mac and have Photoshop AND a unix shell open on the same desktop. Jealous?
Not really. I can have Photoshop and a unix shell open on a Windows box right now.
I like Macs and I like OS X. It’s good to see the hardware now price/performance competitive at the mid to high end, with machines that are hopefully actually fast enough to run the pig at a usable speed. If they can manage to make a fast enough lower end machine for a decent price, I might even buy another one (I had a PB667 but got rid of it because it was so frustratingly slow to use). I’m hoping for a revamped iMac line with 1.4 – 1.6Ghz processors at the current price points. More BTO options would be nice too, but that’s probably asking a bit much (as is a modular LCD).
Ok, given that the chip makers are perhaps best qualified to make the highest performing compilers for their CPU, what’s the timeline on IBM releasing patches for GCC and the G5?
Apple have actually contributed quite a bit back to GCC in the form of improved optimisations for PPC. They’ve done a lot for it.
Overall, Apple were a bit silly to use GCC. It’s not particularly good at optimising for PPC _or_ x86. If they’d used one of the better PPC compilers (and compared it with a good x86 compiler), they could have had better PPC results _and_ have avoided all the people calling shenanigans on their SPEC numbers.
It’s hardly unfair that Intel C++ implements optimizations that other compilers don’t. Intel’s compiler does whole-program optimizations too, which can drastically change the order in which program logic is executed. Does that make it unfair? As long as the compiler doesn’t generate incorrect code (has only happened to me once, with ICC 6.0) it can do whatever it bloody-well pleases.
Order of logic execution does change the meaning of code. Other compilers don’t do it because of the bad side effects that are a result of this. Benchmarks should not only take speed into account but reliability and accuracy as well. If not then the benchmarks are unfair/impractical.
Incorrect code generation by ICC/ICL has happened to me more than once! Especially math operations that are floating point intensive.
-Magg
Intel C++ tends to push the boundaries of the C++ standard rather hard. You’d be surprised at the amount of stuff that’s undefined in standard C++. Most compilers do things the way you’d expect, but ICC will often reorder things to the point that it exposes flaws in subtly broken code. Of course, my uses of ICC mostly involve integer code (I use it for its good C++ optimizations, not for its low-level ones). It might very well be less safe for floating point code.
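One reason reordering is touchier for floating point than for integer code is that floating-point addition is not associative. A small C sketch, assuming IEEE 754 double arithmetic; the function names sum_left, sum_right and reorder_demo are made up, and the volatile temporaries are only there to force each intermediate sum to be rounded to a double:

```c
#include <assert.h>

/* Adds in the order written: (x + y) + z. */
double sum_left(double x, double y, double z)
{
    volatile double t = x + y;   /* rounded to double here */
    return t + z;
}

/* Same three numbers, regrouped: x + (y + z). */
double sum_right(double x, double y, double z)
{
    volatile double t = y + z;   /* rounded to double here */
    return x + t;
}

/* With one huge and one tiny operand the grouping decides the answer. */
void reorder_demo(void)
{
    assert(sum_left(1e30, -1e30, 1.0) == 1.0);   /* (1e30 - 1e30) + 1 */
    assert(sum_right(1e30, -1e30, 1.0) == 0.0);  /* the 1 is lost in -1e30 + 1 */
}
```

So an optimizer that regroups floating-point sums (which is roughly what options in the spirit of GCC’s -ffast-math permit) can legitimately change results even when the source code is not broken, while the same regrouping of ordinary unsigned integer sums cannot.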
“Intel C++ pushes the boundaries of the C++ standard?”
What is that supposed to mean?
Compilers are supposed to comply with the C++ standards!
“ICC will often reorder things to the point that it exposes flaws in subtly broken code”
The code I use and write is not broken in any way. ICC/ICL breaks the code by making incorrect optimizations.
At times ICL/ICC can’t even compile code that is plain correct. Recently I helped a bunch of students to optimize their game code. We tried optimizing SDL using it and it failed so often that we dropped it and used GCC instead.
Try compiling SDL_gamma.c (included in the SDL 1.2.5 sources) with “all” optimizations turned on and tell me if it doesn’t fail!
“It might very well be less safe for floating point code.”
You mean to say: “ICC/ICL is not safe for floating point code.”?
-Magg
GCC can fail with “all” optimisations turned on too. What a moot point.
What’s with the “Kayaking in the Belgian Ardennes” photo at the end of the article? Was ICC used for that? 😉
The compilers might be great but they’re bug ridden – we get monthly updates that we need to apply as critical updates to our intel compilers. Never had these problems with Alpha Tru64 or SGI MIPSPro compilers . . .
They should include hand-crafted assembly to these tests also.
This proves what about the compiler exactly?
It would show if the compiler is more or less efficient than a human writing assembly code. I’ve seen GCC occasionally produce some completely unnecessary bits of assembly code. I would guess that if the compiler can produce code that runs faster than an Assembly programming expert’s code, then it can produce some pretty efficient code. In fact, I think it would be interesting to benchmark each compiler against what the human experts consider to be the most efficient hand-written assembly implementation of the code.
That’s where the guy’s gone on holiday. He’s basically rubbing it in.
It is funny how defensive people get when they can’t have their favorite app [insert favorite/used to use OSS application here] always win. They will then use reverse psychology to show us that this is just not news, that it is just normal and not worth posting in the future.
People aren’t being defensive here when they say that it’s not a surprise. This is Intel making a compiler optimized for Intel. They designed and made the friggin chips themselves and have been doing it for a while now. Why is it a surprise that the mega-corporation who made the hardware would make the best software for it?
Talk about stating the obvious.
It would show if the compiler is more or less efficient than a human writing assembly code.
I think it would be interesting to benchmark each compiler against what the human experts consider to be the most efficient hand-written assembly implementation of the code.
You’ll like this article. People have been saying compilers produce better code than humans for quite some time; however, when put to the test it is clear that this is a myth.
http://www.realworldtech.com/page.cfm?ArticleID=RWT041603000942
I know it’s outdated now but some years ago it was the best (almost). Could it become an icc and gcc killer? I would like to see a comparison of its current state against the big guys.
You’ll like this article. People have been saying compilers produce better code than humans for quite some time; however, when put to the test it is clear that this is a myth.
That isn’t quite what the test showed. The assembler case was never the fastest of the lot. The result, and the conclusion of the author I might add, is that a smarter algorithm design can improve performance more than some magical compiler optimization. However, the two often go hand in hand anyway. The conclusion was that hand-coded assembly is not always, though sometimes, faster than compiler-generated machine code.
Furthermore, while the hand coded assembly code performed admirably it runs on one platform only. The fastest algorithm of the above will probably see a similar speed improvement on MIPS, PA-RISC, SPARC, PPC and countless other platforms.
The lesson therefore is not to code in assembly to make your algorithm as fast as possible. The lesson is to come up with an efficient algorithm in the first place.
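As a small illustration of that lesson (unrelated to the code in the linked article), here is a C sketch of two ways to compute base^exp mod m. The function names pow_mod_naive and pow_mod_fast are made up, and the modulus is assumed to fit in 32 bits so the intermediate products cannot overflow 64 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Repeated multiplication: needs exp multiplications. */
uint64_t pow_mod_naive(uint64_t base, uint64_t exp, uint64_t m)
{
    assert(m > 0 && m <= UINT32_MAX);
    uint64_t result = 1 % m;
    base %= m;
    for (uint64_t i = 0; i < exp; i++)
        result = (result * base) % m;
    return result;
}

/* Square-and-multiply: needs roughly log2(exp) loop iterations. */
uint64_t pow_mod_fast(uint64_t base, uint64_t exp, uint64_t m)
{
    assert(m > 0 && m <= UINT32_MAX);
    uint64_t result = 1 % m;
    base %= m;
    while (exp > 0) {
        if (exp & 1)                     /* current bit of exp is set */
            result = (result * base) % m;
        base = (base * base) % m;        /* square for the next bit */
        exp >>= 1;
    }
    return result;
}
```

For an exponent around a billion the first version does about a billion multiplications and the second about thirty loop iterations. No amount of hand-tuned assembly applied to the first loop closes a gap like that, and the second version stays portable C.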
> The lesson therefore is not to code in assembly to make
> your algorithm as fast as possible. The lesson is to come
> up with an efficient algorithm in the first place.
Efficient algorithms are of course the backbone for coding. But nothing prevents you from coding in assembly _and_ using the most efficient overall structure. It’s quite amazing how many people believe that these two exclude each other.
Efficient algorithms are of course the backbone for coding. But nothing prevents you from coding in assembly _and_ using the most efficient overall structure. It’s quite amazing how many people believe that these two exclude each other.
I never said the two were mutually exclusive. I was highlighting the false notion that coding in assembly will automatically make your code optimal. For long term maintainability and portability however, assembly is a disaster, and you know it.
GCC can fail with “all” optimisations turned on too. What a moot point.
Again, 10:1 or 100:1 means that ICC may take too many risks or hasn’t been tested with a broad enough set of test cases, and it breaks more often than GCC.
Just because GCC may break occasionally doesn’t nullify the argument that Intel should do better testing, and not just use ICC for Marketing.
and tell the world how MUCH they want to run a Mac and have Photoshop AND a unix shell open on the same desktop. Jealous?
Nope. I don’t have photoshop and I don’t like Adobe, never did, really.
And unix shells are nice and all but OSX needs some work before it can compare to my Linux CLI environment. It may have fewer bugs and it may be able to run Photoshop, but it has the same problems I have with video transcoding and costs more, lots more.
The price/performance ratio of a 2 Ghz AMD shuttle PC to a comparable Mac is almost 5:1. Jealous?