Linked by Thom Holwerda on Mon 16th Jun 2008 21:51 UTC, submitted by irbis
AMD has seen a few serious setbacks lately, especially with their Barcelona server processor, but it seems as if the company is trying hard to get things back on track. The first step in solving an issue is acknowledging it exists in the first place, and AMD CEO Hector Ruiz did just that last December. "We blew it and we're very humbled by it and we learned from it and we're not going to do it again." Reseller Advocate Magazine asks, are you ready to believe him?
Permalink for comment 318720
RE[2]: dont underestimate AMD
by bert64 on Tue 17th Jun 2008 09:29 UTC in reply to "RE: dont underestimate AMD"
bert64
Member since:
2007-04-23

Riiiiiiiight,

Here on planet earth, under similar conditions (same ISA, system, OS, configuration and optimization flags) there is no way a wider CPU, with a better OOO scheduler, a more aggressive branch predictor, larger caches, and which is running at an almost 25% faster clock, performs worse.


It varies heavily depending on workload; different processors do different things better. For comparison I am using a Phenom 9600 (2.3GHz quad core) and a Q6600 (2.4GHz, clocked down to 2.3GHz for comparison purposes). Both have 2GB of DDR2/667 RAM, tho the Q6600 is running in dual channel mode and the Phenom is not (the order got f--ked up, it should have 4GB in dual channel).
Both are running 64bit Gentoo, compiled using gcc 4.3.1 with CFLAGS="-O2 -fomit-frame-pointer -march=X", where X is core2 or barcelona as appropriate.
All benchmarks are single threaded, and run with nice --19 so they can hog 1 core... Nothing else is running in the background aside from ssh (my login process).

John the Ripper (single threaded) DES benchmark, built with make linux-x86-64 (using SSE2 asm):
Core2: Many salts: 2197K c/s real, 2201K c/s virtual
Phenom: Many salts: 1669K c/s real, 1669K c/s virtual

Compiling the same program using make generic (gcc optimizations, no sse2 asm) yields different results, using the default john CFLAGS:
Core2: Many salts: 1061K c/s real, 1061K c/s virtual
Phenom: Many salts: 1130K c/s real, 1130K c/s virtual
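
To put the two John the Ripper runs side by side, here is a minimal Python sketch that turns the "real" c/s figures above into a percentage lead for each build. The dictionary layout and the relative_lead helper are made up for illustration; only the four numbers come from the runs above.

```python
# John the Ripper DES "many salts" results quoted above,
# in thousands of c/s (the "real" column).
results = {
    "sse2_asm": {"Core2": 2197, "Phenom": 1669},
    "generic": {"Core2": 1061, "Phenom": 1130},
}


def relative_lead(scores):
    """Return (winner, percent lead over the other CPU) for one build.

    Hypothetical helper: assumes exactly two CPUs per build.
    """
    winner = max(scores, key=scores.get)
    loser = min(scores, key=scores.get)
    lead = (scores[winner] / scores[loser] - 1.0) * 100.0
    return winner, lead


for build, scores in results.items():
    winner, lead = relative_lead(scores)
    print(f"{build}: {winner} leads by {lead:.1f}%")
```

With these figures the Core2 comes out roughly 32% ahead on the SSE2 asm build, while the Phenom is roughly 6.5% ahead on the generic gcc build.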

Running the synthetic flops benchmark (http://www.firenzee.com/flops.c), compiled with -O3 -fomit-frame-pointer -march=core2/barcelona:

AMD:
Module   Error         RunTime (usec)   MFLOPS
1         4.0146e-13   0.0079           1769.8765
2        -1.4166e-13   0.0074            945.6464
3         4.7184e-14   0.0097           1751.3078
4        -1.2557e-13   0.0103           1457.3055
5        -1.3800e-13   0.0283           1025.4144
6         3.2380e-13   0.0176           1644.2968
7        -8.4583e-11   0.0254            471.5272
8         3.4867e-13   0.0274           1094.7969

INTEL:
Module   Error         RunTime (usec)   MFLOPS
1         4.0146e-13   0.0135           1038.9985
2        -1.4166e-13   0.0122            575.1838
3         4.7184e-14   0.0091           1868.4960
4        -1.2557e-13   0.0071           2118.3583
5        -1.3800e-13   0.0282           1026.7758
6         3.2380e-13   0.0138           2095.7179
7        -8.4583e-11   0.0407            294.7069
8         3.4867e-13   0.0283           1061.8881

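So both processors are faster in some tests: the Phenom wins modules 1, 2 and 7 clearly and module 8 narrowly, the Core2 wins modules 3, 4 and 6, and module 5 is a dead heat.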
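
For anyone who wants the per-module picture from the flops output, here is a rough Python sketch that computes the AMD/Intel MFLOPS ratio for each module. The MFLOPS values are copied from the two tables above; the compare helper and the 1% tie margin are my own assumptions, not part of flops.c.

```python
# MFLOPS per module (1..8), copied from the flops output above.
amd = [1769.8765, 945.6464, 1751.3078, 1457.3055,
       1025.4144, 1644.2968, 471.5272, 1094.7969]
intel = [1038.9985, 575.1838, 1868.4960, 2118.3583,
         1026.7758, 2095.7179, 294.7069, 1061.8881]

# Assumption: differences under 1% are treated as a tie.
TIE_MARGIN = 0.01


def compare(amd_mflops, intel_mflops, margin=TIE_MARGIN):
    """Return a list of (module, AMD/Intel ratio, verdict) tuples."""
    rows = []
    for module, (a, i) in enumerate(zip(amd_mflops, intel_mflops), start=1):
        ratio = a / i
        if abs(ratio - 1.0) < margin:
            verdict = "tie"
        elif ratio > 1.0:
            verdict = "AMD"
        else:
            verdict = "Intel"
        rows.append((module, ratio, verdict))
    return rows


for module, ratio, verdict in compare(amd, intel):
    print(f"module {module}: AMD/Intel = {ratio:.2f} ({verdict})")
```

A ratio above 1 means the Phenom posted more MFLOPS in that module, below 1 means the Core2 did. Changing the tie margin only affects how the close calls (modules 5 and 8) are labelled.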

OpenSSL benchmarks are similarly mixed: Intel seems to be faster at things like RC4, while AMD is faster at AES, tho the results are a bit too big to paste...

See:
http://www.firenzee.com/openssl-intel
http://www.firenzee.com/openssl-amd

I will re-run these tests when the dual channel memory for the AMD system turns up, but I doubt it will make much difference considering the nature of these benchmarks...

Of course these benchmarks only give a rough, unscientific idea... Core2 may be faster at SSE2 code, or the asm written for john might simply be optimized for Core2... Similarly with gcc, it could simply be that the architecture specific optimizations are better for Phenom.

My experience has been that AMD systems generally seem quicker under load, and multiprocessor AMD systems seem to outperform their Intel counterparts. The AMD system also seemed to compile the system packages faster than the Core2 system.

Of course, even if AMD CPUs aren't up to scratch this time round, it was Intel's turn not so long ago with the P4, which was ridiculously inadequate. We need healthy competition between these two companies, or we'd be stuck with overheating single core P4 systems and proprietary RDRAM.

Score: 2