AMD Announces 8-Core Bulldozer CPU
Guest post by fran 2010-08-24 AMD

“You can’t say that AMD is ever boring. The company says its next-generation Bulldozer CPU core will take a unique approach to computing that goes beyond Hyper-Threading, which some believe could offer phenomenal performance.”

33 Comments

2010-08-24 11:43 pm Tuishimi
I was seriously getting ready to purchase a new AM3 socket motherboard with a 1090T CPU, simply because previous posts about Bulldozer intimated that it would be an AM3 socket… so that people could keep their motherboards designed for the more recent AMD CPUs. I am a little perturbed and relieved… I am glad I did not pull the trigger. So now I’ll just be patient and wait for mobos to be delivered in anticipation of Bulldozer.

2010-08-25 12:52 am looncraz
You and me both! Guess I’ll still be able to do an inch-by-inch upgrade, though… Buying RAM and a MOBO, then the CPU later. That should make it easier to get into the Bulldozer game, should the promises be realized.
–The loon

2010-08-25 1:49 am Tuishimi
Just be careful… the new socket for it will be AM3+. I wonder how soon it will arrive. I assume the mobo manufacturers have the specs for it if they are making an announcement about it.

2010-08-25 8:53 am Kivada
From what I’ve been reading on HardOCP, it’ll be the same deal as with the Phenom and the AM2+ sockets, just that you will lose the extra memory bandwidth and some of the power optimizations, so overclocking will be inhibited. I’d say that an 8 core/4 module chip would be the most you would want on a dual channel board anyway; even with DDR3 2GHz CAS 7-9-7-24 it would be constrained by the workloads most of the readership here runs: virtual machines, protein folding and video editing. While I’m hoping that they can at least level the playing field with Intel, AMD really needs to get on top of marketing and get the ball rolling on OpenCL.
Right now AMD is the only game in town with the complete set of CPU, GPU and chipset, and thus has the option of doing some very interesting things. It’s that or we can start praying that Nvidia and VIA merge so we can have a 3rd option in the x86 market. Though it would likely take the resulting company 2 years to come out with a solid CPU and chipset combo for the mid range, and it’d take longer to actually compete at the high end.
Edited 2010-08-25 08:54 UTC

2010-08-25 3:18 pm Tuishimi
The bad news is that it won’t be compatible with existing AM3 boards. Instead, AMD says it will introduce a new AM3+ socket. These sockets will be backward compatible with older chips so you could drop a Phenom II X6 in it.

They are saying the new socket will be “backward compatible” but that Bulldozer NEEDS it to work. And I so hope this CPU series is solid. I don’t need it to burn the i7s; I’d like to see it perform better than the 930 and I will be happy. The idea of Nvidia and VIA is nice… It would be great to have some real competition going again to give Intel something to sweat about.

2010-08-26 4:22 am bnolsen
You guys overestimate the desktop market. It’s basically saturated; purchases are almost entirely replacements, not new machines. Profit in this sector is very limited… the desktop is a commodity. Nvidia taking on AMD and Intel would likely be suicide, especially with how weak VIA really is. The “new war” is in portable computing devices, a la smart phones. It’s a very different market with more room for innovation.

2010-08-25 4:43 am Brendan
Hi,
With shared floating point execution units, shared fetch and decode stages, and shared L2 cache, I’m wondering if Bulldozer could be more accurately described as “4-core with extra stuff” rather than “8-core with shared stuff”. I’m also wondering how Bulldozer compares (for both performance and power consumption) to a 4-core Nehalem with hyper-threading.
-Brendan

2010-08-25 5:39 am FishB8
This article might better answer that question for ya: http://arstechnica.com/business/news/2010/08/evolution-not-revoluti…

2010-08-25 10:53 am Stratoukos
> I’m wondering if Bulldozer could be more accurately described as “4-core with extra stuff” rather than “8-core with shared stuff”.

I don’t know a lot about hardware, but this is kinda the point of Bulldozer. Instead of adding more cores, they add parts of a core (if this makes sense). Intel’s hyperthreading is based on the same concept. Each core is still a single core, but with two sets of registers, one for each virtual core. So two threads can be loaded simultaneously, and when the first one waits for resources the core can process the second. Bulldozer is based on the same concept, only it adds more stuff for each virtual core, namely the integer units. So the parts of the two threads that can be processed by the integer units will be processed simultaneously. IMO the correct way to describe this is 4 physical cores, 8 virtual cores, just as with Intel’s hyperthreading. AMD claims that in the real world an HT core works like 1.3 cores, while a Bulldozer core works like 1.8 cores.

2010-08-25 2:44 pm uray
> …Each core is still a single core, but with two sets of registers, one for each virtual core..

Hyperthreading is not doubling the registers, it’s doubling the path or way to the core.
Edited 2010-08-25 14:45 UTC

2010-08-25 3:27 pm Stratoukos
What would be the benefit of Hyperthreading if, when the first thread stopped executing to fetch stuff from the cache/RAM, the second thread would also need to fetch stuff? In order for HT to be effective, the second thread must be ready to execute when the first one blocks. If you also consider that HT spends half its time processing the first thread and half its time processing the second, not having a separate set of registers for each thread would be very inefficient.
2010-08-27 9:49 pm phoenix
Which is why an OS scheduler needs to be hyperthreading-aware in order to make the most of the (secondary) logical core. For example, Windows XP’s scheduler has no concept of hyperthreading. It sees the two logical cores as 2 complete CPUs, and will try to use them as such, believing them to have 2 sets of registers, 2 sets of L1, 2 sets of integer units, 2 sets of FPU, etc. Which can actually slow things down.

Intel’s hyperthreading is really nothing more than a method to allow instructions from a second thread to be processed while the CPU waits for data to be fetched/written to RAM for the first thread. There’s really only 1 thread active at a time.

AMD’s method allows it to run 2 full threads at the same time, each with their own integer units and instruction caches. Then there’s the shared FPU, which means (in theory) one can schedule 3 full threads on the CPU.

There’s a great graphic in one of the articles on Bulldozer that shows this. There’s a red arrow showing thread 1 and a grey arrow showing thread 2, both going through the CPU at the same time; and the Intel version that’s really 1 arrow striped red/grey, showing how HT works.

2010-08-25 6:01 am kaiwai
I’m assuming that the Bulldozer they’re talking about in the article is a CPU for the desktop rather than for the laptop. I’d love to see what they have planned for the laptop, though, because unfortunately what I’ve seen from many laptop vendors is a reluctance to put AMD processors in their high end range, which puts me off, since I’d have to go for the cheap end of town and sacrifice features or quality because of it.

I also wonder whether the sightings of Apple employees at AMD have to do with Bulldozer, and whether they’re considering it as an alternative to Intel in the long term given the current stalemate between nVidia and Intel.
When one couples the ATI video card, which has a better power/performance ratio than the current crop of nVidia GPUs, and then adds on top of that the nVidia/Intel fiasco, I wonder what the future has in store.

2010-08-25 9:04 am Kivada
Well, the Radeon HD6 series will be out this year; at the very least maybe the Mac Pro will receive a high end card while it’s STILL high end. That, and Apple has been looking at OpenCL; they will likely be integrating it in all their future software to get whatever boosts they can to gloat over Adobe. There is also the potential that the Opteron version of these could end up in the XServe lineup, since it looks like these should scream in a lot of 3D modeling jobs.

2010-08-26 2:45 pm kaiwai
> Well the Radeon HD6 series will be out this year, at the very least maybe the Mac Pro will receive a high end card while it’s STILL high end. That and Apple has been looking at OpenCL, they will likely be integrating it in all their future software to get whatever boosts they can to gloat over Adobe. There is also the potential that the Opteron version of these could end up in the XServe lineup since it looks like these should scream in allot of 3d modeling jobs.

True; it’ll be interesting to see how things work out in the long run, especially when on one hand there is the Intel processor with the anemic GPU and on the other the AMD CPU with a robust GPU. It’s going to be a difficult thing for Apple to resist with such a tempting carrot dangling in front of them. For me it’ll be interesting to see next year how it lines up with what Microsoft has planned for Windows 8 and how that compares to Mac OS X 10.7, and what the future will be like given that the application vendors are very much dependent upon whether the underlying operating system exposes those features in easy to use APIs for developers to take advantage of.

2010-08-27 9:50 pm phoenix
That would be Bobcat, and is detailed in various places, usually one link down from the link to the Bulldozer info.
2010-08-25 2:43 pm deathshadow
Won’t be any faster than an i7 870 or 930… and on many tasks still slower than an i5 750. Just like Thuban. Sad when multi-core is becoming a gimmick just like clock speeds used to be. Always good for a laugh when the top of AMD’s product line barely delivers the performance of the BOTTOM of the i5/i7 line.
Edited 2010-08-25 14:49 UTC

2010-08-25 3:02 pm No it isnt
It’s only “good for a laugh” if you’re an Intel stock holder or an idiot fanboy. AMD provides competitive performance for the money, and for most of us, that’s the only thing that actually matters.

2010-08-25 6:26 pm FunkyELF
Couldn’t agree more. Performance per Watt and performance per Dollar bill are a lot more important.

2010-08-26 3:44 am deathshadow
> AMD provides competitive performance for the money, and for most of us, that’s the only thing that actually matters.

Given that the 1090T streets the same as an i7 870, and that the i7 870 is 38 watts less power at max and 8 watts less power at idle… and benchmarks faster in most tests… Sorry, but I think your definition of value and mine differ.

I actually used to be an AMD only guy, but since Core 2 dropped AMD has been in permanent catchup mode… pathetically so. 1090T is no match for an 870 while priced the same, 1055T is a total joke compared to the i5 750, again priced roughly the same, and on many bench scores the i5 750 beats out the 1090!!! Hell, my four year old Q6700 gives everything in the AMD lineup prior to the hexa-core a run for their money… and in some (but not all) tasks is still FASTER than Thuban! (though christ it’s power hungry)

Even the next step down: it’s sad when 99% of the X4’s and X3’s are pwned by dual core Wolfdales.
AMD is supposed to be the best buy for low end, but if I’m given a choice between a 95W 2.8ghz Propus quad and a 65W 3.2ghz Wolfdale, I’m going Intel: the same or faster on most every benchmark (admittedly on some tests like WinZip the Wolfdale beats out even its Intel quad core brethren), roughly the same street price of around $100, and lower operating costs across the board thanks to lower power use.

Drop down into the real bargain basement and it’s the same deal. I’ll stack an E3300 Wolfdale against a triple-core Phenom 8600, 2.8 Regor X2, or Toliman 8250E any day… Street of around $50 for all of the above, and even the core advantage doesn’t deliver on simple tests like Handbrake compared to the ‘bargain basement’ Wolfdale… and that’s before we talk operating cost, where all the AMD’s pre-Regor are 95W, only the Regor equalling the Intel’s 65W… and if you are actually fretting over performance per watt, AMD can’t even TOUCH the $40 1.8ghz 35W Conroe-L, the closest being the somewhat cheaper to buy but more expensive to operate $35 2.8ghz 45W Sempron 140… that despite its 900mhz faster clock still comes in SLOWER in testing than the Celery.

I keep waiting for AMD to once again take the price performance lead, and for the better part of four to five years it’s been a bunch of nothing.

2010-08-26 5:09 am looncraz
You need to account for the additional expense for the motherboard, and triple channel vs dual channel memory, and upgrades. AMD got my money this time around because it was all I could really afford to piecemeal. The cheapest i7 mobo was more than my budget prior to selling my existing parts. The cost of entry into the AMD world was a grand total of $220. Intel needed that just for the motherboard… then again for the memory… then again for the CPU.

My current setup:
AsRock A790GXH/128M
Athlon X3 720 BE, unlocked 4th core – @ 3.2 GHz
4GB OCZ Reaper DDR2-1066 ( 2×2 )
ATI Radeon HD 5770, 1GB

I can play Crysis, virtually maxed at 1280×1024.
I scale back the textures a notch, though. All other games play smooth as can be, and I don’t have any tearing or jerking. So I’m happy with all that!

I’m no fanboy – I just buy the best I can afford. AMD was – and still remains – the better buy. Almost entirely because of more affordable motherboards, and cheaper memory. That extra $100 can make or break a sale. Not to mention the cheapest builds, where the sale is even easier with the promise of simple future upgrades – my board is not fancy, but I would have no issues throwing in the 1090T and having six real cores… which I could use… for benchmarks… hehe.
–The loon

2010-08-26 3:24 pm deathshadow
Socket 1156 only takes dual channel… so i5 750 and i7 870 both don’t cost any more or less than AMD… though if you wanted to cut that corner and drop down to DDR2, that’s what socket 775 is still hanging around for.

I find your $220 number a bit dubious; let’s add up your parts (excluding video card) using NewEgg’s current prices.

4gb OCZ Reaper DDR2-1066: $91
http://www.newegg.com/Product/Product.aspx?Item=N82E16820227289

Asrock A790GXH/128M: product no longer sold, but since it’s just a dual x16 board the A875 is ‘close enough’ at $60
http://www.newegg.com/Product/Product.aspx?Item=N82E16813157180

Not familiar with the 720 BE… also not available on NE, but it’s a 3.2 x3 so the 3.1 Rana is probably close enough at $80
http://www.newegg.com/Product/Product.aspx?Item=N82E16819103872

Which adds up to $245 delivered… Ok, a few months ago when your exact parts were still for sale I guess $220 is believable… But if I had $220-240 burning a hole in my pocket, I’d have gone with:

E5400 Wolfdale – $80. Despite the lower clock speed and only being dual core, it STILL pwns “heka” and “regor” on the majority of benchmarks.
http://www.newegg.com/Product/Product.aspx?Item=N82E16819116076

ASrock P45 Deluxe – $70
http://www.newegg.com/Product/Product.aspx?Item=N82E16813157164

Which with that same RAM ends up $240, same as my above example build… Not that I’d use that RAM, since I’ve had WAY too many dead OCZ modules of late. Since it’s only 775 there’s no reason to waste 1066 on it, and A-Data DDR2/800 is rock solid reliable
http://www.newegg.com/Product/Product.aspx?Item=N82E16820211188

Coming to $235 delivered.

… and of course you have long term expense, since an E5400 is 65 watts and your heka 720 is 95 watts. So much for power to performance.

Of course, shell out $40 more for a Q8200, and you’ve just blown AMD’s entire quad core/lower product line out of the water on price/performance.

Another pathetic thing about AMD’s offerings the past four years: even their quads get their backsides kicked by Intel’s previous generation duals… and once you go quad it’s not even a contest. Really sad, since back when the Athlons first dropped I was strictly AMD, since a 1ghz Thunderbird was equal to a 1.5ghz P4, a 1.8ghz Barton was equal to a 2.5ghz P4, and a 2.4ghz A64 was equal to a 4ghz P4. But since Intel got off their asses and dropped Core 2 on us, AMD has been a shadow of its former self, losing on clocks to tasks done, on power to performance, on heat generated (they even seem to make more heat at the same wattage, which shouldn’t even be possible), and on price to performance.

2010-08-28 6:03 am looncraz
I needed core count. An E5400 would have been a downgrade. The slowest Core 2 Quads of the time were around $160, minimum, for the performance. The memory was an absolute steal, IIRC, at about $50. The MOBO was $80, CPU $129.

On the Intel side I was stuck with no future upgrade path if I were to stick with the s775, and none if I went with the newer, faster, and much more expensive i5/7 lines. I can easily get more power out of my AMD system, without doing anything crazy.
My motherboard will work with anything from a crappy old AM2 Sempron to the new Hexa-Core 1090T. That is why I really bought AMD this time.

Besides, my setup is equal to or better than a Q9550 in any given benchmark, and much better in multimedia performance, in most cases. A Q9550 at the time was around $300. Granted, I would have overclocked that and had more performance, still. The extra price and the non-existent upgrade path was an issue.

I am trying to find a way, now, to get into the six-core, but I may just wait for the Bulldozer. Core count is important to me as I have extremely parallel workloads. The three cores of the X3 720 were better than the two cores of my E7200… even though the overall improvement in performance was virtually nil without overclocking. A bit of a sideways upgrade because I needed what the platform delivered: a way to the future, cheap.

Of course, I’m not too happy about needing another 400MHz to match Intel’s Core 2 lineup, but I got that at stock voltage without so much as trying… heck, I can get 3.4 just as easy, I just don’t care. The unlocked fourth core was just gravy 🙂
–The loon

2010-08-25 6:44 pm FunkyELF
So… This being OSNews… I’m curious. So now we have a new processor. What needs to be updated for it to work? Operating systems? Compilers? Or is all of it taken care of at the chipset / BIOS level, where the processors just appear as 8 physical processors to the operating system and the compiler doesn’t care either?

2010-08-25 9:56 pm SReilly
I’m assuming the following, although an educated guess may be a better term: as it’s an SMT design, just like Intel’s Hyper Threading, the OS will need to be optimized to use the new architecture effectively. When HT was launched, Intel advised people to deactivate HT if the operating system used wasn’t optimized, as this could lead to a decrease in performance compared to a non HT enabled processor.
The question is whether Bulldozer’s design is close enough to HT for there to be no major difference from the OS’s side, so that it can use the same optimization code. Somehow I doubt it, but here’s hoping.

2010-08-26 4:40 am Brendan
Hi,
For an “optimal scheduler”, you’d need to take into account hyper-threading (performance hit when the other logical CPU in the core is busy) and AMD’s SMT (probably a much smaller performance hit when the other core in the module is busy, unless both tasks are doing floating point instructions), and “warm cache” (try to run tasks on CPUs that may still have that task’s code/data in its cache, while taking into account knowledge of which caches are shared by which CPUs), and NUMA (try very hard to run tasks on NUMA domains that are “close” to the memory the task has allocated), and power management (try to distribute the heat evenly so one CPU doesn’t get too hot while others are cool, while also trying not to take CPUs out of sleep states when it’s avoidable), and things like Turbo-boost (better to use 2 CPUs in 2 different chips than to use 2 CPUs in the same chip, to maximise the turbo-boost).

Maximising scalability within the scheduler is also a major challenge: avoiding lock contention (e.g. maybe per-CPU queues) while also providing effective load balancing (e.g. maybe not per-CPU queues). Then there’s the required functionality (CPU affinity, some sort of task priorities, etc) and extra functionality (hot-plug CPUs?).

Lastly, you’d also want to do all of this with relatively low overhead. For example, you don’t want to spend 500 us making an “optimal” decision if you need to make a decision 1000 times per second (on average).
Basically what I’m getting at is that “optimal” is extremely complex; and because of the complexity involved the schedulers in most OSs are nowhere near “optimal”; and regardless of whether or not there are special optimisations for Bulldozer they’d still be nowhere near “optimal”… 🙂
– Brendan

2010-08-28 10:06 am cerbie
I would think that if you schedule them just like hyperthreaded cores, you’d be 90% of the way to an ideal situation. Then kernel hackers can fine tune it over a few versions, to eke the most out in normal use (that extra 1-3%), and take care of corner cases where HT scheduling may be bad for it.

I would be quite surprised if it took Linux more than six months to make excellent use of them, yet have them work fine out of the gate. With MS, it will probably be a normal update, and then specific tweaks moved into a service pack.

2010-08-25 7:12 pm FunkyELF
I’ve been looking at building a Phenom II X6 machine for the last month or so. Now I’m not sure if I should do it or wait for this.

They’re claiming a Bulldozer core will be like 1.8 cores, which might be optimistic. We know that there are cases where it might perform at the full 1.8, giving you the performance of 7.2 cores, but there will be other times where it’ll perform at 1.0, giving the performance of a quad core. It looks like the situations where you don’t get much improvement would be floating point operations like photo editing and audio / video encoding. Am I right? I mean… who does integer number crunching?

I’m leaning toward getting the full 6 cores because I know it’ll be 6.0 on the low end and 6.0 on the high end. The Bulldozer cores would need to sustain an average improvement of 1.5 to equal a full 6. The bad thing is that I won’t be able to upgrade later on.

Side note: I don’t like AMD (or is it just OSnews / Gizmodo) calling it an 8-core. I searched the Ars article for the number 8 and it only showed up in 1.8 and x86, so they’re doing it right.
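The back-of-the-envelope comparison in the comment above can be written out in a few lines of Python. This is only an illustrative sketch: the function names `effective_cores` and `break_even_scaling` are made up for this example, and the scaling factors (AMD's claimed 1.8 per module, a pessimistic 1.0) are the figures quoted in this thread, not measurements.

```python
def effective_cores(modules: int, scaling: float) -> float:
    """Core-equivalents delivered by `modules` two-thread modules.

    `scaling` is how many "full cores" one module is assumed to be worth
    (1.0 = the second thread adds nothing, 2.0 = two independent cores).
    """
    return modules * scaling


def break_even_scaling(modules: int, full_cores: int) -> float:
    """Per-module scaling needed to match a chip with `full_cores` real cores."""
    return full_cores / modules


if __name__ == "__main__":
    modules = 4  # the "8-core" Bulldozer discussed in the thread

    # Hypothetical scenarios: worst case, break-even vs. a six-core, AMD's claim.
    for scaling in (1.0, 1.5, 1.8):
        print(f"scaling {scaling}: {effective_cores(modules, scaling):.1f} core-equivalents")

    print("break-even vs. 6 real cores:", break_even_scaling(modules, 6))
```

The model is deliberately crude: it ignores clock speed, per-core IPC differences between Phenom II and Bulldozer, and how well a given workload parallelizes, all of which the following replies bring up.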
2010-08-25 8:55 pm boldingd
You’ll see the higher performance multiplier with jobs that parallelize well, which are the same types of jobs that would perform well on a machine with six proper cores. I expect that any job where you wouldn’t see a large performance gain on the Bulldozer chip is a job that wouldn’t see a large performance gain from having 6 fully discrete CPUs either. My guess would be, therefore, “jobs that exploit parallelization well will run better on the Bulldozer, while jobs that don’t will run equally poorly on either processor.” Emphasis on “guess”.
Edited 2010-08-25 20:55 UTC

2010-08-28 10:10 am cerbie
The 1.8 is based on sharing resources. It should actually be more along the lines of the worst-case scenario, rather than the best case. Low IPC tasks should be able to give you just as much as full cores. Now, how well that will compare with SB’s follow-on: ???

As far as number crunching goes, why would it be worse? Each Phenom II core has a single 128-bit wide FPU, and each BD module has two of them, and either core can use however much it can get. So, in the worst case, it should perform at least as well as a Phenom II with as many cores, and have the potential to perform far better.

2010-08-25 8:53 pm fithisux
I am more interested to see GPU/CPU on the same package with different OPEN instruction sets. If the CPU is a Bulldozer, it is a welcome addition. I also would like to see offerings with 8/16/32 in-order CPUs (bigger Atoms) coupled with 8/16/32 GPU cores as a heterogeneous system. Then OpenCL/OpenGL stacks could be open source and more widespread (and VESA could improve its spec for 2D acceleration only).

2010-08-26 4:41 am looncraz
Okay, what I’m reading @ ArsTechnica and my understanding of the design don’t seem to mesh well…

ArsTechnica describes each module as little more than a single spiffed-up hyper-threading core.
Problem is that virtually all hardware existing with two unique cores is still there… except for a single floating point scheduler, fetch, and decode. The structure is more similar to the Core 2 Duo, with its shared L2 cache, than to a hyper-threading core.

Each Bulldozer module has:
2x Integer Units
2x Integer Scheduling Units (for re-ordering??)
2x 128-bit Floating Point Units, OR 1x 256-bit FPU, however you look at it
2x L1 Data Caches
2x Sets of Instruction Pipelines
1x (Pre-)Fetch Unit
1x x86 Decode Unit
1x Floating Point Scheduler
1x L2 Cache

The greatest performance advantages come from the following sources: shared L2 between two execution pipelines; flexible FPU, single 256-bit ( or dual 128-bit ); and probably the use of Turbo Core to improve single-threaded performance quite noticeably ( under-clock one, perhaps, to get a bit more out of the other integer core ).

I have to say this all sounds much more like two tightly coupled cores, rather than glorified hyper-threading.

I would expect such a beast to run just a bit faster than the Core 2 Duo, clock per clock, if all else is equal with the Phenom II – which is likely. This should mean parity with most of the i7 lineup, but with better FPU performance.

When AMD says 1.8 performance of one core, I can’t help but wonder what is causing the overhead. Certainly the fetch & decode will need to be lightning fast in order to pull double-duty, as will the FPU scheduler… perhaps AMD’s designs for these units aren’t fully up to the task of removing the overhead, and they know it… or just maybe they want to underplay the performance.

I’d say a 15% bump per clock should be obtainable without much fuss. I can’t wait until we get real hardware!
–The loon

2010-08-27 10:00 pm phoenix
> I have to say this all sounds much more like two tightly coupled cores, rather than glorified hyper-threading.

Exactly.
Where an Intel CPU with Hyperthreading actually only runs 1 thread at a time (executing instructions from the second thread while the first thread is stalled), an AMD CPU can run two threads simultaneously (possibly three with a pure-FPU thread).

> I would expect such a beast to run just a bit faster than the Core 2 Duo, clock per clock, if all else is equal with the phenom II – which is likely.

That’s the plan.

> This should mean parity with most of the i7 lineup, but with better FPU performance.

I doubt parity, but if it’s within 1 or 2 steps of the top i7, then it’s good enough. Especially considering *every* CPU includes all the same features (virtualisation, IOMMU, etc), making it a much better bargain. And a lot easier to decide which CPU to pick (number of cores and frequency are all that change).