The race is on to produce four-core processors for PCs. Intel, which is readying a bevy of dual-core chips for release in systems in the next month, is already plotting a move to quad cores, which some reports have said could come as soon as early 2007. AMD has already discussed a plan to begin offering a family of four-core chips in 2007, whereas Intel has only hinted about a four-core server chip thus far.
LOL. Always Intel trying to push only the high end, while AMD seems hell bent on getting the good stuff to ordinary people.
Go AMD.
Um, where do you think the 4-core AMD chip is going to be introduced first? The high-end server market, of course. Since the Opteron, all of AMD’s cool stuff appears in the server realm first, and only later trickles down to us regular folk.
Note of course that Sun are shipping 8-core SPARC CPUs now. (And each of those 8 cores supports 4 threads)
“Note of course that Sun are shipping 8-core SPARC CPUs now. (And each of those 8 cores supports 4 threads)”
Nice, yes, but two things:
1) it’s not really consumer oriented, and its pricing reflects it.
2) “And, instead of the much-criticised one floating point unit per processor, the Niagara II will feature one floating point unit per core.”
From: http://www.osnews.com/comment.php?news_id=12909
1 FP unit in an 8-core processor hardly sounds impressive.
1) it’s not really consumer oriented, and its pricing reflects it.
I missed the bit where it said they had to be consumer oriented. If you look at the pricing you will see that it’s very competitive, starting from $3000.
1 FP unit in an 8-core processor hardly sounds impressive.
Well, it really depends on what you find impressive. It’s only an issue if you run FP code. You’d be surprised how little application code on a server needs an FP unit.
Except each of these AMD cores will be easily twice as fast as each of those SPARC cores.
Except each of these AMD cores will be easily twice as fast as each of those SPARC cores.
They’ll completely suck at some stuff, there’s no doubt about that, but that’s not what they’re designed for.
Niagara is possible for Sun because a lot of their systems run apps which are easily multithreaded and don’t require super-low latency. They also don’t benefit from complex multi-issue cores, so the distinct lack of issue width (1 instruction per cycle per core) is in these cases an advantage. On those sorts of apps (Java app server stuff, databases, etc.) Niagara is an absolute killer, outgunning a 4-way Xeon system at under 1/4 of the power consumption.
If you’re in the market for a machine to run a business Java app, or you already run one and want to save on your power bill, you’d be a fool not to look at a Niagara machine.
They’ll suck at workstation stuff and scientific computing stuff as well. Heck, even for integer stuff they’ll suck; it’s just that the suckage will be mitigated by the sheer number of cores.
That said, my statement about the speed is based simply on a theoretical comparison of the performance. Niagara comes out at 1.2GHz. If the quad-core K8 comes out at the speed of the current dual-core Opteron (2.4GHz), each core will be at least twice as fast as a Niagara core. The only case in which Niagara would win would be a completely memory-bandwidth-limited task.
If the quad-core K8 comes out at the speed of the current dual-core Opteron (2.4GHz), each core will be at least twice as fast as a Niagara core.
In theory yes, but seemingly not given the benchmarks shown so far (Java appserver stuff, TPC-H). All the systems it’s been compared against (Xeons, Itaniums, Opterons) *should* be faster, but the Niagara is still winning; it’s even winning some against POWER5 systems, which is pretty startling.
A lot of server code like this has very low IPC figures (below 1), so OOO cores are no help; a high clock doesn’t seem to help much either, as it would appear the cache runs out and the CPUs just run from RAM. The thread-switching overhead may also be more of a problem for some processors, whereas this can be hidden on Niagara.
I think the devil is in the details: the advantage of multicore designs is really in the shared memory topologies and the efficiency of core-to-core task coordination built into the hardware. This may be the Niagara advantage as well, along with other design features that are more efficient and more easily achievable in a single-die design.
The previous comment about SMP is well taken and was the point of my previous post. For general-purpose use, an SMP design is usually more practical and efficient than a similar one with a faster single CPU. To some extent this is also because of the effectively larger memory bandwidth with multiple CPUs in most designs. The multicore CPUs take this concept several steps further.
Note of course that Sun are shipping 8-core SPARC CPUs now.
Apples and oranges. Intel’s and AMD’s CPUs have bigger, out-of-order superscalar cores that deliver much higher single-thread performance than the Niagara’s back-to-basics in-order cores.
(And each of those 8 cores supports 4 threads)
Intel has its 2-way “hyperthreading”. They could and should extend that to more threads per core in order to better compete with throughput-optimized designs like the Niagara. AMD could do with it as well.
Every now and again Sun pulls a rabbit out of their freaking huge R&D hat. It was the super cool (in temperature and size) Galaxy units and then the multi-core Niagara. The thing to watch is the trend with Sun. They did this sort of thing way back, a decade ago, with the 64-bit UltraSPARC processor, when they were the only people around running 64-bit. It took them very little time to kick out the UltraSPARC II, which really dominated internet-based services for the next five years. Now they introduce the Niagara at 1.2 GHz to start with. You need to fully expect that there are 32-way Opteron machines in the pipeline, as well as faster Niagara-like processors complete with multiple FPUs and other cool features, like a fully optical bus.
Just look to the past to see the trend in a company like Sun. It is best to think of their past trends before talking about the lack of features in some new product. Also keep in mind that HP has no real R&D at all; they are stuck waiting for Intel to do something brilliant. When is the last time HP invented anything worth looking at twice? Not since they had cool RPN calculators.
I think it is time to bury Moore’s law.
Processor speed hasn’t progressed in 2 years.
And processing power hasn’t gone up significantly either.
Manufacturers are trying a lot, especially multi-core, but let’s face it, no biggies in the last 2 years…
Moore’s law is about the number of transistors on a single chip. Period. Of course, over the years, Moore’s law was applied to processing power, which it followed pretty closely… until recently, so it’s not EXACTLY false. Otherwise, you’re right, Wondercool : it seems to me that AMD and Intel are fresh out of ideas to increase the processing power of their chips.
There is some confusion whether Moore’s “law” is about transistor counts or processing power or both. It’s certainly not about clock rates.
Transistor counts certainly continue to go up, but they’re now being spent on multi-cores rather than just more complex pipelines or bigger caches. It’s just that multi-cores require more effort on the software side to exploit the resulting increase in processing power.
Besides, clock rates went up unusually fast after they broke through 1GHz and before they got stuck at the current level. They used to be limited by transistor switching times, whereas now they’re limited by power consumption. Intel could probably deliver a 6GHz Pentium 4 right now, but unfortunately the market isn’t too keen on expensive liquid nitrogen cooling.
That is true, I’m SURE the government has like 10 GHz comps… they’d need them anyway. I’m sure in the future we’re not gonna have to buy new CPUs. We’ll just download a firmware of sorts and flash it for more speed.
Wouldn’t that be cool? 😛
I do find it impressive, however, how a 2.8 GHz FX-57 is equal to or usually better than the Intel 3.8 EE (benchmarking-wise). As keenly pointed out, speed doesn’t matter anymore. That’s why I use an FX-57 😉
The only thing I’m disappointed in is the next-gen CPUs from AMD. Apparently the FX-60, which will be 2 FX-57 cores, will be clocked slower (2.6 I think). I saw yesterday that someone compared the FX-57 and a pre-release FX-60, and the FX-57 got higher scores by a little. Maybe today’s software is still not ready for multi-core procs? I think so 🙁
Correct me if I’m wrong please.
(edit- spelling errors )
–ZaNkY
Why are you so upset that the FX-60 is clocked slower?
CLOCK != PERFORMANCE !!!
If it’s faster, it’s faster, regardless of clock, and AMD calling it the FX-60 leads me to believe the FX-59 will not beat it.
Actually, AMD decided to skip 59 and go directly to 60. AMD is not making any more single-core FXs, apparently. Just dual cores, if not more. What I’m saying is that the performance of the 60 did not live up to the 57. I’ll look up the article and post it.
–ZaNkY
The government would be better off with ten thousand 1GHz (low-power and very cool) processors than any red-hot 10GHz ones. After all, the government just needs to scan all information constantly about every individual to keep up with everyone’s activities and relationships; otherwise they could never maintain an effective police state to protect us all from the enemies of democracy.
That depends: if they need to do independent calculations that don’t influence each other most of the time and that only rarely need to exchange small amounts of data, then grid computing is the solution.
If they need to do tasks that can be easily parallelized but need heavy and constant data exchange, they will need a supercomputer that features not only many CPUs but also very fast access to shared memory.
If they need to do non-parallelizable tasks (each step determines the next), they will really need “simply” a super CPU with a high clock and high efficiency per cycle, and maybe a fast bus to a useful amount of local memory if the task is also memory hungry.
What we can hope for is that the cost to deploy new processors goes down. There seriously needs to be more competition in the processor market.
It seems the big players at this point are Intel, AMD and IBM, with IBM currently being the most innovative of the three. At this time, more than any other before, we need to be able to look at new techniques to do things smarter, which likely DON’T include archaic instruction sets like x86 & likely also PowerPC.
Each chip architecture leapfrogs its competitors for a while. Multi-core could be a big deal for Sun. The SPARC architecture had a lot of challenges keeping up with competitors; it took a lot of work to get the UltraSPARC out the door, and the Ultra only put Sun on top for a little while, and they have been falling farther and farther behind as each new generation of chip has been too little, too late. One advantage Sun has is that a SPARC core can be much smaller than its competitors’, so they should be able to cram more of them onto their chips. I’ve heard discussions of putting 40 SPARC cores on a chip. Still, I’d put my money on IBM’s POWER architecture as the best technology.
2005 has been a terrible year for Intel. Hopefully, they are able to bring a much better lineup in 2006 to compete against AMD. Otherwise, it will take forever for dual-core and quad-core processors to reach prices ($100-$400) consumers will actually be willing to spend.
You know, it would be great if there was software that even took advantage of 2 cores let alone 4-8. (Besides Windows) This is getting ridiculous until software devs catch up IMHO.
To a certain extent, BeOS/Haiku/Zeta do make better use of multiple cores without extra effort from the developer when running GUI applications. However, that’s a qualified “without extra effort from the developer”, because writing a BeOS GUI application with a visible window absolutely forces the application to have a minimum of 2 threads to manage: one for the application object, and one for each window. (In BeOS, a window contains one or more views, which are not treated identically to windows as they are in Windows; each view only works in the message loop of the window it belongs to/renders on.) In other words, while there’s no extra effort to utilize SMP in BeOS, the developer expends exactly the same amount of effort as when developing for a single processor, because there’s no choice but to have at least 2 threads for a GUI application and to deal with the associated thread management. This does not apply to command line/console applications, though.
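For the curious, here’s a minimal sketch of what that looks like (assuming the classic BeOS R5 C++ API; the signature string and window geometry are made-up placeholders):

#include <Application.h>
#include <Window.h>

int main()
{
    // Constructing BApplication creates the application object;
    // Run() turns the calling (main) thread into its message loop.
    BApplication app("application/x-vnd.example-hello");

    // Each BWindow gets its own message-loop thread, started when the
    // window is first shown -- so even a bare one-window app is a
    // two-thread program whether the developer wants that or not.
    BWindow* win = new BWindow(BRect(100, 100, 400, 300), "Hello",
                               B_TITLED_WINDOW, B_QUIT_ON_WINDOW_CLOSE);
    win->Show();

    app.Run();
    return 0;
}

All drawing and input handling for the window happens in that second thread, which is why the scheduler can spread even a naive GUI app across two cores.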
While BeOS makes it the default to utilize more than one thread (and, as a result, more than one processor), it can’t do anything to make it easier to write software in an SMP manner when the problem isn’t naturally suited for that in the first place. So unless you have enough separate processing threads active at the same time in the same application, or enough other processes with threads to keep the processor cores active, it really doesn’t matter how many cores you have: most will be quite idle. This will be true of any OS.
In addition to all of that, no matter how many cores can be kept busy, the ultimate bottleneck for throughput is a function of how much the CPU L1/L2 cache is thrashed, combined with the latency and throughput of the memory FSB. Once you reach those limits, no number of additional cores will improve performance, and you’ll be better off with fewer cores consuming less power, because even idle cores consume a certain amount of power and thus generate waste heat. As effective as modern CPUs can be for heating a room (I keep my computer room/bedroom heated with my old dual P3-450 system by itself), they aren’t efficient heaters in terms of BTU/kWh…
Jonathan Thompson
Only HPC and very high-end servers will get any use out of this.
We need higher memory and I/O speeds.
DDR2 will help; with AMD’s memory controller, it’s going to be very, very nice.
Now with PCIe we can bypass all the old southbridge slow speeds. And let’s just hope that they can tie SATA or any other serial I/O into PCIe.
We are just now getting drivers that can do multi-core; 4x is pushing it. There are only a few games/apps out there that can do multi-core right, or at all.
That, and the programmers out there really need to get with the times and start using all that CPU/video power.
You’d actually be surprised how little bandwidth the Opteron architecture can get away with. In most tests, for example, a dual-core Opteron is just as fast as a dual single-core Opteron, even though the former has only half the bandwidth. The Opteron is a much more latency-sensitive design, so DDR2 isn’t going to help there, at least not until the DDR2-800+ designs start becoming more common.
And SATA has been on dedicated buses for quite a while now.
The T1 can have four thread contexts per core, but it only executes one thread per core concurrently. Intel’s SMT implementation actually schedules multiple threads across the functional units of their superscalar processors concurrently. The T1’s memory controller is more interesting than stuffing a lot of incredibly simple cores into one processor. The actual utilization of the T1’s design is a pretty limited-breadth ordeal compared to the AMD/Intel attempts to maximize ILP and improve TLP.
Browser: Links (2.1pre11; Linux 2.6.10-gentoo-r6 i686; 80×40)
“Browser: Links (2.1pre11; Linux 2.6.10-gentoo-r6 i686; 80×40)”
Wow, you are f–king cool.
This is just my opinion, but: shouldn’t compiler developers figure out a way to convert sequential code into multithreaded code without the average programmer having to worry about that?
edit:fixed typos
The problem is, it’s not really possible with current technology. There is a limited ability for the compiler to find parallelism within a sequential program, but it’s hard to get good technology that can separate out entire threads of execution. Stuff like OpenMP directives helps, but those are really just the programmer specifying threading in a different way.
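For example, here’s roughly what the OpenMP case looks like (a C++ sketch; built with something like g++ -fopenmp). Note the pragma is the programmer asserting that the iterations are independent; the compiler only does the mechanical thread management:

#include <vector>
#include <cstdio>

int main()
{
    const int n = 1 << 20;
    std::vector<float> x(n, 1.0f), y(n, 2.0f);
    const float a = 3.0f;

    // A constant-count loop with independent iterations: the one case
    // that parallelises mechanically once the programmer promises it.
    #pragma omp parallel for
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];

    std::printf("y[0] = %f\n", y[0]);
    return 0;
}

Take the pragma away and the compiler is back to proving independence on its own, which it can only manage for the simplest loops.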
Agreed.
In fact, I think the future of parallel programming is not with compilers that convert sequential code into parallel code, but with easier-to-use languages for parallel programs.
While there has been a lot of work done in this area before, I believe that we are going to now see even more work done since desktop programs will HAVE to be multithreaded.
We will probably need to make it `easy’ for the `average programmer’ to create programs that can scale across a large number of cores as time progresses. Fortunately, we have some time yet.
Shouldn’t compiler developers figure out a way to convert sequential code into multithreaded code without the average programmer having to worry about that?
That would be nice, but it’s not gonna happen any time soon.
The problem is that parallelising is a creative task, i.e. except for certain basic things like constant-count “for” loops you need a clever idea or two to restructure your sequential program (and its data) and turn it into a faster parallel one.
But so far computers are not terribly good at having ideas. And with parallelisation the problem space is too big for brute force. Something like genetic algorithms or neural networks might eventually provide a solution.
Why put more cores into the chip?? You are really limited by the off-chip bandwidth and latencies. Now all those cores will compete for the system bus.
We still have these problems:
1) memory gap – latencies, latencies, latencies
2) lack of parallelism in many programs
3) disk drive speeds are still orders of magnitude slower. HDDs are very important because that’s where all your data lives, and it’s nonvolatile.
We still have these problems:
1) memory gap – latencies, latencies, latencies
2) lack of parallelism in many programs
3) disk drive speeds are still orders of magnitude slower. HDDs are very important because that’s where all your data lives, and it’s nonvolatile.
1) Overcome this by having a dedicated memory controller per core. In that way each core will have its own path to the memory (just like existing MP Opteron systems do).
2) So? Most programs that need more crunching power than existing CPUs provide are easily ported to multi-core. Think about it: fluid simulation, raytracing, particle simulation, database servers, web servers, etc. All can use as many cores as you can throw at them.
3) Use more memory, or a faster HD array. I was just seeing that some drives are up to 76MB/sec. If you really need that much speed, make an array of 10-15 of these for a performance of close to 1GB/sec. Even editing film-resolution video rarely needs that much bandwidth.
Personally, I welcome our new multi-core CPUs.
Not only that, but Windows will balance the load across the processors per application.
So you literally can burn a DVD while playing an intensive video game (the DVD burning process will be on core 1, the game on core 2).
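You can even force that placement yourself rather than leaving it to the scheduler. A minimal Win32 sketch (this is roughly what Task Manager’s “Set Affinity” does under the hood; normally you’d just let Windows balance things):

#include <windows.h>
#include <cstdio>

int main()
{
    // The affinity mask has one bit per logical CPU:
    // bit 0 = core 0, bit 1 = core 1, and so on.
    // Here we restrict the current process to core 1 only.
    if (!SetProcessAffinityMask(GetCurrentProcess(), DWORD_PTR(1) << 1))
        std::fprintf(stderr, "SetProcessAffinityMask failed: %lu\n",
                     GetLastError());

    // ... the DVD-burning (or gaming) workload would run here ...
    return 0;
}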
Why put more cores into the chip??
They’ve no other choice. Chips used to decrease power consumption by a large amount each generation; due to physical limits, that no longer happens to the same degree.
There’s no longer any other real way of boosting performance with current designs. Boosting the clock on a complex chip makes it ever hotter and they’ve hit the limit of what can be reasonably cooled.
So, you can either add more cores at roughly the same clock rate, or simplify the cores and add even more of them. Pushing the clock rate can still be done, but it’s very difficult; only IBM is trying this, and they’ve had to simplify the architecture in order to do it.
We still have these problems:
1) memory gap – latencies, latencies, latencies
It’s worse than that, latency is increasing.
2) lack of parallelism in many programs
…and the number of people who know how to parallelise is very limited.
3) disk drive speeds are still orders of magnitude slower. HDDs are very important because that’s where all your data lives, and it’s nonvolatile.
There is hope there: Flash is catching up fast in $ per byte, and other types of RAM (FRAM, MRAM) are on the way.
The biggest problem is one nobody is talking about: Amdahl’s law.
Doubling your cores does *not* double your power.
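For reference, the usual statement of the law, where p is the fraction of the work that can be parallelised and n is the number of cores:

speedup(n) = 1 / ((1 - p) + p / n)

Even with p = 0.9, eight cores give 1 / (0.1 + 0.9/8) ≈ 4.7x, not 8x, and no number of cores will ever get you past 1/(1-p) = 10x.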
I can see dual quad-core machines being used for heavy video and graphics work. I can see people running a couple of distributed computing projects on 2 of the 8 cores while video rendering goes on on, say, 4 of the cores, with the remaining 2 used to keep the OS running and someone gaming at the same time… technically this is possible with absolutely no lag, right, if the underlying OS is running a highly capable scheduler… and the software being used is multithreaded. Problem is, from my limited experience Linux supports this stuff the best, not Windows, and most people use Windows because that’s where the games and commercial non-linear editors are. We do not know yet what Vista will provide for us. That is a big problem.
Another problem I see, as other learned members here have already stated, is that with all these cores, how are they going to keep feeding the cores with data? HyperTransport 3.0? Or even 4.0? Or maybe each core having its own bus to access data? A more complex memory controller? Very large caches, I am assuming… like at least 2MB for each core. So is this where DDR2 is going to come into its own at 900+ speeds?
And then add to that the list of problems that the anon dude has posted, and I am torn between “wow, this is awesome! 4 cores!” and “wow, this is stupid! 4 cores!”
This is great, because the more they push toward 4 cores, the faster the dual-core chips will drop in price. Sure, there is little to no software that currently supports 64-bit addressing, let alone dual-core chipsets, but that’s irrelevant, because at some point the ISVs will find a use for all these cores, and by that point the chips will be in the hands of the little guy. 4-core units will be targeted at the upper end of the spectrum, but when you combine broadband networks with dual-core chipsets and hardware-based acceleration for graphics and HD content, what you end up with is a computing experience where video conferencing while writing software and playing games all at the same time becomes a reality. God bless the never-ending technology turnover wars, because in a year’s time, imagine the type of PC you’ll be able to build for $1000. Thank you AMD for kicking the crap out of Intel and making the computing world a better place for us all! 😉
Why not skip the really old things in the x86 architecture: get rid of real mode and use protected mode from the beginning, but with BIOS calls directly to hardware. And skip the A20 hacks. Then, while we are at it, how about making things as they should have been, like using MMX instructions and floating point at the same time with no collisions with the floating-point registers, etc. etc.
And for the love of something, skip DRM and all that shit; those of us who know about it don’t want it. It is like the companies are so afraid that they don’t even trust the users without having control of everything the user does, thus not leaving the freedom to the end user as it should be.
Intel should have skipped the old stuff from the 286 and below when they did the 386. Why oh why didn’t they do that?
I wonder if these multi-core procs will support the VT extensions (Vanderpool and Pacifica)?
The next-generation Pentium 4 (Cedar Mill) is supposed to have the VT extensions, so I should presume that any multi-core Pentium X would also have them. Pacifica is supposed to come out sometime next year and, like most of AMD’s other extensions, I would imagine that it will be introduced across all the lines (like SSE3 was put on the Athlon 64s with the Venice/San Diego cores and on the X2 line at the same time). http://en.wikipedia.org/wiki/Virtualization_Technology has some useful information on the subject.
>it will take forever for dual-core and quad-core processors to reach prices ($100-$400) consumers will actually be willing to spend.<
Check out http://www.pricewatch.com, I see numerous Athlon X2 3800’s and a few X2 4200’s for under 400 bucks (shipping included).
on a side note….. there are even a few X2 3800 mobo-cpu combos for under 400 bucks (shipping included)
I managed to pick up an X2 3800+ processor and an nForce4 motherboard for $279 at Fry’s on Black Friday.
Tell me THAT isn’t a steal!
I was looking at a relatively low-end prebuilt computer the other day (can’t remember the URL right offhand); it had a dual-core processor and, if I remember correctly, it was between $800 and $900. (I consider that to be relatively low-end, since I’m used to $2000-$4000 computers.)
While that isn’t really REALLY cheap, it’s still awfully good for a prebuilt computer.
You really want performance?
32MB or 64MB of this: http://www.xbitlabs.com/news/memory/display/20051207213422.html
As an on-chip L3. Intel would $h1t their pants.
They need to bring back L3 for DDR2’s lag.
All of this talk about application support for dual-core is nonsense.
$ ps aux | wc -l
115
You only need 2 active threads/processes to see the benefits of dual-core CPUs.
Most of those processes are asleep. A better figure is:
$ uptime
0:05 up 11:56, 2 users, load averages: 0.19 0.08 0.03
Note the loadavg of less than 1.0.
I agree that most of the processes are asleep. The point I am trying to make is that if you are running one resource hungry process, there are still a LOT of other processes running on the system and each one of them requires CPU time at some point or another.
> I agree that most of the processes are asleep. The
> point I am trying to make is that if you are running
> one resource hungry process, there are still a LOT of
> other processes running on the system and each one of
> them requires CPU time at some point or another.
Yes, at some point or another. Let’s say you have one resource-hungry process that keeps one CPU busy, and a bunch of little processes that keep the second CPU busy 1/10 of the time. If the big process properly supported multiprocessing, both CPUs would be busy all the time (minus communication overhead, which approaches zero if the multiprocessing support is “done right”).
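To put rough numbers on that (using the 1/10 figure above): before, the two CPUs are doing 1.0 + 0.1 = 1.1 CPUs’ worth of work, i.e. 55% utilisation; after, close to 2.0, i.e. nearly 100%. The big job goes from 1.0 to about 1.9 CPUs’ worth of compute, so roughly a 1.9x speedup, again minus that overhead.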
– Morin
I agree, and will push the point further. Obviously, the era of routinely coded, intrinsically multithreaded applications has not yet arrived. Yet even with conventional applications coded for single-CPU environments, there are very clear performance improvements and advantages to a garden-variety SMP environment, which extends also to the newer multicore CPUs.
I’ve been running this sort of thing on my desktop for years, long before it was routinely fashionable. I’ve found that a machine running, say, two SMP 1GHz CPUs is at least as subjectively fast as an otherwise identical system running a single 2GHz CPU of the same family and generation. Further, I’ve usually spent no more, and often less, for the SMP system using this strategy.
For most desktop applications, the feel of the system speed is based on the latency at the mouse-keyboard-screen in response to a user action. Once x86 CPUs passed the 700MHz point, latency became a fairly insignificant issue for most desktop users. Most users see the latency increase, and the system feel sluggish, when several applications are competing for CPU cycles concurrently. With an SMP environment, one application can continue to feel responsive on one CPU while another compute-intensive application fully consumes the resources of the second CPU in the background (e.g. rasterization of a vector graphic file for printing). One CPU, even at twice the clock speed, would feel substantially slower.
For example, my main desktop system is an old Sun Ultra 60 with two 450 MHz 64-bit SPARC CPUs, 1.5 GB of memory, and dual video cards on the dedicated UPA video bus in a dual-head config. It is an aging machine by any definition, but it does not feel slow and remains very useful as a desktop. Aside from the SMP advantages, the video and memory buses are fast and wide. The CPU speed ultimately determines the absolute system throughput, which by today’s standards is very modest, but for routine desktop tasks this is not apparent to the user.
Most of the “speed” issues were previously with realtime graphical (e.g. gaming) applications that were often more bound by memory bandwidth and the speed of the graphics subsystem than by the system CPU. Now, with very high memory and graphics bus bandwidth and very fast graphics subsystems, the CPU is usually the determinant of system speed for these applications. Until the gaming programmers and others who use system resources in “Von Neumann” single-thread style start developing for multithreaded, multi-CPU systems, systems with single CPUs at maximum clock speed are going to provide them the best performance.
Wouldn’t it make more sense to push multi-processor machines instead of putting multiple “cores” into one CPU? Or am I totally mistaken here?