ARM Ltd. will unveil a unique multi-processor core technology, capable of running Linux SMP, at this week’s Embedded Processor Forum in San Jose, Calif. The “synthesizable multiprocessor” core — a first for ARM — is the result of a partnership with NEC Electronics announced last October, and is based on ARM’s ARMv6 architecture. ARM’s new “MPCore” multiprocessor core can be configured to contain between one and four processors delivering up to 2600 Dhrystone MIPS of aggregate performance, based on clock rates between 335 and 550 MHz.
If they made a desktop ATX board for the thing I’d be there. 🙂 Could be my new desktop…hah, cheap/lower power quad SMP? Very very cool!
Yeah, if the industry transitions to Linux, why not ditch x86 as well, for low-power, high-performance RISC?
Why the eagerness to ditch x86? You wanna trade fast, cheap, commodity CPUs for what, low-volume, relatively expensive PowerPC chips?
Yes — power efficiency has always been high when using ARM or PowerPC cores.
Since when is RISC more powerful?
Typically RISC requires several instructions to do a task for which CISC uses only one.
x86 is a pain in the ass though.
Yeah, if the industry transitions to Linux, why not ditch x86 as well, for low-power, high-performance RISC?
I could see it in small web kiosks, but then again you could always use a Pentium M in this scenario.
This core is a direct blow to Symbian. The GPL requirements on the Linux source code obliged NEC to deliver two CPUs, because running everything on one CPU would have meant publishing a lot of source code. Now they can run µITRON on one CPU and Linux on the other.
The thing is that one CISC instruction can take several cycles to complete. RISC tries to minimize the cycles needed per instruction and create a small instruction set where every instruction executes as fast as possible. This results in low power consumption.
If I remember correctly, x86 processors today are RISC at the core, but there is a huge translator unit that translates x86 instructions into smaller pieces for the processor to process. This is not very efficient.
Both CISC and RISC have their advantages, but I prefer RISC.
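To make the “many instructions vs. one” point concrete, here is a hypothetical illustration in C (mine, not from the article; actual code generation depends on the compiler):

    /* Incrementing a counter in memory.  A CISC ISA like x86 can express
       this as a single read-modify-write instruction (roughly
       "add dword [counter], 1"), while a load/store RISC like ARM needs
       three simpler steps: load the value into a register, add 1, and
       store it back.  Each of those simple steps is easy to pipeline. */
    void bump_counter(int *counter)
    {
        *counter += 1;
    }

The C source is identical either way; the difference is entirely in how much work the ISA lets a single instruction do.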
Yes, I would prefer slightly more expensive chips that have an infinitely brighter future. They don’t cost *that* much, and they (ARM, PPC, etc) have growing opportunities to gain demand, which would raise volume, lowering price. If x86, PPC, ARM, Itanium, SPARC, Alpha and 68k all had the same volume / demand, which do you think would be cheaper? Amazingly, it’d be the chips that are more lucrative in the long run.
I’d love to put four of these on a single board. 16 cores, and only 8 watts for CPU power!!! If the volume was there, this would be absolutely, 100% possible. Kinda makes you wish you’da dumped x86 before computers quite became the commodity they are now, huh? But oh well, we chose our architecture, for better or worse.
(Not that the OSNews community sat down in the late 80’s / early 90’s and said, “Ya know, x86 is the way to go”)
ARM’s new “MPCore” multiprocessor core can be configured to contain between one and four processors delivering up to 2600 Dhrystone MIPS of aggregate performance, based on clock rates between 335 and 550 MHz.
What can we compare the 2600 Dhrystone MIPS value to in the x86 world?
“The thing is that one CISC instruction can take several cycles to complete. RISC tries to minimize the cycles needed per instruction and create a small instruction set where every instruction executes as fast as possible. This results in low power consumption.”
It is sooooooo much more complicated than that…
x86 won not despite its faults but because of its merits! As Linus has already pointed out:
“Clever architecture is something that has trapped others in the past. The Alpha processor team spent years learning that many of the architecturally correct ideals they had held needed to be thrown out when it came to the real world. According to Torvalds, “And all the RISC stuff that tried to avoid it was just a BIG WASTE OF TIME. Because the _only_ thing the RISC approach ended up showing was that eventually you have to do the hard stuff anyway, so you might as well design for doing it in the first place.””
RISC may be better from a theoretical point of view, but someone has to do the dirty work somewhere.
“Linus said it, so it must be true!” Yeah, let’s try thinking for ourselves here. Facts help with that, and are much more convincing than generalized arguments based seemingly on nothing.
“Because the _only_ thing the RISC approach ended up showing was that eventually you have to do the hard stuff anyway, so you might as well design for doing it in the first place.”
I don’t see how that was proven at all. Obviously, given the success of the PPC architecture, nothing of the sort was proven. Is IBM planning on implementing a CISC-style processor as their next-generation PPC?
Was Linus perhaps referring to the “Altivec” unit, which to my understanding is somewhat CISC-like?
Considering ARM is an open architecture, where the only barrier to entry is the fabrication process (the chip itself is rather well documented), plus it is the single most used chip (ARM generally, not necessarily this one) in the history of computing, I’d guess it’s already commodity hardware. Maybe not an ‘end-user’ commodity.
In reality though, I do agree with you.
BTW, Rayiner, I have a predisposition concerning your name — A local cheap beer named after a local mountain (rainier) pleases me.
“I don’t see how that was proven at all. Obviously, given the success of the PPC architecture”
Given the success of x86, one could just as well say that RISC isn’t worth anything. Funny to see the success argument coming out in favor of RISC…
Anyway, what some people fail to see is that RISC and CISC don’t mean anything today.
You want another example of why pure RISC systems fail completely? Here is David Ditzel, former chief architect of SPARC:
“”Today [in RISC] we have large design teams and long design cycles,” he said. “The performance story is also much less clear now. The die sizes are no longer small. It just doesn’t seem to make as much sense.” The result is the current crop of complex RISC chips. “Superscalar and out-of-order execution are the biggest problem areas that have impeded performance [leaps],” Ditzel said. “The MIPS R10,000 and HP PA-8000 seem much more complex to me than today’s standard CISC architecture, which is the Pentium II. So where is the advantage of RISC, if the chips aren’t as simple anymore?””
The RISC thing is just a theoretical toy which doesn’t really work in the real world. Several of its good ideas were adopted by CISC CPUs (more registers, for example, in the case of the Opteron), etc… But pure RISC relies heavily on compilers, whereas you can be more ‘relaxed’ on a CISC-like design.
Basically, all this RISC vs. CISC stuff is bs. They all have complex instructions like SIMD (AltiVec, SSE, etc…), and a lot of them have a more RISC-like ‘core’.
What is important is the way it is *implemented*, not the way it is designed. I don’t care if a 1 GHz CPU is as fast as another one at 2 GHz, if the 1 GHz one is much more expensive to design (and so much more expensive when you buy it).
Castle already sells ARM desktop computers with RISC OS and Linux…
I personally preferred SPARC.
This baby is going to be one fast processor for embedded devices, but not for your desktop PC.
“up to 2600 Dhrystone MIPS of aggregate performance”
puts it about the same as a Pentium 4 at 1.4 GHz…
A P4 at 2.8 GHz gets 5340 Dhrystone MIPS.
A P4 at 2.4 GHz gets 4685 Dhrystone MIPS.
A P4 at 2.0 GHz gets 3848 Dhrystone MIPS.
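For scale, some back-of-the-envelope arithmetic from those figures: 2600 DMIPS spread over four cores at 550 MHz works out to 2600 / 4 / 550 ≈ 1.2 DMIPS per MHz per core, while the 2.8 GHz P4’s 5340 DMIPS is 5340 / 2800 ≈ 1.9 DMIPS per MHz. So the ARM isn’t winning per clock; the win is performance per watt.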
Damn, an embedded 550 MHz CPU can compare to a 1.4 GHz desktop-centric P4? Wow. Not a good desktop CPU? Why not? The vast majority of users only do web-centric things these days: listening to music, email, instant messaging, web browsing, maybe typing a paper. That ARM CPU would be more than enough for anyone. I find it annoying when someone puts out big bucks for a PC and then uses it for an hour every other day to check email; what a waste.
Maybe if Castle puts SMP into RISC OS then they can give the thing a bit more pep with these new multi-core CPUs. Not that an app would be able to use more than one CPU at once, mind you.
The performance edge of x86 over the RISCs is driven almost exclusively by its huge production volume. If the PPC attained the same levels of production (and the same Intel-vs-AMD style of competition), it would be cheaper, because RISC CPUs are simpler.
…Well, that sentence was right five years ago; today, high-end CPUs are so complex that the difference is disappearing (and some kind of RISC architecture is hidden under CISC clothes).
RISC meant Reduced Instruction Set; IBM has renamed it Rationalized Instruction Set, as the PPC instruction set isn’t less rich than x86’s, but it is dramatically simpler.
The ARMs cannot be compared with PPC and x86 performance-wise.
ARM is not targeted at desktop computing anymore (although it was first used in the Archimedes PC).
These multi-core chips will be dedicated to very specific applications.
ARM is now an intellectual-property firm; customers license its designs to build custom ASICs.
ARMs can now be compared to DSPs, tuned for the application.
Christopher X:
Could be my new desktop…hah, cheap/lower power quad SMP? Very very cool!
Since ARM totally sucks in terms of floating point performance (most of these processors don’t even have a FPU yet), I recommend that you don’t throw away your “normal” system, because that is likely to be much faster in terms of multimedia and 3D performance.
johnny:
Since when is RISC more powerful?
You do know that the only reason x86 keeps up is the fact that the Pentium and Athlon translate x86 instructions to RISC-like operations internally? Otherwise x86 would have been a lame duck long ago.
Typically RISC requires several instructions to do a task for which CISC uses only one.
Correct, but that also means that the instructions are easier to pipeline (the key to single-cycle issue), and also much easier to schedule in superscalar implementations.
I.e., although a RISC architecture needs more instructions for the same task than a CISC architecture, those instructions can be executed much faster, and several at a time, which means that a RISC processor will complete many more instructions per cycle than a CISC processor. Not to forget that pipelining is also the key to higher clock speeds.
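A rough way to frame that trade-off is the textbook “iron law” of CPU performance (a simplification, of course):

    execution time = instruction count × cycles per instruction × cycle time

RISC deliberately accepts a higher instruction count in exchange for a lower CPI and a shorter cycle time, and it comes out ahead whenever the overall product shrinks.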
x86 is a pain in the ass though.
That’s very true though.
2600 MIPS Dhrystone is what you can expect from an AMD Duron 1.3GHz. So in terms of computational power, there is nothing to get excited over in a quad ARM CPU chip @ 500 MHz.
Also, in terms of computational power / W, the newest Intel Pentium M processors are the present record holders.
x86 vs PPC vs ARM, RISC vs. CISC: please let’s not go back. These subjects have been debated ad nauseam here, on /. and just about anywhere else…
Multi-core ARM designs are suitable for very specific applications, hence will be manufactured in smaller lots, making them more expensive than other, multi-purpose CPUs.
Well, that makes no sense anymore, since x86 chips are RISC cores with an instruction decoder, but it really makes sense for power consumption. My 933 MHz iBook G4 doesn’t need to have the fan turned on all the time, while my old 800 MHz Acer P3 had the fan always on. I don’t think the G4 is worse than the P3.
What I want to say is that I really don’t like the infamous decoder step of the x86, since I prefer to have the decoding done on the compiler side. It’s much simpler to get more performance and fewer bugs in the long run: keep the hardware simple and work on the software side (that’s the trend in every electronic product).
Andrew:
“2600 MIPS Dhrystone is what you can expect from an AMD Duron 1.3GHz. So in terms of computational power, there is nothing to get excited over in a quad ARM CPU chip @ 500 MHz.”
Other than the fact that it will be running at a temperature so cool that it won’t even need a heatsink, never mind a fan, and you have a fantastic combination, great for silent desktops, blades and cool running laptops. And with a power consumption of a couple of watts at most, it will mean your laptop battery will last a good deal longer.
And at that low power consumption, you can afford to add more processors if you need more processing capacity. Think of a blade server in a laptop….
M.I.K.e
“Since ARM totally sucks in terms of floating point performance (most of these processors don’t even have a FPU yet), I recommend that you don’t throw away your “normal” system, because that is likely to be much faster in terms of multimedia and 3D performance.”
Yep, they have no floating point. But they do have multimedia extensions and even hardware extensions for Java (depending on the packaging). And how many desktop applications *truly* need floating point? Even software that needs it can be programmed using fixed point with a little lateral thinking. Floating point hardware is more of a programmer’s convenience than a necessity. Ditching the floating point halves the power consumption and transistor count with very few practical drawbacks, other than for specialised software.
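For anyone who hasn’t worked with it, here is a minimal sketch of the fixed-point idea in C (my own illustration, using the common Q16.16 format; nothing here is specific to ARM):

    #include <stdint.h>

    /* Q16.16 fixed point: 16 integer bits, 16 fractional bits. */
    typedef int32_t fix16;

    #define FIX_ONE (1 << 16)   /* 1.0 in Q16.16 */

    static inline fix16 fix_from_int(int x) { return x * FIX_ONE; }

    static inline fix16 fix_mul(fix16 a, fix16 b)
    {
        /* Widen to 64 bits so the intermediate product cannot overflow,
           then shift back down to realign the binary point. */
        return (fix16)(((int64_t)a * b) >> 16);
    }

For example, fix_mul(fix_from_int(3), FIX_ONE / 2) gives 98304, which is 1.5 in Q16.16. Everything runs on the integer ALU, which is exactly why it suits an FPU-less ARM.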
Yep, they have no floating point.
They could have one, because an FPU has been part of the design since the ARM10 (I think the only other ARM FP implementations before were the ARM250, the FPA11, and the ARM7500FE), but most implementors don’t seem to think they need it.
But they do have multimedia extensions
I think the last time I looked it was mainly stuff that was good for something like MPEG video decoding, but a lot of other stuff certainly needs FP.
and even hardware extensions for Java (depending on the packaging).
Yeah, I’ve read about Jazelle, but I don’t run Java too often anyway.
And how many desktop applications *truly* need floating point?
Everything with 3D, unless you have really good 3D hardware, which current ARM-based systems don’t, and MP3 encoding shows a very big difference with a fast FPU.
Even software that needs it can be programmed using fixed point with a little lateral thinking.
I’ve played Quake on my Acorn RiscPC with a 200MHz StrongARM: roughly 10 FPS, and that was already after lots of fixed point conversion had taken place. The plain port of Quake with no such modifications ran at maybe 1 FPS, if you were lucky.
Don’t get me wrong, I think ARM is a nice architecture (although I liked it better before they added a lot of further instructions, because ARM wasn’t really designed with extensibility in mind), and I still have my Acorn RiscPC around (remember that Acorn was the original creator of that architecture), but I don’t think that ARM can compete with current desktop architectures anymore. It makes a nice portable processor though.
M.I.K.e
“MP3 encoding shows a very big difference with a fast FPU”
Is this because the encoding is done using a port of an encoder that was written assuming a hardware FP system (i.e. ported from x86 Linux)? I am sure a clean-room implementation using fixed point could improve matters greatly.
Part of the performance boost of using hardware FP is the fact that it is being performed in parallel with the integer instructions, on a separate unit. But when you already have parallel integer cores, this is less of a penalty.
“I’ve played Quake on my Acorn RiscPC with a 200MHz StrongARM: roughly 10 FPS, and that was already after lots of fixed point conversion had taken place. The plain port of Quake with no such modifications ran at maybe 1 FPS, if you were lucky.”
Sure, but again, Quake was written from the ground up assuming FP hardware was present. I seem to remember there was a Quake flythrough demo written from scratch using integer maths routines that was many times faster on a Risc PC. And don’t forget, the original RiscPCs lacked any hardware-accelerated graphics, high-speed buses, or texture caching, had blocking IO on disk access, and had all sorts of other bottlenecks that would have reduced performance considerably.
Hardware FP will always be faster and will always be easier to program for. No argument. My point is that the lack of it doesn’t cripple a desktop system. It will tend to make porting or writing certain applications more difficult. So it is a programmer’s convenience more than a necessity.
But after reading the article in more detail I noticed it talks about it containing a ‘vector floating point unit’. Maybe this will do after all? 😉
There is NOTHING CISC-like about AltiVec OR any other vector processing unit.
If anything, vector processing units are the epitome of the RISC style of processing; though they are very different, the pipelining of instructions is similar.
“The thing is that one CISC instruction can take several cycles to complete. RISC tries to minimize the cycles needed per instruction and create a small instruction set where every instruction executes as fast as possible. This results in low power consumption.”
RISC architectures tend to offer better power efficiency because their instruction sets, being simpler, don’t require as many resources (ultimately transistors) to implement.
A laptop with a multi-core ARM chip running Palm OS 6 would make for an interesting device.
Does anyone know why they designed this chip? What applications do they have in mind?
Can someone tell me how ARM CPUs compare to the Hitachi SH family?
I know Sega used them in their Saturn (2xSH2) and Dreamcast (SH4) consoles and arcade boards (NAOMI series).
M.I.K.e, with a Dreamcast you can even play Quake 3 on a 200 MHz SH4, but I think it had an FPU integrated.
Regards,
Chris
PS: M.I.K.e, are you the one from back in the day on the beosonline forum?
How’s it going these days?
While I don’t know, I do own a Sega Dreamcast, and I think highly of the SH4 series.
The DC’s SH4 has an integrated vector FPU.
http://www.segatech.com
The SH series also uses a compact 16-bit instruction set, which gives it CISC-like code density – an increasingly important issue as memory bandwidth becomes the bottleneck.
“RISC meant Reduced Instruction Set; IBM has renamed it Rationalized Instruction Set, as the PPC instruction set isn’t less rich than x86’s, but it is dramatically simpler.”
There is a common misconception that the “Reduced” in RISC meant fewer opcodes in the ISA; the “Reduced” was in contrast with the “Complex” in CISC, as in reduced complexity. Meaning simple instructions vs. complex instructions, not the number of instructions available to the programmer.
RISC was just a way of making the microcode visible to the programmer. Basically, RISC dealt with the need to eliminate the translation process in architectures such as the VAX, which had to translate their ISA into microcode before the processor could execute it. So they figured that since each instruction mapped onto a subset of the microcode, why not make the microcode visible and get rid of the translation HW overhead, since it was the microcode that was being executed anyway? The number of instructions in a RISC ISA may actually be bigger than in a comparable CISC (case in point: the POWER ISA, which is remarkably large).
Several neat things came out of RISC: basically, make the common case fast and reduce complexity to make things go fast. Unfortunately many of these lessons were lost when the RISC architects started doing more and more complex stuff to extract as much ILP as they could from a single instruction stream. Basically the RISC people figured out that since they lacked the volume, they might never enjoy the same process leadership as Intel, so they had better extract as much as they could out of their technology (which was almost one generation behind Intel’s at the end) in order to keep up. Also, most of the RISC vendors started out as fabless IP shops, so they could never use fab processes to their advantage the way Intel does.
Fanless desktops and low-power CPUs are great. Period. We all just need CPUs that consume little power and don’t run so hot as to require fans.
That’s all I gotta say.
Lower power is good, OK?
That’s for sure, as my Tualatin is fanless.
Slow MP3 encoding on ARM
Is this because the encoding is done using a port of an encoder that was written assuming a hardware FP system (i.e. ported from x86 Linux)?
Good question. I have to admit that I don’t know, but I assume that it had been optimized.
I am sure a clean-room implementation using fixed point could improve matters greatly.
The problem is that you cannot rewrite or even redesign everything before using it.
Although I have to admit that I was a great opponent of typical benchmarks between PCs and the RiscPC, because most C programs simply don’t run that fast on an ARM if they aren’t tweaked. But applications written specifically for the RiscPC, often in a mixture of BBC BASIC and ARM inline assembly, were much faster than similar applications on a PC that should have been several times more powerful.
Part of the performance boost of using hardware FP is the fact that it is being performed in parallel with the integer instructions, on a separate unit.
True, which is why it actually makes sense to use FP as long as you have an FPU.
But when you already have parallel integer cores, this is less of a penalty.
The problem with ARM is that it wasn’t superscalar before ARM11, IIRC. OK, now they have several units…
Sure, but again, Quake was written from the ground up assuming FP hardware was present.
True.
I seem to remember there was a Quake flythrough demo written from scratch using integer maths routines that was many times faster on a Risc PC.
You remember correctly, but that flythrough also had no lighting, no torches, no water, and no AI.
And don’t forget, the original RiscPCs lacked any hardware-accelerated graphics, high-speed buses, or texture caching, had blocking IO on disk access, and had all sorts of other bottlenecks that would have reduced performance considerably.
That’s all true, but I also know someone who is developing a movie player on his Iyonix with a 600 MHz XScale; he has already rewritten most of the calculations in fixed point, and his machine still struggles with a lot of movies.
Hardware FP will always be faster and will always be easier to program for. No argument. My point is that the lack of it doesn’t cripple a desktop system.
That was absolutely true just a few years ago, but nowadays there are so many multimedia formats (especially for sound) that make such heavy use of FP that it doesn’t make sense to use a desktop machine without an FPU, unless you know that you’ll never play any multimedia stuff.
But after reading the article in more detail I noticed it talks about it containing a ‘vector floating point unit’. Maybe this will do after all? 😉
Well, the VFP10 had already been part of the ARM10 design, but I don’t think any firm actually implemented it; I might be wrong, though.
Can someone tell me how ARM CPUs compare to the Hitachi SH family?
In terms of complexity, speed, and power consumption the early models are quite similar.
Later SuperH became superscalar and also got a quite powerful FPU, but I don’t think the clocking speed is as high as the ARM’s at the moment. But I have to admit that I didn’t keep track of the development for some time.
The interesting difference between ARM and SuperH is that the development is kind of vice-versa. ARM was developed as a desktop processor, then was used in embedded systems, which is why Thumb, a set of 16-bit instructions, was added to the normal 32-bit instructions.
SuperH was an embedded processor from the start (with an architecture similar to MIPS), so it only had 16-bit instructions and 16 registers, like the ARM. But SuperH can access all registers in that mode, unlike Thumb, which has a bit of a problem accessing more than the lower 8 of the 16 registers.
But with the SH5 the old 16-bit instructions were renamed SHcompact, and the new 32-bit instructions are called SHmedia. Not only does this allow more complexity, but these instructions also access 64 GPRs with a width of 64 bits. Just like an ARM with Thumb, the SH5 can switch to SHcompact mode with a special branch instruction, but unlike ARM, SuperH now has a 64-bit processor.
I know Sega used them in their Saturn (2xSH2) and Dreamcast (SH4) consoles and arcade boards (NAOMI series).
The SH3 also was the most often used processor in PocketPCs, until Microsoft decided to settle on just one architecture and chose the ARM.
M.I.K.e, with a Dreamcast you can even play Quake 3 on a 200 MHz SH4, but I think it had an FPU integrated.
I’ve never played Quake 3 on my Dreamcast, but the SH4 was the first SuperH with a FPU, and quite a powerful one for such a primitive processor.
PS: M.I.K.e, are you the one from back in the day on the beosonline forum?
Yes, it’s me. I guess the name isn’t too frequently used and a dead give-away 😉
How’s it going these days?
I have to admit that I switched to MacOS X now, and I really like it, although I still miss some of the BeOS features now and then.
It is quite wrong to think that FPUs are limited to some signal processing or 3D rendering applications.
Hardware FP will always be faster and will always be easier to program for. No argument. My point is that the lack of it doesn’t cripple a desktop system.
GUIs are moving toward FP coordinate systems for better handling of vector graphics, seamless screen/printer rendering, …
Many applications need fixed-point arithmetic, for which a complete floating-point unit is overkill, but an FPU is much simpler to use than doing plenty of normalisation/shifts after every other addition or multiplication.
Low-end DSPs use fixed-point arithmetic instead of floating point, but it is not completely like integer arithmetic. It is also painful because you often need to evaluate the range and precision of every operation to determine where to place the point in the binary number, whereas the FPU keeps the best precision automatically (…of course, complex calculations mandate a precise evaluation of truncation errors).
For a special-purpose processor, you can do the job of evaluating point-placement tradeoffs; for a general-purpose desktop CPU, an FPU is now mandatory.
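To illustrate the point-placement bookkeeping described above, a small hypothetical example in C (mine; the Q formats are chosen arbitrarily):

    #include <stdint.h>

    /* Multiplying a Q4.12 value by a Q8.8 value yields a product with
       12 + 8 = 20 fraction bits.  The programmer must decide where the
       point goes in the result, trading range against precision, and
       must redo that analysis for every operation in the chain. */
    int32_t mul_q412_by_q88(int16_t a /* Q4.12 */, int16_t b /* Q8.8 */)
    {
        int32_t product = (int32_t)a * b;  /* 20 fraction bits */
        return product >> 8;               /* realign to 12 fraction bits,
                                              discarding some precision */
    }

A hardware FPU does this normalization automatically on every operation, which is exactly the convenience being paid for in transistors.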
but the SH4 was the first SuperH with a FPU, and quite a powerful one for such a primitive processor.
I agree. It’s a shame the DC died, as the SH is an ideal console CPU, and a shame that the SH5 or some later core isn’t being used in an updated portable DC, perhaps with an updated PowerVR core.
The Game Boy Advance is rather weak in this age of 3D.
AFAIK Sega is developing a new arcade board based on the new PowerVR core and, if I remember correctly, an SH6.
What would be the advantage now that the Xbox and GC are available?