“A new startup out of MIT emerged from stealth mode today to announce that they’re shipping a 64-core processor for the embedded market. The company, called Tilera, was founded by Dr. Anant Agarwal, the MIT professor behind the famous and venerable Raw project on which Tilera’s first product, the TILE64 processor, is based. Tilera’s director of marketing, Bob Dowd, told Ars that TILE64 represents a “sea change in the computing industry”, and the company’s CEO isn’t shy about pitching the chip as the “first significant new chip architectural development in a decade”. So let’s take an initial look at what was announced about TILE64 today, with further information to follow as it becomes available.”


What is it? x86, x64? PowerPC? MIPS? Something new?
The article indicates it’s a MIPS derivative.
If anything x86 isn’t exactly a great architecture, just “good enough”. If you’ve tried programming it, you know that even 6502 is more elegant.
PPC, ARM, MIPS &c are all excellent architectures which can be programmed highly, highly efficiently by someone who knows what they’re doing and isn’t clouded by doing things solely the x86 way.
Oh, and modern CPUs like the Core Duo translate x86 macro-instructions down to a more RISC-like internal set. This is how Intel managed to make the jump from the hot, high-power P4s to the relatively cooler, lower-power dual-core chips.
The Core 2 is actually less RISC-y internally than the P4. The P4 is internally a pure u-op design. The Core 2 carries fused u-ops (e.g. mem-op instructions) through much of the frontend of the core.
As for PPC, ARM, and MIPS, one of those three does not belong. MIPS is a great instruction set. PowerPC is poo.
My bad then, but there are a number of variants of ARM, so it can be tight in some instances where even simple opcodes are unavailable. However, the Commodore-style design and the neat conditional prefixes make it a somewhat creative processor.
Oh, ARM is fine. It’s PPC that doesn’t belong. I just don’t like the basic design (too many weird instructions, no separate 32-bit and 64-bit operations, etc).
MIPS is my favorite, but x86-64 comes in a close second. It’s so much better than people give it credit for. It’s actually quite orthogonal in its addressing and operand modes. You get 8-bit, 32-bit, and 64-bit registers, along with 8-bit, 32-bit, and 64-bit operations. You get 8-bit and 32-bit immediates and displacements, instead of the oddly-sized immediates and displacements you usually get in RISC. Instructions that write to fixed registers and two-operand instructions suck a bit, but you can deal with both quite easily in the register allocator.
I miss MIPS, it was the instruction set they used to teach us assembler back at college. So nice and straightforward.
Not even close. Intel has been converting from x86 to internal micro operations since the original Pentium more than 10 years ago.
PPC, ARM and RISC aren’t in any way more efficient than x86. The ISA is just like an API and in no way reflects efficiency. A lot of issues with x86 are easily fixed by clever hardware design.
Yeah it’s not as elegant as PPC or MIPS but it’s also a lot older.
I thought the Pentium Pro was the first one to start doing that… Could be wrong.
According to the article, this isn’t an extension of the x86 architecture but its own CPU architecture. This scores well in my book.
The thing that will make or break this CPU is how well the programmer can extract performance from it. I, for one, hope that more people learn about proper concurrency techniques, because God knows that there aren’t enough people who know how to think in parallel.
Even if one can only toss one program or task per “core”, things get interesting. Have you looked at something like the Windows Task Manager and seen the amount of stuff that’s running in the background on an average desktop?
Too bad those threads aren’t running at the same time.
Yes, I guess that was why he brought it up.
Umm… right… they are not running at the same time BECAUSE you have one core (prior to multicore CPUs).
With multicore CPUs, windows can barf on a process for a while but still run the machine quite well simply for the fact that more than one process can execute during the same processing cycle.
> Umm… right… they are not running at the same time BECAUSE you have
> one core (prior to multicore CPUs).
Erm, no. These threads are waiting for external events or for each other. Practically all threads in windows except applications are service threads in a microkernel sense and have nothing to do most of the time.
> With multicore CPUs, windows can barf on a process for a while but still
> run the machine quite well simply for the fact that more than one
> process can execute during the same processing cycle.
OSes have done that for quite some time, and it’s called “preemption”. If the service threads need 1% of the time, the rest can get 99%. Even if this is distributed evenly among applications, the system is still responsive. What you experienced is more likely a problem in Windows or in an application, caused by CPU time *not* being distributed correctly. You may be able to fix this problem by throwing lots of CPUs at it, but that doesn’t attack the root of the problem.
They’re not all running at the same time on any system, simply because they are dormant in memory, waiting for a user interrupt. The number of cores / CPUs you have in a system won’t make inactive threads spring to life magically.
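For anyone who wants to see the "dormant threads" point for themselves, here is a rough Python sketch. It is only an illustration under assumptions: the function name and the thread count are made up for the example, and the threads block on a `threading.Event` as a stand-in for service threads waiting on external events.

```python
import threading
import time


def cpu_used_by_waiting_threads(n_threads=50, observe_seconds=0.3):
    """Start n_threads that block on an event and return the CPU seconds
    the whole process consumed while they were waiting."""
    wake = threading.Event()  # hypothetical stand-in for "an external event"
    threads = [threading.Thread(target=wake.wait) for _ in range(n_threads)]
    for t in threads:
        t.start()

    cpu_before = time.process_time()      # CPU time of the process, all threads
    time.sleep(observe_seconds)           # wall-clock time passes, nobody runs
    cpu_used = time.process_time() - cpu_before

    wake.set()                            # let every waiting thread finish
    for t in threads:
        t.join()
    return cpu_used


if __name__ == "__main__":
    used = cpu_used_by_waiting_threads()
    print(f"CPU seconds used while 50 threads waited: {used:.4f}")
```

The CPU time should come out as a small fraction of the wall-clock interval, however many threads are parked and however many cores the machine has.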
Yes, and if you look at a *nix system, there’s a metric called “load average”, which measures the number of processes trying to run concurrently.
This number is typically very low, since usually the background processes are waiting for input and _not_ running at once.
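If you want to check that number from a script rather than from `uptime`, a minimal sketch in Python could look like this. `os.getloadavg()` is a real standard-library call on Unix-like systems (it is not available on Windows); the `load_report` helper and its dictionary keys are just names invented for the example.

```python
import os


def load_report():
    """Return the 1, 5 and 15 minute load averages plus a per-core figure."""
    one, five, fifteen = os.getloadavg()   # raises OSError if unavailable
    cores = os.cpu_count() or 1            # cpu_count() may return None
    return {
        "1min": one,
        "5min": five,
        "15min": fifteen,
        "cores": cores,
        "per_core_1min": one / cores,      # rough "how busy is each core"
    }


if __name__ == "__main__":
    for key, value in load_report().items():
        print(f"{key}: {value}")
```

On a mostly idle desktop the per-core figure will usually sit well below 1, which is the point being made above: a long process list does not mean many processes want the CPU at once.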
The applications of this chip and several others like it (RMI if I recall) are not for anything a Windows minded user would likely know much about.
It will end up in embedded hardware for switches, routers, wireless base stations and the like where ever there are large amounts of channels or packets to process. It perhaps could also end up in some interesting HPC apps since it is available on a PCI card.
Since the MIPS ISA is commonly used in these embedded applications, it would be quite surprising for any new multicore chip not to use it, except of course those from Intel/AMD and Sun.
The issue of concurrency for these chips isn’t really the same either for Intel, Sun or these Tilera type companies. Sun has it pretty easy given their server type of load. Tilera and similar will get programmed mostly by engineers familiar with hardware and software. Intel has the real problem, selling huge numbers of parallel whatever to the unwashed masses who really still pine for single threaded.
While Tilera’s numbers look interesting, I am disappointed that they use DDR DRAM rather than RLDRAM, which has vastly better performance for threaded designs.
sorry for the OT, but two questions:
1. what’s your opinion on david may’s new XCore [also for embedded market] event driven multi-threaded processor [xmos.com]?
2. how’s your own design progressing? [link?]
thnx!
I will wait and see for more details on XMOS. The event-driven scheme looks interesting, possibly a reuse of old ideas, and the threaded 8-way core looks okay. When I see the use of large caches and SDRAM though, I see another boat going the same place as everyone else: a huge missed opportunity for real memory performance. I am more intrigued by the software they put together for it, and whether there is any CSP or occam in it. I am waiting to see if it looks more like a hardware design language platform. Now if you had 50,000 tiny cores on a chip, the software tool flow might look a lot like an FPGA tool flow of today.
On my own work, I switched tack to concentrate on the OS and app layer, and when that’s developed enough I will get back to finishing up the hardware. I think that was the right thing to do; a somewhat radical hardware platform wouldn’t be much use with no software to run on it.
…iss the software and the adoption.
Actually we have PowerPC and SPARC, but the open source movement simply ignores them.
Since x86 will remain the only reference, new and good CPUs will fail.
This clearly isn’t meant as a competitor for x86 or anything similar. It has other applications.
SPARC has been open for over 20 years; the problem is that Intel and/or AMD are unwilling to swallow their pride and adopt an alternative. Let’s be completely honest: if Intel had spent as much money as it did creating Core (and Core 2) on developing a processor with that level of efficiency and a SPARC ISA sitting on top, it would have the best of both worlds. An open-standards processor that is rocket fast.
The problem is that unless it gets the volume, the development and, more importantly, the backing of Microsoft, it’s doomed to be a niche in the computer market. Unfortunately, whether it runs Windows plus applications dictates whether it succeeds or fails in the marketplace of mainstream desktops and servers.
Is that really so? I think now that Sun is committed to Solaris and Linux on both x86 and SPARC and considering the improvements both operating systems are seeing, it will not be too long before these will be credible alternatives to the establishment, in fact they already are for most people.
I’m typing this on my Fu Long MIPS system and it feels and works just as fine as on my x86 system except for proprietary things such as Flash, which I would rather do without.
I have ported Slackware 12.0 to SPARC as well over the last few weeks and apart from Xorg problems with the Permedia II card because of missing multi-domain PCI support it works very well already.
What we’re heading for is an era where you can choose any hardware and any operating system you want. Microsoft’s support of x86, which was once to their advantage, has turned into a significant disadvantage because all their own software and that of third parties is wedded to the architecture.
That’s what the documents I’ve come across on the net all say. It seems like they don’t have a full MIPS(r) license hence the use of “MIPS-like” however according to Linuxdevices they have licensed compiler technology from SGI namely the “MIPSPRO compiler suite” which was/is the native compiler suite for IRIX.
From the article:
> The processor is also available in lots of 10,000 for $435, and further
> entries to the TILE family are planned to include different core counts.
Unless they mean $435 per lot (which would amount to $0.04 per CPU and doesn’t sound realistic), this is quite a huge amount of cash for a processor. Remember, we’re talking about the embedded market here. Could you imagine a device whose CPU alone costs that much? Then filter out the devices that need that amount of power at all.
This is probably quite comparable in price to RMI’s 8-core monster. I mean, have you seen the prices of the networking gear that these things go into?
From the article
“TILE64 is initially being pitched at the embedded market, with wire-speed network processing and HD media encoding being the two main application scenarios that Tilera wants to see it used in.”
Neither of which is likely oriented towards the casual consumer who would frown at $435/CPU. With some networking gear costing in the hundreds of thousands of dollars, the cost is not that big of a deal, especially if the performance and power numbers hold up.
IIRC, Niagara 2 costs $800-1000 each, so $435 is a reasonable price. Remember, million-dollar Cisco routers are embedded systems.
I’m going to make a first test of a parallel development environment soon (reducible to any level of granularity, so the more cores, the better). My question is will these chips be available to regular customers and if so, when and for what price? Is there a chance of this?
Also, I did not understand the part about simulating your own ASIC. Does this mean you can reprogram data paths? Or is it just a way of making your software believe it’s directly connected to something it isn’t (as in two more jumps, but the software doesn’t notice)? I hope for future tech on reprogrammable data paths. That would be insanely useful for what I’m doing. But I get the feeling it’s about configuring the external pins on the actual package rather than the cores themselves. Is that correct?
[Oh, and if you’re curious about the software technique I’m using, it’s mostly data flow, but without the sequential parts for components. 30 yo techniques with a twist.]
> My question is will these chips be available to regular customers and if so, when and for what price? Is there a chance of this?
Sure, the PCI eval board should only cost $5,000-10,000.
http://www.tilera.com/products/boards.php
Suppliers of parts and systems headed for embedded and industrial use don’t usually care to sell product to end users on a whim; it’s too much trouble, and they likely don’t have the staff to follow lots of small projects. You could try to get into a developer program if they have one. When the high-volume price is x $, the one-off price, if there is any, is usually several times that and usually comes from their distributor.
The article does suggest the I/O pins are reconfigurable to some degree, I don’t think the cores are to any real extent. There are some other reconfigurable ASIC replacements where arithmetic blocks can be configured at a higher level than FPGAs.
A lot of the applications for multicores could also be handled by ASIC equivalents that would be much harder and more expensive to design in a timely fashion, so to some extent these multicores from several vendors are replacing ASIC systems. If you were already designing an ASIC for a similar type of application, then these multicores effectively simulate or replace that ASIC at full speed.
You can think of a hardware software continuum where processes can run as code on communicating processors or be synthesized as logic circuits. We see the same sort of thing in FPGAs but at much lower speeds and densities.
Sounds like you are also exploring this continuum too, dataflow is in there too. Ever thought about a language that can be used for hardware design and || software too?
> Sounds like you are also exploring this continuum too, dataflow is in there too. Ever thought about a language that can be used for hardware design and || software too?
(First, thanks for the response.)
About your Q, I’m not building a language per se, though you can certainly use anything for any low-level operations (but these do not parallelise). Most of the software is actually designed by linking components with data paths. What I like about TILE64 is that the cores are all connected together in a grid, so you can simulate most any data flow network.
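To make the grid idea a bit more concrete, here is a small Python sketch of placing a data flow pipeline onto a grid of cores and counting how many grid hops each data path costs. Everything in it is an assumption for illustration: the 8x8 layout for the 64 cores, the row-major tile numbering, the serpentine placement, and all the function names. It says nothing about how Tilera's own tools actually place or route anything.

```python
GRID = 8  # assumed layout: 64 cores arranged as an 8x8 grid


def tile_coords(tile_id, width=GRID):
    """Map a tile number to (row, col), numbering tiles row by row."""
    return divmod(tile_id, width)


def hops(src, dst, width=GRID):
    """Grid hops between two tiles if data only moves between neighbours."""
    (r1, c1), (r2, c2) = tile_coords(src, width), tile_coords(dst, width)
    return abs(r1 - r2) + abs(c1 - c2)


def place_snake(n_nodes, width=GRID):
    """Place component i on a tile along a serpentine path, so that
    component i and component i + 1 always sit on neighbouring tiles."""
    placement = []
    for i in range(n_nodes):
        row, col = divmod(i, width)
        if row % 2 == 1:              # odd rows run right to left
            col = width - 1 - col
        placement.append(row * width + col)
    return placement


def total_hops(edges, placement, width=GRID):
    """Sum of hops over all data paths (edges) of the data flow graph."""
    return sum(hops(placement[a], placement[b], width) for a, b in edges)


if __name__ == "__main__":
    pipeline = [(i, i + 1) for i in range(63)]   # 64 components in a chain
    snake = place_snake(64)
    print("serpentine placement, total hops:", total_hops(pipeline, snake))
    print("row-major placement, total hops:", total_hops(pipeline, list(range(64))))
```

For a plain chain the serpentine placement keeps every data path at one hop, while naive row-major numbering pays extra at the end of each row. Real data flow graphs with fan-out would need a smarter placement than this.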