Linked by Thom Holwerda on Tue 24th Apr 2007 21:09 UTC
Hardware, Embedded Systems
The prototype for a revolutionary new general-purpose computer processor, which has the potential to reach trillions of calculations per second, has been designed and built by a team of computer scientists at The University of Texas at Austin. The new processor, known as TRIPS (Tera-op, Reliable, Intelligently adaptive Processing System), could be used to accelerate industrial, consumer and scientific computing. Professors Stephen Keckler, Doug Burger and Kathryn McKinley have spent the past seven years working on the underlying technology that culminated in the TRIPS prototype. Their research team designed and built the hardware prototype chips and the software that runs on the chips.
Thread beginning with comment 233666
Dataflow execution
by tdemj on Tue 24th Apr 2007 21:52 UTC

Here's some more information:
http://www.cs.utexas.edu/users/skeckler/cs382m/handouts/trips-proc-...

The basic idea is that execution is based on dataflow order, not program order.
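
Roughly, in toy Python terms (my own illustration, not anything from the actual TRIPS ISA): each instruction fires as soon as its operands exist, so the order you list them in doesn't affect the result.

# Toy illustration only -- not the TRIPS ISA. Each instruction fires as
# soon as its inputs exist, so the listing order ("program order") below
# doesn't matter: 'add' is written first but fires last.
instrs = [
    ("add",    ["a", "b"], "c"),
    ("load_a", [],         "a"),
    ("load_b", [],         "b"),
]

values = {}                      # operands produced so far
pending = list(instrs)
while pending:
    for ins in pending:
        name, srcs, dst = ins
        if all(s in values for s in srcs):   # all inputs ready -> fire
            values[dst] = len(values)        # dummy result
            print("fired", name)
            pending.remove(ins)
            break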


RE: Dataflow execution
by hobgoblin on Tue 24th Apr 2007 22:09 in reply to "Dataflow execution"

I wonder what kind of hell that will make for compilers...


RE[2]: Dataflow execution
by JonathanBThompson on Wed 25th Apr 2007 07:34 in reply to "RE: Dataflow execution"

Having read through chapter 6 of the link given above (thanks to that poster; the name escapes me at the moment), it appears to me that even a lousy compiler writer could produce working executable code without too much effort. In each block of up to 128 instructions, the processor itself prioritizes instructions according to the order in which they're entered, BUT an instruction only executes once its predicates and incoming data are ready. In other words, the order of the instructions is merely a prioritization hint: it does not affect the eventual outcome at all, only how efficiently the block executes. It is indeed "dataflow-driven" code, in that the dependencies between incoming data, calculations, and outgoing data are stated very explicitly, and as long as you've drawn that graph correctly, it's actually very easy to generate code that's technically correct, even if not optimally prioritized.
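
To make that firing rule concrete, here's a rough Python model of how I read it; the names, encoding, and helper function are my own invention for illustration, not the actual TRIPS hardware:

from dataclasses import dataclass

@dataclass
class Instr:
    name: str
    op: object           # callable that computes the result
    srcs: list           # operand names this instruction waits on
    dst: str             # operand name it produces
    pred: str = None     # optional predicate; must arrive before firing

def execute_block(block, inputs):
    # Toy model: listing order is only a priority hint, never correctness.
    vals = dict(inputs)
    pending = list(block)
    while pending:
        progress = False
        for ins in list(pending):
            data_ready = all(s in vals for s in ins.srcs)
            pred_ready = ins.pred is None or ins.pred in vals
            if data_ready and pred_ready:
                if ins.pred is None or vals[ins.pred]:
                    vals[ins.dst] = ins.op(*[vals[s] for s in ins.srcs])
                pending.remove(ins)  # predicate-false instrs are nullified
                progress = True
        if not progress:
            break                    # nothing left that can ever fire
    return vals

# 'mul' is listed before the compare that produces its predicate, yet the
# result is the same: 'cmp' fires first because its data is ready sooner.
block = [
    Instr("mul", lambda a, b: a * b, ["x", "y"], "z", pred="p"),
    Instr("cmp", lambda x: x > 0, ["x"], "p"),
]
print(execute_block(block, {"x": 3, "y": 4}))
# -> {'x': 3, 'y': 4, 'p': True, 'z': 12}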

The impression I got from reading this far through the PDF is that this is a CPU designed by someone very accustomed to thinking in functional logic blocks: the code, and the way it's all connected, bears an incredibly strong resemblance to a combinatorial/sequential logic-gate diagram with all of its associated inputs and outputs.

In addition, in a bizarre but potentially very scalable design decision, many (or all; I'd have to re-read) of the registers for each execution unit are accessible from outside the chip (!) via memory-mapped addresses, and outsiders can pause the processor as needed. The chip is designed from the outset to scale in some variation of a grid: the topology can be defined quite loosely in hardware and is largely defined in software for programming purposes. This leads to a very interesting observation: if, for example, a data-processing task required 1000 separate small sets of data, it would be fairly easy to set that up across a bunch of these processors, because blocks of instructions execute atomically; either everything is satisfied by the end of the block, or no outside writing of data occurs. There are also native synchronization instructions to make this easier.
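
The atomic-commit part, in equally toy terms (the addresses and the commit_block helper here are made up for illustration, not anything from the spec):

def commit_block(memory, buffered_writes, block_completed):
    # All-or-nothing: a block's stores only become visible together.
    if not block_completed:
        return memory                  # nothing escapes a failed block
    committed = dict(memory)
    committed.update(buffered_writes)  # every store lands at once
    return committed

mem = {0x1000: 0, 0x1004: 0}
stores = {0x1000: 42, 0x1004: 7}       # buffered while the block runs
mem = commit_block(mem, stores, block_completed=True)
print(mem)                             # both writes visible, or neither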

Nonetheless, compared to any other processor architecture I've seen documented, this one is WEIRD, but it may very well be the future, though I suspect not in the next few years: there's too much legacy code, and the concept is too foreign at the low level for most people to make the jump easily. If the compiler writers get busy, though, that may mostly be abstracted into irrelevancy. The biggest question then becomes how well and how fast it can emulate other processors; I suspect that may be the really hard part, by comparison.
