Linked by Thom Holwerda on Tue 24th Apr 2007 21:09 UTC
Hardware, Embedded Systems The prototype for a revolutionary new general-purpose computer processor, which has the potential of reaching trillions of calculations per second, has been designed and built by a team of computer scientists at The University of Texas at Austin. The new processor, known as TRIPS (Tera-op, Reliable, Intelligently adaptive Processing System), could be used to accelerate industrial, consumer and scientific computing. Professors Stephen Keckler, Doug Burger and Kathryn McKinley have been working on underlying technology that culminated in the TRIPS prototype for the past seven years. Their research team designed and built the hardware prototype chips and the software that runs on the chips.
Thread beginning with comment 233894
RE[4]: Dataflow execution
by tdemj on Wed 25th Apr 2007 16:45 UTC in reply to "RE[3]: Dataflow execution"
tdemj
Member since:
2006-01-03

That's exactly what I was thinking too: an FPGA, except one that's reprogrammable a million times a second. It looks like each instruction is a mini circuit, executing code directly in hardware. This would be ideal for image, audio and signal processing - it would beat traditional DSPs.

I'm fairly confident that it's possible to write compilers for such a processor. The compiler just has to analyze the dataflow and design an ever-changing circuit. You could execute entire loops in hardware.

Binary compatibility is not an issue these days, when the world is moving in the direction of virtual machines. The compiler can be executed at runtime.

Multitasking wouldn't be an issue. If you have 1024 execution units in an array, you can choose to run dozens of tasks, or fully utilize the entire processor to run a single task at an unprecedented speed. Just imagine that you could download an entire neural network to a TRIPS. Now that's parallel processing!

Reply Parent Score: 1

RE[5]: Dataflow execution
by JonathanBThompson on Wed 25th Apr 2007 20:00 in reply to "RE[4]: Dataflow execution"
JonathanBThompson Member since:
2006-05-26

I have researched the processor further on the university website, and not only is it possible, but they've already written a C and a Fortran compiler and a toolchain, along with performance monitoring tools.

The looping structure works from one block to another, not specifically within the same block; however, unrolling loops should be very easy because of the use of predicates and dataflow. For example:

for (i = 0; i < 4; i++)
{
    a[i] *= 10;
}
In this case, it could be unrolled as:

a[0] depends on i==0, and then updates a[0]
a[1] depends on i==1, and then updates a[1]
<ditto>

Now, if you place each a[x] in a separate location and set its dependences to match the loop unrolling increment (4 in this case), all the instructions execute in parallel, based purely on their data dependences. If such an unrolled iteration takes 4 instructions (I haven't figured it out exactly), that may mean being able to unroll the loop by an increment of 32. Subject to the core's throughput limit (there seems to be a limit of 16 instructions per thread), it could do this in effectively 2 cycles, assuming all the instructions and data were present in the block and it was loaded. Add to that the time required to store the results back out, if desired - or they could be left behind for the next block to process.

If anything, it looks like keeping this sort of processor fed with data/instructions may be the biggest issue.

Reply Parent Score: 1

RE[6]: Dataflow execution
by hobgoblin on Wed 25th Apr 2007 20:26 in reply to "RE[5]: Dataflow execution"
hobgoblin Member since:
2005-07-06

oh man, i so want to run linux on that thing now ;)

Reply Parent Score: 2