Linked by Thom Holwerda on Thu 30th Jun 2005 12:27 UTC
Hardware, Embedded Systems: There are increasing rumors that Alpha might be brought back to life. The Inq sets the big 'if' aside and explores the possibilities: "What if there really is a will to get Alpha back into the changed market? What sort of chip would it have to be to have that good chance of success, if any?"
RE: I think if you had some cpu
by JJ on Thu 30th Jun 2005 18:04 UTC

Due to FPGA constraints I stick to rigid RISC & KISS principles, but I do get some performance by using N-way threading and by replicating several processors around each MMU.

In case any Athlon fans didn't notice, the chief architect of that cpu, Atiq Raza, is long gone and is doing the same thing I am doing, except he uses the MIPS ISA and is doing an ASIC design: 8 cores, 4-way threaded, and a lot like Niagara. Not surprising, since he had something to do with that also. Neither cpu promises ridiculous gobs of FPU that nobody really uses, and both are 100-person efforts. They do promise continuous threaded computing with a lot less impact when branches and cache misses occur. They are not intended for folks who want 1 fast single-threaded cpu.

Since I have only 1% of their budgets, I have to use FPGA and only get 1/3 of ASIC perf, but at least it can be ASICed later. My per-cpu cost, though, is only $1 per 100 MIPS, but you have to buy into pervasive threading a la Transputer. I think today the 64b Opterons maybe deliver 10x more cpu power, but at 100x-500x the cost. My XP2400 runs about 5x faster at maybe 20x the cost, but I can place multiple cores into some smaller FPGAs. FPU is out of it for now, though.
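To put those ratios side by side, here is a quick back-of-envelope sketch. It uses only the rough multipliers quoted above and assumes all of them are relative to one FPGA core (speed 1, cost 1); the function name and labels are made up for illustration.

```python
# Relative performance per unit cost, from the rough ratios quoted above.
# Assumption: every ratio is measured against one FPGA core (speed 1, cost 1).

def perf_per_cost(speedup, cost_ratio):
    """Performance per unit cost relative to the baseline core."""
    return speedup / cost_ratio

estimates = {
    "64b Opteron at 100x cost": perf_per_cost(10, 100),
    "64b Opteron at 500x cost": perf_per_cost(10, 500),
    "XP2400 at 20x cost": perf_per_cost(5, 20),
}

for name, value in estimates.items():
    print(f"{name}: {value:.2f}x the perf per dollar of one FPGA core")
```

On those numbers the Opteron lands at roughly 1/10 to 1/50 of the perf per dollar and the XP2400 at about 1/4, which only matters if the workload can actually be spread over many threads.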

I also use a simple design that makes it easy to get 3 reads and 1 write from the local SRAM every instruction, and it also drastically simplifies the instruction decode, which uses 1..4 16-bit codes with only 2 formats, usually 3-reg or a ld/st. The datapath and ISA are a 32-bit design but use a 2-cycle 16b ALU. My branch hit is either 1 (near) or maybe 4 (far) cycles when taken, so no predictor is used. There is some Icache, and the Dcache is replaced by 8-way banked RLDRAM.
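For anyone wondering how a 32-bit operation fits through a 16b ALU in 2 cycles, here is a minimal software sketch of the idea for an add. It is only an illustration: the low-half-first ordering and the carry handoff are the obvious way to do it, not a description of the actual hardware, and `alu16` and `add32` are made-up names.

```python
MASK16 = 0xFFFF

def alu16(a, b, carry_in):
    """One pass through a 16-bit adder: returns (16-bit sum, carry out)."""
    total = (a & MASK16) + (b & MASK16) + carry_in
    return total & MASK16, total >> 16

def add32(x, y):
    """32-bit wrapping add done as two 16-bit passes (assumed ordering)."""
    lo, carry = alu16(x, y, 0)               # cycle 1: low halves
    hi, _ = alu16(x >> 16, y >> 16, carry)   # cycle 2: high halves plus carry
    return (hi << 16) | lo

print(hex(add32(0x0000FFFF, 0x00000001)))
```

The carry out of the first pass is the only state that has to be held between the two cycles.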

The only thing I add on top of that, which others don't, is pervasive process and message support as the Transputer did, plus user-level memory allocation in HW, so new and del are HW managed. The other thing I use is RLDRAM, which effectively gives me 2.5ns DRAM access after 20ns of latency (on paper), while I hear that AMD's direct memory connect gives 100ns to DDR, covered though by gobs of cache.
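The "2.5ns after 20ns" figure can be illustrated with a toy timing model. This is a sketch under stated assumptions, not a model of a real RLDRAM part: it assumes 8 banks, one new access issued every 2.5ns, a 20ns latency per access, and that a bank cannot start a new access until its previous one has completed. The function name is hypothetical.

```python
ISSUE_NS = 2.5     # a new access can start every 2.5 ns (figure quoted above)
LATENCY_NS = 20.0  # each access returns after 20 ns (figure quoted above)
BANKS = 8          # assumed bank count, matching the 8-way banking mentioned

def finish_time(bank_sequence):
    """Time in ns at which the last access in bank_sequence completes.

    Assumption: a bank stays busy until its previous access has completed.
    """
    bank_free = [0.0] * BANKS
    next_issue = 0.0
    done = 0.0
    for bank in bank_sequence:
        start = max(next_issue, bank_free[bank])
        done = start + LATENCY_NS
        bank_free[bank] = done
        next_issue = start + ISSUE_NS
    return done

round_robin = [i % BANKS for i in range(16)]
same_bank = [0] * 16
print(finish_time(round_robin), finish_time(same_bank))
```

Under these assumptions 16 accesses spread round-robin over the banks finish at 57.5ns, while 16 accesses to a single bank take 320ns, so the throughput depends on keeping enough independent accesses in flight, which is where the threading comes in.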

As for VLIW, EPIC, superscalar, OoO, vector etc., I couldn't care less about building single-threaded cpus; they simply don't deliver MIPS in proportion to transistors, clock frequency, or power used. Massively distributed slower cores have their issues too, as some of the Cell fans are finding out. If you don't want to learn how to develop concurrent programs CSP style, then you are stuck praying at the Intel, AMD, IBM church for incrementally less and less of the same.

It all comes down to who is in control of the parallelization, the cpu or the programmer. I don't buy the cpu idea of trying to extract it automatically.
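For anyone who hasn't seen CSP-style code, here is a rough sketch of what "programmer in control" looks like: independent processes that share nothing and talk only over channels. It uses Python threads and `queue.Queue` as a stand-in; a bounded queue only approximates a true rendezvous channel as in occam on the Transputer, and the process names are made up.

```python
import queue
import threading

DONE = object()  # sentinel marking the end of a stream

def producer(out_ch, items):
    """Process 1: send each item down the channel, then signal the end."""
    for item in items:
        out_ch.put(item)
    out_ch.put(DONE)

def squarer(in_ch, out_ch):
    """Process 2: square whatever arrives and pass it on."""
    while True:
        item = in_ch.get()
        if item is DONE:
            out_ch.put(DONE)
            return
        out_ch.put(item * item)

def run_pipeline(items):
    """Wire the processes together and collect the output stream."""
    a = queue.Queue(maxsize=1)  # bounded queue approximating a channel
    b = queue.Queue(maxsize=1)
    threads = [
        threading.Thread(target=producer, args=(a, items)),
        threading.Thread(target=squarer, args=(a, b)),
    ]
    for t in threads:
        t.start()
    results = []
    while True:
        item = b.get()
        if item is DONE:
            break
        results.append(item)
    for t in threads:
        t.join()
    return results

print(run_pipeline([1, 2, 3, 4]))
```

The parallelism here is stated explicitly in the program structure, so nothing has to be extracted from a sequential instruction stream after the fact.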

just my 2cycles