The Future of CPUs: What’s After Multi-Core?

Eugenia Loli 2006-10-29 Hardware 18 Comments

As Moore’s Law continues to hold, IC designers are finding that they have more and more silicon real estate to play with. David Chisnall hazards some guesses as to what they might do with it.


18 Comments

2006-10-29 7:56 am

samad
“It’s very easy to make accurate predictions about the future of technology. Stuff will get smaller, faster, and cheaper. This has been true for centuries and is unlikely to change…”

You mean how technology experts predicted the rapid rise of the Internet in the 80’s?

Edit: I think the future of technology should be looked at on a less technological level; rather, one should consider how society is shifting. Technology usually reflects the conditions of a society. There’s no doubt technology is shifting toward greater and greater levels of sophisticated communication. The rapid rise of the Internet is a wonderful example of this. Even though the article is about CPUs, CPUs will adapt to new technological demands.

Edited 2006-10-29 08:00

2006-10-29 3:27 pm

hobgoblin
true, i don’t think it’s so much about the internals as it’s about computing power per mAh used or something like that.

things just become more and more portable with high speed wireless connections becoming more and more available.

people have been talking about digital nomads and convergence for decades almost. but now things are starting to come into place. we are getting connections that can carry everything from text to video at decent bitrates.

do any of the current umpcs (or whatever) come with connectors for external screens? could one have, say, a laptop shell that one could hook a umpc into to turn it into a full-sized laptop when required? ie, nothing more than a keyboard, touchpad and lcd with some kind of universal connector.

maybe build in a charger into the setup so that one could power it all and recharge the umpc at the same time. and when not needed the charger and any wires would be stored inside the “shell”.

then give the umpc two modes and interfaces. a desktop interface for when connected to the shell, and a pda/webpad one for when on its own.

hmm, visions…

2006-10-29 2:00 pm

PipoDeClown
re-examine some old processor architectures like http://en.wikipedia.org/wiki/Transputer or http://en.wikipedia.org/wiki/Clipper_architecture

MAKE IT OPENSOURCE HARDWARE!

Edited 2006-10-29 14:03

2006-10-29 3:43 pm

hobgoblin
interesting!

that transputer sounds like an idea i have had in my head for some time now. ie, build a computer out of separate external components that can do tasks on their own. but when connected together they can share tasks, and cooperate.

and any gui or similar should not require installation. instead the “software” for each device should be stored on firmware in said device, and when connected would be made available to any and all displays connected to the “hive”. so when you wanted to add a scanner, printer or whatever to the mix, you would just hook it up and locate the new icon on the list provided on the display. when “clicked” said icon would open into the firmware gui of the device.

2006-10-29 4:10 pm

r_a_trip
Firmware is too permanent. If it is locked up in ROM, it becomes more difficult to update, even if it is flash-based.

With standard inter-component communication protocols, it becomes easier to maintain the software per component. Just update the software in the device and it is ready to talk to the other devices.

It requires a completely level market though. Any disruptive forces with proprietary pieces would thwart ubiquity. We still have a few decades to go, before the old vestiges of vendor lockin die off.

2006-10-29 5:36 pm

hobgoblin
software, firmware, i’m used to thinking of any software that’s tied to the hardware it runs on (but still upgradeable without replacing said hardware) as firmware. hell, sometimes i kinda think of an apple laptop as a gigantic pda because of the hardware/software bonding apple does.

in any case, i’m thinking about separating the gui from the processing part using xml. so that the gui is rendered kinda like on firefox, while the real work happens on the individual device.
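One way to picture that split is a device that ships an XML description of its controls, which any connected display parses and renders. The sketch below is only an illustration under stated assumptions: the `device` and `button` element names, the `id` and `label` attributes, and the `SCANNER_UI` document are all hypothetical and not part of any real protocol.

```python
import xml.etree.ElementTree as ET

# Hypothetical GUI description a scanner might serve from its own firmware.
SCANNER_UI = """<device name="scanner">
  <button id="scan" label="Scan page"/>
  <button id="cancel" label="Cancel"/>
</device>"""

def list_controls(ui_xml):
    """Parse a device's GUI description and return (id, label) pairs.

    A display would draw one widget per pair and send the id back to the
    device when the widget is activated; the device does the real work.
    """
    root = ET.fromstring(ui_xml)
    return [(b.get("id"), b.get("label")) for b in root.iter("button")]

for control_id, label in list_controls(SCANNER_UI):
    print(control_id, label)
```

The display never needs device-specific code here; it only needs to understand the shared description format, which is why standardizing that format matters.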

and yes, any inter-device communication would have to be standardized. and sadly many a company would probably try the old “embrace and extend”.

my thought is to use linux or bsd to drive the different hardware, and document the protocols under something like the gnu documentation license…

2006-10-30 11:13 am

dnstest
Exactly right. Firmware is too permanent; however, the trend is moving away from specialized hardware components. Instead of dedicated hardware logic doing specialized functions, generic DSP logic provides a bridge between physical input/output and the software that emulates the specialized functions previously done in hardware. Modems and soundcards were the first to go this route, and now vendors use it with anything they can get away with.

The notion that hardware should provide its own software, and the concept of specialized hardware interconnecting into a whole functioning system, are not reality. Specialized hardware will always exist for niche markets, but the trend is towards cheap emulated hardware.

2006-10-29 6:28 pm

transputer_guy
I was quite heartened that the article actually mentioned the contribution of the Transputer. It also had an integrated FPU somewhat before the 486 did, but it never gained an MMU.

The Transputer showed that it was quite easy to write massively parallel programs using a Process model that is not unlike that of designing hardware modules. Indeed some even used occam to describe hardware and build it as FPGA logic. Today I would take a much simplified C++ and add some concurrency aspects of Verilog.
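As a rough illustration of that process model, here is a sketch in Python rather than occam: each process is a thread and each channel is a bounded queue. This is only an approximation, since occam channels are unbuffered rendezvous points while `queue.Queue(maxsize=1)` holds one item. The names `producer`, `squarer` and `run_pipeline` are hypothetical.

```python
import queue
import threading

def producer(out_ch, n):
    """Process 1: emit 0..n-1 on its output channel, then an end marker."""
    for i in range(n):
        out_ch.put(i)
    out_ch.put(None)  # end-of-stream sentinel

def squarer(in_ch, out_ch):
    """Process 2: read values, square them, pass them on."""
    while True:
        value = in_ch.get()
        if value is None:
            out_ch.put(None)
            return
        out_ch.put(value * value)

def run_pipeline(n):
    """Wire the two processes together, much like connecting hardware modules."""
    a = queue.Queue(maxsize=1)  # channel: producer -> squarer
    b = queue.Queue(maxsize=1)  # channel: squarer -> collector
    procs = [
        threading.Thread(target=producer, args=(a, n)),
        threading.Thread(target=squarer, args=(a, b)),
    ]
    for p in procs:
        p.start()
    results = []
    while True:
        value = b.get()
        if value is None:
            break
        results.append(value)
    for p in procs:
        p.join()
    return results

print(run_pipeline(5))
```

The processes share no state and communicate only over channels, which is the property that makes the style resemble wiring up hardware blocks.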

The real problem is the so-called Memory Wall: the gap between DRAM and processor speeds only increases with time, as little effort is made to address the fundamental problem. It is possible to turn this on its head and replace it with a Thread Wall. As long as you have at least 30+ threads to run, each can appear to have almost no Memory Wall and can run on simplified generic RISC designs without all the fuss of the superscalar (SS), out-of-order (OoO), branch-predicting (BP), L1/L2/L3 cache designs we have now. Such a design could use RLDRAM, since it can do DRAM cycles every 2-3ns vs the effective 100ns+ of conventional systems.
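A back-of-the-envelope model shows why the thread count matters. It assumes a simple round-robin multithreaded core in which each thread runs a fixed number of instructions and then stalls for the full memory latency. The function name and the example numbers are hypothetical, chosen for illustration and not taken from any real design.

```python
import math

def threads_to_hide_latency(mem_latency_ns, clock_ns, instrs_between_mem_ops):
    """Minimum thread count that keeps the core busy under the simple model.

    Each thread does useful work for busy_ns and then waits mem_latency_ns;
    the other threads must supply enough work to cover that wait.
    """
    busy_ns = instrs_between_mem_ops * clock_ns
    return 1 + math.ceil(mem_latency_ns / busy_ns)

# Hypothetical example: 100 ns memory latency, 1 ns clock,
# one memory reference every 5 instructions.
print(threads_to_hide_latency(100.0, 1.0, 5))  # 21
```

Under this model, the required thread count grows with memory latency and shrinks as threads do more work between memory references, which is the trade the Thread Wall makes explicit.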

I am still in awe of the Atlas machine from the early 60s as its designers shared the same view of true flat memory space.

2006-10-29 11:50 pm

Phloptical
Wow! Could you translate that into English? Then again, don’t bother. Reading stuff like that is like listening to a conversation between theoretical physicists waxing on about string theory and bubble universes… it’s best to leave it not understood and simply appreciate the fact that people like you have probably forgotten more than people like me will ever know.

2006-10-30 5:21 am

transputer_guy
I like to watch Nova as much as the next guy and am just as baffled by string theories and the so-called quantum dot computers that pop up every so often.

Actually, it’s really not too hard to understand at all. Look at my bio and get the paper I gave at the wotug conference on parallel computing regarding building a modern Transputer in FPGA. This design is mostly memory first, then processor throughput second. Google for wotug fpga transputer R16. A modern Transputer with the best CSP thinking from the old design can take advantage of modern RLDRAM (Micron Inc), FPGAs and multiple RISC (Sparc/Niagara-like) processor elements.

However, there is a widespread belief that the Memory Wall (google or wiki that too) cannot be solved, only managed by ever-increasing cache sizes. This is now known to be a complete fallacy, as SRAM is a good way to build chips that get hot, while DRAM pretty much doesn’t because the leakage has been designed out of it. There is also a fallacy spread by us chip designers that DRAM is many orders of magnitude slower than SRAM, but that’s also mostly nonsense these days for “large” arrays. If you want megabytes of RAM at high speed, the most important thing is the interface speed, and to use separate I/O data paths with no muxed interfaces; hence DRAM peripheral logic can go just as fast as SRAM interfaces. DRAM arrays, though, can be highly banked, with each bank running out of step and used by multiple slower interleaved threads that hide latency.

// long sentences warning

The essential idea is this: if you have an even load of at least 30-40 threads to run on a multithreaded processor, constructed pretty much any way on any instruction set, then almost all DRAM references (which occur on load and store ops, typically every 5th to 8th opcode) can be handled by a special MMU. This MMU does associative mapping via a hash on an object ID (sort of a file/memory handle) and an array index, and uses a special DRAM from Micron called RLDRAM that can do 8 interleaved accesses every 8 DRAM clocks, i.e. 1 every 2.5ns for a 400MHz bus. The effect is that a small 32MByte DRAM chip at 2x the cost of regular DRAM can have much better effective performance than L2 SRAM, because almost all 8 banks can be kept continuously in flight over any 20ns window. In regular DRAM, typically only 1 bank goes into flight, and that’s over a 60ns period; when you add in typical x86 MMU, TLB and OS overhead, true random access is closer to several hundred ns for worst-case cache misses.

Regular TLBs use real associative CAMs to do the translation and are limited to a range of 256 to 1K possible mappings. A hash can be almost entirely associative; it requires pretty fast RAM cycle rates but doesn’t require low latency if the accesses are for multiple threads.
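The interleaving figures quoted above can be checked with a few lines of arithmetic. The sketch assumes each bank stays busy for a fixed number of bus clocks and that accesses are spread evenly over the banks in flight; the function name is hypothetical.

```python
def access_interval_ns(bus_mhz, bank_busy_clocks, banks_in_flight):
    """Average time between completed accesses when banks are interleaved."""
    clock_ns = 1000.0 / bus_mhz                  # 400 MHz -> 2.5 ns per clock
    bank_cycle_ns = bank_busy_clocks * clock_ns  # 8 clocks -> 20 ns window
    return bank_cycle_ns / banks_in_flight

interleaved = access_interval_ns(400, 8, 8)  # all 8 banks kept in flight
single_bank = access_interval_ns(400, 8, 1)  # only one bank in flight
print(interleaved, single_bank)              # 2.5 20.0

# Compared with the 60 ns single-bank period quoted for regular DRAM:
print(60.0 / interleaved)                    # 24.0
```

The 20ns window and the 2.5ns rate in the comment are the same fact seen two ways: one bank cycle divided across eight overlapping accesses.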

There is bank contention and also the hash collision penalty, but those can be managed so that 2/3 of the accesses are useful. I am modelling the project on a regular PC to see what an OS and apps would be like to write for it, so in effect the hardware MMU C model is also a private software memory management package. It’s also worth looking up persistent storage.
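A software model of such a hashed mapping might look roughly like the sketch below. It is not the author's MMU model: the class name, the use of Python's built-in `hash` as the mixing function, and the re-probe policy are all assumptions made for illustration. Each extra probe stands in for one wasted memory access, which is the collision penalty mentioned above.

```python
class HashedMMU:
    """Toy model: map (object_id, index) to a physical slot by hashing."""

    def __init__(self, num_slots, max_probes=4):
        self.slots = [None] * num_slots  # each slot holds its owner's tag
        self.max_probes = max_probes

    def _slot(self, object_id, index, probe):
        # Hypothetical mixing function; hardware would use a cheap XOR/shift hash.
        return hash((object_id, index, probe)) % len(self.slots)

    def translate(self, object_id, index, allocate=False):
        """Return (slot, probes_used), or None if the pair is unmapped."""
        key = (object_id, index)
        for probe in range(self.max_probes):
            s = self._slot(object_id, index, probe)
            if self.slots[s] == key:       # tag match: translation found
                return s, probe + 1
            if self.slots[s] is None:      # empty slot ends the search
                if allocate:
                    self.slots[s] = key
                    return s, probe + 1
                return None
            # Otherwise the slot belongs to someone else: collision, re-probe.
        if allocate:
            raise MemoryError("no free slot within the probe limit")
        return None

mmu = HashedMMU(num_slots=1024)
slot, probes = mmu.translate(7, 3, allocate=True)
print(mmu.translate(7, 3) == (slot, probes))  # True
print(mmu.translate(7, 4))                    # None
```

Because the tag is stored with the slot, any (object ID, index) pair can land anywhere in the table, which is what makes the scheme behave almost fully associatively without a CAM.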

I am really not the first or last to get into this; people 20-30 years before me already did some of this, but they never had access to RLDRAM and FPGAs, so it never really took off. Any rational communications or DSP guy would do it this way, but you do have to deal with 30+ threads per MMU to get the latency well hidden.

I will probably be taking this thesis to my grave though before anyone else builds such a machine.

2006-10-29 7:02 pm

ParaMouthBalls
As with most other chips, they should make add-on CPUs that connect over USB or PCI to give notebooks more power. Notebook PCs aren’t supported with anything as it is.

2006-10-29 7:29 pm

case
The article is a good overview of present and future CPU development. Makes me wonder at what point the complexity of multi-core CPUs will outdistance the skills of current programmers and the languages they use.

2006-10-29 9:35 pm

cerbie
That is an excellent question. In theory, many VM-based languages allow for easier scaling across CPUs, but then you need to get tons of use out of them (happening with Java and .NET), get the VM and the compiler in front of it working well on parallel tasks, and figure out ways to further encourage parallel work.

I think VIA’s solution is a good one. Take things that are expensive in CPU time, or at least done often, and implement them in hardware.

2006-10-29 10:12 pm

someone
VM is not enough. We also need to change the mindset of programmers and encourage multithreading.

2006-10-29 10:34 pm

Javier O. Augusto
I tell you what’s after Multi-Core..

REAL OPERATING SYSTEMS!!!

peace!

2006-10-29 11:43 pm

Phloptical
What’s after the new multi-core craze (see: hype)? Maybe some apps that can actually utilize them… and no, I don’t mean an OS as an app.

2006-10-30 7:28 am

Marcellus
I admit I only skimmed through the article, but while I saw a note about RISC vs CISC, there was nothing about EPIC in there, which I find odd and disappointing.

Is that maybe because EPIC is currently harder to program because compilers are not advanced enough to utilize such a processor fully?

Or is it the common “Itanic sucks” opinions?

Just as current CPUs are more of a RISC/CISC amalgam, I expect future CPUs to be a RISC/CISC/EPIC mix.

2006-10-30 9:58 am

renox
Sigh. The IS in RISC or CISC means ‘instruction set’; the external instruction set of x86 has not changed much, so it’s still CISC.

As the article said, high-end solutions tend to be squeezed out of the market, so the future ISA will probably be x86 + an ISA for the GPU coprocessor.

As for the future GPU coprocessor, I don’t know what its ISA will be, probably one with explicit parallelism, but it may be an ‘invisible’ ISA: a hidden ISA, as Transmeta did, with its interface being a kind of shading language.