Building The Next Generation, Part 1: Hardware

After personal computers arrived in the 1970s, they went through a series of revolutionary changes delivered by a succession of different platforms. It’s been over a decade since we’ve seen anything truly revolutionary. Will we see a revolution again? I believe we could not only see a revolution again, we could build one today.

In this series I shall describe how, starting with existing hardware and software technologies, we could put together a new platform radically different from anything on the market. I for one firmly believe we can build a completely new platform today which is faster, friendlier, more robust, more secure and more advanced than anything else on the market today.

What is Revolutionary?

Very rarely did any of the new PC platforms introduce anything genuinely new. Rather, they added technologies which were already around in research or available in more expensive systems. Many of the technologies we think of as “modern” were first thought of decades ago.

Hardware threading (aka “HyperThreading”) is new in the desktop world, but it was first introduced by Seymour Cray [Cray] in the CDC 6600 in 1964, 40 years ago. Indeed, much of the architecture in modern microprocessors first appeared in Cray designs in the 1960s.

At the same time Douglas Engelbart [Mouse] and colleagues were working on technologies such as networking, video conferencing, windows, hyper-links and the mouse, all parts of the modern computing experience.

The new platforms of the 1980s took these technologies and combined them in ways never done before, creating something genuinely new and capable of feats previous systems couldn’t match.

Here are some of the personal computers and systems I consider revolutionary:

Apple I / II – 1977

They may not have been the first, but Steve Wozniak’s engineering skill combined with Steve Jobs’s marketing savvy brought the personal computer to the world’s attention.

Macintosh – 1984

The first mass-market computer with a GUI. It started with Jef Raskin’s vision of an easy-to-use, low-cost computer, but it changed radically under Steve Jobs’s direction before the final product shipped.

Amiga – 1985

Jay Miner combined video game hardware with a 68K processor, and that powerful hardware was then paired with a multitasking operating system with a GUI. It took a decade for the rest of the world to catch up.

Archimedes – 1987

British company Acorn developed its own RISC CPU, called the “Acorn RISC Machine” or ARM, and was the first to bring the technology to the low-priced desktop with the Archimedes. The ARM CPU now outsells x86 several times over, and modern desktop CPUs all follow RISC principles internally.

NeXT – 1988

Steve Jobs returned, this time with a workstation: he put a GUI on top of industrial-strength Unix and combined it with cutting-edge hardware. NeXT now lives on inside OS X.

BeOS – 1994

Be started with the desire to create an Amiga-like multimedia system. The hardware had multiple CPUs and DSPs, but that design died after AT&T decided to stop making its chips. The Be Operating System was years ahead of anything on the market, and many of its features have yet to make it into the mainstream.

It’s been a long time since we’ve seen anything revolutionary, but innovation hasn’t stopped altogether: there is one revolutionary platform due in the not too distant future.

200x – Sony/Toshiba/IBM Cell

The Cell is not yet available commercially, but the design described in the patent [Cell] combines a network of fast vector processors with a system for distributing computations.

When these platforms arrived, everything was done in-house, and I mean everything: hardware, casing, OS, applications, development environment and compiler. Nobody does all of that today, and nobody has since the BeBox of 1994, when Be Inc. had to create an entire system, from the OS core to the media player and the applets which ran on top.

Today things are very different on the desktop. Thanks to the popularity of Unix clones, especially Linux, there is a whole ecosystem of software, from kernels to codecs and applications to applets, which can be used in projects. If you want to create a new platform today, you need only pick, choose and customise.

A New Platform

I am going to describe how to build a new platform based on off-the-shelf parts and an existing open source OS. As the previous platforms have shown, by combining advanced existing technologies we can create something completely new.

Many of the ideas already exist, spread across existing platforms but never gathered in one place. Often the need for backwards compatibility prevents changes from being made to existing systems, so useful ideas, new or even old, don’t get added. Even though it is based on existing technology, a fresh start lets us make whatever changes we want, so we can take advantage of research and use new ideas.

Guiding Principles

“Things should be made as simple as possible, but not any simpler” – Albert Einstein

Software is complex, and the longer it exists the more complex it becomes. By starting again we can consider all the requirements and produce a design to fit them, rather than modifying an existing design, which is difficult and often leads to failure. So when we start the design or construction, it should be simple. Simplicity is a good thing: it may make designing more difficult, but the end result is easier to construct, easier to maintain and less prone to bugs. In the hardware world it’s also likely to be faster. Indeed, this is how Seymour Cray designed his machines as far back as the 1950s, and these machines later inspired the creation of RISC.


This system is going to be more than software. While it would be possible to design only an OS and still get many of the advantages, we would also miss a lot, especially in the form of performance enhancements. So we’ll start with the physical system: the hardware it shall use.

Hardware is changing. Processor manufacturers are hitting the limits of superscalar processors that can be mass-produced and cooled in a reasonable manner.

The solution they are switching to is single-chip, multi-core, multi-threading (“Mulcoth”) processors, where a number of CPU cores are built on a single die and each core can run multiple threads. The recently announced POWER5 CPU does this, and other manufacturers (Intel, HP, Sun, AMD, Motorola) will follow. Sun in particular is pursuing this strategy very aggressively: it plans to put 8 simple cores on a single chip, each running 4 threads simultaneously, for 32 threads in total. In the future I can see single-core, single-threaded CPUs becoming a thing of the past on the desktop.

Physical limitations will increasingly constrain how CPUs can be designed, forcing simpler designs [TISC]; increasing the number of CPU cores on a single chip may eventually be the only way to increase performance.

If your system can take advantage of parallelism, Mulcoth CPUs will bring a big performance advantage even if individual cores are slower than single-core solutions. In fact, slowing the cores down may actually boost performance, as lower-clocked cores can use smaller transistors, freeing up room on the die for more cache and additional cores. All modern processors are limited by memory: the more of it there is on chip, the faster they’ll run. Using low-clocked cores also makes low power consumption possible.
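The trade-off can be made concrete with Amdahl’s law, a standard model the article itself does not use. In the sketch below the core counts, the half-speed cores and the parallel fractions are illustrative assumptions, not figures for any real chip.

```python
def amdahl_speedup(parallel_fraction: float, cores: int, core_speed: float = 1.0) -> float:
    """Speedup relative to a single core of speed 1.0, by Amdahl's law.

    parallel_fraction: share of the work that can be spread across all cores (0 to 1).
    cores: number of identical cores.
    core_speed: speed of each core relative to the reference core (0.5 = half as fast).
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1 or core_speed <= 0:
        raise ValueError("need at least one core and a positive core speed")
    serial_time = (1.0 - parallel_fraction) / core_speed
    parallel_time = parallel_fraction / (core_speed * cores)
    return 1.0 / (serial_time + parallel_time)


# Illustrative comparison: one fast core vs eight cores running at half speed.
for p in (0.5, 0.9, 0.99):
    one_fast = amdahl_speedup(p, cores=1, core_speed=1.0)
    eight_slow = amdahl_speedup(p, cores=8, core_speed=0.5)
    print(f"{p:.0%} parallel: 1 fast core {one_fast:.2f}x, 8 half-speed cores {eight_slow:.2f}x")
```

With half the work serial, the eight half-speed cores reach only about 0.89x of the single fast core; at 90% and 99% parallel work they reach about 2.35x and 3.74x. The model ignores caches and memory, so it does not capture the extra-cache benefit mentioned above.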

If we want a new platform, it should take account of these changes and make use of them. Do it properly and we could have the fastest system on the market. One system perfectly suited to this sort of processor is BeOS: the entire system is heavily threaded and multitasks very well, so a Mulcoth chip would run BeOS like a dream. You can take even more advantage of multiple cores than BeOS does, but I’ll come back to that when I discuss the OS.

Mulcoth CPUs aren’t the only new technology on the way. FPGAs have long been predicted to appear in desktop systems but have yet to do so. Stream processors are another type of processor which will probably turn up some day.

Stream Processors

Stream processors are an advance on DSPs (Digital Signal Processors), which are processors designed specifically for compute-intensive applications.

Many DSP processes can be broken apart into a stream: a sequential series of algorithms. In many cases DSP problems can be further divided across multiple streams, and further divisions can be made within the algorithms, making them suitable for SIMD (Single Instruction, Multiple Data) processing.
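A minimal sketch of that decomposition, assuming nothing about any real stream processor’s API: each stream runs a fixed sequence of kernels, every kernel applies one operation across a whole block of data (the SIMD-friendly part), and the input is split into independent streams. The kernels `gain` and `clip` are hypothetical examples, and Python threads only stand in for parallel hardware units; they show the structure, not the speed.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence

# A kernel takes a block of samples and returns a new block.
Kernel = Callable[[List[float]], List[float]]


def gain(factor: float) -> Kernel:
    # Hypothetical kernel: the same multiply applied to every element (SIMD-friendly).
    return lambda block: [x * factor for x in block]


def clip(limit: float) -> Kernel:
    # Hypothetical kernel: clamp every element to [-limit, limit].
    return lambda block: [max(-limit, min(limit, x)) for x in block]


def run_stream(block: List[float], kernels: Sequence[Kernel]) -> List[float]:
    # One stream: a sequential series of kernels, each consuming the previous output.
    for kernel in kernels:
        block = kernel(block)
    return block


def split(data: List[float], streams: int) -> List[List[float]]:
    # Divide the input into independent contiguous chunks, one per stream.
    if not data:
        return []
    size = -(-len(data) // streams)  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def process(data: List[float], kernels: Sequence[Kernel], streams: int = 4) -> List[float]:
    if streams < 1:
        raise ValueError("need at least one stream")
    chunks = split(data, streams)
    # Threads stand in for parallel stream units; pool.map keeps the chunks in order.
    with ThreadPoolExecutor(max_workers=streams) as pool:
        results = list(pool.map(lambda chunk: run_stream(chunk, kernels), chunks))
    return [x for chunk in results for x in chunk]


samples = [0.1, -0.4, 0.9, -1.2, 0.3, 0.7, -0.8, 0.05]
print(process(samples, [gain(2.0), clip(1.0)]))
```

Because no stream depends on another, the chunks could be handed to separate processing units, and because each kernel does the same thing to every element, the inner loops are natural candidates for SIMD.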

Experimental parallel stream processors have been developed which take account of this divisibility and can process data at rates up to 100 times faster than even the most powerful desktop CPUs [Stream]. Additionally, within the algorithms data tends to be “local”, so these processors do not need to constantly access a high-bandwidth memory. This means their actual processing speed may be close to their theoretical peak, something very uncommon in general purpose processors.
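The effect of data locality can be estimated with the roofline model, which is an illustration chosen here rather than something from the article: attainable speed is the lower of the processor’s peak and its memory bandwidth multiplied by the work done per byte fetched. The 100 GFLOP/s peak and 10 GB/s bandwidth are hypothetical round numbers.

```python
def attainable_gflops(peak_gflops: float, bandwidth_gb_s: float, flops_per_byte: float) -> float:
    """Roofline estimate: speed is capped by compute or by memory traffic, whichever is lower."""
    return min(peak_gflops, bandwidth_gb_s * flops_per_byte)


# Hypothetical processor: 100 GFLOP/s peak, 10 GB/s to main memory.
PEAK_GFLOPS, BANDWIDTH_GB_S = 100.0, 10.0

cases = [
    ("fetches every operand from main memory", 0.5),  # 0.5 operations per byte fetched
    ("reuses data held in local memory", 20.0),       # 20 operations per byte fetched
]
for label, intensity in cases:
    perf = attainable_gflops(PEAK_GFLOPS, BANDWIDTH_GB_S, intensity)
    print(f"{label}: {perf:.1f} GFLOP/s ({100 * perf / PEAK_GFLOPS:.0f}% of peak)")
```

This prints 5.0 GFLOP/s (5% of peak) for the memory-bound case and 100.0 GFLOP/s (100% of peak) once data is reused locally, which is the gap between theoretical and actual speed described above.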

Custom processors such as 3D graphics processors offer very high performance but cannot be programmed to do other tasks. Shaders can be programmed, but this is still limited and difficult. Stream processors, on the other hand, are highly programmable, so many different kinds of operations are possible. As if to rub the CPU manufacturers’ noses in it, these processors also have low power requirements.

So I think we should build one of these into our new platform. But where do we get them? Sony’s new Cell processor [Cell] will allow this sort of processing. Each Cell has a number of cores, all of which access a high-speed on-chip memory, and these can be configured to process data as a stream. Cell processors will be made in vast numbers from the start and will also be sold to third parties, so they should be cheap, fast and available. You won’t want to run your OS on them, as they’re not designed for that, but for video, audio and other compute-intensive processing they will blow everything else into next week.


An FPGA (Field Programmable Gate Array) is a “customisable chip”: it provides the parts and you tell it what to assemble itself as. FPGAs are not as fast as full custom chips, but a modern full custom chip can cost $15 million or more just to develop.
Stream processors will be able to do many of the tasks an FPGA would usually do, but stream processors are best suited to, well, streams. Not everything is a stream.

There may be cases where a stream processor can do the job but the cumulative latency is too great; complex real-time audio or video processing are areas where this could be an issue.
As you can see, there are some areas where stream processors may be at a disadvantage due to their architecture. General purpose processors can do anything, but their performance is considerably lower than that of either a stream processor or an FPGA. In these cases the FPGA will provide a solution.
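A short worked example of the cumulative latency problem, using hypothetical numbers: a real-time audio system processing 128-sample buffers at 48 kHz has about 2.67 ms per buffer. Every stage below fits comfortably on its own, but a buffer passes through them one after another, so their latencies, plus an assumed 0.1 ms transfer between stages, add up.

```python
from typing import Sequence


def buffer_deadline_ms(buffer_samples: int, sample_rate_hz: int) -> float:
    # Time available to process one buffer before the next one arrives.
    return 1000.0 * buffer_samples / sample_rate_hz


def pipeline_latency_ms(stage_latencies_ms: Sequence[float], transfer_ms: float) -> float:
    # A buffer visits every stage in turn, so the latencies add up,
    # plus one (assumed) transfer between each pair of neighbouring stages.
    hops = max(len(stage_latencies_ms) - 1, 0)
    return sum(stage_latencies_ms) + hops * transfer_ms


deadline = buffer_deadline_ms(128, 48_000)  # roughly 2.67 ms
stages = [0.4, 0.6, 0.5, 0.7, 0.3]          # hypothetical per-stage latencies in ms
total = pipeline_latency_ms(stages, transfer_ms=0.1)
verdict = "OK" if total <= deadline else "too slow"
print(f"deadline {deadline:.2f} ms, pipeline {total:.2f} ms: {verdict}")
```

Here the pipeline totals about 2.9 ms against a 2.67 ms deadline, so it reports `too slow` even though no single stage takes more than 0.7 ms. Pipelining raises throughput, but it does not shorten the path any one buffer has to travel.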

I don’t know whether the FPGA would be used much at first, as FPGAs are difficult to design for, but they are cheap and free tools are available, so why not? Pre-programmed libraries, on the other hand, will be easy for any programmer to use.

Programming different CPUs

Having three different kinds of processor does leave us with a problem: how do we program them?

Computer companies have attempted to produce systems which accelerate functions by adding a DSP, but none of these projects have lasted. The original BeBox design was based on two CPUs and three DSPs, but it was very difficult to program. Commodore had machines with DSPs in the works but never released them [CBM]. Apple did produce machines with a DSP, but the DSP was dropped when the PowerPC CPUs were released.

Since then DSP technology has been incorporated into general purpose CPUs in the form of vector extensions such as SSE and AltiVec. These still require specialist programming, though.

However, just because something is difficult doesn’t mean it can’t be solved, or at least made easier. There is indeed a system which could solve this problem, but to find it we’ll have to go to Russia…

The Russian computer manufacturer Elbrus [Elbrus] has designed a technique which allows it to produce an optimal binary for different versions of a CPU, even if the CPU changes. It does this using multi-stage compilation: a partially compiled file is produced and shipped, and when the program is first executed it is compiled and optimised for the system it is running on. The final-stage compiler is specific to the processor, so the programmer does not need to worry about producing different binaries for different versions of the CPU. This technique is not a million miles away from the “code morphing” method used by Transmeta, and indeed Elbrus has long been rumoured to have been the inspiration for it.
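A toy model of the multi-stage idea, written for illustration only and not a description of how Elbrus or Transmeta actually work: the program ships in a portable intermediate form (here, a list of hypothetical `add`/`mul` operations), and on first use a processor-specific final stage turns it into executable code, which is cached for later runs. The backend table and the choice of which machine gets the "optimised" path are assumptions.

```python
import platform
from typing import Callable, Dict, List, Tuple

# Portable, partly compiled program: a list of (operation, constant) pairs.
Program = List[Tuple[str, float]]
Native = Callable[[float], float]


def generic_backend(program: Program) -> Native:
    # Plain translation: apply each operation in turn at run time.
    ops = {"add": lambda x, c: x + c, "mul": lambda x, c: x * c}
    for op, _ in program:
        if op not in ops:
            raise ValueError(f"unknown operation {op!r}")
    steps = [(ops[op], c) for op, c in program]

    def run(x: float) -> float:
        for fn, c in steps:
            x = fn(x, c)
        return x
    return run


def folding_backend(program: Program) -> Native:
    # "Optimised" translation: fold the whole chain into one multiply-add,
    # standing in for processor-specific optimisation.
    scale, offset = 1.0, 0.0
    for op, c in program:
        if op == "add":
            offset += c
        elif op == "mul":
            scale, offset = scale * c, offset * c
        else:
            raise ValueError(f"unknown operation {op!r}")
    return lambda x: x * scale + offset


# Hypothetical final-stage compilers, keyed by machine name; others get the generic one.
BACKENDS: Dict[str, Callable[[Program], Native]] = {"x86_64": folding_backend}
_cache: Dict[tuple, Native] = {}


def load(program: Program, machine: str = "") -> Native:
    """Compile for this machine on first use, then reuse the cached result."""
    machine = machine or platform.machine()
    key = (tuple(program), machine)
    if key not in _cache:
        _cache[key] = BACKENDS.get(machine, generic_backend)(program)
    return _cache[key]


shipped: Program = [("mul", 2.0), ("add", 3.0), ("mul", 0.5)]
print(load(shipped, "x86_64")(10.0), load(shipped, "ppc")(10.0))
```

Both targets compute the same result (11.5 for an input of 10.0); only the final translation differs, which is the property that lets one shipped file serve different CPUs.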

This technique will not be a magic bullet, but it will certainly help. When you install a program, the compiler could produce a binary for the general purpose CPU and then search for areas which are appropriate for stream processing or which could run on the Cell processor. I think the developer will need to assist this process by marking sections of code, but I expect it could eventually be automated; auto-vectorising compilers have been doing exactly this sort of thing for decades.
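One way the developer’s hints could look, as a sketch under assumptions: a hypothetical `stream_candidate` decorator marks a function as suitable for the stream processor, an install-time pass lists the marked functions, and each call goes to an accelerated version if one has been registered, falling back to the ordinary CPU code otherwise. The registry and all names here are illustrative, not an existing API.

```python
import functools
from typing import Callable, Dict, List

# Hypothetical registry an installer would fill if it found a usable stream processor.
# Left empty here, which means "no accelerator present".
ACCELERATED: Dict[str, Callable] = {}


def stream_candidate(fn: Callable) -> Callable:
    """Developer hint: this function looks suitable for the stream processor."""
    @functools.wraps(fn)
    def dispatch(*args, **kwargs):
        # Use an accelerated version if one was registered, else the CPU version.
        impl = ACCELERATED.get(fn.__name__, fn)
        return impl(*args, **kwargs)
    dispatch.is_stream_candidate = True
    return dispatch


def find_candidates(functions: Dict[str, Callable]) -> List[str]:
    # What an install-time compiler pass might do: collect the marked sections.
    return sorted(name for name, f in functions.items()
                  if getattr(f, "is_stream_candidate", False))


@stream_candidate
def scale_samples(samples: List[float], factor: float) -> List[float]:
    return [s * factor for s in samples]


def parse_header(text: str) -> str:
    # Ordinary control code: not marked, so it always stays on the CPU.
    return text.split(":", 1)[0]


print(find_candidates({"scale_samples": scale_samples, "parse_header": parse_header}))
print(scale_samples([1.0, 2.0], 3.0))
```

Unmarked code such as `parse_header` is never considered, and if no accelerator is present the program still runs unchanged on the general purpose CPU.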

Programming the FPGA is a more complex affair, although tools do exist to assist. I expect FPGA development on our system will be somewhat immature, and FPGAs will require specialist programming skills for some time to come.

Proprietary vs Off-the-Shelf Hardware

In the 1980s, designing your own hardware meant you could gain a real advantage over other manufacturers. The original Amiga, with its custom chipset, was ahead of PCs for many years because of this. Today, however, designing custom chips is prohibitively expensive and best left to companies who specialise in that area or can at least afford it.

Building a custom board costs considerably less, but then you have an army of PC motherboards to fight against. However, if you want to produce something different hardware-wise, you really don’t have much choice. The downside is that other manufacturers can catch up and go straight past: your advantage one day can be your disadvantage the next.

The OS must be designed to abstract the hardware from the beginning; it must not rely completely on specific parts or combinations of parts. The Elbrus technique can get around internal changes in processors, but not chip-level dependencies in the OS.
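A sketch of what abstracting from the beginning might mean, with every class and task name hypothetical: applications ask the OS for a task to be run, the OS picks the first registered device that supports it, and the general purpose CPU is always registered last as the fallback. A future model that drops or replaces the stream unit then changes which device is picked, not whether the program works.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, List

Kernel = Callable[[List[float]], List[float]]

# Hypothetical task library. In this toy model every device runs the same code;
# on real hardware each device would have its own implementation.
TASKS: Dict[str, Kernel] = {
    "double": lambda xs: [2 * x for x in xs],
    "negate": lambda xs: [-x for x in xs],
}


class ComputeDevice(ABC):
    """What applications see: a capability, not a particular chip."""
    name = "abstract"

    @abstractmethod
    def supports(self, task: str) -> bool:
        ...

    def run(self, task: str, data: List[float]) -> List[float]:
        return TASKS[task](data)


class StreamUnit(ComputeDevice):
    name = "stream"

    def supports(self, task: str) -> bool:
        return task == "double"  # hypothetical: only some tasks have been ported


class GeneralCPU(ComputeDevice):
    name = "cpu"

    def supports(self, task: str) -> bool:
        return task in TASKS  # the CPU can run anything, just more slowly


class DeviceManager:
    def __init__(self) -> None:
        self._devices: List[ComputeDevice] = []

    def register(self, device: ComputeDevice) -> None:
        # Register in order of preference, with the general purpose CPU last.
        self._devices.append(device)

    def pick(self, task: str) -> ComputeDevice:
        for device in self._devices:
            if device.supports(task):
                return device
        raise LookupError(f"no device can run {task!r}")


manager = DeviceManager()
manager.register(StreamUnit())
manager.register(GeneralCPU())
for task in ("double", "negate"):
    device = manager.pick(task)
    print(task, "->", device.name, device.run(task, [1.0, -2.0]))
```

This prints `double -> stream [2.0, -4.0]` and `negate -> cpu [-1.0, 2.0]`; a machine without the stream unit would run both tasks on the CPU with no change to the calling code.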


So, we have some pretty radical hardware which has even the fastest PCs as a small snack between meals. What sort of software are we going to run on this beast? What sort of Operating System will it run?

Before we can think of applications we’ll need an OS. It will be based on an open source OS but heavily modified; in part 2 I shall explain which OS I would base it on, the changes to be made and why.


[Cray] Many of the techniques used in today’s microprocessors were pioneered by Seymour Cray 30-40 years ago; there are a couple of fascinating interviews with him.

[Mouse] GUIs and many other “modern” concepts were developed in the 1960s. See: some of the technologies developed; an interview with Douglas Engelbart; and his workstation, from 1964-1966.

[TISC] The Incredible Shrinking CPU: if CPUs are to keep getting faster, they are going to have to get a lot simpler.

[Stream] Stream Processors.

[Cell] Patent application for Sony’s Cell processor. Note: the diagrams appear to be in an IE-only HTML variant.

[CBM] The Amiga A3000+ would have had a DSP.

[Elbrus] The Elbrus Technique. Intel has recently done a deal with Elbrus and taken on many of its engineers (press release, in Russian).

Copyright © Nicholas Blachford July 2004

About the Author:
Nicholas Blachford is a 33-year-old British expat who lives in Paris but doesn’t speak French (yet). He is interested in various geeky subjects (hardware, software, photography) and all sorts of other things, especially those involving advanced technologies. He is not currently working.

If you would like to see your thoughts or experiences with technology published, please consider writing an article for OSNews.

