Building The Next Generation, Part 1: Hardware

Guest post by Nicholas Blachford 2004-07-13 Hardware 32 Comments

After personal computers arrived in the 1970s, they went through a series of revolutionary changes, each delivered by a different platform. It has been over a decade since we have seen anything truly revolutionary. Will we see a revolution again? I believe we could not only see one, we could build it today.

In this series I shall describe how, starting with existing hardware and software technologies, we could put together a new platform radically different from anything on the market. I firmly believe we can build a completely new platform today which is faster, friendlier, more robust, more secure and more advanced than anything else on the market.

What is Revolutionary?

Very rarely did any of the new PC platforms introduce anything genuinely new. Rather, they adopted technologies which already existed in research labs or in more expensive systems. Many of the technologies we think of as “modern” were first thought of decades ago.

Hardware threading (aka “HyperThreading”) is new in the desktop world, but Seymour Cray [Cray] first introduced it in the CDC 6600 in 1964, 40 years ago. Indeed, much of the architecture in modern microprocessors first appeared in Cray designs in the 1960s.

At the same time, Douglas Engelbart [Mouse] and colleagues were working on technologies such as networking, video conferencing, windows, hyperlinks and the mouse, all parts of the modern computing experience.

The new platforms of the 1980s took these technologies and combined them in ways never done before, creating machines capable of feats previous systems could not match.

Here are some of the personal computers and systems I consider revolutionary:

Apple I / II – 1977

They may not have been the first, but Steve Wozniak’s engineering skill combined with Steve Jobs’s marketing savvy brought the personal computer to the world’s attention.

Macintosh – 1984

The first mass market computer with a GUI. It started with Jef Raskin’s vision for an easy-to-use, low-cost computer but changed radically under Steve Jobs’s direction in the final product.

Amiga – 1985

Jay Miner combined video game hardware with a 68K processor, and that powerful hardware was then paired with an operating system offering a GUI and multitasking. It took a decade for the rest of the world to catch up.

Archimedes – 1987

British company Acorn developed its own RISC CPU, the “Acorn RISC Machine” or ARM, and was the first to bring the technology to the low-priced desktop in the Archimedes. The ARM CPU now outsells x86 several times over, and today’s desktop CPUs largely follow RISC principles internally.

NeXT – 1988

Steve Jobs came back again, this time with a workstation: he put a GUI on top of industrial-strength Unix and combined it with cutting-edge hardware. NeXT now lives on inside OS X.

BeOS – 1994

They started with the desire to create an Amiga-like multimedia system. The original hardware design had multiple CPUs and DSPs, but it was abandoned after AT&T decided to stop making the chips it depended on. The Be Operating System was years ahead of anything on the market, and many of its features have yet to make it to the mainstream.

It has been a long time since we have seen anything revolutionary, but innovation has not stopped altogether: there is one revolutionary platform due in the not too distant future.

200x – Sony/Toshiba/IBM Cell

Not yet commercially available, the Cell project described in the patent [Cell] combines a network of fast vector processors with a system for distributing computations across them.

When these platforms arrived, everything was done in-house, and I mean everything: hardware, casing, OS, applications, development environment and compiler. Nobody does all of that today, and nobody has since the BeBox, when Be Inc. had to create an entire system from the OS core to the media player and applets which ran on top.

Today things on the desktop are very different. Thanks to the popularity of Unix clones, especially Linux, there is a whole ecosystem of software, from kernels to codecs and applications to applets, which can be used in projects. If you wanted to create a new platform today, you would need only to pick, choose and customise.

A New Platform

I am going to describe how to build a new platform, but one based on off-the-shelf parts and an existing open source OS. As the previous platforms have already shown, by combining advanced existing technologies we can create something completely new.

Many of the ideas already exist, but they are spread across different platforms rather than gathered in one place. The need for backwards compatibility often prevents changes from being made to existing systems, so useful ideas, new or even old, don’t get added. Even though our platform is based on existing technology, a fresh start allows any desired change to be made, so we can take advantage of research and use new ideas.

Guiding Principles

“Things should be made as simple as possible, but not any simpler” – Albert Einstein

Software is complex, and the longer it exists the more complex it becomes. By starting again we can consider all the requirements and produce a design to fit them, rather than modifying an existing design, which is difficult and often leads to failure. So, whether we are designing or constructing, we should keep things simple. Simplicity may make designing more difficult, but the end result is easier to construct, easier to maintain and less prone to bugs. In the hardware world it is also likely to be faster; indeed, this is how Seymour Cray designed his machines as far back as the 1950s, and those machines later inspired the creation of RISC.

Hardware

This system is going to be more than software. While it would be possible to design only an OS and get many of the advantages, you would also miss a lot, especially in the form of performance enhancements. So we’ll start with the physical system: the hardware it shall use.

Hardware is changing. Processor manufacturers are hitting the limits of superscalar processors that can be mass produced and cooled in a reasonable manner.

The solution they are switching to is single-chip multi-core multi-threading (“Mulcoth”) processors, in which a number of CPU cores are built on a single die and each core can run multiple threads. The recently announced POWER5 CPU does this, and other manufacturers (Intel, HP, Sun, AMD, Motorola) will follow. Sun in particular is pursuing this strategy very aggressively: it plans to put 8 simple cores on a single chip, each running 4 threads simultaneously. I expect single-core, single-threaded CPUs to become a thing of the past on the desktop.

In the future, physical limitations will increasingly constrain how CPUs can be designed, forcing simpler designs [TISC]. Increasing the number of CPU cores on a single chip may eventually be the only way to increase performance.

If your system can take advantage of parallelism, Mulcoth CPUs will bring a big performance advantage even if individual cores are slower than single-core solutions. In fact, slowing the cores down may actually boost performance, as lower-clocked cores can use smaller transistors, freeing up room on the die for more cache and additional cores. All modern processors are limited by memory, and the more of it there is on chip, the faster they run. Low-clocked cores also make low power consumption possible.
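To put rough numbers on this trade-off, here is a minimal Python sketch based on Amdahl’s law, a standard model the article itself does not cite. The function name, the parallel fractions and the choice of eight half-speed cores are illustrative assumptions, not measurements of any real chip.

```python
def amdahl_speed(parallel_fraction: float, cores: int, core_speed: float) -> float:
    """Speed of a chip relative to a single core of speed 1.0 (Amdahl's law).

    The serial part runs on one core; the parallel part is spread evenly
    over all cores. core_speed scales every core (0.5 = half speed).
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    serial = 1.0 - parallel_fraction
    run_time = (serial + parallel_fraction / cores) / core_speed
    return 1.0 / run_time


# Hypothetical comparison: one full-speed core vs eight half-speed cores.
for p in (0.5, 0.9, 0.99):
    one_fast = amdahl_speed(p, cores=1, core_speed=1.0)
    eight_slow = amdahl_speed(p, cores=8, core_speed=0.5)
    print(f"parallel={p:.2f}  1 fast core={one_fast:.2f}  8 slow cores={eight_slow:.2f}")
```

Under these assumptions the eight slower cores lose when half the work is serial (about 0.89x) but come out roughly 2.35x ahead at 90% parallel, which is why the “if your system can take advantage of parallelism” condition carries so much weight.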

If we want a new platform, it should take account of these changes and make use of them. Do it properly and we could have the fastest system on the market. One system perfectly suited to this sort of processor is BeOS: the entire system is heavily threaded and multitasks very well, so a Mulcoth chip would run BeOS like a dream. You can take even more advantage of multiple cores than BeOS does, but I’ll come back to that when I discuss the OS.

Mulcoth CPUs aren’t the only new technology on the way. FPGAs have long been predicted to appear in desktop systems but have yet to do so. Stream processors are another type of CPU which will probably turn up some day.

Stream Processors

Stream processors are an advancement on DSPs (Digital Signal Processors), which are processors designed specifically for compute-intensive applications.

Many DSP processes can be broken apart into a stream: a series of algorithms applied one after another. In many cases DSP problems can be further divided across multiple streams, and further divisions can be made within the algorithms, making them suitable for SIMD (Single Instruction Multiple Data) processing.
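As a rough illustration only, the Python sketch below models a stream as an ordered list of stages and cuts the incoming data into fixed-size chunks, so each stage applies one operation across a whole chunk; on SIMD hardware that inner step would be a single instruction. The stage names `gain` and `clip`, the chunk size and the sample values are hypothetical.

```python
from typing import Callable, Iterable, Iterator

Chunk = list[float]
Stage = Callable[[Chunk], Chunk]


def chunked(samples: Iterable[float], size: int) -> Iterator[Chunk]:
    """Cut the incoming stream into fixed-size chunks (the last may be short)."""
    chunk: Chunk = []
    for s in samples:
        chunk.append(s)
        if len(chunk) == size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk


def gain(factor: float) -> Stage:
    # The same operation on every element of a chunk: the SIMD-friendly part.
    return lambda chunk: [x * factor for x in chunk]


def clip(limit: float) -> Stage:
    return lambda chunk: [max(-limit, min(limit, x)) for x in chunk]


def run_stream(samples: Iterable[float], stages: list[Stage], size: int = 4) -> Iterator[float]:
    """Pass each chunk through every stage in order, then emit its samples."""
    for chunk in chunked(samples, size):
        for stage in stages:
            chunk = stage(chunk)
        yield from chunk


print(list(run_stream([0.1, 0.5, -0.9, 0.3, 0.7], [gain(2.0), clip(1.0)])))
# prints [0.2, 1.0, -1.0, 0.6, 1.0]
```

Because every stage only sees one chunk at a time, the working data stays small and local, which is the property the next paragraph relies on.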

Experimental parallel stream processors have been developed which exploit this divisibility and can process data up to 100 times faster than even the most powerful desktop CPUs [Stream]. Additionally, data within the algorithms tends to be “local”, so these processors do not need to constantly access high-bandwidth memory. This means their actual processing speed can approach their theoretical peak, something very uncommon in general purpose processors.

Custom processors such as 3D graphics processors are very high performance but cannot be programmed to do other tasks. Shaders can be programmed, but this is still limited and difficult. Stream processors, on the other hand, are highly programmable, so many different kinds of operations are possible. As if to rub the CPU manufacturers’ noses in it, these types of processors also have low power requirements.

So I think we can use one of these in our new platform. But where do we get them? Sony’s new Cell processor [Cell] will allow this sort of processing. Each Cell has a number of cores, all of which access a high-speed on-chip memory and can be configured to process data as a stream. Cell processors will be made in vast numbers from the get-go and will also be sold to third parties, so they should be cheap, fast and available. You won’t want to run your OS on them, as they’re not designed for that, but for video, audio and other compute-intensive processing they will blow everything else into next week.

FPGAs

An FPGA (Field Programmable Gate Array) is a “customisable chip”: it provides the parts and you tell it what to assemble itself as. FPGAs are not as fast as full custom chips, but modern full custom chips cost $15 million+ just to develop.

Stream processors will be able to do many of the tasks an FPGA would usually do, but stream processors are best suited to, well, streams. Not everything is a stream.

There may be cases where a stream processor can work but the cumulative latency is too great; complex real-time audio or video processing are areas where this could be an issue.

As you can see, there are some areas where stream processors may be at a disadvantage due to their architecture. General purpose processors can do anything, but their performance is considerably lower than that of either a stream processor or an FPGA. In these cases the FPGA will provide a solution.
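The cumulative latency problem can be made concrete with a little arithmetic. The Python sketch below assumes, purely for illustration, that every stage must receive a whole chunk before it starts, so each stage adds one chunk of buffering plus its own compute time; the stage count, chunk size, sample rate and compute times are made-up numbers.

```python
def pipeline_latency_ms(stage_compute_ms: list[float], chunk_samples: int, sample_rate_hz: int) -> float:
    """Worst-case end-to-end latency of a chunked stream pipeline, in ms.

    Assumes each stage waits for a full chunk before starting, so every
    stage contributes one chunk's worth of buffering plus its compute time.
    """
    buffer_ms = 1000.0 * chunk_samples / sample_rate_hz
    return sum(buffer_ms + compute for compute in stage_compute_ms)


# Hypothetical audio chain: 6 stages, 1 ms of compute each,
# 512-sample chunks at 48 kHz (about 10.7 ms of buffering per stage).
latency = pipeline_latency_ms([1.0] * 6, chunk_samples=512, sample_rate_hz=48_000)
print(f"{latency:.1f} ms")  # 70.0 ms
```

Smaller chunks shrink the buffering term, but every extra stage still adds its own share; hardware that processes sample by sample, which an FPGA design can be built to do, avoids most of that buffering.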

I don’t know if the FPGA would be used much at the beginning, as FPGAs are difficult to design for, but they are cheap and free tools are available, so why not? Pre-programmed libraries, on the other hand, will be easy for any programmer to use.

Programming Different CPUs

Having three different kinds of CPU does leave us with a problem: how do we program them?

Computer companies have attempted to produce systems which accelerated functions by adding a DSP, but none of these projects have lasted. The original BeBox design was based on two CPUs and three DSPs, but the system was very difficult to program. Commodore had machines with DSPs in the works but never released them [CBM]. Apple, NeXT and Atari did ship machines with a DSP, but Apple dropped its DSP when the PowerPC CPUs were released.

Since then, DSP technology has been incorporated into general purpose CPUs in the form of vector extensions such as SSE and AltiVec. These still require specialist programming, though.

However, just because something is difficult doesn’t mean it can’t be solved, or at least made easier. There is indeed a system which will solve this problem, but to find it we’ll have to go to Russia…

The Russian computer manufacturer Elbrus [Elbrus] has designed a technique which allows it to produce an optimised binary for different versions of a CPU, even if the CPU changes. It does this with multi-stage compilation: a partially compiled file is produced and shipped, and when the program is first executed it is compiled and optimised for the system it is running on. The final-stage compiler is specific to the processor, so the programmer does not need to worry about producing different binaries for different versions of the CPU. This technique is not a million miles away from the “code morphing” method used by Transmeta, and indeed Elbrus has long been rumoured to have been the inspiration for it.
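To make the idea concrete, here is a heavily simplified Python sketch of multi-stage compilation. It is not Elbrus’s actual toolchain: the stack-machine IR, the `Target` type and its `has_fma` flag are invented for illustration, and a Python closure stands in for native code. The portable IR is what would be shipped; the final stage runs on first execution, applies an optimisation only when the local CPU supports it, and caches the result so later runs skip compilation.

```python
from dataclasses import dataclass
from typing import Callable

# Portable intermediate form: what would be shipped to every machine.
# This toy stack-machine IR is invented for the illustration.
Instr = tuple  # e.g. ("load", "x"), ("push", 2.0), ("mul",), ("add",)
Program = tuple[Instr, ...]


@dataclass(frozen=True)
class Target:
    name: str
    has_fma: bool  # hypothetical capability of the local CPU


def optimise(ir: Program, target: Target) -> list[Instr]:
    """Target-specific pass: fuse mul+add into one fma if the CPU has one."""
    out: list[Instr] = []
    i = 0
    while i < len(ir):
        if target.has_fma and ir[i] == ("mul",) and i + 1 < len(ir) and ir[i + 1] == ("add",):
            out.append(("fma",))
            i += 2
        else:
            out.append(ir[i])
            i += 1
    return out


def codegen(code: list[Instr]) -> Callable[[dict[str, float]], float]:
    """Stand-in for native code generation: returns a closure that runs the program."""
    def run(env: dict[str, float]) -> float:
        stack: list[float] = []
        for op, *args in code:
            if op == "push":
                stack.append(args[0])
            elif op == "load":
                stack.append(env[args[0]])
            elif op in ("add", "mul"):
                b, a = stack.pop(), stack.pop()
                stack.append(a + b if op == "add" else a * b)
            elif op == "fma":  # c + a * b in one step
                b, a, c = stack.pop(), stack.pop(), stack.pop()
                stack.append(c + a * b)
            else:
                raise ValueError(f"unknown op: {op}")
        return stack.pop()
    return run


_cache: dict[tuple[Program, str], Callable[[dict[str, float]], float]] = {}
compile_count = 0


def load_program(ir: Program, target: Target) -> Callable[[dict[str, float]], float]:
    """Final stage: compile on first execution, reuse the cached result afterwards."""
    global compile_count
    key = (ir, target.name)
    if key not in _cache:
        compile_count += 1
        _cache[key] = codegen(optimise(ir, target))
    return _cache[key]


# c + a * b, shipped once as IR and finished on each machine.
ir: Program = (("load", "c"), ("load", "a"), ("load", "b"), ("mul",), ("add",))
env = {"a": 2.0, "b": 3.0, "c": 1.0}
print(load_program(ir, Target("fma-cpu", True))(env))     # 7.0
print(load_program(ir, Target("plain-cpu", False))(env))  # 7.0
```

Both machines get the same answer from the same shipped file, but each runs code shaped for its own CPU; that separation is the point of the technique.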

This technique will not be a magic bullet, but it will certainly help. When you install a program, the compiler could produce a binary for the general purpose CPU and then search for areas which are appropriate for stream processing or which could run on the Cell processor. I think this will initially need the developer to assist by marking sections of code, but I expect it could eventually become an automated process; auto-vectorising compilers have been doing exactly this sort of thing for decades.
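The developer-assisted step might look something like the following Python sketch, in which a decorator marks functions as candidates and an install-time planner routes them. The `stream_kernel` decorator, the `plan_install` function and the routing rule are hypothetical; a real system would also analyse the code itself rather than simply trust a flag.

```python
from typing import Callable

# Registry a (hypothetical) installer would scan for developer-marked code.
STREAM_KERNELS: dict[str, Callable] = {}


def stream_kernel(func: Callable) -> Callable:
    """Developer hint: this function is element-wise and safe to offload."""
    STREAM_KERNELS[func.__qualname__] = func
    func.offload_candidate = True  # the function still runs normally on the CPU
    return func


@stream_kernel
def brighten(pixels: list[int], amount: int) -> list[int]:
    return [min(255, p + amount) for p in pixels]


def parse_header(data: bytes) -> int:
    # Unmarked, branchy control code stays on the general purpose CPU.
    return int.from_bytes(data[:4], "big")


def plan_install(functions: list[Callable], has_stream_unit: bool) -> dict[str, str]:
    """Decide, per function, which processor the final-stage compiler targets."""
    plan = {}
    for f in functions:
        marked = getattr(f, "offload_candidate", False)
        plan[f.__qualname__] = "stream" if marked and has_stream_unit else "cpu"
    return plan


print(plan_install([brighten, parse_header], has_stream_unit=True))
# {'brighten': 'stream', 'parse_header': 'cpu'}
```

On a machine without a stream unit the same program simply plans everything for the CPU, so the marking costs nothing where it cannot be used.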

Programming the FPGA is a more complex affair, although tools do exist to assist. I expect FPGA development on our system will be somewhat immature and will require specialist programming skills for some time to come.

Proprietary vs Off-the-Shelf Hardware

In the 1980s, designing your own hardware meant you could gain a real advantage over other manufacturers. The original Amiga, with its custom chipset, was ahead of PCs for many years because of this. Today, however, designing custom chips is prohibitively expensive and best left to companies which specialise in that area, or can at least afford it.

Building a custom board costs considerably less, but then you have an army of PC motherboards to fight against. However, if you want to produce something different hardware-wise, you really don’t have much choice. The downside is that other manufacturers can catch up and go straight past: your advantage one day can be your disadvantage the next.

The OS must be designed around hardware abstraction from the beginning; it must not rely entirely on specific parts or combinations of parts. The Elbrus technique can get around internal changes in processors, but not OS dependencies on particular chips.
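One common way to get that independence is a backend interface with a guaranteed fallback. The Python sketch below is a minimal illustration of that pattern, not a design from the article: the `Backend`, `CpuBackend` and `StreamUnitBackend` classes, the priority rule and the single `scale` operation are all hypothetical.

```python
from abc import ABC, abstractmethod


class Backend(ABC):
    """What the OS depends on: a capability, not a particular chip."""

    name = "abstract"
    priority = 0  # higher wins when several backends are usable

    @abstractmethod
    def available(self) -> bool: ...

    @abstractmethod
    def scale(self, data: list[float], k: float) -> list[float]: ...


class CpuBackend(Backend):
    name, priority = "cpu", 0

    def available(self) -> bool:
        return True  # always present, so there is always a fallback

    def scale(self, data: list[float], k: float) -> list[float]:
        return [x * k for x in data]


class StreamUnitBackend(Backend):
    """Hypothetical stand-in for a Cell-style or FPGA accelerator."""

    name, priority = "stream-unit", 10

    def __init__(self, present: bool) -> None:
        self.present = present

    def available(self) -> bool:
        return self.present

    def scale(self, data: list[float], k: float) -> list[float]:
        # A real driver would hand the buffer to the device here.
        return [x * k for x in data]


def pick_backend(backends: list[Backend]) -> Backend:
    """Choose the highest-priority backend that is actually present."""
    usable = [b for b in backends if b.available()]
    if not usable:
        raise RuntimeError("no usable backend")
    return max(usable, key=lambda b: b.priority)


for present in (True, False):
    backend = pick_backend([CpuBackend(), StreamUnitBackend(present)])
    print(backend.name, backend.scale([1.0, 2.0], 3.0))
# stream-unit [3.0, 6.0]
# cpu [3.0, 6.0]
```

Applications only ever call `scale`, so replacing or removing the accelerator changes which backend is picked, not the code that uses it.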

Conclusion

So, we have some pretty radical hardware which has even the fastest PCs as a small snack between meals. What sort of software are we going to run on this beast? What sort of Operating System will it run?

Before we can think of applications we’ll need an OS. It will be based on an open source OS but heavily modified; in Part 2 I shall explain which OS I would base it on, the changes to be made and why.

References

[Cray] Many of the techniques used in today’s microprocessors were pioneered by Seymour Cray 30-40 years ago. There are a couple of fascinating interviews with him here:

http://americanhistory.si.edu/csr/comphist/cray.htm

http://americanhistory.si.edu/csr/comphist/montic/cray.htm

[Mouse] GUIs and many other “modern” concepts were developed in the 1960s.

http://inventors.about.com/library/weekly/aa081898.htm

Some of the technologies developed

http://www.bootstrap.org/chronicle/pix/pix.html

Interview with Douglas Engelbart

http://americanhistory.si.edu/csr/comphist/englebar.htm

Check out his workstation – 1964-1966!

http://www.bootstrap.org/chronicle/pix/img0023.jpg

[TISC] The Incredible Shrinking CPU: if CPUs are to keep getting faster, they are going to have to get a lot simpler.

here

[Stream] Stream Processors

here

[Cell] Patent application for Sony’s Cell processor can be found here.

Note: Diagrams appear to be in an IE-only HTML variant.

[CBM] The Amiga A3000+ would have had a DSP.

http://amiga.emugaming.com/prototypes/a3000plus.html

[Elbrus] The Elbrus Technique

http://www.elbrus.ru/mcst/eng/e2k_arch.shtml

Intel has recently done a deal with Elbrus and got many of their engineers.

Press Release (in Russian)

About the Author:
Nicholas Blachford is a 33-year-old British ex-pat who lives in Paris but doesn’t speak French (yet). He is interested in various geeky subjects (hardware, software, photography) and all sorts of other things, especially those involving advanced technologies. He is not currently working.


32 Comments

2004-07-13 10:35 pm

Anonymous
>> When these platforms arrived everything was done in-house, and I mean everything: Hardware, Casing, OS, Applications, Development environment and Compiler. Nobody does all of that today and nobody has since 1994s BeBox when Be inc. had to create an entire system from the OS core to the media player and app-lets which ran on top.

Software wise, I’d say that SkyOS fits your bill of “in-house”. Perhaps you should have done a little more research on some of this?

In general, this is a pretty good article. It’ll be worth reading the “Part II”.
2004-07-13 11:01 pm

Anonymous
Multi-core, multithreaded CPUs are already on the roadmaps.

GUIs will be rendered on the graphics card, like Quartz and Avalon.

USB flash drives have already pretty much replaced the floppy drive.

PCI Express allows for 10GigE network cards.

As far as software, the way I use a computer today has not really changed since I used Win95.

I don’t think a computer will ever be able to “think for me” without just being in my way.
2004-07-13 11:15 pm

Anonymous
Where does SkyOS build hardware, Casing or the Compiler in house?
2004-07-13 11:58 pm

Anonymous
First, kudos on the references; I would like to see that more often in tech journalism.

The article is interesting, even if it has some errors: the Atari Falcon did feature a DSP. Forgetting the Russian Setun computer and the CBM SID chip is criminal.

I think the only way to make things move is to be a threat to the big guys. This means making PCI (or whatever the next standard is) surface-mount cards that can offload part of the OS. I could really see a “Haiku detected your Media Kit hardware card and will use it” message at boot time, just like old Psygnosis Amiga games did with extended memory cards for the A500.

Another obstacle is laptops: do-it-yourself hardware innovation is pretty much confined to big, noisy, open beige boxes.

Another obstacle is the small number of hardware people in engineering school, which is bad. I don’t even do surface mount myself anymore, and I know very few people who can do it at home.

Another problem is standards. A do-it-yourself standard can pretty much only be imposed through a known institution or foundation, for example as a contest. Or perhaps by carving porn scenes on top of each IC.
2004-07-14 12:20 am

Anonymous
Most of today’s computer systems have one thing in common: they are programmed to do something with data. There is data, and one or more CPUs eat through streams of instructions telling them what to do with that data.

Example:

(1) take one data element

(2) perform some operation on that data

(3) place modified data back into storage

(4) repeat (1) to (3) over and over, going through all data

Call them instruction-stream centered. In my vision the next big step: there is data, and all the programming does, is re-configure the way in which streams of data are processed. Call that: data-stream centered. The FPGA is a good example of how this works.

Example:

(1) configure what path data comes in, what operation is done, and along which path data should go out

(2) wait until all data has passed through the hardware

This is much more energy-efficient, and hardware to do this exists (and is used) today. In my view the real problem is not building this hardware but the software to control it. Think operating systems and programming language design. That is way more difficult than putting the ICs together.
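The two models can be contrasted in a few lines of Python. This is only an analogy (both versions still execute instructions on an ordinary CPU), and the slot-numbered wiring format used by the hypothetical `configure` function is invented for the example. The point is where the “program” lives: in a loop over the data, or in a fixed network the data flows through.

```python
import operator
from typing import Callable, Iterable, Iterator


# Instruction-stream style: the program is a loop of fetch, operate, store.
def instruction_stream(a: list[float], b: list[float], c: list[float]) -> list[float]:
    out = []
    for i in range(len(a)):            # (1) take one data element
        value = (a[i] + b[i]) * c[i]   # (2) perform an operation on it
        out.append(value)              # (3) store it, then (4) repeat
    return out


# Data-stream style: "programming" only fixes the data paths, once.
Node = tuple[Callable[[float, float], float], int, int]  # (operation, input slot, input slot)


def configure(nodes: list[Node]) -> Callable[[Iterable[tuple[float, ...]]], Iterator[float]]:
    """Step (1): wire the paths; nothing is computed yet."""
    wiring = tuple(nodes)  # frozen, a little like a loaded FPGA configuration

    def circuit(samples: Iterable[tuple[float, ...]]) -> Iterator[float]:
        # Step (2): data just flows through the fixed paths.
        for sample in samples:
            slots = list(sample)  # input values occupy the first slots
            for op, src1, src2 in wiring:
                slots.append(op(slots[src1], slots[src2]))
            yield slots[-1]       # the last node is the output
    return circuit


# Inputs a, b, c are slots 0, 1, 2; the add fills slot 3; the multiply reads it.
circuit = configure([(operator.add, 0, 1), (operator.mul, 3, 2)])
a, b, c = [1.0, 2.0], [3.0, 4.0], [10.0, 0.5]
print(instruction_stream(a, b, c))   # [40.0, 3.0]
print(list(circuit(zip(a, b, c))))   # [40.0, 3.0]
```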

I am convinced, though, that the simpler the essentials become, the more useful they are in building complex systems. So the way to go: RISC CPUs, (lightweight) kernel languages, simple virtual machines, that sort of thing.

BTW, that Russian company mentioned is doing nothing new; it’s called bytecode and JIT (Just In Time) compiling. Several modern languages, like Java, already use those techniques.
2004-07-14 12:28 am

Anonymous
Of course they wrote an OS. Granted, the idea of the Mac OS was gleaned from Xerox, but Apple has written an OS. It depends on whether you consider OS X Apple-made, NeXT-made or even Berkeley-made.

And for those that have never heard of it, Apple also wrote DOS FROM SCRATCH IN HOUSE

(before anyone flames, DOS is not exclusive to Microsoft; many companies have made a ‘Disk Operating System’)
2004-07-14 12:31 am

Anonymous
http://www.fact-index.com/a/ap/apple_dos.html

just in case anyone wants to read an abridged version
2004-07-14 12:50 am

Anonymous
Nick alludes to the power of FPGAs but they are not as hard to program as he suggested.

One way to design FPGA hardware is to write code in a more familiar language such as C with support for parallel (PAR) communication added, like Handel-C, which was based on the Occam language that originally ran on the Transputer chip of the 1980s.
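For readers who have not met Occam, the Python sketch below imitates its style: independent processes joined by channels and started together with a PAR. It is not Handel-C and compiles to no hardware; `Channel`, `par` and the `DONE` end-of-stream marker are hypothetical helpers, a one-slot queue only approximates Occam’s unbuffered channels, and Python threads illustrate the structure rather than real parallel speed.

```python
import queue
import threading
from typing import Callable

DONE = object()  # end-of-stream marker, our own convention


class Channel:
    """Point-to-point channel, approximated with a one-slot blocking queue."""

    def __init__(self) -> None:
        self._q: queue.Queue = queue.Queue(maxsize=1)

    def send(self, value: object) -> None:
        self._q.put(value)  # blocks while the slot is full

    def recv(self) -> object:
        return self._q.get()  # blocks until a value arrives


def par(*processes: Callable[[], None]) -> None:
    """Rough analogue of Occam's PAR: run all processes, wait for all to finish."""
    threads = [threading.Thread(target=p) for p in processes]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


def producer(out: Channel, n: int) -> None:
    for i in range(n):
        out.send(i)
    out.send(DONE)


def squarer(inp: Channel, out: Channel) -> None:
    while (x := inp.recv()) is not DONE:
        out.send(x * x)
    out.send(DONE)


def collector(inp: Channel, results: list) -> None:
    while (x := inp.recv()) is not DONE:
        results.append(x)


a, b = Channel(), Channel()
results: list = []
par(lambda: producer(a, 5), lambda: squarer(a, b), lambda: collector(b, results))
print(results)  # [0, 1, 4, 9, 16]
```

Each process only knows its own channels, which is what made it easy to spread Transputer programs over as many chips as were linked together.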

Indeed, the Transputer, which he neglected to mention, was the first commercial CPU chip that could easily be plugged together in any number using simple links. Inmos never got as far as integrating two CPUs on one chip (after all, they were built 20 years ago), but the new generation of multi-core CPUs does not offer much over the original Transputer except, of course, raw speed and a modern process. Still, it’s nice to see massively parallel CPUs becoming mainstream again.

Of course, a few of us are building CPUs out of FPGAs. They are not as fast as the 64-bit CPUs from AMD, but they are actually cheaper per MIPS, because they fit into low-cost FPGAs at a few dollars per copy. Indeed, one large FPGA can theoretically hold anywhere from 1 to 100+ CPU instances depending on CPU complexity, but it will likely run hot.

And 10 CPUs running at 1/10 the clock can be more powerful than one regular CPU, PROVIDED you know how to program parallel apps. For slower CPUs, the same memory looks 10x faster to an FPGA CPU than to a 2 GHz machine that spends its time waiting.

Indeed, it’s possible to use much higher quality memory for an FPGA CPU than regular DDR: a slow FPGA CPU can use RLDRAM, which is a couple of times faster than DDR, making up for the much smaller caches that must be used.
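The claim about slow CPUs and memory can be checked with back-of-the-envelope arithmetic. The Python sketch below uses a deliberately crude in-order core model (one cycle per instruction plus a full stall on every cache miss); the 50 ns memory latency and 2% miss rate are assumed values, and it ignores the memory-bandwidth contention that ten cores sharing one memory would add.

```python
def core_throughput(clock_ghz: float, miss_rate: float, latency_ns: float) -> float:
    """Instructions per nanosecond for a crude in-order core model.

    Every instruction takes one cycle, and the given fraction of
    instructions additionally stalls for a full memory access.
    """
    ns_per_instruction = 1.0 / clock_ghz + miss_rate * latency_ns
    return 1.0 / ns_per_instruction


LATENCY_NS = 50.0  # assumed main-memory latency
MISS_RATE = 0.02   # assumed: 2% of instructions go to main memory

for clock_ghz in (2.0, 0.2):
    stall_cycles = LATENCY_NS * clock_ghz
    print(f"{clock_ghz:.1f} GHz core: one miss costs {stall_cycles:.0f} cycles")

one_fast = core_throughput(2.0, MISS_RATE, LATENCY_NS)
ten_slow = 10 * core_throughput(0.2, MISS_RATE, LATENCY_NS)  # perfect parallel split
print(f"1 x 2 GHz: {one_fast:.2f} instr/ns, 10 x 200 MHz: {ten_slow:.2f} instr/ns")
```

Under these assumptions a miss costs the fast core 100 cycles but the slow core only 10, and the ten slow cores come out about 2.5x ahead, provided, as the comment stresses, the work actually splits ten ways.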

Myself, I am creating a modern Transputer RISC ISA that would run BeOS like a dream, but I’ll have to leave that to others. Others are building more conventional RISCs, but those are not usually scalable.

JJ
2004-07-14 1:31 am

Anonymous
In many ways this was like reading the gear books for the good old pen-and-paper RPG Cyberpunk 2020. The talk of multi-job, multiprocessor units and so on gave me a flash of the b&w pencil drawings from them, for some reason…

An OS is nothing without something to run it on, and the original concept of an OS was a system that was around so you didn’t have to write hardware interface code every time. This is what the Linux kernel/OS is: it allows for hardware and filesystem access and nothing more. The rest is up to the software that interfaces with the OS, be it file system management tools, GUIs, web browsers and whatnot…

I wonder what would happen if the “drivers” were stored on firmware chips, so that when you turned the system on, the BIOS would gather these parts up and create a kind of in-memory OS from them. Then it would put a list of partitions with a boot indicator on screen, with a timeout defaulting to the one with the highest priority. Then it would hand everything over to the “OS”, which would fire up a boot program with only one job: read a boot script and start the stuff defined in there. This is much like what init does for Linux. The different parts of the “OS” would be running on the chips of the hardware, so when you fired up a GUI it would run entirely on the chip of the video card. Fire up an MP3 player or similar and it would run on the sound card chip, with a GUI part handed off to the video card for rendering.

I am no computer engineer, so I don’t know if this stuff could work at all, but I guess that Microsoft would be kicking up all kinds of FUD against it.
2004-07-14 2:06 am

Anonymous
The article mentions some good ideas on the third page. Here’s my take on them:

In my ideal platform, we’d dump the “jumble of binaries” model and go to a “code database” model. Instead of shipping programs as binaries, programs would be shipped in a low-level intermediate representation. At installation, the intermediate code would be submitted to a central controller, which would compile the IL to native code and store it in a cache. This central controller would sit at the lowest levels of the system, even below the bulk of the OS.

The spiffy part of all this is that it’d allow for all sorts of cool optimizations:

1) Code could be optimized for each processor.

2) Analysis could easily cross library boundaries, allowing optimizations like inlining of library code.

3) “Optimistic” optimizations could be applied, and backed out when the original assumptions that enabled them no longer applied.

4) IL code could be compiled via a safe compiler, which would allow us to get rid of memory protection between processes, and between the OS and user programs.

The performance benefits of this approach could be very significant, especially given the high cost of protection-level transitions on modern processors, and their large dependence on compile-time instruction scheduling and register allocation. This would also allow the underlying CPU architecture to become completely irrelevant. The only thing that would need to change when targeting a new CPU would be the code generator in the central controller.
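A toy version of that central controller, showing optimization 3 (optimistic assumptions that get backed out), might look like the Python sketch below. Everything here is hypothetical: `CodeDatabase`, its methods, and the use of a Python closure factory in place of real IL and native code. Binding the current library function directly into the compiled program stands in for cross-library inlining, and the controller records that assumption so it can discard the compiled code when the library changes.

```python
from typing import Callable

Fn = Callable[[float], float]


class CodeDatabase:
    """Toy central controller: caches compiled programs and remembers what
    each one assumed, so optimistic optimisations can be backed out."""

    def __init__(self) -> None:
        self.library: dict[str, Fn] = {}
        self.cache: dict[str, Fn] = {}
        self.assumes: dict[str, set[str]] = {}  # program -> library names bound into it
        self.compiles = 0

    def install_library(self, name: str, fn: Fn) -> None:
        self.library[name] = fn
        # The old version may have been "inlined" somewhere: drop that code.
        stale = [prog for prog, deps in self.assumes.items() if name in deps]
        for prog in stale:
            del self.cache[prog]
            del self.assumes[prog]

    def load(self, prog: str, il: Callable[[Fn], Fn], uses: str) -> Fn:
        """Return compiled code for prog, compiling it on a cache miss."""
        if prog not in self.cache:
            self.compiles += 1
            # Optimistic step: bind the current library function directly.
            self.cache[prog] = il(self.library[uses])
            self.assumes[prog] = {uses}
        return self.cache[prog]


def app_il(scale: Fn) -> Fn:
    """Stands in for the program's shipped IL: 'scale the input, then add 1'."""
    return lambda x: scale(x) + 1


db = CodeDatabase()
db.install_library("scale", lambda x: x * 2)
print(db.load("app", app_il, "scale")(10))    # 21
print(db.load("app", app_il, "scale")(10))    # 21, served from the cache
db.install_library("scale", lambda x: x * 3)  # library update backs out "app"
print(db.load("app", app_il, "scale")(10))    # 31
print(db.compiles)                            # 2
```

Only programs that actually depended on the changed library are recompiled, which keeps the cost of optimistic optimization proportional to what changed.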
2004-07-14 4:34 am

Anonymous
can you say cluster?

from a hardware point of view this is my take on “NextGen”

easy clustering… one case, a lot of slots, processor daughter boards and a good old competition between tycoons to push the cheapest versions of motherboard/daughterboards.

“Next gen” will also have an internet-managed OS: I just subscribe to a site that does the managing, and use any “next gen” computer I have access to transparently. Personal data/files will be held on some CF Xtreme card. Needed apps will be “simple” scripts gluing together very high level widgets (also maintained by the internet OS). Portables will be able to harvest “in range” processing power.
2004-07-14 4:58 am

Anonymous
The PC really deserves to be on that list, judging by the others. At the time, a machine built from cheap off the shelf components running a third party OS – and not tied to that OS – was at least as revolutionary as the Apple ][ or Archimedes (first PC featuring RISC – at most a minor issue of semantics – is hardly a “revolution”).
2004-07-14 8:40 am

Anonymous
I really don’t know if I would regard BeOS as revolutionary. BeOS did a lot of things right, and I still marvel at its ease of use when I boot it. But most of its strength comes from being legacy free: starting from a clean slate and doing it right.

Personally, I think NeXTStep, five years earlier, was more revolutionary in terms of API and GUI. Had NeXTStep only had the Be filesystem…
2004-07-14 12:14 pm

Anonymous
What about Copland? It wasn’t finished, but it was written alone by Apple!

Also, RE: Revolutionary / Evolutionary, I agree that the most important factor in designing a fast, stable OS is to start from scratch, with backward compatibility only in networking and file compatibility. If only the creators of NT had abandoned old software compatibility as part of the OS and instead run old software in a slimmed-down emulation layer, Windows would currently be a lot faster and far more stable.
2004-07-14 2:33 pm

Anonymous
Actually, Apple did both. In particular, Apple wrote the Apple Dylan IDE. The IDE was a very powerful Smalltalk-style development environment for the Dylan language, which was invented in-house at Apple.

See: http://monday.sourceforge.net/wiki/index.php/AppleDylanEulogy

Beyond that, they wrote A/UX (Apple’s UNIX), and at least two OSs for the Newton (a Dylan-based one and a C++-based one), as well as lots of powerful tools for the NewtonScript language.
2004-07-14 2:35 pm

Anonymous
Actually, NT does run Win32 in what amounts to an emulation layer (a “personality server”). It had personality servers for POSIX and OS/2 as well, but over time, they kind of deprecated the idea and moved towards a Win32-only system.
2004-07-14 3:52 pm

Anonymous
I think we’re going to see a lot of reconfigurable logic in the future. FPGAs have become immensely capable in recent years, and now that it’s possible to reconfigure them at run time, there are a lot of exciting possibilities.

Another key point will be networks: not like the ones we have now, but a dense, self-organizing network of virtually every piece of equipment that possesses some form of intelligence. We will see OSes which run “on the net” rather than on a single computer; these OSes will encapsulate and integrate all the processing power within their reach and present you with a single view of, and interface to, that processing power.
2004-07-14 4:13 pm

Anonymous
I would point out that Apple’s DOS, like Microsoft’s (or Seattle Computer Products) is not an operating system, it is a file system. An OS has to provide, at a minimum, management support for memory use and processor scheduling (threading). DOS does neither and therefore does not meet the standard of a computer ‘operating system’.
2004-07-14 4:20 pm

Anonymous
We have a multi-monopoly and a monopoly working together to make sure people stay on the upgrade treadmill forever.

It would take a giant industry consortium of the “non-aligned” tech powers to unseat Wintel.
2004-07-14 4:23 pm

Anonymous
Each time there is a retrospective of the heroic years of computing, you see Apple as a pioneer, giving the GUI and the mouse to the masses.

http://en.wikipedia.org/wiki/Smaky

It came earlier. And with an OS that puts the macOS of the time to shame.

Really. Sometimes great advances in computing do not come from the US, but when that happens, the details get lost…
2004-07-14 11:02 pm

Anonymous
Part 1 of a 12 part series written by a 30 year old with junior high writing skills and a vast ignorance of current technology.

Dream, Nick, Dream. Then realize that thousands of smarter people before you dreamt better dreams and are building them now.

This comment will self destruct when Eugenia takes personal offense.
2004-07-14 11:45 pm

Anonymous
Well written and interesting. Good work…
2004-07-15 12:45 am

Anonymous
Part 1 of a 12 part series written by a 30 year old with junior high writing skills and a vast ignorance of current technology.

Dream, Nick, Dream. Then realize that thousands of smarter people before you dreamt better dreams and are building them now.

This comment will self destruct when Eugenia takes personal offense.

Hahah!
2004-07-15 4:25 am

Anonymous
Is there an instruction set that goes beyond CISC, RISC, VLIW and EPIC?

Personally, since a Hitachi SuperH core only takes up something like 3 mm² but performs on par with a 1 GHz Pentium III, and many 90 nm processors have a die size of 120 mm², why not have a 40-CPU multi-core SuperH processor (allowing for a built-in controller and cache)?
2004-07-15 9:54 am

Anonymous
Amiga Inc, in partnership with Tao, has an offering called AmigaAnywhere that automatically compiles intermediate code to different OSs and chipsets. It is also capable of scaling its display from PDAs to desktops. The Intent system, as it is called, was created by game programmers who were sick of porting from the Atari ST and Amiga and wanted a system to make it easy. After an initial and possibly early release, it is undergoing further development and will be re-released with more powerful APIs, which will hopefully give this promising dev kit a shot in the ARM for cross-platform development. I’m looking forward to it.

The once-touted roadmap (quiet of late) for AmigaOS (v5) would feature this portability by having the OS written entirely in intermediate code. OK, Intent sounds like Java, but it will turn various popular languages into IL, kind of like .NET but for all processors and OSs. You need to purchase a player, though.

Incidentally, in case you don’t know, a PPC-powered AmigaOne is in development, so there will be a new PPC-based platform and PPC-based OS (AmigaOS 4) on the block. The specs aren’t much to write home about at the moment, unfortunately, but I consider this STEP ONE towards greater things and more powerful chipsets. One of the draws of the microA1 is its small size, as the motherboard will be Mini-ITX standard (17 cm by 17 cm). This PPC AmigaOS (v4) has a 68k emulator for old apps, and it is apparently more than a port of 68k AmigaOS 3.9. Its small footprint in both hardware and OS will make it ideal for broadband-enabled set-top boxes that distribute content via wireless, and there is sure to be a Media Kit-like environment that it will leverage to play movies and MP3s. I am quite excited by this, and it will HOPEFULLY see the Amiga return as the king of media.

I am sure there are many projects featuring FPGAs (?). One that I know about is the Commodore ONE project (OK, now called C-ONE due to naming disputes), which aims to reconfigure itself to various ancient CPU architectures such as the 6502 or Z80. Apparently it can change on the fly too. It features a SID chip. Google it for more info.
2004-07-15 10:46 am

Anonymous
The Archimedes computer mentioned in the article progressed into the RiscPC range of computers, which are now owned by Castle Technology in the UK.

They sell the Iyonix PC, which uses an Intel XScale ARM chip and runs RISC OS, a GUI OS first developed in 1985(!)

http://www.castle-technology.co.uk/castle/front.html

Another company also produces an interesting ‘next gen’ desktop computer: the Omega Computer from MicroDigital UK.

This uses a number of FPGAs in its design so that most of its hardware functions (video, I/O, etc.) can be reprogrammed by the user should the need arise.

http://www.microdigital.co.uk/omspec.htm

This is a 300 MHz ARM-based desktop PC using RISC OS Select as its operating system. 300 MHz may seem slow to owners of IBM-compatible PCs, but it’s running a very fast OS.

An interface called the ARMTwister (an Intel XScale at 600 MHz) is due out sometime, which will allow even faster operation.
2004-07-15 12:09 pm

Anonymous
I think most people out there don’t realize how FPGAs work.

I’ll try to explain, and sorry for my elementary-school English.

FPGAs are essentially made of two things:

– Logic resources, for example:

  – simple D flip-flops

  – embedded memory of a few kilobits

  – full multipliers

  – “hard IP” microprocessors (ARM for Altera, PowerPC for Xilinx)

– Programmable interconnects.

The logic resources can be as fast as those in ASICs, since common RAM-based FPGAs use a standard high-speed logic process.

The programmable interconnects are slower than in ASICs, where they are simply replaced with wires.

Some cheap ASICs are called “prediffused”, as the logic cells are made once and for all in silicon and customers only need to pay for the metal-layer masks.

Some FPGAs use antifuse OTP (one-time programmable) technology (Actel); these are quite like prediffused ASICs because the fuses are made in the polysilicon interconnect layers.

The highest-density FPGAs are RAM-based: they download their configuration at power-up into RAM cells that control the switches of the programmable interconnect. It is a fairly cheap technology, as it is based on standard processes (for example, Xilinx uses IBM’s 90 nm process to build its next-generation FPGAs). Usually these FPGAs are not reprogrammed after power-up, but many have had the idea of using this capability to do dynamic reconfiguration (which can be a major problem, as these FPGAs need time to become fully operational, say up to 1 second).

There are two kinds of possible reconfiguration schemes:

– Changing memory elements: constant tables, register contents, microcode.

It’s quite easy to do that.

– Changing the behaviour.

It’s an awful mess, because common FPGAs are not geared for partial reconfiguration. In software, you can swap a DLL without recompiling your app; in an FPGA, changing a single D flip-flop may usually break the entire design (although significant improvements are being made now). FPGA compilers are very slow, and if the partial reconfiguration is error-prone, it may introduce real short circuits inside the chip (whereas, in general, no bug-ridden software can break hardware :-).

Another problem is that if you need to reconfigure a significant part of the FPGA, it will take a long time to download the configuration. FPGA configuration data is far less compact than ordinary machine code, so the performance gain of the FPGA will be eaten into by the bandwidth needed to dynamically transform it.
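The trade-off in this paragraph can be put as a simple break-even model: reconfiguration time is configuration size divided by configuration bandwidth, and it is paid before the accelerated work starts. A rough Python sketch; the bitstream size, port bandwidth, task lengths and 10x kernel speedup are all hypothetical values chosen for illustration, not figures for any real device:

```python
def reconfig_time_s(bitstream_bits: float, config_bits_per_s: float) -> float:
    """Time to load a configuration: its size divided by configuration bandwidth."""
    return bitstream_bits / config_bits_per_s


def net_speedup(task_cpu_s: float, fpga_speedup: float, reconfig_s: float) -> float:
    """Overall speedup of one task once the reconfiguration cost is included.

    task_cpu_s   -- how long the task takes on the CPU alone
    fpga_speedup -- how much faster the reconfigured fabric runs the task
    reconfig_s   -- time spent loading the configuration first
    """
    fpga_total_s = task_cpu_s / fpga_speedup + reconfig_s
    return task_cpu_s / fpga_total_s


# Hypothetical: a 4 Mbit partial bitstream over a 50 Mbit/s configuration port.
t_cfg = reconfig_time_s(4e6, 50e6)        # 0.08 s
print(net_speedup(0.1, 10.0, t_cfg))      # short task: about 1.1x
print(net_speedup(10.0, 10.0, t_cfg))     # long task: about 9.3x
```

Under these assumptions a reconfiguration only pays off when the accelerated work runs much longer than the reload itself.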

The whole point is that you shouldn’t expect any performance gain from using an FPGA instead of an ASIC:

– For general-purpose processing, a CPU will be faster.

– For signal processing, a DSP would be cheaper (smaller die) if it can handle the calculations.

– For some complex stream tasks: well, what kind of complex stream? 3D rendering? A 3D chip will do the job better.

Common desktop CPUs spend most of their time moving small chunks of memory, swapping registers, pushing/popping from the stack, … Nothing very streamlinable, nothing that can be efficiently implemented in a VLIW number-crunching architecture.

(I once built a DSP-based parallel computer for astronomical data processing. The data were basically independent, so connecting 40 DSPs together made the calculations 39 times faster. There is no such speed improvement to expect on your PC.)
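One standard way to quantify this is Amdahl’s law (the commenter does not name it; it is swapped in here as a model). With N processors and a fraction f of the work that must run serially, the speedup is 1 / (f + (1 - f) / N). A short Python sketch that inverts the law for the 40-DSP, 39x figures above, then contrasts a hypothetical desktop workload that is 50% serial:

```python
def amdahl_speedup(serial_fraction: float, n: int) -> float:
    """Amdahl's law: speedup on n processors when serial_fraction can't be parallelised."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n)


def serial_fraction_for(speedup: float, n: int) -> float:
    """Invert Amdahl's law: the serial fraction implied by an observed speedup."""
    return (n / speedup - 1.0) / (n - 1)


# The astronomy machine: 40 DSPs, 39x faster.
f = serial_fraction_for(39.0, 40)
print(f)                        # about 0.00066, i.e. under 0.1% of the work is serial
print(amdahl_speedup(f, 40))    # recovers about 39

# Hypothetical desktop workload, half of it inherently serial.
print(amdahl_speedup(0.5, 40))  # about 1.95, even with 40 processors
```

Independent data leaves almost nothing serial, which is why the DSP machine scaled; typical desktop code does not look like that.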

FPGA vendors do not expect to replace Intel and AMD as big CPU builders. For low-end CPUs, they offer soft libraries that can be implemented in the FPGA fabric. For higher performance, they embed hard-coded processors (Xilinx offers up to four 400 MHz PowerPCs in a single FPGA). For insane performance, get a real CPU.

Finally, high-end FPGAs are becoming quite affordable, and anyone can experiment with new CPU architectures in his/her garage (for about $2000 for a usable board). Maybe you could find really innovative structures (object-oriented CPU, hardware garbage collector, dynamic code and data compression…), but once you’ve patented your brilliant idea, look for cash to burn on a real high-performance ASIC that will dwarf Intel….

(Another area where FPGAs are really exciting is neural networks: transforming logic cells into neurons and programmable interconnects into dendrites…)
2004-07-16 6:10 am

Anonymous
It says that I wanted a low-cost computer and that Jobs changed it to an “easy to use” orientation. It started from an easy-to-use orientation; that was my reason for starting the Macintosh project in the first place.

Please check your history (e.g. read the primary sources, such as in the Stanford Univ. History of Technology collection (mostly online)) before you write. Be a good journalist. Get the facts.
2004-07-16 6:14 am

Anonymous
I remember the CDC 6600 quite well, having programmed on one here in NYC in the mid-sixties when I went to Stuyvesant HS. I was taking FORTRAN IV (remember that?), and the computer was at the Courant Institute of Mathematical Sciences nearby. It was quite a machine back then. The 6000 and 6600 were the first supercomputers, though their performance would just elicit a giggle today.

FPGAs are inefficient due to their poor routing. The best paths can rarely be taken. There are other problems as well.

It remains to be seen whether increasing processors beyond two in a desktop-like system is practical for most work. Only some problems can be effectively broken down in this way.

It’s unlikely that multiple Cell processors will be useful in anything but a graphics environment.

Back in the “old days” every machine was mostly unique and had its own OS. Pretty much the only things in common were the CPUs and the memory chips.

There were some S-100 bus machines, and later some used CP/M, but all the big guys built their own machines from the ground up. The IBM PC killed almost all of that.

Apple’s Xcode 2, which will be released with Tiger next year, will do automatic vectorization for the AltiVec vector unit, which will give all programs access to that portion of the chip for the first time, as programmers don’t want to spend the time optimizing programs they (sometimes wrongly) think don’t need it.
2004-07-17 1:03 am

Anonymous
You completely omitted Lisp machines from your history, probably because you never used one (I haven’t either), but I would argue that they’re just as revolutionary as, if not more so than, many of the other consumer-oriented pieces you mentioned.

And that said, I think the industry has proven, time and time again, that worst-wins or cheapest-wins, either way. Hardware is a hugely expensive proposition to design and develop, far more so than software. I have a friend who designs DRAM, among other things. Assume you have a design which is six layers, and each layer, let’s just say, is $10,000, so the whole wafer is $60,000. Now, if you have a bug in your hardware design, you need to replace that layer. If this happens at the last layer, you’re out $10,000 to replace the one wafer (in a product line before mass production, they’ll roll out a wafer for each layer, just for this sort of debugging). But if your bug happens to be in the first layer, BAM, $60,000 mistake. Can you, or any other hobbyist, afford to spend that much on a bug in your design?
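The cost argument fits in a one-line model. A minimal Python sketch, assuming (as a simplification of the comment) that a bug in layer k of n forces every layer from k upward to be redone at a fixed cost per layer; only the two endpoints ($10,000 for a last-layer bug, $60,000 for a first-layer bug) come from the comment, and the linear values in between are an assumption:

```python
def respin_cost(num_layers: int, bug_layer: int, cost_per_layer: int) -> int:
    """Cost to fix a bug found in bug_layer (1 = first layer fabricated).

    Assumes every layer from the buggy one upward must be redone, so a
    first-layer bug means replacing all of them.
    """
    if not 1 <= bug_layer <= num_layers:
        raise ValueError("bug_layer must be between 1 and num_layers")
    return (num_layers - bug_layer + 1) * cost_per_layer


# Figures from the comment: six layers at $10,000 each.
for layer in range(1, 7):
    print(layer, respin_cost(6, layer, 10_000))  # 60000 for layer 1 ... 10000 for layer 6
```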

Not saying that hardware can’t be messed with, but at a mass-production level, things get fugly. Anyway, point being: the winning situation is to use commodity hardware, best of breed perhaps (e.g. AMD64s and some sort of GPU that you can offload things onto, or heck, even something tiny like a VIA C3 that has wicked AES implementations), and go from there. Maybe later, if you find success, hardware vendors will help support you (e.g. Linux driver support is creeping into official channels now), but that’s probably the last thing to happen.

That said, you can start at lower levels than you might think. Overhauling the BIOS could yield some real improvements; look at LinuxBIOS, or various OpenFirmware products. Heck, even Soekris (http://www.soekris.com) has implemented its own BIOS, which offers serial console support and boots more OSes than LinuxBIOS does.

Hardware is a space that is extremely difficult to get into on a small scale, and if you do, then yeah, stick with FPGAs and home-designed PCBs and the like; forget about multi-core, multi-threaded CPUs or 256-bit GPUs or anything pushing the edge (even GPUs aren’t anywhere near the same fab processes as CPUs are now). Leave the cutting edge to those who can throw billions at fabs.
2004-07-18 5:13 pm

Anonymous
>Mac two-line description wrong

Fixed.

>grey

No, the idea is not to do custom chips at all. There’s no real point these days unless you can really do something special and even then it is extraordinarily expensive.

>Anonymous

Is there an instruction set that goes beyond CISC, RISC, VLIW and EPIC?

The problem with VLIW is that it depends on the compiler extracting all the parallelism in the instruction stream; if it can’t, it doesn’t help.

I didn’t specify which CPU I’d use, as I didn’t think yet another CPU flame war in the forum would be of any benefit.
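To see why VLIW leans so hard on the compiler, here is a toy Python model of greedy list scheduling, in which independent operations are packed into fixed-width instruction bundles. It assumes every operation takes one cycle and ignores registers, latencies and functional-unit types, so it illustrates the idea rather than how any real VLIW/EPIC compiler works:

```python
from typing import Dict, List, Set


def schedule_bundles(deps: Dict[str, List[str]], width: int) -> List[List[str]]:
    """Pack ops into VLIW bundles of at most `width` slots, in program order.

    deps maps each op to the ops whose results it needs. An op can only go
    into a bundle after every op it depends on has already issued.
    """
    done: Set[str] = set()
    remaining = list(deps)  # dict order = program order
    bundles: List[List[str]] = []
    while remaining:
        ready = [op for op in remaining if all(d in done for d in deps[op])]
        if not ready:
            raise ValueError("dependency cycle or unknown op")
        bundle = ready[:width]           # fill as many slots as we can
        bundles.append(bundle)
        done.update(bundle)
        remaining = [op for op in remaining if op not in bundle]
    return bundles


# Independent ops: the compiler can fill every slot.
independent = {"a": [], "b": [], "c": [], "d": []}
print(schedule_bundles(independent, 4))  # [['a', 'b', 'c', 'd']]

# A dependency chain: one op per bundle, the other slots go to waste.
chain = {"a": [], "b": ["a"], "c": ["b"], "d": ["c"]}
print(schedule_bundles(chain, 4))        # [['a'], ['b'], ['c'], ['d']]
```

The same four operations take one bundle or four depending purely on how much independent work the compiler can find.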

Personally: since a Hitachi SuperH core only takes up about 3 mm² but performs on par with a 1 GHz Pentium III, and many 90 nm processors have a die size of 120 mm², why not build a 40-CPU multicore SuperH processor (allowing for a built-in controller and cache)?

Yes, that’s what I expect, dozens of cores on a CPU so you can then spread out the OS and applications across them.

>Various missing machines

The list was only meant to give a few examples; it was not meant to be exhaustive.
2004-07-19 12:19 am

Anonymous
I would like to tie two ideas together: 1) untying distributed software from the target CPU hardware, and 2) using FPGAs to provide custom logic/hardware acceleration.

These ideas are based on the two products below.

http://www.transmeta.com/crusoe/index.html

http://www.stretchinc.com/products.php

If you married these two current products together, replacing the RISC core in the Stretch chip with Transmeta’s core, you would have the best of both worlds.

Here is how I think it would work:

1. Write your program in C/C++ (or another language).

2. Profile your code with the Stretch tools to find locations for custom logic acceleration.

3. Compile your code to an “intermediate representation” with lots of optimisation hints (static optimisations) in the code.

4. When the software is installed, it is compiled for the target hardware using the hints found in the code, leaving some hints in.

5. While the code is running on the CPU, it is monitored by “code morphing” software that applies runtime optimisations and possibly finds new code for custom logic acceleration.
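Step 5 amounts to hot-spot detection. A toy Python sketch of the idea, assuming a simple execution-count threshold: each code block is counted as it runs, and once it crosses the threshold a translated version is cached and used from then on. The class, the threshold and the `optimise` callback are hypothetical stand-ins; this is not how Transmeta’s Code Morphing Software or Stretch’s tools work internally:

```python
from collections import Counter
from typing import Callable, Dict

Block = Callable[[int], int]  # a "code block": here just a function of one int


class HotSpotMonitor:
    """Count executions per block; after `threshold` runs, cache an optimised version."""

    def __init__(self, threshold: int, optimise: Callable[[Block], Block]) -> None:
        self.threshold = threshold
        self.optimise = optimise            # hypothetical translator/optimiser hook
        self.counts: Counter = Counter()
        self.translated: Dict[str, Block] = {}

    def run(self, name: str, block: Block, arg: int) -> int:
        if name in self.translated:         # hot path: use the cached translation
            return self.translated[name](arg)
        self.counts[name] += 1              # cold path: profile, then run as-is
        if self.counts[name] >= self.threshold:
            self.translated[name] = self.optimise(block)
        return block(arg)


def slow_square(x: int) -> int:
    """Deliberately naive square (valid for x >= 0)."""
    return sum(x for _ in range(x))


def optimise(block: Block) -> Block:
    """Hypothetical optimiser: knows a faster equivalent for slow_square only."""
    return (lambda x: x * x) if block is slow_square else block


mon = HotSpotMonitor(threshold=3, optimise=optimise)
results = [mon.run("square", slow_square, 7) for _ in range(5)]
print(results)               # [49, 49, 49, 49, 49]
print(mon.counts["square"])  # 3: profiling stops once the block is translated
```

A real system would also have to invalidate translations when code changes and weigh translation cost against expected reuse; the sketch leaves both out.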

The advantages I see for this approach are:

1. Always having highly optimised code for your hardware: runtime optimisations for your actual usage patterns, and better optimisations than Transmeta’s code morphing because of the static analysis.

2. Simpler/better code morphing software, because the code is pre-compiled to match the code morphing instruction set.

3. The ability to create new custom logic on the fly to enhance/accelerate the code morphing software.

4. Software developers can easily take advantage of the Stretch FPGA abilities.

5. When improved CPUs come out, whether new designs or ones with more resources, the software can take advantage of them. (This might require an install-time recompile on a CPU upgrade.)

6. Hardware acceleration of virtual machines by the code morphing software, which could look like coprocessors to the OS. (This reduces the levels of abstraction/emulation, e.g. Java on JVM->x86->code morphing becomes Java on code morphing.) New instruction sets or virtual machines could be added through a plug-in interface to the code morphing layer. The number of plug-ins per CPU would probably be small, but this would be offset by the multiple CPUs (e.g. 4).