Linked by Nicholas Blachford on Tue 13th Jul 2004 21:56 UTC
Hardware, Embedded Systems After personal computers arrived in the 1970s they went through a series of revolutionary changes delivered by a series of different platforms. It's been over a decade since we've seen anything truly revolutionary. Will we see a revolution again? I believe we could not only see revolution again, we could build it today.

Missing data:
by Roguelazer on Tue 13th Jul 2004 22:35 UTC

>> When these platforms arrived everything was done in-house,
>> and I mean everything: Hardware, Casing, OS, Applications,
>> Development environment and Compiler. Nobody does all of
>> that today and nobody has since 1994s BeBox when Be inc.
>> had to create an entire system from the OS core to the
>> media player and app-lets which ran on top.

Software-wise, I'd say that SkyOS fits your bill of "in-house". Perhaps you should have done a little more research on some of this?

In general, this is a pretty good article. It'll be worth reading the "Part II".

The Future
by Anonymous on Tue 13th Jul 2004 23:01 UTC

Multi-core, multi-threaded CPUs are already on the roadmaps.
GUIs will be rendered on the graphics card, like Quartz and Avalon.
USB flash drives have already pretty much replaced the floppy drive.
PCI Express allows for 10GigE network cards.

As far as software, the way I use a computer today has not really changed since I used Win95.

I don't think a computer will ever be able to "think for me" without just being in my way.

Story
by iongion on Tue 13th Jul 2004 23:06 UTC

He said everything.
by Earl Colby Pottinger on Tue 13th Jul 2004 23:15 UTC

Where does SkyOS build hardware, Casing or the Compiler in house?

Apple never wrote compilers
by QuantumG on Tue 13th Jul 2004 23:52 UTC

not a bad article
by AlienSoldier on Tue 13th Jul 2004 23:58 UTC

First, kudos on the references; I would like to see that more often in tech journalism.

The article is interesting, even if it has some errors ;) The Atari Falcon did feature a DSP. Forgetting the Russian Setun computer and the CBM SID chip is criminal.

I think the only way to make things move is to be a threat to the big guys. This means doing PCI (or whatever standard comes next) surface-mount cards that can offload part of the OS. I could really see a "Haiku detected your Media Kit hardware card and will use it" message at boot time, just like old Psygnosis Amiga games did with extended memory cards for the A500.

Another brake on this is laptops: do-it-yourself hardware innovation is pretty much constrained to open, big, noisy beige boxes.

Another brake on this is the number of hardware guys in engineering school; this is bad. I don't even do surface mount myself anymore and know very few guys who can do that at home.

Another problem: standards. A do-it-yourself standard can pretty much only be imposed through a known institution/foundation as a contest. Or perhaps by carving porn scenes on top of each IC ;)

Most of today's computer systems have one thing in common: they are programmed to do something with data. There is data, and one or more CPUs eat through streams of instructions telling them what to do with that data.

Example:
(1) take one data element
(2) perform some operation on that data
(3) place modified data back into storage
(4) repeat (1) to (3) over and over, going through all data

Call them instruction-stream centered. In my vision the next big step: there is data, and all the programming does, is re-configure the way in which streams of data are processed. Call that: data-stream centered. The FPGA is a good example of how this works.

Example:
(1) configure what path data comes in, what operation is done, and along which path data should go out
(2) wait until all data has passed through the hardware

This is much more energy-efficient, and hardware to do this exists (and is used) today. In my view the real problem is not building this hardware, but the software to control it. Think operating systems, programming language design. Way more difficult than putting the ICs together.
I am convinced, though, that the simpler the essentials become, the more useful they are in building complex systems. So the way to go: RISC CPUs, (lightweight) kernel languages, simple virtual machines, that sort of thing.
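
To make the contrast between the two styles concrete, here is a rough sketch in Python. It is only an illustration: the names `instruction_stream` and `configure_pipeline` and the sample operations are made up for this example, and a software generator merely imitates what an FPGA would do with real wiring.

```python
from typing import Callable, Iterable, Iterator, List


def instruction_stream(data: List[int]) -> List[int]:
    """Instruction-stream style: fetch, operate, store, repeat."""
    for i in range(len(data)):
        element = data[i]          # (1) take one data element
        element = element * 2 + 1  # (2) perform some operation on that data
        data[i] = element          # (3) place modified data back into storage
    return data                    # (4) the loop repeated (1) to (3) for all data


def configure_pipeline(
    *stages: Callable[[int], int]
) -> Callable[[Iterable[int]], Iterator[int]]:
    """Data-stream style: fix the path once, then let the data flow through."""
    def run(source: Iterable[int]) -> Iterator[int]:
        for item in source:
            for stage in stages:   # the configured path: in, operate, out
                item = stage(item)
            yield item
    return run


# (1) configure the path; (2) wait until all data has passed through.
pipeline = configure_pipeline(lambda x: x * 2, lambda x: x + 1)
streamed = list(pipeline(iter([1, 2, 3])))
looped = instruction_stream([1, 2, 3])
```

Both versions compute the same result here; the difference is that the second one separates "configuring the path" from "pushing data through it", which is the part that reconfigurable hardware could take over.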

BTW. That Russian company mentioned is doing nothing new, it's called bytecode & JIT (Just In Time) compiling. Several modern languages, like Java, already use those techniques.

RE: Apple never wrote compilers or even an OS.
by Adurbe on Wed 14th Jul 2004 00:28 UTC

Of course they wrote an OS. Granted, the idea of Mac OS was gleaned from Xerox, but Apple have written an OS. It depends on whether you consider OS X Apple-made or NeXT-made or even Berkeley-made.

And for those who have never heard of it, Apple also wrote DOS FROM SCRATCH IN HOUSE

(before anyone flames, DOS is not exclusive to Microsoft; many companies have made a 'Disk Operating System')

http://www.fact-index.com/a/ap/apple_dos.html

just in case anyone wants to read an abridged version

FPGAs
by JJ on Wed 14th Jul 2004 00:50 UTC

Nick alludes to the power of FPGAs, but they are not as hard to program as he suggested.

One way to design FPGA HW is to write code in a more familiar language such as C with support for parallel communication added, like Handel-C, which was based on the Occam language originally run on the Transputer chip of the 80s.

Indeed the Transputer, which he neglected to mention, was the first commercial CPU chip that could easily be plugged together, as many as you want, using simple links. Inmos never got as far as integrating two CPUs (after all, they were built 20 years ago), but the new generation of multicore CPUs does not offer much over the original Transputer except, of course, raw speed and a modern process. Still, it's nice to see massively parallel CPUs becoming mainstream again.

Of course, a few of us are building CPUs out of FPGAs. They are not as fast as the 64-bit CPUs from AMD, but actually cheaper per MIPS because they fit into low-cost FPGAs at a few $ per copy. Indeed, one large FPGA can theoretically hold 1-100+ CPU instances depending on the CPU complexity, but it will likely be hot.

And 10 CPUs running at 1/10 the clock can be more powerful than one regular CPU PROVIDED you know how to program parallel apps. For slower CPUs, the available memories look 10x faster to an FPGA than to a 2GHz machine that spends its time waiting.
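
A quick back-of-the-envelope sketch of the "memory looks 10x faster" point, in Python. The 50 ns memory latency is an assumed figure chosen only for illustration, and `stall_cycles` is a made-up helper; the point is simply that the same latency costs ten times fewer clock cycles at a tenth of the clock rate.

```python
def stall_cycles(latency_ns: float, clock_mhz: float) -> float:
    """Clock cycles a CPU spends waiting for one memory access."""
    return latency_ns * clock_mhz / 1000.0


ASSUMED_LATENCY_NS = 50  # assumption for illustration, not a measured value

fast_cpu_wait = stall_cycles(ASSUMED_LATENCY_NS, 2000)  # 2GHz desktop CPU
slow_cpu_wait = stall_cycles(ASSUMED_LATENCY_NS, 200)   # 200MHz FPGA CPU
ratio = fast_cpu_wait / slow_cpu_wait
```

Under that assumption the 2GHz part waits 100 cycles per access and the 200MHz part waits 10, which is why the slower CPU can get away with much smaller caches.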

Indeed it's possible to use much higher quality memories for an FPGA CPU than regular DDR, i.e. a slow FPGA CPU can run RLDRAM that is a couple of times faster than DDR, making up for the much smaller caches that must be used.

Myself, I am creating a modern Transputer RISC ISA that would run BeOS like a dream, but I'll have to leave that to others. Others are building more conventional RISCs, but those are not usually scalable.

JJ

nuts
by hobgoblin on Wed 14th Jul 2004 01:31 UTC

in many ways this was like reading the gear books for the good old pen and paper rpg cyberpunk 2020. the talk of multijob, multiprocessor units and so on gave me a flash of the b&w pencil drawings from them for some reason...

an os is nothing without something to run it on. and the original concept of an os was a system that was around so you didn't have to write hardware interface code every time. this is what the linux kernel/os is. it allows for hardware and filesystem access and nothing more. the rest is up to the software that interfaces with the os, be it file system management tools, guis, web browsers and whatnot...

i wonder what would happen if the "drivers" were stored on firmware chips so that when you turned the system on the bios would gather these parts up and create a kind of in-memory os from these parts. then it would toss a list of partitions with a boot indicator on screen, with a timeout aiming for the one with the highest priority. then it would hand everything over to the "os", which would then fire up a boot program that had only one job: read a boot script and fire up the stuff defined in there. this is much what init for linux does. the different parts of the "os" would be running on the chips of the hardware, so when you fired up a gui it would be running totally on the chip of the video card. fire up an mp3 player or similar and it would run on the soundcard chip with a gui part tossed off to the video card for rendering.

i am no computer engineer so i don't know if this stuff could work at all, but i guess that microsoft would be kicking up all kinds of fud against it ;)

To expound on the third page
by Rayiner Hashem on Wed 14th Jul 2004 02:06 UTC

The article mentions some good ideas on the third page. Here's my take on them:

In my ideal platform, we'd dump the "jumble of binaries" model and go to a "code database" model. Instead of shipping programs as binaries, programs would be shipped in a low-level intermediate representation. At installation, the intermediate code would be submitted to a central controller, which would compile the IL to native code and store it in a cache. This central controller would sit at the lowest levels of the system, even below the bulk of the OS.

The spiffy part of all this is that it'd allow for all sorts of cool optimizations:

1) Code could be optimized for each processor.
2) Analysis could easily cross library boundaries, allowing optimizations like inlining of library code.
3) "Optimistic" optimizations could be applied, and backed out when the original assumptions that enabled them no longer applied.
4) IL code could be compiled via a safe compiler, which would allow us to get rid of memory protection between processes, and between the OS and user programs.

The performance benefits of this approach could be very significant, especially given the high cost of protection-level transitions on modern processors, and their large dependence on compile-time instruction scheduling and register allocation. This would also allow the underlying CPU architecture to become completely irrelevant. The only thing that would need to change when targeting a new CPU would be the code generator in the central controller.
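
A minimal sketch of the "code database" idea in Python, just to show the shape of it. Everything here is hypothetical: `CodeDatabase`, `toy_codegen` and the fake IL string are invented for the example, and the "compiler" only tags the text with a CPU name instead of generating native code.

```python
import hashlib
from typing import Callable, Dict, Tuple


class CodeDatabase:
    """Hypothetical central controller: compiles IL once per CPU, then caches it."""

    def __init__(self, cpu: str, codegen: Callable[[str, str], str]) -> None:
        self.cpu = cpu
        self.codegen = codegen  # the only part that changes for a new CPU
        self.cache: Dict[Tuple[str, str], str] = {}
        self.compiles = 0

    def install(self, il_code: str) -> str:
        """Submit IL at installation time and get back cached native code."""
        digest = hashlib.sha256(il_code.encode("utf-8")).hexdigest()
        key = (digest, self.cpu)
        if key not in self.cache:
            self.cache[key] = self.codegen(il_code, self.cpu)
            self.compiles += 1
        return self.cache[key]


def toy_codegen(il_code: str, cpu: str) -> str:
    # Stand-in for a real back end: just labels the IL with the target CPU.
    return f"[{cpu}] {il_code}"


db = CodeDatabase("ppc", toy_codegen)
first = db.install("add r1, r2")
second = db.install("add r1, r2")  # served from the cache, no recompile
```

Retargeting would then mean constructing the database with a different `codegen`, while the shipped IL stays the same. The cross-library and "optimistic" optimizations from the list are not modelled here.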

next generation?
by Peter on Wed 14th Jul 2004 04:34 UTC

can you say cluster?
from a hardware point of view this is my take on "NextGen"
easy clustering... one case, a lot of slots, processor daughter boards and a good old competition between tycoons to push the cheapest versions of motherboard/daughterboards.
"Next gen" will also have an internet managed OS: I just subscribe to a site that does the managing... and use any "next gen" computer I have access to transparently. Personal data/files will be held on some CF Xtreme ;) Needed apps will be "simple" scripts glueing together very high level widgets (also maintained by internet OS). Portables will be able to harvest "in range" processing power.

Missing the IBM PC
by drsmithy on Wed 14th Jul 2004 04:58 UTC

The PC really deserves to be on that list, judging by the others. At the time, a machine built from cheap off the shelf components running a third party OS - and not tied to that OS - was at least as revolutionary as the Apple ][ or Archimedes (first PC featuring RISC - at most a minor issue of semantics - is hardly a "revolution").

Revolutionary / Evolutionary
by Peter Mogensen on Wed 14th Jul 2004 08:40 UTC

I really don't know if I would regard BeOS as revolutionary. BeOS did a lot of things right. I still marvel at the ease of use, when I boot it. But most of its strength comes from being legacy free. Starting from a clean slate and doing it right.
Personally I think NeXTStep, 5 years earlier, was more revolutionary in terms of API and GUI. Had NeXTStep only had the Be filesystem...

RE: Apple never wrote compilers or even an OS
by Joe on Wed 14th Jul 2004 12:14 UTC

What about Copland? It wasn't finished, but it was written alone by Apple!

Also, RE: Revolutionary / Evolutionary, I agree that the most important factor in designing a fast, stable OS is to start from scratch, with backward compatibility only in networking and file formats. If only the creators of NT had abandoned old software compatibility as part of the OS, and instead run old software in a slimmed-down emulation layer, Windows would currently be a lot faster and far more stable.

Actually, Apple did both. In particular, Apple wrote the Apple Dylan IDE. The IDE was a very powerful Smalltalk-style development environment for the Dylan language, which was invented in-house at Apple.

See: http://monday.sourceforge.net/wiki/index.php/AppleDylanEulogy

Beyond that, they wrote AUX (Apple's UNIX), and at least two OSs for the Newton (a Dylan-based one and a C++ based one), as well as lots of powerful tools for the Newtonscript language.

@Joe
by Rayiner Hashem on Wed 14th Jul 2004 14:35 UTC

Actually, NT does run Win32 in what amounts to an emulation layer (a "personality server"). It had personality servers for POSIX and OS/2 as well, but over time, they kind of deprecated the idea and moved towards a Win32-only system.

Reconfigurable Logic
by Sebastian on Wed 14th Jul 2004 15:52 UTC

I think we're going to see a lot of reconfigurable logic in the future. FPGAs got immensely capable in the last years and now that it's possible to reconfigure them at run time, there are a lot of exciting possibilities.
Another key point will be networks: not like the ones now, but instead a dense, self-organizing network of virtually every piece of equipment which possesses some form of intelligence. We will see OSes which run "on the net" rather than on a single computer. These OSes will encapsulate and integrate all the processing power in their reach and present you with a single view and interface to access this processing power.

RE: Apple never wrote compilers or even an OS.
by Ken on Wed 14th Jul 2004 16:13 UTC

I would point out that Apple's DOS, like Microsoft's (or Seattle Computer Products) is not an operating system, it is a file system. An OS has to provide, at a minimum, management support for memory use and processor scheduling (threading). DOS does neither and therefore does not meet the standard of a computer 'operating system'.

as long as microsoft/intel exist, there will be no progress
by wishful thinking on Wed 14th Jul 2004 16:20 UTC

We have a multi-monopoly and a monopoly working together to make sure people stay on the upgrade treadmill forever.

It would take a giant industry consortium of the "non-aligned" tech powers to unseat Wintel.

Apple obsession...
by Hmmmm on Wed 14th Jul 2004 16:23 UTC

Each time there is a retrospective of the heroic years of computing, you see apple as a pioneer, giving the GUI and the mouse to masses.

http://en.wikipedia.org/wiki/Smaky

It came earlier. And with an OS that puts the macOS of the time to shame.

Really. Sometimes great advances in computing do not come from the US -- but when it happens, the details are lost...

Oh GOD
by fore on Wed 14th Jul 2004 23:02 UTC

Part 1 of a 12 part series written by a 30 year old with junior high writing skills and a vast ignorance of current technology.

Dream, Nick, Dream. Then realize that thousands of smarter people before you dreamt better dreams and are building them now.

This comment will self destruct when Eugenia takes personal offense.

Good article
by th on Wed 14th Jul 2004 23:45 UTC

Well written and interesting. Good work... ;)

Re: fore (IP: ---.CS.UCLA.EDU) - Posted on 2004-07-14 23:02:03
by Anonymous on Thu 15th Jul 2004 00:45 UTC

>> Part 1 of a 12 part series written by a 30 year old with junior high writing skills and a vast ignorance of current technology.
>>
>> Dream, Nick, Dream. Then realize that thousands of smarter people before you dreamt better dreams and are building them now.
>>
>> This comment will self destruct when Eugenia takes personal offense.


Hahah!

beyond cisc, risc, vliw, epic?
by Anonymous on Thu 15th Jul 2004 04:25 UTC

is there an instruction set that goes beyond cisc, risc and vliw and epic?

personally, since hitachi superh core only takes up like 3mm, but performs on-par with a 1-ghz pentium 3, and many 90nm processors have die size of 120mm, why not have a 40-cpu multicore superh processor (allowing for built-in controller and cache size)

Seems the Amiga is heading in the right direction
by BigBenAussie on Thu 15th Jul 2004 09:54 UTC

Amiga Inc, in partnership with Tao, have an offering called AmigaAnywhere that automatically compiles intermediate code to different OSs and chipsets. It is also capable of scaling its display from PDAs to desktops. The Intent system, as it is called, was created by game programmers who were sick of porting from the Atari ST and Amiga and wanted a system to make it easy. After an initial and possibly early release it is undergoing further development and will be re-released with more powerful APIs, which will hopefully give this promising dev kit a shot in the ARM for cross-platform development. I'm looking forward to it.

The once touted roadmap(silence of late) for the AmigaOS(v5) would feature this portability by having the OS written entirely in intermediate code. OK, Intent sounds like Java but it will turn various popular languages into IL, kinda like .NET but for all processors and OSs. You need to purchase a player though.

Incidentally, if you don't know, a PPC-powered AmigaOne is in development, so there will be a new PPC-based platform and PPC-based OS (AmigaOS4) on the block. The specs aren't much to write home about at the moment, unfortunately, but I consider this STEP ONE to greater things and more powerful chipsets. One of the draws of the microA1 is its small size, as the motherboard will be microITX standard (17cm by 17cm). This PPC AmigaOS (v4) has a 68k emulator for old apps. It is apparently more than a port of 68k AmigaOS v3.9. Its small footprint in both hardware and OS will make it ideal for broadband-enabled set-top boxes that distribute content via wireless, and there is sure to be a media-kit-like environment that it will leverage to play movies and MP3s. I am quite excited by this, and it will HOPEFULLY see the Amiga return as the king of media.

I am sure there are many projects featuring FPGAs(?). One that I know about is the Commodore ONE project (OK... now called C-ONE due to naming disputes) that aims to reconfigure itself to various ancient CPU architectures like the 6502 or Z80, for instance. Apparently it can change on the fly too. It features a SID chip. Google it for more info.

The Archimedes computer mentioned in the article progressed into the RiscPC range of computers and they are now owned by Castle Technology in the UK.
They sell a Xscale ARM chip in their IyonixPC using RISC OS, a GUI OS first developed in 1985(!)
http://www.castle-technology.co.uk/castle/front.html

Another company also produces an interesting 'next gen' desktop computer -The Omega Computer from MicroDigital UK.
This uses a number of FPGAs in its design to enable most of its hardware functions (video, io, etc.) to be reprogrammed should the need occur by the user.

http://www.microdigital.co.uk/omspec.htm

This is a 300MHz ARM-based desktop PC using RISC OS Select as its operating system. 300MHz may seem slow to IBM-compatible PC owners, but it's running a very fast OS.

An interface called the ARMTwister (an Intel XScale at 600MHz) is due out sometime, which will allow even faster operation.



FPGA
by Zgog on Thu 15th Jul 2004 12:09 UTC


I think most people out there don't realize how FPGAs work.

I'll try to explain, and sorry for my elementary school english.

FPGAs are essentially made of two things:
- Logic resources, for example:
  - simple D flip-flop
  - embedded memory of a few kilobits
  - full multiplier
  - "Hard IP" microprocessor (ARM for Altera, PowerPC for Xilinx)
- Programmable interconnects.

The "logic resources" can be as fast as in ASICs, as common RAM-based FPGAs use a standard high-speed logic process.

The "programmable interconnects" are slower than in ASICs, where they are simply replaced with wires.
Some cheap ASICs are called "prediffused", as the "logic cells" are made once and for all in silicon and customers only need to pay for the metal layer masks.
Some FPGAs use antifuse OTP (one-time programmable) technologies (Actel) that are quite like prediffused ASICs, because the fuses are made on the polysilicon interconnect layers.

The highest-density FPGAs are RAM based: they download their configuration at power-up into RAM cells that control the switches of the "programmable interconnects". It is quite a cheap technology as it is based on standard processes (for example, Xilinx uses the 90nm process of IBM factories to build their next-gen FPGAs). Usually these FPGAs are not reprogrammed after power-up, but many have had the idea of using that behaviour (which can be a major problem, as these FPGAs need time to become fully operational, say, up to 1 second) to do dynamic reconfiguration.

There are two kinds of possible reconfiguration schemes:
- Change memory elements: constant tables, register contents, microcode.
  It's quite easy to do that.
- Change the behaviour.
  It's an awful mess because common FPGAs are not geared for partial reconfiguration. In software, you can swap a DLL without recompiling your app; in an FPGA, usually, changing a single D flip-flop may break the entire design (although significant improvements are being made now). FPGA compilers are very slow, and if the partial reconfiguration is error-prone, it may introduce real short circuits inside the chip (whereas, in general, no bug-ridden software can break hardware :-).
  Another problem is that if you need to reconfigure a significant part of the FPGA, it will take a long time to download the configuration. The binary efficiency of FPGA fabric code is much lower than that of common assembly language. The performance gain of the FPGA will be offset by the bandwidth needed to dynamically transform the FPGA.

The whole point is that you shouldn't expect any performance gain from using an FPGA instead of an ASIC:
- For general-purpose processing, a CPU will be faster.
- For signal processing, a DSP would be cheaper (smaller die) if it can handle the calculations.
- For some complex stream tasks: well, what kind of complex stream? 3D rendering? A 3D chip will do the job better.

Common desktop CPUs spend most of their time moving small chunks of memory, swapping registers, pushing/popping from the stack... Nothing very streamlineable, nothing that can be efficiently implemented in a VLIW number-crunching architecture.

(I once built a DSP-based parallel computer for astronomical data processing. The data were basically independent, and as such connecting together 40 DSPs made calculations 39 times faster. Basically, there is no such speed improvement to expect on your PC.)
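
One way to put numbers on that remark is Amdahl's law, which is brought in here as an outside illustration and is not something the post itself uses. The sketch below is Python; the 50% parallel fraction for a desktop workload is an assumption picked for the example, not a measurement.

```python
def amdahl_speedup(parallel_fraction: float, processors: int) -> float:
    """Amdahl's law: overall speedup when only part of the work parallelises."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / processors)


def parallel_fraction_for(speedup: float, processors: int) -> float:
    """Invert Amdahl's law: the parallel fraction implied by a measured speedup."""
    return (1.0 - 1.0 / speedup) * processors / (processors - 1)


# 40 DSPs giving a 39x speedup implies almost perfectly independent data.
dsp_fraction = parallel_fraction_for(39, 40)

# Assumed desktop workload where only half of the work can run in parallel.
desktop_speedup = amdahl_speedup(0.5, 40)
```

With 40 processors, a 39x speedup corresponds to roughly 99.9% of the work being parallel, while the assumed half-parallel desktop workload stays below 2x no matter how the 40 processors are used.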

FPGA vendors do not expect to replace Intel and AMD as big CPU builders. For low-end CPUs, they offer soft libraries that can be implemented in the FPGA fabric. For higher performance, they embed hard-coded processors (Xilinx offers up to four 400MHz PowerPCs in a single FPGA). For insane performance, get a real CPU.

Finally, high-end FPGAs are becoming quite affordable and anyone can experiment with new CPU architectures in his/her garage (for about $2000 to have a usable board). Maybe you could find really innovative structures (object-oriented CPU, hardware garbage collector, dynamic code and data compression...), but once you've patented your brilliant idea, look for cash to burn a real high-performance ASIC that will dwarf Intel....

( Another area where FPGA are really exciting is Neural Networks, transforming logic cells into neurons and programmable interconnects into dendrites... )

Mac two-line description wrong
by Jef Raskin on Fri 16th Jul 2004 06:10 UTC

It says that I wanted a low-cost computer and that Jobs changed it to an "easy to use" orientation. It started from an easy-to-use orientation, that was my reason for starting the Macintosh project in the first place.

Please check your history (e.g. read the primary sources, such as in the Stanford Univ. History of Technology collection (mostly online)) before you write. Be a good journalist. Get the facts.

Ah, history
by melgross on Fri 16th Jul 2004 06:14 UTC

I remember the CDC 6600 quite well, having programmed on one here in NYC in the mid sixties when I went to Stuyvesant HS. I was taking FORTRAN IV (remember that?), and the computer was in the Courant Mathematical Institute nearby. It was quite a machine back then. The 6000 and 6600 were the first supercomputers, though the performance would just elicit a giggle today.

FPGAs are inefficient due to the poor routing. The best paths can rarely be taken. There are other problems as well.

It remains to be seen whether increasing processors beyond two in a desktop-like system is practical for most work. Only some problems can be effectively broken down in this way.

It's unlikely that multiple cell processors will be useful in anything but a graphics environment.

Back in the "old days" every machine was mostly unique, and had its own OS. Pretty much the only things in common were the CPUs and the memory chips.
There were some S100 bus machines, and later some used CP/M, but all the big guys built their own machines from the ground up. The IBM PC killed most all of that.

Apple's Xcode 2, which will be released with Tiger next year, will do automatic compiling for the AltiVec vector unit. That will give all programs access to that portion of the chip for the first time, as programmers currently don't want to spend the time optimizing programs they (sometimes wrongly) think don't need it.

re:
by grey on Sat 17th Jul 2004 01:03 UTC

You completely omitted Lisp machines from your history, probably because you never used one (I haven't either), but I would argue that they're just as revolutionary, if not more so, than many of the other consumer-oriented pieces you mentioned.

And that said, I think the industry has proven time and time again that worst-wins or cheapest-wins, either way. Hardware is a hugely expensive proposition to design and develop, far more so than software. I have a friend who designs DRAM among other things. Assume you have a design which is six layers, and each layer, let's just say, is $10,000, so the whole wafer is $60,000. Now, if you have a bug in your hardware design, you need to replace that layer. If this happens at the last layer, you're out $10,000 to replace the one wafer (in a product line, before mass-producing, they'll roll out a wafer for each layer, just for this sort of debugging). But if your bug happens to be in the first layer, BAM, $60,000 mistake. Can you, or any other hobbyist, afford to spend that much on a bug in your design?
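
The arithmetic in that example can be written down as a tiny Python sketch. It uses only the figures from the paragraph above (six layers, $10,000 each); the function name `rework_cost` and the assumption that a bug forces redoing that layer and every layer built on top of it are mine, chosen because they reproduce the $10,000 and $60,000 cases.

```python
def rework_cost(bug_layer: int, layers: int = 6, cost_per_layer: int = 10_000) -> int:
    """Cost of redoing the buggy layer plus every layer built on top of it."""
    if not 1 <= bug_layer <= layers:
        raise ValueError("bug_layer must be between 1 and the number of layers")
    return (layers - bug_layer + 1) * cost_per_layer


last_layer_bug = rework_cost(6)   # only the top layer is redone
first_layer_bug = rework_cost(1)  # the whole stack is redone
```

Under that model the cost grows linearly the deeper the bug sits, from $10,000 at the last layer to the full $60,000 at the first.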

Not saying that hardware can't be messed with, but at a mass-production level, things get fugly. Anyway, point being - the winning situation is to use commodity hardware, best of breed perhaps (e.g. AMD64s and some sort of GPU that you can offload things onto, or heck, even something tiny like a C3 that has wicked AES implementations), and go from there. Maybe later, if you find success, hardware will help support you (e.g. Linux driver support is creeping into official channels now), but that's probably the last thing to happen.

That said, you can start at lower levels than you might think. Overhauling the BIOS could yield some real improvements; look at LinuxBIOS, or various OpenFirmware products. Heck, even www.soekris.com has implemented his own BIOS, which offers serial console support and boots more OSs than LinuxBIOS does.

Hardware is a space that is extremely difficult to get into on a small scale, and if you do, then yeah, stick with FPGAs and home-designed PCBs and the like. Forget about multi-core multi-threaded CPUs or 256-bit GPUs or anything pushing the edge (even GPUs aren't anywhere near the same fab processes as CPUs are now). Leave the cutting edge to those who can throw billions at fabs.

Re: various
by Nicholas Blachford on Sun 18th Jul 2004 17:13 UTC

>Mac two-line description wrong

Fixed.

>grey

No, the idea is not to do custom chips at all. There's no real point these days unless you can really do something special and even then it is extraordinarily expensive.

>Anonymous

>> is there an instruction set that goes beyond cisc, risc and vliw and epic?

The problem with VLIW is that it depends on the compiler extracting all the parallelism in the instruction stream; if it can't, it doesn't help.

I didn't specify which CPU I'd use as I didn't think yet another CPU flame war in the forum would be of any benefit.

>> personally, since hitachi superh core only takes up like 3mm, but performs on-par with a 1-ghz pentium 3, and many 90nm processors have die size of 120mm, why not have a 40-cpu multicore superh processor (allowing for built-in controller and cache size)

Yes, that's what I expect: dozens of cores on a CPU so you can then spread out the OS and applications across them.

>Various missing machines

The list was only to give a few examples; it was not meant to be exhaustive.

I would like to tie two ideas together: 1. untying distributed software from the target CPU hardware, and 2. the use of FPGAs to provide custom logic/hardware acceleration.
These ideas are based on the two products below.
http://www.transmeta.com/crusoe/index.html
http://www.stretchinc.com/products.php

If you married these two current products together, replacing the RISC core in the Stretch with Transmeta's core, you would have the best of both worlds.

Here is how I think it would work:
1. Write your program in C/C++ (or another language).
2. Profile your code with the Stretch tools to find locations for custom logic acceleration.
3. Compile your code to an "intermediate representation" with lots of optimisation hints (static optimisations) in the code.
4. When the software is installed, it is compiled to the target hardware using hints found in the code, leaving some hints in.
5. When the code is running on the CPU, it is monitored by "code morphing" software to apply runtime optimisations and possibly find new code for custom logic acceleration.
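
Step 5 in the list above can be sketched roughly as follows, in Python. This is not how Transmeta's code morphing or the Stretch tools actually work; `MorphingRuntime`, `make_fast` and the threshold are hypothetical, and memoisation merely stands in for whatever "optimised" or custom-logic version a real system would produce.

```python
from typing import Callable, Dict


class MorphingRuntime:
    """Hypothetical monitor: counts calls and swaps in a faster version when hot."""

    def __init__(self, hot_threshold: int = 3) -> None:
        self.hot_threshold = hot_threshold
        self.counts: Dict[str, int] = {}
        self.optimised: Dict[str, Callable[[int], int]] = {}

    def run(
        self,
        name: str,
        baseline: Callable[[int], int],
        optimise: Callable[[Callable[[int], int]], Callable[[int], int]],
        arg: int,
    ) -> int:
        if name in self.optimised:          # already morphed: use the fast path
            return self.optimised[name](arg)
        self.counts[name] = self.counts.get(name, 0) + 1
        if self.counts[name] >= self.hot_threshold:
            self.optimised[name] = optimise(baseline)  # hot spot found
        return baseline(arg)


def square(x: int) -> int:
    return x * x


def make_fast(fn: Callable[[int], int]) -> Callable[[int], int]:
    # Stand-in "optimisation": remember results instead of recomputing them.
    cache: Dict[int, int] = {}

    def fast(x: int) -> int:
        if x not in cache:
            cache[x] = fn(x)
        return cache[x]

    return fast


runtime = MorphingRuntime(hot_threshold=2)
results = [runtime.run("square", square, make_fast, n) for n in range(4)]
```

The first two calls run the baseline code while being counted; from the third call on, the runtime uses the swapped-in version, which is the general shape of "monitor, then optimise for actual usage".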

The advantages I see for this approach are:
1. Always having highly optimised code for your hardware: runtime optimisations for your actual usage patterns, and better optimisations than Transmeta’s code morphing because of the static analysis.
2. Simpler/better code morphing software because of the pre-compiling to match the code morphing instruction set.
3. The ability to create new custom logic on the fly to enhance/accelerate the code morphing software.
4. Software developers can easily take advantage of the Stretch FPGA abilities.
5. When improved CPUs come out, whether new designs or with more resources, the software can take advantage of them. (Might require an installation recompile on CPU upgrade.)
6. Allow hardware acceleration of virtual machines by the code morphing software, possibly looking like coprocessors to the OS. (This reduces the levels of abstraction/emulation, e.g. Java on JVM->x86->code morphing becomes Java on code morphing.) New instruction sets or virtual machines could be added by a plug-in interface to the code morphing layer. The number of plug-ins per CPU would probably be small, but offset by the multiple CPUs (e.g. 4).