Linked by Nicholas Blachford on Sun 10th Feb 2002 17:37 UTC
Editorial "This will end up being one of the world's worst investments, I'm afraid," - David House, former Intel chief of corporate strategy said in the early 1990s. I've been fascinated by microprocessors for years and have been following the Merced debacle since back in 1994 when HP and Intel announced they were getting together to make some amazing new technology.
Why is Compaq committed to IA64?
by Gil Bates on Sun 10th Feb 2002 18:23 UTC

Can anyone explain to me why Compaq is currently totally committed to moving Tru64 Unix and OpenVMS away from the Alpha CPU and onto Itanium?

I can't understand why Compaq is making such a huge commitment to Itanium and dumping Alpha if IA64 is so slow?

I can't imagine it being just for some sort of political reason. Sun at one time had Solaris ported to Itanium but they abandoned it because of the lousy performance of the CPU.

the truth
by the jimbo on Sun 10th Feb 2002 21:40 UTC

"Historically, Intel has this remarkable ability to charge a factor of eight for a performance boost of two in microprocessors."

They are really bad and only after the money. Performance of a PII is almost identical to that of a PIII or a P4; the L1 and L2 caches are all that seems to matter!! Let's not kid ourselves with crappy comparisons and charts!!!

important question.
by the jimbo on Sun 10th Feb 2002 21:48 UTC

Does the 64-bit Hammer require special support from Microsoft? Will it require a separate build of Windows XP????

hammer
by Ungolaint on Sun 10th Feb 2002 22:13 UTC

For XP to take advantage of the 64-bit part of the chip, yes. To use the 32-bit stuff, no. I'd imagine you could run 64-bit programs on the 32-bit XP though.

Itanium and Transmeta are both VLIW.

Transmeta has these advantages over itanium

(a) lower power consumption
(b) lower cost
(c) faster x86 emulation.

In theory, it should be possible to program directly to Transmeta Crusoe processors, bypassing the x86 code morphing software, and get a native VLIW performance boost.

Perhaps AMD or Apple should consider purchasing transmeta, and making the next gen. processor around it.

cracking open the RISC parts of the x86 line
by P_Developer on Sun 10th Feb 2002 23:34 UTC

I've asked this before... has ANYONE ever been able to crack open the RISC engine that drives the current x86 lines? The hardware is in there, and I'm sure that Intel must have a way of testing that section, so I suspect there is a back door *somewhere*. That would make for very interesting performance enhancements to existing x86 operating systems.

P

x86 isnt really RISC
by Raptor-32 on Mon 11th Feb 2002 00:27 UTC

Before i get to that RISC comment...
"I'd imagine you could run 64 bit programs on the 32 bit XP though."

I would imagine the opposite. The *hammer processors are able to execute legacy (there is no sweeter word to use when describing x86) x86 code by behaving exactly like an x86 chip in 32-bit mode. In x86-64's 32-bit mode I don't think you have access to the extra registers, which makes it _exactly_ like Protected Mode on 386-P4/Athlons. This means you would not be able to execute 64-bit code (in 32-bit mode) without some type of emulation. And if that code is for IA-64, you are just plain screwed on the *hammer because I'm 99.9% sure they are not binary compatible chips.

Now about RISC and x86. Everything from the 386 up to newer chips like Athlons and P4s has always been considered a CISC processor. I'm not sure about Intel chips, but I know that AMD made an effort to reduce the number of instructions. The way they did that was by having a RISC-like set of instructions, and then all the others. If an app could get away with just using the "RISC" set it would take fewer cycles. A non-"RISC" instruction would be punished.

Example:
dec ecx
jnz label

would be the "RISC" translation of:
loop label

Itanium sucks...
by mterlouw on Mon 11th Feb 2002 01:31 UTC

Raptor, he's referring to the way Intel maps the CISC instruction set to RISC instructions internally on the latest Pentiums. I'm guessing there is no way to bypass the CISC instruction decoder. The existence of a uniform instruction format for the RISC microcode is extremely unlikely.

lipstick lesbian, the Itanium has a LOT more going for it than just VLIW (even though the VLIW in Itanium is a complete waste unless you're using a compiler that optimizes for explicit parallelism). I agree the Itanium sucks, for different reasons, but Transmeta has a looooong way to go to compete with the raw power of the Itanium.

Compaq make boxes
by Jon on Mon 11th Feb 2002 03:27 UTC

Compaq have dropped the Alpha for one simple reason: they understand making boxes, not CPUs.
For all of DEC's marketing stupidity, they understood how to build CPUs. Compaq bought DEC for their box production, and things like Alpha were an unwanted bonus.
As soon as Compaq was given the opportunity to get rid of the things it just couldn't cope with, it did.


As far as exposing the RISC core of x86 CPUs goes, it's not as easy as you'd like. Remember that CISC processors used to have little ROMs (microcode) that actually controlled the internal working of the CPU, with even more basic instructions than what you find in RISC ISAs.
Saying they are RISC cores is a misnomer, they are an application of techniques first developed in RISC cpu's to a CISC cpu, and the internal instructions reflect that.

If you are really interested then try to dig up some docs on the NexGen 5x86, which let programs run in RISC mode. You might even be able to find a motherboard based on one of these chips, and do some programming.

System speed != CPU speed
by Jurgen Defurne on Mon 11th Feb 2002 14:19 UTC

How long will it take before the notion settles (known for years by computer scientists/engineers) that the MHz of your CPU plays only a partial role in the speed of a system?
Yes, Sun has the lead even with slower CPU's, because they know how to optimise ALL their IO and balance it with the CPU speed.

I have worked for almost two years on a system which had a 33 MHz CPU and 16 MB of RAM, yet it served 20 people with a relational database of 4 GB with reasonable response times. I am talking MINICOMPUTER here. The reason this system could do that was that all IO was handled by separate IO processors, instead of the main processor. If the PC architecture allowed something like that, it would provide additional power at much lower MHz rates.

The only reason that Intel tries to stay ahead of the pack with increasing complexity is that they want to sell CPU's and cut off competitors. They want to make it possible to use their CPU's for every function on the PC. When did Intel think about multi-media extensions ? Philips had proposed an architecture for a special multi-media processor to provide audio- and video capabilities besides the CPU in the PC architecture. Intel did not want that, so they introduced MMX. Philips' idea died a very rapid death.

Yet, offloading IO tasks to specialised processors or specifically programmed IO processors takes a whole burden off the central processor, allowing it to run the OS more efficiently.

Jurgen

Sounds familiar to me...
by JoBBo on Mon 11th Feb 2002 16:43 UTC

> Yet, offloading IO tasks to specialised processors or
> spicifically programmed IO processors takes a whole
> burden from the central processor, allowing it to run the
> OS more efficient.

Heh, you're describing Jay Miner's concept for the Amiga technology...

-> many different specialised processors = Amiga's "custom chips" !!

please answer
by the jimbo on Mon 11th Feb 2002 19:50 UTC

so please please explain to me this question:

Will AMD be at the mercy of Microsoft to release a special build for the x86-64, which is produced only by AMD????


please please answer the question. it is driving me crazy.

transport triggered architecture
by Pontsik on Tue 12th Feb 2002 00:23 UTC

Have you seen the TTA architecture?
I STILL think THIS is what's gonna cut it!

TTA: http://www.byte.com/art/9502/sec13/art1.htm

TTA: http://einstein.et.tudelft.nl/~heco/move/move-project/

regarding MS and x86-64
by P_Developer on Tue 12th Feb 2002 01:48 UTC

The MS Win32 SDK docs now incorporate constants for x86-64 CPU types when getting the system versions. I bet that something is in the pipeline.

See this quote from

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/sy...

"
wProcessorArchitecture
Specifies the system's processor architecture. This value can be one of the following values:

PROCESSOR_ARCHITECTURE_UNKNOWN
PROCESSOR_ARCHITECTURE_INTEL
Windows NT 3.51: PROCESSOR_ARCHITECTURE_MIPS
Windows NT 4.0 and earlier: PROCESSOR_ARCHITECTURE_ALPHA
Windows NT 4.0 and earlier: PROCESSOR_ARCHITECTURE_PPC
64-bit Windows: PROCESSOR_ARCHITECTURE_IA64
64-bit Windows: PROCESSOR_ARCHITECTURE_IA32_ON_WIN64
64-bit Windows: PROCESSOR_ARCHITECTURE_AMD64
"

What's curious is the IA32_ON_WIN64. Wonder what that means.
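
For what it's worth, here is a rough C sketch of reading that field. GetSystemInfo and wProcessorArchitecture are the names from the SDK docs quoted above. arch_name and current_arch are hypothetical helpers of mine, and the numeric values are what I believe winnt.h defines, so treat them as an assumption and check your own SDK headers.

```c
#include <assert.h>
#include <string.h>
#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical helper: map a wProcessorArchitecture value to a short name.
 * The numbers are assumed to match the PROCESSOR_ARCHITECTURE_* constants
 * in winnt.h; verify against your SDK before relying on them. */
static const char *arch_name(unsigned short arch)
{
    switch (arch) {
    case 0:  return "INTEL";          /* PROCESSOR_ARCHITECTURE_INTEL */
    case 1:  return "MIPS";           /* PROCESSOR_ARCHITECTURE_MIPS */
    case 2:  return "ALPHA";          /* PROCESSOR_ARCHITECTURE_ALPHA */
    case 3:  return "PPC";            /* PROCESSOR_ARCHITECTURE_PPC */
    case 6:  return "IA64";           /* PROCESSOR_ARCHITECTURE_IA64 */
    case 9:  return "AMD64";          /* PROCESSOR_ARCHITECTURE_AMD64 */
    case 10: return "IA32_ON_WIN64";  /* PROCESSOR_ARCHITECTURE_IA32_ON_WIN64 */
    default: return "UNKNOWN";        /* includes 0xFFFF */
    }
}

#ifdef _WIN32
/* Hypothetical helper: ask Windows what it is running on.
 * Only compiled on Windows, since GetSystemInfo is a Win32 call. */
static const char *current_arch(void)
{
    SYSTEM_INFO si;
    GetSystemInfo(&si);
    return arch_name(si.wProcessorArchitecture);
}
#endif
```

On a Windows box you would just print current_arch(); everywhere else only the lookup table compiles.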

P

IA32_ON_WIN64.
by Anonymous on Tue 12th Feb 2002 10:45 UTC

Maybe it's used by the Windows on Windows system. This would need to have a IA32 entry but bind to the WIN64 api.

This is nasty actually: you'd need to translate stack frames, which would be non-trivial if you pass pointers to structures that themselves contain pointers, e.g.

// Assumes <windows.h> for DWORD and LPVOID.

// Easy, no translation required: no pointers inside
typedef struct
{
    DWORD dwVal;
} APISTRUCT3, *LPAPISTRUCT3;

// Need to translate lpData (a 32-bit pointer must be widened)
typedef struct
{
    DWORD dwVal;
    LPVOID lpData;
} APISTRUCT2, *LPAPISTRUCT2;

// Need to build a 64-bit version of the structure that Struct points to
typedef struct
{
    LPAPISTRUCT2 Struct;
} APISTRUCT1, *LPAPISTRUCT1;

APISTRUCT1 ApiStruct1;
APISTRUCT2 ApiStruct2;

Now, if you pass APISTRUCT3 to a Windows API call, no translation is required; you just zero-extend the pointer on the stack.

If you pass APISTRUCT1 it's a real pain - you'd need to copy (& translate the lpData pointer) the APISTRUCT2 structure it points to to a temporary buffer, and change the pointer to point there, and then do the reverse after the API call returns.

Plus you'd need to be able to switch to 64 bit mode and back, which may take time.
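
A minimal sketch of that copy-translate-copy-back step in portable C, just to show the shape of the problem. This is not how Windows actually does it. The _32/_64 struct names, guest_mem, guest_ptr and the thunk_* functions are all made up for illustration, and the 32-bit process's address space is faked with a small byte array so it runs anywhere.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Layouts as the 32-bit caller sees them: pointers are 32-bit values. */
typedef struct { uint32_t dwVal; uint32_t lpData; } APISTRUCT2_32;
typedef struct { uint32_t Struct; } APISTRUCT1_32;  /* 32-bit pointer to APISTRUCT2_32 */

/* Layouts as the 64-bit API expects them: pointers are 64-bit values. */
typedef struct { uint32_t dwVal; uint64_t lpData; } APISTRUCT2_64;
typedef struct { uint64_t Struct; } APISTRUCT1_64;

/* Stand-in for the 32-bit address space (assumption; no bounds checking). */
static uint8_t guest_mem[256];
static void *guest_ptr(uint32_t addr) { return &guest_mem[addr]; }

/* Widen one APISTRUCT2: copy the scalar, zero-extend the pointer. */
static void thunk_in_struct2(const APISTRUCT2_32 *src, APISTRUCT2_64 *dst)
{
    dst->dwVal  = src->dwVal;
    dst->lpData = (uint64_t)src->lpData;
}

/* Narrow it again after the call; the pointer must still fit in 32 bits. */
static void thunk_out_struct2(const APISTRUCT2_64 *src, APISTRUCT2_32 *dst)
{
    assert(src->lpData <= UINT32_MAX);
    dst->dwVal  = src->dwVal;
    dst->lpData = (uint32_t)src->lpData;
}

/* Before the call: copy the nested struct into a temporary 64-bit buffer
 * and make the 64-bit outer struct point at that buffer. */
static void thunk_in_struct1(const APISTRUCT1_32 *src,
                             APISTRUCT2_64 *temp, APISTRUCT1_64 *dst)
{
    APISTRUCT2_32 inner;
    memcpy(&inner, guest_ptr(src->Struct), sizeof inner);
    thunk_in_struct2(&inner, temp);
    dst->Struct = (uint64_t)(uintptr_t)temp;
}

/* After the call: write any changes back into the caller's 32-bit struct. */
static void thunk_out_struct1(const APISTRUCT1_32 *src, const APISTRUCT2_64 *temp)
{
    APISTRUCT2_32 inner;
    thunk_out_struct2(temp, &inner);
    memcpy(guest_ptr(src->Struct), &inner, sizeof inner);
}
```

Every API that takes a nested structure would need its own pair of functions like this, which is why the APISTRUCT1 case is the painful one.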

Hammering Windows and IA32_ON_WIN64
by longjohn on Tue 12th Feb 2002 12:30 UTC

Given the name and place in the list, I would guess that this applies to Itanium instead of Hammer. Note that IA-32 is Intel marchitecture for what everyone else calls x86. Also, 32-bit mode on Hammer was designed specifically to run 32-bit x86 Windows programs without any modifications.

BTW, that also means that you can't run 64-bit programs on 32-bit Windows because the extra bits and extra registers just aren't visible unless the OS switches to 64-bit mode. Although it would be possible to create a 32-bit version of Windows that does this, there would be no point.

OpenVMS on Itanium
by longjohn on Tue 12th Feb 2002 12:45 UTC

Digital and Compaq have never been able to figure out how to make Alpha popular enough to be profitable. That's the fundamental reason Compaq isn't going ahead with EV8. That means they had three choices for what to do with OpenVMS:
1) drop it altogether
2) port it to another platform
3) bleed money until the corporation is insolvent
Surely the first choice was the best for Compaq's customers.

It also isn't hard to understand why Compaq chose Itanium. The possible choices were SPARC, MIPS, POWER and Itanium. Neither SPARC nor MIPS offers exceptional performance on the type of applications Alpha is good at. POWER would be an option, but IBM is a competitor and Compaq already has a working relationship with Intel. Besides, with the Alpha engineers working for Intel, future versions of Itanium are bound to have better performance.

Exposing VLIW
by longjohn on Tue 12th Feb 2002 13:22 UTC

There are two reasons Transmeta doesn't expose the underlying VLIW engine in Crusoe. First, they don't have to convince anyone to port to Crusoe. Second, they can change the underlying architecture and code generator without having to recompile applications. It also allows them to optimize the architecture for emulation / interpretation of x86 code instead of as a general purpose processor.

If Transmeta can survive and prosper they will eventually adopt x86-64 because that architecture is a lot more friendly for translation into RISC-like code than x86. There are enough registers to avoid storing temporary values on the stack and they are large enough to avoid most multi-precision arithmetic.

Whether they ever expose the underlying engine will depend on their experience. If they find that a particular architecture works well enough that they aren't changing it all the time, and they find that there is a fundamental issue with x86 translation, and Crusoe becomes popular enough, then they might. However, I doubt that will happen. Since they have made a virtue out of a problem (x86 compatibility), there just isn't much upside to exposing the engine.

Re: OpenVMS on Itanium
by longjohn on Tue 12th Feb 2002 13:25 UTC

Oops, I meant "the second choice", not the first. Sigh.

Several stages in the Athlon pipeline are spent on decoding x86 into micro-ops, which are RISC-like. That way, the Athlon can run legacy x86 code.

there are like 50+ million athlons out there.

Wouldn't it be faster to have a compiler that exposes the underlying Athlon RISC engine, reducing the pipeline by several stages and hence reducing mispredict penalties?

in other words, the underlying athlon Risc engine has many registers and a shorter pipeline, so why not have a compiler produce code in micro-ops, instead of x86.

it should allow the athlon to run cooler and faster.

Re: Exposing VLIW
by Nicholas Blachford on Tue 12th Feb 2002 22:50 UTC

I liked the Russians' approach to this problem. They have a fast VLIW core, but not exposing it means you have to perform code translation, and this slows you down. However, if you do expose the internals, you are left with an architecture which is very hard to change, as changes will break compatibility (it is hard to make big changes to Itanium).

The Russians have a very clever way of solving this, which is to break the problem in two. You ship a binary based on instruction set A (which is a hardware-independent list of instructions), then you recompile this into instruction set B, which is specific to the CPU. You only recompile once, so it could be done when installing, but you end up with an optimised binary for the specific CPU and no need for so much complexity in the hardware - so it's faster also.

As for Transmeta using x86-64, They already have a license:
http://www.amd.com/us-en/Corporate/VirtualPressRoom/0,,51_104_543~1...

In other news, in a timely announcement, SGI backed up my story :-)
http://www.theregister.co.uk/content/53/24039.html


P.S. There is one correction to the original article:
It was Nick Tredennick of Microprocessor Report who thought up "Itanic", not The Register.

Re: Exposing VLIW
by longjohn on Wed 13th Feb 2002 00:17 UTC

The idea of distributing intermediate-code files that are compiled on installation isn't new. I came up with this idea in 1992 and then discovered it was already old-hat.

The problem with this approach is that it assumes that the source code is platform independent. It is amazing how much source code isn't. The elegance of the Transmeta approach is that it detects the platform dependencies and emulates around them.

As for Transmeta and x86-64: licensing x86-64 technology isn't the same thing as delivering it, but I expect that they will for the reasons given.

The Athlon engine is purely internal and there is no reason for AMD to expose it even if they could. The big selling point for Athlon is compatibility with legacy applications. If AMD exposed the engine they would have a RISC chip, but one that isn't compatible with any existing processor architecture and which doesn't have any software. Take it from someone who's been there / done that, there are better ways to throw away money.

You also need to understand that the real problem with decoding multiple x86 instructions is to identify the start of each instruction. That's hard because you have to (partially) decode each instruction to identify the one after it. (This is one of the primary advantages of fixed-size instructions in architectures like PowerPC or Alpha.) Once you've found the instructions, decoding them is comparatively easy. Athlon "cheats" by remembering where the start of each instruction is. This is almost as efficient as remembering the decoded instructions (as Pentium 4 does), but a heck of a lot easier.
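
To make the boundary problem concrete, here is a toy sketch in C. The encoding is invented (the low two bits of an instruction's first byte give its length minus one) and is NOT real x86; all function names are hypothetical. It only shows why finding instruction starts is serial for variable-length code, trivial for fixed-size code, and cheap to replay once the start positions have been remembered.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy encoding (assumption): length is 1..4 bytes, taken from the first byte. */
static size_t toy_length(uint8_t first_byte)
{
    return (size_t)(first_byte & 3u) + 1u;
}

/* Variable-length case: the start of instruction k+1 is only known after
 * instruction k has been (partially) decoded, so this walk is serial. */
static size_t find_starts_variable(const uint8_t *code, size_t n,
                                   size_t *starts, size_t max)
{
    size_t count = 0, pc = 0;
    while (pc < n && count < max) {
        starts[count++] = pc;
        pc += toy_length(code[pc]);
    }
    return count;
}

/* Fixed-size case (4-byte instructions): every start is known up front,
 * so several decoders can work on different instructions independently. */
static size_t fixed_start(size_t i)
{
    return 4u * i;
}

/* "Remembering the starts": do the serial walk once and keep one marker
 * per byte, so later fetches of the same bytes can skip the walk. */
static void mark_starts(const uint8_t *code, size_t n, uint8_t *is_start)
{
    size_t pc = 0;
    for (size_t i = 0; i < n; i++)
        is_start[i] = 0;
    while (pc < n) {
        is_start[pc] = 1;
        pc += toy_length(code[pc]);
    }
}
```

The marker array costs far less storage than keeping fully decoded instructions around, which is the trade-off described above.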

The reasons I promote this dual approach are (a) a smooth transition and (b) long-term performance gains.

MDRONLINE estimates a 2% compounded penalty for using the x86 ISA, one that would be remedied by a switch to either RISC or VLIW.

let's look at apple as a case example. Apple started on CISC, the 68k line. they then switched to powerpc, with a 68k emulator. now they are almost native 100% RISC.

Continually extending x86 may be challenging. One of x86's problems is its small register set (8).

let us suppose that amd allowed for a compiler that directly coded to its RISC engine.

AMD can claim fast x86 execution. this would be like apple powerpc emulating 68k. they can also claim speed improvements with certain applications by writing specifically for its RISC core in micro-op. This would be like writing for apple's powerpc native code.

At some point, if there is the demand, AMD can offer CPUs that are very powerful and run only with the RISC core. This will reduce die size, heat and transistor count, and improve performance.










