posted by Detlef Niehof on Tue 6th Jan 2009 22:53
From time to time, some OSNews visitors claim that x86 is a bad chip (instruction set?) architecture and that something better should come along. For a current example, see http://www.osnews.com/thread?342462 (story: "Freescale To Take on Intel's Atom in Netbook Market", comment: "x86 is brain dead").
What do you all think is technically wrong with x86, why can't it be fixed, and which CPU and/or ISA (instruction set architecture) should follow? Why?
Also, one question that I couldn't answer myself by searching the web: for years, x86 suffered from too few general purpose registers (8), which were only recently raised to 16 with x86-64 (in long mode). Why was the number of registers in x86 not increased earlier, and further? Why limit it to just 16 when other CPUs often offer something like 128 registers? Too expensive?
Comments:
My take
by rhyder on Wed 7th Jan 2009 02:32 UTC
rhyder
Member since:
2005-09-28

Up until 10-15 years ago, a considerable amount of game and application development was still carried out in assembly language. At that time, the x86 ISA seemed poor compared to those of other chips such as the 68000 and ARM. However, unless they are writing device drivers or a compiler, most programmers don't see the ISA at all these days. It seems a shame to some that the weakest design achieved dominance. These days a CPU is a black box: you put code and data in one side and data comes out the other end. Performance, cost, and power consumption are the factors that differentiate chips for most people.

Sometimes people on this site comment that they wouldn't mind having a PPC Linux workstation for example. I'm always left scratching my head as to why.

Obviously the current chips would be more powerful, or more power efficient, or both, if they didn't run x86 code as they have to convert it into RISC code on the fly. This constitutes a waste of resources.
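To make "convert it into RISC code on the fly" a bit more concrete, here is a toy Python sketch of cracking an x86-style `add` with a memory destination into RISC-like micro-operations. It is only an illustration under simplifying assumptions: `Uop`, `crack` and the `tmp0` temporary are hypothetical names, only two operand forms are understood, and real decoders are far more involved.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass(frozen=True)
class Uop:
    op: str                 # "load", "add" or "store"
    dst: str
    srcs: tuple[str, ...]


def crack(instr: str) -> list[Uop]:
    """Split a toy x86-style 'add' into RISC-like micro-ops.

    Only two forms are understood:
      add reg, reg    -> one register-to-register add
      add [reg], reg  -> load, add, store (read-modify-write on memory)
    """
    mnemonic, operands = instr.split(maxsplit=1)
    if mnemonic != "add":
        raise ValueError(f"unsupported mnemonic: {mnemonic}")
    dst, src = (part.strip() for part in operands.split(","))
    if dst.startswith("[") and dst.endswith("]"):
        addr = dst[1:-1]
        return [
            Uop("load", "tmp0", (addr,)),          # tmp0 <- mem[addr]
            Uop("add", "tmp0", ("tmp0", src)),     # tmp0 <- tmp0 + src
            Uop("store", f"[{addr}]", ("tmp0",)),  # mem[addr] <- tmp0
        ]
    return [Uop("add", dst, (dst, src))]


for instr in ("add eax, ebx", "add [ebx], eax"):
    print(instr, "->", [u.op for u in crack(instr)])
```

The register form maps to a single operation, while the memory form turns into the load/add/store sequence a load/store RISC would have to spell out explicitly.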

In the case of a netbook CPU, things are a bit different as power efficiency is as important as speed and legacy compatibility. In other words, extra battery life might be better than the ability to natively run Windows applications.

I always presumed that registers must be implemented as some sort of really high speed RAM located within the CPU core. I suppose that it adds more complexity and cost to the chip. As I say, a CPU should be considered as a black box. Perhaps the designers of x86-64 chips discovered that the performance gains of more than 16 registers were smaller than those of other measures such as longer pipelines, L1 cache or faster clock rates.

Reply Score: 2

RE: My take
by Detlef Niehof on Wed 7th Jan 2009 09:19 in reply to "My take"
Detlef Niehof Member since:
2006-05-02

First off, thanks for your answer!

Up until 10-15 years ago, a considerable amount of game and application development was still carried out in assembly language. At that time, the x86 ISA seemed poor compared to those of other chips such as the 68000 and ARM. However, unless they are writing device drivers or a compiler, most programmers don't see the ISA at all these days. It seems a shame to some that the weakest design achieved dominance. These days a CPU is a black box: you put code and data in one side and data comes out the other end. (...)

When you say that the CPU is a black box, do you mean that this is a good thing in general? Let's pretend for a moment that the ISA of today's dominant chip was the best one imaginable: Which one would it be? And would it increase the chance that more dedicated assembly programmers would fine tune their code such that it could even offer better performance than what is possible today, just because the ISA fits the human brain better? Would maybe the CPU in that scenario not be considered a black box, for our own benefit?

Obviously the current chips would be more powerful, or more power efficient, or both, if they didn't run x86 code as they have to convert it into RISC code on the fly. This constitutes a waste of resources.

This is not obvious to me, however. It has been claimed for years that x86 code is inefficient performance-wise, but how does that square with the reality that the fastest CPUs (disregarding supercomputing) are in fact x86 CPUs? And that even Intel themselves, despite the huge amounts of money for research and development that went into the Itanium, have not been able to create a CPU that was substantially faster than their own x86 offerings?
As to power efficiency, I can't say much about it, and you may be right. :-)

(...) Perhaps the designers of x86-64 chips discovered that the performance gains of more than 16 registers were smaller than those of other measures such as longer pipelines, L1 cache or faster clock rates.

Yes, probably. Unfortunately, not much information about it can be found on the web...

Reply Score: 1

RE[2]: My take
by rhyder on Wed 7th Jan 2009 10:42 in reply to "RE: My take"
rhyder Member since:
2005-09-28

"When you say that the CPU is a black box, do you mean that this is a good thing in general?"

I think that people sometimes become emotionally attached to a hardware platform. I was too, but these days, all I care about is performance and cost in a desktop CPU.

I don't think that any changes to the current ISAs would cause application developers to go back to assembly language programming en masse because software has become too complex to program at such a low level. In other words, time is a resource. Why spend three times as long working on the code for a possible, small speed benefit? In some cases, compilers can perhaps beat human assembly language programmers. On any typical programming project, three times the manpower could be better spent than wading through assembly language.

So, as no one even sees the ISA now, I don't think it matters what the underlying technology is.

Reply Score: 2

RE[3]: My take
by Treza on Wed 7th Jan 2009 13:58 in reply to "RE[2]: My take"
Treza Member since:
2006-01-11

Nobody likes the x86 architecture, not even Intel, which tried three times to kill it: i432, i860, Itanium.

It is Frankenstein's creature!

:-)

Reply Score: 1

RE[2]: My take
by wanker90210 on Fri 9th Jan 2009 23:20 in reply to "RE: My take"
wanker90210 Member since:
2007-10-26

Let's pretend for a moment that the ISA of today's dominant chip was the best one imaginable: Which one would it be? And would it increase the chance that more dedicated assembly programmers would fine tune their code such that it could even offer better performance than what is possible today, just because the ISA fits the human brain better? Would maybe the CPU in that scenario not be considered a black box, for our own benefit?


This is an interesting question. If I were to choose a dream CPU for my web server, it would be the highly threaded Sun T2000 CPU, but I'd suffer immensely having that as a desktop CPU. This might change in 5-10 years if and when desktop software becomes highly parallel/threaded, though.

MC68000 (to some extent), MIPS, ARM, PowerPC, etc. are usually given as examples of "good" architectures, but how would they deal with the growing pains of 64-bit? Either they'd get new, dedicated 64-bit designs with poorly performing software emulation of 32-bit code, or they'd get a new protected mode for 64-bit à la a modern x86 processor. And in AMD64 mode, from the little I've read, the x86 is pretty nice.

The more cores you cram onto a die, the more memory bandwidth becomes an issue. CISC code is more compact than RISC code, so CISC architectures (or CISC instructions with RISC translation) might make more sense than a clean RISC-only architecture.
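As a rough illustration of the density argument, the Python sketch below counts bytes for a single "memory += register" operation. The x86 encoding is the 32-bit-mode form of `add dword [ebx], eax` (opcode 0x01, ModRM 0x03); the RISC side assumes a classic fixed 4-byte instruction word with a `lw`/`addu`/`sw` sequence. This is one favourable instruction picked for x86, not a measurement of real programs.

```python
# Byte counts for one read-modify-write "memory += register" operation.
# x86 (32-bit mode): add dword [ebx], eax is opcode 0x01 (ADD r/m32, r32)
# followed by ModRM 0x03 (mod=00, reg=eax, r/m=[ebx]).
X86_ADD_MEM_REG = bytes([0x01, 0x03])

# A classic load/store RISC such as MIPS needs a load, an add and a store,
# each in a fixed 4-byte instruction word.
RISC_INSN_BYTES = 4
RISC_RMW_INSNS = 3  # lw, addu, sw

x86_bytes = len(X86_ADD_MEM_REG)
risc_bytes = RISC_RMW_INSNS * RISC_INSN_BYTES
print(f"x86: {x86_bytes} bytes, RISC: {risc_bytes} bytes "
      f"({risc_bytes / x86_bytes:.0f}x larger)")
```

Averaged over whole programs the gap is much smaller, since many x86 instructions are longer than two bytes.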

Reply Score: 1

RE[3]: My take
by christian on Sat 10th Jan 2009 12:34 in reply to "RE[2]: My take"
christian Member since:
2005-07-06

"Let's pretend for a moment that the ISA of today's dominant chip was the best one imaginable: Which one would it be? And would it increase the chance that more dedicated assembly programmers would fine tune their code such that it could even offer better performance than what is possible today, just because the ISA fits the human brain better? Would maybe the CPU in that scenario not be considered a black box, for our own benefit?


This is an interesting question. If I were to choose a dream CPU for my web server, it would be the highly threaded Sun T2000 CPU, but I'd suffer immensely having that as a desktop CPU. This might change in 5-10 years if and when desktop software becomes highly parallel/threaded, though.

MC68000 (to some extent), MIPS, ARM, PowerPC, etc. are usually given as examples of "good" architectures, but how would they deal with the growing pains of 64-bit? Either they'd get new, dedicated 64-bit designs with poorly performing software emulation of 32-bit code, or they'd get a new protected mode for 64-bit à la a modern x86 processor. And in AMD64 mode, from the little I've read, the x86 is pretty nice.
"

I think nice is relative.

Anyway, MIPS and PowerPC (and PA-RISC) had relatively easy transitions to 64-bit. Their 64-bit instruction sets are supersets of the 32-bit ones, and 32-bit code runs without emulation or different protection modes. The only instructions that behave differently in 64-bit mode are rotates. Two's complement arithmetic lets 32-bit code use 64-bit registers just fine for most operations.
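The two's complement point can be checked with a small Python sketch. It models register contents as plain integers, sign-extends 32-bit values into 64-bit registers, and compares the low 32 bits of 64-bit results against true 32-bit results. `sext32`, `low32_matches` and `rotr` are hypothetical helpers, and the sketch ignores any ISA-specific rules (for example, sign-extension conventions some 64-bit RISCs impose on 32-bit results).

```python
import operator

MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


def sext32(x: int) -> int:
    """Sign-extend a 32-bit value into a 64-bit register image."""
    x &= MASK32
    return x | (MASK64 ^ MASK32) if x & 0x8000_0000 else x


def low32_matches(op, a: int, b: int) -> bool:
    """True if `op` on 64-bit registers leaves the same low 32 bits as on 32-bit ones."""
    want = op(a & MASK32, b & MASK32) & MASK32
    got = op(sext32(a), sext32(b)) & MASK64 & MASK32
    return want == got


def rotr(x: int, n: int, width: int) -> int:
    """Rotate `x` right by `n` bits within a register of `width` bits."""
    mask = (1 << width) - 1
    x &= mask
    n %= width
    return ((x >> n) | (x << (width - n))) & mask


# Ordinary arithmetic and logic: the low 32 bits never notice the extra width.
samples = [0, 1, 0x7FFF_FFFF, 0x8000_0000, 0xFFFF_FFFF, 0x1234_5678]
ops = [operator.add, operator.sub, operator.mul,
       operator.and_, operator.or_, operator.xor]
assert all(low32_matches(op, a, b) for op in ops for a in samples for b in samples)

# Rotates differ: the bit shifted out wraps into bit 31 in one case, bit 63 in the other.
print(hex(rotr(1, 1, 32)))                    # 0x80000000
print(hex(rotr(sext32(1), 1, 64) & MASK32))   # 0x0
```

Add, subtract, multiply and the bitwise operations only ever propagate information upwards, which is why their low halves agree; a rotate feeds bits back in from the top, so the register width becomes visible.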

Oh, and the big RISC processors all made the transition in the early-to-mid '90s, before they even had 64-bit OSes available.


The more cores you cram onto a die, the more memory bandwidth becomes an issue. CISC code is more compact than RISC code, so CISC architectures (or CISC instructions with RISC translation) might make more sense than a clean RISC-only architecture.

Instruction density is only half the story. On x86, many instructions, and much memory bandwidth, go simply into shuffling values into and out of registers, overhead that is generally not required on 32-register instruction sets. So x86 may well need to execute more instructions for a given function, even if the amount of memory its code occupies is smaller. And with code sizes dwarfed by data sizes these days, less and less bandwidth is being used by instructions.
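The register-shuffling overhead shows up in compilers as spills: values that do not fit in registers and have to go out to memory and back. Below is a minimal Python sketch of a basic linear-scan register allocator (a simplified version of Poletto and Sarkar's algorithm) that counts spills for 8, 16 and 32 registers. `Interval`, `linear_scan_spills` and the workload are hypothetical; the workload simply keeps 13 values live at once.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass
class Interval:
    name: str
    start: int   # first instruction where the value is live
    end: int     # last instruction where the value is live


def linear_scan_spills(intervals: list[Interval], num_regs: int) -> set[str]:
    """Return the names of values spilled to memory by basic linear scan.

    When registers run out, the live value whose interval ends last is spilled.
    """
    if num_regs < 1:
        raise ValueError("need at least one register")
    active: list[Interval] = []   # intervals currently holding a register
    spilled: set[str] = set()
    for cur in sorted(intervals, key=lambda iv: iv.start):
        # Release registers whose values are already dead.
        active = [iv for iv in active if iv.end >= cur.start]
        if len(active) < num_regs:
            active.append(cur)
            continue
        victim = max(active, key=lambda iv: iv.end)
        if victim.end > cur.end:
            # The victim lives longer: spill it and hand its register to cur.
            active.remove(victim)
            spilled.add(victim.name)
            active.append(cur)
        else:
            spilled.add(cur.name)
    return spilled


# Made-up workload: value i is live from instruction i to i + 12.
values = [Interval(f"v{i}", i, i + 12) for i in range(20)]
for regs in (8, 16, 32):
    print(regs, "registers ->", len(linear_scan_spills(values, regs)), "spills")
```

With this workload, 8 registers force five spills while 16 and 32 need none, which is also the shape of the "16 should suffice for most algorithms" argument made further down the thread.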

And let's not forget the mighty Alpha. Given the resources Intel pours into chips and fabs, what could Alpha have achieved with the same resources? Compare the sizes of the original Pentium and the original Alpha: ~3.1 million transistors versus ~1.7 million. Which CPU do you think you could cram more of onto a die?

Reply Score: 1

RE[4]: My take
by Detlef Niehof on Sat 10th Jan 2009 22:35 in reply to "RE[3]: My take"
Detlef Niehof Member since:
2006-05-02

"(...) And in AMD64 mode, from the little I've read, the x86 is pretty nice.


I think nice is relative.
"

Seen from the perspective of an assembly language programmer?

Instruction density is only half the story. On x86, many instructions, and much memory bandwidth, go simply into shuffling values into and out of registers, overhead that is generally not required on 32-register instruction sets. So x86 may well need to execute more instructions for a given function, even if the amount of memory its code occupies is smaller. And with code sizes dwarfed by data sizes these days, less and less bandwidth is being used by instructions.


So, put (very) simply, you say that code density does not matter much anymore since data structures are so much larger than code sections today?
I didn't understand, however, what you meant by the "overhead that is generally not required on 32-register instruction sets." Which 32-register instruction sets do you mean? Could you clarify that? Thanks!

And let's not forget the mighty Alpha. Given the resources Intel pours into chips and fabs, what could Alpha have achieved with the same resources? Compare the sizes of the original Pentium and the original Alpha: ~3.1 million transistors versus ~1.7 million. Which CPU do you think you could cram more of onto a die?


So are you saying that we would have faster CPUs now if Alpha had had the same amount of money put into its development as x86?

Reply Score: 1

RE[3]: My take
by Detlef Niehof on Sat 10th Jan 2009 22:52 in reply to "RE[2]: My take"
Detlef Niehof Member since:
2006-05-02

If I were to choose a dream CPU for my web server, it would be the highly threaded Sun T2000 CPU, but I'd suffer immensely having that as a desktop CPU. This might change in 5-10 years if and when desktop software becomes highly parallel/threaded, though.

MC68000 (to some extent), MIPS, ARM, PowerPC, etc. are usually given as examples of "good" architectures, but how would they deal with the growing pains of 64-bit? (...)


Now, leaving such problems aside, assume that one of the CPUs you mentioned were in fact today's dominant chip, without the need to be backwards compatible.
Do you think that, just because these are "good architectures", today's programmers would write more efficient code, maybe even in assembly language? Given a "good architecture", this should be easier than on a bad one (such as x86), shouldn't it?
Do you think that, if the Sun T2000 CPU you mentioned were dominant, programmers would already have created suitable highly threaded desktop software?
Or, in other words: to what extent does this unfortunate, outdated, and "bad"-from-the-beginning x86 chip design hold us back today, and to what extent is it just the way things work in modern computing (as rhyder claims)?

Reply Score: 1

RE[3]: My take
by renox on Tue 13th Jan 2009 10:12 in reply to "RE[2]: My take"
renox Member since:
2005-07-06

>>how would they deal with the growth pain of 64bit?<<

Uh? You'd better research your facts before writing: MIPS and PPC have 64-bit variants, and got them without too much trouble, I'd say.

As for instruction compactness, there are ways to have a very RISC-like CPU and still get very good instruction compactness, such as ARM's Thumb-2 ISA; granted, it's 32-bit only.
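Thumb-2 gets its compactness by mixing 16-bit and 32-bit instructions in one stream. Per the ARM architecture manual, a halfword whose top five bits are 0b11101, 0b11110 or 0b11111 starts a 32-bit instruction, and any other halfword is a complete 16-bit instruction. The Python sketch below applies just that length rule; `thumb2_lengths` is a hypothetical helper, and the sample halfwords are arbitrary values chosen only for their top bits, not decoded instructions.

```python
from __future__ import annotations


def thumb2_lengths(halfwords: list[int]) -> list[int]:
    """Split a stream of 16-bit halfwords into Thumb-2 instruction lengths in bytes."""
    lengths = []
    i = 0
    while i < len(halfwords):
        top5 = (halfwords[i] >> 11) & 0x1F
        if top5 in (0b11101, 0b11110, 0b11111):
            if i + 1 >= len(halfwords):
                raise ValueError("truncated 32-bit instruction")
            lengths.append(4)   # first halfword of a 32-bit instruction
            i += 2
        else:
            lengths.append(2)   # self-contained 16-bit instruction
            i += 1
    return lengths


stream = [0x4408, 0xF000, 0xB800, 0x4770]   # arbitrary halfwords, picked for their top bits
lengths = thumb2_lengths(stream)
print(lengths, sum(lengths), "bytes for", len(lengths), "instructions")
```

Three instructions fit in 8 bytes here, where a fixed 32-bit encoding would need 12.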

Reply Score: 2

RE[2]: My take
by silix on Mon 12th Jan 2009 17:21 in reply to "RE: My take"
silix Member since:
2006-03-01

Let's pretend for a moment that the ISA of today's dominant chip was the best one imaginable: Which one would it be?
IMHO it would be an architecture developed with the hindsight of the previous decades of attempts and mistakes, so none of the existing ones.

This is not obvious to me, however. It is claimed for years that x86 code was inefficient WRT performance, but how is that in line with the reality that the fastest CPU's (disregarding supercomputing) are in fact x86 CPU's? And that even Intel themselves, despite huge amounts of money for research and development that went into the Itanium, have not been able to create a CPU that was substantially faster than their own x86 offerings?

The problem with x86 at the time Itanium was designed and introduced was its limited IPC: even with superscalar decoders, an IPC higher than 1 is rarely reached, because of interdependent, sequence-disrupting, and/or memory-accessing instructions in the instruction stream. According to chip-architect.com, even recent Athlon64s and Cores don't change this situation.
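Why dependencies cap IPC can be shown with a tiny dataflow model in Python. It assumes an idealized core (unlimited issue width, one-cycle latency, perfect renaming, no branches or cache misses), which is my simplification and not how any real Athlon64 or Core behaves: each instruction can issue one cycle after the latest producer of its sources, so IPC is bounded by instruction count divided by the critical path. `ideal_ipc` is a hypothetical helper.

```python
from __future__ import annotations


def ideal_ipc(program: list[tuple[str, tuple[str, ...]]]) -> float:
    """Upper bound on IPC when only true (read-after-write) dependencies matter.

    program: list of (destination register, source registers).
    """
    ready_at: dict[str, int] = {}   # register -> cycle its latest value is available
    last_cycle = 0
    for dst, srcs in program:
        cycle = 1 + max((ready_at.get(s, 0) for s in srcs), default=0)
        ready_at[dst] = cycle
        last_cycle = max(last_cycle, cycle)
    return len(program) / last_cycle if program else 0.0


chain = [("a", ("a",))] * 4   # each instruction needs the previous result
independent = [("a", ("x",)), ("b", ("y",)), ("c", ("z",)), ("d", ("w",))]
print(ideal_ipc(chain), ideal_ipc(independent))   # 1.0 4.0
```

Even with infinite hardware, the dependent chain cannot exceed one instruction per cycle, whatever the decoder width.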

So it was thought that the best way to achieve instruction-level parallelism was to express it in the code and support it at the ISA level. And the Itanium actually IS fast, with higher performance per clock than, for instance, x86, but only when used with carefully optimized code.

The problem with the Itanium is that it imposes an entirely different paradigm on the machine language (ML) programmer or compiler: they can (or must) rely on a great many registers, but they must handle instruction parallelism (and therefore ordering and dependencies) on their own. A great deal of complexity is thus moved from the out-of-order execution logic to the developer (or compiler).
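Here is a rough Python sketch of the kind of work this pushes onto the compiler: greedily splitting a straight-line program into groups of mutually independent instructions, ending a group with an IA-64-style stop (`;;`) whenever the next instruction reads or writes a register already written in the current group. This is a heavy simplification: real IA-64 code also has to be packed into bundles with templates, and the actual dependency rules have many exceptions. `split_into_groups` is a hypothetical helper.

```python
from __future__ import annotations


def split_into_groups(program: list[tuple[str, str, tuple[str, ...]]]) -> list[list[str]]:
    """Greedily group independent instructions; start a new group on RAW or WAW hazards.

    program: list of (assembly text, destination register, source registers).
    """
    groups: list[list[str]] = []
    current: list[str] = []
    written: set[str] = set()
    for text, dst, srcs in program:
        if written & (set(srcs) | {dst}):
            # Depends on something written in this group: insert a stop.
            groups.append(current)
            current, written = [], set()
        current.append(text)
        written.add(dst)
    if current:
        groups.append(current)
    return groups


prog = [
    ("add r1 = r2, r3", "r1", ("r2", "r3")),
    ("add r4 = r5, r6", "r4", ("r5", "r6")),
    ("add r7 = r1, r4", "r7", ("r1", "r4")),   # reads r1 and r4: needs a new group
    ("add r8 = r2, r5", "r8", ("r2", "r5")),
]
for group in split_into_groups(prog):
    print("\n".join(group) + " ;;")
```

An out-of-order x86 core discovers the same grouping dynamically in hardware; on Itanium the compiler has to get it right ahead of time.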

And in a world where ML is avoided by nearly everyone except embedded and operating system developers, and the developers who do program in assembly language are mostly used to x86, a system that imposes a dramatic paradigm shift, on top of losing backwards compatibility, is inevitably relegated to a niche and suffers the development pace (if any) of a niche segment.

Reply Score: 1

RE[3]: My take
by Detlef Niehof on Mon 12th Jan 2009 20:41 in reply to "RE[2]: My take"
Detlef Niehof Member since:
2006-05-02

(...) and, in a world where ML is nearly avoided by all but embedded or operating systems developers, and those developers who do program in assembly language, are mostly used to X86, a system that imposes a dramatic shift in addition to losing backwards compatibility, is inherently relegated to a niche and suffer the development pace (if any) of a niche segment


So you assume that carefully crafted IA-64 assembly code that runs on the fastest Itanium would outperform equally well optimized x86 code on the fastest x86 CPU?
That sounds interesting. Are you aware of any performance comparisons that would back up such claims?
(I'm not necessarily talking about "hard facts" such as scientific research here; I'd also be interested in something like personal experience, and even anecdotes might be interesting.)
I believe that hand-optimized code for IA-64 would be hard to find, not only for the reasons that you gave, but also because the IA-64 programming model seems to be so complex that even Intel advises against hand-optimization attempts:

Intel and the industry are developing compilers to take advantage of these techniques. Application developers are not advised to use this as a guide to assembly language programming for the Itanium architecture.

From Intel® Itanium® Architecture Software Developer’s Manual, page 149 in 24531705.pdf, available at http://developer.intel.com/design/itanium/manuals/245317.htm
Are the compilers already good enough to provide a real performance advantage over x86?

All in all, thanks heaps for your interesting answer. I wasn't even aware of the claim that Itanium CPUs might be faster than x86 if only the code were optimized for IA-64.
From what I've been reading on the web, Itanium seemed like a CPU that didn't even have the potential to be faster than x86!

Reply Score: 1

RE: My take
by silix on Mon 12th Jan 2009 16:27 in reply to "My take"
silix Member since:
2006-03-01

Sometimes people on this site comment that they wouldn't mind having a PPC Linux workstation for example. I'm always left scratching my head as to why.

Me too, since the PPC is a decent RISC but not an especially remarkable one. For instance, it lacks something I deem vital for avoiding unnecessary branching (i.e. conditional execution), it is not orthogonal, and, like all RISCs, it has what can be considered a drawback from a developer's point of view (more on this later).
It's not as if it were so much better than x86 in this respect...
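To show what conditional execution buys, here is a toy Python interpreter in which every instruction may carry a condition and is simply skipped when that condition is false, so `max` needs no branch, in the spirit of the ARM sequence `CMP r0, r1` / `MOVLT r0, r1`. The flag model is deliberately simplified (real ARM evaluates the N, Z, C and V flags), and `run`, `CONDITIONS` and the tuple-based instruction format are hypothetical.

```python
# Simplified condition checks; real ARM evaluates the N, Z, C and V flags instead.
CONDITIONS = {
    "eq": lambda f: f["eq"],
    "lt": lambda f: f["lt"],
    "gt": lambda f: f["gt"],
}


def run(program, regs):
    """Execute straight-line code where each instruction may carry a condition."""
    regs = dict(regs)
    flags = {"eq": False, "lt": False, "gt": False}
    for op, cond, *args in program:
        if cond is not None and not CONDITIONS[cond](flags):
            continue   # condition false: the instruction has no effect, no branch taken
        if op == "cmp":
            a, b = regs[args[0]], regs[args[1]]
            flags = {"eq": a == b, "lt": a < b, "gt": a > b}
        elif op == "mov":
            regs[args[0]] = regs[args[1]]
        else:
            raise ValueError(f"unknown op {op}")
    return regs


# r0 = max(r0, r1) without a branch.
max_prog = [("cmp", None, "r0", "r1"), ("mov", "lt", "r0", "r1")]
print(run(max_prog, {"r0": 3, "r1": 7})["r0"])   # 7
print(run(max_prog, {"r0": 9, "r1": 2})["r0"])   # 9
```

Without such an instruction, the same `max` needs either a conditional branch around the move or some other branch-free trick.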

Obviously the current chips would be more powerful, or more power efficient, or both, if they didn't run x86 code as they have to convert it into RISC code on the fly. This constitutes a waste of resources.

Interestingly, today's processors actually have tens or hundreds of registers (they just don't expose that many as GPRs, and instead use them for register aliasing, i.e. renaming, an ability crucial to out-of-order execution), but that's not the main factor behind low efficiency, or rather, low effectiveness in silicon usage. Complexity mostly arises from the processor front end (instruction fetch/align/decode) and the dynamic execution (schedule/reorder) logic. For both, sophistication (and thus complexity) is needed to cope with today's performance needs, so it is not entirely ISA-mandated.
IIRC the most recent PPC, the G5, used microcode and drew as much power as a contemporary Athlon64, if not more, to achieve more or less equivalent performance...
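A minimal Python sketch of register renaming: every write to an architectural register gets a fresh physical register from a free list, which removes the false (write-after-read and write-after-write) dependencies that a small architectural register file would otherwise create. The x86 register names are borrowed only for flavour; `rename` is a hypothetical helper, and freeing physical registers at retirement is left out.

```python
from __future__ import annotations
from collections import deque


def rename(program, arch_regs, num_phys):
    """Rewrite (dst, srcs) instructions from architectural to physical registers.

    Freeing physical registers at retirement is omitted for brevity.
    """
    free = deque(f"p{i}" for i in range(num_phys))
    table = {r: free.popleft() for r in arch_regs}   # initial mapping
    renamed = []
    for dst, srcs in program:
        phys_srcs = tuple(table[s] for s in srcs)   # read current mappings first
        if not free:
            raise RuntimeError("out of physical registers (a real core would stall)")
        table[dst] = free.popleft()                 # every write gets a new register
        renamed.append((table[dst], phys_srcs))
    return renamed


prog = [
    ("eax", ("ebx",)),   # eax <- f(ebx)
    ("ecx", ("eax",)),   # ecx <- f(eax): true dependency on the line above
    ("eax", ("ebx",)),   # reuses eax: false dependency on both lines above
    ("ebx", ("eax",)),
]
for dst, srcs in rename(prog, ["eax", "ebx", "ecx"], 8):
    print(dst, "<-", srcs)
```

After renaming, the third instruction writes `p5` instead of reusing `p3`, so it can execute alongside the first one instead of waiting for the second to read the old `eax`.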

In the case of a netbook CPU, things are a bit different as power efficiency is as important as speed and legacy compatibility. In other words, extra battery life might be better than the ability to natively run Windows applications.

But the hype around netbooks is a recent thing: if we think we are sowing the seeds of x86 irrelevance, we'll only reap the harvest in the long run.
Maybe we never will, since netbooks are just smaller notebooks and people rightly expect them to run the same software they run on normal notebooks (perhaps even on notebooks they already own), starting with x86 versions of Windows.

I always presumed that registers must be implemented as some sort of really high speed RAM located within the CPU core. I suppose that it adds more complexity and cost to the chip. As I say, a CPU should be considered as a black box.
A long time ago that was the case, and every added bit, every added transistor, could significantly impact the final cost.
Nowadays it's not like that anymore, but unfortunately it was at that time that the original x86 ISA was conceived ;)

Perhaps the designers of x86-64 chips discovered that the performance gains of more than 16 registers were smaller than those of other measures such as longer pipelines, L1 cache or faster clock rates.

In principle, more general purpose registers give programmers room to think and to optimize functions made up solely of register-to-register operations, with no external memory accesses other than loading the arguments and saving the result.
This is a code design approach that fits entirely with the philosophy of a RISC, load/store architecture.

BUT, while 8 GPRs are too few for this kind of code design (which, by the way, is less intuitive for many assembly programmers, at least all those I know, maybe because they are too used to CISC architectures: it adds indirection and requires tracking which register holds which datum taken from which VM address, in contrast to simply treating VM addresses as variables), it has been found that few algorithms really require as many as 32 registers, while 16 should suffice for most.

On the other hand, larger GPR banks require a smarter compiler to optimize register allocation and usage, and in some cases actually take longer to save (thus adding overhead to every context switch).

So I'd consider the 16 registers in the x86-64 ISA an example of "compromise design": a decent compromise, but nonetheless one that could have been made MUCH earlier (in the Pentium or Pentium Pro days, if not the 386 days).

Reply Score: 1

Well...
by fretinator on Wed 7th Jan 2009 21:56 UTC
fretinator
Member since:
2005-07-06

I'm still waiting for my Cray laptop

Reply Score: 2

RE: Well...
by hollovoid on Mon 12th Jan 2009 04:44 in reply to "Well..."
hollovoid Member since:
2005-09-21

ahhh cray *drools*

Reply Score: 2