AndroidCentral reviews some Dell Android tablet, and concludes:
There’s a lot to like in the Dell Venue 8 7840 tablet. The name is not one of those things. The display, however, most definitely is. Resolution quirks aside, Dell’s got a gorgeous panel in this tablet. And the Intel Atom processor seems like it’s pushing everything just as you’d expect a high-spec’d tablet to do. Battery life is pretty much on par with what we’d expect. And while on-board storage is close to shameful, Dell makes up for it with allowing for a massive amount of removable storage.
I’m not interested in the tablet itself, but in its processor. I find it remarkable that Intel has reached a point where it can power mobile devices with comparable performance and battery life… but with x86-64, not ARM. Intel isn’t new to mobile, of course – I have countless XScale-powered PDAs – but that was ARM, not x86(-64).
We’re reaching a point where we have a standard architecture running from small phones all the way up to supercomputers. Remarkable.
“We’re reaching a point where we have a standard architecture running from small phones all the way up to supercomputers. Remarkable.”
Serious question: are there more Intel tablets than ARM servers?
I know the ARM servers that have mostly been arriving in the second half of this year are still early and experimental, but even if they don’t have a real or considerable numerical lead, I wouldn’t be surprised if ARM servers are more “popular” than Intel tablets.
More likely they share the same ISA, not the same architecture.
Latest Atom performance is quite impressive, though.
The ugly but still 4004-compatible x86 instruction set? It should be killed, though it might require a stake through its heart, beheading, and scattering the ashes.
Long ago, it was the 68000 vs. the 8086, and the 68000 was just so much better. But the IBM PC picked the 8088 because it had an 8-bit bus and the 68008 wasn’t out yet.
http://www.easy68k.com/paulrsm/doc/dpbm68k1.htm
The ARM is similar. The best analogy: do you want a phonetic alphabet or pictograms? With x86, each byte may be an instruction, with lots of prefixes, modifiers, and such, and it all gets coerced into something RISC-like inside the processor, but that takes lots of silicon. Even if you don’t go all the way to full RISC, you at least want a longer word and some orthogonality.
The Cell processor could do better for supercomputers, as could any RISC chip. But the economies of scale in silicon mean x86 will be less expensive for the moment. The original Cray was NOT x86-like.
(I’m thinking VHS vs. Betamax, but I’m not sure how well it applies.)
I’m surprised they made the Atom work – you still need all that extra silicon, gates, and switching – all of which requires power – compared to the ARM.
This is one thing that surprised me on the software side, at least if Microsoft can make Windows work in mobile. By relying on Moore’s law to hand you 50 W, then 100 W, then 500 W computers, Microsoft could get away with coding bloatware. But every inefficiency that took one more transistor switching took that much more energy. You can’t get around physics.
I for one hope to see the day – as soon as possible – when x86 goes the way of the Intel iAPX 432 (does anyone else remember what that was?) and the Itanium.
Perhaps AMD could add a parallel RISC mode – not just a 64-bit mode, but a new RISC-like instruction set and architecture – or maybe even just ARM64.
I think this goes to the heart of Intel’s long-term plan – they realise that the x86 ISA isn’t the best, but they hope that in the long run they can leverage their manufacturing edge to offset any inefficiencies that come with it. In the end, though, I fear that unless Intel can pull a rabbit out of their hat, the rate at which die shrinks keep them in the lead will eventually slow to the point where ARM can be one step behind on process and still have a better performance-per-watt than Intel. The big long-term question is whether Intel jumps back into the world of ARM, because I could imagine Intel bolting the ARM ISA onto their architecture – the combination of a clean ISA, the best microarchitecture on the market and a manufacturing edge would result in some pretty good products.
For the love of Mithra, people need to finally grasp the difference between “instruction set” and “microarchitecture.” In the big scheme of things, the translate-stage structures are a really, really tiny percentage of the transistor budget of a modern CPU. And it has been so for a while – a decade (or more) at least.
FWIW, ARM’s success comes largely from its business model/structure, rather than from the subjective qualitative musings some people like to wax poetic about regarding its ISA (which is funny, because people used to bitch about how bad it was back in the day).
In any case, it’s odd to see the 80s/90s RISC marketing arguments still surviving to this stage, even if the technical reality is completely different. Amusing how marketing shapes perception.
… and as the cores get smaller and faster, the cost of decode shrinks. It is a mostly fixed cost in both power and area, and it’s been a solved problem for 10 years. As a percentage of overhead it just keeps getting smaller and less relevant as time passes.
But yeah, people will keep spouting off about it… “RISC is better!”. Whatever. That acronym stopped having any real semantic meaning back when the Pentium Pro shipped in 1995…
Don’t get me wrong, I have interest in other architectures too, but it isn’t because I’m waiting for someone to save me from x86…
The “technical reality” is not that clear-cut.
For simple cores (scalar in-order), x86 can’t compete with ARM and other RISC cores.
For the kind of cores suitable for a tablet or a phone, Intel can do something competitive, undoubtedly, but this has a cost, and Intel and AMD are the only ones able to make that pig fly.
On the ARM side, Apple and Nvidia have produced very competitive and innovative CPU cores, in relatively short time.
x86 complexity is not just decoding. There are segments, x87, many microcoded instructions… There are also a lot of special optimisations, like the treatment of the stack contents and the SP register.
Intel may decide to drop some parts (x87, or a 64-bit-only core) and abandon compatibility with DOS.
ARM essentially made a reboot with their 64-bit fixed-length instruction set, which is very different from their 32-bit fixed- and variable-length instruction sets.
So far, Intel has made high-margin CPUs, which could justify very complex designs. They are using “contra-revenue” to push their CPUs into Android tablets.
It remains that 64-bit ARM cores can be marginally better, cheaper to design, and a bit smaller, if made in the same fab.
Hi,
Heh. For scalar in-order, you’re so far behind in terms of performance/watt that it’s like trying to win a “stupidest moron” contest – there is no winner.
For the tiny/embedded market, there are two important factors (that have nothing to do with the ISA). First, the margins are typically pathetic (too small for a company that is used to high profit margins to bother with).
Second, ARM is licensed, not a final product. This means that for embedded systems someone can put an ARM core directly into their own custom chip. Intel can’t do that – they need to provide everything, and if a manufacturer wants something in a chip that Intel doesn’t provide they’re out of luck.
– Brendan
As usual, that’s a big “it depends.” E.g. the Atom N450 mopped the floor in performance/watt with the comparable Cortex-A8 cores of the time, using a similar silicon area.
Intel has traditionally been at least one or two nodes ahead of the rest of the ARM ecosystem, so they have access to process benefits that basically neutralize any supposed intrinsic power-efficiency advantage ARM may have. On the other hand, ARM has the better licensing/business-model approach, which gives its ecosystem the edge in highly price-sensitive markets.
The reduced number of players in the x86 space has more to do with the realities of the ownership of the x86 IP, and who is allowed to manufacture it.
And why wouldn’t they? That’s the whole point of ARM: license their cores so you don’t have to reinvent the wheel from scratch.
And all of that stopped being an issue, especially since decoupled microarchitectures have been around. It makes no sense to throw away something which keeps backwards compatibility (one of the most sensitive things in the x86 space) and that only adds noise to the overall transistor budget.
And?
That may very well be, but that is a function of the business model, not the ISA/microarchitecture.
Not really. ARM64 has so far had a fantastically hard time breaking into the high-performance markets, for example.
There are some specific parts of the ISA and the memory consistency model that do have relevant effects on real-world applications.
The thing with x86 is that, probably out of pure luck, it ended up with a design that turned out to be a very good choice in several places.
One example is the selection of TSO (total store order) as the memory model, which allows one to write things like ring buffers for multicore CPUs far more efficiently than is possible on architectures with looser memory models (essentially all RISC except SPARC, which nowadays also uses TSO).
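For anyone curious what that looks like in code, here’s a minimal sketch of my own (not from the post above): a single-producer/single-consumer ring buffer using C++11 atomics. On a TSO machine like x86, the acquire loads and release stores below compile to plain MOVs with no extra fence instructions; on weaker memory models the compiler has to emit barriers or ordered load/store instructions to give the same guarantee.

    // SPSC ring buffer sketch: assumes exactly one producer thread calling push()
    // and one consumer thread calling pop().
    #include <array>
    #include <atomic>
    #include <cstddef>
    #include <optional>

    template <typename T, std::size_t N>
    class SpscRing {
    public:
        bool push(const T& value) {
            const std::size_t head = head_.load(std::memory_order_relaxed);
            const std::size_t next = (head + 1) % N;
            if (next == tail_.load(std::memory_order_acquire))
                return false;                                  // buffer full
            buf_[head] = value;                                // fill the slot...
            head_.store(next, std::memory_order_release);      // ...then publish it
            return true;
        }

        std::optional<T> pop() {
            const std::size_t tail = tail_.load(std::memory_order_relaxed);
            if (tail == head_.load(std::memory_order_acquire))
                return std::nullopt;                           // buffer empty
            T value = buf_[tail];                              // read the slot...
            tail_.store((tail + 1) % N, std::memory_order_release);  // ...then free it
            return value;
        }

    private:
        std::array<T, N> buf_{};
        std::atomic<std::size_t> head_{0};  // written only by the producer
        std::atomic<std::size_t> tail_{0};  // written only by the consumer
    };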
Another thing is atomic load-op-store instructions, something that is very non-RISC. ARMv8.1 seems to add several such instructions, probably because they see how much more efficient that is on multicore x86 compared to the traditional RISC load-linked / op(s) / store-conditional cycle.
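As a small illustration (again my own sketch, not the poster’s): a plain atomic increment in C++ maps to a single locked read-modify-write instruction on x86 (LOCK XADD), whereas pre-v8.1 ARM has to spin on a load-exclusive/store-exclusive pair; ARMv8.1’s LSE extension adds single instructions such as LDADD for exactly this.

    #include <atomic>

    std::atomic<long> counter{0};

    long bump() {
        // One atomic load-op-store, no explicit retry loop in the source.
        // x86: lock xadd; ARMv8.0: ldxr/add/stxr loop; ARMv8.1 LSE: ldadd.
        return counter.fetch_add(1, std::memory_order_relaxed);
    }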
I know fuck-all about this stuff, but why “out of luck”? Why not assume that the largest chip maker in the world knew what it was doing?
A less strict memory model had performance advantages as long as the norm was uni-core, so why didn’t Intel go with that?
One has to look a bit at history for that: the first multiprocessor-capable x86 design was the 386, where the memory model was extremely strict – it was SC (sequentially consistent).
x86 never had a _formal_ specification around this until quite recently (i.e. it was not spelled out in Intel’s CPU manuals). Using SC on a modern CPU is not practical, as it essentially prevents any form of store buffer from being used. Performance on memory operations would be horrible without a store buffer.
So my guess is that TSO was selected because it is close enough to SC to make it feasible to support existing software with one code base. For example, Linux does have logic to handle pre-TSO x86 CPUs.
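To make the SC-versus-store-buffer point concrete, here’s the classic “store buffering” litmus test as a rough C++ sketch of my own (how often the outcome shows up is timing-dependent, so treat it as illustrative only): under SC the result r1 == 0 && r2 == 0 is impossible, while TSO allows it, because each core’s store can still be sitting in its store buffer when the other core’s load executes.

    #include <atomic>
    #include <cstdio>
    #include <thread>

    int main() {
        for (int trial = 0; trial < 100000; ++trial) {
            std::atomic<int> x{0}, y{0};
            int r1 = -1, r2 = -1;

            std::thread t1([&] {
                x.store(1, std::memory_order_relaxed);   // store to x...
                r1 = y.load(std::memory_order_relaxed);  // ...then load y
            });
            std::thread t2([&] {
                y.store(1, std::memory_order_relaxed);   // store to y...
                r2 = x.load(std::memory_order_relaxed);  // ...then load x
            });
            t1.join();
            t2.join();

            if (r1 == 0 && r2 == 0) {
                std::printf("store-buffering outcome seen on trial %d\n", trial);
                return 0;
            }
        }
        std::printf("outcome not observed this run (it is timing-dependent)\n");
        return 0;
    }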
And just to be clear, this is absolutely not x86-bashing. On the contrary, as a programmer who does a lot of multicore optimization, I would pick x86 over 32-bit ARM any day. 64-bit ARM is a big improvement; time will tell whether it has caught up with x86 on this front.
Thing is, no one knew what they were doing at that time (i.e. the 1970s), not in light of the architectural advances that were to come.
Superscalar out-of-order CPUs were still almost 20 years down the road when the 8086 ISA was designed. It had a complex instruction encoding that was difficult to decode, only 8 general-purpose registers with a complex register-use model, programmer-visible segmented memory organization, and many odd specialized instructions. None of this was uncommon at the time, but by modern design standards it was awful.
So how were they lucky?
Instruction Encoding
Yes, it is hard to decode. But it is also extremely dense, which lets it make more efficient use of instruction caching and speeds up program loading. This became more important over time, as the memory wall (the drop in performance when going to main memory) just kept growing. Dense instruction encoding needs less cache, uses less memory, consumes less bus bandwidth, etc. – overall a very beneficial feature, both power-wise and performance-wise.
ARM did Thumb for this very reason – and they had no baggage to deal with; they did it on purpose…
Low Register Count
x86-64 added 8 more registers, and they did help a bit in certain scenarios. But it turns out that it didn’t really matter all that much… Once x86 went OOO and added register renaming in hardware, the limited number of programmer-visible registers no longer had as many negative consequences. It also has some benefits: it makes compilers simpler to write (no heroics needed to avoid immediate register reuse) and it keeps the instruction encoding small. It effectively decouples the programmer-visible register count from the physical one, allowing Intel to optimize as needed without ISA changes.
Segmented Memory
Ok… No excuse. No plus side. Dumb idea. They mostly expunged it in the 64-bit ISA. Let’s pretend it didn’t happen and move on, ok?
Odd Specialized Instructions
Here’s the thing… Some of them are actually good and useful. Even most “RISC” ISAs ended up adding some as well – certainly ARM (again, see Thumb; and its handling of immediates is… interesting). The RISC dream of a completely fixed-width instruction encoding using only basic fundamental ops with pure load/store semantics basically died long ago; no one is trying to do this anymore.
It turns out cracking instructions into micro-ops isn’t all that hard. This gets you most of the benefits of RISC while allowing some condensed instructions that end up being quite useful. Remember instruction encoding? Having some specialized “condensed” instructions has positive effects on icache use.
The point of all this is that Intel had to go to great heroics to make x86 work in a modern CPU architecture. But they got lucky in that the ISA they designed back in the 70s is actually very conducive to implementation in a modern CPU architecture…
It turns out that everyone else had to go through basically the same heroics to a degree – almost all modern high-performance CPUs are superscalar OOO designs with a similar organization. So the only price Intel really ends up paying for their “bad” ISA is in decode, and that is a rather small price to pay by most accounts.
Everything else they would have had to do anyway.
Forgot to add my other 2 cents…
Feelings about the “goodness” or “badness” of different ISAs aside, they almost don’t matter anymore in light of a simple fact: virtually ALL competitive high performance processors available in the market today are Out-Of-Order designs.
By competitive, I mean they have wide market applicability and sell in reasonably large volumes.
There have only been a handful of in-order designs in recent memory that still performed relatively well (POWER6, Itanium), but they tended to have very quirky performance behavior (great at some things, horrible at others). Itanium is dead, POWER6 begot POWER7 (which is OOO), and here we are.
Thing is, if you are doing OOO you might beat Intel marginally on a few things, but overall you are by definition going to end up in the same ballpark performance wise. Your ISA has absolutely nothing to do with it.
Any OOO chip is going to have basically the same performance as Intel’s (at best). You can of course optimize a bit for power use (ARM) or raw horsepower (POWER7), but it’s not going to change the big picture much. Intel knows how to make OOO chips, they long ago figured out how to work around the warts in their ISA, and they have better fabs than you do… How do you expect to beat them with what is basically the same design architecturally?
If someone is going to beat Intel (I mean really beat them), it won’t be with an OOO design. Because if you come up with some whiz-bang new optimization in OOO design, it will be in their chip next year…
It will have to be in-order, because that is the only type of design that can avoid being marginalized, and it is the only type of design where the ISA does still matter. Thing is no one has figured out how to build a really fast one yet that doesn’t fall flat on its face when you throw general purpose code at it.
If the Mill (http://millcomputing.com/) ever sees the light of day it might be competitive, but that is a big might. At least they are genuinely trying something that is very, very different… Short of that one chip, nothing out there is even remotely interesting.
Do you mean Denver?
Maybe Denver 2 or 3…
It has certainly proven that Transmeta-style code morphing (or DCO, or whatever they want to call it) is viable for the ARM ISA, and that the resulting performance can be pretty darn impressive, at least some of the time.
But that doesn’t make it competitive with x86 in straight-up performance; it’s far from that at this point. It’s still intended to be a low-power design, not a high-performance one, so the comparison isn’t even fair.
Maybe after a few revisions though…
What I mean is that it is an in-order architecture with a smart compiler front end for ‘legacy’ code.
Because anybody studying the development of x86 can clearly see the lucky events? Because some limitations later (as microarchitectures changed) became advantages?
One simple example is the LEA (Load Effective Address) instruction. It calculates an address the same way a memory-accessing instruction does, but stores the result into a register.
For the 8086 the only real usage was for address calculations. A base register could be BX, BP, SI or DI.
When the 80386 was designed, Intel extended the addressing modes to a (near) orthogonal format: [segment_base + base_register + index_register*scale + displacement], where the displacement is an optional signed integer.
Suddenly the LEA instruction can be used for doing scaled three-input adds. E.g. LEA EAX, [EBX+EBX*4-15] computes EAX = EBX*5 - 15.
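Compilers routinely exploit this. As a quick illustration of my own (not the poster’s), a typical x86-64 compiler turns the little C++ function below into a single instruction along the lines of lea eax, [rdi + rdi*4 - 15] – a scaled three-input add that doesn’t even clobber the flags.

    // Usually compiles to one LEA on x86-64 (e.g. with gcc/clang at -O2).
    int times5_minus15(int x) {
        return x * 5 - 15;
    }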
—
Another example is the use of prefixes. They have helped x86 evolve further than most processors while originally being used for a completely different purpose.
it’s odd to see the 80s/90s RISC marketing arguments still surviving to this stage, even if the technical reality is completely different.
Totally wrong. You need 10x the resources to build a fast x86 core. Even AMD has been struggling for years to build one.
Intel’s designs are tied very tightly to their process quirks, unlike generic synthesized ARM cores, which are created to be reproducible at any fab.
You are using “fast” as a relative term in a situation where it is not at all relative…
AMD’s weakest x86 designs completely and utterly destroy any existing ARM in performance (usually by more than double). That is ignoring the fact that they also happen to make actual high performance designs too, with which ARM isn’t even in the same galaxy performance wise…
ARMs are great low-power processors. They are very fast low-power processors. They are not, and frankly probably never will be, great high-performance processors.
If by some miracle ARM ever gets there, then you can use “fast” as a relative term and have it mean something. For now, you are comparing apples and orangutans.
LOL! You are completely clueless
The maximum overhead for x86 support was (as cited by people who have actually designed both x86 and other processors) ~15%, for both the extra resources needed and power. In 2004, IIRC.
As modern chips add more cache resources, the x86 tax decreases.
What? I can’t parse that gibberish.
People tend to over-estimate the cost of decoding x86 instructions. The amount of silicon required is small. For example, AMD’s Bobcat uses maybe ~3% of the die for decode, maybe 8% with the uOP ROM (Atom originally didn’t decode into smaller uOPs – not sure about current versions). Once you start tacking on the other components that go into a SoC – bus interfaces, WiFi units, and especially a graphics core – the actual amount of silicon dedicated to decoding complex x86 instructions is kinda small.
Intel is pushing down power levels incredibly quickly, considering where they were with NetBurst just a few years ago.
The only impressive thing in the Intel world is how well economies of scale work, and how much technical debt they can compensate for.
Having said that… if the world had gone with the 68k back then, the technical problems wouldn’t be that much different. We would have processors that are more orthogonal, but with a lot of logic to compensate for a difficult-to-decode, variable-length instruction stream. In that sense, the most amazing feat is probably in IBM mainframe land, where pretty clean processor designs (POWER) are executing decades-old legacy software written in a very CISC assembler.
BTW: I despise the x86 architecture quite a bit, but its implementations have their merits.
x86 isn’t 4004-compatible. It isn’t 8080-compatible. Unless you mean they are all stored-program processors and so may emulate each other given enough storage.
Better how? The 68000 was physically huge at introduction and priced accordingly. The 8086 was much smaller, with less than half the transistor count of the 68000. The 68000 was a 16-bit implementation of a 32-bit ISA; the 8086 was a 16-bit processor.
They weren’t intended for the same application!
But the advantage of the 68000 effectively ended in 1985, when the 80386 was introduced. Intel succeeded in making an 8086-compatible chip that could not only execute in a 32-bit mode but could do so in an (almost) orthogonal way.
So the advantage of the 68000 series mostly comes down to having more architectural registers – 8 integer registers and 8 address registers compared to 8 general registers (integer or address) of the 80386.
You later complain about the complexity of x86 decode but don’t do the same for the 68k. Strangely enough, many people who have actually implemented real processors claim that x86 is easier to decode efficiently than the 68k.
Looking at the instruction encoding of the 68k, one soon realizes that it is designed for compact* encoding in a microcoded implementation – not for efficient decoding.
(* x86 code is in many cases more compact)
Those Intel chips are built on the latest processes – 14 nm FinFET (3D transistors). If TSMC weren’t so sh*tty, the latest generation of ARM chips would already be on the same generation of process and would still kick x86’s ass by a wide margin. So no, x86 hasn’t really improved, and I don’t think it can.
Weren’t we there 5 years ago?
5 years ago I bought a netbook (some variation of the ASUS Eee PC 1005) that was 64-bit and that I used extensively while travelling. Its battery lasted me 10 hours of video watching, but I could also use it to run SharePoint 2010 on Windows Server 2008 R2 (I was teaching myself some development). The only bad parts about that machine were the hard disk (7200 RPM, not SSD) and the low resolution (1024×600), which were acceptable for those days.
That machine came out at the same time as the (ARM) iPad, was cheaper and way more powerful, had more storage and more connections, and had similar battery life (of course it was a bit heavier and thicker, but not as much as people would think). That machine was the reason I never understood the hostility toward netbooks or the love for iPads.
How about currently?
As far as I know, the current MacBook Air is quoted at 15 hours of battery while the iPad is quoted at 10. ARM has been at that “10 hours” for ages because manufacturers have chosen to go up in performance. Intel has managed to increase performance while going down in power consumption, up to the point where they are now fanless and razor-thin. ARM is still king in phones while Intel is still king in laptops, which probably has more to do with those manufacturers understanding those architectures better than with the technical benefits of each platform. It will be interesting to see where tablets/convertibles are going (if they are going anywhere at all).
That was the first great netbook, but I think you might be overstating the speed. Today’s Atom chips can multitask modern websites in Windows, and it works. The old chip would probably take many times longer.
The Asus X205TA is very much the modern version of that netbook, and it is superior in every way. It’s really a fantastic device considering it sold for ~$200 and is super light. The cost savings, however, come from the terrible LCD, so it’s not appropriate for anyone with a taste for a good computer.
I think Thom’s observation finally came to fruition in 2014. If Microsoft had made a Surface with Atom, normal people would have bought it and never noticed they were on a slow chip. Unfortunately no one has yet taken the risk of pairing Atom with high-end hardware. It’s always saddled with a bad screen. Hopefully in 2015…
The original x86 chip was an embedded chip, and CISC was considered superior to RISC in embedded applications due to its instruction density.
Yeah, the performance per watt was better.