Linked by Thom Holwerda on Tue 5th Dec 2017 20:19 UTC
Windows

HP and Asus have announced the first Windows 10 PCs running on ARM - Snapdragon 835 - and they're boasting about instant-on, 22 hour battery life, and gigabit LTE. These machines run full Windows 10 - so not some crippled Windows RT nonsense - and support 32bit x86 applications. Microsoft hasn't unveiled a whole lot just yet about their x86-on-ARM emulation, but Ars did compile some information:

The emulator runs on a just-in-time basis, converting blocks of x86 code to equivalent blocks of ARM code. This conversion is cached both in memory (so each given part of a program only has to be translated once per run) and on disk (so subsequent uses of the program should be faster, as they can skip the translation). Moreover, system libraries - the various DLLs that applications load to make use of operating system features - are all native ARM code, including the libraries loaded by x86 programs. Calling them "Compiled Hybrid Portable Executables" (or "chippie" for short), these libraries are ARM native code, compiled in such a way as to let them respond to x86 function calls.

While processor-intensive applications are liable to suffer a significant performance hit from this emulation - Photoshop will work in the emulator, but it won't be very fast - applications that spend a substantial amount of time waiting around for the user - such as Word - should perform with adequate performance. As one might expect, this emulation isn't available in the kernel, so x86 device drivers won't work on these systems. It's also exclusively 32-bit; software that's available only in a 64-bit x86 version won't be compatible.
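The two-level caching described in the quote can be illustrated with a toy model. This is only a sketch under stated assumptions: Microsoft has not published how its cache works, so the class name `TranslationCache`, the JSON file format, and the `_translate` stand-in are all hypothetical.

```python
import json
import os
import tempfile


class TranslationCache:
    """Toy two-level cache: an in-memory dict backed by a JSON file on disk."""

    def __init__(self, disk_path):
        self.disk_path = disk_path
        self.memory = {}
        self.translate_calls = 0
        # Reuse translations saved by an earlier run, if there are any.
        if os.path.exists(disk_path):
            with open(disk_path) as f:
                self.memory = json.load(f)

    def _translate(self, x86_block):
        # Hypothetical stand-in for the real x86 -> ARM translator.
        self.translate_calls += 1
        return "arm(" + x86_block + ")"

    def lookup(self, address, x86_block):
        # Translate a block only the first time its address is seen.
        key = hex(address)
        if key not in self.memory:
            self.memory[key] = self._translate(x86_block)
        return self.memory[key]

    def flush(self):
        # Persist translations so that a later run can skip translating.
        with open(self.disk_path, "w") as f:
            json.dump(self.memory, f)


def demo():
    path = os.path.join(tempfile.mkdtemp(), "cache.json")
    first_run = TranslationCache(path)
    first_run.lookup(0x1000, "mov eax, 1")
    first_run.lookup(0x1000, "mov eax, 1")  # served from memory
    first_run.flush()
    second_run = TranslationCache(path)
    second_run.lookup(0x1000, "mov eax, 1")  # served from disk
    return first_run.translate_calls, second_run.translate_calls
```

In this model the first run translates the block once and the second run translates nothing, which is the "once per run" and "subsequent uses should be faster" behaviour the quote describes.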

I'm very curious about the eventual performance figures for this emulation, since the idea of running my garbage Win32 translation management software on a fast, energy-efficient laptop and external monitor seems quite appealing to me.

This is awesome
by Morgan on Tue 5th Dec 2017 20:40 UTC
Morgan
Member since:
2005-06-29

I've been waiting for this for a very long time. Reservations about running a Microsoft OS aside, I'm excited to see how well Windows 10 will perform and what kind of crazy portable computers will come out of this. Maybe we'll finally get a proper full Windows tablet/convertible that ticks all the right boxes.

This could be the moment Intel starts to lose its stranglehold on the mainstream computing market. Combined with Qualcomm's performance in the server market[1], they should be shaking in their boots right now.

[1] https://rwmj.wordpress.com/2017/11/20/make-j46-kernel-builds-on-qual...

Reply Score: 4

RE: This is awesome
by viton on Tue 5th Dec 2017 20:52 UTC in reply to "This is awesome"
viton Member since:
2005-08-09

This could be the moment Intel starts to lose its stranglehold on the mainstream computing market.

Intel is losing the tech process race to Samsung and TSMC. A recent roadmap leak suggests they have abandoned new 10nm parts and will try to sell yet another refresh of the 14nm Skylake core.
It would be a perfect time for ARM to gain market share, but the window of opportunity will close soon.

Edited 2017-12-05 20:54 UTC

Reply Score: 3

RE: This is awesome
by CaptainN- on Wed 6th Dec 2017 18:05 UTC in reply to "This is awesome"
CaptainN- Member since:
2005-07-07

That'd be nice, but this is still a crippled version of Windows - unable to run 64-bit x86 code.

Reply Score: 1

RE[2]: This is awesome
by poliorcetes on Wed 6th Dec 2017 22:59 UTC in reply to "RE: This is awesome"
poliorcetes Member since:
2009-05-06

That is not correct

Reply Score: 1

RE[3]: This is awesome
by CaptainN- on Fri 8th Dec 2017 00:58 UTC in reply to "RE[2]: This is awesome"
CaptainN- Member since:
2005-07-07

Without identifying which part is not correct, your reply is all noise and no signal.

Reply Score: 0

RE[4]: This is awesome
by Kochise on Fri 8th Dec 2017 05:56 UTC in reply to "RE[3]: This is awesome"
Kochise Member since:
2006-03-03

You are right, it can't run Itanium-compiled applications, so it is a crippled version of Windows.

Reply Score: 2

RE[2]: This is awesome
by Morgan on Thu 7th Dec 2017 01:59 UTC in reply to "RE: This is awesome"
Morgan Member since:
2005-06-29

I have many, many more 32 bit apps on Windows than I do 64 bit. Firefox, TightVNC, and a couple of games are x64, everything else is legacy because their devs haven't ported to x64 yet. UWP apps are, as the name implies, universal so they'll run the same no matter what CPU is in there. There are practically zero 64-bit-only Windows apps in the wild.

Remember, this isn't Linux where your apps have to be the same architecture as your OS, Windows is still the king of backward compatibility (for better or worse). I don't see it being a problem for a long time to come.

Reply Score: 4

Lack of touch apps
by Square on Tue 5th Dec 2017 20:42 UTC
Square
Member since:
2005-10-01

I have a few Windows 10 based tablets. The problem isn't battery life. It's that very few Windows apps work well with touch. I have a 2-in-1 laptop that gets 10 hours of battery, but I rarely use it in tablet mode as nothing I want to use on a Windows machine works well with touch. Either the icons in the app are too small or the app doesn't work the same way as it does on Android. For example, in Chrome on Android, if you touch a bunch of links close together it zooms in to let you select which link you meant. That function appears to be missing in the Windows version of Chrome.

Edited 2017-12-05 20:43 UTC

Reply Score: 1

Comment by smashIt
by smashIt on Tue 5th Dec 2017 20:57 UTC
smashIt
Member since:
2005-07-06

1996 called and wants its FX!32 back.

Reply Score: 7

RE: Comment by smashIt
by bert64 on Tue 5th Dec 2017 22:40 UTC in reply to "Comment by smashIt"
bert64 Member since:
2007-04-23

Indeed, it's basically a port of FX!32 to ARM, only without the performance...
ARM running native code is significantly slower than current model x86, under emulation they will be far behind. Alpha on the other hand was significantly faster than x86 in its heyday.
The extra overhead of the emulation won't help battery life either, and will increase memory usage. Again, these are not problems Alpha systems really had to contend with.

ARM laptops running Linux would do 99% of what x86 laptops running Linux do while providing improved power efficiency. ARM laptops running Windows will be significantly worse than x86 laptops.

People will buy ARM laptops on the promise of long battery life and compatibility with windows software, but will find the software runs slowly and the battery life tanks if you actually try to do anything... Word will spread, and ARM laptops will be seen as a shit product.

Edited 2017-12-05 22:43 UTC

Reply Score: 3

RE[2]: Comment by smashIt
by viton on Tue 5th Dec 2017 23:03 UTC in reply to "RE: Comment by smashIt"
viton Member since:
2005-08-09

Indeed, it's basically a port of FX!32 to ARM, only without the performance

SD835 with x86 emulation is roughly equivalent to a quad-core Atom processor (Pentium N).

ARM laptops running windows will be significantly worse than x86 laptops.

These laptops are not for heavy load. SD835 perf is good enough for typical use case.
I have a few ARM/Linux systems and these systems are 2-4 times slower than SD835. However they can handle most of my tasks.

Edited 2017-12-05 23:05 UTC

Reply Score: 4

RE[3]: Comment by smashIt
by smashIt on Wed 6th Dec 2017 06:23 UTC in reply to "RE[2]: Comment by smashIt"
smashIt Member since:
2005-07-06

SD835 with x86 emulation is roughly equivalent to a quad-core Atom processor (Pentium N).
...
SD835 perf is good enough for typical use case.



As someone who uses a quad-core Atom convertible, I can tell you that Word and Excel are already pushing it.

Using 8x2.5 GHz to barely run Word is not good...

Reply Score: 1

RE[4]: Comment by smashIt
by Alfman on Wed 6th Dec 2017 07:55 UTC in reply to "RE[3]: Comment by smashIt"
Alfman Member since:
2011-01-28

smashIt,

As someone who uses a quad-core atom convertible I can tell you that word and excel are already pushing it.

Using 8x2.5 GHz to barely run word is not good...


For a historical perspective, consider that Itanium emulated x86 in much the same way ARM is now doing (for all I know, Microsoft may even be using the same 32-bit emulator today, ported to ARM).

The hope with Itanium was that the x86 emulator would be a temporary tool to bridge the gap, but the native Itanium applications never came and Itanium became known as a very expensive and very slow x86 emulator.

In this instance with Windows 10 on ARM, things are a bit different. These devices will initially be sold to people who want compatibility, but soon after they'll likely get nudged into using Microsoft's app store apps that can run natively on ARM.

Edited 2017-12-06 07:57 UTC

Reply Score: 3

RE[2]: Comment by smashIt
by areilly on Tue 5th Dec 2017 23:15 UTC in reply to "RE: Comment by smashIt"
areilly Member since:
2015-04-07

It will be interesting to see. I've used two systems that were based on dynamic translation, a Fuji Transmeta-based laptop long, long ago and a Nexus 9 tablet more recently. Both had their advantages and disadvantages, but both worked well for what I wanted at the time. JIT compiler tech has had a _lot_ of work done on it since FX!32, not necessarily for x86 so much, but Dalvik in Android, and Javascript/WebAssembly in everyone's browser. The fact that this thing (unlike all of the others I mentioned) will be caching translations to disk/flash suggests that everything will be effectively "native" shortly after install. While ARMs don't (yet) clock at the same rate as the desktop x86 parts, they do get about as much done per clock. They'll be fairly competitive with similarly clocked laptop CPUs.

Reply Score: 3

RE[3]: Comment by smashIt
by moondevil on Wed 6th Dec 2017 12:03 UTC in reply to "RE[2]: Comment by smashIt"
moondevil Member since:
2005-07-08

ART, introduced in Android 5 as a Dalvik replacement, does cache native code on disk.

Between Android 5 and 7, ART did AOT compilation to native code at installation time.

Starting with Android 7, ART is a mix of assembly interpreter, JIT and AOT compiler, making use of PGO, with native code cached between builds.

https://source.android.com/devices/tech/dalvik/jit-compiler

Reply Score: 4

RE[3]: Comment by smashIt
by CaptainN- on Wed 6th Dec 2017 20:37 UTC in reply to "RE[2]: Comment by smashIt"
CaptainN- Member since:
2005-07-07

It isn't enough though to simply cache the translation - the quality of the translation matters, and that's going to be challenging to get right in any high performing way.

As a matter of interesting history - back when Apple banned Flash from iOS, Adobe created a Flash bytecode to ARM compiler using LLVM. The whole pipeline is - ActionScript 3.0 (which is really ECMAScript/JavaScript 4) -> ABC (Flash's version of WebAssembly) -> LLVM -> native code. (After Flash is killed off in 2020, AIR, which is just Flash packaged up as apps, will live on - take that, Steve Jobs!)

It does produce very fast code (mostly due to the strength of LLVM), but it's still not as fast as even modern JavaScript engines like V8, Nitro or SpiderMonkey, which are incredibly fast these days (almost C++ fast). Part of that is the programming model - dynamic languages like AS/JS require a lot of safety checks and translation at runtime, and to do AOT the way LLVM must do with ABC, it has to bake all that into the runtime. This carries a lot of overhead. The modern JS engines are able to optimize a lot of that out at runtime, using dozens of really neat tricks.

Edited 2017-12-06 20:39 UTC

Reply Score: 0

RE[3]: Comment by smashIt
by klahjn on Sun 10th Dec 2017 03:15 UTC in reply to "RE[2]: Comment by smashIt"
klahjn Member since:
2013-08-17

I used to HATE supporting the Transmeta Crusoe, despite liking the idea (in theory).

Brings me back to the late 90's-2001 when I worked for Sony... crazy times... I remember coming into work and wondering why the flag was at half mast, until they told me about the towers...

::shakes off rambling:::

damn nostalgia!! lol

Reply Score: 1

RE: Comment by smashIt
by AntonioTrindade on Wed 6th Dec 2017 17:36 UTC in reply to "Comment by smashIt"
AntonioTrindade Member since:
2012-04-23

I was thinking just the same.
I tried FX!32 back in the day and ran WinZIP on a DEC Alpha running Windows NT 3.51. It really worked great.

Reply Score: 2

Is it crippled?
by Alfman on Tue 5th Dec 2017 21:19 UTC
Alfman
Member since:
2011-01-28

Thom Holwerda,

HP and Asus have announced the first Windows 10 PCs running on ARM - Snapdragon 835 - and they're boasting about instant-on, 22 hour battery life, and gigabit LTE. These machines run full Windows 10 - so not some crippled Windows RT nonsense - and support 32bit x86 applications. Microsoft hasn't unveiled a whole lot just yet about their x86-on-ARM emulation, but Ars did compile some information:



Given that it only runs 32bit code, it seems at least somewhat crippled for that reason. But another thing that was crippled with WinRT was the UEFI bootloader: Microsoft's licensing terms explicitly prohibited manufacturers from allowing owners to install third party secure boot keys (ie for dual booting).

I've honestly been eagerly awaiting ARM PCs for a very long time, but my biggest fear was that they would arrive and be more restricted than the x86 PCs they'd be replacing. Does anyone know if Microsoft is banning alternative operating systems on these new ARM PCs that are certified to run Windows 10? If so, it'll be a tragedy for ARM desktop computing ;)

ARM offers much needed competition, but if the ARM PCs that show up on the market end up robbing us of the choice of operating systems, then IMHO we'd be losing just as much as we've gained.

Reply Score: 2

RE: Is it crippled?
by dionicio on Tue 5th Dec 2017 22:15 UTC in reply to "Is it crippled?"
dionicio Member since:
2006-07-12

Windows 10S, Alfman. This is about stronghold.

Hardware talks lots about stronghold, also. This is bridge tech. To carry market the other side of the pond. Don't think will see 64b version, ever.

Microsoft will build from here up. System DLLs already native, just as example.

On differing from past efforts, this one could actually deliver.

Reply Score: 2

RE[2]: Is it crippled?
by dionicio on Wed 6th Dec 2017 20:56 UTC in reply to "RE: Is it crippled?"
dionicio Member since:
2006-07-12

Once Inside this market segment, MS will be able to see what LEGACY We are trying to bring into mobile. What's worth recompiling, rewriting for ARM...

Reply Score: 2

RE: Is it crippled?
by viton on Tue 5th Dec 2017 22:32 UTC in reply to "Is it crippled?"
viton Member since:
2005-08-09

Given that it only runs 32bit code, it seems at least somewhat crippled for that reason.

And "that reason" is performance.

If so, it'll be a tragedy for ARM desktop computing ;)
How so? These are laptops* and tablets*, not desktop systems ;)
If you want a proper ARM desktop, you can grab one of the ARM ATX boards (ok, there is only one board <$500) and build it yourself.

Too bad, business people from companies like Cavium, Qualcomm (or even ARM) who are trying to push ARM servers still don't realize the demand for ARM workstations, even after being pointed to this by numerous high-profile folks including Linus.

At least in 2017 Linaro started to think about this.
http://connect.linaro.org/resource/bud17/bud17-508/

That's a pity, because even low-power versions of 24 core Centriq or 28 core TX2 can humiliate most of Intel/AMD HEDT SKUs.

* There is no doubt these devices will be as restricted and locked as possible.

Edited 2017-12-05 22:39 UTC

Reply Score: 3

RE[2]: Is it crippled?
by nitrile on Wed 6th Dec 2017 01:22 UTC in reply to "RE: Is it crippled?"
nitrile Member since:
2010-05-06

Even if for no other reasons than diversity*, if what I want to see is an alternative to an x86 PC platform, these developments are interesting, but I've little expectation this will take me where I want to go.

I have an RT tablet (in death, it's a good RDP client) - the problem, as a tablet - wasn't ever to do with performance - it was, is fast enough - jailbroken it ran recompiled Quake 3; jailbroken, you could never update it;

The problem was just the restrictions - that is it. and I'd say the whole RT project could hardly have been more designed to fail than if it were run by a saboteur.

A more sensible approach is evident this time. Acknowledging that, x86 emulation fills the gap of running software so old that the developer left six bosses, two departments and a buyout ago, and the source doesn't exist anymore.

Anything newer compiles, possibly does already; I've already run some. It's not about emulating everything; that's nonsensical. A store (detestable as this is, on anything calling itself a PC) can make distribution of multi-arch binaries simpler - it was mandating it, that killed RT.

Maybe it doesn't this time, and there is an aura of self-awareness that was totally absent before, consumed as it was by lust for skimming 30% from an app store.

Which is probably going to still exist, and why it probably won't really work for me; my wants don't really align with theirs.

Undoubtedly, they're not really ok with it being your device just because you purchased it. Will it be standard enough (de facto, by volume) and open enough that Linux will run on it, with a standard boot process, and become basically a PC 2.0, and scale to, say, 24-core open workstations?

That's a pipe dream; maybe it'll come, from a completely unrelated angle. But it's a small increase in oxygen, if it merely opens vectors for competition again. PC's should never have become single source; it remains to be seen if it'll be a PC, or just a windows on-ramp; you know which they're really going for.

* I could add others, like intel, from general observation, happy to sit on their ass for 6 years until AMD finally being able to kick it again, or the management engine.

Reply Score: 1

RE[3]: Is it crippled?
by bhtooefr on Wed 6th Dec 2017 10:48 UTC in reply to "RE[2]: Is it crippled?"
bhtooefr Member since:
2009-02-19

As far as I can tell, RT was never meant to be a product (sure, it was pitched as one), it was meant to be a threat - basically, Microsoft saying that they could make a fully-functional ARM Windows device if they wanted to, and here's the proof, that consumers can buy today.

Intel responded to the threat with Bay Trail and Cherry Trail, but Willow Trail was cancelled, so Microsoft then had to make good on their threat.

Reply Score: 4

RE[2]: Is it crippled?
by Alfman on Wed 6th Dec 2017 02:18 UTC in reply to "RE: Is it crippled?"
Alfman Member since:
2011-01-28

viton,


And "that reason" is performance.


I think it depends. 64bit code translation may not necessarily perform poorly; it could just not have been implemented. In any case, x86 emulation on ARM is obviously going to be for people who need compatibility more than performance.

How so? These are laptops* and tablets*, not desktop systems ;) If you want a proper ARM desktop, you can grab one of the ARM ATX boards (ok, there is only one board <$500) and build it yourself.


Yea, I'm quite tired, probably could word it better. I'll gladly take a look if you send over links. Affordability is important to me though.

* There is no doubt these devices will be as restricted and locked as possible.


It'd be a pity then. I really am sad at the prospect of open/unlocked ARM desktop/laptop computers becoming a "niche" product.

Some people will say "well then just don't buy it", but to the degree that many computers on the market end up being vendor locked, this harms the secondhand market, whose new owners may want the hardware but not want to run the same operating system. After all, most alt-OS users are actually re-purposing existing PCs to do it.

Edited 2017-12-06 02:25 UTC

Reply Score: 2

RE[3]: Is it crippled?
by viton on Wed 6th Dec 2017 03:15 UTC in reply to "RE[2]: Is it crippled?"
viton Member since:
2005-08-09

I think it depends, 64bit code translation may not necessarily perform poorly, it could just not have been implemented.


I have no idea how they implemented the emulator, but I expect that in the 32-on-64 bit user-space case, the emulator core can allocate the 32-bit virtual memory region of the x86 process in 64-bit ARM space and let the MMU hardware handle all the things. Quick and simple.
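To make that guess concrete, here is a minimal sketch of the address arithmetic it implies. It is purely illustrative: the window base `HOST_BASE` and the function `host_address` are hypothetical names, and nothing here is known to match the real emulator.

```python
GUEST_SPACE = 1 << 32          # size of a 32-bit guest address space (4 GiB)
HOST_BASE = 0x7000_0000_0000   # hypothetical start of the guest window in host space


def host_address(guest_pointer, base=HOST_BASE):
    """Map a guest pointer into the host window, wrapping at 32 bits."""
    # Guest arithmetic is 32-bit, so any pointer the guest can form
    # falls inside [base, base + 4 GiB) once it is truncated.
    return base + (guest_pointer % GUEST_SPACE)


def inside_window(address, base=HOST_BASE):
    """True if a host address lies inside the guest's 4 GiB window."""
    return base <= address < base + GUEST_SPACE
```

Because every guest pointer is truncated to 32 bits, even an overflowing one such as `0xFFFFFFFF + 1` wraps back to the start of the window. Anything the emulator keeps outside that window is simply not addressable by guest code in this model.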

In any case, x86 emulation on ARM is obviously going to be for people who need compatibility more than performance.

Compatibility is a complex thing. There are two types of code that I expect to cause problems:
Self-modifying programs.
Parallel programs with lockless synchronization.

I'll gladly take a look if you send over links.


The only (not yet obsolete) board under $500 is "8040 community board":

https://www.solid-run.com/marvell-armada-family/armada-8040-communit...

It has some serious networking support.
Yeah, dual 10GbE are cool but nearly useless at home.

In other words, I don't think it is worth bothering with a "real" ARM PC so far.
So my current ARM PC is a $199 Jetson TK1 with an SSD.
The Jetson TX2 is 3x as expensive and looks like an embedded devkit rather than a computer.

Recently I've found an ARM workstation, but it is based on old slow ThunderX chips.
https://www.avantek.co.uk/store/avantek-32-core-cavium-thunderx-arm-...

Edited 2017-12-06 03:16 UTC

Reply Score: 3

RE[4]: Is it crippled?
by Alfman on Wed 6th Dec 2017 07:33 UTC in reply to "RE[3]: Is it crippled?"
Alfman Member since:
2011-01-28

I have no idea how they implemented the emulator, but I expect that in the 32-on-64 bit user-space case, the emulator core can allocate the 32-bit virtual memory region of the x86 process in 64-bit ARM space and let the MMU hardware handle all the things. Quick and simple.


I can't think of a reason why a 64bit ARM processor would not handle the memory requirements of 64bit x86 processes. As long as there's enough memory to fit the amd64 app, it ought to be fine.


The number of general purpose registers would be a greater challenge though. AMD64 doubled the number of registers from x86. If the emulator assumes a fixed relationship between x86 registers and ARM registers, there might be a lack of registers in the 64bit emulator, which results in the need to stuff some of them into slower RAM. Software based emulators (qemu/virtualbox without HW-virt) can recompile the code stream to produce native register allocations for the target. Alas, I don't know if Microsoft's emulator does this too or if it relies on a naive 1:1 opcode translation algorithm.


Compatibility is a complex thing. There are two types of code that I expect to cause problems:
Self-modifying programs.
Parallel programs with lockless synchronization.



Obviously self-modifying code has long been discouraged for numerous reasons. Its behavior is model-specific if you don't explicitly flush the CPU pipelines:
https://groups.google.com/forum/#!topic/comp.arch/k8tKb2TzufM

Nevertheless an emulator handles it by setting compiled pages to read only and invalidating the compiled code on a page fault. The cool thing about this is that the underlying cause of the code page modification doesn't matter. The page may have been modified by self modifying code, a JIT compiler, runtime debugger, loading a DLL, etc. Self-modifying code comes for free and does not create an exceptional code path from the others.
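As a rough illustration of that write-protect-and-invalidate idea, here is a toy model in which a simulated "page fault" on a write throws away the cached translations for that page. It is a sketch only: `GuestMemory`, its fields, and the fake translation strings are hypothetical, and a real emulator would use actual MMU page protections rather than a Python set.

```python
PAGE_SIZE = 4096


class GuestMemory:
    """Toy guest memory with page-granular translation invalidation."""

    def __init__(self):
        self.bytes = {}               # address -> stored value
        self.read_only_pages = set()  # pages that hold translated code
        self.translations = {}        # page -> {address: translated block}
        self.faults = 0

    def translate(self, address):
        page = address // PAGE_SIZE
        blocks = self.translations.setdefault(page, {})
        if address not in blocks:
            # Fake "compiled" block that depends on the current contents.
            value = self.bytes.get(address, 0)
            blocks[address] = "arm-for-%#x:%d" % (address, value)
            # Write-protect the page so later modifications are noticed.
            self.read_only_pages.add(page)
        return blocks[address]

    def write(self, address, value):
        page = address // PAGE_SIZE
        if page in self.read_only_pages:
            # Simulated page fault: drop stale translations and unprotect.
            # The cause of the write (self-modifying code, a JIT, a
            # debugger, a DLL load) makes no difference here.
            self.faults += 1
            self.translations.pop(page, None)
            self.read_only_pages.discard(page)
        self.bytes[address] = value
```

Writing to a page that has never been translated costs nothing extra, while writing to a translated page triggers exactly one fault and forces a fresh translation on the next execution.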


Lockless synchronization boils down to atomics and SMP memory models. I believe ARM natively supports all the same atomics, however I wasn't so sure about the memory model.

This is from 2012, things may have changed since then:
http://preshing.com/20120930/weak-vs-strong-memory-models/

Any differences in memory model would affect both 32bit and 64bit. Languages like C define their own memory models, which ought to be compatible as long as the barriers are correct, but I'm not sure if there could be side effects. Conceivably it could expose hidden race conditions? I'll have to look into it further. Anyways +1 for bringing it up!



The only (not yet obsolete) board under $500 is "8040 community board":

https://www.solid-run.com/marvell-armada-family/armada-8040-communit.....

It has some serious networking support.
Yeah, dual 10GbE are cool but nearly useless at home.

In other words I don't think it is worth to bother with "real" ARM PC so far.


I think you are right, but it's awesome nevertheless! Some day maybe...

Reply Score: 4

RE[5]: Is it crippled?
by viton on Wed 6th Dec 2017 18:19 UTC in reply to "RE[4]: Is it crippled?"
viton Member since:
2005-08-09

can't think of a reason why a 64bit ARM processor would not handle the memory requirements of 64bit x86 processes.

Isolation. You need to keep JIT cache and dynamic recompiler structures inaccessible for emulated code.
Memory protection of JIT cache? Too slow, IMHO.

which ought to be compatible as long as the barriers are correct

x86 does not need barriers because of its memory model. Lockless code carelessly written for x86 and not tested on ARM can fail.

Reply Score: 4

RE[6]: Is it crippled?
by Alfman on Wed 6th Dec 2017 18:33 UTC in reply to "RE[5]: Is it crippled?"
Alfman Member since:
2011-01-28

viton,

can't think of a reason why a 64bit ARM processor would not handle the memory requirements of 64bit x86 processes.

Isolation. You need to keep JIT cache and dynamic recompiler structures inaccessible for emulated code.
Memory protection of JIT cache? Too slow, IMHO.


I don't follow, specifically what makes you assert that the recompiler must be worse with 64bit code than 32bit code?


which ought to be compatible as long as the barriers are correct

x86 does not need barriers because of its memory model. Lockless code carelessly written for x86 and not tested on ARM can fail.


Not exactly. x86 guarantees the order of operations, but developers must use barriers even on x86, otherwise the compiler would not be able to generate proper SMP code.

Edit: Some x86 developers may neglect this, but technically it opens up race conditions even on x86. It wouldn't surprise me one bit if a lot of multithreaded x86 code has these kinds of race conditions in it today.

Edited 2017-12-06 18:41 UTC

Reply Score: 2

RE[7]: Is it crippled?
by Alfman on Wed 6th Dec 2017 23:41 UTC in reply to "RE[6]: Is it crippled?"
Alfman Member since:
2011-01-28

I found a nice post about memory barriers on x86 that might help in explaining when the x86 memory model guarantees ordered semantics and when it doesn't. In particular, it works with respect to a single memory location or single CPU, but x86 cores are allowed to violate ordering across different memory addresses unless a barrier is used:

https://bartoszmilewski.com/2008/11/05/who-ordered-memory-fences-on-...


Here's an example:

[ a ] = 0
[ b ] = 0

CPU0
[ a ] = 1
c = [ b ]

CPU1
[ b ] = 1
d = [ a ]


Since reads and writes to different memory locations can be reordered by x86, the result is that the reads on both cores can technically execute before the writes on both cores, resulting in c==d==0 even though there's no possible way for this code to produce that output when it's run sequentially (the first instruction would ALWAYS be one of the memory addresses being set to 1, so regardless of the timing of subsequent instructions, c or d MUST equal 1).
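The example can be checked by brute force with a toy model. The sketch below enumerates every interleaving of the two CPUs, first with writes going straight to memory and then with each CPU holding its write in a private store buffer that is flushed at some later point. The store buffer is the usual textbook explanation for this reordering, not a description of any specific chip, and the names `run` and `outcomes` are made up for the illustration.

```python
from itertools import permutations

WRITES = {0: "a", 1: "b"}                # CPU0 writes [a], CPU1 writes [b]
READS = {0: ("c", "b"), 1: ("d", "a")}   # CPU0 reads [b] into c, CPU1 reads [a] into d


def run(schedule, buffered):
    """Execute one interleaving and return the final (c, d)."""
    memory = {"a": 0, "b": 0}
    buffers = {0: {}, 1: {}}
    regs = {}
    for cpu, op in schedule:
        if op == "store":
            # With store buffers, the write stays private until flushed.
            target = buffers[cpu] if buffered else memory
            target[WRITES[cpu]] = 1
        elif op == "flush":
            memory.update(buffers[cpu])
            buffers[cpu].clear()
        else:  # "load": check the CPU's own buffer first, then memory
            reg, location = READS[cpu]
            regs[reg] = buffers[cpu].get(location, memory[location])
    return regs["c"], regs["d"]


def outcomes(buffered):
    """Collect every (c, d) result reachable over all legal interleavings."""
    later_ops = ["load", "flush"] if buffered else ["load"]
    events = [(cpu, op) for cpu in (0, 1) for op in ["store"] + later_ops]
    results = set()
    for schedule in permutations(events):
        # Each CPU's store must come before its own load and its own flush.
        in_order = all(
            schedule.index((cpu, "store")) < schedule.index((cpu, op))
            for cpu in (0, 1)
            for op in later_ops
        )
        if in_order:
            results.add(run(schedule, buffered))
    return results
```

Without buffering, `outcomes(False)` never contains `(0, 0)`, matching the sequential argument above. With buffering, `outcomes(True)` does contain `(0, 0)`: both loads can run while both writes are still sitting in their buffers.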

Note that with a single CPU, the x86 semantics guarantee that the results of reordering will not change values in the code. However, device drivers still need special care because memory-mapped bus devices could still produce side effects due to CPU reordering even if it doesn't affect the code. Linux has special primitives to generate memory barriers throughout the kernel to force the CPU to access memory in order.

https://www.kernel.org/doc/Documentation/memory-barriers.txt


There are a lot of nuances and subtle details that leave plenty of room for error, even for experienced programmers who've written a lot of multithreaded code.

https://stackoverflow.com/questions/27595595/when-are-x86-lfence-sfe...

https://stackoverflow.com/questions/27522190/is-the-mesi-protocol-en...

Edited 2017-12-06 23:55 UTC

Reply Score: 3

RE[2]: Is it crippled?
by tsedlmeyer on Wed 6th Dec 2017 23:05 UTC in reply to "RE: Is it crippled?"
tsedlmeyer Member since:
2005-07-07

And "that reason" is performance.


That reason is probably patents. Intel has already strongly threatened to use their patent portfolio against anyone trying to do x86 emulation. The 32bit instruction set can be mostly implemented without violating any Intel patents because any relevant ones that existed have expired. There are a few troublesome instructions but those were only implemented on later 32bit processors and were not available on the majority of 32bit x86 processors, so the impact of not implementing them is basically non-existent. The situation for the 64bit instruction set is very different. Intel has patents required to implement quite a few of the key instructions.

Reply Score: 3

RE: Is it crippled?
by tidux on Wed 6th Dec 2017 17:24 UTC in reply to "Is it crippled?"
tidux Member since:
2011-08-13

That's my concern as well. If these come with unlocked bootloaders, I'll absolutely buy one and put Linux on it.

Reply Score: 2

My guess ...
by ameasures on Tue 5th Dec 2017 23:38 UTC
ameasures
Member since:
2006-01-09

Is that half the uplift in battery life will come when native executables are in dominant use.

A second guess is that real effort will go into reducing reliance on the 32bit x86 executables, so dual porting is already in progress. This may be via recompiled C++ or via .NET depending on each case.

Of course, once a critical mass of ARM executables is on tap, the picture changes: why would you get an x86 laptop with half the battery life of its ARM equivalent?

This has been a very long time coming.

Reply Score: 4

Selling chromebooks for $$$
by bnolsen on Wed 6th Dec 2017 00:58 UTC
bnolsen
Member since:
2006-01-06

Looks like they're selling Chromebooks, but sticking on Windows 10 S and adding 500 USD to the sticker price.

Reply Score: 1

RE: Selling chromebooks for $$$
by viton on Wed 6th Dec 2017 01:13 UTC in reply to "Selling chromebooks for $$$"
viton Member since:
2005-08-09

Google Pixelbook with 8/256GB, with 10h battery life and without modem is available for $1199

Edited 2017-12-06 01:14 UTC

Reply Score: 2

we're going to need 20h of battery
by xristos on Wed 6th Dec 2017 09:24 UTC
xristos
Member since:
2014-04-25

... because (with emulation) everything will take longer to do.

(jk)

Reply Score: 1

Virtualize it!
by mack on Wed 6th Dec 2017 10:55 UTC
mack
Member since:
2015-02-18

Now we just need the VirtualBox people to get their behind in gear and add ARM support...

Reply Score: 2

Amazing!
by ebasconp on Wed 6th Dec 2017 13:49 UTC
ebasconp
Member since:
2006-05-09

Since Windows libraries are already natively running in ARM, a lot of existing code could be just "recompiled" to run natively in these machines (imagine a lot of open source software available for Windows).

Reply Score: 2

Garbage in, garbage out.
by The123king on Wed 6th Dec 2017 14:19 UTC
The123king
Member since:
2009-05-28

If your software runs like crap on a real x86, why would emulation make it run faster? That's like saying "my blender is too slow, let me chop these vegetables by hand..."

Reply Score: 1

RE: Garbage in, garbage out.
by Alfman on Wed 6th Dec 2017 15:00 UTC in reply to "Garbage in, garbage out."
Alfman Member since:
2011-01-28

The123king,

If your software runs like crap on a real x86, why would emulation make it run faster? That's like saying "my blender is too slow, let me chop these vegetables by hand..."


Who/what are you referring to here? Is it this?

The emulator runs on a just-in-time basis, converting blocks of x86 code to equivalent blocks of ARM code. This conversion is cached both in memory (so each given part of a program only has to be translated once per run) and on disk (so subsequent uses of the program should be faster, as they can skip the translation).


If so, they don't mean faster than the original; they mean faster than the first run, when the JIT still has to translate the code.
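
To make the two cache levels in that quote concrete, here is a minimal sketch in Python. It is only an illustration of the idea, not how Microsoft's emulator works: `TranslationCache`, `fake_translate` and the hash-named cache files are all made up for the example, and the "translator" just tags the bytes instead of producing ARM code.

```python
import hashlib
import os
import tempfile


class TranslationCache:
    """Two-level cache: memory for the current run, disk across runs."""

    def __init__(self, cache_dir, translate):
        self.cache_dir = cache_dir
        self.translate = translate  # hypothetical x86 -> ARM translator
        self.memory = {}            # translated blocks seen during this run
        self.translations = 0       # how many times we really translated

    def lookup(self, x86_block: bytes) -> bytes:
        key = hashlib.sha256(x86_block).hexdigest()
        if key in self.memory:      # already translated during this run
            return self.memory[key]
        path = os.path.join(self.cache_dir, key)
        if os.path.exists(path):    # translated during an earlier run
            with open(path, "rb") as f:
                arm_block = f.read()
        else:                       # first time ever: pay the JIT cost
            arm_block = self.translate(x86_block)
            self.translations += 1
            with open(path, "wb") as f:
                f.write(arm_block)
        self.memory[key] = arm_block
        return arm_block


def fake_translate(block: bytes) -> bytes:
    # Stand-in for a real translator (assumption): it only tags the input.
    return b"ARM:" + block


def demo():
    with tempfile.TemporaryDirectory() as cache_dir:
        first_run = TranslationCache(cache_dir, fake_translate)
        first_run.lookup(b"\x55\x89\xe5")
        first_run.lookup(b"\x55\x89\xe5")   # served from memory
        second_run = TranslationCache(cache_dir, fake_translate)
        second_run.lookup(b"\x55\x89\xe5")  # served from disk
        return first_run.translations, second_run.translations
```

In this toy, `demo()` translates the block once in the first "run" and not at all in the second, which is the only sense in which later runs are "faster".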


In practice I've benchmarked QEMU's software emulation to be about 15-20% as fast as on bare metal. Hardware VT acceleration was able to achieve 100% in the same test. In theory poorly optimized code on x86 could end up being faster if the emulator has a sufficiently powerful code optimizer. We may get there with advanced AI techniques, but it's not possible now.

Reply Score: 4

RE[2]: Garbage in, garbage out.
by Brendan on Wed 6th Dec 2017 18:14 UTC in reply to "RE: Garbage in, garbage out."
Brendan Member since:
2005-11-16

Hi,

In practice I've benchmarked QEMU's software emulation to be about 15-20% as fast as on bare metal. Hardware VT acceleration was able to achieve 100% in the same test. In theory poorly optimized code on x86 could end up being faster if the emulator has a sufficiently powerful code optimizer. We may get there with advanced AI techniques, but it's not possible now.


Qemu's software emulation mostly stitches together snippets (functions created by a C compiler); which is great for portability but very bad for performance. A JIT designed for performance (e.g. something that converts machine code directly, which is able to map most "guest registers" to "host registers") can achieve about 90% of native speed.

I'd expect Microsoft's JIT is designed for performance, which means that the performance of Qemu is a bad/misleading indicator of the performance you'd expect from Microsoft's JIT.

More specifically; I'd expect that the performance of these ARM systems will be bad (compared to modern 80x86) because the performance of the ARM CPU itself is bad, even when running native ARM code with no JIT at all.

- Brendan

Reply Score: 3

RE[3]: Garbage in, garbage out.
by Alfman on Wed 6th Dec 2017 19:09 UTC in reply to "RE[2]: Garbage in, garbage out."
Alfman Member since:
2011-01-28

Brendan,

Qemu's software emulation mostly stitches together snippets (functions created by a C compiler); which is great for portability but very bad for performance. A JIT designed for performance (e.g. something that converts machine code directly, which is able to map most "guest registers" to "host registers") can achieve about 90% of native speed.

I'd expect Microsoft's JIT is designed for performance, which means that the performance of Qemu is a bad/misleading indicator of the performance you'd expect from Microsoft's JIT.


I don't use qemu as an example because of its performance, but rather because of its cross-architecture support. Do you know of other cross-architecture emulators with 90% speed efficiency? I'd like to read about it if you've got a source.


More specifically; I'd expect that the performance of these ARM systems will be bad (compared to modern 80x86) because the performance of the ARM CPU itself is bad, even when running native ARM code with no JIT at all.


Well sure, they don't match the performance of high end x86 PCs, but these laptops are meant to compete with Intel's Atom and Celeron processors being used in consumer laptops today. I'm definitely curious how these new ARM laptops will do in benchmarks running both x86 and native code. Hopefully Thom will post an article about it ;)

Reply Score: 3

RE[4]: Garbage in, garbage out.
by Brendan on Thu 7th Dec 2017 04:12 UTC in reply to "RE[3]: Garbage in, garbage out."
Brendan Member since:
2005-11-16

Hi,

I don't use qemu as an example because of its performance, but rather because of its cross-architecture support. Do you know of other cross-architecture emulators with 90% speed efficiency? I'd like to read about it if you've got a source.


That's like asking for a list of things that are both wet and dry. To get acceptable performance for a JIT it has to be tied directly to both the host architecture and the guest architecture. It can't be done in a portable way (without sacrificing most of the performance).

The most common high performance JIT is Sun's (Oracle's) Java virtual machine, which achieves around 90% of native.

Apple's Rosetta wasn't quite so well optimised and only achieved 60% to 80% of native speed.

Intel's "IA-32 Execution Layer" (for emulating 32-bit 80x86 on Itanium, after they removed 80x86 instruction set support from the CPU itself) achieved 50% to 70% of native speed; but Itanium was a peculiar beast - translating a "normal" instruction set to VLIW would've been much more challenging.

These are all at least 4 times faster than the fastest portable emulator that I know of (Qemu).

Of course these are all doing pure translation; without the benefit of storing/caching the translated code on disk to avoid "re-translation" the next time the same executable is executed. The latter 2 are also old (from around 2000?) and don't benefit from recent research/improvements.

- Brendan

Reply Score: 3

RE[5]: Garbage in, garbage out.
by Alfman on Thu 7th Dec 2017 06:52 UTC in reply to "RE[4]: Garbage in, garbage out."
Alfman Member since:
2011-01-28

Brendan,

That's like asking for a list of things that are both wet and dry. To get acceptable performance for a JIT it has to be tied directly to both the host architecture and the guest architecture. It can't be done in a portable way (without sacrificing most of the performance).

The most common high performance JIT is Sun's (Oracle's) Java virtual machine, which achieves around 90% of native.


I do commend you for this answer, but such trickery, haha ;)

Java programs aren't really emulated in modern JVMs, instead they are run natively with bits and bobs to perform the JIT compilation on the fly. The .class files are merely a binary representation of the source and not an executable binary in the same sense as x86 or arm code.

It would be conceptually very similar to taking a .c program, zipping it up into a "binary" .c.gz file, and then passing this binary file to a "C-virtual machine" (which achieves 100% of native btw). But the CVM is NOT emulating C, and neither is the Java virtual machine emulating Java(*)... both are compiling it down to native in order to run on bare metal.

* I am aware the original JVM implementations really did have virtual machine emulation, but this was very slow and not the kind of emulation that achieves "around 90% of native".


Apple's Rosetta wasn't quite so well optimised and only achieved 60% to 80% of native speed.


This may in fact be the best example, although I see conflicting information about just how fast it was.

https://www.anandtech.com/show/2064/18

https://www.theguardian.com/technology/blog/2005/jun/10/applegoesint...

http://www.mactech.com/articles/mactech/Vol.22/22.05/Office2004Benc...

Unfortunately the authors didn't establish a native speed baseline on both sides. Comparing times from two arbitrary machines doesn't give a good indication of the performance of the emulator itself.


These are all at least 4 times faster than the fastest portable emulator that I know of (Qemu).

Of course these are all doing pure translation; without the benefit of storing/caching the translated code on disk to avoid "re-translation" the next time the same executable is executed. The latter 2 are also old (from around 2000?) and don't benefit from recent research/improvements.



I think most of the interest in pursuing software emulation was lost with hardware virtualization, but if ARM PCs become more popular, it could stimulate R&D for software emulation again.


Just found this:
MS Powerpoint running on Linux on top of ARM processor via WINE and QEMU.
https://www.youtube.com/watch?v=9G06JmL9mkQ

Pretty cool, albeit slow.

Reply Score: 3

RE[6]: Garbage in, garbage out.
by Brendan on Thu 7th Dec 2017 21:19 UTC in reply to "RE[5]: Garbage in, garbage out."
Brendan Member since:
2005-11-16

Hi,

I do commend you for this answer, but such trickery, haha ;)


There's no trickery.

Java programs aren't really emulated in modern JVMs, instead they are run natively with bits and bobs to perform the JIT compilation on the fly. The .class files are merely a binary representation of the source and not an executable binary in the same sense as x86 or arm code.


This doesn't even make sense. Java byte-code is machine code for a "pretend CPU" (ignoring the "Jazelle DBX" extension ARM tried) and as far as JIT is concerned it makes very little difference how real the original CPU was.

All JIT compilers have "bits and bobs to perform the JIT compilation on the fly" - that's literally what JIT is.

It would be conceptually very similar to taking a .c program, zipping it up into a "binary" .c.gz file, and then passing this binary file to a "C-virtual machine" (which achieves 100% of native btw). But the CVM is NOT emulating C, and neither is the Java virtual machine emulating Java(*)... both are compiling it down to native in order to run on bare metal.


Nonsense. Java's compiler compiles Java source code (text) into Java byte-code (machine code for a virtual machine). This machine code is then emulated (with a combination of interpretation and JIT).

* I am aware the original JVM implementations really did have virtual machine emulation, but this was very slow and not the kind of emulation that achieves "around 90% of native".


The original JVM implementations used "pure interpretation" to emulate the virtual machine (which is slow), and newer JVM implementations use a combination of interpretation and JIT to emulate the virtual machine (which is faster). The use of JIT to emulate a virtual machine does not mean that there is no virtual machine.

Notes: For JIT compiling it costs a little overhead to do the initial translation (and optimising the result costs more), so for code that is only executed once it's faster to interpret. Most emulators that use JIT actually use multiple tiers - interpret the first time it's executed, then do a simple JIT if it's executed more than once, then do complex/optimising (more expensive) JIT if it's executed a lot. If Microsoft are caching the resulting translated code on disk, then they could just do complex/optimising (more expensive) JIT (making it much slower the first time an executable is executed, but significantly faster after that when everything is translated and optimised and there's no more "JIT overhead").
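
As a rough sketch of that tiering (Python, purely illustrative): the thresholds and the `TieredPolicy` name are assumptions made up for the example, not values taken from any real JVM or emulator. The policy only decides which tier a block gets based on how often it has already been executed.

```python
from collections import Counter

# Assumed thresholds, chosen only for illustration.
SIMPLE_JIT_AFTER = 1        # second execution gets the cheap JIT
OPTIMIZING_JIT_AFTER = 100  # "executed a lot" gets the expensive JIT


class TieredPolicy:
    """Pick an execution tier per block from its execution count."""

    def __init__(self):
        self.run_counts = Counter()

    def tier_for(self, block_id):
        seen = self.run_counts[block_id]  # executions before this one
        self.run_counts[block_id] += 1
        if seen < SIMPLE_JIT_AFTER:
            return "interpret"            # run once: translating costs more
        if seen < OPTIMIZING_JIT_AFTER:
            return "simple-jit"           # warm code: quick translation
        return "optimizing-jit"           # hot code: worth optimising
```

With an on-disk cache the trade-off shifts, because the expensive tier is paid once per executable rather than once per run; in this sketch that would amount to lowering `OPTIMIZING_JIT_AFTER` towards zero.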

I think most of the interest in pursuing software emulation was lost with hardware virtualization, but if ARM PCs become more popular, it could stimulate R&D for software emulation again.


Hardware virtualisation only works when the hardware/CPU supports the instruction set being emulated; and CPUs that support hardware virtualisation of other CPUs are almost non-existent (the "Jazelle DBX" extension ARM tried is the only case I can think of at the moment). Modern Intel and AMD CPUs have hardware virtualisation that is only capable of virtualising 80x86 and nothing else. Some modern ARM CPUs have hardware virtualisation that is only capable of virtualising ARM and nothing else.

Most of the interest in cross-architecture software virtualisation comes from economics/marketing - Apple trying to make it possible for existing customers to switch from Motorola 68K CPUs to the completely different PowerPC CPUs; or Apple trying to make it possible for existing customers to switch from PowerPC CPUs to the completely different 80x86 CPUs; or Intel trying to make it possible for existing customers to switch from 80x86 CPUs to the completely different Itanium CPUs.

In other words; it's a solution to the "nobody writes software because there's no users, but there's no users because nobody wrote software" problem.

The other use of virtualisation is "containerisation" (e.g. for security purposes or for "hardware as a service"), and that's where hardware virtualisation is used.

- Brendan

Reply Score: 3

RE[7]: Garbage in, garbage out.
by Alfman on Fri 8th Dec 2017 02:02 UTC in reply to "RE[6]: Garbage in, garbage out."
Alfman Member since:
2011-01-28

Brendan,

This doesn't even make sense. Java byte-code is machine code for a "pretend CPU" (ignoring the "Jazelle DBX" extension ARM tried) and as far as JIT is concerned it makes very little difference how real the original CPU was.
...
Nonsense. Java's compiler compiles Java source code (text) into Java byte-code (machine code for a virtual machine). This machine code is then emulated (with a combination of interpretation and JIT).
...
The original JVM implementations used "pure interpretation" to emulate the virtual machine (which is slow), and newer JVM implementations use a combination of interpretation and JIT to emulate the virtual machine (which is faster). The use of JIT to emulate a virtual machine does not mean that there is no virtual machine.


The java byte code is merely a binary representation of the source code. The binary format makes it easier for computers to parse, but it is equivalent to the source code (minus comments and whitespacing). It's not very accurate to say that the JVM translates from one machine code to another in the same sense as x86 to ARM does.

Look, I don't want to argue over semantics, so if you want to view it as a virtual machine, then fine. But make no mistake, putting a "virtual machine" label on a java compiler does NOT put it in the same category as x86 machine code virtualization. The Java compiler's binary input is equivalent to the original source. On the x86 side we simply don't have the source code in any format to recompile natively to another platform. This is why equating the two is trickery. I'm still glad you brought it up, but the JVM performance numbers aren't necessarily meaningful for x86 machine translation.


Hardware virtualisation only works when the hardware/CPU supports the instruction set being emulated; and CPUs that support hardware virtualisation of other CPUs are almost non-existent (the "Jazelle DBX" extension ARM tried is the only case I can think of at the moment). Modern Intel and AMD CPUs have hardware virtualisation that is only capable of virtualising 80x86 and nothing else. Some modern ARM CPUs have hardware virtualisation that is only capable of virtualising ARM and nothing else.


Yes, the rise of ARM PCs may help push cross architecture software emulation further. I use hardware virtualization all the time for my servers, but it's not so clear whether I'd have a need to do cross-platform virtualization.

I may one day have ARM servers, but it's hard to envision a scenario where I'd be running an x86 VM on an ARM server or an ARM VM on an x86 server. The goal of virtualization is usually to do it with minimal performance overhead. Hardware virtualization gets us there, but without a lot more progress, software virtualization probably does not. I'd still play with it though ;)

Reply Score: 3

RE[2]: Garbage in, garbage out.
by The123king on Thu 7th Dec 2017 09:03 UTC in reply to "RE: Garbage in, garbage out."
The123king Member since:
2009-05-28

It was aimed at Thom and his "idea of running [his] garbage Win32 translation management software on a fast, energy-efficient laptop"

Reply Score: 0

RE: Garbage in, garbage out.
by Alfman on Thu 7th Dec 2017 14:26 UTC in reply to "Garbage in, garbage out."
Alfman Member since:
2011-01-28

The123king,

It was aimed at Thom and his "idea of running [his] garbage Win32 translation management software on a fast, energy-efficient laptop"


Well, now I understand, but your original post seems to debunk claims that Thom never actually made about emulation being faster:

I'm very curious about the eventual performance figures for this emulation, since the idea of running my garbage Win32 translation management software on a fast, energy-efficient laptop and external monitor seem quite appealing to me.


It could be that Thom simply wants a fast efficient ARM laptop that also happens to run his garbage win32 translation management software. Like many businesses, his win32 apps may be a critical part of his workflow, but it doesn't mean he won't also be running fast native apps too.

Anyways until we have more data, we're mostly speculating about how good or bad this will be. I think we're all curious.

Edited 2017-12-07 14:27 UTC

Reply Score: 4

Just like Apple's Rosetta
by AntonioTrindade on Wed 6th Dec 2017 17:38 UTC
AntonioTrindade
Member since:
2012-04-23

It's not unlike Apple's Rosetta, which did quite a good job emulating PowerPC code on an Intel Mac.

Reply Score: 2

RE: Just like Apple's Rosetta
by darknexus on Wed 6th Dec 2017 17:53 UTC in reply to "Just like Apple's Rosetta"
darknexus Member since:
2008-07-15

It's not unlike Apple's Rosetta, which did quite a good job emulating PowerPC code on an Intel Mac.

It did indeed do a good job, but the performance impact was definitely obvious even so. I'm curious to get my hands on one of these, though I don't really like Windows 10 enough to want to buy one.

Reply Score: 2

RE[2]: Just like Apple's Rosetta
by Alfman on Wed 6th Dec 2017 18:26 UTC in reply to "RE: Just like Apple's Rosetta"
Alfman Member since:
2011-01-28

tidux,

That's my concern as well. If these come with unlocked bootloaders, I'll absolutely buy one and put Linux on it.



darknexus,

It did indeed do a good job, but the performance impact was definitely obvious even so. I'm curious to get my hands on one of these, though I don't really like Windows 10 enough to want to buy one.


Haha, ironically I'm not sure there's that much demand for ARM among typical Windows users. But there is pent-up demand for ARM computers on the Linux side. I'd be ready to buy one and load up a Linux distro on there. Outside of embedded devices x86 has dominated computers for my entire life, I'd like to see some variety already... Just please don't cripple secure boot to remove my choice of software!

Reply Score: 1

darknexus Member since:
2008-07-15

Yeah, while the demand for Linux computers isn't really that high in the consumer space, demand for Windows on ARM is likely even less prevalent. It's cool from a technological standpoint, but I can just see the holiday shopper:
Shopper: Hey, this is cool. Does it really get a full day on battery like my iPad can?
Sales: Sure does.
Shopper: And it'll run Word and Outlook and all that? It will really let me edit my pictures?
Sales: Sure will.
*** Shopper, at home: Damn, this thing is slow. I can't get anything done. My phone runs faster than this! I'm returning this stupid thing.

They might actually sell more of these if they sell them unlocked, since the likely market for these is going to end up being enthusiasts at least for the time being. Of course, I doubt Microsoft is going to approach this logically. I'm betting they'll try the Windows RT approach again, and will probably doom the idea as a result. In fact, the cynic in me sees a way they could use the poor performance as a way of removing the ability to side-load applications altogether, and claim they were right all along to limit Windows RT the way they did.

Reply Score: 0

dionicio Member since:
2006-07-12

Will wait for Acer, from S frame, for a start.

Reply Score: 2

Not the first time, if I understand it right
by gus3 on Wed 6th Dec 2017 17:56 UTC
gus3
Member since:
2010-09-02

Saving the translated instructions along with the original text (albeit in a separate section) is nothing new, right? Wasn't this the philosophy of the AS/400, with its Technology-Independent Machine Interface?

Reply Score: 3

Linux for ARM!
by gehersh on Thu 7th Dec 2017 01:13 UTC
gehersh
Member since:
2006-01-03

I would love to get one of these devices (with Snapdragon 835 or 845 when available) and run Linux for ARM on it. No Intel and no Windows! Like a fairy tale come true. Don't know too much about Linux for ARM, though. I believe both Debian and Ubuntu have such an animal, but I wonder whether a generic 64-bit ARM version of Linux would run on any such chip (i.e., Snapdragon), and about the availability of applications recompiled for ARM. (you probably can cross-compile them if push comes to shove) (?)

Reply Score: 1

RE: Linux for ARM!
by The123king on Thu 7th Dec 2017 15:59 UTC in reply to "Linux for ARM!"
The123king Member since:
2009-05-28

Raspberry Pi

Reply Score: 0