If RISC-V ever manages to take off, this is going to be an important tool in RISC-V users’ toolbox: felix86 is an x86-64 userspace emulator for RISC-V.
felix86 emulates an x86-64 CPU running in userspace, which is to say it is not a virtual machine like VMware; rather, it directly translates the instructions of an application and mostly uses the host Linux kernel to handle syscalls.
Currently, translation happens during execution time, also known as just-in-time (JIT) recompilation. The JIT recompiler in felix86 is focused on fast compilation speed and performs minimal optimizations. It utilizes extensions found on the host system such as the vector extension for SIMD operations, or the B extension for emulating bit manipulation extensions like BMI. The only mandatory extensions for felix86 are G, which every RISC-V general purpose computer should already have, and v1.0 of the standard vector extension.
↫ felix86 website
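For readers unfamiliar with how such a translator is structured, the heart of a JIT recompiler like this is conceptually a simple loop: look up the current guest address in a cache of already-translated blocks, translate on a cache miss, then jump into the generated host code. Here's a minimal sketch in C++ (the types and names are hypothetical illustrations, not felix86's actual internals):

```cpp
#include <cstdint>
#include <unordered_map>

// Hypothetical shape of a translated block: host code we can jump into.
using HostCode = void (*)();

uint64_t guest_rip;                            // emulated x86-64 program counter
std::unordered_map<uint64_t, HostCode> cache;  // guest address -> translated code

// Assumed to exist: decode x86-64 at 'rip', emit RISC-V, return callable code.
HostCode translate_block(uint64_t rip);

void dispatch_loop() {
    for (;;) {
        auto it = cache.find(guest_rip);
        if (it == cache.end()) {
            // First visit to this block: pay the one-time translation cost.
            it = cache.emplace(guest_rip, translate_block(guest_rip)).first;
        }
        it->second();  // run the block; it updates guest_rip before returning
    }
}
```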
The project is still in early development, but a number of popular games already work, which is quite impressive. The code’s on GitHub under the MIT license.
Cool project. The article doesn’t cover the question that’s probably on everyone’s mind: performance. The usability of code translation depends on both compatibility and performance. The compat page shows that compatibility still needs work, but that can keep improving.
https://felix86.com/compat/
For performance, what’s the overhead for translated code versus native code?
Also, qemu does the exact same thing, so I’d like to hear more about how the goals of this project differ from those of qemu. Is this merely an alternative, or is there something that will differentiate it?
@Alfman
You are a well informed person so perhaps you know more about this than I do. However, I do not think that QEMU does the same thing as Felix86.
QEMU is a VM platform. It is going to emulate everything. Felix86 is going to JIT the application logic from x86-64 to RISC-V, but the Linux system calls are going to be sent directly to the host kernel, where they will be executed natively. While QEMU does employ JIT techniques, I do not believe that QEMU is so “Linux aware”. And, as with a container, there is no “guest operating system” here. Again, things run directly on the host kernel. This is going to lead to a significant difference in both performance and resource use.
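To make that concrete, the “Linux aware” part means the emulator catches the guest’s syscall instruction, remaps the x86-64 syscall number and arguments to the host convention, and invokes the host kernel directly. Something like this sketch (the syscall numbers are real Linux numbers; the surrounding code is hypothetical):

```cpp
#include <cstdint>
#include <sys/syscall.h>
#include <unistd.h>

// Hypothetical guest register state for the bits that carry a syscall.
struct GuestRegs { uint64_t rax, rdi, rsi, rdx; };

// Forward one guest syscall to the host kernel. A real emulator needs a
// full number-mapping table plus translation of any ABI-dependent structs.
long forward_syscall(GuestRegs& g) {
    switch (g.rax) {                 // x86-64 syscall number
    case 1:                          // write(fd, buf, count) is 1 on x86-64...
        // ...but 64 on riscv64; the host's syscall(2) wrapper hides that.
        // Guest and host share one address space, so the pointer works as-is.
        return syscall(SYS_write, static_cast<int>(g.rdi),
                       reinterpret_cast<const void*>(g.rsi),
                       static_cast<size_t>(g.rdx));
    default:
        return -38;                  // -ENOSYS: not yet mapped
    }
}
```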
When you combine this with the fact that, for games, a lot of the heavy lifting is done by the GPU, I think performance could be pretty good. The GPU stuff can be handled by the GPU natively without concern for how the CPU instructions are being handled.
LeFantome,
QEMU supports both VM as well as userspace emulation. I should have made my post clearer because many QEMU users may only be familiar with the full system emulation…
https://www.qemu.org/docs/master/user/main.html
My mind immediately wonders how these emulators compare to each other. Here is an unrelated benchmark comparing several userspace x86 emulators for ARM…
https://box86.org/2022/03/box86-box64-vs-qemu-vs-fex-vs-rosetta2/
In that instance, QEMU (without the benefit of KVM) is left behind by more optimized emulators. A benchmark to compare emulators on RISC-V is warranted; unfortunately, I don’t have any RISC-V computers to test for myself, haha.
That’s true, many games demand more of the GPU than the CPU. Emulation overhead will result in higher CPU load and power consumption, but it may not matter as long as the load stays below 100% CPU. I’m guessing that microstuttering and compilation delays might be more noticeable with a JIT design. Depending on the game, new code might keep getting compiled throughout play, or all the main code paths might be compiled up front without further involvement of the compiler inside the game loop. Somebody’s gotta benchmark this stuff for the sake of science 🙂
@Alfman
Thank you so much for that. I had either forgotten or did not know about QEMU user mode.
Now I want to spend the day exploring and benchmarking as you say. Sadly, I do not have the time. I think Felix is much like Fex, so those benchmarks may give us some hints. From your link, QEMU was by far the worst both in terms of performance and compatibility.
What I am blown away by right now, though, is that QEMU can do usermode emulation of BSD on a Linux kernel (so, running BSD software on Linux without a BSD kernel) and vice versa (Linux software on BSD). I for sure had no idea that such a thing was possible. I really do have to find the time to play sometime.
LeFantome,
Indeed. In the past when I tested QEMU VMs without KVM, the software emulation scored about 20-25% of native (x86 on x86). However, the fact that “QEMU doesn’t integrate a pass-thru mecanism for GL by default” seems like a death sentence for both compatibility and performance.
There are challenges when it comes to emulation across architectures. x86 in particular has stricter memory semantics, which are non-trivial to replicate on architectures with looser memory semantics. QEMU takes a hit for this: “Note that not all targets currently emulate atomic operations correctly. x86 and Arm use a global lock in order to preserve their semantics.” I wonder how much performance QEMU loses because of the global lock. Apparently Apple implemented x86 memory semantics in their ARM CPUs to help with Rosetta 2 emulation. It makes me wonder whether box86, FEX, and felix86 are emulating x86 memory semantics correctly. Most software does not depend on x86 semantics, so not implementing them could improve performance without a negative impact. I can’t say anything definitive without going through the code and testing each emulator’s behavior.
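To illustrate the memory-model point: x86 orders ordinary stores and loads (TSO), while RISC-V’s RVWMO does not, so a strictly faithful translator has to emit fences around plain memory accesses. Here’s a sketch using C++ atomics as a stand-in for the emitted instructions (my own illustration, not any emulator’s actual codegen):

```cpp
#include <atomic>

std::atomic<int> data{0}, flag{0};

// What x86 guest code relies on implicitly: two plain stores are observed
// in program order (TSO). A faithful RISC-V translation must add a fence;
// C++'s release ordering models that extra cost here.
void producer_faithful() {
    data.store(42, std::memory_order_relaxed);
    flag.store(1, std::memory_order_release);   // ~ fence + store on RISC-V
}

// The fast-but-unfaithful translation: plain stores, no fence. Under
// RISC-V's weaker RVWMO model another hart may observe flag == 1 while
// data is still 0 (legal there, impossible on real x86).
void producer_fast() {
    data.store(42, std::memory_order_relaxed);
    flag.store(1, std::memory_order_relaxed);
}
```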
JIT sucks. I am so glad Android moved away from it and went AOT.
JIT for ISAs not meant to be JITed, such as x86, sucks even harder.
I can see Felix is trying to thunk various stuff, which is not done by qemu (e.g. it’s cheaper to load a native RISC-V library and proxy the calls instead of calling into JITed shared libraries such as OpenGL), so it’s going to be a huge benefit when code calls into external libraries. I think the Xbox emulator team used a similar approach when calling the win32 API, and Microsoft probably uses similar techniques to bridge 32-bit and 64-bit stuff. It looks like they just cover the basic stuff so far, however, and more work on the JIT would be beneficial.
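For anyone curious what a thunk looks like in practice: instead of running the x86-64 libGL under the JIT, the emulator intercepts the call, passes the arguments across, and invokes the native RISC-V library via the dynamic linker. A hypothetical sketch (glClear is a real OpenGL entry point; the plumbing around it is invented):

```cpp
#include <cstdint>
#include <dlfcn.h>

// Sketch of a GL thunk: instead of running x86-64 libGL under the JIT,
// resolve the same symbol in the host's native RISC-V libGL and call it.
using GlClearFn = void (*)(uint32_t mask);

void thunk_glClear(uint32_t mask) {
    // Resolved once per process in a real implementation; inline for brevity.
    static void* libgl = dlopen("libGL.so.1", RTLD_NOW);
    static GlClearFn native =
        libgl ? reinterpret_cast<GlClearFn>(dlsym(libgl, "glClear")) : nullptr;
    if (native)
        native(mask);  // plain integer argument crosses the boundary directly
}
```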
It also looks like they switched to a qemu-like approach for translation: instead of complex SSA logic, which is good for compilers, just translate the necessary stuff and let the CPU do the rest.