Word is out there that an individual is trying to develop Pentium III emulation as part of a fork of 86Box, regardless of how slow it is, in the name of “hardware preservation”. But why didn’t we do it in the first place? Why did we, developers of a PC emulator clearly aimed at the preservation of hardware and software, limit ourselves to the Pentium II and an underperforming competitor (the VIA Cyrix III), and why did we do these two knowing they’re already pretty slow to emulate? It’s story time.
When I started reading this article I had no idea there was going to be some classic open source/forking drama at the end, but even with that, it’s a good article and definitely worth a read.
It’s hard to imagine there are that many 86Box users who wouldn’t be better served by more modern software translation techniques (like those used by QEMU and Rosetta) or even plain virtualization. All of these produce similar results and can emulate much newer CPUs while being significantly faster than emulating a CPU in software.
So my question is: are there that many games and applications running under 86Box that won’t run under virtualization or faster types of emulation?
There are some limits to emulation of hardware peripherals.
Old sound cards, VGA behavior, or other random stuff might be easier in DOSBox/86Box than trying to do the same in a paravirtualized or hypervisor-based environment.
Also for some programs you want to limit CPU speed.
It is essentially a tradeoff. Do you want faster emulation? Or a more accurate one?
sukru,
Is there really a difference though? Port IO is port IO and memory IO is memory IO regardless of how the CPU accomplishes its work. I believe at one point QEMU and Bochs used the exact same virtual hardware abstractions despite drastically different CPU implementations. Bochs emulated the CPU state in full, while QEMU used dynamic JIT code. You should be able to replace the CPU simulator of 86Box while maintaining the same virtual PCI bus peripherals.
Today it’s still noteworthy that QEMU supports the very same virtual hardware whether it’s running in its own JIT translation mode or on the physical CPU under KVM.
If the old software can run on new CPUs (x86 has an amazing reputation for backwards compatibility), then it will likely work well under virtualization too. It’s possible that old software will exhibit new race conditions, but this has less to do with virtualization and more to do with faulty logic in a program that will fail even on real hardware. For example, many old Turbo Pascal programs would crash on newer computers because the standard library’s delay calibration routine overflows on fast CPUs.
Being able to faithfully slow down an emulator is an interesting problem. It’s easy to slow down to match the average execution speed, but if those pauses happen in different places it could produce unexpected problems for things like framerate and sound playback. Bad software depends on instruction timings instead of interrupts & timers. I had a MIDI player that suffered from tracker jitter and I think this is why. Windows software is more robust than DOS software because it was designed for a multitasking environment in the first place, whereas DOS programs would assume there’s nothing else running and that they could use busy looping for timing. I would guess these are a common source of problems for emulators. Even so, there’s no technical reason you shouldn’t be able to emulate authentic timing characteristics close to the instruction level. You’d just have to inject a whole bunch of delays throughout the code. In principle, with enough information about instruction timings, you could incorporate the same delays in modern emulators.
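To make that concrete, here’s a minimal sketch of what I mean, with made-up names (cycle_cost, vclock) and made-up numbers rather than anything from a real emulator. The emulator charges every instruction its documented cycle cost against a virtual clock:

#include <stdint.h>

/* Illustrative per-opcode cycle costs from the CPU's documented
   instruction timings (real tables vary by operands, memory, etc.). */
static const uint8_t cycle_cost[256] = { [0x90] = 3 /* e.g. NOP on an 8088 */ };

static uint64_t vclock; /* virtual cycles elapsed on the emulated CPU */

static void step(uint8_t opcode)
{
    /* ...emulate the instruction's effects on registers/memory... */
    vclock += cycle_cost[opcode]; /* the "injected delay", in virtual time */
}

A recompiler wouldn’t even need a call per instruction; it could sum the costs of a straight-line block at translation time and emit a single vclock += N.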
Alfman,
Both 86Box and DOSBox have dynamic recompilation, but it comes at the cost of accuracy: https://86box.readthedocs.io/en/latest/settings/machine.html#dynamic-recompiler
Yes, you might try to do timing in the generated code, but it would be a more challenging task. Especially when we already have enough “headroom” to do cycle-per-cycle emulation in the first place.
Take a demo like “Unreal” from the old scene days.
https://www.oldgames.sk/en/game/unreal-demo/download/1057/
It uses precisely timed VGA palette changes to display more than 256 colors at the same time. Synchronizing this in a native virtualized machine with emulated VGA hardware would really be more difficult. (And the demo still needs some tweaking to run on DOSBox).
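For the curious, the general technique looks something like this rough Turbo C-style sketch (not Unreal’s actual code): reload VGA DAC entries during the retrace, so different scanlines can use different palettes:

#include <dos.h> /* outportb()/inportb() in Turbo/Borland C */

#define VGA_STATUS 0x3DA /* input status register 1 */
#define DAC_INDEX  0x3C8
#define DAC_DATA   0x3C9

/* Wait for the start of a retrace, then reload one palette entry.
   Done on every scanline, each line can show different colors,
   putting more than 256 on screen per frame. */
void set_color_during_retrace(unsigned char idx, unsigned char r,
                              unsigned char g, unsigned char b)
{
    while (inportb(VGA_STATUS) & 0x01);    /* wait for active display */
    while (!(inportb(VGA_STATUS) & 0x01)); /* wait for retrace to start */
    outportb(DAC_INDEX, idx);
    outportb(DAC_DATA, r); /* 6-bit values, 0..63 */
    outportb(DAC_DATA, g);
    outportb(DAC_DATA, b);
}

The emulator has to flip that status bit at the right time relative to the CPU’s progress, or the DAC writes land mid-scanline and the colors come out wrong.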
Also this is the only way we can run them on a Raspberry Pi, or other non-x86 machines.
sukru,
Assuming you actually have a way to compute the desired instruction clocks in the first place (which is implied if you have a cycle-accurate emulator), I think it should be viable to implement the appropriate delays in generated code. Especially given that we’re talking about an order of magnitude of performance difference, giving the emulator plenty of time to implement delays (it feels ironic to say this, haha).
The loop will just delay until the appropriate number of cycles has passed.
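Something like this hedged sketch, reusing the vclock counter from my earlier example (EMU_HZ being whatever clock you’re imitating):

#include <stdint.h>
#include <time.h>

#define EMU_HZ 4772727ULL /* e.g. a 4.77 MHz 8088 */

/* Spin (or sleep) until wall-clock time catches up with virtual time.
   `start` is when emulation began; overflow handling omitted. */
static void pace(uint64_t vclock, const struct timespec *start)
{
    for (;;) {
        struct timespec now;
        clock_gettime(CLOCK_MONOTONIC, &now);
        int64_t host_ns = (now.tv_sec - start->tv_sec) * 1000000000LL
                        + (now.tv_nsec - start->tv_nsec);
        int64_t target_ns = (int64_t)(vclock * 1000000000ULL / EMU_HZ);
        if (host_ns >= target_ns)
            break; /* real time has caught up with virtual time */
        /* a short nanosleep() here would be kinder to the host CPU */
    }
}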
Well, that would be challenging if you had to perfectly synchronize both the CPU and GPU with wall-clock time. But that seems like overkill when we’re talking about virtualization instead of physical hardware. Wall-clock time doesn’t matter as much in the virtual domain. As long as the same order of operations is maintained using an implied ratio between GPU and CPU clocks, the operation will be mathematically identical regardless of the speed of the emulator.
For example, Unreal is probably using something like Bresenham’s line drawing algorithm to synchronize its own timing with the VGA card timing. As long as we can maintain that ratio, the exact instruction times become much less critical.
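A sketch of that ratio-keeping idea, with example clocks of my own choosing; no wall-clock time is involved, only the ratio:

#include <stdint.h>

#define CPU_HZ 33000000LL /* e.g. a 33 MHz 486 */
#define VGA_HZ 25175000LL /* the standard 25.175 MHz VGA dot clock */

static int64_t err; /* accumulated remainder, as in Bresenham */

/* Call after the CPU core retires `cpu_cycles` cycles: advance the
   VGA emulation by however many whole dot clocks now fit. */
static void run_vga_for(int64_t cpu_cycles)
{
    err += cpu_cycles * VGA_HZ;
    while (err >= CPU_HZ) {
        err -= CPU_HZ;
        /* vga_tick(): shift out one pixel, update retrace flags... */
    }
}

Run the emulator at 1x or 100x speed and the interleaving of CPU and VGA work is identical either way.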
Yes, it obviously becomes more work to create transcoders for other platforms, but it shouldn’t create an insurmountable problem for code translation as long as the Pi is fast enough.
In this discussion I’ve made a fundamental assumption that the JIT code generation itself doesn’t create problems. Even if the generated “hot” code paths are cycle accurate, the JIT compiler will cause brief delays before the code can begin to execute. This shouldn’t be much of a problem for the emulated program. The JIT compilation delay that takes place doesn’t technically need to be revealed to the program running. But code generation events might be apparent to the user. In practice though old PCs were so slow to load anything that the delay will probably be much less than what we were accustomed to on original hardware.
86Box and PCem are cycle-accurate emulators. Basically, they are something like bsnes or BeetlePSX, but instead of precisely emulating the hardware of a SNES or PS1, they precisely emulate the hardware of a PC. Yes, this means you have to choose things like motherboard brand and provide a BIOS ROM for it.
As for examples of apps, I have a collection of “killer” games that expose flaws in non-cycle-accurate emulators and only work well on PCem or 86Box:
1) Speed Haste (runs like crap in DOSBox and switches back to low resolution, and that’s with the hacks that they provide in their compatibility database)
2) Need For Speed (the first one) in Windows mode (doesn’t launch in VirtualBox)
3) Need For Speed II and III (running in software-renderer mode) (they don’t launch in VirtualBox)
4) PIKTO (a French-made game I own) (doesn’t launch in VirtualBox)
For all cases, I had installed graphics drivers in VirtualBox, so it wasn’t lack of SVGA support causing it.
So yeah, PCem and 86Box are mainly for games.
When it comes to game emulation, I personally don’t understand the need for half-arsed emulators like ZSNES, ePSXe and DOSBox (including DOSBox-X), considering that most PCs nowadays can run cycle-accurate emulators. With cycle-accurate emulators you simply don’t have to worry about compatibility; they just work.
PS: Need For Speed II and III have patches in PCGamingWiki to run them in modern Windows, but if you want to run them as they are, you can in PCem or 86Box.
(btw the OS in VirtualBox was Windows 98SE, so it wasn’t an OS issue either)
DOSBox has the advantage of being “good enough”, and having tons of nice frontends, with well tuned pre-made configs.
Even GOG was using DOSBox on their service for some games,
https://www.dosbox.com/wiki/GOG_games_that_use_DOSBox
Yes, it is not perfect, but gets the job done 95%+ of the time. For the last 5%, you really need a more accurate emulator.
Depends on what you mean by “good enough”. Even in their own compatibility database, some games don’t even work.
If your goal is to run every game, then 95% with asterisks won’t cut it.
I don’t think there’s an emulator that would claim 100% compatibility. Heck, you couldn’t get that with real hardware either.
kurkosdr,
Windows software should be (and normally is) timer & interrupt driven, not dependent on instruction timing. Windows could and did pre-emptively steal cycles all the time in normal use. It’s objectively bad to write software that critically depends on instruction timing. It will become very fragile on new hardware, but apparently some developers did it anyway.
Thanks for providing examples. I wonder if I can get ahold of any of those. I’ve found that DOSBox has suboptimal timing above and beyond instruction timings, so I don’t think it’s the best test to prove that software requires cycle-accurate emulation.
I’m curious whether there would still be problems if you ran those titles natively on a modern Windows machine without any emulation at all. I would also like to try them under virtualization.
Sure, but I don’t think that “cycle accurate emulator” is mutually exclusive with the faster code generation techniques that would enable newer hardware than what 86Box supports currently.
Ah, so it doesn’t even run on newer computers without patches. Yeah, that’s a clear sign of badly written software, but regardless, the fact that such software exists means something has to be done to support it. I’m not convinced that more modern emulation techniques can’t play a role in supporting newer CPU targets, though.
I don’t know if it’s worth going into details, but mathematically speaking the timing of most instructions will have no bearing on the CPU emulation accuracy whatsoever. The software has no idea that it’s executing too fast or too slow up until it performs some kind of IO with a peripheral. The point here is that it really doesn’t matter that an algorithm runs too fast as long as it’s synchronized at the point of IO. The timing of individual instructions isn’t that important as long as it mathematically produces the same outcome as the original.
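A sketch of what I mean, sometimes described as lazy “catch-up” synchronization (the struct and names here are my own illustration): devices only learn what time it is when the CPU actually touches them, and fast-forward their state to that moment:

#include <stdint.h>

extern uint64_t vclock; /* the CPU's virtual-cycle counter */

/* Between accesses a device can be left alone entirely; on access it
   fast-forwards from its last known time to "now" in one step. */
struct device {
    uint64_t last_sync; /* vclock at the last catch-up */
    void (*advance)(struct device *self, uint64_t elapsed_cycles);
    uint8_t (*read_reg)(struct device *self, uint16_t port);
};

static uint8_t port_in(struct device *dev, uint16_t port)
{
    dev->advance(dev, vclock - dev->last_sync); /* catch up to the CPU */
    dev->last_sync = vclock;
    return dev->read_reg(dev, port); /* sampled at the right virtual moment */
}

In between those catch-up points, the CPU can run as fast as the host allows.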
DOS game software like Speed Haste operates under the assumption it’s the only software running on the system (save for the TSRs it may use).
Then there is Windows software like Need For Speed III, which has a micro-benchmark that overflows to negative on fast CPUs. It also has issues with too much RAM. The person who made the compatibility patch to run it on modern Windows had to write several thousand lines of assembly.
Also, keep in mind that in game emulation circles, by “cycle accurate” we also typically mean accurate emulation of peripherals and their synchronization with the CPU. I am not sure whether VirtualBox fails to run many full-screen applications due to non-cycle-accurate emulation or due to bad SVGA emulation. Anyway, my point is that it’s worth it, IMO, to give PCem, bsnes and BeetlePSX the extra CPU time compared to faster but less accurate alternatives.
kurkosdr,
A lot of DOS software made such assumptions, and as a result it didn’t work well under Windows even back in the day. At least the assumption was true under DOS, but Windows software had no business making such assumptions.
Yep. It leads to fragile compatibility whether you are using an emulator or not. In these cases the emulators aren’t the cause of the problem as much as bad software practices are.
I feel that by the time we were using Pentiums, it was already normal for software to adjust performance dynamically rather than assuming CPU performance. Otherwise gameplay would be dependent on the performance of the machine, which would have been considered poor quality even back then. Upgrading your machine really shouldn’t cause the game to be faulty (early Sierra games had this problem).
No argument here. I think accurate emulation of faster CPUs could be achievable too.
The thing is, the same factors that contributed to the golden era of PC gaming in the 90s and 2000s (that is, no “live service” garbage, no DLC storms, no subscriptions) are the same factors that meant that code for games had a shelf life of 1-2 years. Once people had bought and played your game, the code had no commercial value. This meant that, as long as it worked on the systems of the time, it was good to ship.
That’s why the process of playing those old games (particularly pre-Windows 2000) involves either repairing the executable or providing a perfectly-accurate emulation, like you would on a NES. You should assume every hardware quirk, every synchronization quirk and every OS bug has been exploited.
kurkosdr,
Even with a shelf life of a couple of years, there were huge performance differences between hardware generations back then. It’s not like today, where gains are far more marginal. Even back then the best practice was to not depend on precise instruction times. Clearly emulators are rehashing these problems today, but it’s important to realize they aren’t the cause of them. Titles that assumed instruction times would experience breakage even back in their day when upgrading to new RAM, an Intel/AMD CPU, a graphics card, etc.
Well, consoles could get away with assumptions like that because every console was homogeneous. But this was always a bad practice for PCs with many different components from different vendors operating at different speeds, etc. Software that was robustly written back then is software that still runs well today under emulation.
If you’ve spent any time with low-level programming for QEMU, you quickly figure out that its emulation accuracy starts and ends with “what will make the Linux driver for this hardware happy?” Very, very often QEMU’s emulation disagrees with the hardware documentation and you have to go read the source code to figure out what they’ve done. That’s a very different thing from accurately emulating a CPU down to memory timing.
QEMU is fine if you want to run Linux or something that has been written specifically to run on QEMU. If you want to run some arbitrary hunk of code that used to run on a Pentium II in the days of yore, then you might need something a little more comprehensive.
Jeeves,
I understand what you’re saying, but I think you kind of jumped over the point I wanted to make, which is that 86Box could be upgraded to modern software translation techniques to support faster CPU targets than just a P2.
Alfman,
Dynamic translation comes at the cost of I/O accuracy.
By the time you execute a CPU instruction, a timer interrupt might have happened, DMA might have overwritten a memory section, the Sound Blaster might have finished processing an audio chunk, VGA registers might have changed indicating the CRT beam has started scanning a new line, or a new byte might be ready on the (virtual) serial port.
There are just too many things to coordinate against the host CPU’s timing.
sukru,
I disagree with the assertion that it has to be. Most of the time our VMs are optimized to maximize performance, but there’s no reason they have to be. Anything that’s too fast can be slowed down as necessary.
Yes, but unless your emulated CPU is talking directly to physical hardware on the system bus, none of the wall-clock timing matters for a virtual machine. Mathematically speaking, it’s only the order of operations that matters, and a virtual machine has control over that.
Just as an example, you could pause a whole-system emulator and start it back up years later; both the emulated software and hardware will remain completely oblivious to the passage of time.
Alfman,
The emulated CPU will of course have no idea what is happening outside. It should have no direct connection to real hardware. We could even be emulating with pen and paper if we want.
However, what I think about dynamic recompilation (JIT) is that it will run a large stretch of code without going back to the emulator. Ideally, on x86, if there is no I/O or page fault, the code can run for an indeterminate amount of time before we need to intervene again.
However, when we want accuracy, we need synchronization much more often in the generated code.
For example:
rep movsb
will not translate to the same single instruction on the host; instead, it becomes something like
copy_loop:           ; "loop" is itself an x86 mnemonic, so rename the label
    pusha
    call hw_sync     ; give the emulator a chance to run timers, DMA, VGA, etc.
    popa
    movsb            ; copy one byte, advance SI and DI
    pusha
    call hw_sync
    popa
    loop copy_loop   ; decrement CX, repeat while nonzero
which undermines the original intent of dynamic translation. I might be overdramatizing, but am I missing something?
Alfman,
Anyway, I have been dragging this on a bit too long.
Hopefully we will have even better software preservation for past systems in the future. What concerns me, though, is the current locked-down ones.
sukru,
A CPU state emulator fetches opcodes, executes them, and then delays the appropriate time to mimic an old CPU. Rinse and repeat. I still maintain that everything, including timing, can be accomplished using code generation techniques too. In other words, there’s no fundamental reason that accuracy and performance must be mutually exclusive.
I will concede that VMs are generally designed to run as fast as possible without regard to legacy cycle accuracy, but I would like you to concede that in principle one could write a code generator that slows down execution to match explicit instruction timings. I think we should be able to agree on this.
Well, that’s one approach. But I do want to comment that any hardware interrupt handler that is sensitive to the values of SI/DI/CX at the moment of invocation would be extremely fragile, whether it’s in an emulator or on physical hardware. Even a tiny change in a microarchitecture’s pipeline could break said software. Even the operating system handling events unbeknownst to the software will clearly cause timing deviations. Software that cannot handle interrupts at any time has bugs that are waiting to happen, whether it’s in an emulator or not.
Anyway, Intel guarantees that an instruction executes atomically, but CPU designers have the choice of finishing an instruction or rolling it back such that it restarts after an iret.
https://stackoverflow.com/questions/53687178/interrupting-instruction-in-the-middle-of-execution
“rep” gets treated as a bunch of consecutive instructions in a loop, and so we should be able to treat it in the same way in our code generation.
I would think the intent is to reproduce the same results, but to do it more efficiently. So rather than looking at code generation as a completely different implementation, you could think of it as the same implementation as the CPU state emulator, but unrolled and optimized to eliminate all unused code paths for a given instruction. Since all the instruction parsing can be optimized away, it becomes far more efficient. The JIT compiler can also perform further instruction & register optimizations like a high-level compiler does. This will produce the same result as the CPU state emulation, including any synchronization, but since it’s more efficient it can emulate faster CPUs.
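To illustrate both points at once (your rep question and the “unrolled interpreter” view), here’s a hedged sketch of what a recompiler might emit for rep movsb, written as C for readability; events_pending and the per-iteration cost are my own inventions:

#include <stdint.h>

extern uint8_t  mem[1 << 20]; /* emulated RAM (segmentation ignored here) */
extern uint16_t si, di, cx;   /* emulated registers */
extern uint64_t vclock;
extern volatile int events_pending; /* set when a timer/DMA/VGA event is due */

void emitted_rep_movsb(void) /* what the recompiler might generate */
{
    while (cx != 0) {
        mem[di] = mem[si];
        si++; di++; cx--;   /* one movsb iteration */
        vclock += 4;        /* illustrative per-iteration cycle cost */
        if (events_pending) {
            /* Bail to the emulator's event loop. SI/DI/CX hold
               mid-progress values, so the instruction resumes
               correctly, just like real x86 interrupting a rep
               string operation. */
            return;
        }
    }
}

The common case is a single cheap flag test per iteration instead of a full hw_sync call, which keeps most of the dynamic translation win.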
Nah, truly the fault is all mine. I actually like talking about this stuff, but I know I can get long-winded.
Yeah, we’re at high risk with today’s software relying on remote data centers for dubious reasons. At least DRM can be removed; software historians might even have to rely on warez copies. But even assuming the software is not locked down, many multiplayer games today don’t provide any P2P connectivity like they used to. P2P games were hugely popular when I was in college, but today the requirement to connect to centralized servers has become increasingly hard-coded. This will make future play impossible when those servers shut down. 🙁
Alfman,
Internet Archive already has such an exemption:
https://archive.org/about/dmca.php
But that needs to be renewed every 3 years or so, which is not very reassuring.
It is worse than that. The recent Halo Infinite release had no single-player campaign on disc, due to some quirk of releasing the multiplayer early. Whatever the reason, if Microsoft’s servers go down, even local single-player will be quite difficult.
Same thing with Gran Turismo 7, where they actually had this problem happen for a day or two. Their servers went down, and even local play was limited.
Or take Android or iOS apps. If an app receives an update, there is no (simple) way to roll back. There have been instances in the past where full versions of games received ads in updates, etc.
We are unfortunately not moving in the right direction.
sukru,
SimCity was criticized for this as well, requiring a server connection even for purely local gameplay. It was done purely to give the publisher remote control over users. If it’s just a denial-of-service lever, it can be hacked out just like any other copy protection, but if some critical functionality were taken out of the local software and moved to the server, then it becomes completely unplayable.
It sucks that publishers are spending time & money to increase failure modes with no benefit to users. 🙁
I would be so annoyed if my apps did that. I try to archive the APKs that are most critical to me, but it’s a bit tedious and I’m not consistent with it. I need to find a better way to handle this.
Related, but off-topic: I wish there were something like a “RetroArch” for x86 emulators, where you have a single GUI and choose the core that will run the emulation of your virtual HDD images.
martini,
Turns out somebody has already done it:
https://forums.libretro.com/t/guide-add-and-launch-dosbox-games-in-retroarch/31356
https://www.digimoot.com/retropie-dosbox-setup-guide/
Not “click and play”, but a bit involved. Nevertheless, the DOSBox core is already in RetroArch.