Linked by Thom Holwerda on Fri 20th Jun 2014 18:22 UTC
General Development

I gave a talk in 2007 that explained 8088 Corruption in detail, and in that talk I explained that displaying FMV using CGA in graphics mode would be impossible. This is because CGA graphics mode uses 8x the amount of video memory that 8088 Corruption was handling. Even a simple calculation assuming 24fps video reveals that the amount of data needing to be updated per second (24fps * 16KB = 384KB/s) is outside of the IBM PC's capability: CGA RAM can only be changed at a rate of 240KB/s, and most hard drives of the era operate at roughly 90KB/s. It sure felt impossible, so that's why I said it.

Then I thought about the problem for 7 years.
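A quick sanity check of the arithmetic quoted above, as a small C sketch (the 240KB/s and 90KB/s figures are the article's own, not independently measured):

#include <stdio.h>

int main(void)
{
    /* Figures quoted in the article (KB = 1024 bytes) */
    const double frame_kb  = 16.0;   /* one full CGA graphics framebuffer    */
    const double fps       = 24.0;   /* target full-motion frame rate        */
    const double cga_kbps  = 240.0;  /* quoted max CGA RAM update rate       */
    const double disk_kbps = 90.0;   /* quoted hard drive throughput of era  */

    double needed = frame_kb * fps;  /* 24 * 16 = 384 KB/s */

    printf("needed: %.0f KB/s\n", needed);
    printf("over the CGA budget by %.1fx, over the disk budget by %.1fx\n",
           needed / cga_kbps, needed / disk_kbps);
    return 0;
}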

This is amazing. I also have no idea under which category to file this, but I settled for this one.

The author underwhelms me...
by Odwalla on Fri 20th Jun 2014 22:13 UTC
Odwalla
Member since:
2006-02-01

The linked article starts off with the author stating he's been fascinated by digital video for decades, recaps what he said in 2007 about not being able to do FMV on CGA, and then explains the two breakthroughs he discovered that made it possible for him to declare himself wrong.

Those breakthroughs are: frame deltas and buffered disk I/O.

By his own statements, he calls his decades of digital video experience into question.

Reply Score: 3

RE: The author underwhelms me...
by GDXN on Sat 21st Jun 2014 00:41 in reply to "The author underwhelms me..."
GDXN Member since:
2012-07-02

This is more the result of a common affliction: usually only the latest things get studied, while the old and/or most basic ones are never studied at all, leading to the rediscovery of years-old techniques as new revolutions.

Reply Parent Score: 2

WereCatf Member since:
2006-02-15

I was still impressed; it's not easy getting an 8088 to do much. That said, he had actually not realized that you don't need to update the whole screen every frame? That's basic graphics programming; no one sane would ever tell you to update the whole thing at all times. I don't know, that part just surprised me.

Reply Parent Score: 3

flypig Member since:
2005-07-13

I disagree (respectfully!). This looks to me like the classic difference between realtime oldskool demo programming and the pre-emptive stochastic programming most people are more familiar with now.

If you have cycles to spare on a multitasking system, updating only as much as you need to makes sense. However, if your demo requires a consistent 50 FPS update with 100% CPU utilisation, you have to take the worst case scenario of full screen redraw as your default case.
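To put rough numbers on that worst-case budget, here is a small sketch assuming the stock 4.77 MHz 8088, a 16 KB CGA framebuffer and a 50 FPS target (the interpretation in the comments is only ballpark):

#include <stdio.h>

int main(void)
{
    const double cpu_hz    = 4770000.0;   /* stock IBM PC 8088 clock          */
    const double fps       = 50.0;        /* demo-style fixed frame rate      */
    const int    frame_len = 16 * 1024;   /* bytes touched by a full redraw   */

    double cycles_per_frame = cpu_hz / fps;             /* ~95,400 cycles     */
    double cycles_per_byte  = cycles_per_frame / frame_len;

    printf("cycles per frame: %.0f\n", cycles_per_frame);
    printf("cycles per byte for a full redraw: %.1f\n", cycles_per_byte);
    /* Roughly 5.8 cycles per byte is less than even a tight rep movsw can
       manage on an 8088 once bus and CGA wait states are counted, which shows
       how tight the budget is: a consistent frame rate forces you to plan
       around the worst case. */
    return 0;
}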

Most classic demos, just like most games today (except FMV), always do a full-screen redraw (nowadays the effort is handled by the graphics card anyway).

At any rate, even though it's a reinvention of modern video compression, I was also still impressed.

Edited 2014-06-21 11:19 UTC

Reply Parent Score: 4

Kancept Member since:
2006-01-09

But that was not basic graphics programming in the 8088 days. Back then, we still had computers that hooked up to TVs, and that /was/ a whole-screen refresh, as TVs still do to this day. So being in the mindset that a full-screen refresh was needed isn't such a far-fetched thing.

Still, a neat read.

Reply Parent Score: 3

Trixter Member since:
2014-06-22

That said, he had actually not realized that you don't need to update the whole screen every frame? That's basic graphics programming; no one sane would ever tell you to update the whole thing at all times. I don't know, that part just surprised me.


Sorry, I wasn't clear in what I wrote. Obviously deltas have been around since the 1980s, and any sane person would use them. The problem to be solved this time around was how to represent and play back deltas within the very limited amount of time and processing power available. The reason I did entire memory moves in the first production was that (I thought) there wasn't enough CPU time to perform even two branches per delta. This time around, there still isn't, so I resorted to outputting code to avoid branches.
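To illustrate the general "encoder emits code" idea (a sketch only; the opcode choice and data layout here are illustrative, not the production's actual output): instead of emitting (offset, value) records that a player loops over, the encoder can emit a straight run of 16-bit "mov word [addr], imm16" instructions followed by a RET, so playing back a frame is a single CALL with no per-delta branches.

#include <stdio.h>
#include <stdint.h>

/* Emit branch-free 8086 code that applies a list of word deltas.
   Each delta becomes:  mov word ptr [offset], value   (C7 06 lo hi lo hi)
   and the block ends with RET (C3).  The generated code assumes DS points
   at the video segment (e.g. B800h) when it is called. */
typedef struct { uint16_t offset, value; } delta_t;

static size_t emit_frame(const delta_t *d, size_t n, uint8_t *out)
{
    size_t p = 0;
    for (size_t i = 0; i < n; i++) {
        out[p++] = 0xC7;                        /* mov word ptr [disp16], imm16 */
        out[p++] = 0x06;                        /* ModRM: direct 16-bit address */
        out[p++] = (uint8_t)(d[i].offset);      /* address, little-endian       */
        out[p++] = (uint8_t)(d[i].offset >> 8);
        out[p++] = (uint8_t)(d[i].value);       /* new word, little-endian      */
        out[p++] = (uint8_t)(d[i].value >> 8);
    }
    out[p++] = 0xC3;                            /* ret */
    return p;
}

int main(void)
{
    delta_t deltas[] = { { 0x0000, 0xFFFF }, { 0x0050, 0x1234 } };
    uint8_t code[64];
    size_t len = emit_frame(deltas, 2, code);

    for (size_t i = 0; i < len; i++)
        printf("%02X ", code[i]);
    printf("\n");   /* C7 06 00 00 FF FF  C7 06 50 00 34 12  C3 */
    return 0;
}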

Reply Parent Score: 3

RE: The author underwhelms me...
by Trixter on Sun 22nd Jun 2014 16:38 in reply to "The author underwhelms me..."
Trixter Member since:
2014-06-22


Those breakthroughs are: frame deltas and buffered disk I/O.


Gosh no, that's trivial. No, actually the breakthroughs were realizing ordered dithering would greatly increase visual fidelity without significantly increasing the number of deltas, and outputting code instead of data (the encoder is a compiler).
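For readers who haven't run into the term, ordered dithering in general looks roughly like the sketch below: a fixed threshold matrix is tiled over the image and each pixel is quantized independently (no error diffusion), so a pixel that doesn't change between frames dithers to the same value and the deltas stay small. This is a generic 4x4 Bayer version in C; the actual matrix and palette mapping used by the encoder may differ.

#include <math.h>
#include <stdio.h>

#define LEVELS 4   /* e.g. 4 shades for a 2-bits-per-pixel target */

static const int bayer4[4][4] = {
    {  0,  8,  2, 10 },
    { 12,  4, 14,  6 },
    {  3, 11,  1,  9 },
    { 15,  7, 13,  5 },
};

/* Quantize an 8-bit grey value (0..255) to one of LEVELS output levels,
   pushing it up or down by a position-dependent threshold before rounding. */
int dither_pixel(int grey, int x, int y)
{
    double step     = 255.0 / (LEVELS - 1);               /* gap between levels  */
    double m        = (bayer4[y & 3][x & 3] + 0.5) / 16.0; /* threshold in (0,1) */
    double adjusted = grey + (m - 0.5) * step;
    int level = (int)floor(adjusted / 255.0 * (LEVELS - 1) + 0.5);
    if (level < 0) level = 0;
    if (level > LEVELS - 1) level = LEVELS - 1;
    return level;
}

int main(void)
{
    /* A flat mid-grey area comes out as a stable repeating pattern. */
    for (int y = 0; y < 4; y++) {
        for (int x = 0; x < 8; x++)
            printf("%d", dither_pixel(128, x, y));
        printf("\n");
    }
    return 0;
}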

Reply Parent Score: 4

Alfman Member since:
2011-01-28

Trixter,

Gosh no, that's trivial. No, actually the breakthroughs were realizing ordered dithering would greatly increase visual fidelity without significantly increasing the number of deltas, and outputting code instead of data (the encoder is a compiler).


It's neat that you are here! I'm curious, did you try using a conventional data format first, or did you take that route to scratch an itch (ie simply wanted to try it)? I'm not terribly familiar with CGA bit planes, but it seems that something like this might have worked without using an exotic x86 binary format.

event_loop:
; Todo: disk handling...
; Todo: keyboard handling...
; Todo: audio handling
; Todo: timing

next_span:
; [DS:SI] -> current span in input stream - 2
; BX      -> last byte in buffered stream - 2 - max_span_size
; [ES:DI] -> output video buffer pointer
mov DI, [SI+2] ; DI = span position in video RAM
mov CX, [SI+4] ; CX = number of words to copy from input stream
add SI, 6 ; [DS:SI] -> CGA pixel data
rep movsw ; copy CX words from input stream to screen
mov AX, [SI-2] ; AX = last word copied, reused as the RLE fill value
mov CX, [SI] ; CX = number of times to repeat AX
rep stosw ; repeat AX value on screen CX times
; [DS:SI] -> next span in input stream - 2

cmp SI, BX ; more data in the buffer?
jb next_span

jmp event_loop

So the input data would look like this:
DW 0000h, 0001h, 0000h, 1FFFh ; set whole screen to black (1 word copied + 1FFFh repeats = 2000h words = 16KB)
DW 0010h, 0002h, 1234h, 5678h, 0000h ; set eight pixels at position 10h to 1h, 2h, 3h, 4h, 5h, 6h, 7h, 8h
...


Maybe we could save a couple of cycles with unrolling. Now bear in mind this algorithm allows for arbitrary combinations of RLE and pixel copying in every span. This is probably unnecessary flexibility, so by grouping together common span lengths we could eliminate repetition of the length field, reduce the size of the data and save a few more instructions.


DW 0003h, 0002h ; 3 spans, 2 data words each
DW 0000h, 1111h, 1111h ; 1st span at 0000h, data=1111h, 1111h
DW 0100h, 2222h, 2222h ; 2nd span at 0100h, data=2222h, 2222h
DW 0200h, 3333h, 3333h ; 3rd span at 0200h, data=3333h, 3333h
...

A data representation like this would probably be more compact than x86 binary code. Another idea would be to allow a span to reuse bytes in the data stream that were previously sent and are still in RAM, which should be fairly inexpensive "compression" even on an 8088. A pattern that repeats over and over (especially vertical objects) wouldn't need to be resent.
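A minimal sketch of that reuse idea, in C for readability rather than 8088 assembly (the span layout is hypothetical, just extending the format above with a "copy from an earlier stream offset" record):

#include <stdint.h>
#include <string.h>

/* Hypothetical "back-reference" span: instead of carrying fresh pixel words,
   it names a word offset earlier in the still-resident stream buffer, so a
   pattern that repeats (e.g. a vertical object) is only stored on disk once. */
typedef struct {
    uint16_t dest;    /* word offset into the video buffer             */
    uint16_t src;     /* word offset into data already read from disk  */
    uint16_t count;   /* number of words to copy                       */
} backref_span_t;

void apply_backref(const backref_span_t *s,
                   const uint16_t *stream_buf, uint16_t *video_buf)
{
    /* On a real 8088 this would just be a rep movsw with DS:SI pointing
       back into the read buffer; memcpy stands in for that here. */
    memcpy(&video_buf[s->dest], &stream_buf[s->src],
           s->count * sizeof(uint16_t));
}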

I consider everything I said above fairly "obvious", so I assume you probably thought of all that already. However, a (possibly) novel idea would be to use probabilistic encoding. To treat the above encoding probabilistically, one might precompute the distribution of span lengths. The decoder would then always expect A spans of length X, B spans of length Y, C spans of length Z, etc. The data stream could omit span lengths entirely since the decoder would already have the distribution. The encoder would have to be written very differently: given A spans of length X, B spans of length Y, and C spans of length Z, how can I best arrange them to maximize the quality of the decoded image? This could save lots of DISK bytes while adding almost no decoder CPU overhead!

You could take this to another level and make the distributions themselves dynamic as different parts of the video might render better with different distributions...?
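A sketch of how a decoder could exploit such a precomputed length distribution (hypothetical layout, again in C for readability): the class table ships once per video, or per scene if the distributions are made dynamic, and the per-span length field disappears from the stream entirely.

#include <stdint.h>
#include <string.h>

/* Hypothetical header, built by the encoder: "expect A spans of length X,
   then B spans of length Y, ..." so individual spans need no length field. */
typedef struct {
    uint16_t count;   /* how many spans of this size follow    */
    uint16_t words;   /* size of each of those spans, in words */
} length_class_t;

/* Each span in the stream is then just a destination offset followed by the
   pixel words; its length is implied by the class currently being decoded. */
const uint16_t *decode_frame(const length_class_t *classes, int n_classes,
                             const uint16_t *stream, uint16_t *video)
{
    for (int c = 0; c < n_classes; c++) {
        for (uint16_t i = 0; i < classes[c].count; i++) {
            uint16_t dest = *stream++;                      /* word offset  */
            memcpy(&video[dest], stream,
                   classes[c].words * sizeof(uint16_t));    /* pixel words  */
            stream += classes[c].words;
        }
    }
    return stream;   /* start of the next frame's data */
}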

I have no idea which is the bigger bottleneck on the 8088, though: disk or CPU? Back when Stacker was around, it was said to make I/O faster because disk throughput was a bigger bottleneck than the CPU; however, that was many CPU generations after the 8088.

Anyways, it's a neat mental challenge and it's cool that you followed through.

Edited 2014-06-23 05:08 UTC

Reply Parent Score: 3