Linked by Thom Holwerda on Fri 30th Nov 2018 23:47 UTC
Internet & Networking

On Halloween this year I learned two scary things. The first is that a young toddler can go trick-or-treating in your apartment building and acquire a huge amount of candy. When they are this young they have no interest in the candy itself, so you are left having to eat it all yourself.

The second scary thing is that in the heart of the ubiquitous IMAP protocol lingers a ghost of the time before UTF-8. Its name is Modified UTF-7.
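To give a flavour of what Modified UTF-7 looks like in practice, here is a minimal encoding sketch. The authoritative rules are in RFC 3501, section 5.1.3; real code should use a tested library such as imapclient rather than this illustration.

    import base64

    def encode_modified_utf7(name):
        # Minimal sketch of IMAP Modified UTF-7 mailbox-name encoding (RFC 3501,
        # section 5.1.3): printable ASCII passes through, "&" becomes "&-", and
        # everything else is UTF-16BE, base64 with "," instead of "/", no padding.
        out, run = [], []

        def flush():
            if run:
                b64 = base64.b64encode("".join(run).encode("utf-16-be")).decode("ascii")
                out.append("&" + b64.rstrip("=").replace("/", ",") + "-")
                run.clear()

        for ch in name:
            if 0x20 <= ord(ch) <= 0x7E:
                flush()
                out.append("&-" if ch == "&" else ch)
            else:
                run.append(ch)
        flush()
        return "".join(out)

    print(encode_modified_utf7("Entwürfe"))  # -> Entw&APw-rfe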

Even scarrier
by Carewolf on Fri 30th Nov 2018 23:58 UTC
Carewolf
Member since:
2005-09-08

Inside UTF-7 is an even older, more powerful ghost that is still lurking EVERYWHERE...

Base64. Needed to make binaries survive buggy conversions between ASCII and EBCDIC.

And now, what is EBCDIC? No one knows for sure, except that it is an Ancient One from the time of the mainframes. But the mainframes cannot be killed; they only slumber, to return when the stars are right.

Reply Score: 8

RE: Even scarrier
by kuiash on Sat 1st Dec 2018 11:22 UTC in reply to "Even scarrier"
kuiash Member since:
2018-05-21

Even BASE64 is riddled with incompatibilities. Been bitten by that before!
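One example of those incompatibilities: the standard and URL-safe alphabets disagree on two characters (and MIME line wrapping and padding rules are further sources of grief).

    import base64

    data = bytes([0xFB, 0xFF, 0xFE])
    # The two common alphabets disagree on the last two symbols ('+'/'/' vs.
    # '-'/'_'), so a decoder expecting one can choke on the other.
    print(base64.standard_b64encode(data))  # b'+//+'
    print(base64.urlsafe_b64encode(data))   # b'-__-'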

I'd love to understand the rationale behind EBCDIC.

My first computer (actually /mine/, not borrowed) didn't support ASCII either. The ZX81 (and its predecessor) had a character set all of their own invention, with only 64 characters and nothing in any "standard" place.

The very first computer I programmed didn't have a character set at all in any meaningful sense.

The KIM-1 had 7-segment (plus a dot) LED readouts, so the "character codes" were the patterns of on/off segments you needed to represent each character. Not all characters could be displayed; "M" and "W" were just not possible.
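Something along these lines, as a rough sketch (the segment-to-bit assignment below is just a common convention, not necessarily the KIM-1's actual lookup table):

    # Segments a..g mapped to bits 0..6; a "character code" is just the bitmap.
    SEGMENTS = {
        "0": 0b0111111,  # a b c d e f
        "1": 0b0000110,  # b c
        "2": 0b1011011,  # a b d e g
        "E": 0b1111001,  # a d e f g
    }
    # "M" and "W" have no entry: they simply cannot be drawn with seven segments.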

Reply Score: 3

RE[2]: Even scarrier
by kurkosdr on Sat 1st Dec 2018 23:09 UTC in reply to "RE: Even scarrier"
kurkosdr Member since:
2011-04-11

I'd love to understand the rationale behind EBCDIC.

So... there is this thing called BCD (binary-coded decimal), which is decimal numbers written as a series of zeros and ones. Think of it like an ASCII code, only for numbers, which only needs to be 4 bits wide. The advantage of BCD is that it allows you to put decimal numbers into a binary computer without any rounding inaccuracies due to conversion to binary, which is important in the field of financial transactions.

Computer manufacturers immediately realised that 4 bits gives 16 possible combinations and decimal digits use only 10 of them, which left 6 combinations to use as they pleased. That was Extended BCD, aka EBCD. Soon, BCD got extended even further, with 6-bit and 8-bit codes on top of the 4-bit code. Since the damn thing "grew" instead of being standardised, it came with some major WTFs, such as several incompatible variants and gaps of undefined segments in the encoding (head over to Wikipedia for the full list of atrocities against information technology and common sense).

So the rationale is in the name: an Interchange Code of Extended Binary Coded Decimal bits, for computers belonging to a certain family.
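To make the 4-bits-per-digit idea concrete, here is a minimal packed-BCD sketch (illustrative only):

    def to_packed_bcd(n):
        # Each decimal digit gets its own 4-bit nibble, two digits per byte.
        digits = str(n)
        if len(digits) % 2:
            digits = "0" + digits
        return bytes((int(digits[i]) << 4) | int(digits[i + 1])
                     for i in range(0, len(digits), 2))

    # The decimal digits stay readable in a hex dump -- no base conversion at all.
    print(to_packed_bcd(1095).hex())  # 1095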

The rationale for not replacing EBCDIC with ASCII is that software using EBCDIC makes certain assumptions about what a series of bits means, and the code implementing those assumptions is spread inline throughout the codebase. Think of how many ASCII text tools assume that a textual byte with all bits zero is a string-termination byte (which obviously fails in 16-bit Unicode) and you get the idea of why you can't easily convert EBCDIC software to ASCII or UTF-8.
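As a tiny illustration of that kind of baked-in assumption (using UTF-16 here instead of EBCDIC, purely because it is easy to demonstrate):

    text = "Hi"
    print(text.encode("ascii"))      # b'Hi'         -- no zero bytes anywhere
    print(text.encode("utf-16-be"))  # b'\x00H\x00i' -- zero bytes mid-string, so any
                                     # code treating 0x00 as "end of string" truncates it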

Edited 2018-12-01 23:28 UTC

Reply Score: 2

RE[3]: Even scarrier
by kurkosdr on Sat 1st Dec 2018 23:33 UTC in reply to "RE[2]: Even scarrier"
kurkosdr Member since:
2011-04-11

BTW, when I said replacing EBCDIC with ASCII, I meant it for the textual part of the software.

Edited 2018-12-01 23:33 UTC

Reply Score: 2

RE[3]: Even scarrier
by kuiash on Sun 2nd Dec 2018 05:09 UTC in reply to "RE[2]: Even scarrier"
kuiash Member since:
2018-05-21

OK. I guess the connection to BCD makes some sense. The thing that REALLY irks me is the non-contiguous nature of letters.

And I think I know why that's the case.

It was only ever a system for storing "BCD" and that only needs 10 rows on a punch card (yup, punch cards).

There's a picture on Wiki here https://en.wikipedia.org/wiki/EBCDIC#/media/File:Blue-punch-card-fro...

The three top rows (unlabelled) act as a kind of "space/register select", and the rest is just the absence or presence of a single hole in each column (at least for the registers and numbers).

OK, so I reckon the engineers put the blocks of characters on mod-16 boundaries because it's easier to calculate in binary (possibly), but there actually are no rows for values 10 through 15. Therefore there are gaps in the letter parts of the character sets: the last 6 of each block of 16 are empty (later they went back and backfilled them).
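Those gaps are easy to see with Python's cp037 (US EBCDIC) codec, for example:

    # Letters land in three non-contiguous blocks: A-I, J-R, S-Z.
    for ch in "AIJRSZ":
        print(ch, hex(ch.encode("cp037")[0]))
    # A 0xc1, I 0xc9, J 0xd1, R 0xd9, S 0xe2, Z 0xe9 -- note the gaps after I and R.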

In my "ideal" universe '0' is at 0x00 and the digits are immediately followed by letters. *sigh*

Reply Score: 2

RE[4]: Even scarrier
by kurkosdr on Sun 2nd Dec 2018 21:44 UTC in reply to "RE[3]: Even scarrier"
kurkosdr Member since:
2011-04-11

Yes, that's the main problem with EBCDIC. Since it "grew" instead of being standardised, not only are there multiple incompatible encodings (which came into existence as the thing was extended on an as-needed basis), but the mapping of the encoding was made to serve the implementation details of the era (mainly punch cards, but who knows what else) as the first and foremost priority.

BTW, there is one punch-era implementation detail that has crept into ASCII: the DEL character is encoded with all 7 bits set to one, which places it at the opposite end of the table from the other control characters. You cannot un-punch a hole, so it was agreed that punching all the holes would be the way to delete (i.e. ignore) a character, and having DEL map to all ones matched that pattern (the parity bit would make sure the eighth bit got punched too).
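A quick sanity check of where DEL ended up:

    DEL = 0x7F
    print(f"{DEL:07b}")  # 1111111 -- the "punch every hole" pattern
    # Every other non-printable character sits at the bottom of the table (0x00-0x1F):
    print([c for c in range(128) if not chr(c).isprintable()] ==
          list(range(0x20)) + [DEL])  # True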

Implementation details do creep into all standards. It's the reason the distinction between RGB 16-235 ("limited range") and RGB 0-255 ("full range") exists in digital video (it's in the Nvidia control panel and the Intel Graphics control panel, in case you want to find it). When analog video was digitized by studios back in the early days of digital video, it wasn't usually sent to consumers but kept archived or converted back to analog, so it made sense to encode the signal as-is, including the level difference between v-sync (RGB 0) and the 0% black level (which is always higher than the v-sync so the analog TV can differentiate, and was mapped to RGB 16). Unfortunately PCs used the full RGB range for things like png and gif files and even computer-generated video and mapped 0% black to RGB 0, so the difference exists and is handled by the GPU. And HDMI can handle both RGB 16-235 and RGB 0-255 (in most TV sets, anyway).

Another implementation detail in the HDMI standard is the fact that 1080p 3D is capped at 24Hz. Since 1080p 3D has slightly more than double the bitrate of 1080p 2D, allowing 1080p 3D at 30Hz (rather than 24Hz) would push it past the bandwidth of 1080p at 60Hz and would require TV manufacturers to change the pixel clocks in their HDMI inputs to accommodate the higher bandwidth. Instead it was decided to let TV manufacturers keep their pixel clocks, which makes Nvidia 3D Vision not work well over HDMI 3D, because most games won't allow a refresh rate setting of 24Hz.
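A minimal sketch of the kind of remapping the GPU does between the two ranges (illustrative only; real implementations also handle YCbCr, rounding and dithering):

    def full_to_limited(v):
        # Map full-range RGB (0-255) onto limited/video range (16-235).
        return round(16 + v * (235 - 16) / 255)

    def limited_to_full(v):
        # Inverse mapping; "blacker than black" / "whiter than white" values
        # outside 16-235 simply get clipped.
        return min(255, max(0, round((v - 16) * 255 / (235 - 16))))

    print(full_to_limited(0), full_to_limited(255))   # 16 235
    print(limited_to_full(16), limited_to_full(235))  # 0 255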

Implementation details are inside all standards. EBCDIC was just worse than average because there was no real standardization.

Edited 2018-12-02 21:51 UTC

Reply Score: 1

RE[5]: Even scarrier
by kurkosdr on Sun 2nd Dec 2018 21:55 UTC in reply to "RE[4]: Even scarrier"
kurkosdr Member since:
2011-04-11

BTW, the reason 1080p 3D requires more than double the bandwidth of 1080p 2D (instead of exactly double) is that the left and right images are placed on top of each other, separated by 45 horizontal lines of black (i.e. 1920*45 pixels), in order to give 3D TVs time to swap from left to right without any buffering. Which makes the 24Hz restriction for 1080p 3D in HDMI an implementation detail meant to serve another implementation detail.
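The back-of-the-envelope arithmetic, counting active lines only (real HDMI timings add blanking on top, so this is just an illustration):

    lines_2d = 1080
    lines_3d = 1080 + 45 + 1080               # left + black gap + right = 2205 lines
    print(lines_3d * 24, "<", lines_2d * 60)  # 52920 < 64800: 3D at 24Hz fits under 1080p60
    print(lines_3d * 30, ">", lines_2d * 60)  # 66150 > 64800: 3D at 30Hz would not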

Edited 2018-12-02 21:58 UTC

Reply Score: 2

RE[5]: Even scarrier
by Alfman on Mon 3rd Dec 2018 07:18 UTC in reply to "RE[4]: Even scarrier"
Alfman Member since:
2011-01-28

kurkosdr,

Implementation details do creep into all standards. It's the reason the distinction between RGB 16-235 ("limited range") and RGB 0-255 ("full range") exists in digital video (it's in the Nvidia control panel and the Intel Graphics control panel, in case you want to find it). When analog video was digitized by studios back in the early days of digital video, it wasn't usually sent to consumers but kept archived or converted back to analog, so it made sense to encode the signal as-is, including the level difference between v-sync (RGB 0) and the 0% black level (which is always higher than the v-sync so the analog TV can differentiate, and was mapped to RGB 16).


Interesting, I had never heard of this problem with studio equipment. The fact that studio analog levels need to differentiate "black" from "0V sync" makes sense.

I know it's decades after the fact, but the question that arises for me is: why did they feel the need to represent an analog gap in the digital domain at all? Was it just because remapping the ranges was technologically impractical at the time*?


Unfortunately PCs used the full RGB range for things like png and gif files and even computer-generated video and mapped 0% black to RGB 0, so the difference exists and is handled by the GPU.


I'm coming from a different angle, but I actually think it's very fortunate that computers use the full 0-255 RGB range for things like PNG files and even HTML, where #000000 is supposed to be black. It's better for electrical engineers to remap the ranges on their end than for file formats to sacrifice a subset of the 8-bit color range to accommodate studio voltage levels. Voltage requirements can change; just think of how confusing it would be for digital formats to accommodate electrical requirements: your PNG file is incompatible with your TV, try a different TV or use an app to re-encode your PNG.


IMHO it makes more sense for hardware to remap 0-255 RGB to the 0.39% - 100% voltage levels or whatever is needed by studio standards. This isn't a problem anywhere today is it? If it is, do you have links so that I could read more about it?


Edit: * I have a limited electronics background, but I would think an op-amp with a simple resistor network could do the trick of remapping one range to another. Do you have any insight into why they wouldn't have remapped the full range of values from 0-255, with 0 for black?

Edited 2018-12-03 07:32 UTC

Reply Score: 3

RE[6]: Even scarrier
by kurkosdr on Mon 3rd Dec 2018 14:14 UTC in reply to "RE[5]: Even scarrier"
kurkosdr Member since:
2011-04-11

I agree that RGB 16-235 is the most idiotic thing ever, or at least the most idiotic thing I have ever seen. Much like the 24Hz cap on 1080p 3D, some part (that ceased to be relevant a couple of years ago) in some input (some DAC or ADC component, in the case of RGB 16-235, I guess) had to be preserved, and we are stuck with it. It's in your GPU's settings and it's in DVDs, Blu-rays and MP4s. I don't know about WebM, but I wouldn't be surprised if it has it too. And "limited-range" RGB creates problems, such as compression noise that is "blacker than black" or "whiter than white" (which shows up on poorly calibrated TVs) and quality loss from converting between limited range and full range and vice versa (sometimes both, on some poorly configured equipment).

Edited 2018-12-03 14:19 UTC

Reply Score: 1

RE[7]: Even scarrier
by Alfman on Mon 3rd Dec 2018 14:57 UTC in reply to "RE[6]: Even scarrier"
Alfman Member since:
2011-01-28

kurkosdr,

I agree that RGB 16-235 is the most idiotic thing ever, or at least the most idiotic thing I have ever seen. Much like the 24Hz cap on 1080p 3D, some part (that ceased to be relevant a couple of years ago) in some input (some DAC or ADC component, in the case of RGB 16-235, I guess) had to be preserved, and we are stuck with it. It's in your GPU's settings and it's in DVDs, Blu-rays and MP4s. I don't know about WebM, but I wouldn't be surprised if it has it too. And "limited-range" RGB creates problems, such as compression noise that is "blacker than black" or "whiter than white" (which shows up on poorly calibrated TVs) and quality loss from converting between limited range and full range and vice versa (sometimes both, on some poorly configured equipment).


I didn't realize at first what you were referring to, but now I understand you are talking about this:
https://referencehometheater.com/2014/commentary/rgb-full-vs-limited...

The levels are a byproduct of the different slices of the color spectrum represented by the RGB and YCbCr color models.

https://www.faceofit.com/ntsc-vs-srgb/

Yeah some formats represent color in YCbCr, others in RGB, and some can do either. I'm most accustomed to PC RGB colors and like the simplicity of RGB representing the intensity of physical pixel elements in the screen. There's a solid logic to using that, but I guess it's a real tossup for color scientists because YCbCr is supposed to more accurately represent colors the way we perceive them.
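For anyone curious, the relationship between the two models is just a linear transform; here is a rough sketch using the BT.601 full-range coefficients (the ones JPEG uses; HD video uses BT.709):

    def rgb_to_ycbcr(r, g, b):
        # BT.601 full-range coefficients; luma plus two chroma-difference channels.
        y  =  0.299 * r + 0.587 * g + 0.114 * b
        cb = -0.168736 * r - 0.331264 * g + 0.5 * b + 128
        cr =  0.5 * r - 0.418688 * g - 0.081312 * b + 128
        return round(y), round(cb), round(cr)

    print(rgb_to_ycbcr(255, 255, 255))  # (255, 128, 128) -- white
    print(rgb_to_ycbcr(0, 0, 0))        # (0, 128, 128)   -- black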

This is an interesting topic to have in the context of UTF-7, haha!

Reply Score: 3

RE[7]: Even scarrier
by kuiash on Mon 3rd Dec 2018 20:14 UTC in reply to "RE[6]: Even scarrier"
kuiash Member since:
2018-05-21

Ha! Yeah. I've worked on video cards, GPUs, video drivers, video encoders/decoders and god only knows what in the last (nearly) 30 years.

And that "limited" range is a real pain. Its even sillier now as little of our content requires those "sync" levels.

Of course the original video recordings were literally (literally, literal) the output of a camera. Sync pulses and all.

Back to the ZX81 - the original ULA didn't provide proper 0V level HSYNC pulses OR colour burst. Consequently an old ZX81 doesn't play well with most modern TVs that need those levels for calibration. So, the older the TV the better.

Reply Score: 1

RE[3]: Even scarrier
by Alfman on Mon 3rd Dec 2018 06:24 UTC in reply to "RE[2]: Even scarrier"
Alfman Member since:
2011-01-28

kurkosdr,

So... there is this thing called BCD (binary-coded decimal), which is decimal numbers written as a series of zeros and ones. Think of it like an ASCII code, only for numbers, which only needs to be 4 bits wide. The advantage of BCD is that it allows you to put decimal numbers into a binary computer without any rounding inaccuracies due to conversion to binary, which is important in the field of financial transactions.


Your post is informative; I didn't realize this was the way EBCDIC evolved. I always found it strange, when dumping VSAM files on the mainframe, that numbers were encoded to be human-readable.

However, I do want to make one correction: BCD is only important for humans. As far as the math goes, binary numbers work fine for financial transactions. Floating-point numbers exhibit rounding error when converting between number bases and should be avoided, but integers are exact regardless of number base. You can use what's called fixed-point arithmetic to avoid floating-point error: so long as you choose the ones unit to represent the smallest unit of currency you need, there will be no rounding inaccuracies, since every value represented in binary (or any other number base, including 10 for BCD) will EXACTLY equal a whole number of those smallest currency units.

So for example, if the smallest currency unit you want to be able to represent is a penny, then you make 0b0001=$0.01, 0b0010=$0.02, 0b0011=$0.03 and so on.
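A minimal sketch of that fixed-point approach, with the amounts stored as plain binary integers counting cents (the 8.25% tax rate is made up purely for the arithmetic):

    price_cents = 1999                        # $19.99
    tax_cents = price_cents * 825 // 10000    # 8.25% tax, truncated to whole cents
    total_cents = price_cents + tax_cents
    print(f"${total_cents // 100}.{total_cents % 100:02d}")  # $21.63 -- exact, no drift

    # Contrast with binary floating point, which cannot represent 0.1 exactly:
    print(0.1 + 0.2 == 0.3)  # False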

This came up once before:
http://www.osnews.com/thread?655616


I think BCD (and hence EBCDIC) makes lots of sense given the historical context in which humans were directly programming mainframes by hand, but it's less important today.

Edited 2018-12-03 06:33 UTC

Reply Score: 3

RE[4]: Even scarrier
by kurkosdr on Mon 3rd Dec 2018 14:08 UTC in reply to "RE[3]: Even scarrier"
kurkosdr Member since:
2011-04-11

Basically, financial institutions wanted fixed-point decimal math. I don't know the exact details, but my guess is there are all kinds of complex accumulated-interest calculations for which it makes sense, or it was just for use by humans. Not much of a clue, really. Also, upon further reading, it turns out BCD was first extended to BCDIC (multiple incompatible encodings) and then to EBCDIC. Go figure.

Reply Score: 1

RE[5]: Even scarrier
by Alfman on Mon 3rd Dec 2018 14:24 UTC in reply to "RE[4]: Even scarrier"
Alfman Member since:
2011-01-28

kurkosdr,

Basically, financial institutions wanted fixed-point decimal math. I don't know the exact details, but my guess is there are all kinds of complex accumulated-interest calculations for which it makes sense, or it was just for use by humans.


When mainframes used punch cards, BCD enabled programmers to literally punch every decimal digit directly into each byte/nibble as input. If mainframe computers were going to interact in binary, then the programmers would have had to convert every number into binary for input and convert every number from binary for output, which would have been even more unbearably inefficient and error-prone than it already was.

I wasn't around back then, but logically, considering that BCD numbers map one-to-one onto binary numbers, there's no mathematical need for BCD, and I'm pretty certain that human data-entry requirements were the driving factor for its adoption in early systems. In any case, it's extremely rare to see BCD today except in legacy systems that continue the mainframe tradition.

Reply Score: 2

RE[6]: Even scarrier
by Vanders on Mon 3rd Dec 2018 17:22 UTC in reply to "RE[5]: Even scarrier"
Vanders Member since:
2005-07-06

In any case, it's extremely rare to see BCD today except in legacy systems that continue the mainframe tradition.

There's a surprising number of sensors that provide data in BCD.
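For example, real-time-clock chips commonly report their time registers in packed BCD; decoding is a one-liner:

    def bcd_to_int(b):
        # High nibble is the tens digit, low nibble is the ones digit.
        return (b >> 4) * 10 + (b & 0x0F)

    print(bcd_to_int(0x59))  # 59 -- e.g. a "seconds" register reading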

Reply Score: 3

RE: Even scarrier
by subsider34 on Mon 3rd Dec 2018 20:40 UTC in reply to "Even scarrier"
subsider34 Member since:
2010-11-08

And now, what is EBCDIC? No one knows for sure, except that it is an Ancient One from the time of the mainframes. But the mainframes cannot be killed; they only slumber, to return when the stars are right.

The stars have aligned. All hail the Ancient Ones reborn!

https://www.serverwatch.com/server-news/ibm-z14-mainframe-and-power9...
https://www.forbes.com/sites/forbestechcouncil/2018/07/06/guess-what...

Reply Score: 2

Speaking of email headers
by kriston on Mon 3rd Dec 2018 16:42 UTC
kriston
Member since:
2007-04-11

Who else remembers ISO-2022-JP and Shift-JIS?
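Python still ships codecs for both, which makes the difference easy to see (ISO-2022-JP switches character sets with escape sequences; Shift-JIS just uses high-bit bytes):

    text = "日本語"
    print(text.encode("iso2022_jp"))  # starts with b'\x1b$B': escape sequences
                                      # switch character sets mid-stream
    print(text.encode("shift_jis"))   # b'\x93\xfa\x96{\x8c\xea': high-bit bytes, no escapes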

Edited 2018-12-03 16:50 UTC

Reply Score: 2