Choosing human-readable file formats is an act of technological sovereignty. It’s about maintaining control over your data, ensuring long-term accessibility, and building systems that remain comprehensible and maintainable over time. The slight overhead of human readability pays dividends in flexibility, durability, and peace of mind.
These formats also represent a philosophy: that technology should serve human understanding rather than obscure it. In choosing transparency over convenience, we build more resilient, more maintainable, and ultimately more trustworthy systems.
↫ Adële
It’s hard not to agree with this sentiment. I definitely prefer being able to just open and read things like configuration files as if they’re text files, for all the same reasons Adële lists in their article. It just makes managing your system a lot easier, since it means you won’t have to rely on the applications the files belong to in order to make any changes.
I think this also extends to other areas. When I’m dealing with photo or music library tools, I want them to use the file system and directories in a human-readable way. Having to load up an entire photo management application just to sort some photos seems backwards to me; why can’t I use my much leaner file manager to do this instead? I also want emails to be stored as individual files in directories matching mailboxes inside my email client, just like BeOS used to do back in the day (note that this is far from exclusive to BeOS). If I load up my file manager, and create a new directory inside the root mail directory I designated and copy a few email files into it, my email client should reflect that.
As operating systems get ever more locked down, we’re losing the human-readability of our systems, and that’s not a good development.
Thom Holwerda,
Your view makes sense to me here, Thom, but it makes me curious whether you’re willing to uphold it as criticism against systemd. Many have griped about systemd replacing text logs that could be read/scanned with any text software with binary data that can’t be directly read without conversion.
Thom is a translator who makes a living converting text into a different language because the original “human readable” text was not human readable. To claim that any text is “human readable” you have to be a bigoted racist and accept something like “all the people who can’t read English are not human”.
Worse; even if you’re happy to be a bigoted racist, “human readable” still does not work. Something like “123” leaves the reader clueless about whether it’s grams or pounds or any other unit, with no idea if “unknown” is a valid possibility; and something like “one hundred and twenty three” is likely to fail because the parser you need for “human readable” isn’t able to read “human readable” either. To solve these failures, “human readable” still requires a specification that describes the language for the data – e.g. something that clearly says things like “The tag is used to provide a weighting between 0% and 100% for the algorithm, expressed as an integer from 0 to 100. The only other valid option is “auto”, where a default 50% weight was implied for earlier versions of this spec and any value could be auto-determined in future versions of this spec”. In other words, you have to be stupid to think that “human readable” works (in addition to being a bigoted racist).
The only alternative to being a stupid bigoted racist is to have some kind of system to convert binary data (including all of the different incompatible encodings and languages of plain text) into something that is actually human readable; and in that case pure binary data is significantly more efficient, with file sizes an order of magnitude smaller and the possibility of memory mapping files “as is” onto structures with no need to build parsers or waste hundreds of cycles every single time you’re forced to convert “123456” into 123456 (and back again when saving, all for literally no valid reason whatsoever).
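To make that concrete, here’s a rough sketch (Python, with a made-up three-field record layout, not any particular real format): the binary version is read straight out of a memory-mapped file at fixed offsets, while the text version has to be split and converted on every load.

    import mmap
    import struct
    import tempfile

    # Hypothetical record: weight, age, flags, each a little-endian uint32.
    RECORD = struct.Struct("<III")

    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(RECORD.pack(123456, 10, 1))   # 12 bytes, fixed layout
        path = f.name

    # Binary: map the file and read the fields at fixed offsets, with no parsing step at all.
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
        weight, age, flags = RECORD.unpack_from(m, 0)

    # Text equivalent: every load means splitting strings and converting "123456" into 123456.
    text = "weight=123456\nage=10\nflags=1\n"
    fields = dict(line.split("=", 1) for line in text.splitlines())
    weight_txt = int(fields["weight"])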
The bigger problem is that, regardless of which file format it is (and regardless of whether it’s “human readable” or not), programmers are too lazy to create the tools needed to convert files into something that’s actually human readable. Part of this problem is that most of the time it’s an incredibly idiotic goal – nobody wants to (e.g.) use their text editor to edit a movie in the first place; so nobody wants to create tools to convert between the mp4 file format and representations that are actually human readable.
In fact, most of the time the best representation (and the best viewer and best editor) depends on the nature of the data; so you get special apps for editing movies, and special apps for doing CAD, and special apps for composing music, and… ; and all of these apps convert data to/from whichever representation is the most human understandable (including representations that are human readable text if/where appropriate). Even if you’re editing actual text (e.g. writing a book) you’ll have a WYSIWYG word processor with full internationalization (and/or icons) for its menus to make sure that the controls are human readable and not “human readable”; and the file format will be a binary file format, because you have to have a sadomasochism fetish to tolerate LaTeX.
And that’s really what we’re talking about here: human understandable apps that are superior in every way that matters versus “human readable” stupid bigoted racist inferior shit.
Brendan,
Why are you responding to me like this? We’ve been respectful with each other in the past and I’m not really sure what’s going on with this post. Text vs binary has pros and cons, and debating it is well and good, but this is no place for calling people bigots and racists. I request you dial down the level of aggression, ok?
I said “to make the claim that text is ‘human readable’ you’d have to be a bigoted racist”, but nobody had made the claim that text is “human readable”, so…
If you feel included by the words “bigoted racist” then that’s a choice you made for yourself. It would’ve been more natural for you to assume that the words “bigoted racist” don’t apply to you and to respond with something like “Oh, yeah, I didn’t think of it that way”.
Text vs binary has pros and cons, debating it is well and good, and a necessary part of that debate is reminding people that “I read English and I’m privileged and I don’t care about anyone else in the world” is a con.
You can’t have a debate if you’re afraid to have a debate.
Brendan,
I’ll say it then: text formats don’t exist for the computer’s sake, they exist to be human readable. So now that your words explicitly apply to me, how do you justify calling me a bigoted racist? Maybe in your head it felt like you’d earn creativity points, but I’m telling you it did not land well.
With UTF-8 being widely supported, text is not limited to Latin-script languages anymore. I guess you were trying to make a broader point about things like field/property names being in English. The argument still misses the mark though, because English naming bias is just as present in binary SQL databases, binary library files, file/directory names, scripting languages, command names, command arguments, etc. So blaming “text” doesn’t really work. If you’d like to make the argument about English bias in computer usage in general, then maybe a good discussion can still come from this.
Agree to put the insults behind us?
See my comment below; those pieces of software do exist in several forms (binary and “human readable”, aka source code).
@Brendan
I am assuming you set out here to purposely make some deeper point by illustrating a form of “human unreadable” text. If not, I am with @Alfman.
> To claim that any text is “human readable” you have to be a…
First off, you seem to be talking mostly about whether something is “intelligible”, which is nice but not necessary for something to be “human readable”. For one, a lot of the benefit comes from it simply being text-encoded. As mentioned, that opens up a host of tools that can now be used. And if I have read a document that tells me I need to change some parameter, I can search for it, find it, and modify it even if I cannot actually understand it (perhaps it is written in Portuguese).
Another point was longevity. Text formats are likely to be readable long into the future. And, while I may not know what things mean, at least I am not staring at ones and zeros. With effort, I will be able to figure it out.
> programmers are too lazy to create the tools needed to convert files into something that’s actually human readable.
This is going to require serialization to and from some opaque format. These tools will only work on versions of the underlying format they understand. So, they are fragile. And they do not ship with the files themselves. So, over time, they will be lost. See my last point before this one.
> Something like “123” leaves the reader clueless
Not really. I can see that it is a number. I can guess that 123 is a valid value. I would probably start exploring by making it 50% larger or smaller and seeing what happens. Or, if I have a value that I have gleaned from some dark archive, I have a place to put it. And whatever appears before 123 gives me a pretty good idea what the value is for. That all seems like a significant gain over just seeing 7B wedged in between a stream of bytes in my hex editor as I try to figure out what I am looking at. Even if it is a foreign language, I might recognize Korean and be able to translate it.
And after we have all forgotten what JSON or TOML are, the LLMs of the day can still tell me what format it is.
That opens up a host of tools that are all worse, and prevents you from having better tools (with internationalization, context specific hints, built-in error checking, ….). You hover your mouse over the text “123” and a hint pops up saying “must be a value from 50 to 250” so you type in the word “chicken” and the app says “Nah, that’s not valid”, so you fix the error while you’re there? No. It’s plain text and your text editor does none of that. Half an hour after you made the error the file ends up at a half-baked parser and nobody can guess what will happen (will the parser send an error message that stops production, or assume the value was 0 and keep running with the wrong value doing who-knows-what, or….?).
You have a text file containing details of all the cars used by your company; and you want to find red cars that are more than 10 years old, so you start typing your command line with “grep …” and then you realize that it’s all too hard and clunky and bad and should’ve been a database.
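For comparison, here’s roughly what the database version of that query looks like (a sketch using Python’s sqlite3 with a made-up cars table):

    import sqlite3

    # Hypothetical table of company cars.
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE cars (plate TEXT, colour TEXT, year INTEGER)")
    db.executemany("INSERT INTO cars VALUES (?, ?, ?)",
                   [("AB-123", "red", 2010), ("CD-456", "blue", 2021), ("EF-789", "red", 2012)])

    # "Red cars more than 10 years old": clunky with grep over a text file, one query here.
    rows = db.execute(
        "SELECT plate, year FROM cars "
        "WHERE colour = 'red' AND year <= strftime('%Y', 'now') - 10").fetchall()
    print(rows)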
Text formats that are not readable now will suddenly become readable at some point in the future?? You must have a specification describing the language, structure, options, …. regardless. That’s why (e.g.) the HTML file format (a supposedly “human readable” file format) needs 1500+ pages of specifications ( https://html.spec.whatwg.org/ ) to describe it – so that people reading it know how to interpret the data (because merely seeing text that you can’t interpret is worthless). If HTML was a binary file format you’d still need the same 1500+ pages of specifications to understand it.
Text formats make literally no difference to longevity at all. Either the specification/standard that describes the file format exists or it doesn’t, regardless of whether the file format began as text or not.
No, you’re just obsessed with making everything worse. If I converted a photo of a cat from JPG to “plain text” comma separated values (like “12, 56, 88, 17, 57, 85, 19, 57, 83, …” of RGB for each pixel), would it make it easier for you to see the cat? Would it help if I converted the sound of a wind chime into JSON? How about stock market values – would you like a chart or XML?
You see “123” and you don’t know if it’s grams or tons or ounces or pounds or which pounds, and you don’t know if “123.456” will be accepted and/or rounded, and you don’t know if you can add human readable commas to larger numbers (like “1,234” instead of “1234”) or if that’ll confuse Europeans and/or be treated as a decimal point. You don’t know the range (the minimum and maximum values) and maybe “122” is lower than the minimum. You don’t know if the value is required or can be omitted. You don’t know if there’s any alternative keywords, and maybe “unchanged” or “default” or “auto” or “nil” are valid and more suitable.
There’s a huge amount of “everything” that you can’t deduce from seeing “123” and that huge amount of everything is massive compared to the pathetic scraps of almost nothing that you do know from seeing “123”. In the same way, if you saw that the 12th byte of a binary file contained the value 0x12 you’d be able to deduce that the 12th byte of a file contained the value 0x12; which is almost the exact same “opposite of knowledge” that you got from plain text.
No. Think of it as a hierarchy with “binary” at the top, and “UTF-8 character encoding” under that (as a sub-category of binary), with “JSON” under that (as a sub-category of UTF-8); and then 1500+ pages describing which tags are valid and what the contents mean because JSON failed to do anything that really matters.
Brendan,
Nothing prevents you from enhancing the text editor to improve the editing experience, and in fact many text editors have context-aware parsers/error checking/folding/etc., even for some formats where you wouldn’t expect it. IntelliSense brings an even higher level of sophistication, although that’s usually reserved for programming languages. In principle, though, you can write the appropriate plugin for any format you want. Binary formats don’t automatically give us better tools, as you seem to be assuming. Look at regedit, a binary format that millions of people use, and it’s quite awful.
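To be concrete, the “must be a value from 50 to 250” hover hint you describe is exactly what a schema gives a text format, and editors that understand JSON Schema surface it as hints and error checking. Here’s a minimal sketch, assuming Python with the jsonschema package and a made-up “weight” field:

    from jsonschema import ValidationError, validate  # pip install jsonschema

    # Hypothetical constraint: "weight" is an integer from 50 to 250, or the keyword "auto".
    schema = {
        "type": "object",
        "properties": {
            "weight": {
                "anyOf": [
                    {"type": "integer", "minimum": 50, "maximum": 250},
                    {"const": "auto"},
                ]
            }
        },
        "required": ["weight"],
    }

    try:
        validate(instance={"weight": "chicken"}, schema=schema)
    except ValidationError as e:
        print("Nah, that's not valid:", e.message)  # caught in the editor or in CI, not in production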
Another point is that even if we assume the binary format has a nice GUI for it, it doesn’t necessarily mean everyone prefers to use it. Power users often prefer text interfaces and text formats because these are generally far easier to automate. When I was a Windows user the GUI world was a mixed blessing. User-friendly wizards can ease the learning curve and sometimes I was thankful to have them, but at other times they can also get in the way and become sources of frustration due to the limitations of the tools that read/write the binary formats. Text formats can also be an insurance policy if you ever need to take your data outside of proprietary software.
@Brendan
> should’ve been a database.
Agreed. Not sure what this has to do with anything. Certainly nothing that I was saying.
> Text formats make literally no difference to longevity at all.
On my Linux machine, I can read a text file made on day one of the first IBM PC ever powered on. If that file had instead been saved as a MultiMate document, it would be useless gibberish to me.
Not sure if trolling or just stupid, but if it’s the latter (just stupid), keep in mind that even people who don’t speak English can recognise English text, which means they can run it through Google Translate or similar.
One of my biggest gripes with systemd is that it is user-hostile in many ways, this being one of them. I’m perfectly okay with using a systemd-based Linux distro right up until it crashes on me and I can’t view system logs because the system is so broken that journalctl doesn’t work. A while back I had a Fedora laptop that started hanging during boot after an update. I went into single user boot and tried to read logs but got “journalctl: command not found”. And since the logs are binary of course I couldn’t cat them.
Call me old fashioned but I like my computer to serve me, not the other way around. I’ve said it many times: systemd is awesome in theory but fourteen years on it is still broken in implementation. Log files were not improved by making them human-unreadable, quite the opposite.
Morgan,
All of those problems could be solved if they had not put the cart before the horse.
Cannot boot? Journald broken?
The “recovery” memdisk should have a simple utility to print those messages, just like cat could do for syslog before.
But they needed to push this for enterprise customers. Almost all distros followed suit, since they wanted to run on AWS / GCP and in Docker containers.
Yes and no. I can understand the motivation to review files’ content, but not to read them. If it’s a text document, who would read a raw RTF or LaTeX file? Who would read a CSV file? An EML file? Computers process binary data, not text. Just imagine how much processing power is spent on converting between binary and text formats for HTML or JSON. And what about the schema to apply? Just stay on the binary side of the force, but with a well-documented open format, and use a “binary data browser” like https://kaitai.io/ or https://hachoir.readthedocs.io/en/latest/ or https://github.com/timrid/construct-editor to get a peek into it.
Kochise,
Agreed. That is one of the main reasons Google’s protobuf took hold everywhere, even among competitors like Microsoft.
On-disk formats are sometimes extensions of on-wire formats, and that is perfectly natural. We don’t transmit network data as text anymore (it used to be ASCII). We don’t store images as text either; they are bitmaps, and usually compressed ones.
These are natural states of those data, and converting them to text, and back from it would be extremely wasteful.
Nonsense. Microsoft Office XMLs are technically “human readable.” An SVG is too. In practice they certainly aren’t.
Binary formats are absolutely fine as long as they are standardized and adhering to public open specs.
On the other hand, I can only imagine how many gigawatt-hours of energy are wasted every year parsing JSON and recompiling freaking React a gazillion times. Speaking of which, minified JavaScript is technically human readable, too. In practice, it most certainly isn’t.
Images, audio, and video are even better examples: they are more human-accessible in binary form, where they can be played efficiently, and more or less useless as tables of numbers. Interoperability and composability are surely the key things here, not particular formats.
While I agree Microsoft Office XMLs are not really human readable, I disagree regarding SVG – I have, on multiple occasions, created an SVG file with Inkscape but couldn’t get something about it quite right, so I touched up the output with a text editor. It would have been harder with a binary format.
I have on occasion used tools written in a scripting language (i.e. a human readable application) _specifically because_ they were written in a scripting language, when a native rough equivalent was available and I wanted to be able to tweak the tool in some way, or just to be able to read it and see how something in it worked.
But I have also gone the other way and used a natively compiled program when a rough equivalent in a scripting language was available specifically because I was running it a lot and wanted it to be efficient, and did not care about debugging it.
PostScript and PDF are almost the same thing, except PostScript is human readable and PDF is not. They were created by the same company, both are widely implemented by other people, both are pretty well documented, but PDF is more compact than even compressed PostScript, and I do use PDF for that reason, but I also use PostScript in cases where I need to hand-tweak after generation (usually plots that I made that don’t look quite right because I’m not too expert in the plotting programs I use, or sometimes to add things).
Some programs generate PostScript that is so convoluted that I can’t read it, though. That just sucks altogether; it’s the worst of both worlds.
There are pros and cons to human readable and non-human readable, and which I value more depends on the particular circumstances, how I weight those different aspects depending on the current task. I think it’s more nuanced than just one is better than the other.
Once again, the “remember me” option is broken. It still logs me out after a few weeks.
Anyway, human readability is great, but on the flip side, binary formats both take less space (especially if compressed) and are faster to load than a text file that has to be parsed.
darkhog,
That’s an interesting hypothesis to test.
It’s not clear to me that binary formats are necessarily smaller when compressed. Formats like XML can be very verbose, but verbose formats tend to compress well. While it’s clear why text formats may not represent data efficiently, it doesn’t necessarily mean that a binary format will be efficient just because it’s binary. So I’d really like to put actual numbers to this. For now the safest answer could be “it depends”.
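A quick way to put numbers to it, as a sketch only (Python with zlib and made-up data; real results will depend heavily on the actual data and formats):

    import json
    import random
    import struct
    import zlib

    random.seed(0)
    records = [(random.randint(0, 10**6), random.random()) for _ in range(10_000)]

    # Text representation: a JSON array of [int, float] pairs.
    as_text = json.dumps(records).encode("utf-8")

    # Binary representation: fixed-size packed records (uint32 + float64).
    as_binary = b"".join(struct.pack("<Id", i, x) for i, x in records)

    for name, blob in (("text", as_text), ("binary", as_binary)):
        print(name, len(blob), "bytes raw,", len(zlib.compress(blob, 9)), "bytes compressed")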
In principle I wouldn’t mind binary formats if there were a standard way to get at the data without having to write software and/or convert the file just to access it. This tends to be the big shortcoming with binary formats, whereas text can often be handled with trivial tools. I love SQL for its power and expressiveness, and I frequently need to get data into and out of the database – virtually every time this means using a text format, because it’s just more straightforward to work with text.
Binary formats would be a lot easier to use if everyone agreed on the same standardized binary format and stuck to it everywhere, but this isn’t often the case.
Of course any kind of format can be made as bloated as you want. But a well-designed binary format can easily be made smaller than a compressed text format.
Of course there’s a trade-off: as you remove redundancy from the file format, it becomes ever harder to parse, even for specialized tools, so striving for the smallest possible compressed size is not always a worthwhile goal.
zde,
I would agree with you about uncompressed data. Using a byte to express one of ten digits and a delimiter is very inefficient. However after compression things become less clear and I think we’d need to look at actual examples to avoid assumptions. They might actually be pretty close.
Indeed, it also depends on whether the binary format is specially crafted, or if it’s a basic dump of structures. There are a lot of variables and I wouldn’t want to hand-wave them away.
I am reminded of a discussion about number storage that came up on osnews:
https://www.osnews.com/story/30272/a-constructive-look-at-the-atari-2600-basic-cartridge/
Apparently it’s a common myth, even among programmers, that BCD has to be used to avoid floating point errors. This myth was prevalent enough that even a mainstream application like MySQL implements decimal types using BCD for storage, ironically adding complexity to the binary format while making it much less efficient for no benefit whatsoever. I don’t know if the person responsible was a seasoned professional or just an intern, but a lesson on fixed point math would have been extremely helpful.
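For reference, the fixed-point alternative is just scaled integers. A rough sketch, assuming two decimal places (and not how MySQL actually stores DECIMAL):

    # Store a DECIMAL(10,2)-style value as an integer count of hundredths.
    SCALE = 100

    def to_fixed(s: str) -> int:
        units, _, frac = s.partition(".")
        frac = (frac + "00")[:2]                 # pad/truncate to two decimal places
        sign = -1 if units.startswith("-") else 1
        return int(units) * SCALE + sign * int(frac)

    def to_text(v: int) -> str:
        sign = "-" if v < 0 else ""
        v = abs(v)
        return f"{sign}{v // SCALE}.{v % SCALE:02d}"

    # Exact arithmetic on plain integers: no BCD, no binary floating-point rounding.
    assert to_text(to_fixed("19.99") + to_fixed("0.01")) == "20.00"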
Anyway, it’s stuff like this that makes “a well-designed binary format can easily be made smaller than a compressed text format” seem more questionable in practice, given that competence is in short supply.
This is a bit off topic, but it really highlights the issue of competence in the corporate setting. The whole exchange is so infuriating when even high level managers seem clueless. It’s hard not to laugh at how ridiculous the whole thing is.
“The Most Frustrating Customer Service Call of All Time”
https://www.youtube.com/watch?v=nUpZg-Ua5ao
I once had to explain to an accountant processing my bills how I was computing minutes on my bill when the rate was only provided in dollars per hour and there was no rate for dollars per minute. I’m not sure the person really understood, and the interaction prompted me to switch to hours with one decimal place and drop minutes from my bills altogether.
Totally agree with the article, and I came across this problem when looking into personal note management solutions. The vast majority of those that also have a mobile app use some awful database into which they stick your notes.
For unlucky souls who have used shit like Huawei’s native Notes, getting notes out of there when changing phones to something non-Huawei is a nightmare involving data takeout, GitHub code that a great person thankfully built for this, etc.
I am very happy with Obsidian for this purpose – all you have are folders, markdown files and pictures within the folder if you added them to your notes.
It seems that many here have never had to fix a broken file. With a text file it’s easier, and in a bad case you can recover some data. With a binary file it’s a lottery.
Don’t fret, there’s a handy tool to convert any binary file format to text. It even works on corrupt files.
( this was probably funnier in my head 🙂 )
@Alfman:
Eh, you got a sensible chuckle out of me. 😉
https://media.tenor.com/y-rzUJZ1fN8AAAAC/sensible-chuckle.gif
(Edit: Why do I always forget to hit the reply button twice so it’s properly nested?)
Thom Holwerda,
As someone who used to meticulously organize everything as directories, I realized this was a losing battle.
I had all my MP3s organized as Artist \ Album \ Artist – Album – 00 – Track Name.mp3 for a long while. But things quickly got messier with multi-artist collaborations, variations of the same albums, or the random one-off track I found online.
I gave up and uploaded my whole library to music-organizing software. Bonus points: it also detects duplicates, fills in missing ID3 data, and pulls album art from the internet.
Life became much better.
Photos?
Picasa + folders for organizing. Google bought and shut down Picasa. I switched to Adobe Lightroom and couldn’t be happier (with the software, not their business model; that part is a letdown).
Videos?
Plex is doing a great job.
Overall, dedicated library managers, with the advantage of metadata, do a much better job than I can do manually with folders alone.
I manage all my files through the file manager on my PC, and I copy them in the same structure to an SD card on my phone. I hate Android’s obscurity when it comes to files on the system.