Choosing human-readable file formats is an act of technological sovereignty. It’s about maintaining control over your data, ensuring long-term accessibility, and building systems that remain comprehensible and maintainable over time. The slight overhead of human readability pays dividends in flexibility, durability, and peace of mind.
These formats also represent a philosophy: that technology should serve human understanding rather than obscure it. In choosing transparency over convenience, we build more resilient, more maintainable, and ultimately more trustworthy systems. ↫ Adële
It’s hard not to agree with this sentiment. I definitely prefer being able to just open and read things like configuration files as if they’re text files, for all the same reasons Adële lists in their article. It just makes managing your system a lot easier, since it means you won’t have to rely on the applications the files belong to to make any changes.
I think this also extends to other areas. When I’m dealing with photo or music library tools, I want them to use the file system and directories in a human-readable way. Having to load up an entire photo management application just to sort some photos seems backwards to me; why can’t I use my much leaner file manager to do this instead? I also want emails to be stored as individual files in directories matching mailboxes inside my email client, just like BeOS used to do back in the day (note that this is far from exclusive to BeOS). If I load up my file manager, and create a new directory inside the root mail directory I designated and copy a few email files into it, my email client should reflect that.
As operating systems get ever more locked down, we’re losing the human-readability of our systems, and that’s not a good development.
Your view makes sense to me here, Thom, but it makes me curious whether you are willing to uphold this view as criticism against systemd. Many have griped about systemd replacing text logs that could be read/scanned under any text software with binary data that can’t be directly read without conversion.
Thom is a translator who makes a living converting text into a different language because the original “human readable” text was not human readable. To claim that any text is “human readable” you have to be a bigoted racist and accept something like “all the people who can’t read English are not human”.
Worse; even if you’re happy to be a bigoted racist, “human readable” still does not work. Something like “123” leaves the reader clueless about whether it’s grams or pounds or any other units, with no idea if “unknown” is a valid possibility; and something like “one hundred and twenty three” is likely to fail because the parser you need for “human readable” isn’t able to read “human readable” either. To solve these failures, “human readable” still requires a specification that describes the language for the data – e.g. something that clearly says things like “The tag is used to provide a weighting between 0% to 100% for the algorithm, expressed as an integer from 0 to 100. The only other valid option is “auto” where a default 50% weight was implied for earlier versions of this spec and any value could be auto-determined in future versions of this spec”. In other words, you have to be stupid to think that “human readable” works (in addition to being a bigoted racist).
The only alternative to being a stupid bigoted racist is to have some kind of system to convert binary data (including all of the different incompatible encodings and languages of plain text) into something that is actually human readable; and in that case pure binary data is significantly more efficient, with file sizes an order of magnitude smaller and the possibility of memory mapping files “as is” onto structures with no need to build parsers or waste hundreds of cycles every single time you’re forced to convert “123456” into 123456 (and back again when saving, all for literally no valid reason whatsoever).
The bigger problem is that regardless of which file format it is (and regardless of whether it’s “human readable” or not) programmers are too lazy to create the tools needed to convert files into something that’s actually human readable. Part of this problem is that most of the time it’s an incredibly idiotic goal – nobody wants to (e.g.) use their text editor to edit a movie in the first place; so nobody wants to create tools to convert between the mp4 file format and representations that are actually human readable.
In fact; most of the time the best representation (and the best viewer and best editor) depends on the nature of the data; so you get special apps for editing movies, and special apps for doing CAD and special apps for composing music and… ; and all of these apps convert data to/from whichever representation is the more human understandable (including representations that are human readable text if/where appropriate). Even if you’re editing actual text (e.g. writing a book) you’ll have a WYSIWYG word processor with full internationalization (and/or icons) for its menus to make sure that the controls are human readable and not “human readable”; and the file format will be a binary file format because you have to have a sadomasochism fetish to tolerate latex.
And that’s really what we’re talking about here: human understandable apps that are superior in every way that matters versus “human readable” stupid bigoted racist inferior shit.
Why are you responding to me like this? We’ve been respectful with each other in the past and I’m not really sure what’s going on with this post. Text vs binary have pros and cons, debating it is well and good, but it’s no place for calling people bigots and racists. I request you dial down the level of aggression, ok?
See my comment below, those pieces of software do exist in several forms (binary and “human readable” aka source code).
@Brendan
I am assuming you set out here to purposely make some deeper point by illustrating a form of “human unreadable” text. If not, I am with @Alfman.
> To claim that any text is “human readable” you have to be a…
First off, you seem to be talking mostly about whether something is “intelligible”, which is nice but not necessary for something to be “human readable”. A lot of the benefit comes from it simply being text encoded. As mentioned, that opens up a host of tools that can now be used. And if I have read a document that tells me I need to change some parameter, I can search for it, find it, and modify it even if I cannot actually understand it (perhaps it is written in Portuguese).
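To make the search-and-modify point concrete, here is a minimal Python sketch. Everything in it is hypothetical: the config text, the Portuguese key `tamanho_maximo`, and the helper `set_value` are made up for illustration and do not come from any real tool.

```python
import re

# Hypothetical config file written in Portuguese; the keys are invented.
CONFIG = """\
# Configurações do servidor
nome_do_servidor = exemplo
tamanho_maximo = 123
"""


def set_value(text: str, key: str, new_value: str) -> str:
    """Replace the value on a simple `key = value` line, leaving the rest untouched."""
    pattern = re.compile(rf"^({re.escape(key)}\s*=\s*).*$", re.MULTILINE)
    # A function replacement avoids ambiguity between group references and digits.
    updated, count = pattern.subn(lambda m: m.group(1) + new_value, text)
    if count == 0:
        raise KeyError(key)
    return updated


# The documentation said "change tamanho_maximo"; no Portuguese required.
updated = set_value(CONFIG, "tamanho_maximo", "200")
print(updated)
```

The same edit could be done with any text editor or a generic search-and-replace tool, which is the whole point: nothing here depends on the application that owns the file.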
Another point was longevity. Text formats are likely to be readable long into the future. And, while I may not know what things mean, at least I am not staring at ones and zeros. With effort, I will be able to figure it out.
> programmers are too lazy to create the tools needed to convert files into something that’s actually human readable.
This is going to require serialization to and from some opaque format. These tools will only work on versions of the underlying format they understand. So, they are fragile. And they do not ship with the files themselves. So, over time, they will be lost. See my last point before this one.
> Something like “123” leaves the reader clueless
Not really. I can see that it is a number. I can guess that 123 is a valid value. I would probably start exploring by making it 50% larger or smaller and seeing what happens. Or, if I have a value that I have gleaned from some dark archive, I have a place to put it. And whatever appears before 123 gives me a pretty good idea what the value is for. That all seems like a significant gain over just seeing 7B wedged in between a stream of bytes in my hex editor as I try to figure out what I am looking at. Even if it is a foreign language, I might recognize Korean and be able to translate it.
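As a small illustration of the difference between seeing “123” next to a label and seeing 7B in a hex editor, here is a Python sketch. The field name `weight` and the binary record layout are assumptions invented for the example, not any real format.

```python
import struct

# The same value, 123, stored two ways. Both layouts are hypothetical.
text_form = b"weight = 123\n"

# Little-endian: a 16-bit field, then our 8-bit value, then another 16-bit field.
binary_form = struct.pack("<HBH", 0x0102, 123, 0xFFFF)

print(text_form.decode("ascii"), end="")  # readable in any text editor
print(binary_form.hex(" "))               # 02 01 7b ff ff, as a hex editor shows it

# Recovering the value from the binary form requires knowing the layout.
_, value, _ = struct.unpack("<HBH", binary_form)
print(value)  # 123
```

Without the `"<HBH"` layout string, nothing in the five bytes says where the value starts, how wide it is, or what it is for; the text line carries at least a name and a visible number.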
And after we have all forgotten what JSON or TOML are, the LLMs of the day can still tell me what format it is.
Yes and no. I can understand the motivation to review files’ content, but not read them. If it’s a text document, who would read a raw RTF or LaTeX file? Who would read a CSV file? An EML file? Computers process binary data, not text. Just imagine how much processing power is spent on converting between binary and text formats for HTML or JSON. And what about the schema to apply? Just stay on the binary side of the force, but with open, well-documented formats, and use a “binary data browser” like https://kaitai.io/ or https://hachoir.readthedocs.io/en/latest/ or https://github.com/timrid/construct-editor to get a peek into it.
Nonsense. Microsoft Office XMLs are technically “human readable.” An SVG is too. In practice they certainly aren’t.
Binary formats are absolutely fine as long as they are standardized and adhering to public open specs.
On the other hand, I can only imagine how many gigawatts of power are wasted every year parsing JSON and recompiling freaking React a gazillion times. Speaking of which, minified JavaScript is technically human readable, too. In practice, it most certainly isn’t.
Anyway, human readability is great, but on the flipside, binary formats both take less space (especially if compressed) and are faster to load than parsing a text file.
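For a rough feel of the size trade-off, here is a minimal Python sketch that stores the same integers as JSON text and as packed binary. The data set (1,000 six-digit integers) is an arbitrary assumption; real ratios depend heavily on the data and on whether compression is applied, and this sketch does not measure load time at all.

```python
import json
import struct

# Hypothetical data: 1,000 six-digit integers.
values = list(range(100_000, 101_000))

# Text encoding: a JSON array of decimal numbers.
text_blob = json.dumps(values).encode("utf-8")

# Binary encoding: fixed-width little-endian unsigned 32-bit integers.
binary_blob = struct.pack(f"<{len(values)}I", *values)

print(len(text_blob), len(binary_blob))

# Both forms round-trip to the same values.
decoded_text = json.loads(text_blob)
decoded_binary = list(struct.unpack(f"<{len(values)}I", binary_blob))
assert decoded_text == decoded_binary == values
```

With this particular data the JSON encoding comes out at 8000 bytes and the packed form at 4000. Note that the binary form is only readable by something that already knows it is a flat array of 32-bit little-endian integers, which is where the open, published spec comes in.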