Linked by Thom Holwerda on Sat 11th May 2013 21:41 UTC
"Windows is indeed slower than other operating systems in many scenarios, and the gap is worsening." That's one way to start an insider explanation of why Windows' performance isn't up to snuff. Written by someone who actually contributes code to the Windows NT kernel, the comment on Hacker News, later deleted but reposted with permission on Marc Bevand's blog, paints a very dreary picture of the state of Windows development. The root issue? Think of how Linux is developed, and you'll know the answer.
Thread beginning with comment 561362
RE[8]: Too funny
by satsujinka on Mon 13th May 2013 07:00 UTC in reply to "RE[7]: Too funny"
satsujinka Member since:
2010-03-11

While cdude was being ostentatious, he does have a point, in that a file system is, technically, a graph database...

Setting that aside, and more importantly, there's no reason why you can't implement a database on top of text files. There might be some performance penalty due to the difference in size between a human-readable word and a machine word, but most other issues (e.g. indexing) are just a matter of translating from binary to what those bytes actually meant.
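
As a rough illustration of indexing on top of a text file, here is a minimal Python sketch. The pipe-delimited log format and the helper names `build_index` and `lookup` are assumptions made up for this example, not anything syslog or an existing tool defines. The index maps a key to the byte offsets of matching lines, so a lookup seeks directly to them instead of scanning, while the file itself stays readable with grep.

```python
import io

def build_index(f, key_field=0, sep=b"|"):
    """Map each key to the byte offsets of the lines that carry it."""
    index = {}
    offset = f.tell()
    # readline() instead of iteration, so tell() stays reliable on real files
    line = f.readline()
    while line:
        key = line.rstrip(b"\n").split(sep)[key_field].decode()
        index.setdefault(key, []).append(offset)
        offset = f.tell()
        line = f.readline()
    return index

def lookup(f, index, key):
    """Seek straight to the matching lines instead of scanning the file."""
    rows = []
    for offset in index.get(key, []):
        f.seek(offset)
        rows.append(f.readline().rstrip(b"\n").decode())
    return rows

# Hypothetical log format: service|severity|message
log = io.BytesIO(
    b"sshd|info|session opened\n"
    b"cron|warn|job overran\n"
    b"sshd|error|auth failure\n"
)
index = build_index(log)
sshd_rows = lookup(log, index, "sshd")
```

The same functions should work on a file opened with `open(path, "rb")`. Keeping the index up to date when the log is appended to or rotated is left out of this sketch.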

Of course, with semi-structured text that has little embedded meta-data (e.g. syslog's log files), getting adequate performance would be hard. However, I was already suggesting adding checksum meta-data, so it's not really a stretch to imagine that I'm okay with adding whatever other meta-data is necessary.

Score: 3

RE[9]: Too funny
by Alfman on Mon 13th May 2013 15:19 in reply to "RE[8]: Too funny"
Alfman Member since:
2011-01-28

satsujinka,

"In that, technically, a file system is a graph database...Skipping over that and more importantly, there's no reason why you can't implement a database on top of text files. Perhaps, there might be some performance penalty due to the size of a human word and a machine word. But most other issues (i.e. indexing) are just a matter of translating from binary to what that byte actually meant."

I realize all of this. A file system *is* a type of database, and anything with a simple key-value mapping would fit naturally. Moreover, you could re-implement just about any other type of advanced data structure on top of it; however, you'd be reinventing the wheel and would probably end up with something that is slower, less flexible, and less accessible than SQL.


For SQL users, the actual data format is mostly irrelevant apart from performance and integrity. MySQL has a text-based (CSV) storage engine, but it isn't as good as the other engines and lacks indexing.
http://dev.mysql.com/doc/refman/5.1/en/csv-storage-engine.html

Generally speaking, once you've got the data in a structured database you'll never want to go back to the text processing tools (sed/grep/cut/etc.). The main reason to convert back to text form is data interchange with others, not querying or manipulation.



"Of course, with semi-structured text that has little embedded meta-data (i.e. syslog's logfiles,) getting adequate performance would be hard. However, I was already suggesting adding checksum meta-data; so it's not really a stretch to imagine that I'm okay with adding whatever other necessary meta-data."

I'm not sure how much security is gained by checksumming, since an attacker who gained sufficient access to manipulate the logs would likely have sufficient access to manipulate the checksums as well. This would be true whether the logs are binary or text.

Score: 4

RE[10]: Too funny
by satsujinka on Mon 13th May 2013 19:34 in reply to "RE[9]: Too funny"
satsujinka Member since:
2010-03-11

How would implementing an SQL database on top of plain text be less flexible and less accessible than SQL? That is plainly a contradiction.

A CSV variant (i.e. DSV) is already understood by the standard tools. So, considering that MySQL has a CSV storage engine, there's no reason why we couldn't implement a query engine that co-exists with the standard tools. And why not provide that compatibility if we can? After all, for simple searches grep will be easier to use than SQL (simply due to having less syntax). Is this important? No. I live with systemd. But there's no reason to isolate our logs from the tools we use with the rest of the system.
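
To make the co-existence idea concrete, here is a minimal Python sketch that runs SQL over DSV text using only the standard library. It is an assumption-laden illustration: the pipe-delimited format, the column names, and the `load_dsv` helper are hypothetical, and the rows are copied into an in-memory SQLite table rather than queried in place, so this is not how MySQL's CSV engine works. The text file stays the source of truth and remains usable with grep and cut.

```python
import csv
import io
import sqlite3

def load_dsv(conn, table, columns, lines, delimiter="|"):
    """Copy delimiter-separated rows into an SQLite table for querying."""
    cols = ", ".join(columns)
    marks = ", ".join("?" for _ in columns)
    conn.execute(f"CREATE TABLE {table} ({cols})")
    conn.executemany(
        f"INSERT INTO {table} VALUES ({marks})",
        csv.reader(lines, delimiter=delimiter),
    )

# Hypothetical log format: timestamp|service|severity|message
text = io.StringIO(
    "2013-05-13T07:00|sshd|info|session opened\n"
    "2013-05-13T07:01|cron|warn|job overran\n"
    "2013-05-13T07:02|sshd|error|auth failure\n"
)

conn = sqlite3.connect(":memory:")
load_dsv(conn, "log", ["ts", "service", "severity", "message"], text)

# Values are bound as parameters; table and column names here are fixed,
# trusted strings chosen by the example itself.
errors = conn.execute(
    "SELECT ts, message FROM log WHERE service = ? AND severity = ?",
    ("sshd", "error"),
).fetchall()
```

For the simple case, something like `grep 'sshd|error' logfile` on the same file does the job with less syntax, which is the trade-off described above.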

Performance issues are a different matter. For log files, there probably won't be any problem. However, as I've said already, you can do indexing on plain text; you just have to add the appropriate semantics to your text format.

As you point out yourself, if a hacker has already compromised your system (such that they can manipulate the logs), there really isn't much that can be done. It is always possible for them to modify the files, regardless of whether they're plain text, have checksums, or are completely binary. However, consider locks on doors. A lock doesn't prevent a burglar from getting in; it's trivial to go through a window instead. But a lock does keep people from just randomly wandering into your home. Checksums or a binary format provide this type of security, in that someone who doesn't know what they're doing can't easily remove traces of what they did.
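
The "lock on the door" point can be sketched in a few lines of Python. The log format and the `seal` and `verify` helpers are hypothetical, invented for this example. A per-line SHA-256 digest catches a naive edit, but anyone who knows the scheme can simply recompute it, which is exactly the limitation raised earlier in the thread.

```python
import hashlib

def seal(line):
    """Append a SHA-256 digest of the line's content as a trailing field."""
    digest = hashlib.sha256(line.encode()).hexdigest()
    return f"{line}|{digest}"

def verify(sealed):
    """Return True if the trailing digest still matches the content."""
    line, _, digest = sealed.rpartition("|")
    return hashlib.sha256(line.encode()).hexdigest() == digest

entry = seal("sshd|error|auth failure")

# Naive edit: the content changes but the digest is left alone.
tampered = entry.replace("error", "info")

# Informed attacker: rewrites the content and recomputes the digest.
forged = seal("sshd|info|auth failure")

intact_ok = verify(entry)       # digest matches
tampered_ok = verify(tampered)  # mismatch is detected
forged_ok = verify(forged)      # passes, the checksum alone cannot tell
```

Stronger schemes exist (for instance keyed digests with the key stored off the machine), but they are outside what this thread proposes.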

Score: 4