Linked by Thom Holwerda on Sat 11th May 2013 21:41 UTC
"Windows is indeed slower than other operating systems in many scenarios, and the gap is worsening." That's one way to start an insider explanation of why Windows' performance isn't up to snuff. Written by someone who actually contributes code to the Windows NT kernel, the comment on Hacker News, later deleted but reposted with permission on Marc Bevand's blog, paints a very dreary picture of the state of Windows development. The root issue? Think of how Linux is developed, and you'll know the answer.
Thread beginning with comment 561424
RE[9]: Too funny
by Alfman on Mon 13th May 2013 15:19 UTC in reply to "RE[8]: Too funny"
Alfman Member since:
2011-01-28

satsujinka,

"In that, technically, a file system is a graph database...Skipping over that and more importantly, there's no reason why you can't implement a database on top of text files. Perhaps, there might be some performance penalty due to the size of a human word and a machine word. But most other issues (i.e. indexing) are just a matter of translating from binary to what that byte actually meant."

I realize all of this. A file system *is* a type of database; anything with a simple key-value mapping would fit naturally. Moreover, you could re-implement just about any other type of advanced data structure on top of it. However, you'd be reinventing the wheel and would probably end up with something slower, less flexible, and less accessible than SQL.


For SQL users, the actual data format is mostly irrelevant, apart from performance and integrity considerations. MySQL has a text-based (CSV) storage engine, but it isn't as good as the other engines and lacks indexing.
http://dev.mysql.com/doc/refman/5.1/en/csv-storage-engine.html

Generally speaking once you've got the data in a structured database you'll never want to revert to the text processing tools again (sed/grep/cut/etc). The main reason to convert back to text form is for data interchange with others, not for querying or manipulation.



"Of course, with semi-structured text that has little embedded meta-data (i.e. syslog's logfiles,) getting adequate performance would be hard. However, I was already suggesting adding checksum meta-data; so it's not really a stretch to imagine that I'm okay with adding whatever other necessary meta-data."

I'm not sure how much security is gained by checksumming: an attacker with enough access to manipulate the logs would presumably also have enough access to manipulate the checksums. This would be true whether the logs are binary or text.
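
A minimal sketch of that concern, in Python. The log line, the key, and all function names here are hypothetical, and the keyed variant (an HMAC) is an addition for contrast rather than something proposed in this thread. It only helps under the assumption that the key is kept somewhere the attacker cannot read.

```python
import hashlib
import hmac


def plain_checksum(line: str) -> str:
    """Unkeyed SHA-256 digest: anyone who can edit the line can recompute it."""
    return hashlib.sha256(line.encode("utf-8")).hexdigest()


def keyed_tag(line: str, key: bytes) -> str:
    """HMAC-SHA256 tag: recomputing it requires knowing the key."""
    return hmac.new(key, line.encode("utf-8"), hashlib.sha256).hexdigest()


def verify_plain(line: str, checksum: str) -> bool:
    return hmac.compare_digest(plain_checksum(line), checksum)


def verify_keyed(line: str, tag: str, key: bytes) -> bool:
    return hmac.compare_digest(keyed_tag(line, key), tag)


# Hypothetical values, for illustration only.
SECRET_KEY = b"stored-somewhere-the-attacker-cannot-read"
ATTACKER_KEY = b"attacker-has-to-guess"
forged_line = "2013-05-13T15:19:00|sshd|info|nothing unusual happened"

# The attacker rewrites the line and recomputes the unkeyed checksum.
plain_forgery_passes = verify_plain(forged_line, plain_checksum(forged_line))

# With a keyed tag, the attacker's recomputed tag fails verification.
keyed_forgery_passes = verify_keyed(
    forged_line, keyed_tag(forged_line, ATTACKER_KEY), SECRET_KEY
)
```

Nothing here depends on whether the log itself is stored as binary or text.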

Score: 4

RE[10]: Too funny
by satsujinka on Mon 13th May 2013 19:34 in reply to "RE[9]: Too funny"
satsujinka Member since:
2010-03-11

How would implementing an SQL database on top of plain text be less flexible and less accessible than SQL? That is plainly a contradiction.

A CSV variant (i.e. DSV) is already understood by the standard tools. So considering MySQL uses CSV, there's no reason why we couldn't implement a query engine that can co-exist with the standard tools. And why not provide that compatibility if we can? After all, for simple searches grep will be easier to use than SQL (simply due to having less syntax.) Is this important? No. I live with systemd. But there's no reason to isolate our logs from the tools we use with the rest of the system.
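
As a rough sketch of that coexistence: the same delimiter-separated text can be searched line by line, grep-style, and also loaded into a SQL engine for structured queries. The pipe-delimited field layout, the sample lines, and the function names are all assumptions made for illustration. SQLite stands in for the query engine here only because it ships with Python; it is not the MySQL CSV engine mentioned above.

```python
import csv
import io
import sqlite3

# Hypothetical log in a pipe-delimited format: timestamp|service|level|message
LOG_TEXT = (
    "2013-05-13T15:19:00|sshd|warning|failed login for root\n"
    "2013-05-13T15:20:10|cron|info|job started\n"
    "2013-05-13T15:21:45|sshd|info|accepted key for alice\n"
)


def grep_like(text, needle):
    """Plain substring search over lines, roughly what `grep needle` does."""
    return [line for line in text.splitlines() if needle in line]


def load_dsv(text, delimiter="|"):
    """Load the same text into an in-memory SQLite table for SQL queries."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE log (ts TEXT, service TEXT, level TEXT, message TEXT)"
    )
    rows = csv.reader(io.StringIO(text), delimiter=delimiter)
    conn.executemany("INSERT INTO log VALUES (?, ?, ?, ?)", rows)
    return conn


# Simple search: no schema and no query syntax needed.
sshd_lines = grep_like(LOG_TEXT, "sshd")

# Structured query: filter on two fields at once.
conn = load_dsv(LOG_TEXT)
sshd_warnings = conn.execute(
    "SELECT ts, message FROM log WHERE service = ? AND level = ?",
    ("sshd", "warning"),
).fetchall()
```

The text file stays the source of truth in this sketch; the SQL table is just a view built from it on demand.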

Performance issues are a different matter. For log files, there probably won't be any problem... however, as I've said already; you can do indexing on plain text. You just have to add the appropriate semantics to your text format.

As you point out yourself, if a hacker has already compromised your system (such that they can manipulate the logs) there really isn't much that can be done. It is always possible for them to modify the files, regardless of whether they're plain text, have checksums, or are completely binary. However, consider locks on doors. A lock doesn't prevent a burglar from getting in. It's trivial to go through a window instead. However, a lock does keep people from just randomly wandering into your home. Checksums or a binary format provide this type of security, in that someone who doesn't know what they're doing can't easily remove traces of what they did.

Score: 4

RE[11]: Too funny
by Alfman on Tue 14th May 2013 02:08 in reply to "RE[10]: Too funny"
Alfman Member since:
2011-01-28

satsujinka,

"How would implementing an SQL database on top of plain text be less flexible and less accessible than SQL? That is plainly a contradiction."

That's not what I said. I said that if you were to build your own custom database on top of file system primitives, it's unlikely to be as flexible or accessible as an SQL database. The applications I know of that do use a file system database are quite limited and not even remotely close to being SQL-complete (the postfix mail queue, for example). Anyway, given that all text logging systems I know of use flat files and not a file system database, I'd like us to move past this particular issue.



"A CSV variant (i.e. DSV) is already understood by the standard tools. So considering MySQL uses CSV, there's no reason why we couldn't implement a query engine that can co-exist with the standard tools. And why not provide that compatibility if we can?"


The thing is, once you have data in a database, you wouldn't ever have a need to use the standard text tools to access the data since they're largely inferior to SQL (unless of course you didn't know SQL).

I don't object to your choice of a text database engine if that's what you want. CSV is often a least common denominator format, which is simultaneously a strength (because it's pervasive) and a weakness (because it lacks a lot of the more advanced features a database can normally provide). But the choice is yours to make.



"Performance issues are a different matter. For log files, there probably won't be any problem... however, as I've said already; you can do indexing on plain text. You just have to add the appropriate semantics to your text format."

How do you index a plain text file using standard tools and then go on to query your records via that index? Wouldn't you need to write customized scripts to build and query the index? It seems to me that you would need to build custom tools every time you want to do something that SQL has built in.
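
For concreteness, here is one shape such a custom script could take: a minimal Python sketch that records the byte offset of each line under the value of one field, then answers lookups by seeking to those offsets. The pipe-delimited layout, the sample lines, and the function names are assumptions made for illustration. Nothing here is a standard tool, which is the point of the question.

```python
import os
import tempfile
from collections import defaultdict


def build_index(path, field, delimiter=b"|"):
    """Map each value of one field to the byte offsets of the lines holding it."""
    index = defaultdict(list)
    with open(path, "rb") as f:
        offset = 0
        for line in f:
            parts = line.rstrip(b"\n").split(delimiter)
            if field < len(parts):
                index[parts[field]].append(offset)
            offset += len(line)  # advance to the start of the next line
    return index


def lookup(path, index, value):
    """Fetch matching lines by seeking to their offsets, without a full scan."""
    results = []
    with open(path, "rb") as f:
        for offset in index.get(value, []):
            f.seek(offset)
            results.append(f.readline().rstrip(b"\n").decode("utf-8"))
    return results


def demo():
    """Write a small hypothetical log, index it by service name, and query it."""
    lines = [
        "2013-05-13T15:19:00|sshd|warning|failed login for root",
        "2013-05-13T15:20:10|cron|info|job started",
        "2013-05-13T15:21:45|sshd|info|accepted key for alice",
    ]
    with tempfile.NamedTemporaryFile("wb", suffix=".log", delete=False) as tmp:
        tmp.write(("\n".join(lines) + "\n").encode("utf-8"))
        path = tmp.name
    try:
        index = build_index(path, field=1)
        return lookup(path, index, b"sshd")
    finally:
        os.remove(path)
```

The index in this sketch lives in memory and covers a single field; persisting it, keeping it in sync as the log grows, and indexing other fields would each be more custom code.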

Score: 2