Linked by Thom Holwerda on Sat 11th May 2013 21:41 UTC
Windows "Windows is indeed slower than other operating systems in many scenarios, and the gap is worsening." That's one way to start an insider explanation of why Windows' performance isn't up to snuff. Written by someone who actually contributes code to the Windows NT kernel, the comment on Hacker News, later deleted but reposted with permission on Marc Bevand's blog, paints a very dreary picture of the state of Windows development. The root issue? Think of how Linux is developed, and you'll know the answer.
Thread beginning with comment 561461
RE[10]: Too funny
by satsujinka on Mon 13th May 2013 19:34 UTC in reply to "RE[9]: Too funny"
satsujinka
Member since:
2010-03-11

How would implementing an SQL database on top of plain text be less flexible and less accessible than SQL? That is plainly a contradiction.

A CSV variant (i.e. DSV) is already understood by the standard tools. So considering MySQL already has a CSV storage engine, there's no reason why we couldn't implement a query engine that can co-exist with the standard tools. And why not provide that compatibility if we can? After all, for simple searches grep will be easier to use than SQL (simply due to having less syntax). Is this important? No. I live with systemd. But there's no reason to isolate our logs from the tools we use with the rest of the system.
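
To make that concrete, here is a minimal sketch in Python (the file name, delimiter, and field names are just assumptions for illustration): the same pipe-delimited log can be scanned with a grep-style substring match or read field by field, using nothing beyond the standard csv module.

    import csv

    LOGFILE = "system.log"    # hypothetical log file name
    FIELDS = ["timestamp", "facility", "severity", "message"]   # hypothetical layout

    # grep-style: a plain substring match needs no knowledge of the format.
    with open(LOGFILE) as f:
        for line in f:
            if "quota" in line:
                print(line, end="")

    # Structured access: the very same file read as delimiter-separated records.
    with open(LOGFILE, newline="") as f:
        for row in csv.DictReader(f, fieldnames=FIELDS, delimiter="|"):
            if row["severity"] == "error":
                print(row["timestamp"], row["message"])

Either view works on the same file, which is the co-existence being argued for.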

Performance issues are a different matter. For log files, there probably won't be any problem... however, as I've said already, you can do indexing on plain text. You just have to add the appropriate semantics to your text format.
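
As a toy example of what adding semantics could look like (again with a made-up pipe-delimited layout, not how any real log indexer works): record the byte offset of every line under its severity field, then seek straight to the matching records.

    # Build the index: severity value -> byte offsets of the matching lines.
    # A naive split ignores quoting; a real DSV parser would handle that.
    LOGFILE = "system.log"   # hypothetical: timestamp|facility|severity|message
    SEVERITY_FIELD = 2

    index = {}
    offset = 0
    with open(LOGFILE, "rb") as f:
        for raw in f:
            fields = raw.decode("utf-8").rstrip("\r\n").split("|")
            index.setdefault(fields[SEVERITY_FIELD], []).append(offset)
            offset += len(raw)

    # Query via the index: jump directly to the "error" records.
    with open(LOGFILE, "rb") as f:
        for off in index.get("error", []):
            f.seek(off)
            print(f.readline().decode("utf-8"), end="")

Since log files are append-only, keeping such an index current only means appending new offsets.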

As you point out yourself, if a hacker has already compromised your system (such that they can manipulate the logs), there really isn't much that can be done. It is always possible for them to modify the files, regardless of whether they're plain text, have checksums, or are completely binary. However, consider locks on doors. A lock doesn't prevent a burglar from getting in. It's trivial to go through a window instead. However, a lock does keep people from just randomly wandering into your home. Checksums or binary formats provide this type of security, in that someone who doesn't know what they're doing can't easily remove traces of what they did.
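
To sketch the lock-on-the-door idea (this isn't what any particular logger actually does, just the general technique): chain each line's checksum to the previous one, so casually deleting or editing a record breaks every checksum after it.

    import hashlib

    def chain_checksums(lines, seed=b"log-start"):
        """Yield (line, digest) pairs where each digest also covers the
        previous digest, so removing or editing a line breaks the chain."""
        prev = seed
        for line in lines:
            digest = hashlib.sha256(prev + line.encode("utf-8")).digest()
            prev = digest
            yield line, digest.hex()

    # Example records (made up); the digest could be stored as an extra DSV field.
    records = ["2013-05-14T02:08:00|daemon|error|disk quota exceeded",
               "2013-05-14T02:08:05|daemon|info|quota reset"]
    for line, checksum in chain_checksums(records):
        print(line + "|" + checksum)

The final digest still has to live somewhere the intruder can't reach for this to mean anything, which fits the lock analogy: it deters casual tampering rather than preventing it.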

Reply Parent Score: 4

RE[11]: Too funny
by Alfman on Tue 14th May 2013 02:08 in reply to "RE[10]: Too funny"
Alfman Member since:
2011-01-28

satsujinka,

"How would implementing an SQL database on top of plain text be less flexible and less accessible than SQL? That is plainly a contradiction."

That's not what I said. I said that if you were to build your own custom database on top of file system primitives, it's unlikely to be as flexible or accessible as an SQL database. The applications I know of which do use a file system database are quite limited and not even remotely close to being SQL-complete (for example, the Postfix mail queue). Anyway, given that all text logging systems to my knowledge use flat files and not a file system database, I'd like for us to move past this particular issue.



"A CSV variant (i.e. DSV) is already understood by the standard tools. So considering MySQL uses CSV, there's no reason why we couldn't implement a query engine that can co-exist with the standard tools. And why not provide that compatibility if we can?"


The thing is, once you have data in a database, you wouldn't ever have a need to use the standard text tools to access the data since they're largely inferior to SQL (unless of course you didn't know SQL).

I don't object to your choice of using a text database engine if you want to. CSV is often a least-common-denominator format, which is simultaneously a strength (because it's pervasive) and a weakness (because it lacks a lot of the more advanced features a database can normally provide). But the choice is yours to make.



"Performance issues are a different matter. For log files, there probably won't be any problem... however, as I've said already; you can do indexing on plain text. You just have to add the appropriate semantics to your text format."

How do you index a plain text file using standard tools and then go on to query your records via that index? Wouldn't you need to write customized scripts to build and query the index? It seems to me that you'd need to build custom tools every time you want to do something that SQL has built in.
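
For contrast, here's roughly what "built in" looks like, sketched with Python's sqlite3 module against the same kind of pipe-delimited log (the file name and column names are assumptions for illustration):

    import csv
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE log (timestamp TEXT, facility TEXT,"
                 " severity TEXT, message TEXT)")

    # Load the hypothetical pipe-delimited log; every line must have four fields.
    with open("system.log", newline="") as f:
        conn.executemany("INSERT INTO log VALUES (?, ?, ?, ?)",
                         csv.reader(f, delimiter="|"))

    # Index maintenance and querying come with the engine, not with custom scripts.
    conn.execute("CREATE INDEX idx_severity ON log (severity)")
    for row in conn.execute(
            "SELECT timestamp, message FROM log WHERE severity = 'error'"):
        print(*row)

The point isn't that the custom script is hard to write; it's that the index, the planner, and the query language ship with the engine instead of with every ad hoc tool.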

Reply Parent Score: 2

RE[12]: Too funny
by satsujinka on Tue 14th May 2013 07:08 in reply to "RE[11]: Too funny"
satsujinka Member since:
2010-03-11

Your argument seems very confused to me, but maybe I'm misunderstanding you.

I'm going to drop the indexing discussion after this because I'm not sufficiently studied on the topic to explain how a database does indexing. However, if we take file=table and line=row, then I would imagine we can cache rows and mark them with their table (inside the cache). But as I said, I don't know what databases do, so this is just my guess. Also, I'm not convinced that a log database would have performance issues (as there's really only 1 record type and logs don't cross-reference each other much).

Moving back to the top:

if you were to build your own custom *SQL* database on top of file system primitives, it's unlikely to be as flexible or accessible as an SQL database

The emphasized part is what you're missing, and it's why you're contradicting yourself: you are literally saying that an SQL database is less flexible and accessible than an SQL database. The backend is totally unimportant for non-performance considerations.

The thing is, once you have data in a database, you wouldn't ever have a need to use the standard text tools to access the data since they're largely inferior to SQL (unless of course you didn't know SQL).

See, but there are reasons why you might not want to use a query engine. You list a trivial one, not knowing SQL (something a professional system admin should try to overcome, but not everyone is a professional system admin). Here are some more reasons:
* Because I want to verify that the query engine is returning the correct results. (Query engines have bugs too!)
* Because writing out a full query is more work than grepping for some keyword. (I'm lazy.)
* Because log files shouldn't exist in some magical land separate from all my other files (e.g. off in SQL land while all of my other files are in CLI land; this can also be read as "CLI is what I reach for first".)
* Because I don't want to have to hunt down a database driver just to pick some things out of my logs from within my program.
* Or from the other side of the fence, because I don't want to have to hunt down a database driver to write some logs for my program (see the sketch after this list).
* Because I want to pipe my results out to some other program (this is more a comment on most SQL query engines than a real limitation).
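
As a minimal sketch of the no-driver point about the writing side (the record contents and delimiter are made up): emitting a DSV log record takes nothing beyond the standard library, and anything downstream can read it back with ordinary text tools.

    import csv
    import sys

    # Writing a log record needs no database driver; the standard csv module
    # (or even plain string formatting) is enough to emit a DSV line.
    writer = csv.writer(sys.stdout, delimiter="|")
    writer.writerow(["2013-05-14T07:08:00", "myapp", "info", "service started"])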

because it lacks a lot of the more advanced features a database can normally provide

And what "advanced" features would apply to a log? There's only 1 record type. CSV provides sufficient capabilities to handle that.

Consider Wikipedia's CSV page:
CSV formats are best used to represent sets or sequences of records in which each record has an identical list of fields. This corresponds to a single relation in a relational database, or to data (though not calculations) in a typical spreadsheet.


Does this not sound exactly like what an entry in a log file is?

Reply Parent Score: 2