Linked by Thom Holwerda on Sat 11th May 2013 21:41 UTC
Windows "Windows is indeed slower than other operating systems in many scenarios, and the gap is worsening." That's one way to start an insider explanation of why Windows' performance isn't up to snuff. Written by someone who actually contributes code to the Windows NT kernel, the comment on Hacker News, later deleted but reposted with permission on Marc Bevand's blog, paints a very dreary picture of the state of Windows development. The root issue? Think of how Linux is developed, and you'll know the answer.
RE[18]: Too funny
by satsujinka on Wed 15th May 2013 22:09 UTC in reply to "RE[17]: Too funny"
satsujinka
Member since:
2010-03-11

Okay, so moving out of the logging topic to databases in general as an organizational system for an operating system.

About CSV vs. XML: consider Golang's csv and xml packages. The two Go files that implement CSV have a smaller combined size than three of the four Go files for XML (the fourth is approximately the same size as csv's reader.go). To me this implies that CSV doesn't have any escaping issues that are particularly harder to solve than XML's or JSON's (JSON actually has the most code dedicated to it).

Of course, part of this is that csv provides the smallest feature set. However, comparing similar functionality leads to the same conclusion.

As for metadata: you have to provide a schema no matter what data format you choose. XML isn't better in this regard; usually you match tag to column name. CSV has a similar rule: match on column position. I know that in relational theory the columns are unordered, but in practice the columns are created and displayed with an order; just use that. Optionally, you can write a schema to do the matching. This is actually a better situation than XML, which requires a schema all the time (what do we do with nesting? I can think of 3 reasonable behaviors off the top of my head).
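Just to make the position-matching rule concrete, here is a minimal Go sketch (the schema, column names, and sample rows are made up for illustration): the schema is nothing more than an ordered list of names, and field i of each record binds to name i.

package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// A CSV schema is just an ordered list of column names;
// matching is by position rather than by tag name as in XML.
var schema = []string{"Id", "First Name", "Last Name"}

func main() {
	r := csv.NewReader(strings.NewReader("001,John,Doe\n002,Mary,Jane\n"))
	records, err := r.ReadAll()
	if err != nil {
		panic(err)
	}
	for _, rec := range records {
		row := map[string]string{}
		for i, field := range rec {
			row[schema[i]] = field // bind position i to column name i
		}
		fmt.Println(row)
	}
}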

---

I think all programs should be using a structured library interface directly without bothering with the text conversion at all. It could be a library call similar to printf, but it would have to be capable of maintaining field metadata. This call would not output text (optionally it could for interactive debugging), instead it would transmit the log record to the system logger.

I'm not opposed to this in principle. However, I fear figuring out what this "printf"'s interface should be will not be so simple. Does it expect metadata co-mingled with data? Does it take a schema as a parameter? Isn't "%s:%d" a schema already (one whose fields are anonymous, but paired with scanf, you can write and retrieve records with it)? Also, what should we use for a schema format? Or should we just choose to support as many as possible?
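For what it's worth, here is one possible shape for that interface, purely as a sketch in Go; LogRecord and Field are hypothetical names I made up, not an existing API, and they deliberately take the "metadata co-mingled with data" answer to the question above.

package main

import "fmt"

// Field pairs a name with a value, so metadata travels with the data.
type Field struct {
	Name  string
	Value interface{}
}

// LogRecord stands in for the structured "printf": instead of
// formatting text, it would hand the record to the system logger.
// Here it just prints the fields for interactive debugging.
func LogRecord(fields ...Field) {
	for _, f := range fields {
		fmt.Printf("%s=%v ", f.Name, f.Value)
	}
	fmt.Println()
}

func main() {
	LogRecord(Field{"severity", "info"}, Field{"pid", 1234}, Field{"msg", "started"})
}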

The vision was not just for logging, but actually to replace all sorts of text I/O streams with data tupples.

What would these data tuples look like? You'll need some data to mark where these tuples begin and end, their fields, and their relation. End can double as begin, so only 3 symbols are necessary (but the smallest binary encoding that can hold 3 symbols is 2 bits, so you may as well use 4). If you omit table separators, then you need to include a relation field.

With 4 symbols:
Bin Txt Description
00 - ( - tuple start
01 - , - field break
10 - ) - tuple end
11 - ; - table end

Ex.
(Id,First Name,Last Name)(001,John,Doe)(002,Mary,Jane);

With 3 symbols:
Bin Txt Description
00 - , - field break
01 - \n - tuple end
10 - \d - table end

Ex.
Id,First Name,Last Name
001,John,Doe
002,Mary,Jane
\d
... Hey, wait a minute, that's CSV! ;)

With 2 symbols:
Bin Txt Description
0 - , - field break
1 - ; - tuple break

Ex.
Person,Id,First Name,Last Name;
Person,001,John,Doe;
Person,002,Mary,Jane;
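A writer for the 4-symbol textual form above takes only a few lines of Go. This is a sketch: escaping of literal delimiters inside field values is deliberately ignored, which is exactly the escaping question we've been debating.

package main

import (
	"fmt"
	"strings"
)

// WriteTable emits tuples in the 4-symbol textual form:
// '(' tuple start, ',' field break, ')' tuple end, ';' table end.
func WriteTable(tuples [][]string) string {
	var b strings.Builder
	for _, t := range tuples {
		b.WriteByte('(')
		b.WriteString(strings.Join(t, ","))
		b.WriteByte(')')
	}
	b.WriteByte(';')
	return b.String()
}

func main() {
	fmt.Println(WriteTable([][]string{
		{"Id", "First Name", "Last Name"},
		{"001", "John", "Doe"},
		{"002", "Mary", "Jane"},
	}))
	// Prints: (Id,First Name,Last Name)(001,John,Doe)(002,Mary,Jane);
}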

Just to make it clear: I too want a standard format that everything uses. It's just that saying "use tuples" ignores the fact that we still have to parse information out of our inputs in order to do anything. You do go on to say "redesign bash to handle this". I assume you also mean "provide a library that has multiplexed stdin/stdout", as you also have to write to and read from an arbitrary number of stdins/stdouts (corresponding to the number of fields). Alternately, you could shift to a bytecode-powered shell (so that all programs use the same representations as the shell and can simply copy their memory structures to it).

Reply Parent Score: 2

RE[19]: Too funny
by Alfman on Thu 16th May 2013 00:57 in reply to "RE[18]: Too funny"
Alfman Member since:
2011-01-28

satsujinka,

"About CSV vs. XML: Considering Golang's csv and xml packages: both of the 2 go files for CSV have a combined size smaller than xml's 3 of the 4 go files for xml (the 4th is approximately the same size as csv's reader.go). To me this implies that CSV doesn't have any escaping issues that are particularly harder to solve then XML or JSON (JSON actually has the most code dedicated to it.)"

This methodology really isn't sound, but I don't really want to get into it.


"As for metadata: you have to provide a schema no matter what data format you choose. XML isn't better in this regard; usually you match tag to column name. CSV has a similar rule: match on column position."

XML provides self-defining metadata (the names and possibly other attributes), whereas CSV does not. It's illogical to me for you to disagree, but let's just move on.


"However, I fear figuring out what this 'printf's interface should be will not be so simple."

It doesn't really matter for the purpose of this discussion, whatever makes it easiest in the context of the language the library is being written for.


"What would these data tuples look like? You'll need some data to mark where these tuples begin and end, their fields, and their relation. End can double as begin, so only 3 symbols are necessary (but the smallest binary that can hold 3 is 2 bits so you may as well use 4.) If you omit table separators, then you need to include a relation field."

You're still thinking in terms of text with delimiters, but the whole idea behind the tuples would be to use a higher-level abstraction. Think about how a class implements an interface: you don't have to know how a class is implemented to use the interface.


"It's just that saying 'use tuples' ignores the fact that we still have to parse information out of our inputs in order to do anything."

No, you as a programmer would be using the higher-level abstraction of the tuple without caring about the mechanics used to implement it. You keep thinking in terms of programs parsing text streams, but with the tuple abstraction you can skip the intermediary text conversions entirely. You only need code to convert the tuple to text at the point where text is the desired form of output, like in the shell or a logfile. I'm not sure you're understanding this point.
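To illustrate with a minimal Go sketch (the Tuple interface and the mapTuple type are hypothetical names, just to show the shape of the abstraction): the consumer programs against the interface, one concrete implementation hides behind it, and text appears only in Render, at the edge.

package main

import "fmt"

// Tuple is the abstraction a consumer programs against; the wire
// format behind it (delimited text, binary framing, shared memory)
// is an implementation detail, like a class behind an interface.
type Tuple interface {
	Fields() []string
	Get(name string) string
}

// Render converts a tuple to text only at the edge (shell, logfile).
func Render(t Tuple) string {
	out := ""
	for _, f := range t.Fields() {
		out += f + "=" + t.Get(f) + " "
	}
	return out
}

// mapTuple is one possible implementation; consumers never see it.
type mapTuple struct {
	names  []string
	values map[string]string
}

func (m mapTuple) Fields() []string       { return m.names }
func (m mapTuple) Get(name string) string { return m.values[name] }

func main() {
	t := mapTuple{
		names:  []string{"Id", "First Name", "Last Name"},
		values: map[string]string{"Id": "001", "First Name": "John", "Last Name": "Doe"},
	}
	fmt.Println(Render(t))
}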

Reply Parent Score: 2

RE[20]: Too funny
by satsujinka on Thu 16th May 2013 06:08 in reply to "RE[19]: Too funny"
satsujinka Member since:
2010-03-11

No, you as a programmer would be using the higher-level abstraction of the tuple without caring about the mechanics used to implement it. You keep thinking in terms of programs parsing text streams, but with the tuple abstraction you can skip the intermediary text conversions entirely. You only need code to convert the tuple to text at the point where text is the desired form of output, like in the shell or a logfile. I'm not sure you're understanding this point.

Going back to your interface/class metaphor. I'm not the guy using the interface. I'm the guy writing the interpreter/compiler/VM that dispatches to the correct class on a particular method call from an interface. In which case, I do care how things are implemented, because I want to implement them!

As it stands, there is no hardware notion of a tuple. We just have data streams. So we either have to delimit the data or we have to multiplex the stream. If there are other options, please let me know, but "use a higher abstraction" is not a means of implementing a system.
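To put the implementer's side in code, here is a minimal Go sketch of the delimiting option (the framing choice of comma and newline is arbitrary, and escaping of literal delimiters is again ignored): somebody has to write this layer, and that somebody has to pick a concrete framing.

package main

import (
	"bufio"
	"fmt"
	"strings"
)

// ReadTuples is the implementer's side of the abstraction: the byte
// stream has no notion of a tuple, so a framing must be chosen. Here
// ',' is the field break and '\n' the tuple end (the delimiting
// option); the alternative is one stream per field (multiplexing).
func ReadTuples(s *bufio.Scanner) [][]string {
	var tuples [][]string
	for s.Scan() {
		tuples = append(tuples, strings.Split(s.Text(), ","))
	}
	return tuples
}

func main() {
	in := bufio.NewScanner(strings.NewReader("001,John,Doe\n002,Mary,Jane\n"))
	for _, t := range ReadTuples(in) {
		fmt.Println(t)
	}
}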

If you're not interested in discussing how to implement a relational I/O stream abstraction (which we both already agree would be nice), I guess there's really nothing else to talk about.

Moving back: my methodology is perfectly sound. I was not trying to show that CSV is easier to parse; I was disproving your claim that CSV is particularly hard to parse. The fact that two common formats (one of which you recommended) take more work to parse than CSV in a general-purpose language is a sound disproof of your claim.

Next:
XML does not and cannot provide self-defining metadata. Consider your example of XML providing names. What metadata do those names provide? What metadata does <contains>...</contains> provide? That something is contained? What if it's a type declaration? ("x contains this type") In order to answer the question of what metadata is present, we need some context, and if we need a context, then by definition our metadata is not self-defined. This is true of all data formats. Within a context it does make sense to say that some data has a meaning, but outside of it? No.

So back to what I originally said: XML is matched on name and CSV is matched on position. This is how we determine meaning for these two formats. Metadata follows this rule too: in XML we specify metadata with a known name (contains?), and in CSV we specify metadata with a known position (the 2nd?).

Reply Parent Score: 2