Linked by Thom Holwerda on Sat 11th May 2013 21:41 UTC
"Windows is indeed slower than other operating systems in many scenarios, and the gap is worsening." That's one way to start an insider explanation of why Windows' performance isn't up to snuff. Written by someone who actually contributes code to the Windows NT kernel, the comment on Hacker News, later deleted but reposted with permission on Marc Bevand's blog, paints a very dreary picture of the state of Windows development. The root issue? Think of how Linux is developed, and you'll know the answer.
RE[23]: Too funny
by Alfman on Thu 16th May 2013 20:25 UTC in reply to "RE[22]: Too funny"

"There's always time to discuss, even if the article gets buried. Just go to your comments and access the thread from there."

If I'm not mistaken, the article will become locked in a few hours.

"So would you have 3 length prefixes? Table, record, field? Or would you have table be a field of the record (allowing arbitrary records in 1 stream)."

Just for records and fields.
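
To make the idea concrete, here is a minimal sketch of a stream with length prefixes on records and fields only. The 4-byte big-endian prefix and the function names are my own assumptions for illustration, not part of any existing format.

```python
import struct

def encode_record(fields):
    """Encode one record: a 4-byte record length, then each field
    as a 4-byte length followed by the raw field bytes.
    (Prefix width and byte order are assumptions for this sketch.)"""
    body = b"".join(struct.pack(">I", len(f)) + f for f in fields)
    return struct.pack(">I", len(body)) + body

def decode_records(stream):
    """Decode a concatenation of records produced by encode_record."""
    records = []
    pos = 0
    while pos < len(stream):
        (rec_len,) = struct.unpack_from(">I", stream, pos)
        pos += 4
        end = pos + rec_len
        fields = []
        while pos < end:
            (field_len,) = struct.unpack_from(">I", stream, pos)
            pos += 4
            # The field bytes are copied verbatim: no quoting, no escaping.
            fields.append(stream[pos:pos + field_len])
            pos += field_len
        records.append(fields)
    return records

# Commas, newlines and quotes in the data need no special handling.
data = encode_record([b"a,b", b"line1\nline2"]) + encode_record([b"", b'"quoted"'])
decoded = decode_records(data)
```

Because the reader always knows how many bytes to consume next, it never has to inspect the field contents to find a boundary.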

"These are issues with arbitrary CSV-like formats."

There's no way to avoid the quoting problem with CSV, though, without resorting to proprietary conventions. For example, I've seen data feeds that substitute "[NEWLINE]" for newline characters just to simplify the record parsing. I wonder how the developer intended to convey text that actually contains "[NEWLINE]"? Haha.
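
A small sketch of why that kind of substitution is lossy. The two helper names are hypothetical; I am only assuming the feed does a plain string replacement in each direction.

```python
def encode_field(text):
    # Hypothetical feed convention: hide real newlines behind a marker.
    return text.replace("\n", "[NEWLINE]")

def decode_field(text):
    # The reader turns every marker back into a newline.
    return text.replace("[NEWLINE]", "\n")

# A field with a real newline survives the round trip...
ok = decode_field(encode_field("first line\nsecond line"))

# ...but a field that legitimately contains the marker text does not.
original = "Use [NEWLINE] for line breaks"
round_tripped = decode_field(encode_field(original))
```

The marker is not escaped itself, so the decoder cannot tell a substituted newline from data that happened to contain the same characters.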

This is why XML character escaping is better: the special characters < and > do not appear in their own escaped forms (&lt; and &gt;). This gives the high-level XML parser the freedom to parse the XML structure without regard to false positives from these symbols showing up in the data, which simplifies it tremendously.
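
As a rough illustration, Python's standard `xml.sax.saxutils` module does this kind of escaping. The sample strings are made up; the point is only that the escaped form contains neither symbol, and that text already containing "&lt;" still round-trips because "&" is escaped as well.

```python
from xml.sax.saxutils import escape, unescape

text = "if a < b and b > c: print('<ok>')"
escaped = escape(text)            # replaces &, < and > with entities

# Neither structural symbol survives in the escaped data.
has_brackets = "<" in escaped or ">" in escaped

# Data that already looks like an entity is not confused with one,
# because the ampersand itself is escaped first.
tricky = "the entity &lt; means <"
restored = unescape(escape(tricky))
```

Contrast this with the "[NEWLINE]" convention: here the escape character is itself escapable, so there is no string the data cannot contain.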

"You keep saying that a 'fancy state machine' is necessary, but XML requires 1 too. XML has quotes that need escaping so you still need a state machine to parse XML."

Take another look at the example I gave and see that you can trivially find the matching "<" and ">" without any regard to the text contained within BECAUSE "<" and ">" are always escaped and NEVER show up in the text. EVERY SINGLE occurrence of these symbols in XML is structural. Once you find "<", you can automatically do indexof(">") to get the matching closing bracket, no exceptions to the rule. You cannot reliably do this with CSV because it depends on context.
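
Here is a minimal sketch of that scan, under the assumption stated above that every "<" and ">" inside text is escaped. It deliberately ignores CDATA sections, comments and attribute values, and real XML also permits a literal ">" in text, so treat it as an illustration of the idea rather than a conforming parser. The function name is mine.

```python
def split_tags(doc):
    """Split a document into ("tag", name) and ("text", raw) tokens
    using nothing but substring searches for "<" and ">".
    Assumes all "<" and ">" in text content are escaped."""
    tokens = []
    pos = 0
    while True:
        lt = doc.find("<", pos)
        if lt == -1:
            if pos < len(doc):
                tokens.append(("text", doc[pos:]))
            break
        if lt > pos:
            tokens.append(("text", doc[pos:lt]))
        gt = doc.find(">", lt)      # the indexof(">") step
        if gt == -1:
            raise ValueError("unterminated tag")
        tokens.append(("tag", doc[lt + 1:gt]))
        pos = gt + 1
    return tokens

tokens = split_tags("<a>1 &lt; 2 &gt; 0</a>")
# [("tag", "a"), ("text", "1 &lt; 2 &gt; 0"), ("tag", "/a")]

# The CSV equivalent is not context free: a naive split on the
# delimiter breaks as soon as a quoted field contains a comma.
naive = 'id,"Smith, John",42'.split(",")   # 4 pieces, not 3 fields
```

The scanner never looks at what the text contains, whereas a CSV reader has to track whether it is inside quotes before it can trust any comma or newline.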

"I'd use a DSV with ASCII 31 (Unit Separator) as my delimiter, since it's a control character it has no business being in a field so it can simply be banned. Newlines can be banned, as they're a printing issue (appropriate escapes can be passed as a flag to whatever is printing.) Which leaves us with no state machine necessary."

This is a bit shortsighted though. You cannot just tell programmers that their variables cannot contain certain bytes like newlines. That shifts the escaping problem onto them, based on implementation details they shouldn't have to worry about. This isn't very relevant to what I wanted to be discussing in the first place, and I have to get going, so if you're still not convinced, I guess we may have to leave it there.
