Linked by Thom Holwerda on Mon 26th Feb 2018 18:13 UTC
Windows

Flaky failures are the worst. In this particular investigation, which spanned twenty months, we suspected hardware failure, compiler bugs, linker bugs, and other possibilities. Jumping too quickly to blaming hardware or build tools is a classic mistake, but in this case the mistake was that we weren’t thinking big enough. Yes, there was a linker bug, but we were also lucky enough to have hit a Windows kernel bug which is triggered by linkers!

Thread beginning with comment 654161
RE: NT is still garbage
by avgalen on Tue 27th Feb 2018 10:29 UTC in reply to "NT is still garbage"
avgalen Member since:
2010-09-23

> Building Chrome very quickly causes CcmExec.exe to leak process handles.

> What the actual f--k? This would be considered unacceptable on Haiku, let alone Linux or OpenBSD.

This is unacceptable on Windows as well, but it is not something an OS or kernel should be concerned with. Building Chrome is a usermode process. If that process causes other programs to leak resources, those programs are usermode processes too. By default, usermode processes are allowed to consume as many resources as are available. In the end that means one usermode process can consume almost all resources, which is what you want in all normal scenarios (no leaks).
As long as the OS can still control those usermode processes, the OS is working perfectly.

(ccmexec.exe isn't even present on systems by default; it is a tool that enterprises use to monitor their systems for updates)

> The underlying bug is that if a program writes a PE file (EXE or DLL) using memory mapped file I/O and if that program is then immediately executed (or loaded with LoadLibrary or LoadLibraryEx), and if the system is under very heavy disk I/O load, then a necessary file-buffer flush may fail. This is very rare and can realistically only happen on build machines, and even then only on monster 24-core machines like I use.

Well, why wasn't there a unittest for this exact scenario? /s
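
For anyone who wants to see the shape of the quoted scenario, here is a minimal Python sketch of "write a file through a memory mapping, then run it right away". It is only an illustration: it writes a tiny Python script rather than a PE file, the names `write_via_mmap` and `write_then_run` are made up for this example, and the explicit flush calls are a defensive step I am assuming is reasonable, not a confirmed workaround for the kernel bug.

```python
import mmap
import os
import subprocess
import sys
import tempfile


def write_via_mmap(path: str, payload: bytes) -> None:
    """Write a non-empty payload to path through a memory mapping."""
    with open(path, "w+b") as f:
        f.truncate(len(payload))              # a mapping needs a sized file
        with mmap.mmap(f.fileno(), len(payload)) as view:
            view[:] = payload                 # the write lands in mapped pages
            view.flush()                      # ask the OS to write the view back
        os.fsync(f.fileno())                  # flush the file's buffers as well


def write_then_run(payload: bytes) -> str:
    """Write a script via mmap, execute it immediately, return its stdout."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "generated.py")
        write_via_mmap(path, payload)
        result = subprocess.run(
            [sys.executable, path],
            capture_output=True,
            text=True,
            check=True,
        )
        return result.stdout.strip()


if __name__ == "__main__":
    print(write_then_run(b"print('ok')\n"))
```

On a healthy system this just works, which is the point of the quoted text: the failure needs very heavy disk I/O load on top of this pattern, so a small test like this would not be expected to reproduce it.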

Edited 2018-02-27 10:42 UTC

Score: 5

RE[2]: NT is still garbage
by tidux on Tue 27th Feb 2018 17:12 in reply to "RE: NT is still garbage"
tidux Member since:
2011-08-13

> Well, why wasn't there a unittest for this exact scenario? /s

You don't need a unit test, just a system design that doesn't constantly thrash disk like a retard. This architecturally can not happen on Linux.

Score: -1

RE[3]: NT is still garbage
by zlynx on Tue 27th Feb 2018 18:33 in reply to "RE[2]: NT is still garbage"
zlynx Member since:
2005-07-20

Pretty confident about Linux there, aren't you?

Haven't you read about or experienced Linux's enjoyable bugs with O_DIRECT, and mixing memory mapped with read() / write() IO? I think I recall some bugs with Linux AIO io_submit() too.
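
To make "mixing memory mapped with read() / write() IO" concrete, here is a small Python sketch of the pattern: the data goes in through a mapping and comes back out through an ordinary read() on the same file descriptor. The function name `mmap_write_then_read` is hypothetical, and this does not reproduce any of the old bugs; it only shows the two views of the file that are supposed to agree.

```python
import mmap
import os
import tempfile


def mmap_write_then_read(data: bytes) -> bytes:
    """Write non-empty data via a mapping, then read it back with read()."""
    fd, path = tempfile.mkstemp()
    try:
        os.ftruncate(fd, len(data))           # size the file before mapping it
        with mmap.mmap(fd, len(data)) as view:
            view[:] = data                    # write through the mapped view
            view.flush()                      # push the mapped pages back
        os.lseek(fd, 0, os.SEEK_SET)
        return os.read(fd, len(data))         # read through the regular I/O path
    finally:
        os.close(fd)
        os.unlink(path)


if __name__ == "__main__":
    print(mmap_write_then_read(b"same bytes, two views"))
```

When the two paths disagree about what is in the file, you get the kind of inconsistent view of IO described here.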

Sure, those were fixed. But at one point in time there were inconsistent views of IO, just like what this Windows bug sounds like.

Ooh, while Googling about I found another one about transparent huge pages and O_DIRECT causing screw-ups in Linux.

I like Linux, but don't put it on a pedestal.

Score: 6

RE[3]: NT is still garbage
by avgalen on Tue 27th Feb 2018 19:42 in reply to "RE[2]: NT is still garbage"
avgalen Member since:
2010-09-23

> You don't need a unit test

Maybe you didn't know it, but /s indicates sarcasm. Of course there wasn't a unit test for it, because the circumstances are way too extreme for a unit test.

> just a system design that doesn't constantly thrash disk like a retard.

It isn't a system design that thrashes the disk like a retard. The guy is compiling Chrome, which normally thrashes the entire system (under Linux as well). The bug in question only triggers "if the system is under very heavy disk I/O load".

> This architecturally can not happen on Linux.

Of course it can. There is nothing in the architecture of Linux that prevents one usermode process from taking up almost all of the system's resources, effectively blocking a second usermode process from performing well. Just like under Windows, this is the normal behavior, and as long as the OS is still capable of controlling both usermode processes they will both continue to run and do their work. There are certainly differences in how CPU, memory, I/O and caches are allocated, but those differences cannot guarantee that both programs will get enough resources.
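
If you want to check this on your own Unix-like machine, here is a rough Python sketch that prints the per-process soft limits the current process runs under. The function name `describe_limits` and the choice of three limits are mine, purely for illustration, and what it prints depends entirely on how your system is configured.

```python
import resource


def describe_limits() -> dict:
    """Return the soft limits for a few per-process resources."""
    limits = {
        "cpu_seconds": resource.RLIMIT_CPU,
        "address_space": resource.RLIMIT_AS,
        "open_files": resource.RLIMIT_NOFILE,
    }
    report = {}
    for label, which in limits.items():
        soft, _hard = resource.getrlimit(which)
        # RLIM_INFINITY means the kernel imposes no cap for this resource
        report[label] = "unlimited" if soft == resource.RLIM_INFINITY else soft
    return report


if __name__ == "__main__":
    for label, value in describe_limits().items():
        print(f"{label}: {value}")
```

Any entry that reports "unlimited" is a resource the process can keep consuming until the machine itself runs out.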

(here is a nice, although dated, architectural comparison with some scheduler characteristics: https://www.ukessays.com/essays/information-systems/compare-cpu-sche...)

Score: 4