Linked by Thom Holwerda on Mon 10th Jul 2017 18:27 UTC
Windows

This story begins, as they so often do, when I noticed that my machine was behaving poorly. My Windows 10 work machine has 24 cores (48 hyper-threads) and they were 50% idle. It has 64 GB of RAM and that was less than half used. It has a fast SSD that was mostly idle. And yet, as I moved the mouse around it kept hitching - sometimes locking up for seconds at a time.

So I did what I always do - I grabbed an ETW trace and analyzed it. The result was the discovery of a serious process-destruction performance bug in Windows 10.

Great story.

Thread beginning with comment 646553
RE[2]: just terrible...
by Carewolf on Mon 10th Jul 2017 19:52 UTC in reply to "RE: just terrible..."
Carewolf Member since:
2005-09-08

I often run make -j100 on my Linux machine. The compile jobs are distributed, but it still creates even more processes than the -j48 they used in this example. Windows just had a nasty performance regression in process teardown here.
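
For concreteness, here is a minimal sketch of what a make -jN run boils down to at the process level, assuming Linux (the job count, the -j limit, and the use of "true" as a stand-in compile job are all illustrative):

    /* Sketch: keep up to max_jobs children in flight, each a short-lived
     * process, roughly what make -j100 does under the hood. */
    #include <stdio.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void)
    {
        const int total_jobs = 400;   /* hypothetical number of compile jobs */
        const int max_jobs   = 100;   /* the -j limit, as in make -j100 */
        int running = 0;

        for (int i = 0; i < total_jobs; i++) {
            if (running == max_jobs) {          /* wait for a slot to free up */
                wait(NULL);
                running--;
            }
            pid_t pid = fork();
            if (pid < 0) { perror("fork"); return 1; }
            if (pid == 0) {
                execlp("true", "true", (char *)NULL);  /* stand-in for a real compile job */
                _exit(127);
            }
            running++;
        }
        while (wait(NULL) > 0)                  /* reap the remaining jobs */
            ;
        return 0;
    }

Each job is created, runs briefly, and is torn down, which is exactly the code path the linked article found to be slow on Windows 10.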

Reply Parent Score: 6

RE[3]: just terrible...
by tidux on Tue 11th Jul 2017 04:48 in reply to "RE[2]: just terrible..."
tidux Member since:
2011-08-13

Windows's process spawn/destroy has been stupidly expensive for decades. That's what led to "shove everything into as few processes as possible" as the default model for complex Windows applications, which in turn degrades the value of memory protection and makes security that much harder to practice. The Unix tradition of lots of tiny processes and cheap IPC is simply a better fit for SMP, since 500 single threaded processes will happily scale out to dozens of CPU cores without their developers needing to make them "multicore aware."
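
As a rough sketch of that model, assuming Linux (the worker count, job count, and pipe-based job distribution are all illustrative, not a real design): several single-threaded worker processes, each fed work over a pipe, with the kernel free to schedule them across whatever cores are available.

    #include <stdio.h>
    #include <sys/wait.h>
    #include <unistd.h>

    #define NWORKERS 4                            /* illustrative; scale up as needed */

    int main(void)
    {
        int pipes[NWORKERS][2];

        for (int i = 0; i < NWORKERS; i++) {
            if (pipe(pipes[i]) < 0) { perror("pipe"); return 1; }
            pid_t pid = fork();
            if (pid < 0) { perror("fork"); return 1; }
            if (pid == 0) {                       /* worker: single-threaded */
                for (int j = 0; j <= i; j++)      /* keep only our own read end */
                    close(pipes[j][1]);
                int n;
                while (read(pipes[i][0], &n, sizeof n) == (ssize_t)sizeof n)
                    printf("worker %d handled job %d\n", i, n);
                _exit(0);
            }
            close(pipes[i][0]);                   /* parent keeps the write end */
        }

        for (int job = 0; job < 20; job++)        /* hand out jobs round-robin */
            write(pipes[job % NWORKERS][1], &job, sizeof job);

        for (int i = 0; i < NWORKERS; i++)
            close(pipes[i][1]);                   /* EOF lets each worker exit */
        while (wait(NULL) > 0)
            ;
        return 0;
    }

No worker knows or cares how many cores exist; the scheduler spreads them out on its own.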

Reply Parent Score: 6

RE[4]: just terrible...
by Megol on Tue 11th Jul 2017 11:44 in reply to "RE[3]: just terrible..."
Megol Member since:
2011-04-11

"Windows's process spawn/destroy has been stupidly expensive for decades. That's what led to "shove everything into as few processes as possible" as the default model for complex Windows applications, which in turn degrades the value of memory protection and makes security that much harder to practice. The Unix tradition of lots of tiny processes and cheap IPC is simply a better fit for SMP, since 500 single threaded processes will happily scale out to dozens of CPU cores without their developers needing to make them "multicore aware.""


What? There's no Unix tradition like the one you speak of. And Unix (er... POSIX) IPC being cheap? Only if we're talking about the poorly designed interface - there's a reason high-performance IPC is usually done with custom, non-standard libraries.

Windows processes are more heavyweight than e.g. Linux processes, but that's not the reason threads are used on POSIX systems, nor the reason threads scale better than processes on any (reasonable) operating system. The protection a process provides will cost more than operating-system-supported threads, and even more than application-specific threading.
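
One way to put rough numbers on that is a micro-benchmark along these lines, assuming Linux and pthreads (my sketch, not Megol's; the iteration count is arbitrary, and it measures only spawn/teardown, not the memory-protection costs mentioned above). Build with -pthread.

    #include <pthread.h>
    #include <stdio.h>
    #include <sys/wait.h>
    #include <time.h>
    #include <unistd.h>

    static void *thread_fn(void *arg) { return arg; }   /* no-op thread body */

    static double elapsed_ms(struct timespec a, struct timespec b)
    {
        return (b.tv_sec - a.tv_sec) * 1e3 + (b.tv_nsec - a.tv_nsec) / 1e6;
    }

    int main(void)
    {
        const int iters = 1000;                  /* arbitrary iteration count */
        struct timespec t0, t1;

        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (int i = 0; i < iters; i++) {        /* create and join a no-op thread */
            pthread_t tid;
            pthread_create(&tid, NULL, thread_fn, NULL);
            pthread_join(tid, NULL);
        }
        clock_gettime(CLOCK_MONOTONIC, &t1);
        printf("threads:   %.2f ms for %d create/join cycles\n",
               elapsed_ms(t0, t1), iters);

        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (int i = 0; i < iters; i++) {        /* fork and reap a no-op process */
            pid_t pid = fork();
            if (pid == 0)
                _exit(0);
            waitpid(pid, NULL, 0);
        }
        clock_gettime(CLOCK_MONOTONIC, &t1);
        printf("processes: %.2f ms for %d fork/wait cycles\n",
               elapsed_ms(t0, t1), iters);
        return 0;
    }

On most systems the thread loop should come out ahead, though the exact ratio varies by kernel and hardware.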
--
It would benefit us all if you read up on operating system design before posting bullshit. I'm f***ing fed up with opinionated people posting things that are obviously wrong to anyone who knows even a tiny bit about the area. Worse than my being irritated is that people may believe what are essentially lies.

Reply Parent Score: 2

RE[4]: just terrible...
by Alfman on Tue 11th Jul 2017 17:04 in reply to "RE[3]: just terrible..."
Alfman Member since:
2011-01-28

tidux,

"The Unix tradition of lots of tiny processes and cheap IPC is simply a better fit for SMP, since 500 single threaded processes will happily scale out to dozens of CPU cores without their developers needing to make them "multicore aware.""


While it's true that you can parallelize code paths across many cores by using many processes, it's also true that naively forking child processes scales very poorly due to the overhead.

Take a look at Apache's MPM workers: in practice, the overhead of spawning processes is a performance killer. That overhead is what leads to preforking processes before they're needed, caching them, and reusing them instead of giving each client a clean process. It works, but it adds complexity that proponents of the multi-process model often overlook.
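
A minimal sketch of that prefork pattern, assuming Linux sockets (this is not Apache's code; the port, pool size, and canned response are made up, and error handling is trimmed): the parent forks a fixed pool of workers once, and each worker then serves many clients from the shared listening socket instead of being spawned per connection.

    #include <netinet/in.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <sys/wait.h>
    #include <unistd.h>

    #define NWORKERS 8                            /* illustrative pool size */

    static void worker(int listen_fd)
    {
        for (;;) {                                /* one worker, many clients */
            int client = accept(listen_fd, NULL, NULL);
            if (client < 0)
                continue;
            const char *msg = "HTTP/1.0 200 OK\r\nContent-Length: 3\r\n\r\nok\n";
            write(client, msg, strlen(msg));      /* canned response, no parsing */
            close(client);
        }
    }

    int main(void)
    {
        int listen_fd = socket(AF_INET, SOCK_STREAM, 0);
        struct sockaddr_in addr = { .sin_family = AF_INET,
                                    .sin_port = htons(8080),   /* example port */
                                    .sin_addr.s_addr = htonl(INADDR_ANY) };
        int one = 1;
        setsockopt(listen_fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof one);
        if (bind(listen_fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
            listen(listen_fd, 128) < 0) {
            perror("bind/listen");
            return 1;
        }

        for (int i = 0; i < NWORKERS; i++)        /* prefork the worker pool */
            if (fork() == 0) {
                worker(listen_fd);                /* never returns */
                _exit(0);
            }
        while (wait(NULL) > 0)                    /* parent just supervises */
            ;
        return 0;
    }

The complexity mentioned above shows up as soon as the pool has to grow, shrink, and recycle workers, which this sketch ignores.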

Also, it still doesn't scale that well, and the memory overhead can be especially onerous. On the Apache servers I manage, I've had to significantly cut down the number of processes Apache is allowed to spawn to keep the Linux OOM killer at bay and to avoid swap thrashing.

You can always buy a beefier server, but switching to an asynchronous daemon can be just as effective: it can handle hundreds of clients per core simultaneously without the overhead of hundreds of processes.
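
For contrast, a minimal sketch of such an asynchronous daemon, assuming Linux and epoll (the port is made up, the protocol is a trivial echo, and error handling is trimmed): one process, one event loop, many concurrent clients, no per-client process or thread.

    #include <netinet/in.h>
    #include <sys/epoll.h>
    #include <sys/socket.h>
    #include <unistd.h>

    int main(void)
    {
        int listen_fd = socket(AF_INET, SOCK_STREAM, 0);
        struct sockaddr_in addr = { .sin_family = AF_INET,
                                    .sin_port = htons(8080),   /* example port */
                                    .sin_addr.s_addr = htonl(INADDR_ANY) };
        int one = 1;
        setsockopt(listen_fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof one);
        bind(listen_fd, (struct sockaddr *)&addr, sizeof addr);
        listen(listen_fd, 128);

        int ep = epoll_create1(0);
        struct epoll_event ev = { .events = EPOLLIN, .data.fd = listen_fd };
        epoll_ctl(ep, EPOLL_CTL_ADD, listen_fd, &ev);

        for (;;) {
            struct epoll_event events[64];
            int n = epoll_wait(ep, events, 64, -1);
            for (int i = 0; i < n; i++) {
                int fd = events[i].data.fd;
                if (fd == listen_fd) {             /* new connection */
                    int client = accept(listen_fd, NULL, NULL);
                    struct epoll_event cev = { .events = EPOLLIN, .data.fd = client };
                    epoll_ctl(ep, EPOLL_CTL_ADD, client, &cev);
                } else {                           /* readable client */
                    char buf[4096];
                    ssize_t r = read(fd, buf, sizeof buf);
                    if (r <= 0) {                  /* closed or error */
                        epoll_ctl(ep, EPOLL_CTL_DEL, fd, NULL);
                        close(fd);
                    } else {
                        write(fd, buf, r);         /* trivial echo back */
                    }
                }
            }
        }
    }

Scaling that across cores then usually means running one such loop per core, not spawning hundreds of processes.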

Reply Parent Score: 2