Why the Design of the Kernel Scheduler is Critical

Eugenia Loli 2003-04-18 Linux 17 Comments

“Two of the most critical parts of a kernel are the memory subsystem and the scheduler. This is because they influence the design and affect the performance of almost every other part of the kernel and the OS. That is also why you would want to get them absolutely right and optimize their performance. The Linux kernel is used everywhere from small embedded devices up to large mainframes. Designing a scheduler is at best a black art.” Read the article at LinuxGazette.

About The Author

Eugenia Loli

Ex-programmer, ex-editor in chief at OSNews.com, now a visual artist/filmmaker.

Follow me on Twitter @EugeniaLoli

17 Comments

2003-04-18 2:59 am

Anonymous
So why has it been a few hours and no one has posted a comment about an interesting article like this?

Thanks for posting this Eugenia. I will comment again once I clear my head after work, and can figure out something to say.
2003-04-18 3:38 am

Anonymous
but not much to talk about. It would be interesting to see the kernel support different pluggable schedulers.
2003-04-18 4:10 am

Anonymous
I have read meatier articles on the subject, I think, maybe interviews with Ingo and kernel ML discussions. I’d like to see how the FreeBSD scheduler, for instance, compares to the one in Linux. Any other approaches, in fact, say Windows, AIX, or VMS.

Scheduling doesn’t seem to have a one-size-fits-all solution, and perhaps we are getting to a point where the heuristics don’t have enough to go on if we want to keep improving. Perhaps processes and shells should be able to provide more hints beyond nice level to the kernel scheduler: am I definitely an interactive task? will I ever need to have the processor’s full attention? will I ever block on anything?

That being said, I read the article but not the source code. Maybe that will prove enlightening.
2003-04-18 5:08 am

Anonymous
While I can’t attest to the feasibility of the solution, I definitely agree that a more dynamic scheduling mechanism would be a major achievement in terms of OS design. While it’s great that the user can specify a process’s priority level, a truly dynamic scheduler could drastically improve both a system’s perceived response time and its actual throughput.
2003-04-18 5:34 am

Anonymous
Actually, I have a question about schedulers. In Red Hat 9 (2.4.20, I think, though no doubt they’ve tweaked it) the scheduler is pretty smooth, but I’ve noticed a strange jerkiness (try, for example, running two “really slick screensavers” as windows, or ico). One will run for a fraction of a second, then the other. But not quite: neither will completely stop. Is this a scheduler artifact? I am hoping that the scheduler changes in 2.5 (and eventually 2.6) will make for smoother animations and multimedia, like BeOS’s, but I don’t know enough about BeOS’s scheduler to know what aspect of it was responsible for that wonderfully smooth behavior. Can someone here explain?

Erik
2003-04-18 5:37 am

Anonymous
one of the toughest things to get right is good interactive feel during heavy system load. While playing with various scheduler variants i found that the best interactive feel is achieved not by ‘boosting’ interactive tasks, but by ‘punishing’ tasks that want to use more CPU time than there is available. This method is also much easier to do in an O(1) fashion.

Does this mean less dependence on nice?
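The quoted “punish the CPU hogs” idea can be illustrated with a toy run queue. This is a minimal sketch under stated assumptions, not the actual 2.5 kernel code: `Task`, `PrioArray`, `NUM_PRIO`, `MAX_PENALTY`, `run_ticks` and `sleep_ticks` are hypothetical names, and the real O(1) scheduler also keeps separate active/expired arrays and per-task timeslices, which are omitted here. The point is twofold: a task’s effective priority is only ever demoted in proportion to the CPU share it recently used, and picking the next task costs one bitmap scan over a fixed number of priority levels, no matter how many tasks are runnable.

```python
from collections import deque

NUM_PRIO = 40      # hypothetical: one level per nice value, 0 = highest priority
MAX_PENALTY = 5    # hypothetical: worst-case demotion for a CPU hog


class Task:
    def __init__(self, name, static_prio, run_ticks=0, sleep_ticks=0):
        self.name = name
        self.static_prio = static_prio  # derived from the nice level
        self.run_ticks = run_ticks      # ticks spent running in the recent window
        self.sleep_ticks = sleep_ticks  # ticks spent blocked in the recent window

    def effective_prio(self):
        """Punish (never boost) in proportion to the CPU share used recently."""
        total = self.run_ticks + self.sleep_ticks
        penalty = MAX_PENALTY * self.run_ticks // total if total else 0
        return min(self.static_prio + penalty, NUM_PRIO - 1)


class PrioArray:
    def __init__(self):
        self.bitmap = 0  # bit p set => queues[p] is non-empty
        self.queues = [deque() for _ in range(NUM_PRIO)]

    def enqueue(self, task):
        p = task.effective_prio()
        self.queues[p].append(task)
        self.bitmap |= 1 << p

    def pick_next(self):
        """Cost depends on the number of priority levels, not on the number of tasks."""
        if not self.bitmap:
            return None
        p = (self.bitmap & -self.bitmap).bit_length() - 1  # index of lowest set bit
        task = self.queues[p].popleft()
        if not self.queues[p]:
            self.bitmap &= ~(1 << p)  # queue drained: clear its bit
        return task


rq = PrioArray()
rq.enqueue(Task("compiler", 20, run_ticks=95, sleep_ticks=5))  # CPU hog
rq.enqueue(Task("editor", 20, run_ticks=5, sleep_ticks=95))    # mostly waiting for input
print([t.name for t in iter(rq.pick_next, None)])  # ['editor', 'compiler']
```

Both tasks share the same nice level, yet the compiler is demoted by 4 levels while the editor keeps its static priority, so the editor runs first. In this model nice still sets the baseline; the penalty only adjusts around it.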
2003-04-18 5:51 am

Anonymous
The thing is that some demands on the scheduler are mutually exclusive. Most notably, throughput and responsiveness don’t go easily together.

For our own OS project, the idea is to provide “scheduler modes” that cater for the distinct demands. While it’s pretty common to use an OS for a server on one machine (throughput) and for a desktop on another (responsiveness), it’s rather uncommon that the user wants the very same OS on the very same machine to be a server *and* a desktop at the same time…
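The tension between the two modes can be made concrete with some back-of-the-envelope arithmetic. The sketch below assumes plain round-robin scheduling in which every task uses its full slice, and models two hypothetical modes that differ only in timeslice length. `SchedulerMode`, `switch_overhead`, `worst_case_wait_ms`, the 10-microsecond switch cost and the 100 ms / 5 ms slices are all illustrative assumptions, not measurements from any OS, and cache/TLB refill costs are ignored.

```python
from dataclasses import dataclass

SWITCH_COST_MS = 0.01  # hypothetical direct cost of one context switch (10 microseconds)


@dataclass(frozen=True)
class SchedulerMode:
    name: str
    timeslice_ms: float  # hypothetical quantum length for this mode


def switch_overhead(mode):
    """Fraction of CPU time spent switching when every slice is used up."""
    return SWITCH_COST_MS / (mode.timeslice_ms + SWITCH_COST_MS)


def worst_case_wait_ms(mode, runnable):
    """Longest a task waits to run again under round-robin with `runnable` tasks:
    the other tasks each run one full slice, with one switch before each of them
    and one final switch back to the waiting task."""
    return (runnable - 1) * mode.timeslice_ms + runnable * SWITCH_COST_MS


SERVER = SchedulerMode("server", timeslice_ms=100.0)   # favor throughput
DESKTOP = SchedulerMode("desktop", timeslice_ms=5.0)   # favor responsiveness

for mode in (SERVER, DESKTOP):
    print(f"{mode.name}: overhead {switch_overhead(mode):.4%}, "
          f"worst wait with 10 tasks {worst_case_wait_ms(mode, 10):.1f} ms")
```

With ten runnable tasks, the server mode loses only about 0.01% of CPU time to switching but can leave a task waiting about 900 ms, while the desktop mode pays about 0.2% to cap the wait near 45 ms. Neither setting wins on both counts, which is the argument for letting the user pick a mode.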

As for being a black art… our German-speaking guests here might want to check out the books “Systemsoftware” (Nehmer / Sturm) and “Betriebssysteme” (Brause), which cover the topic rather well. The English dudes will have to settle for Tanenbaum’s classic… 😉
2003-04-18 6:58 am

Anonymous
Does Linux’s task manager now support true LWPs, or are they still essentially shared-memory HWPs (i.e. each one takes up a process table entry)?

I have a friend who was trying out RH9 with NPTL, and ps still lists every thread as having its own PID (and consequently its own process table entry).

So is it still implemented with _clone()?

I really know nothing about NPTL, except that it appears to be a patch to glibc. Does one need to do anything special to use it, such as link against something other than -lpthread?

Consequently it would seem that creating/destroying a thread in Linux still requires adding or removing process table entries…

I’d like to see how the FreeBSD scheduler, for instance, compares to the one in Linux.

Unfortunately this can’t be profiled (properly) yet, as FreeBSD hasn’t added support for KSEs to its POSIX threads library.
2003-04-18 11:02 am

Anonymous
Well, I don’t know how the scheduler works inside, but I’ve been using 2.5.67 for a while and I have to say that it is much more responsive than the 2.4 series: no more mouse freezes in X while compiling…
2003-04-18 11:12 am

Anonymous
Can you imagine that? The 2.6 kernel will be available to give some serious competition to the very first AT&T Bell Labs Unix. The good old days of the PDP-7 and PDP-11, with card readers and punches…

That is certainly what the Linux trolls mean when they “advertise”. Linux is the _only_ official OS that can warp straight to the past ;-)))

That said, nobody needs this feature for day-to-day use of a computer.
2003-04-18 3:19 pm

Anonymous
Hmm, I haven’t looked at the internals of the BeOS scheduler, but as far as I know BeOS makes heavy use of multithreaded programming. It depends on how the threading package is set up (are threads implemented in kernel space or user space?), but if threads are implemented in a user-space library, then that library could be written to be multimedia-aware, making things like playing movies much more responsive.

Threads can also be less expensive than processes in terms of context switches: a switch to another process can be costly because, on top of reloading the CPU registers (which a thread switch also requires), you have to switch to that process’s VM page tables (typically flushing the TLB), etc. So that might be part of the difference.
2003-04-18 4:25 pm

Anonymous
Since IBM is deep into Linux now, I wonder if they’ll be taking any lessons-learned from MVS performance management into this space. I haven’t worked on MVS in a long while but I remember that the performance management capabilities were very highly developed. For example, it was possible to give more or less CPU to specific users and even for specific workload types (batch or interactive) for that user. Tweaking PM configuration parameters was complex enough that in some cases it was a full time position.
2003-04-18 5:39 pm

Anonymous
Yeah, it’s interesting and all, but you would have thought they would proofread it before putting it up.
2003-04-18 9:19 pm

Anonymous
Since IBM is deep into Linux now, I wonder if they’ll be taking any lessons-learned from MVS performance management into this space.

Undoubtedly they have. But your mind seems to be on the technical side, while IBM has translated this into a somewhat more sales-oriented view.

I haven’t worked on MVS in a long while

So, to be up to date, please speak about z/OS, or at least OS/390. Don’t worry, they are only MVS renamed ;-))) Well, and enhanced. In particular, VM is now integrated.

Here is the strong point.

As with Java, a heavy resource eater, IBM really has a great passion for Linux: another b*** if you only consider performance. In case this is not clear, you need very good hardware and very good system engineers to make this stuff work. Well, yes, IBM mainly sells hardware and consulting, but believe me, that’s pure coincidence ;-)))

So now, IBM has two ways to improve poor Linux on the z/machines:

1/ improve the Linux kernel (this poor thing without even a decent threading capability)

2/ run lots of Linux instances at the same time on one machine, to get the effect of a big cluster without the hassle of cluster networking. I beg your pardon? Yes, z/OS is MVS with the VM hypervisor integrated. Really, this is pure coincidence…

But if you are truly gullible, you may wait for some tweak to the Linux kernel ;-)))
2003-04-18 10:05 pm

Anonymous
1/ improve the Linux kernel (this poor thing without even a decent threading capability)

This is pretty much what I was referring to. Just looking at the article, it occurred to me that if this is state-of-the-art for scheduling in Linux, it’s miles behind what IBM already had in MVS more than a dozen years ago. IBM took performance very seriously because that was the only way to make big iron work big back then. They probably have lots of algorithms and methodologies for things like process and I/O queueing, memory management, etc. that could be applied to Linux.

BTW, Xavier, have you worked on a mainframe recently? Would you know what I’d be referring to by SRM and RMF? I don’t even know if they’re still called that now.
2003-04-18 11:01 pm

Anonymous
This is pretty much what I was referring to. Just looking at the article, it occurred to me that if this is state-of-the-art for scheduling in Linux, it’s miles behind what IBM already had in MVS more than a dozen years ago. IBM took performance very seriously because that was the only way to make big iron work big back then.

Right

They probably have lots of algorithms and methodologies for things like process and I/O queueing, memory management, etc. that could be applied to Linux.

This could be interesting, no doubt about that. One weakness: compatibility. IBM mainframe hardware is not really affordable, and even less standard.

The other weakness of this love affair is whether it is in IBM’s interest to give away something _really_ useful that works against their own business.

BTW, Xavier, have you worked on a mainframe recently?

Four years ago: a migration from SNA to native IP. Not really recent.

Would you know what I’d be referring to by SRM

WLM/SRM, the workload manager (System Resources Manager), or SRM, a Storage Resource Manager? Which one do you prefer? ;-)))

and RMF?

A tool to visualize and act on performance parameters.

But Linux has a very strong performance tool too: SHUTDOWN. It releases 100% of the workload immediately ;-)))
2003-04-18 11:42 pm

Anonymous
And why not, indeed? I’m sure SMP/E could give RPM a run for its money.

This could be interesting, no doubt about that. One weakness: compatibility. IBM mainframe hardware is not really affordable, and even less standard.

I would think the performance management techniques and algorithms would be applicable. For example, the use of queueing theory principles (which the SRM did use) for scheduling would probably be applicable to managing workloads, in general.

The other weakness of this love affair is whether it is in IBM’s interest to give away something _really_ useful that works against their own business.

It wouldn’t necessarily be against their business. There could be ways to package the technology so that it would favor IBM products and services. Some of the capabilities of MVS and VM come from the IBM hardware and having Linux avail itself of them as well can’t hurt. On the other hand, imagine if Win2K Server suddenly had SRM via Linux. That would definitely hurt!