Post a Comment
While it's a recent document, it remembers to those "what is new in 2.6" documents and it's missing a couple of feaures, ej. these days we have "Scheduler domains" (http://lwn.net/Articles/80911/) (which allows to take avdanzed optimized scheduling decisions ej: optimized the scheduler for multi-core core CPUs, or for power saving policies, and 2.6.18 will get a implementation of "smpnice" (http://lwn.net/Articles/186438/)
I believe the problems you described are not related with the Linux scheduler itself, but with the X Window System, which was designed to feature network transparency: the machine where application programs run can differ from the user's local machine.
This obviously introduces some added complexity and overhead, and in its current state is not as responsive as, let's say, Windows, whose "Window System" is integrated within the operating system itself.
I'd expect major improvements in this area in the next few years (months?), though. Now that Xorg has been modularized and eye candy and desktop acceleration (XGL/AIGLX) is getting mainstream, Xorg is due to get a major revamp. Afterall, its development rate hasn't been so spectacular probably since it was first created back in the 80's.
Edited 2006-07-05 18:47
You are correct in stating that X features network transparency. You are misinformed in that it does not introduce much overhead. X on the local machine communicates over a TCP socket which is blazing fast.
I think the user was letting loose a general gripe about "foo window toolkit is slow so that means X is slow."
Xorg has already "gotten" a major revamp and it was called X.org 7.0 which is fully modularized. This hard work to modularize X will allow developers to work on individual modules and make releases independent of the rest of X. This will speed up X development signifigantly. X.org 7.1 integrated AIGLX so exciting new eye candy will now be available to many people.
You are misinformed in that it does not introduce much overhead. X on the local machine communicates over a TCP socket which is blazing fast.
Actually, on systems that support it (which is virtually everything these days, except maybe something like BSD/OS), X uses shared memory instead of TCP sockets. Not really a nitpick, as local TCP communications still is pretty slow compared to well, almost anything else (local socket, e.g. /var/run/foo, or shared mem).
Toolkits can indeed be an issue. Any gtk+ app (the latest Inkscape build comes to mind) feels sluggish on my work XP computer, whereas fluxbox is blazingly fast on my home Arch box. Granted, I should probably check for an update to gtk for windows. But then again, I guess as long as it's not as easy as 'pacman -Syu', I won't bother...
But I believe the parent post also had issues with application start time. I don't think the GHz-count matters much when dealing with multi-MB web browsers, but a faster hard drive (think raptor, or SCSI if you're really serious about that) helps a lot.
GatoLoko, what are you talking about? Inkscape uses gtkmm. In case you didn't know, gtkmm is c++ bindings to gtk, not qt. Inkscape is a gnome app if you want to go by which toolkit it uses. If you don't believe me, look at the gtkmm code yourself:
http://inkscape.cvs.sourceforge.net/inkscape/inkscape_project/
Close, but no cigar. The speed issues aren't the result of complexity due to network transparency. They are due to two things:
1) Lack of proper alpha-blit acceleration for anti-aliased text;
2) The use of a seperate process for window management.
(1) is a driver-quality issue, and is not a problem on drivers like NVIDIA's that do at least try to accelerate special-cases of RENDER. (2) is where the complexity factors in. The communication between the WM and the X server is both highly elaborate, and prone to performance problems. For example, when the user resizes a window, you get a series of context switches between the WM, the X server, and the client app.
There is also the fact that Windows just plays a lot of shady tricks with scheduling to artificially boost response time. It gives the foreground window special priveleges just for being in front. These increase the feeling of responsiveness without actually making anything faster, but also make the system much less predictable under load.
There is also the fact that Windows just plays a lot of shady tricks with scheduling to artificially boost response time. It gives the foreground window special priveleges just for being in front. These increase the feeling of responsiveness without actually making anything faster, but also make the system much less predictable under load.
I wouldn't call it shady tricks.
windows is a desktop os.
and as such, optimising for the active application is a valid methode. If you don't like it, just disable this feature.
It is a shady trick, from a programming point of view. Just priority boosting the foreground process isn't a very precise optimization. It might make things smoother when things are going right, but it's the reason Windows grinds to a halt when the foreground process does something stupid. Today at work, Visual Studio went off the deep end while debugging an application, and the system went unresponsive for a minute or two before I could get to the task manager to shut it down. Linux or OS X never do that.
Linux or OS X never do that.
I wish. When Gnome in SLED locked up because 2 different apps were accessing a dir with 3k files over USB1, and Gnome became totally unresponsive for a minute or 2, I thought: man, Linux' thread scheduler sucks ass.
But apparently that's X' fault.
Nope, that's the Gnome programmers' fault.
Running through a directory with 3000 files is fast if you're only getting their names, but can become slow if you are also reading their ACLs, translating their UGIDs into names, and trying to identify their mime-types on top of that.
GUI apps *cannot* run this sort of tasks on UI callbacks, because that causes the UI to wait until the task is completed, thus becoming unresponsive. This is something that should go into another thread, while the main thread is free to continue responding to input events (button clicks and the like).
Did anything else besides Gnome locked up? Did X lock up? Could you switch to a text console or log into the machine through SSH? If so, it isn't the kernel's fault.
...has a significant lag when opening new applications.
scheduler really wouldn't have much to do with this- what you're complaining about is likely IO, initial load up of libs not in memory.
DE's and their apps have a fair amount of linkage- lib loadup (first time for 'em at least) is a nice hit waiting for the disk.
That said... without specifics (app, reproducing, etc), hard to comment (although again, scheduler likely has no real relevance to your beef).
As people already stated, the fault is not with the Linux kernel. The things you describe might be because of X, disabled DMA on disks, poor drivers and so on and so forth. Especially enabling DMA gives a nice overall boost to everything if it isn't enabled. Though, I don't know any recent distro which doesn't enable DMA by default..
A real time path is available from Ingo who wrote the 0(1) scheduler.
http://people.redhat.com/mingo/
It is being merged piece meal into the Linux kernel over a period of time. The patch set is already under heavy use though and rapid improvement is being made.
dr Con Kolivas' [1] Staircase scheduler [2] would deserve a bit more attention. It is an alternative to the build-in scheduler, with a different and much cleaner design (and less code) focussed on better (desktop) interactivity without less throughput. Development started in 2004, and it has been stable since some time. All benchmarks show it better or equal to the current scheduler in 2.6.
It has been in Andrew Morton's patchset for some time, but the kernel maintainers seem to rather patch their current scheduler with interactivity instead of using the inherently interactive staircase design.
[1] http://members.optusnet.com.au/ckolivas/kernel/
[2] http://kerneltrap.org/node/2744
The staircase scheduler is not in Andrew Morton's
patchset and (AFAIK) never has been.
What do you mean by inherently interactive? The
current scheduler looks to me like it tries to be
inherently interactive as well.
It just doesn't seem to be that big a deal any more.
There are lots of cool schedulers out there, but it
is quite likely that they would end up causing more
people problems than the current one, which has gone
through a huge amount of testing, if nothing else.
it sure has been in -mm (more than a year ago), but went away as it didn't seem it would get into the kernel.
inherently interactive means the interactivity is not a hack (the changing priorities) but is in the design, read about it - it's much cleaner.
The current scheduler is quite big and bloathed, and needs regular new hacks to cope with other usage scenario's. staircase needed way less tuning to accomodate to a wider range of use, and thus is much smaller. while still leading to a percievably better interactivity and equal or better throughput.
i think if it wasn't for 'political' instead of technical reasons, staircase whould've been in the linux kernel for quite some time.
it sure has been in -mm (more than a year ago),
but went away as it didn't seem it would get into the
kernel.
OK, but it isn't now, which was my point.
inherently interactive means the interactivity is not
a hack (the changing priorities) but is in the design,
read about it - it's much cleaner.
I've read not only about it, but the code itself. I
think the current scheduler is pretty ugly, but it is
still in the design (from the very beginning) to be
interactive.
The current scheduler is quite big and bloathed, and
needs regular new hacks to cope with other usage
scenario's. staircase needed way less tuning to
accomodate to a wider range of use, and thus is much
smaller. while still leading to a percievably better
interactivity and equal or better throughput.
Firstly, I don't think you can conclude anything about
interactivity based on the sample sizes and types of
tests in place.
Secondly, the current scheduler doesn't need regular
new hacks. It has probably changed no more frequently
over the past year than the staircase scheduler.
i think if it wasn't for 'political' instead of technical reasons, staircase whould've been in the linux kernel for quite some time.
There doesn't seem to be a technical reason to include
it. The kernel development model doesn't specify that
you include something unless it is proven bad. You
must include something only if it has been proven good.
>The staircase scheduler is not in Andrew Morton's
>patchset and (AFAIK) never has been.
>OK, but it isn't now, which was my point.
well, you said it never has been, so...
>There doesn't seem to be a technical reason to include
>it. The kernel development model doesn't specify that
>you include something unless it is proven bad. You
>must include something only if it has been proven good.
well, there aren't many interactivity tests, but staircase is better on those. and throughput, as i said, isn't worse. so tests say its better or even.
second, it does this with a smaller codebase, and as far as i can tell, cleaner design.
two good technical reasons to get it in, right?
>>OK, but it isn't now, which was my point.
>
>well, you said it never has been, so...
I said it is not and as far as I knew it hasn't been.
Which was correct at the time of writing.
Sorry, now I see I misread your post as saying that it
is *presently* in Andrew Morton's tree, which was what I
was trying to correct.
Anyway it doesn't matter.
>well, there aren't many interactivity tests, but
>staircase is better on those. and throughput, as i
>said, isn't worse. so tests say its better or
>even. second, it does this with a smaller codebase,
>and as far as i can tell, cleaner design.
Interactivity benchmarks don't aren't really
representative as far as I'm concerned. Before 2.6
was released, the scheduler was apparently awesome
according to all metrics and the types of workloads
that kernel developoers' workstations had. When it
actually started getting wide use, obscure problems
surfaced.
Same with throughput. It is hard to say that some
change doesn't hurt throughput if all those people
using 10GbE networking, or 500 CPU Altixes, or
10,000 disk enterprise databases haven't tested it.
>
>two good technical reasons to get it in, right?
I'm not say it is worse, nor that it isn't better.
But "let's put this huge change in, which doesn't
specifically address a bug, but hopefully isn't any
worse" isn't enough.
It is hard to judge a "clean design". Aside from the
mechanism (in which the two are pretty similar), the
whole thing is a big set of heuristics. One thing may
sound better than another, but that doesn't translate
to real world performance.
now i wouldn't say staircase should go in immediately, but i think (as i said) it deserves a bit more attention. it might not be a revolutionairy enhancement, but it definately seems better in the area of interactivity, and the smaller codesize might definately make it easier to maintain and hack on. as for the point that problems may surface, well, it has been in testing for some time, so i think that won't be a very big problem.
but indeed, it hasn't been tested much on very big and heavy stuff... well, a volunteer would help.
and again, i think it just deserves some attention.
If these new features in the Linux kernel is really "such a huge improvement", it also implies how crappy the earlier implementation was. Just think how many of these modules have been repeatedly rewritten in the last few years and boasted as the next big thing each time.
Even though Linux keeps getting new features none of them is the ultimate solution to the problem (such as ZFS of Solaris, scheduler activations of NetBSD, etc.)
So why bother? No reason to get excited. It'll probably be replaced in the next kernel version.
Your post is completely wrong, on every single count.
First, the O(1) scheduler is not new. The basic design was created in 2002. It'll be around in the next kernel version.
Second, the fact that its a huge improvement on the 2.4 scheduler does not imply the previous one was crappy. The previous one was just fine --- for systems with a moderate number of server tasks. Which is fine, because that was most of Linux's market in the late 1990s. Where the big improvement of the new scheduler comes in is on loads with thousands of threads, on highly parallel machines, or with interactive requiremen
Third, you obviously have no idea how software development works. There is no "ultimate solution" to any non-trivial problem, just various solutions with various properties. Remember Solaris's much-vaunted M:N scheduler based scheduler activations? Remember how it was quitely replaced by a 1:1 model similar to Linux's? How about Solaris's FireEngine TCP stack? Sun bill's it as a huge improvement, does that mean that the TCP stack in previous versions of Solaris is crappy?
Gradual improvement is what happens with software development. Different solutions have different performance, and experience bears out which solutions are best-suited for the loads your software encounters. Very often, earlier in the life of a piece of software, a simpler solution is implemented in order to speed time-to-market. As the software matures, pieces are replaced as experience shows such changes to be necessary.
Edited 2006-07-05 23:45
First, the O(1) scheduler is not new. The basic design was created in 2002. It'll be around in the next kernel version.
Actually it's been around since the 60s.
Second, the fact that its a huge improvement on the 2.4 scheduler does not imply the previous one was crappy.
In this instance it does. The O(1) scheduler is a perfect example of the way Linux developers come at problems without prior knowledge. The so called O(n) scheduler was a poor choice for any process scheduler, which is why you don't find it in commercial systems more recent than, oh, 1965.
This is a good example of why after more a decade of development with more than six thousand people involved the Linux kernel is barely equal to systems that are twenty years older, but with a lot fewer development hours: too little learning from the past, too much 'fix it when it breaks.'
Actually it's been around since the 60s.
Which scheduler I was referring to should be obvious based on the title of the article..
In this instance it does. The O(1) scheduler is a perfect example of the way Linux developers come at problems without prior knowledge.
Of course they don't approach it prior knowledge. This is the first system they've built. It's not like they've got a bunch of kernels under their belt. On the other hand, code complexity is also an issue. For most tasks Linux was used for in the 1990s, an O(1) scheduler wouldn't be appreciably faster, and would've increased complexity a lot.
The so called O(n) scheduler was a poor choice for any process scheduler, which is why you don't find it in commercial systems more recent than, oh, 1965.
And all those systems running with the 4.4BSD scheduler don't count as "commercial"?
This is a good example of why after more a decade of development with more than six thousand people involved the Linux kernel is barely equal to systems that are twenty years older, but with a lot fewer development hours:
That's not really a fair characterization. Yes, there are thousands of people involved with Linux, but the core team doesn't contain nearly that many people. Also, many of those people aren't full-time paid employees.
Oh, and Linux is about an order of magnitude bigger in scope than any of the older systems you're talking about. All those people working on Linux are working on different pieces. There's people doing drivers for all sorts of systems, people working on optimizing it for desktops, servers, supercomputers, PDAs, and embedded applications.
Solaris, IRIX, AIX, all are pretty much only good for servers. Load up Nextenta on an x86 box sometime. It's interactive scheduling performance is at the level of Linux 2.4...
too little learning from the past, too much 'fix it when it breaks.'
Another way to see it is "don't fix what isn't broken". Or "don't fix it until you need to." Or the classical, "premature optimization is the root of all evil."
"Actually it's been around since the 60s."
Which scheduler I was referring to should be obvious based on the title of the article..
Yes. It's Ingo's "O(1)" scheduler, which is identical to one I first saw described in DEC documentation for RSX11 back in 73.
Of course they don't approach it prior knowledge. This is the first system they've built. It's not like they've got a bunch of kernels under their belt.
What's wrong with reading and understanding the literature before you start reinventing the wheel?
And all those systems running with the 4.4BSD scheduler don't count as "commercial"?
4.4BSD did not use an O(n) scheduler. It used an O(C) scheduler, where C was then number of priority queues. Since C is constant at compile time, it was actualy an O(1) scheduler in Ingo's sense.
That's not really a fair characterization. Yes, there are thousands of people involved with Linux, but the core team doesn't contain nearly that many people. Also, many of those people aren't full-time paid employees.
There is no "core team." There are more than 1000 people paid full time to work on the Linux kernel. Check the professional attendence at OLS.
Or the classical, "premature optimization is the root of all evil."
Sigh. Selecting a constant time algorithm over an O(n) algorithm when the constant time algorithm is more than 20 years old and no more complex to implement is not "premature optimization".
And waiting 10 years to get around to fixing the O(n) algorithm is no less than 'fix it when it breaks.' And anyone who has seriously followed LKML for the last 6 years knows that 'wait until it breaks' is the first rule of Linux kernel developers, the second, apparently, being 'ignore the past'.
(By the way, I found IRIX worked very well on my SGI workstation back when I had one. You do realize that SGI was a workstation company before they got into servers, right?)
You fail to realise that Linux was not a commercial
system when the O(n) scheduler was designed. And
actually, for many workloads the 2.4 scheduler is
faster than the 2.6 scheduler due to less cacheline
footprint. And it was much simpler to understand.
As you say, O(1) scheduling algorithms aren't rocket
science. They are pretty much a 2nd year university
topic, and definitely some if not most kernel devs
would know how to do them. So when they found it was
needed, it was implemented.
There are many many places in the kernel where one
could use a theoretically better algorithm, that isn't
appropriate either due to complexity or lower order
factors. For example, pretty much anywhere that uses an
O(n) hash or an O(log(n)) rb tree could be using an
O(1) radix tree.
The secret to being a good programmer isn't by blindly
implementing everything with the lowest complexity
algorithms, but knowing what to apply and where.
This "Linux sucks because it used to have an O(n)" scheduler is tipical academentia mentality.
Academics tend to forget constants, and assume that an "O(1)" algorithm is better than an O(n) algorithm, case closed.
Well, it the so called "real world", some O(n) algorithms can actually beat O(1) algorithms up until very large values of "n", just because their constants are orders of magnitude smaller.
What's better to keep an array of 500 integers sorted and with low search times? A balanced tree or a simple, brute-force, insertion sort?
"You do realize that you're arguing against the approach that the Linux folk took, right?"
Yes, and no. Their approach was deemed the right one at the time (both simple and good enough), and suited Linux well until it didn't, and that's all that matters.
Good enginnering is all about choosing the most appropriate solution for your needs given the resources available, and knowing when that solution becomes inappropriate and work to fix it.
The O(n) scheduler was good enough, but Linux evolved in ways that forced a change, and they (whoever they were) changed it.
You people should just stop whining. Linux started as a hobby project, yes, people working on it were inexperienced, maybe. But it is eating the proprietary unixes' lunch, even though they were supposedly "better designed" from scratch. Well, it sucks to be them.
"By the way, neither btrees nor insertion sort are O(1)..."
Where did I claim they were? I was merely implying that balanced trees have a better algorithmic complexity (in the average and worst cases) than insertion sort, but up until a certain value of "n" balanced trees have enough overhead to actually make them perform worse. Again, the "good" solution is the one that provides better use of available resources, being good enough for the task at hand.
You fail to realise that Linux was not a commercial
system when the O(n) scheduler was designed.
Um, no. i know exactly what it was when the O(n) "scheduler" was designed: a hobby project undertaken by a young man with a great deal of attitude and very little expertise.
And actually, for many workloads the 2.4 scheduler is faster than the 2.6 scheduler due to less cacheline
footprint. And it was much simpler to understand.
Sorry, but that's simply not true. The 2.4 scheduler isn't any easier to understand or implement than the classic round robin scheduler and it wouldn't have a smaller cacheline footprint in either the i-cache or the d-cache. It would also tend to very quickly become slower than the classic algorithm because it would take more branches and execute more instructions.
The secret to being a good programmer isn't by blindly implementing everything with the lowest complexity algorithms, but knowing what to apply and where.
Yes. Unfortunately, the tendency in Linux is to implement the lowest complexity algorithm first and then find out what to apply later.
>Um, no. i know exactly what it was when the O(n)
>"scheduler" was designed: a hobby project undertaken by
>a young man with a great deal of attitude and very
>little expertise.
And yet you are rabidly arguing that it was a bad
choice based on the fact that you don't find it in
commercial systems.
You wrote:
The so called O(n) scheduler was a poor choice for any
process scheduler, which is why you don't find it in
commercial systems more recent than, oh, 1965.
>>And actually, for many workloads the 2.4 scheduler is
>>faster than the 2.6 scheduler due to less
>>cacheline footprint. And it was much simpler to
>>understand.
>
>Sorry, but that's simply not true.
Umm, yes, it simply is.
>The 2.4 scheduler >isn't any easier to understand or
>implement than the classic round robin scheduler and
That's a great strawman. The 2.6 scheduler is far from
a classic round robin scheduler.
>it wouldn't have a smaller cacheline footprint in
>either the i-cache or the d-cache.
Here's a clue for you though: the 2.4 scheduler is
faster at low runqueue counts, and has a smaller cache
footprint than the 2.6 scheduler.
>It would also tend to very quickly become
>slower than the classic algorithm because it would
>take more branches and execute more instructions.
"quickly" according to a theoretical analysis, maybe.
In practice few workloads have huge numbers of runnable
tasks on the runqueues. Which is obviously why it was
not replaced earlier.
Even now, the biggest benefit from the O(1) scheduler
*in practice* is the per-CPU locking and runqueues. Not
it's constant time complexity.
>Yes. Unfortunately, the tendency in Linux is to
>implement the lowest complexity algorithm first and
>then find out what to apply later.
I don't think you have any idea what the tendency in
Linux is.
Take the virtual->physical map infrastructure that
was added in 2.6. This was initially an O(1) operation
to find 1 physical mapping. This was eventually
replaced by a better design that takes O(log(m)), where
m is the total number of mappings in an address space.
The tendency in Linux is to judge the kinds of
workloads a given algorithm will need to handle, and
try to design something fast but simple. If someone
reports a failure case for their workload, it is
reevaluated.
Scheduler activations? http://tinyurl.com/kscuf
If you're going to schedule many threads on a large scalem SMP, you are eventually going to break down and implement M:N thread scheduling with a library level (user land) and a kernel level scheduler.
The reasons for this have been well known since the mid-80s, and were first documented by various people from Cray Research, based on work done on UniCos on the YM/P.
This is not the same as scheduler activations, but it provides a significant win in moderate-to-fine grain parallism on that class of machine.
Tell that to Sun Microsystems. This is one area, much like that of micro vs monolithic kernel, where what works in theory is very different from works in practice.
Who said anything about theory? The M:N scheduler work done at Cray was applied to real world workloads and showed significant performance improvements.
You will also note that I stipulated that M:N is appropriate for certain workloads. I don't claim, nor did the Cray people, that it was an appropriate scheduling approach for all problem sets.
The published papers on the Cray M:N scheduler do a good job of explaining why it works well and when to use it.
It seems the 2.6 kernel has many similarities with the Windows NT kernel (including XP). Both kernels group task priorities into "dynamic" and "realtime" classes (where "realtime" by no means provides any real guarantees in terms of WCET), they both give I/O intensive tasks priority boosting to provide better UI responsiveness, they're both preemtible to allow higher priority tasks to interrupt a running task, and so on.
There's no doubt the 2.6 kernel is an improvement over the old 2.4 kernel, but I can't say neither of them provide significantly better solutions to common scheduling problems than many other OS schedulers. Not saying they have to either, of course. 
The scheduling model you describe was first implemented in VMS, back in '78 or there about. It's not surprising that NT would use it, given the number of former VMS people working on NT.
Of course, it's a well known, and well understood process scheduling algorithm that really only needed processor affinity added to deal with SMP scheduling.




