Linked by Thom Holwerda on Mon 6th Aug 2007 15:22 UTC, submitted by lomax
Linux "I took a little while to learn more about SD and CFS to see what the linux guys were up to. I have a couple of interesting comments. Including some discussion of increased algorithm complexity in the CFS scheduler; it's no longer O(1). Please keep in mind that these comments are entirely from reading code and documentation. I make no claim of knowing which scheduler actually makes the best scheduling decisions." Jeff Roberson (jeffr), the developer of the new ULE3.0 scheduler for FreeBSD 7, wrote down some insights about the Linux SD and CFS schedulers.
ULE is great
by jackson on Mon 6th Aug 2007 15:41 UTC
jackson
Member since:
2005-06-29

I don't understand half the stuff these kernel hackers talk about, but I have been running -CURRENT for a while now, and since the ULE scheduler updates hit in mid-July it has been awesome. The responsiveness, especially during buildworld or buildkernel, is unbelievable. Great job, jeffr.

Reply Score: 9

Not O(1)?
by diegocg on Mon 6th Aug 2007 15:47 UTC
diegocg
Member since:
2005-07-08

Not according to its author, AFAIK...

Edit: Oh, he meant insertion/removal of tasks in the rbtree. That's true, and here's the explanation: http://lkml.org/lkml/2007/4/13/252



> I'm not in love with the current or other schedulers, so I'm
> indifferent to this change. However, I was reviewing your release
> notes and the patch and found myself wonder what the logarithmic
> complexity of this new scheduler is .. I assumed it would also be
> constant time , but the __enqueue_task_fair doesn't appear to be
> constant time (rbtree insert complexity).. [...]

i've been worried about that myself and i've done extensive measurements
before choosing this implementation. The rbtree turned out to be a quite
compact data structure: we get it quite cheaply as part of the task
structure cachemisses - which have to be touched anyway. For 1000 tasks
it's a loop of ~10 - that's still very fast and bound in practice.

here's a test i did under CFS. Lets take some ridiculous load: 1000
infinite loop tasks running at SCHED_BATCH on a single CPU (all inserted
into the same rbtree), and lets run lat_ctx:

neptune:~/l> uptime
22:51:23 up 8 min, 2 users, load average: 713.06, 254.64, 91.51

neptune:~/l> ./lat_ctx -s 0 2
"size=0k ovr=1.61
2 1.41

lets stop the 1000 tasks and only have ~2 tasks in the runqueue:

neptune:~/l> ./lat_ctx -s 0 2

"size=0k ovr=1.70
2 1.16

so the overhead is 0.25 usecs. Considering the load (1000 tasks trash
the cache like crazy already), this is more than acceptable.

Ingo


Edited 2007-08-06 15:59

Reply Score: 7

RE: Not O(1)?
by anevilyak on Mon 6th Aug 2007 15:52 UTC in reply to "Not O(1)?"
anevilyak Member since:
2005-09-14

To be fair, he only indicates that insertion/removal of threads to/from the run queues is O(log n) now, which would be correct if it is in fact using a red-black tree (I haven't looked at CFS's code to verify that). This doesn't actually indicate whether or not the process of selecting the next thread to run is also O(log n), and I don't think Jeff ever implied that it was. For instance, if the process of selecting the next thread to run is something like picking the topmost node of the tree, scheduling itself would still be O(1).
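
To make that concrete, here's a minimal user-space sketch of the idea (my own toy code, not the actual CFS implementation, and using a plain unbalanced search tree instead of a red-black tree): tasks are keyed on a runtime value and the leftmost node is cached, so picking the next task stays O(1) even though insertion costs O(tree height), which is O(log n) once the tree is kept balanced.

/* Toy sketch: search tree of tasks keyed on a virtual runtime, with the
 * leftmost (smallest-key) node cached so that pick_next is O(1).
 * A real scheduler would use a balanced red-black tree. */
#include <stdio.h>
#include <stdlib.h>

struct task {
    unsigned long vruntime;          /* key: smaller means "run me sooner" */
    struct task *left, *right;
};

struct runqueue {
    struct task *root;
    struct task *leftmost;           /* cached minimum */
};

static void rq_insert(struct runqueue *rq, struct task *t)
{
    struct task **link = &rq->root;
    int is_leftmost = 1;

    t->left = t->right = NULL;
    while (*link) {
        if (t->vruntime < (*link)->vruntime) {
            link = &(*link)->left;
        } else {
            link = &(*link)->right;
            is_leftmost = 0;         /* went right at least once */
        }
    }
    *link = t;
    if (is_leftmost)
        rq->leftmost = t;            /* keep the cache current */
}

static struct task *rq_pick_next(struct runqueue *rq)
{
    return rq->leftmost;             /* O(1): no tree walk */
}

int main(void)
{
    struct runqueue rq = { NULL, NULL };
    unsigned long keys[] = { 40, 10, 30, 20 };

    for (int i = 0; i < 4; i++) {
        struct task *t = calloc(1, sizeof(*t));
        t->vruntime = keys[i];
        rq_insert(&rq, t);
    }
    printf("next task has vruntime %lu\n", rq_pick_next(&rq)->vruntime);
    return 0;
}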

Edited 2007-08-06 15:53

Reply Score: 7

RE[2]: Not O(1)?
by martinus on Mon 6th Aug 2007 20:03 UTC in reply to "RE: Not O(1)?"
martinus Member since:
2005-07-06

You are right that picking the topmost node can be O(1), but as soon as you have to insert/remove elements too (which he has to), the overall complexity will be O(log n).

Reply Score: 1

RE[3]: Not O(1)?
by pseudocode on Tue 7th Aug 2007 09:46 UTC in reply to "RE[2]: Not O(1)?"
pseudocode Member since:
2007-05-30

O(1), O(log n)... who really cares? As long as the complexity is not exponential, it's OK.

Adding/removing threads is an exceptional event from the kernel's point of view. Most of the time, the kernel deals with existing threads: sleep, wake-up, wait for I/O, run, switch, ...

If the rate of added/removed threads is nearly the same as the rate of context switches, there's something wrong with your system. :-)

Reply Score: 1

RE: Not O(1)?
by smitty on Mon 6th Aug 2007 19:31 UTC in reply to "Not O(1)?"
smitty Member since:
2005-10-13

Another response by Ingo:

(disclaimer, i'm the main author of CFS.)

I'd like to point out that CFS is O(1) too.

With current PID limits the worst-case depth of the rbtree is ~15 [and O(15) == O(1), so execution time has a clear upper bound]. Even with a theoretical system that can have 4 million tasks running at once (!), the rbtree depth would have a maximum of ~20-21.

The "O(1) scheduler" that CFS replaces is O(140) [== O(1)] in theory. (in practice the "number of steps" it takes to schedule is much lower than that, on most platforms.)

So the new scheduler is O(1) too (with a worst-case "number of steps" of 15, if you happen to have 32 thousand tasks running at once(!)), and the main difference is not in the O(1)-ness but in the behavior of the scheduler.
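
(A quick sanity check on those depth figures: a perfectly balanced binary tree holding 32,768 tasks is log2(32768) = 15 levels deep, and one holding about four million tasks is roughly log2(4,194,304) = 22 levels deep; a red-black tree stays within a small constant factor of the balanced height.)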

Reply Score: 6

RE[2]: Not O(1)?
by FunkyELF on Mon 6th Aug 2007 21:11 UTC in reply to "RE: Not O(1)?"
FunkyELF Member since:
2006-07-26

This kind of stuff really ticks me off.

O(log(n)) does not equal O(1) for small values of n!!!

That would be like saying: okay, it's really O(n^2), but since n has an upper limit of 32,000 the worst case is O(1,024,000,000), which is still O(1).

EVERY algorithm has limits...that doesn't make them constant time algorithms!

Reply Score: 4

RE[3]: Not O(1)?
by smitty on Mon 6th Aug 2007 22:56 UTC in reply to "RE[2]: Not O(1)?"
smitty Member since:
2005-10-13

I agree, but Ingo has a point. The reason we define algorithms as O(1) or O(n) is to give an idea of what their real-world performance will be like. In this instance, because n will always be very small, the O(log(n)) scheduler can actually outperform the O(1) scheduler, which is not what you would expect, since when talking about algorithms we usually assume n is approaching infinity.

So while it is technically an O(log(n)) algorithm, in practice it outperforms the O(1) algorithm. Therefore, even describing it as O(log(n)), while correct, is misleading if you are only trying to get a handle on how it actually performs.
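
(To put rough numbers on it, using Ingo's own figures from above: if the old scheduler's fixed cost is bounded by something like 140 steps and the rbtree walk costs on the order of log2(n) steps, the walk is at most about 15 steps for 32,768 tasks, so within the range of n the kernel actually allows, the "worse" asymptotic bound never catches up.)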

Reply Score: 4

RE[4]: Not O(1)?
by Doc Pain on Mon 6th Aug 2007 23:26 UTC in reply to "RE[3]: Not O(1)?"
Doc Pain Member since:
2006-10-08

"The reason we define algorithms as O(1) or 0(n) is to give an idea of what their real-world performance will be like."

To be honest... no. Characterizing algorithms by an O value is meant to show how resource consumption (time needed, memory required, load created, etc.) grows with the amount of work requested (items inserted, input data given, etc.).

"In this instance, due to the fact that n will always be very small, the O(log(n)) scheduler can actually outperform the O(1) scheduler which is not what you would expect since when talking about algorithms we usually assume n is approaching infinity.

So while it is technically a O(log(n)) algorithm, in practice it outperforms the O(1) algorithm. Therefore, even describing it as O(log(n)), while correct, is misleading if you are only trying to get a handle on how it actually performs."


This is correct, as long as you can assume certain values or value ranges for n. As someone else replied, even O(n^2) or O(x^n) may look "good" - if n is small enough. :-)

Remember: the O notation does not take specific values of n into account.

Reply Score: 5

RE[5]: Not O(1)?
by smitty on Tue 7th Aug 2007 02:39 UTC in reply to "RE[4]: Not O(1)?"
smitty Member since:
2005-10-13

Yes, I'm well aware of everything you said. I guess what I should have said was that the O value for this algorithm is completely meaningless, since for the possible values of n there is little difference.

Theoretical analysts and people writing general-purpose algorithms are certainly going to be interested in O values, but when I said we use it to get an idea of real-world performance, what I meant was that developers using an algorithm really only care about how it applies to their particular uses.

Reply Score: 2

RE[5]: Not O(1)?
by PlatformAgnostic on Tue 7th Aug 2007 03:29 UTC in reply to "RE[4]: Not O(1)?"
PlatformAgnostic Member since:
2006-01-02

O notation ignores constant factors and constant addends, so you could always be below the crossover point for a less efficient algorithm. There are probabilistic effects too. Did you know that qsort is technically O(n^2) in the worst case? Nevertheless, we use it over merge sort, which is O(n log n) in all cases, because it uses less memory and is faster on average thanks to the constant factors.

Also in some C runtimes (I've seen this in the MSVCRT), the sorts for the small subproblems of qsort are implemented as selection sorts because those have better cache and pipeline behavior for small data sets.
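
To make the small-subproblem trick concrete, here's a rough sketch of the idea (my own illustration, not MSVCRT's actual code; I use an insertion sort for the small case, and the cutoff of 16 elements is an arbitrary choice): once a partition drops below a fixed size, a simple quadratic sort with tiny constants finishes the job faster than further recursion would.

#include <stdio.h>
#include <stdlib.h>

#define CUTOFF 16   /* below this size, fall back to a simple quadratic sort */

static void insertion_sort(int *a, int lo, int hi)
{
    for (int i = lo + 1; i <= hi; i++) {
        int v = a[i], j = i - 1;
        while (j >= lo && a[j] > v) {
            a[j + 1] = a[j];
            j--;
        }
        a[j + 1] = v;
    }
}

static void quicksort(int *a, int lo, int hi)
{
    if (hi - lo + 1 <= CUTOFF) {
        insertion_sort(a, lo, hi);   /* tiny subarray: constants beat asymptotics */
        return;
    }
    int pivot = a[lo + (hi - lo) / 2];
    int i = lo, j = hi;
    while (i <= j) {                 /* Hoare-style partition around the pivot value */
        while (a[i] < pivot) i++;
        while (a[j] > pivot) j--;
        if (i <= j) {
            int tmp = a[i]; a[i] = a[j]; a[j] = tmp;
            i++;
            j--;
        }
    }
    if (lo < j) quicksort(a, lo, j);
    if (i < hi) quicksort(a, i, hi);
}

int main(void)
{
    int a[100];
    for (int i = 0; i < 100; i++)
        a[i] = rand() % 1000;
    quicksort(a, 0, 99);
    for (int i = 0; i < 100; i++)
        printf("%d%c", a[i], i == 99 ? '\n' : ' ');
    return 0;
}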

Reply Score: 2

RE[6]: Not O(1)?
by smitty on Tue 7th Aug 2007 04:04 UTC in reply to "RE[5]: Not O(1)?"
smitty Member since:
2005-10-13

Big O values are always the worst case. There are a couple of other Greek symbols I've forgotten that are used for the best-case and average-case values. I remember the average case being quite a bit more difficult to calculate.

The entire notation system isn't meant to be precise; it's meant to give a very rough estimate of how an algorithm scales to large values. Thus, an algorithm that is O(50000*n) still reduces to O(n) (linear), because for a sufficiently large value of n it will clearly separate itself from constant, logarithmic, quadratic, or exponential algorithms. Of course, it might make sense to use an O(n^2) algorithm if you knew n would never get very large in your application, and there could be plenty of other linear algorithms that are far superior in practice.

Edited 2007-08-07 04:05

Reply Score: 4

RE[7]: Not O(1)?
by IamScared on Tue 7th Aug 2007 14:44 UTC in reply to "RE[6]: Not O(1)?"
IamScared Member since:
2007-01-11

Big O values are not always given for the worst case; you can say that InsertionSort is O(n) in the best case, for example. But, for the given circumstances, Big O notation gives an upper bound on the complexity of the algorithm.

The other symbols are Theta(), Omega(), o(), and little-omega().

Reply Score: 1

RE[6]: Not O(1)?
by yorthen on Fri 10th Aug 2007 10:00 UTC in reply to "RE[5]: Not O(1)?"
yorthen Member since:
2005-07-06

Also in some C runtimes (I've seen this in the MSVCRT), the sorts for the small subproblems of qsort are implemented as selection sorts because those have better cache and pipeline behavior for small data sets.


Are you sure you didn't mean heap sort (selection sort is O(n^2))? There's an algorithm called introspective sort that starts out as quicksort but then switches to heap sort; its worst case is O(n log n), and it's comparable to quicksort on all inputs.
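
For what it's worth, here's a rough sketch of that introspective-sort idea (my own simplified illustration, not any particular library's implementation): recurse as quicksort, but track the recursion depth, and once it exceeds roughly 2*log2(n), hand the remaining subarray to heapsort, which caps the worst case at O(n log n).

#include <stdio.h>
#include <stdlib.h>

/* sift the element at 'root' down through a max-heap of size n */
static void sift_down(int *a, int root, int n)
{
    while (2 * root + 1 < n) {
        int child = 2 * root + 1;
        if (child + 1 < n && a[child + 1] > a[child])
            child++;
        if (a[root] >= a[child])
            return;
        int tmp = a[root]; a[root] = a[child]; a[child] = tmp;
        root = child;
    }
}

static void heap_sort(int *a, int n)
{
    for (int i = n / 2 - 1; i >= 0; i--)     /* build the heap */
        sift_down(a, i, n);
    for (int i = n - 1; i > 0; i--) {        /* repeatedly extract the max */
        int tmp = a[0]; a[0] = a[i]; a[i] = tmp;
        sift_down(a, 0, i);
    }
}

static void introsort_rec(int *a, int n, int depth)
{
    if (n <= 1)
        return;
    if (depth == 0) {                        /* recursion too deep: fall back to heapsort */
        heap_sort(a, n);
        return;
    }
    int pivot = a[n / 2], i = 0, j = n - 1;
    while (i <= j) {                         /* Hoare-style partition */
        while (a[i] < pivot) i++;
        while (a[j] > pivot) j--;
        if (i <= j) {
            int tmp = a[i]; a[i] = a[j]; a[j] = tmp;
            i++;
            j--;
        }
    }
    introsort_rec(a, j + 1, depth - 1);
    introsort_rec(a + i, n - i, depth - 1);
}

static void introsort(int *a, int n)
{
    int depth = 0;
    for (int m = n; m > 1; m >>= 1)
        depth += 2;                          /* depth limit of ~2*log2(n) */
    introsort_rec(a, n, depth);
}

int main(void)
{
    int a[50];
    for (int i = 0; i < 50; i++)
        a[i] = rand() % 100;
    introsort(a, 50);
    for (int i = 0; i < 50; i++)
        printf("%d%c", a[i], i == 49 ? '\n' : ' ');
    return 0;
}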

Reply Score: 1

RE[4]: Not O(1)?
by g2devi on Tue 7th Aug 2007 16:24 UTC in reply to "RE[3]: Not O(1)?"
g2devi Member since:
2005-07-09

Actually, O(log(n)) might outperform O(1) even for large n, or even for every physically possible n. By definition, O() discards constants and lower-order terms as insignificant. But in the real world, if the setup time for the O(1) scheduler is 1 day and for the O(n^2) scheduler it is zero, it doesn't matter that O(n^2) has horrible scaling; it's still better than the O(1) scheduler for virtually any server or desktop out there.


As far as the ULE3.0 versus CFS issue is concerned, I have no opinion, since I have no knowledge (I haven't even tested them out). But I personally welcome the critique. If it's invalid, it'll be easily refuted, but if it's valid, it'll highlight a place that needs fixing.

Reply Score: 2

multi cores
by netpython on Mon 6th Aug 2007 15:51 UTC
netpython
Member since:
2005-07-06

I wonder if FreeBSD supports Core 2 Duo, X2, or Core 2 Quad processors.

Reply Score: 1

RE: multi cores
by Dr_J on Mon 6th Aug 2007 17:47 UTC in reply to "multi cores"
Dr_J Member since:
2005-07-06

I wonder if FreeBSD supports Core 2 Duo, X2, or Core 2 Quad processors.

Yes, including dual quad-core systems. Why would it not? It has supported SMP for a very long time, and the recent Intel and AMD chipsets are supported well.

Edited 2007-08-06 17:48

Reply Score: 12

RE[2]: multi cores
by Oliver on Mon 6th Aug 2007 19:18 UTC in reply to "multi cores"
Oliver Member since:
2006-07-15

Lol, this is the funniest question I've seen since I started reading OSNews :o)

Reply Score: 3

better toolchain?
by OStourist on Mon 6th Aug 2007 15:53 UTC
OStourist
Member since:
2007-06-19

I heard FreeBSD 7 uses GCC 4-something. I wonder if that will help with all the broken applications (like yammi). I am on DesktopBSD, which uses 6.2, and on average it's no better or worse than Linux for desktop responsiveness (but it is better in other ways).

Reply Score: 2

RE: better toolchain?
by jackson on Mon 6th Aug 2007 16:00 UTC in reply to "better toolchain?"
jackson Member since:
2005-06-29

That sounds like a ports issue to me, not really a toolchain one. Have you filed a PR against yammi? I have not tried yammi, but I have not had any issues with ports whatsoever, as long as you read /usr/ports/UPDATING.

Like I posted above, I am using 7.0-CURRENT and have not encountered any major issues in ports. I can even get my game face on with UT, Quake3, Tremulous, etc. I am really, really looking forward to the official release of FreeBSD 7.0. FreeBSD rocks.

Reply Score: 3

RE[2]: better toolchain?
by OStourist on Mon 6th Aug 2007 16:20 UTC in reply to "RE: better toolchain?"
OStourist Member since:
2007-06-19

I should have explained better. I like Amarok, but I want to try ALL the possibilities Linux has, like Exaile and Songbird, Totem for movies, and VLC. So far most have been good, but I occasionally get a compile error during the build. This might be due to the original software being compiled with GCC 4 and not GCC 3, since GCC 4 is the standard on Linux now. I'm not sure if reading /usr/ports/UPDATING will help if the port you install is new. Agreed, FreeBSD rocks, and I will try to use it after my last "update" to Fedora Linux left me with a broken gdb and an unusable Gnome.

Reply Score: 1

RE[3]: better toolchain?
by netpython on Mon 6th Aug 2007 17:08 UTC in reply to "RE[2]: better toolchain?"
netpython Member since:
2005-07-06

update" to Fedora Linux left me with a broken gdb and unusable Gnome

The're more linux distros you know :p
Though FreeBSD is a very fine OS.

Reply Score: 3

RE[3]: better toolchain?
by Doc Pain on Mon 6th Aug 2007 23:29 UTC in reply to "RE[2]: better toolchain?"
Doc Pain Member since:
2006-10-08

"I'm not sure if reading /usr/ports/UPDATING will help if the port you install is new."

Have you tried installing a precompiled package (pkg_add -r)? In most cases you don't need to compile your applications; the precompiled packages are usually quite up to date. Compiling from source is, in most cases, only needed on slower machines where you want to achieve some speed gain through optimization, or if you need special options set at compile time (e.g. mplayer).

Reply Score: 2

RE: better toolchain?
by kkenn on Mon 6th Aug 2007 22:32 UTC in reply to "better toolchain?"
kkenn Member since:
2007-08-06

gcc 4 does not change performance for ordinary applications, since most applications are not CPU bound.

If you are having problems with a particular application, please file a bug report in the usual way.

Reply Score: 2

Never mind theory
by Kebabbert on Tue 7th Aug 2007 16:00 UTC
Kebabbert
Member since:
2007-07-27

If the constants are small and the input is bounded (here: 32000), you can set theory aside and choose the algorithm that works best for those inputs. Small constants are more important than a fancy, complicated algorithm with lower asymptotic complexity.

Ergo: never mind whether it is O(1) or O(log n); for input this small, Ingo is right. In practical work, simulations (testing) would be a better way to choose the best scheduler than theory alone.

Reply Score: 2