To read all comments associated with this story, please click here.
Roman's scheduler uses the same rbtree runqueue as CFS. This are O(1) for retrieving the leftmost task in the runqueue, which is the hot path. The insertion and removal operations are O(log N). This is the best algorithmic complexity that can be achieved in a fine-grained priority queue without consuming gobs of memory and trashing cache performance.
The runqueues used in the O(1) scheduler are arrays of linked lists of tasks. Indexing an array and inserting/removing from the head or tail of a linked list (as implemented in Linux) are both O(1). Computing the index is O(1) with the number of tasks but O(N) with the number of priorities, which is fixed at compile-time.
If the number of priorities is small, like the 100 levels in the O(1) scheduler by default, then this data structure is fine. But if the priority is based on something more continuous, like the amount of time due to a task, then an array is a very bad choice.
For one thing, an array that expresses the full resolution of time values would exhaust virtual memory all by itself. Even a rougher approximation of time would result in a large, sparse array with horrible cache performance.
The most efficient structure known to computer scientists for representing a collection sorted by a numerical key of considerable range is a balanced tree such as RB, AVL, or (for a sufficiently broad definition of "balanced") splay trees.
The O(log N) operations are a classic performance vs. accuracy tradeoff. Furthermore, algorithmic complexity doesn't correlate directly with performance. Depending on cache hotness, inserting a task into an rbtree can be significantly faster than for a priority array.
Finally, O(log N) is effectively constant across the (fairly narrow) typical range for tasks per CPU. This is one of the many advantages of per-CPU runqueues.






Member since:
2006-01-06
This whole scheduler debacle has done nothing but make me wonder WTH is going on?
I recall that although ingo's scheduler uses and RB-tree it's claimed to be O(1) because xxxK jobs only require 15 (whatever number) this. At that point it was evident that B$ needed to be called.
Seems there is lots of politics going on. Probably more than should be.
Maybe kick all the bums out and have a contest on who can write a nice clean clear scheduler with not a lot of code or hack-about mumbo jumbo with snowballing to try to justify it. Maybe Linus needs to smack all these "my code/your code" guys and get back to a culture of "this is the code, it all sucks, some parts sucks less than other parts, get over it"
At least this scheduler is clear and very proveable. Easy to understand and easy to prove is good. Elegant design generally runs better than hacked stuff. The biggest problem always is finding the best "elegant design".
Edited 2007-08-31 18:32 UTC