Linked by Thom Holwerda on Wed 5th Jul 2006 17:09 UTC, submitted by IdaAshley
Linux "The Linux kernel continues to evolve, incorporating new technologies and gaining in reliability, scalability, and performance. One of the most important features of the 2.6 kernel is a scheduler implemented by Ingo Molnar. This scheduler is dynamic, supports load-balancing, and operates in constant time -- O(1). This article explores these attributes of the Linux 2.6 scheduler, and more."
Thread beginning with comment 141177
To view parent comment, click here.
To read all comments associated with this story, please click here.
RE[15]: Linux development model
by Cloudy on Fri 7th Jul 2006 20:46 UTC in reply to "RE[14]: Linux development model"
Cloudy
Member since:
2006-02-15

Because *that* is how you do an O(1) scheduler for
a UNIX like system. Not a stupid round robin scheduler
like you were trying to suggest for 2.0!


The classic round robin scheduler I am discussing (the one I mentioned being documented in RSX documentation, very early in this discussion) is an N priority queue round-robin-within-queue scheduler.

It is the one in which no compute-bound penalty is charged but threads which block rather than having a quantum expire are switched to the 'highest priority' queue and those which have a quantum expire are switched to their 'oriignal priority' queue. It does not feature 'real time' priorities. It does feature scheduler pools because it wasn't designed for an SMP in which separate processor pools make sense. The only optimization it uses is a head pointer that remembers which is the highest priority pool that is not empty.

It is as simple to understand and implement as the "O(n) scheduler" implemented by Linus in 1991, and there is no workload with more than a few processes on the run queue for which it is slower, or has a larger cache footprint (either i or d) than the O(n) queue.

OK, mr pedant. It uses dual arrays of 140 queues. Per CPU. Happy?

I'm always happy. You're still wrong, but I'm always happy. You really should go read kern/sched.c more carefully.

"Use a round robin scheduler" is utterly naive and
wrong. *That* would have been a bad engineering choice.


On that we agree. Unfortunately, you keep using my shorthand against me, as if you can't deal with elipses. When I first mentioned the scheduler that I would have used rather than Linus' O(n) scheduler, I mentioned where it was documented. Throughout this discussion, I've called it various things, since we are just having a conversation on a web forum, and I thought you were following my context. But, for the entire time, I have meant the scheduler I described in the first paragraph of this post. It is neither a pure/simple round robin scheduler, nor is it the 2.6 scheduler. It is a better engineering choice for the sort of system that Linus proposed in '91 than the scheduler he opted for.


1. tell me why a round robin scheduler would have been
better than the O(n) scheduler in terms of general
purpose performance (not context switching performance)


See above. The scheduler I meant when I said "classic round robin" is better than"Linus O(n)" because it schedules faster and otherwise has similar properties.


2. tell me how you can design a priority queue based
scheduler with at least 40 priorities (ie. a simple
UNIX model) will be faster than an O(n) model in
"most interactive applications"


Using the queue head optimization described above means that a priority queue based scheduler always knows without searching which process to run. It runs the one the head pointer is pointing to. If you put the idle process in a queue, you don't even have to check for an empty runqueue.

The cost of the optimization is a single test whenever a process is inserted onto a run queue. The win is that no searching is ever done at schedule time.

The interactivity is achieved by placing processes that are preempted on the tail of the queue for their priority when they were selected and placing processes that become runable after blocking on the tail of the highest priority queue when they become runable.

Once again, the cost is a single test, this time at queue insertion time. (It's really two tests, but they happen in different places in the state machine, so only one is ever required: the preempter places a process at the tail of its priority queue, an act which unblocks a thread places the thread at the tail of the highest priority queue.)

Scheduling time becomes fixed O(1) with a very small value for K, and independent of both n and C. (C, you will recall, is the number of priority queues.) Similarly, process insertion is O(1) with small K and independent of n and C.

There is one weakness to this design compared to basic Unix process scheduling on a uniprocessor, and I'll leave it to you to identify it and explain what the basic Unix process scheduler does differently.

If you set C to 2, by the way, the problem mentioned above disappears, and it is better than the O(N) scheduler in all respects for the workload Linus expected in 1991.

Reply Parent Bookmark Score: 2

nick Member since:
2006-04-17

>The classic round robin scheduler I am discussing (the
>one I mentioned being documented in RSX documentation,
>very early in this discussion) is an N priority queue
>round-robin-within-queue scheduler.

Sorry, that isn't the classic round robin scheduler.

>>OK, mr pedant. It uses dual arrays of 140 queues. Per
>>CPU. Happy?
>
>I'm always happy. You're still wrong, but I'm always
>happy. You really should go read kern/sched.c more
>carefully.

No, you're wrong. What do you think it uses?

>Using the queue head optimization described above
>means that a priority queue based scheduler always
>knows without searching which process to run. It runs
>the one the head pointer is pointing to. If you put
>the idle process in a queue, you don't even have to
>check for an empty runqueue.
>
>The cost of the optimization is a single test whenever
>a process is inserted onto a run queue. The win is
>that no searching is ever done at schedule time.
>
>The interactivity is achieved by placing processes
>that are preempted on the tail of the queue for their
>priority when they were selected and placing processes
>that become runable after blocking on the tail of
>the highest priority queue when they become runable.
>
>Once again, the cost is a single test, this time at
>queue insertion time. (It's really two tests, but they
>happen in different places in the state machine, so
>only one is ever required: the preempter places a
>process at the tail of its priority queue, an act
>which unblocks a thread places the thread at the tail
>of the highest priority queue.)
>
>Scheduling time becomes fixed O(1) with a very small
>value for K, and independent of both n and C. (C,
>you will recall, is the number of priority queues.)
>Similarly, process insertion is O(1) with small K and
>independent of n and C.
>
>There is one weakness to this design compared to basic
>Unix process scheduling on a uniprocessor, and I'll
>leave it to you to identify it and explain what the
>basic Unix process scheduler does differently.

Wait, before you get carried away, you haven't showed
me a working solution yet. This solution has a far
larger cache footprint and will be slower for a small
number of tasks.

The fact that there are a large number of queues means
there will be a large cache footprint. I can't quite
understand exactly what you're describing... but what
happens when you have run the full quantum of your
"queue head" and want to find the next process to run?
You need to search the priority queues, right?

Maybe you have a list of "next to run" processes,
which means you don't have to search. Well then what
happens when you insert a new process into a priority
with no other processes in it? You have to search.

And suppose you have a single process running. OK, so
no searching, right? Maybe so, but with a dynamic
priority it is going to be wandering around the
priority queues. Cache footprint.

You still haven't shown why it is better than the O(n)
scheduler in all ways, because it quite likely would
be slower due to larger cache footprint at n=small.

Reply Parent Bookmark Score: 1

Cloudy Member since:
2006-02-15

This solution has a far larger cache footprint and will be slower for a small number of tasks.

Um, given that you point out below that you don't understand the solution, how, precisely, can you claim this about it?

The fact that there are a large number of queues means there will be a large cache footprint.

Actually, no. The memory footprint is slightly larger than that for Linus' scheduler, but is made up for by fewer branch delays, and by fewer memory fetches on average.

I can't quite understand exactly what you're describing... but what happens when you have run the full quantum of your "queue head" and want to find the next process to run?

You run the process that's pointed to by the "highest priority non-empty queue" pointer you update every time you add a process to a run queue.

You need to search the priority queues, right?

Nope. You always know which is the highest priority queue with a runable process in it, because you always set that whenever you add a process on a run queue, or delete the last active process from a run queue.

There is no searching done at schedule time. There is a single pointer dereference to select the process pointed to by the 'highest priority non-empty queue" pointer.

Maybe you have a list of "next to run" processes,
which means you don't have to search. Well then what
happens when you insert a new process into a priority
with no other processes in it?


If it's a higher priority queue than the current higest priority non-empty queue, you set the 'highest priority non-empty queue" pointer to point to it. If it's not, you do nothing.

And suppose you have a single process running. OK, so no searching, right? Maybe so, but with a dynamic
priority it is going to be wandering around the
priority queues. Cache footprint.


This algorithm has no dynamic priorities. (That's what "no compute bound penalty" is about.) However, it is possible to use dynamic priorities in a way that doesn't cause the only runable process in the system to wander around. Now that would be an optimization I wouldn't have expected from Linus in '91.


You still haven't shown why it is better than the O(n)
scheduler in all ways, because it quite likely would
be slower due to larger cache footprint at n=small.


It's not going to have a larger cache footprint. It's going to have a single additional data structure, in the form of a one dimensional array, that is accessed on very rare occassions, and is less likely to have a large cache foot print than the linked list structure used by the Linus' original algorithm.

Don't confuse memory footprint with cache footprint.

Here's some guarenteed-not-to-compile pseudocode that might help. It's for the uniprocessor case, and I left out synchronization, the implementation of 'run', which doesn't modify any of the queues, and the details of the structures... (Ignore the crappy lack of indentation. stupid web forum software eats spaces ;) )

#define MAX_PRI /* SOME VALUE */
#define PRI_IO MAX_PRI

static runqueue_head_t *rq[MAX_PRI];
static runqueue_head_t *current = rq[0];

static uint maxpri = 0;

/* the idle process is always runable and always on the priority 0 queueu. ie, rq[0] is never empty */

/* Quantum expiration handler */
void expire(void) {
runqueue_entry_t e = current->head;
/* If there's more than one entry on the runqueue */
if (current->head != current->tail) {
remove_from_head(current);
add_to_tail(e, rq[e->pri]);
if (e->pri > maxpri) maxprio = e->pri;
}
run(current->head);
}

/* Any activity that unblocks a blocked task calls this
* e isn't on any runqueue at this point */
void unblock(runqueue_entry_t *e) {
add_to_tail(e, rq[PRI_IO]);
maxpri = PRI_IO;
run(current->head);
}

/* Any activity that blocks a running task calls this
* (killing the task counts as 'blocking')
*/
void block() {
remove_from_head(current);
if (!current->head) {
/* If you figure out what is missing here,
you understand the algorithm. */
}
run(current->head);
}

void add_to_tail(runqueue_entry_t *e, runque_t *rqp) {
if (rqp->head)
rqp->tail->next = e;
else
rqp->head = e;
rqp->tail = e;
e->next = 0;
}

void remove_from_head(runque_t *rqp) {
if (rqp->head)
rqp->head = rqp->head->next;
}

Reply Parent Bookmark Score: 1