Following the recent release of an anticipatory IO scheduler, Andrea Arcangeli started a lengthy thread in which he proposed an SFQ (Stochastic Fair Queuing) disk scheduler. The idea was picked up by Jens Axboe who had evidently worked on a similar idea earlier. Jens quickly posted two different disk schedulers utilizing “fair queuing” algorithms, more commonly used in handling network traffic. Read the full story at KernelTrap.
I would suggest that network models do not directly apply to disk scheduling.
In the network model, we have multiple flows which need to be switched out an interface. If we are in a congested situation it is desirable to ensure that any one flow does not completely subscribe the link.
This analogy can be taken to disk scheduling, where we have multiple IO operations which ‘congest’ the IO subsystem in that a queue develops with requests which have not been handled.
There are a few differences however:
1. In a network, we can throw away packets after the queue has reached maximum length. What happens in an IO subsystem when reads/writes are just thrown away? Increasing the delay in fully handling IO requests will result in increased queue depth, which means more IO requests waiting to be serviced, which results in further delay for each IO request to receive a time slice.
2. In a network, there is little to no extra cost in switching between flows. In IO scheduling, there is the additional seek time each time the scheduler switches requests. This increase in disk seek activity can exacerbate item #1.
I would suggest that the purpose of the IO scheduler is to drain the list of IO requests as fast as possible. I do agree that having a mechanism to prevent huge IO operations from monopolizing the system is beneficial. However, I think a simple Weighted Round Robin algorithm with large credit values would work. (Yes, I know WRR is a form of WFQ)
The main idea is not to get to granular, or you would run into an issue such as timeslices which are to small for SMP, more time spent context switching then processing, or in this case, more time moving drive heads than writing/reading data.
– Kelson
Yeah, okay, I’m not exactly an expert of disk scheduling, but I’m certainly open to being schooled on the subject.
I see your point and I guess you’re right. I came across those problems by the time I started creating the disk I/O subsystem for an OS I’ve been developing.
The goal of that patch is not to have a better throughput but a more reactive system.
– Paolo