Linked by Hadrien Grasland on Thu 19th May 2011 21:31 UTC
Hardware, Embedded Systems
Having read the feedback resulting from my previous post on interrupts (itself resulting from an earlier OSnews Asks item on the subject), I've had a look at the way interrupts work on PowerPC v2.02, SPARC v9, Alpha and IA-64 (Itanium), and am contributing this back to anyone who's interested (or willing to report any blatant flaw found in my posts). I've also tried to rework my interrupt handling model a bit to make it significantly clearer and have it read more like a design doc and less like a code draft.
RE[4]: Pop-up threads
by Alfman on Sun 22nd May 2011 19:18 UTC in reply to "RE[3]: Pop-up threads"

Neolander,

"If by performance benefits you mention the fact that async has only one task running at the time and as such doesn't have to care about synchronization and that pending tasks cost is kept minimal, then yes this model may provide that."

The thing is, you are going to have async and threaded drivers running side by side. This implies that even the async drivers will need to use synchronization when accessing shared resources.

Consider the following example:

Two userspace apps read from two different files. In the threaded model, this results in two user+kernel threads blocking in the file system driver, which in turn blocks in the block device driver.

Ignoring all the synchronization needed in the FS driver (and for writes), the block driver (or the cache handler) must, or at least should, create a mutex around the blocks being read, so that other threads requesting the same blocks are blocked until the read completes. I think such a structure would require at least two mutexes: one for the specific blocks being read, and another for the structure itself.

Therefore, a thread reading a cached block would only need to synchronize against one mutex, find its data, and return immediately.

A thread reading an uncached block would synchronize against the structure mutex, latch onto a new or existing block read mutex, queue the disk request (if new) and release the structure mutex.

This way the structure mutex is only ever held momentarily, while the read mutex is held until the disk has read the block, after which all blocked read threads can resume.
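To make the two-mutex scheme concrete, here is a minimal sketch in Python, assuming a user-space model rather than real kernel code. `BlockCache` and the `read_block_from_disk` hook are hypothetical names. A `threading.Event` stands in for the per-block "read mutex" (threads wait on it until the block arrives), the first requester performs the read itself instead of queuing it, and there is no eviction or error handling.

```python
import threading

class BlockCache:
    """Two-level locking: a structure lock held only briefly, plus a
    per-block Event standing in for the "block read mutex"."""

    def __init__(self, read_block_from_disk):
        self._structure_lock = threading.Lock()  # guards _cached and _pending
        self._cached = {}    # block number -> data already read
        self._pending = {}   # block number -> Event for a read in progress
        self._read_block_from_disk = read_block_from_disk  # hypothetical disk hook

    def read(self, block_no):
        with self._structure_lock:
            if block_no in self._cached:
                # Cached block: one lock acquisition, then return immediately.
                return self._cached[block_no]
            event = self._pending.get(block_no)
            is_new = event is None
            if is_new:
                # First requester: register the read so later callers latch onto it.
                event = threading.Event()
                self._pending[block_no] = event
        # The structure lock is released here, before any disk I/O.

        if not is_new:
            # Another thread is already reading this block: wait for it.
            event.wait()
            with self._structure_lock:
                return self._cached[block_no]

        data = self._read_block_from_disk(block_no)
        with self._structure_lock:
            self._cached[block_no] = data
            del self._pending[block_no]
        event.set()  # wake every thread blocked on this block
        return data
```

Because the "pending" and "cached" states are swapped atomically under the structure lock, concurrent readers of the same uncached block trigger exactly one disk read.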

Is this more or less what you'll be writing?


Now, my point about async drivers is that they need zero synchronization. It's true that requests to such a driver are forced to be serialized.

However:
1) Many drivers, like the one described above, need a mutex to serialize requests anyway. (This could be mitigated by dividing the disk/structure into 4 regions so that concurrent threads are less likely to bump into each other; however, you would then need another mutex to serialize the requests to the disk, since I/O to one device cannot be issued from two CPUs simultaneously.)

2) If the driver's primary role is scheduling DMA I/O and twiddling memory structures with very little computation, then its work is not really suitable for parallelization, since the overhead of synchronization exceeds the cost of simply doing the work serially.
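The mitigation in point 1 is essentially lock striping. A minimal Python sketch, assuming the 4-region split from the comment, shows why it only moves the bottleneck: every cache miss still funnels through a single device lock. `StripedBlockCache` and `read_block_from_disk` are hypothetical names, and holding the region lock across the disk read is a simplification that avoids duplicate reads but stalls other blocks in the same region.

```python
import threading

NUM_REGIONS = 4  # the comment's example of splitting the structure four ways

class StripedBlockCache:
    def __init__(self, read_block_from_disk):
        # One lock and one dict per region, so threads touching different
        # regions do not contend on the cache structure.
        self._region_locks = [threading.Lock() for _ in range(NUM_REGIONS)]
        self._regions = [{} for _ in range(NUM_REGIONS)]
        # A single device can only service one request at a time, so all
        # disk I/O is still serialized behind this one lock.
        self._device_lock = threading.Lock()
        self._read_block_from_disk = read_block_from_disk  # hypothetical disk hook

    def read(self, block_no):
        r = block_no % NUM_REGIONS
        with self._region_locks[r]:
            region = self._regions[r]
            if block_no in region:
                return region[block_no]
            # Lock order is always region -> device, so no deadlock.
            with self._device_lock:
                data = self._read_block_from_disk(block_no)
            region[block_no] = data
            return data
```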


"The assumption behind this is that in many situations, the order in which things are processed only matters in the end, when the processing results are sent to higher-level layers. Like rendering a GUI..."


Yes, I won't deny that some layers will benefit from parallelism. However, propagating that parallelism into drivers that are fundamentally serial in nature will make those drivers more complex and could even slow them down. Such drivers would require thread synchronization where an async model could manage its state without any (more on this later for the other poster).
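For comparison, here is a minimal sketch of the async alternative, using Python's asyncio as a stand-in for a kernel event loop (an assumption; a real driver would be driven by interrupts). `AsyncBlockDriver` and the `disk_read` coroutine are hypothetical names. One task, `run()`, owns the cache and the table of reads in flight; client requests and disk completions both arrive as messages on one queue, so that state is only ever mutated from a single task and needs no mutex.

```python
import asyncio

class AsyncBlockDriver:
    def __init__(self, disk_read):
        self._events = asyncio.Queue()  # requests and disk completions, in order
        self._cache = {}                # block -> data; touched only by run()
        self._waiters = {}              # block -> futures waiting on a read in flight
        self._inflight = set()          # keep references so tasks are not GC'd
        self._disk_read = disk_read     # hypothetical async disk hook

    async def read(self, block_no):
        fut = asyncio.get_running_loop().create_future()
        self._events.put_nowait(("request", block_no, fut))
        return await fut

    async def _complete(self, block_no):
        data = await self._disk_read(block_no)
        # Post the completion back to run() instead of touching state here.
        self._events.put_nowait(("done", block_no, data))

    async def run(self):
        while True:
            kind, block_no, payload = await self._events.get()
            if kind == "request":
                if block_no in self._cache:
                    payload.set_result(self._cache[block_no])
                elif block_no in self._waiters:
                    self._waiters[block_no].append(payload)  # latch onto pending read
                else:
                    self._waiters[block_no] = [payload]
                    task = asyncio.create_task(self._complete(block_no))
                    self._inflight.add(task)
                    task.add_done_callback(self._inflight.discard)
            else:  # "done": a disk read finished
                self._cache[block_no] = payload
                for fut in self._waiters.pop(block_no):
                    fut.set_result(payload)
```

Requests are still handled one at a time, which is the serialization conceded above, but reads of different blocks can be in flight at once without any lock around the driver's state.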

I'd like to emphasize that I'm not arguing against the threaded model, particularly where it makes or breaks the design of the paradigm. I'm just suggesting that there are cases where the best multithreaded (MT) implementation performs worse than the best single-threaded (ST) implementation, and device drivers may be one of those cases.
