Linked by Hadrien Grasland on Thu 19th May 2011 21:31 UTC
Hardware, Embedded Systems Having read the feedback resulting from my previous post on interrupts (itself resulting from an earlier OSnews Asks item on the subject), I've had a look at the way interrupts work on PowerPC v2.02, SPARC v9, Alpha and IA-64 (Itanium), and contribute this back to anyone who's interested (or willing to report any blatant flaw found in my posts). I've also tried to rework my interrupt handling model a bit to make it significantly clearer, so that it looks more like a design doc and less like a code draft.
Permalink for comment 474294
RE[5]: Pop-up threads
by Neolander on Tue 24th May 2011 09:49 UTC in reply to "RE[4]: Pop-up threads"

Sorry for the delay, I have been too sick for complex thinking during the last few days.

The thing is, you are going to have async and threaded drivers running side by side. This implies that even the async drivers will need to use synchronization when accessing shared resources.

Actually, you have this problem even if you have several distinct async drivers running side by side. As soon as there are shared resources, there is a synchronization overhead.

But I'm a bit curious about why separate drivers would need to share many resources with each other. It seems you provide an example below, though, so I'm reading a bit further.

I throw this out as an example:

Two userpace apps read from two different files. In the threaded model, this results in two user+kernel threads blocking in the file system driver, which itself has blocked in the block device driver.

Aggggghhhhhh... Disk I/O in drivers! u_u Banish these impure thoughts from your head before you become a fiendish servant of Satan! You can still see the light!

Joking aside, I think that anything that requires some reactivity (and that includes most drivers) should never, ever depend on blocking file I/O. Nonblocking file I/O (like writing things to a log without caring when it's actually written to disk) is totally okay, on the other hand, but in that case we wouldn't have so many synchronization problems.
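To make the "write to a log without caring when it hits the disk" idea concrete, here is a minimal sketch (illustrative only, not anyone's actual design): callers enqueue a message and return immediately, while a background thread drains the queue to disk whenever it gets scheduled. The class name and file path are invented for the example.

```cpp
#include <condition_variable>
#include <deque>
#include <fstream>
#include <mutex>
#include <string>
#include <thread>

class AsyncLog {
    std::mutex m;
    std::condition_variable cv;
    std::deque<std::string> queue;
    bool stopping = false;
    std::thread writer;

public:
    explicit AsyncLog(const std::string& path)
        : writer([this, path] {
              std::ofstream out(path, std::ios::app);
              std::unique_lock<std::mutex> g(m);
              while (!stopping || !queue.empty()) {
                  cv.wait(g, [&] { return stopping || !queue.empty(); });
                  while (!queue.empty()) {
                      std::string line = std::move(queue.front());
                      queue.pop_front();
                      g.unlock();           // do the slow write without the lock
                      out << line << '\n';
                      g.lock();
                  }
              }
          }) {}

    void log(std::string line) {            // never blocks on disk I/O
        { std::lock_guard<std::mutex> g(m); queue.push_back(std::move(line)); }
        cv.notify_one();
    }

    ~AsyncLog() {                           // drain remaining messages, then stop
        { std::lock_guard<std::mutex> g(m); stopping = true; }
        cv.notify_one();
        writer.join();
    }
};
```

The caller only ever touches the in-memory queue, so the synchronization cost is a short critical section around a deque, not a disk access.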

Ignoring all the synchronization needed in the FS driver (and writes), the block driver (or the cache handler) must/should create a mutex around the blocks being read so that other threads requesting the same blocks are blocked until it is read. I think such a structure would require at least two mutexes, one for the specific blocks being read, and another for the structure itself. (...)

I've already admitted before that for something as I/O centric as a disk/block device driver, where there is only little processing involved, queuing events in an async model is probably best ;)
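The two-lock structure the parent describes could look something like this sketch (all names are invented for illustration): one mutex protects the cache table itself, and each in-flight block carries its own mutex and condition variable, so later readers of the same block sleep until the first reader's disk I/O completes.

```cpp
#include <condition_variable>
#include <map>
#include <memory>
#include <mutex>
#include <vector>

struct Block {
    std::mutex m;                  // per-block lock
    std::condition_variable cv;    // readers wait here while I/O is pending
    bool ready = false;
    std::vector<char> data;
};

class BlockCache {
    std::mutex table_lock;         // lock #1: protects the map structure itself
    std::map<long, std::shared_ptr<Block>> table;

public:
    std::shared_ptr<Block> read(long block_no) {
        std::shared_ptr<Block> b;
        bool must_load = false;
        {
            std::lock_guard<std::mutex> g(table_lock);
            auto it = table.find(block_no);
            if (it == table.end()) {
                b = std::make_shared<Block>();
                table[block_no] = b;
                must_load = true;  // we are the first reader of this block
            } else {
                b = it->second;
            }
        }
        std::unique_lock<std::mutex> g(b->m);  // lock #2: the specific block
        if (must_load) {
            b->data = load_from_disk(block_no);
            b->ready = true;
            b->cv.notify_all();    // wake any threads waiting for this block
        } else {
            b->cv.wait(g, [&] { return b->ready; });
        }
        return b;
    }

private:
    static std::vector<char> load_from_disk(long block_no) {
        // stand-in for real device I/O
        return std::vector<char>(512, static_cast<char>(block_no & 0xff));
    }
};
```

Note that the table lock is only held long enough to find or insert the entry, so concurrent reads of different blocks don't serialize on the disk I/O, only on the map lookup.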

Yes, I won't deny that some layers will benefit from parallelism. However, propagating that parallelism into drivers which are fundamentally serial in nature will make those drivers more complex and could even slow them down. These drivers will require thread synchronization when an async model could handle its state without synchronization (more on this later for the other poster).

You see, this is actually something I'm wondering about. Are all drivers fundamentally serial in nature and doing little processing?

Some time ago, I was wondering about an efficient way to write an FTIR touchscreen driver. In case you've not heard of those, their sensitive layer uses infrared light trapped within a piece of glass through total internal reflection, which can only be scattered out of the glass when something (like, say, a finger) comes in contact with it. A webcam captures the infrared image; everything from that point on must be done in software.

So we have to do a webcam driver, a blob detection/tracking system, and some kind of real time video output.

Now, I don't know what kind of data comes out of a webcam, but I guess it varies from one webcam to another. Maybe some will output raw bitmap pictures, some will output jpeg frames, and some will output MPEG-ish videos with i-frames and p-frames. Most peripherals send an interrupt when their output buffer is full, so we do not know how many frames will have to be processed at once (especially for variable-sized frames like in MPEG video).

Suppose that our blob detection/tracking algorithm works with bitmap data. Then there is some amount of image conversion involved. In case the frames are sufficiently independent from each other (not guaranteed for MPEG, but certainly the case for JPEG), it's better to do the decoding in parallel because it's essentially a CPU-intensive operation, and nothing scales better across multiple cores than independent operations.
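The per-frame decoding could be sketched like this (hypothetical names throughout; decode_frame() stands in for a real JPEG-to-bitmap conversion): because the frames share no state, each one can run as its own task, and the only synchronization left is joining the results.

```cpp
#include <cstdint>
#include <future>
#include <vector>

using Frame  = std::vector<uint8_t>;   // compressed input frame
using Bitmap = std::vector<uint8_t>;   // decoded output image

Bitmap decode_frame(const Frame& f) {
    // placeholder: a real driver would run a JPEG decoder here
    return Bitmap(f.begin(), f.end());
}

std::vector<Bitmap> decode_all(const std::vector<Frame>& frames) {
    std::vector<std::future<Bitmap>> jobs;
    jobs.reserve(frames.size());
    for (const Frame& f : frames)      // one independent task per frame
        jobs.push_back(std::async(std::launch::async, decode_frame, std::cref(f)));
    std::vector<Bitmap> out;
    for (auto& j : jobs)
        out.push_back(j.get());        // joining is the only synchronization point
    return out;
}
```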

Blob detection and tracking might work first by locating blobs in the initial pictures the brute-force way (thresholding + noise removal + locating sufficiently large packs of pixels in the whole picture) and then by only looking for the blobs in a region around their former positions, based on their estimated speed. Since no writes are involved, no synchronization is needed, and looking for blobs in these slices of the picture can be done on a "one-thread-per-slice" basis, with communication between threads needed only when slices overlap each other and it's difficult to tell which slice a blob belongs to.
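The one-thread-per-slice idea can be sketched as follows (a deliberately simplified, invented example: a 1-D "image" and a bright-pixel count standing in for real blob detection): every thread only reads the shared image and writes to its own slot of the result vector, so no locking is needed.

```cpp
#include <cstddef>
#include <thread>
#include <vector>

std::vector<std::size_t> count_bright_per_slice(const std::vector<int>& image,
                                                std::size_t slices,
                                                int threshold) {
    std::vector<std::size_t> counts(slices, 0);
    std::vector<std::thread> workers;
    std::size_t len = image.size() / slices;
    for (std::size_t s = 0; s < slices; ++s) {
        std::size_t begin = s * len;
        std::size_t end = (s + 1 == slices) ? image.size() : begin + len;
        workers.emplace_back([&, s, begin, end] {
            // reads shared data only; writes exclusively to counts[s]
            for (std::size_t i = begin; i < end; ++i)
                if (image[i] > threshold)
                    ++counts[s];
        });
    }
    for (auto& w : workers)
        w.join();
    return counts;
}
```

A real implementation would return blob positions rather than counts, and would need the extra inter-thread communication mentioned above for blobs straddling a slice boundary; the point here is just that the read-only scan itself requires no mutex.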

Edited 2011-05-24 09:53 UTC
