RE[2]: Pop-up threads
by Alfman on Sat 21st May 2011 00:28 UTC in reply to "RE: Pop-up threads"

"If things which can run in parallel run in parallel, you get better scalability than with an all-sequential model like async."


If those are your assumptions, then I can understand your conclusions, but your assumptions are pretty weak. Hypothetically, I could perform nearly any computation in parallel by dividing it into threads, but doing so does not imply better performance. That depends on the ratio of CPU work to synchronization overhead.

If I write a multithreaded ray tracer on an 8-core processor, which will perform best?
A) a new thread for each pixel
B) a new thread for each line
C) 8 threads, processing individual lines from a dispatch queue
D) 8 threads, each assigned a fixed 1/8th of the image

The answer is obviously not A or B: for CPU-bound work like this, it never makes sense to have more threads than cores.

C and D are probably close, but C should edge out D: under static partitioning, some cores may finish their share before the others and then sit idle, while a dispatch queue keeps every core busy until the last line is done.
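
For concreteness, here is a minimal sketch of option C with POSIX threads. The per-line work and the image height are placeholders I've made up, and the "dispatch queue" is nothing more than an atomic line counter:

    #include <pthread.h>
    #include <stdatomic.h>

    #define NUM_THREADS 8     /* one worker per core */
    #define NUM_LINES   1080  /* image height; placeholder value */

    static atomic_int next_line = 0;

    /* Placeholder for the real per-line ray tracing work. */
    static void render_line(int y) { (void)y; }

    static void *worker(void *arg)
    {
        (void)arg;
        for (;;) {
            /* Claim the next unrendered line; fast cores naturally
             * pick up extra lines, so none of them sit idle. */
            int y = atomic_fetch_add(&next_line, 1);
            if (y >= NUM_LINES)
                return NULL;
            render_line(y);
        }
    }

    int main(void)
    {
        pthread_t pool[NUM_THREADS];
        for (int i = 0; i < NUM_THREADS; i++)
            pthread_create(&pool[i], NULL, worker, NULL);
        for (int i = 0; i < NUM_THREADS; i++)
            pthread_join(pool[i], NULL);
        return 0;
    }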

"They are cheap (or more exactly can be made so). Why not use them ?"

The point is that it doesn't make sense to use them just because you can; it makes sense to use them when the tradeoffs point in that direction.

"If the task is not CPU-bound, and we consider IO waiting times that are typically orders of magnitude larger than instruction execution times..."

Fine, then your report should say that: in the I/O-bound case, the overhead of threads is hidden by I/O wait times that are orders of magnitude larger.

"If the task is CPU-bound, threads offer significantly better performance."

This isn't automatically true. It depends on the work-to-overhead ratio; small workloads in particular will suffer greatly when multithreaded, compared to async, which carries no such overhead.

It's additionally possible to run a separate async handler on each core, pinned via CPU affinity, so that multiple async requests run in parallel with no synchronization overhead at all.
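
Roughly like this, assuming a Linux-style setup with epoll and pthread_setaffinity_np; handle_event() is a made-up placeholder for the actual request handling:

    #define _GNU_SOURCE
    #include <pthread.h>
    #include <sched.h>
    #include <unistd.h>
    #include <sys/epoll.h>

    /* Placeholder for whatever the per-request handling really is. */
    static void handle_event(struct epoll_event *ev) { (void)ev; }

    static void *event_loop(void *arg)
    {
        int cpu = (int)(long)arg;
        cpu_set_t set;

        /* Pin this loop to its core: requests routed here never
         * migrate, so its private state needs no locks at all. */
        CPU_ZERO(&set);
        CPU_SET(cpu, &set);
        pthread_setaffinity_np(pthread_self(), sizeof(set), &set);

        int epfd = epoll_create1(0);
        /* ...this core's file descriptors get registered here... */
        struct epoll_event ev[64];
        for (;;) {
            int n = epoll_wait(epfd, ev, 64, -1);
            for (int i = 0; i < n; i++)
                handle_event(&ev[i]);
        }
        return NULL;
    }

    int main(void)
    {
        long ncpus = sysconf(_SC_NPROCESSORS_ONLN);
        pthread_t loops[64];
        for (long i = 0; i < ncpus && i < 64; i++)
            pthread_create(&loops[i], NULL, event_loop, (void *)i);
        pthread_join(loops[0], NULL);  /* loops run indefinitely */
        return 0;
    }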


"If so much synchronization is needed that it has a significant impact on interrupt processing speed, it is indeed better to use async."

"Significant" is a judgment call I leave to you. I just object to your claim that threaded is more scalable.


"Which implies going back to the kernel, creating a thread, and waiting until the scheduler dares to run it. Why not have the kernel just do it right away?"

Firstly, most I/O operations won't need threads in the first place; async is already sufficient, so why pay the overhead when it isn't needed? Secondly, if you do have a CPU-bound operation, then the cost of a syscall should be fairly negligible. Thirdly, the cost of starting a microthread in this scenario should be no worse than when you start one by default (although I realize this isn't strictly true for you, since you're using a microkernel and are therefore subject to additional IPC overhead).


"Again, how can processing events sequentially be any more scalable than processing them in parallel?"

You can't just put work in a thread and expect it to run faster.

The overhead of many small threads adds up compared to fewer, bigger threads that each do more work. If it costs 3000 cycles to send a thread to another CPU and another 3000 cycles to retrieve the results (thread creation + data transfer), then computations under 12000 cycles are probably better off run serially on one CPU. Not only is that faster in real time, it also frees the bus for useful work on the other CPU.
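
That rule of thumb is simple to write down; the constants below are the hypothetical figures from the paragraph above, not measured costs:

    /* Cycle costs assumed above (hypothetical, not measured). */
    #define SEND_COST 3000  /* dispatch work to another CPU           */
    #define RECV_COST 3000  /* retrieve the results (creation + copy) */

    /* Offload only when the work comfortably exceeds the round trip;
     * the 2x margin reproduces the ~12000-cycle cutoff used above. */
    static int worth_offloading(long work_cycles)
    {
        return work_cycles >= 2 * (SEND_COST + RECV_COST);
    }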

You may shave off cycles here and there, but have you ever asked the question "do my tasks actually need threads?"


"Drivers have to explicitely make a choice between using a threaded or an async model."

Ok, but will you get the full async performance benefits if you're trying to emulate it inside a threaded model? And it still bothers me that you characterized the async model as the worse one for performance.

"Yup, that's the whole point of using a threaded model at all."

Forgive me if I'm pointing out something you already considered, but my point was that pre-emptively acknowledging the interrupt means drivers have to protect themselves against re-entrant (or, in your model, doubly threaded) IRQ behavior that would otherwise be impossible.

In other words, the implicit ack forces drivers into more synchronization than would be necessary if they explicitly re-enabled their interrupt when ready. It may not be a big deal, but it's still several hundred or thousand CPU cycles wasted on every IRQ.
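
Here's a minimal sketch of the explicit re-enable pattern I mean; irq_mask()/irq_unmask() are hypothetical stand-ins for whatever primitives the kernel actually exposes, stubbed out so the sketch is self-contained:

    /* Hypothetical kernel primitives, stubbed for illustration. */
    static void irq_mask(int irq)   { (void)irq; /* mask the line at the controller */ }
    static void irq_unmask(int irq) { (void)irq; /* re-arm the line                 */ }
    static void handle_device(void) { /* the driver's actual device work */ }

    /* With its line masked, a second instance of this handler cannot
     * fire, so the device work needs no re-entrancy lock at all. */
    void my_irq_handler(int irq)
    {
        irq_mask(irq);
        handle_device();
        irq_unmask(irq);  /* re-enable only when the driver is ready */
    }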

What baffles me about your model for drivers is that it emphasizes threads which will rarely, if ever, be used without a mutex to serialize them again. If a driver forgets a mutex and accesses a device simultaneously from multiple CPUs, that is far more likely to be a bug than a deliberate design choice.
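
For contrast, here is roughly what the threaded model imposes on a driver; the names are mine rather than your model's API, but the mutex is the point:

    #include <pthread.h>

    static pthread_mutex_t device_lock = PTHREAD_MUTEX_INITIALIZER;

    static void handle_device(void) { /* talk to the hardware */ }

    /* Each interrupt gets its own pop-up thread running this body,
     * possibly on another CPU while a previous one is still inside,
     * so the driver must re-serialize the threads itself. */
    void *irq_thread(void *arg)
    {
        (void)arg;
        pthread_mutex_lock(&device_lock);
        handle_device();
        pthread_mutex_unlock(&device_lock);
        return NULL;
    }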
