Linked by Hadrien Grasland on Sun 27th Feb 2011 12:06 UTC
Hardware, Embedded Systems This is a situation where I need the help of you OSnews readers who are experienced with low-level development on ARM, SPARC, PowerPC, MIPS, and other hardware architectures we have on computers nowadays. The issue is that I'm currently designing the part of my hobby kernel which takes care of interrupts. Although I mostly work on x86 at the moment, I'd like to keep this code portable to other hardware architectures in the future. To do that, I have to know how interrupt handling works on as many hardware architectures as possible.
Thread beginning with comment 464302
General Notes
by Brendan on Mon 28th Feb 2011 13:47 UTC

Hi,

If you can handle a complex/messy architecture like x86, then it shouldn't be too hard to adapt that to something simpler.

For the following I use the term "OS" as a generic term meaning "thing that controls IRQs". Depending on your OS you can replace it with "kernel" or "device manager" or whatever suits.

Here's a list of things you might or might not have overlooked for x86:

1) Interrupts are prioritised, either with the method used by the PIC chips, or by the "vector number" based method used by the APICs. Some OSs bypass these priority mechanisms (mask the IRQ, send the "End Of Interrupt" to allow lower priority IRQs to interrupt, then handle the interrupt and unmask it when you're done) but this has slightly higher overhead (especially for PIC) and is rarely used unless the OS has its own priority scheme (e.g. relies on the scheduler's thread priorities for deciding when the IRQ will actually be handled). In any case, from a driver developer's perspective it's the same - receive interrupt, handle the interrupt, then tell the OS you've finished. The OS abstracts the underlying details.

2) For PICs, there's an extra "routing" layer added between things like the PCI host controller and the PIC chips that allows the OS and/or firmware to change the way PCI IRQs are mapped to PIC inputs. This can be used to (for e.g.) reduce the amount of IRQ sharing in some cases.

3) When APICs are used, the OS decides which interrupt vector (and therefore which priority) each IRQ uses. In multi-CPU systems the OS can also decide which CPU (fixed delivery) receives the IRQ, or can decide the IRQ should be handled by the lowest priority CPU within a certain group of CPUs or within the group of all CPUs (send to lowest priority, logical delivery).

4) For larger systems, there's the concept of IRQ load balancing. If a CPU is sleeping (to save power or reduce heat) you don't really want to wake it up to handle an IRQ when other CPUs are already running. If a CPU is running a very high priority task (e.g. real-time) then it's better if other CPUs handle the IRQs instead. This is where "send to lowest priority, logical delivery" gets tricky (although it would also be possible to dynamically reprogram the APIC/s for the purpose of managing IRQ load).

5) For ccNUMA (recent Intel and AMD with "multi-socket" motherboards) it's nice to get the IRQs delivered to a CPU that is "close" to the device itself; as this reduces a little overhead (e.g. bandwidth/latency across the Quickpath or Hypertransport links).

6) For PCI, interrupts are level triggered and the same IRQ (from the CPU's perspective) can be used by multiple different devices. The end result of this is that the OS has to maintain a list of drivers that are interested in each IRQ, and ask each driver in the list if its corresponding device was responsible for causing the IRQ. This complicates things a little. On a single-CPU system (where you're asking one driver at a time if their device was responsible), you'd want to arrange those lists in order of which devices most frequently cause each IRQ. For example, if an IRQ occurs you might ask an ethernet card driver if it was responsible, and if that driver says the ethernet card was responsible for the IRQ then you wouldn't bother asking any other device drivers (to avoid unnecessary overhead). On a multi-CPU system it may be better (for latency) to ask multiple drivers (on different CPUs) at the same time. For example, with 2 CPUs you'd ask the two most likely drivers, and if neither of them were responsible for the IRQ you ask the next 2 drivers in the list, etc.

7) For APICs (especially in larger systems), there's no real limit to the number of IO APICs that may be present, and no fixed number of inputs that each IO APIC may have. You might have one IO APIC with 24 inputs, or a pair of IO APICs with 32 inputs each, or a set of four IO APICs with 16 inputs in the first 2 and 32 inputs in the second pair, or... During boot you have to detect the number of IO APICs and the number of IO APIC inputs on each.

8) For modern PCI (e.g. PCI Express) there's something called "Message Signalled Interrupts". The idea is that the device uses the PCI bus to send the interrupt directly, without using an interrupt controller input. This helps to avoid interrupt sharing caused by lack of interrupt controller inputs and/or lack of PCI interrupt lines.

9) The older "xAPIC" was only capable of handling a maximum of 255 "agents" (CPUs and IO APICs). For "x2APIC" this "255 agents" limit was raised dramatically (the new limit is something like 16 million agents). To cope with this, the IO APICs gained an interrupt redirection capability.

10) There's a maximum of 256 IVT/IDT entries; 32 of them are reserved for exceptions. For multi-CPU/APICs, at least 1 IVT/IDT entry should be used for spurious IRQs, and some more are needed for IPIs. That leaves a maximum of about 220 IVT/IDT entries. This doesn't necessarily have to be a global limit, and could even be considered a "per CPU" limit. For example, a computer with 4 CPUs could have a different IDT for each CPU and could handle up to 880 separate IRQ sources with no IRQ sharing at all. Of course this is likely to be overkill; however if you think about "huge" servers; and understand what I said in (5) about ccNUMA, what I said in (8) about MSI, and understand the APIC's interrupt priority scheme; then you might (or might not) see the need for using a separate IVT/IDT for each NUMA domain.

11) An OS may need to be able to dynamically reconfigure "all of the above" while it is running (e.g. long after boot), in response to hardware changes caused by (for e.g.) hot-plug PCI devices, changing load, power management, etc.

12) For APICs (in multi-CPU systems), as far as the interrupt acceptance logic goes there's little difference between an IRQ and an IPI (Inter Processor Interrupt). IPIs are used by the OS on one CPU to ask other CPU/s to do "something" (for a common example, read about "multi-CPU TLB shootdown" in your favourite Intel/AMD manual). These IPIs take part in the APIC's interrupt priority scheme.

13) There are also "special" types of IRQs and special types of IPIs. Examples include NMI (mostly used for watchdog timers and/or kernel-level profiling I guess); and the local APIC's IRQs used for the local APIC's timer, thermal status, performance monitoring, etc. Some of these special interrupts don't take part in the APIC's interrupt priority scheme (NMI, INIT IPI) while some do (most of them).

This is all just for 80x86 alone. If your OS is flexible enough to support most/all of this, then other architectures are likely to be easy to support. About the only thing I can imagine being different enough to be interesting is only having a single IRQ and a "where from" register (rather than multiple IRQs and a table like the IVT/IDT). However, you can easily emulate a table if the architecture has none, and you can easily emulate an "interrupt number" if the architecture does have a table; so it doesn't matter much either way.

- Brendan

Edited 2011-02-28 13:51 UTC


RE: General Notes
by Anachronda on Mon 28th Feb 2011 19:21 in reply to "General Notes"

"For example, if an IRQ occurs you might ask an ethernet card driver if it was responsible, and if that driver says the ethernet card was responsible for the IRQ then you wouldn't bother asking any other device drivers (to avoid unnecessary overhead)."


Nope. Multiple devices can interrupt simultaneously. If you do it this way, you have to pay the interrupt overhead twice if that occurs (the first time when you stopped asking at the Ethernet driver, but the interrupt was still asserted; the second time as soon as the Ethernet driver handled its interrupt).


RE[2]: General Notes
by Brendan on Tue 1st Mar 2011 10:31 in reply to "RE: General Notes"

Hi,

"For example, if an IRQ occurs you might ask an ethernet card driver if it was responsible, and if that driver says the ethernet card was responsible for the IRQ then you wouldn't bother asking any other device drivers (to avoid unnecessary overhead).


Nope. Multiple devices can interrupt simultaneously. If you do it this way, you have to pay the interrupt overhead twice if that occurs (the first time when you stopped asking at the Ethernet driver, but the interrupt was still asserted; the second time as soon as the Ethernet driver handled its interrupt).
"

I'm used to micro-kernels, where drivers run in user-space and the IRQ overhead itself (and therefore the potential risk of a second IRQ) is nothing compared to the cost of a (potentially unnecessary) task switch.

For monolithic kernels where performance is considered far more important than protection/security; the fastest way is to trash the kernel's code with random data as soon as any IRQ occurs. It's guaranteed to be faster than waiting for any number of arbitrary drivers to trash the kernel... :-)

- Brendan


RE: General Notes
by DeepThought on Tue 1st Mar 2011 08:38 in reply to "General Notes"


This is all just for 80x86 alone. If your OS is flexible enough to support most/all of this, then other architectures are likely to be easy to support. About the only thing I can imagine being different enough to be interesting is only having a single IRQ and a "where from" register (rather than multiple IRQs and a table like the IVT/IDT). However, you can easily emulate a table if the architecture has none, and you can easily emulate an "interrupt number" if the architecture does have a table; so it doesn't matter much either way.

*Arg*, by no means should one try to emulate the weird and crippled x86 IRQ architecture. Most SoCs provide a much simpler scheme, giving one interrupt number/vector per external interrupt (PIC view).

Edited 2011-03-01 08:39 UTC
