The 8-bit Z-80 processor is famed for use in many early personal computers such as the Osborne 1, TRS-80, and Sinclair ZX Spectrum, and it is still used in embedded systems and TI graphing calculators. I had always assumed that the ALU (arithmetic-logic unit) in the Z-80 was 8 bits wide, like just about every other 8-bit processor. But while reverse-engineering the Z-80, I was shocked to discover the ALU is only 4 bits wide! The founders of Zilog mentioned the 4-bit ALU in a very interesting discussion at the Computer History Museum, so it's not exactly a secret, but it's not well-known either.

I have been reverse-engineering the Z-80 processor using images from the Visual 6502 team. The image below shows the overall structure of the Z-80 chip and the location of the ALU. The remainder of this article dives into the details of the ALU: its architecture, how it works, and exactly how it is implemented.

Ken Shirriff's blog is an absolute must for fans of ultra-low-level hardware stuff. This goes way over my head, but it's interesting nonetheless.



I reverse engineered a dozen of these 1979 NMOS processors to learn how the ALUs, register files, ROMs, RAMs, PLAs, clocks, and back-bias circuits worked. I had forgotten how the Z80 worked, but judging from the circuit given here, there is a good reason why it was done this way.

In the 6800, 8080, and 6502, the carry path was usually a pass gate with minimal delay per bit. Four carry cells in series would look like a distributed 4-bit, tree-like NAND gate with extra devices to steer 1s and 0s or to bypass to the output and then invert. This is a very slow gate with all the associated capacitance, but it does cover 4 bits in one go with 2 logic delays, one slow and one fast. So the delay of an 8-bit adder would look like that of a 1-bit adder plus 4 extra carry gate delays. That's about the limit of the clock cycle.

The Z80 used 2 faster gates per bit, so four bit cells add up to 8 gate delays on top of the basic adder cell. That is also about the limit of the clock cycle.

In another design, some designers paired off odd/even slices to achieve a 2-bit carry in 2 gates of logic, allowing an 8-bit design with a carry path of 8 gate delays.
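The delay counts in the three schemes above can be tallied with a quick sketch. The function name and the idea of treating every gate delay as one equal unit are assumptions made for illustration only; as noted above, the pass-gate chain mixes one slow and one fast delay, so equal counts do not mean equal time.

```python
def carry_delays(width_bits, bits_per_stage, gates_per_stage):
    """Count carry-path gate delays for a ripple adder built from identical stages.

    Hypothetical helper: it only counts gates and says nothing about how
    fast each individual gate is.
    """
    stages = width_bits // bits_per_stage
    return stages * gates_per_stage


# Pass-gate style (6800, 8080, 6502): 4 bits per stage, 2 logic delays per stage.
pass_gate_8bit = carry_delays(8, bits_per_stage=4, gates_per_stage=2)

# Z80 style: 2 faster gates per bit, across a 4-bit ALU.
z80_4bit = carry_delays(4, bits_per_stage=1, gates_per_stage=2)

# Paired odd/even slices: 2 bits per stage, 2 gates per stage, 8 bits wide.
paired_8bit = carry_delays(8, bits_per_stage=2, gates_per_stage=2)

print(pass_gate_8bit, z80_4bit, paired_8bit)
```

With these counts, the pass-gate adder has the fewest delays on paper, but each of its stages includes the slow distributed gate, which is the trade-off being described.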

It's a swings-and-roundabouts issue. By the time the Z80 was built, the trend was moving away from ripple pass-gate dynamic logic toward full static logic, and in the Z8000 the entire 16-bit path had a full custom design with fully static, asymmetric carry-lookahead logic, all done in about 8-12 fast gates IIRC, with no precharge logic needed.

The 68000 and 8086 retained the ripple pass gate but enhanced it by bootstrapping a floating pass gate with a precharge. In these schemes a clock would be rammed through 8 pass gates in short order and then buffered for the next 8-bit block. Bootstrapping allowed 16-bit addition in a reasonable time but required a clock for precharge.

So in the dynamic style, one clock phase precharges and the next conditionally discharges, which computes the gate values. In the static style, both clock phases can do useful work.

Even the Pentium 4, a 32-bit processor, uses a double-pumped 16-bit ALU.
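To make the idea of a narrow ALU doing a wider operation concrete, here is a minimal sketch of an 8-bit add performed as two passes through a 4-bit adder, low nibble first, with the carry from the first pass fed into the second. The function names are hypothetical, and this models only the arithmetic, not the actual Z80 or Pentium 4 circuitry or timing.

```python
def add4(a, b, carry_in):
    """One pass through a hypothetical 4-bit adder: returns (4-bit sum, carry out)."""
    total = (a & 0xF) + (b & 0xF) + (carry_in & 1)
    return total & 0xF, total >> 4


def add8_two_pass(x, y, carry_in=0):
    """Add two 8-bit values using two passes of the 4-bit adder.

    Returns (8-bit result, half carry, carry out). The half carry is the
    carry out of the low nibble, which the second pass consumes.
    """
    lo, half_carry = add4(x, y, carry_in)              # pass 1: low nibbles
    hi, carry_out = add4(x >> 4, y >> 4, half_carry)   # pass 2: high nibbles
    return (hi << 4) | lo, half_carry, carry_out


result, half_carry, carry_out = add8_two_pass(0x3C, 0x27)
print(hex(result), half_carry, carry_out)
```

The carry between the two passes falls out of this arrangement for free, which is one reason a half-width datapath can be attractive when the carry chain is what limits the clock cycle.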