Linked by Thom Holwerda on Sat 7th Sep 2013 09:54 UTC
Hardware, Embedded Systems

The 8-bit Z-80 processor is famed for use in many early personal computers such as the Osborne 1, TRS-80, and Sinclair ZX Spectrum, and it is still used in embedded systems and TI graphing calculators. I had always assumed that the ALU (arithmetic-logic unit) in the Z-80 was 8 bits wide, like just about every other 8-bit processor. But while reverse-engineering the Z-80, I was shocked to discover the ALU is only 4 bits wide! The founders of Zilog mentioned the 4-bit ALU in a very interesting discussion at the Computer History Museum, so it's not exactly a secret, but it's not well-known either.

I have been reverse-engineering the Z-80 processor using images from the Visual 6502 team. The image below shows the overall structure of the Z-80 chip and the location of the ALU. The remainder of this article dives into the details of the ALU: its architecture, how it works, and exactly how it is implemented.

Ken Shirriff's blog is an absolute must for fans of ultra-low-level hardware stuff. This goes way over my head, but it's interesting nonetheless.

4 Bit ALU's
by shotsman on Sat 7th Sep 2013 11:54 UTC
shotsman Member since:
2005-07-22

Back in the day, they were pretty common.
The first microprocessor I got my hands on was an IMP-16p (National Semiconductor). This was made up of four 4-bit ALUs chained together. I found this very strange at the time as I was coming from a PDP-11/40.

Later on, I got involved with developing interfaces for DEC. Our CPU of choice for many of these was the 2901. This was another 4-bit slice device. On the TSU05 tape controller, we had four of these connected together, giving us, basically, a 16-bit word CPU.

It does not surprise me that the Z80 has a 4bit ALU one little bit.

Reply Score: 5

This Might Explain A Few Things
by Pro-Competition on Sat 7th Sep 2013 18:36 UTC in reply to "4 Bit ALU's"
Pro-Competition Member since:
2007-08-20

This isn't talking about chained 4-bit units - it's a single 4-bit unit, where the slices are processed serially, not in parallel.

Actually, this probably helps to explain the high number of clock cycles the Z-80 used to perform operations. I'm not sure if this was a good design trade-off or not.

(I always admired the rich instruction set of the Z-80 in comparison to the 6502 I was working with on the VIC-20 / C-64. But then I looked at the timing data and wasn't quite so jealous. ;^) )

P.S. I remember reading about those bit-slice processors in data books as a kid. (Yes, I was a nerdy kid.) I think that was an elegant solution at the time.

Reply Score: 6

moondevil Member since:
2005-07-08

We were all nerdy back then.

I got my ZX Spectrum compatible (Timex 2068) at the age of 10 and started coding around the age of 12.

Reply Score: 3

shotsman Member since:
2005-07-22

I did say that 4-bit ALUs were common in those days.
The IC technology available then was pretty crude by today's standards, so making 8-, 12- or even 16-bit ALUs was impossible for a while.
I went on to give some examples of other uses for them.
The 2901 could be used on its own. It didn't have to be used with others.
A lot of engineers quickly realised that 4 bits was very limiting, especially as many of the other (non-microprocessor) CPUs around in those days had far longer word lengths. Could this be why we chained 4-bit devices together?
Intel realised this as well. How long did the 4004 last before they came out with the 8008?

Reply Score: 4

puenktchen Member since:
2007-07-27

"This isn't talking about chained 4-bit units - it's a single 4-bit unit, where the slices are processed serially, not in parallel."

Doesn't that make the Z80 count as a 4-bit CPU?

Reply Score: 3

transputer_guy Member since:
2005-07-08

No. Plenty of CPUs have used serial computation and had reasonable performance. The TI 9900 did 16 bits in 18 clocks or so, which allowed the clock to run faster to make up for it. The architecture defines the width of a processor, not the internal design. The Pentium 4 also used a double-pumped 16-bit ALU, and it is still a 32-bit processor.

Reply Score: 3

puenktchen Member since:
2007-07-27

"The architecture defines the width of a processor, not the internal design."

I always thought the internal design was part of the architecture, and that the meaning of 8/16/32/64 bitness changed over the years with a little help from marketing. So the width of the processor is defined by the instruction set, not by the data path or registers or ...?

Reply Score: 3

Drumhellar Member since:
2005-07-12

It's defined by the instruction set, and not the actual implementation.

Z80 is an 8-bit architecture because you add two 8-bit registers to get a result. The fact that behind the scenes, it's breaking it down into multiple 4-bit adds is inconsequential. They could change it at a later point to give it a true 8-bit ALU and nobody would know the difference.
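
To make the point about "multiple 4-bit adds" concrete, here is a minimal sketch of an 8-bit add performed as two passes through a single 4-bit adder. The function names `add4` and `add8_nibble_serial` are made up for illustration, and this is only a behavioural model, not a description of the actual Z-80 circuitry. The carry handed from the low nibble to the high nibble plays the role of a half-carry.

```python
def add4(a, b, carry_in):
    """Model of a 4-bit adder: returns (4-bit sum, carry out)."""
    total = (a & 0xF) + (b & 0xF) + (carry_in & 1)
    return total & 0xF, total >> 4


def add8_nibble_serial(a, b, carry_in=0):
    """8-bit add done as two sequential passes through one 4-bit adder.

    Hypothetical illustration only: the programmer-visible result is the
    same as a full-width 8-bit add, whatever happens internally.
    """
    # Pass 1: low nibbles; the carry out acts as a half-carry.
    low, half_carry = add4(a, b, carry_in)
    # Pass 2: high nibbles, consuming the carry from pass 1.
    high, carry_out = add4(a >> 4, b >> 4, half_carry)
    return (high << 4) | low, half_carry, carry_out


result, half_carry, carry = add8_nibble_serial(0x3C, 0x5A)
print(hex(result), half_carry, carry)  # 0x96 1 0
```

From the outside, the result and the final carry are indistinguishable from those of a true 8-bit adder, which is why the ALU width does not show up in the instruction set.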

The same goes for data buses. The 8086 had a 16-bit system bus, while the 8088 had an 8-bit bus. This wouldn't make the 8088 an 8-bit chip, since you were still doing 16-bit math in 16-bit registers.

There was a time when it was reasonable to assume that 8-bit chips had 8-bit buses and 16-bit chips had 16-bit buses, but as time progressed the ISA became further and further divorced from the actual implementation.

Reply Score: 4

bartgrantham Member since:
2011-12-31

Just playing devil's advocate, but the Z-80 could also do 16-bit adds and subtracts with the HL/BC/DE register pairs. Wouldn't that make it a 16-bit CPU by your definition (width of operands)?

Reply Score: 5

Alfman Member since:
2011-01-28

bartgrantham,

I agree. Consider that one could hypothetically implement/emulate a 64bit ISA CPU on top of 16bit components, but personally I still think it makes sense to call it a 16bit CPU if it can only ever physically handle 16 bits concurrently.

Admittedly though it's rather ambiguous when different components (and operations) have different bit widths (register bits/cpu&cache bus width/memory&device bus width/alu/fpu/...). Maybe in such circumstances it makes the most sense to call the Z80 a 4/8bit hybrid rather than either 4bit or 8bit.

Edited 2013-09-09 17:34 UTC

Reply Score: 3

Drumhellar Member since:
2005-07-12

Perhaps, but considering how limited using register pairs is compared to the rest of the architecture, I'd still say it's 8-bit. I mean, I wouldn't call the Pentium MMX a 64-bit chip simply because it can do 64-bit integer math - the conditions imposed on adding large numbers are quite extensive.
Also, if you look at the bitwise or logical operators, they are only capable of operating on one register at a time, with the exception of the HL pair.

Of course, I don't have any actual experience programming a Z-80, but everybody calls it 8-bit, and the instructions listed at http://bit.ly/14z9vLR show it's almost pure 8-bit instructions, with a couple special 16-bit instructions.

I do know the Nintendo Gameboy used a variation of the Z-80, and that was considered an 8-bit system by Nintendo.

(According to Wikipedia, the chip in the Gameboy was somewhere between the 8080 and the Z-80, with none of the extra registers of the Z-80, but many of the extra instructions. It's not pure Z-80, but most sources I've seen consider it one.)

Reply Score: 4

DeepThought Member since:
2010-07-17

IMHO neither the bus width nor the ALU was used to "define" the bitness of a CPU. It was the register width.

The 68k was seen as a 32-bit CPU, but then people called it a 32/16-bit CPU because the data bus width was only 16 bits (or even 8 on the 68008 in the Sinclair QL).

But today this definition is also not that easy to use. For example, the e200 PowerPC cores have 64-bit registers, but only for SIMD (called SPE by Freescale). So no real 64-bit add is possible, and it is a 32-bit CPU.

So IMHO, today, the bitness is defined by the width of the general-purpose registers.

Reply Score: 2

ElCabri2 Member since:
2009-03-11

It's a fuzzy definition really. Bit-ness of technology can refer to the size of the address bus, the size of the word operated on by the arithmetic instructions, or actual architectural details ... When all this was fluctuating fast from generation to generation in the 1980s and 1990s, there was some creative labeling for marketing reasons. That's why we had, for example, "16/32-bit" processors like the Motorola 68000, or why some claims of "128-bit" video game systems emerged in the late 90s. Mainframe and HPC architectures were even more exotic. Let's also mention the "Saturn" architecture of HP's high-end calculators, which had 64-bit registers with sub-fields of various lengths aligned on 4-bit boundaries, 4-bit addressable RAM ("nybbles") and 20-bit (five-nybble) addresses...

Reply Score: 1

Uh
by peteo on Sat 7th Sep 2013 14:32 UTC
peteo Member since:
2011-10-05

"This goes way over my head..." and yet you claim the blog is a must.

It's actually pretty badly written.

Reply Score: 1

RE: Uh
by kens on Sat 7th Sep 2013 19:19 UTC in reply to "Uh"
kens Member since:
2013-09-07

Wow, tough crowd here. Anything specific you'd like improved in the article?

Reply Score: 7

RE[2]: Uh
by viton on Sat 7th Sep 2013 20:08 UTC in reply to "RE: Uh"
viton Member since:
2005-08-09

Great article, Ken. But I'm just a programmer, not a literary critic :-)
I did a lot of Z80 coding and this discovery is pretty exciting for me.

Reply Score: 3

RE[2]: Uh
by kokara4a on Sun 8th Sep 2013 07:11 UTC in reply to "RE: Uh"
kokara4a Member since:
2005-09-16

Well, I would have liked to know why they did it like that. It seems to me that the additional logic is more than what would have been needed for a full 8-bit ALU. Maybe I'm wrong, but the drawback is quite significant - you get half the performance. Granted, the Z80 was running at frequencies quite a bit higher than the original 6502, but it seems the 4-bit ALU eats most of that.

Interesting article though - I like reading about such things. I never did any assembly programming on the Z80 and never had access to one. In the early 80s in Bulgaria there were mostly locally produced Apple II compatibles. We did make one - Правец-8М - which incorporated a Z80 extension card on the mainboard. I'm not aware of anyone else doing that. But these were rare.

Reply Score: 3

RE[3]: Uh
by xdev on Sun 8th Sep 2013 12:40 UTC in reply to "RE[2]: Uh"
xdev Member since:
2005-11-11

"Well, I would have liked to know why they did it like that. It seems to me that the additional logic is more than what would have been needed for a full 8-bit ALU. ...Granted, the Z80 was running at frequencies quite a bit higher than the original 6502, but it seems the 4-bit ALU eats most of that."

Wildly guessing, I would assume that carry propagation is the critical timing path in an otherwise simple CPU. That would mean for the possible clock rate of the remaining chip: Either use half clock rate for everything, or use full clock rate, where only a half width ALU runs for 2 clocks while the remaining chip runs at full speed.

Carry lookahead could help here, but there might be reasons not to use it (patents?).

It is interesting that the P4 has a "double pumped" ALU, too, but IIRC that runs at double the chip speed.

Reply Score: 3

RE[3]: Uh
by viton on Mon 9th Sep 2013 20:21 UTC in reply to "RE[2]: Uh"
viton Member since:
2005-08-09

"Maybe I'm wrong, but the drawback is quite significant - you get half the performance."

The minimal Z80 instruction execution time is 4 clock cycles = just one "M1 cycle" (basically the opcode fetch time).

Reply Score: 4

RE[3]: Uh
by Snial on Tue 10th Sep 2013 08:51 UTC in reply to "RE[2]: Uh"
Snial Member since:
2011-12-30

The 4-bit ALU doesn't halve the performance of a Z80, because it's somewhat pipelined as Masatoshi Shima explains in the Z80 Oral History.

The high clocks-per-instruction count comes from the Z80's bus logic: a minimal instruction fetch requires 4 cycles: 2 to fetch the op-code itself [address setup, then read data] and another 2 for DRAM refresh (while the instruction is executed).

Compare it with the 8080, which took 5 cycles to execute an 8-bit ALU operation even though it had a full 8-bit ALU. Or compare it with the RCA 1802 (12 clocks/instruction) or the Nat Semi SC/MP (7-20+ clocks/instruction). The Z80's designers did pretty well for that era.

Also, the Z80 didn't have control logic as simple and direct as a 6502, which made the 6502's instruction execution more efficient. But that didn't mean the 6502's arithmetic was always faster*; a 16-bit zero-page add on a 6502 would take 20 cycles and 13 bytes (CLC;LDA;ADC;STA;LDA;ADC;STA) vs a Z80's 11 cycles / 1 byte (add hl,rr).
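
The comparison can be tallied instruction by instruction. This is a rough sketch that assumes every 6502 load, add and store uses plain zero-page addressing (2 bytes, 3 cycles each) and that CLC is the usual 1-byte, 2-cycle implied instruction; the table and the `tally` helper are made up for illustration.

```python
# (mnemonic, size in bytes, clock cycles), assuming zero-page addressing
# for every LDA/ADC/STA in the 6502 sequence.
ADD16_6502 = [
    ("CLC",    1, 2),
    ("LDA zp", 2, 3),  # low byte of first operand
    ("ADC zp", 2, 3),  # add low byte of second operand
    ("STA zp", 2, 3),  # store low byte of result
    ("LDA zp", 2, 3),  # high byte of first operand
    ("ADC zp", 2, 3),  # add high byte plus carry
    ("STA zp", 2, 3),  # store high byte of result
]

# Z80 equivalent when both operands already sit in register pairs.
ADD16_Z80 = [
    ("ADD HL,rr", 1, 11),
]


def tally(sequence):
    """Return (total bytes, total cycles) for a list of instructions."""
    total_bytes = sum(size for _, size, _ in sequence)
    total_cycles = sum(cycles for _, _, cycles in sequence)
    return total_bytes, total_cycles


for name, seq in (("6502", ADD16_6502), ("Z80", ADD16_Z80)):
    size, cycles = tally(seq)
    print(f"{name}: {size} bytes, {cycles} cycles")
```

The two sides are not quite doing the same job: the 6502 sequence reads and writes memory, while the Z80 instruction works on values already held in registers, so the tally favours the Z80 somewhat.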

-cheers.

[*not to trash the 6502, it's an amazing 8-bit CPU in many respects]

Reply Score: 2

RE[4]: Uh
by Alfman on Tue 10th Sep 2013 13:23 UTC in reply to "RE[3]: Uh"
Alfman Member since:
2011-01-28

Snial,

"Also, the Z80 didn't have control logic as simple and direct as a 6502, which made the 6502's instruction execution more efficient. But that didn't mean the 6502's arithmetic was always faster*; a 16-bit zero-page add on a 6502 would take 20 cycles and 13 bytes (CLC;LDA;ADC;STA;LDA;ADC;STA) vs a Z80's 11 cycles / 1 byte (add hl,rr)."

It's the age-old debate between RISC and CISC. From the looks of it here, the 6502 was a bit too RISC for such a basic operation, making simple operations cost more than they should. On the other hand, the Z80 (and x86 afterwards) were too CISC with regard to variable-length instruction encoding and memory addressing, which demanded more complexity in the decode stage. Getting an optimal combination needs compromises between the two.

Edited 2013-09-10 13:32 UTC

Reply Score: 2

RE: Uh
by ferrels on Sun 8th Sep 2013 17:32 UTC in reply to "Uh"
ferrels Member since:
2006-08-15

What did you expect? It's a blog for goodness sakes, not a novel or hardback book that you purchased at the bookstore for $50. Technically speaking it's a darn good blog. If you want perfection go buy a James Joyce novel.

Edited 2013-09-08 17:34 UTC

Reply Score: 4

RE[2]: Uh
by henderson101 on Mon 9th Sep 2013 11:51 UTC in reply to "RE: Uh"
henderson101 Member since:
2006-05-30

Ah yes, there's a great section on the 6502 instruction set tucked into the middle of Dubliners, and I totally forgot about the Z80 primer in the last chapter of Ulysses! Thanks for your extremely helpful comment!

Reply Score: 4

Explanation
by transputer_guy on Sun 8th Sep 2013 18:54 UTC
transputer_guy Member since:
2005-07-08

I reverse engineered a dozen of these 1979 NMOS processors to learn about how the ALUs, register files, ROMs, RAMs, PLAs, clocks, back bias circuits worked. I had forgotten how the Z80 worked though, but from the circuit given here there is a good reason why it was done this way.

In the 6800, 8080 and 6502, the carry path was usually a pass gate with a minimal delay per bit. Four carry cells in series would look like a distributed 4-bit tree-like NAND gate, with extra devices to steer 1s and 0s or to bypass to the output and then invert. This is a very slow gate with all the associated capacitance, but it does cover 4 bits in one go with 2 logic delays, slooow + fast. So an 8-bit adder delay would look like a 1-bit adder with 4 extra carry gates of delay. That's about the limit of the clock cycle.

This Z80 used 2 faster gates per bit, so 4 bit cells will add up to 8 gate delays on top of the basic adder cell. That's about the limit of the clock cycle.

In another design, some designers paired off odd/even slices to achieve a 2-bit carry in 2 gates of logic, allowing an 8-bit design with an 8-gate-delay carry.
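
The gate counts for the three carry schemes above can be reproduced with a small back-of-the-envelope model. It is only a sketch: it assumes every stage of a carry chain costs the same fixed number of gate delays and ignores the fact that the gates differ in speed (the pass-gate stage is "slooow + fast", the Z80 gates are faster). The function name and the parameters are made up for illustration.

```python
import math


def carry_gate_delays(width_bits, bits_per_stage, gates_per_stage):
    """Carry-path gate delays for an adder built from identical stages.

    Simplified model: each stage covers `bits_per_stage` bits and costs
    `gates_per_stage` gate delays, and all gate delays count as equal.
    """
    stages = math.ceil(width_bits / bits_per_stage)
    return stages * gates_per_stage


schemes = {
    # Pass-gate chain: 4 bits per stage, 2 logic delays per stage.
    "pass-gate chain, 8 bits": carry_gate_delays(8, 4, 2),
    # Z80 as described: 2 gates per bit over a 4-bit ALU pass.
    "Z80, one 4-bit pass": carry_gate_delays(4, 1, 2),
    # Odd/even paired slices: 2 bits per stage, 2 gates per stage.
    "paired slices, 8 bits": carry_gate_delays(8, 2, 2),
}

for name, delays in schemes.items():
    print(f"{name}: {delays} carry gate delays")
```

Under these assumptions the model gives 4, 8 and 8 gate delays respectively, matching the figures quoted above. What it cannot show is how slow each gate is, which is where the real trade-off lies.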

It's a swings-and-roundabouts issue. By the time the Z80 was built, the trend was moving away from ripple-pass dynamic gates to full static logic, and in the Z8000 the entire 16-bit path had a full custom design with full static carry-lookahead asymmetric logic, all done in about 8-12 fast gates IIRC, with no precharging logic needed.

The 68000 and 8086 retained the ripple pass gate but enhanced it by using boot strapping on a floating pass gate with a precharge. In these schemes a clock would be rammed through 8 pass gates in short order and then buffered for the next 8 bit block. Boot strapping allowed 16 bit addition in a reasonable time but required a clock for precharge.

So in the dynamic style, 1 clock precharges and the next conditionally discharges or computes the gate values. In the static style, both clock phases can do useful work.

Even the Pentium 4, a 32-bit processor, uses a double-pumped 16-bit ALU.

Reply Score: 4