Did anyone ever use the extra set of registers on the Z80?

25 votes
The Z80 has the surprising feature of a second set of registers. I suppose these were intended to be used for rapid task switching or interrupt handling, though I think if I were programming a Z80 retrocomputer, I would be more likely to use them for fast access to global variables.
Such small snippets of Z80 code as I have seen do not use them, but that's not surprising; they are something I would expect to be used only in large programs. Back in the day, I was on 6502 machines, so I never had occasion to write anything nontrivial on the Z80.
Did anyone ever use that second register bank, either for its intended purpose, or just to get more registers within a single task?
z80
2
I could see the alternate registers being used for fast context switching, but I'm not sure how efficiently they would serve as fast global variables. To use them, you would have to swap register sets (BC, DE, HL with their prime counterparts; AF can also be swapped, with a different instruction). Then you would have to preserve a copy of that data, perhaps on the stack or in an index register, then swap the sets back. It would probably be quicker just to grab the variable directly from memory.
– RichF
Sep 30 at 20:17
4
On TI-83 series calculators, the OS uses these instructions for their intended purpose in the system interrupt code.
– Misha Lavrov
Sep 30 at 23:55
3
I don't know for what purpose, but the operating systems of the Sinclair ZX81 and the Applied Technologies Microbee 16 used these register banks.
– Martin Rosenau
Oct 1 at 6:16
3
I used it all the time, especially for graphics when I needed more register variables. But yes, I sometimes also used it for convenient ISR handling. IIRC some on-screen monitors/debuggers used it to avoid stack usage (they lived inside screen VRAM and did not change the rest of RAM). When I switched to x86, I missed the extra register file a lot.
– Spektre
Oct 1 at 8:38
2
It's been decades since I wrote code for Z80s (way back in the days of early STD bus). It seems to me that we'd switch registers as the first and last instructions in interrupt service routines. But my memory from the early eighties is foggy.
– Flydog57
Oct 1 at 22:38
asked Sep 30 at 19:33
rwallace
10 Answers
30 votes, accepted
The key to efficient programming on the Z80 is to use registers as much as possible. I can easily believe that the designers of the Z80 intended the alternate register set as an efficient way of context switching. However, context switching does not tend to happen often enough to reserve the alternate set for that alone; the gains are simply not worth it most of the time. Hence, good Z80 programming practice is typically to use as many registers as possible and still use the stack for saving registers during interrupts.
Now, let me give you several ideas on how one can benefit from having two sets of equivalent registers. A typical pixel scroll for a 16-byte-wide bitmap can look, e.g., as follows:
    rl (hl) : dec l ; repeated 16 times
What if one needs to scroll by 2 pixels at a time?
    rl (hl) : ex af,af' : rl (hl) : ex af,af' : dec l ; repeated 16 times
is the fastest way. OK, this is only using the second accumulator (strictly, its flags: each RL works on its own carry chain). Let us consider fast copying. The obvious
    ld a,(hl) : ld (de),a : inc hl : inc de ; 26 t-states
is actually very slow. Unrolled
    ldi ; 16 t-states
is better and, in fact, is often acceptably fast. However, the fastest copiers are based on (semi-)unrolled code loading and saving the data via the stack, e.g. as follows:
    ld sp,.. : pop af : pop bc : pop de : pop hl
    exx : ex af,af' : pop af : pop bc : pop de : pop hl
    ld sp,.. : push hl : push de : push bc : push af
    exx : ex af,af' : push hl : push de : push bc : push af
    ; 10+10*4 + 4*2+10*4 + 10+11*4 + 4*2+11*4 = 204 t-states per 16 bytes
i.e. 12.75 t-states per byte. And note that this is not esoteric; variations of this idea were used in a huge number of commercial games on the ZX Spectrum.
Much non-trivial code, e.g. fast polygon fillers or texture mappers, is only possible at decent speed if one uses both sets of registers simultaneously.
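As a rough sketch of how one 16-byte block of the stack-based copier might look with concrete operands: SRC, DST and SavedSP are hypothetical labels (the answer deliberately leaves the `ld sp,..` operands open), and the assembler is assumed to accept a colon as an instruction separator, as in the listing above. Interrupts are assumed to be switched off for the duration, because an interrupt arriving while SP points into the data would push a return address over it.

```z80
; Sketch only: copy 16 bytes from SRC to DST through both register sets.
; SRC, DST and SavedSP are hypothetical labels, not part of the original answer.
; All registers of both sets (except IX/IY) are overwritten.
        di                      ; an interrupt would push PC into our data
        ld (SavedSP),sp         ; remember the real stack pointer
        ld sp,SRC               ; POP reads upwards, starting at SRC
        pop af : pop bc : pop de : pop hl       ; bytes 0..7  -> main set
        exx : ex af,af'
        pop af : pop bc : pop de : pop hl       ; bytes 8..15 -> alternate set
        ld sp,DST+16            ; PUSH writes downwards, so start past the end
        push hl : push de : push bc : push af   ; bytes 15..8 from the alternate set
        exx : ex af,af'
        push hl : push de : push bc : push af   ; bytes 7..0 from the main set
        ld sp,(SavedSP)         ; restore the real stack
        ei
```

In a real copier the save and restore of SP would normally wrap many such blocks rather than a single one, so their cost is spread over the whole transfer.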
3
I bet they did not, because if they did, the command set would have been very different. But the same can be said about pretty much every popular architecture: once it becomes popular, people find innovative ways to exploit it. And given how the software library for, say, the ZX Spectrum is absolutely dominated by game titles, I'd never call graphics fiddling esoteric...
– introspec
Oct 1 at 13:30
4
Interrupts don't have to happen often for there to be value in rapidly switching to the interrupt context - it's the latency of response that motivates the quick switch there. Of course, if your interrupts can tolerate more time saving state, then the alternate registers are available for your programs, and may increase your throughput of work.
– Toby Speight
Oct 2 at 10:15
2
By my count, the code you give to scroll by one pixel takes 19 cycles per byte, and the code to scroll by two takes 42. Using ld a,(hl) / rla / ld (hl),a / inc l would take 22, and if there were a way to adapt that to shift two bits at an extra cost of 16 cycles or fewer that could be a win, but the fact that swapping flags also swaps A would mean the cheapest approach I can see would cost 20 extra cycles.
– supercat
Oct 2 at 18:29
1
@TobySpeight, my answer is likely to be coloured by my experience of coding small home computers, mostly ZX Spectrum-compatibles. Some of the most basic ones did not have any sources of interrupts; ZX Spectrum compatibles have a ~50Hz frame interrupt that does not really require any kind of response from the coder. I would actually love to learn about common Z80-based architectures where the response latency mattered, because I do not know a single example of this situation.
– introspec
Oct 2 at 21:53
1
@cat, I do fair amount of Z80 coding even nowadays and I find traditional form of assembly code (one command per line) incredibly diluted. Thus, I use assembler that allows multiple commands per line (colon is a command separator, just like you guessed). I find that putting related groups of commands together onto a single line increases the readability, because you use the screen space better and can also group commands by their intended action. I recognise that some people find it off-putting. I hope you can recognise that I find the more traditional format just as off-putting.
– introspec
Oct 2 at 21:59
17 votes
> The Z80 has the surprising feature of a second set of registers. I suppose these were intended to be used for rapid task switching or interrupt handling,
Indeed, they were intended for fast interrupt response. In a simple, general way, this saves the time needed to push the main process's registers onto the stack and restore them again. The designers spent single-byte opcodes on this to get the absolute minimum execution time, as the Z80 Technical Manual states on p. 26:
> OP code 08H allows the programmer to switch between the two pairs of
> accumulator flag registers while D9H allows the programmer to switch between
> the duplicate set of six general purpose registers. These OP codes are only one
> byte in length to absolutely minimize the time necessary to perform the
> exchange so that the duplicate banks can be used to effect very fast interrupt
> response times.
EX and EXX take only 4 T-cycles each, while pushing even a single 16-bit register pair takes 11 cycles, plus another 10 to pop it again. 8 T-cycles instead of 21 or more is a considerably faster reaction, isn't it?
That's also why there are two EX* instructions: very simple routines may only need to preserve the flags and A. This leaves the rest of the second set (everything except AF') for other purposes, such as use in normal software or even more speedup in I/O.
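To make the comparison concrete, here is a hedged sketch of two handler skeletons, using the documented timings of 11 T-states for PUSH and 10 for POP of a register pair. The labels are hypothetical, and the second variant assumes that nothing else in the system keeps live data in the alternate set and that interrupts do not nest.

```z80
; Variant 1: conventional handler, saving everything it touches on the stack.
IsrStack:
        push af                 ; 11 T-states
        push bc                 ; 11
        push de                 ; 11
        push hl                 ; 11
        ; ... service the device ...
        pop hl                  ; 10
        pop de                  ; 10
        pop bc                  ; 10
        pop af                  ; 10
        ei
        reti
; save/restore overhead: 4*11 + 4*10 = 84 T-states

; Variant 2: handler that owns the alternate set.
IsrAlt:
        ex af,af'               ; 4 T-states
        exx                     ; 4
        ; ... service the device using A', F', BC', DE', HL' ...
        exx                     ; 4
        ex af,af'               ; 4
        ei
        reti
; save/restore overhead: 4*4 = 16 T-states
```

A handler that only needs A and the flags can drop the two EXX instructions and get by with 8 T-states of overhead.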
After all, the second set can be used not only as a kind of fast 'stack' but also be prepared for a certain I/O operation. Think of a serial interface receiving at high speed. Loading things like the memory pointer where received data is to be placed, the number of bytes to receive and so on takes quite some time (16 T-cycles for a 16-bit pointer, 13 for a byte value), and they need to be stored again later on as well.
If these values are placed in the second register set before the high speed interrupt driven routine gets active, no loads and stores are to be executed. Interrupt service time gets reduced to the absolute minimum, not only causing less interruption of the main process but also working up to higher speeds.
After all, the Z80 design was mainly focused on a more flexible, configurable and faster interrupt handling.
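The pre-loaded register idea can be sketched roughly as follows. Everything named here is an assumption for illustration: SIO_DATA is a made-up port number, RxBuf and RX_LEN a made-up buffer, and the handler assumes it is the only user of the alternate set.

```z80
; Sketch only: interrupt-driven receive with pointer and counter kept in
; the alternate set. SIO_DATA, RxBuf and RX_LEN are hypothetical.
SIO_DATA equ 10h                ; assumed data port of the serial chip
RX_LEN   equ 64

RxBuf:   ds RX_LEN              ; receive buffer (must live in RAM)

; Call once, with interrupts disabled, before reception starts.
RxSetup:
        exx
        ld hl,RxBuf             ; HL' = where the next byte goes
        ld b,RX_LEN             ; B'  = bytes still expected
        ld c,SIO_DATA           ; C'  = port to read from
        exx
        ret

; Interrupt handler: no pointer or counter is loaded from or stored to memory.
RxIsr:
        ex af,af'               ; INI changes the flags, so protect the main AF
        exx
        ini                     ; (HL') <- port (C'), HL' = HL'+1, B' = B'-1
        exx
        ex af,af'
        ei
        reti
```

As written, the handler does not stop when B' reaches zero; a real driver would test the Z flag after INI and shut the receiver down or switch buffers at that point.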
> though I think if I were programming a Z80 retro computer, I would be more likely to use them for fast access to global variables.
I can't see much gain here. Sure, you get 6 additional bytes or 3 pointers, but at the same time you can't access the other ones. So there are not many cases where the secondary register set is helpful, besides interrupts and 'dead end' subroutines.
> Such small snippets of Z80 code as I have seen, do not use them, but then, that's not surprising; they are something that would be expected to be only used in large programs.
Well, that's exactly the area where they are useful: speeding up small functions.
> Back in the day, I was on 6502 machines, so I never had occasion to write anything non-trivial on the Z80.
I did both, and while they need different approaches, the result is usually quite similar.
> Did anyone ever use that second register bank, either for its intended purpose, or just to get more registers within a single task?
It was quite common to use them either for interrupts (mostly in embedded systems) or for 'dead end' routines.
So within a single task, the most likely place to use them would be in leaf subroutines, so you can have the use of a full set of registers without having to spend cycles saving and restoring those used by the rest of the program. Okay, that makes sense.
– rwallace
Sep 30 at 20:51
1
@rwallace yes, except there's still the issue of parameter passing.
– Raffzahn
Sep 30 at 21:20
1
I miss writing code like this, with execution times and cycles in mind. So much code written today ignores what was such a fundamental concept then.
– coteyr
Oct 1 at 7:35
Can you elaborate what you mean with "dead end" routines (last sentence)?
– tofro
Oct 1 at 7:53
By "dead-end" subroutines, do you mean what compiler-writers sometimes call "leaf-functions", i.e. functions which do not call anything else?
– Wilson
Oct 1 at 8:07
13 votes
For my Aquarius Micro-Expander I used the alternate register set to speed up text display.
The Mattel Aquarius has a character based screen that is physically 40 characters by 25 lines, but to stay out of the overscan area it only uses 38 characters by 24 lines. With such odd dimensions calculating the cursor address takes a lot of code, and combined with other overhead makes printing to the screen quite slow. I wanted to reduce the overhead as much as possible, and also have more flexible line width and positioning.
Here's my core code for putting a character on screen. HL' holds the current screen address and D' holds the cursor position on the line. Higher level routines test and manipulate these registers to wrap at line ends and scroll text in windows.
WinPrtChr:
    EXX
    LD (HL),A     ; poke char into screen memory
    INC HL        ; HL' = next screen address
    INC D         ; D' = next screen x
    EXX
    RET
BackSpace:
    EXX
    LD (HL)," "   ; poke SPACE into screen memory
    DEC HL        ; HL' = previous screen address
    DEC D         ; D' = previous screen x
    EXX
    RET
Using alternate registers was much faster than storing the variables in RAM, and kept the normal registers free for other uses. The result was fewer memory accesses and faster operation in general. My code prints text into windows of arbitrary size about 20 times faster than the system routines.
Unfortunately the Aquarius system ROM also uses alternate registers when scanning the keyboard, so I had to wrap the keyboard routines with a pair of EXX's (to get back to the normal register set) and save the affected registers on the stack when getting keyboard input. However I didn't have to worry about interrupts (where the alternate register set is often used for fast context saving) because the Aquarius doesn't have any!
11 votes
A specific example is the chess program Sargon (written in 1978 or a bit earlier), which used them in a couple of leaf subroutines.
Search the assembler listing at http://smecers.appspot.com/govs/Oldies/Sargon.htm for the EXX instruction. (It's in routines XCHNG and NEXTAD)
The code is well documented, if anyone wants to explore the cost of alternative coding techniques.
Apart from Sargon, I'm pretty sure they were used in the code examples in some "how to write interrupt routines" books of the period, as a "quick" way to save all the registers, but I don't have any references for that.
11 votes
The question asks if the alternate register set was "ever" used ... this answer is about a modern use of it.
The z88dk C compiler, sccz80[1] uses a calling convention where the registers B', C', D', E', H' and L' are used to hold the first 48-bit floating point argument to a function, or a floating point return value. Benchmark results suggest that this is a very good strategy, as the results are similar in performance to the 32-bit floating point used by most other Z80 C compilers.
[1] - actually, one of the two C compilers that are part of the z88dk toolchain, the other being a patched version of sdcc that uses 32-bit floats with a different calling convention
As you say one of the float libraries uses the exx set to hold the main floating point accumulator which is 48-bit in BCDEHL. Two floating point values can be held at the same time, one in the main set BCDEHL and one in the exx set BCDEHL'. This is quite fast and allows the float library to be re-entrant on top of that.
– aralbrec
17 hours ago
The fastest integer math (multiply and divide) also involves the exx set. You'll see it used in 32-,64-,72-bit math and combinations with lower bit sizes. In stdio, the exx set is used to hold tallies like number of characters output and counters for sending buffers one char at a time when a device can't do a block on its own. The main set is then used by the driver to communicate with the device. The exx set is used in more places than that but those are the important ones. It improves performance so much that it's not worthwhile to reserve for fast interrupt response.
– aralbrec
17 hours ago
9 votes
As usual, the Z80 Oral History provides some insight into the motivation behind the alternate registers and exchange instructions:
[Faggin]
> So I wanted to have a couple of index registers, more 16-bit operations, a better interrupt structure. The whole idea of doubling the number of registers. And I could exchange the register with an exchange instruction, the whole register set. That was an idea that I had used already on the Intel 4040. So that one could serve it up very fast if that was a necessity. And on and on.
[Shima]
> Thirdly, in order to support the high-speed task switching, in the beginning I asked to complete two sets of register files including the program counter. But it was too complicated for customer. Then we gave up on [the idea] of the two sets of general purpose registers.
> ...
> And two instruction codes were used for the exchange, the set of general purpose register, and exchanging the set of accumulator and flags. Also those support the high-speed task switching.
(And while it may be obvious, it's worth pointing out that in the silicon implementation, the exchange instructions merely modify the state of the register file addressing logic.)
2
I wonder how much it would have cost to have separate prime-select bits for BC, DE, and HL, and have 16 or 32 DD-prefix opcodes xor those bits along with AF/AF' (and perhaps the DE/HL selection bit) with bits from the second byte. That could have made the secondary registers a lot more useful.
– supercat
Oct 2 at 18:43
8 votes
> Did anyone ever use that second register bank, either for its intended purpose, or just to get more registers within a single task?
This being one occasion when a personal experience answer will do, EXX is ideal for the very specific task of multiplying a 16-bit 2d vector by a scalar, which makes it helpful for 2d vector graphics, and the projection part of 3d vector graphics.
Specifically:
- use A for the multiplier: rotate right from it into carry;
- use BC and BC' for the working copy of the multiplicands; these will need shifting left on each iteration;
- use HL and HL' to accumulate the result; perform ADD HL, BC if carry is set after the RRA.
So the specific convenient observations are:
- you're juggling four 16-bit quantities, but they interact only in pairs;
- and using EXX lets you use the 16-bit arithmetic that's right there on the main instruction page.
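The scheme listed above might be sketched as follows. This is only one possible reading of it: the label names, the use of D as the loop counter and the choice to keep just the low 16 bits of each product are assumptions, not something the answer specifies.

```z80
; Sketch only: multiply a 2D vector by an unsigned 8-bit scalar.
; Entry: A = multiplier, BC = x, BC' = y
; Exit:  HL = low 16 bits of A*x, HL' = low 16 bits of A*y
; Changes: A, flags, BC, BC', D (the loop counter is an assumed choice)
VecMul:
        ld hl,0                 ; x accumulator
        exx
        ld hl,0                 ; y accumulator
        exx
        ld d,8                  ; one pass per multiplier bit
VecLoop:
        rra                     ; next multiplier bit -> carry
        jr nc,VecSkip
        add hl,bc               ; x accumulator += shifted x
        exx
        add hl,bc               ; y accumulator += shifted y
        exx
VecSkip:
        sla c                   ; x multiplicand <<= 1
        rl b
        exx
        sla c                   ; y multiplicand <<= 1
        rl b
        exx
        dec d                   ; main-set D: we are back in the main set here
        jr nz,VecLoop
        ret
```

RRA feeds the old carry into bit 7 of A, but that stray bit would only reach the carry on a ninth rotation, so eight passes consume exactly the eight original multiplier bits.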
7 votes
Following the spirit of personal examples, here's one from when I was in high school: I, like everyone else, used a TI-84 calculator for my math classes. Said calculator uses a Z80.
Out of a mixture of boredom and curiosity, I wrote a program that printed "WOZ IS ALWAYS THE ANSWER" any time the user hit the enter key. The program didn't need to be running for this to happen, and it could happen anywhere in the menu system.
To do that, I used the chip's interrupts to check the input every 1/140th second. The thing is, each time I checked, I had to corrupt the A register. The input methods simply couldn't get around that. So, if I had just used A, the A register would be switching values for every other program run. That wouldn't work.
Instead, I switched into the shadow registers, did the input checking, switched back, and waited for the next call. It worked out pretty well, and did the job!
– Kyle (new contributor)
6 votes
Back in the early 90s I used the alternate registers in 3 different ways:
- For task switching in a dual task "operating system".
- In a RAM-less control application, temperature sensors and relay control. (208 bits in total was enough).
- In a floating point library. The latter was also done by several others (and better than my attempt), see for example http://www.andreadrian.de/oldcpu/Z80_number_cruncher.html
– Baard (new contributor)
4 votes
Sorry to rain on any parades but, while it's been a long time since I did any serious Z-80 programming, I distinctly remember the alternate registers being one of the major broken features of the Z-80.
You see, they could be used to either:
- Speed up interrupt handling.
- Store alternate data in your main code.
The thing was you could not do both.
So if I was writing a device driver with an interrupt service routine, I could not use the alternate registers to save pushing and popping the main registers. I of course would need to avoid doing this in case the application programmer was using those registers.
And if I was writing application code, I could not use the alternate registers because an ISR could trample all over my data.
The overall result was that most programmers stayed away from the alternate register sets so that their code would run on more machines without crashing and burning. Any savings were swamped by the risk of shipping non-working code.
Safe code may be slower, but broken code does not run at all.
Of course if you have the luxury of controlling all aspects of the system and its coding, you were free to do as you please. I never had that luxury.
2
Since some of the tricks in the other answers depend on pointing sp into non-stack data structures, I think it's implicit that one would disable interrupts on the way into those tight inner loops.
– Henning Makholm
Oct 3 at 14:26
1
@HenningMakholm, funnily enough, some people came up with register use conventions that allowed them to keep interrupts enabled while using SP pointing at non-stack data. Of course, this was happening on micros, where programmers had the luxury of controlling every aspect of the system :)
– introspec
Oct 3 at 18:03
1
I suppose that the luxury in question is a nice way of saying "Having to do all the work yourself." Does not sound so good when you put it that way!
– Peter Camilleri
2 days ago
2
It's hardly fair to describe a feature as broken just because it doesn't fit your particular use case. No, you can't have your cake and eat it; the alternate registers will suit either of the purposes you described but not both together.
– user3570736
2 days ago
add a comment |Â
10 Answers
10
active
oldest
votes
10 Answers
10
active
oldest
votes
active
oldest
votes
active
oldest
votes
up vote
30
down vote
accepted
The key to efficient programming on Z80 is to use registers as much as possible. I can easily believe that designers of Z80 intended the use of the alternative set of registers as an efficient way of context switching. However, the context switching does not tend to happen often enough to use the alternative set of registers only for that; the gains are simply not worth it most of the time. Hence, the good practice of Z80 programming is typically about using as many registers as possible and still use stack for saving registers during the interrupts.
Now, let me give you several ideas on how one would benefit from having two sets of equivalent registers. A typical pixel scrolling for 16 byte wide bitmap can look e.g. as follows:
rl (hl) : dec l ; repeated 16 times
What if one needs to scroll by 2 pixels at a time?
rl (hl) : ex af,af' : rl (hl) : ex af,af' : dec l ; repeated 16 times
is the fastest way. OK, this is only using the second accumulator. Let us consider fast copying. The obvious
ld a,(hl) : ld (de),a : inc hl : inc de ; 26 t-states
which is actually very slow. Unrolled
ldi ; 16 t-states
is better and, in fact, is often acceptably fast. However, the fastest copiers are based on (semi-)unrolled code loading and saving the data via the stack, e.g. as follows:
ld sp,.. : pop af : pop bc : pop de : pop hl
exx : ex af,af' : pop af : pop bc : pop de : pop hl
ld sp,.. : push hl : push de : push bc : push af
exx : ex af,af' : push hl : push de : push bc : push af
; 10+10*4 + 4*2+10*4 + 10+11*4 + 4*2+11*4 = 204 t-states per 16 bytes
i.e. 12.75 t-states per byte. And note that this is not esoteric; variations of this idea were used in a huge number of commercial games on ZX Spectrum.
Much non-trivial code, e.g. fast polygon fillers or texture mappers are only possible with decent speed if one uses both sets of registers simultaneously.
3
I bet they did not, because if they did, the command set would have been very different. But the same can be said about pretty much every popular architechture: once it becomes popular, people find innovative ways to exploit it. And given how the software library for, say, ZX Spectrum, is absolutely dominated by game titles, I'd never call graphics fiddling esoteric...
â introspec
Oct 1 at 13:30
4
Interrupts don't have to happen often for there to be value in rapidly switching to the interrupt context - it's the latency of response that motivates the quick switch there. Of course, if your interrupts can tolerate more time saving state, then the alternate registers are available for your programs, and may increase your throughput of work.
– Toby Speight
Oct 2 at 10:15
2
By my count, the code you give to scroll by one pixel takes 19 cycles per byte, and the code to scroll by two takes 42. Using ld a,(hl) / rla / ld (hl),a / inc l would take 22, and if there were a way to adapt that to shift two bits at an extra cost of 16 cycles or fewer that could be a win, but the fact that swapping flags also swaps A would mean the cheapest approach I can see would cost 20 extra cycles.
– supercat
Oct 2 at 18:29
1
@TobySpeight, my answer is likely to be coloured by my experience of coding small home computers, mostly ZX Spectrum-compatibles. Some of the most basic ones did not have any sources of interrupts; ZX Spectrum compatibles have a ~50Hz frame interrupt that does not really require any kind of response from the coder. I would actually love to learn about common Z80-based architectures where the response latency mattered, because I do not know a single example of this situation.
– introspec
Oct 2 at 21:53
1
@cat, I do a fair amount of Z80 coding even nowadays and I find the traditional form of assembly code (one command per line) incredibly diluted. Thus, I use an assembler that allows multiple commands per line (the colon is a command separator, just like you guessed). I find that putting related groups of commands together onto a single line increases readability, because you use the screen space better and can also group commands by their intended action. I recognise that some people find it off-putting. I hope you can recognise that I find the more traditional format just as off-putting.
– introspec
Oct 2 at 21:59
answered Sep 30 at 21:26
introspec
up vote
17
down vote
"The Z80 has the surprising feature of a second set of registers. I suppose these were intended to be used for rapid task switching or interrupt handling,"
Indeed, they were intended for fast interrupt response. Put simply, swapping banks saves the time needed to push the main program's registers onto the stack and restore them afterwards. The designers spent single-byte opcodes on the swap to get the absolute minimum execution time, as the Z80 Technical Manual states on p. 26:
"OP code 08H allows the programmer to switch between the two pairs of accumulator flag registers while D9H allows the programmer to switch between the duplicate set of six general purpose registers. These OP codes are only one byte in length to absolutely minimize the time necessary to perform the exchange so that the duplicate banks can be used to effect very fast interrupt response times."
EX AF,AF' and EXX take only 4 T-cycles each, while pushing even a single 16-bit register pair takes 11 cycles, plus another 10 to pop it again. 8 T-cycles instead of 21 or more is a considerably faster reaction, isn't it?
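To put rough numbers on that comparison, here is a minimal Python sketch (an illustration, not part of the original answer). It assumes the documented Z80 timings of 11 T-cycles for PUSH and 10 for POP of a register pair, and 4 each for EX AF,AF' and EXX; the function names are made up.

```python
# Documented Z80 timings in T-cycles.
PUSH_PAIR = 11   # PUSH qq
POP_PAIR = 10    # POP qq
EX_AF = 4        # EX AF,AF'
EXX = 4          # EXX

def swap_cost():
    """Enter and leave a handler by swapping both banks (EX AF,AF' plus EXX, twice)."""
    return 2 * (EX_AF + EXX)

def stack_cost(pairs):
    """Save and restore the given number of register pairs via the stack."""
    return pairs * (PUSH_PAIR + POP_PAIR)

print(swap_cost(), stack_cost(1), stack_cost(4))  # 16 21 84
```

So a full swap in and out costs less than saving and restoring a single register pair on the stack, and roughly a fifth of saving all four pairs.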
That is also why there are two exchange instructions: very simple routines may only use, and so only need to preserve, A and the flags. This leaves the rest of the second set (everything except AF') free for other purposes, such as use in normal software or a further speed-up of I/O.
After all, the second set can not only be used as a kind of fast 'stack', it can also be prepared in advance for a certain I/O operation. Think of a serial interface receiving at high speed. Loading things like the memory pointer where received data is to be placed, the number of bytes still to receive and so on takes quite some time (16 T-cycles for a 16-bit pointer, 13 for a byte value), and they need to be stored again afterwards as well.
If these values are placed in the second register set before the high-speed interrupt-driven routine becomes active, no loads and stores need to be executed. Interrupt service time is reduced to the absolute minimum, which not only interrupts the main process less but also allows higher data rates.
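The idea of a pre-loaded alternate bank can be sketched as a toy Python model (an illustration only, not code from the original answer). The class, the handler, and the choice of HL' as buffer pointer and BC' as byte counter are all assumptions made for this example; flags and real I/O are not modelled.

```python
class Z80Regs:
    """Toy model of the two register banks; flags are not modelled."""

    def __init__(self):
        self.a, self.a_alt = 0, 0
        self.bank = {"bc": 0, "de": 0, "hl": 0}
        self.bank_alt = {"bc": 0, "de": 0, "hl": 0}

    def ex_af(self):
        """EX AF,AF': swap the accumulator with its alternate."""
        self.a, self.a_alt = self.a_alt, self.a

    def exx(self):
        """EXX: swap BC, DE, HL with BC', DE', HL'."""
        self.bank, self.bank_alt = self.bank_alt, self.bank

def rx_interrupt(regs, mem, incoming):
    """Hypothetical receive handler: pointer and count already live in the alternate bank."""
    regs.ex_af()
    regs.exx()
    regs.a = incoming                 # stands in for reading the serial port into A
    b = regs.bank
    if b["bc"] > 0:                   # room left in the buffer?
        mem[b["hl"]] = regs.a
        b["hl"] += 1
        b["bc"] -= 1
    regs.exx()
    regs.ex_af()

regs = Z80Regs()
regs.bank_alt.update(hl=0x10, bc=3)   # set up once, before interrupts are enabled
mem = [0] * 0x20
for byte in (0xAA, 0xBB, 0xCC):
    rx_interrupt(regs, mem, byte)
```

The handler never loads or stores its pointer and counter from memory; they simply persist in the alternate bank between interrupts, while the main program's registers come back untouched.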
After all, the Z80 design was mainly focused on more flexible, configurable and faster interrupt handling.
"though I think if I were programming a Z80 retro computer, I would be more likely to use them for fast access to global variables."
I can't see much gain here. Sure, you get 6 additional bytes or 3 pointers, but while they are switched in you can't access the main ones. So there are not many cases where the secondary register set is helpful, besides interrupts and 'dead end' subroutines.
"Such small snippets of Z80 code as I have seen, do not use them, but then, that's not surprising; they are something that would be expected to be only used in large programs."
Well, that is exactly where they are useful: to speed up small functions.
"Back in the day, I was on 6502 machines, so I never had occasion to write anything non-trivial on the Z80."
I did both, and while they need different approaches, the result is usually quite similar.
"Did anyone ever use that second register bank, either for its intended purpose, or just to get more registers within a single task?"
It was quite common to use them either for interrupts (mostly in embedded systems) or for 'dead end' routines.
So within a single task, the most likely place to use them would be in leaf subroutines, so you can have the use of a full set of registers without having to spend cycles saving and restoring those used by the rest of the program. Okay, that makes sense.
– rwallace
Sep 30 at 20:51
1
@rwallace yes, except there's still the issue of parameter passing.
– Raffzahn
Sep 30 at 21:20
1
I miss writing code like this, with execution times and cycles in mind. So much code written today ignores what was such a fundamental concept then.
– coteyr
Oct 1 at 7:35
Can you elaborate on what you mean by "dead end" routines (last sentence)?
– tofro
Oct 1 at 7:53
By "dead-end" subroutines, do you mean what compiler-writers sometimes call "leaf-functions", i.e. functions which do not call anything else?
– Wilson
Oct 1 at 8:07
 |Â
show 1 more comment
up vote
17
down vote
The Z80 has the surprising feature of a second set of registers. I suppose these were intended to be used for rapid task switching or interrupt handling,
Indeed they were intended for fast interrupt reaction. In a simple, general way, this saved the time to push the main process' registers onto the stack and restore them again. they spend single byte opcodes to do so to get the absolute minimum execution time - like the Z80 Technical Manual states on p.26:
OP code 08H allows the programmer to switch between the two pairs of
accumulator flag registers while D9H allows the programmer to switch between
the duplicate set of six general purpose registers. These OP codes are only one
byte in length to absolutely minimize the time necessary to perform the
exchange so that the duplicate banks can be used to effect very fast interrupt
response times.
EX and EXX only take 4 T-cycles, while even just pushing a simple 16-bit register would take 11 cycles plus another 15 to load it again. 8 T-cycles instead of 25 or more cycles is a considerably faster reaction, isn't it?
That's also why there are two EX* instruction, as very simple routines may only (use and) need to preserve the flags and A. This leaves the whole second set (except AF) for other purposes. Like being used in normal software, or for even more speedup in I/O.
After all, the second set can not only be used for some kind of fast 'stack' but also be prepared for a certain I/O operation. Think maybe of a serial interface receiving at high speed. Loading things like the memory pointer where received data is to be placed, the number of bytes to receive and so on, does take quite some time (16 T-Cycles for a 16 Bit pointer, 13 for a byte value) - and they need to be stored later on as well.
If these values are placed in the second register set before the high speed interrupt driven routine gets active, no loads and stores are to be executed. Interrupt service time gets reduced to the absolute minimum, not only causing less interruption of the main process but also working up to higher speeds.
After all, the Z80 design was mainly focused on a more flexible, configurable and faster interrupt handling.
though I think if I were programming a Z80 retro computer, I would be more likely to use them for fast access to global variables.
I can't see much gain here. Sure, 6 additional bytes or 3 pointers, but at the same time, you can't access the other ones. So there are not many cases where the secondary register set is helpful - besides interrupts and 'dead end' subroutines.
Such small snippets of Z80 code as I have seen, do not use them, but then, that's not surprising; they are something that would be expected to be only used in large programs.
Well, it's exactly the region where they are useful - to speed up small functions.
Back in the day, I was on 6502 machines, so I never had occasion to write anything non-trivial on the Z80.
Did both, and while they need different approaches, the result is usually quite similar.
Did anyone ever use that second register bank, either for its intended purpose, or just to get more registers within a single task?
It was quite common to use them either for interrupt (mostly in embedded systems) or 'dead end' routines.
So within a single task, the most likely place to use them would be in leaf subroutines, so you can have the use of a full set of registers without having to spend cycles saving and restoring those used by the rest of the program. Okay, that makes sense.
â rwallace
Sep 30 at 20:51
1
@rwallace yes, except there's till the issue of parameter passing.
â Raffzahn
Sep 30 at 21:20
1
I miss writing code like this, with execution times and cycles in mind. So much code written today ignores what was such a fundamental concept then.
â coteyr
Oct 1 at 7:35
Can you elaborate what you mean with "dead end" routines (last sentence)?
â tofro
Oct 1 at 7:53
By "dead-end" subroutines, do you mean what compiler-writers sometimes call "leaf-functions", i.e. functions which do not call anything else?
â Wilson
Oct 1 at 8:07
 |Â
show 1 more comment
up vote
17
down vote
up vote
17
down vote
The Z80 has the surprising feature of a second set of registers. I suppose these were intended to be used for rapid task switching or interrupt handling,
Indeed they were intended for fast interrupt reaction. In a simple, general way, this saved the time to push the main process' registers onto the stack and restore them again. they spend single byte opcodes to do so to get the absolute minimum execution time - like the Z80 Technical Manual states on p.26:
OP code 08H allows the programmer to switch between the two pairs of
accumulator flag registers while D9H allows the programmer to switch between
the duplicate set of six general purpose registers. These OP codes are only one
byte in length to absolutely minimize the time necessary to perform the
exchange so that the duplicate banks can be used to effect very fast interrupt
response times.
EX and EXX only take 4 T-cycles, while even just pushing a simple 16-bit register would take 11 cycles plus another 15 to load it again. 8 T-cycles instead of 25 or more cycles is a considerably faster reaction, isn't it?
That's also why there are two EX* instruction, as very simple routines may only (use and) need to preserve the flags and A. This leaves the whole second set (except AF) for other purposes. Like being used in normal software, or for even more speedup in I/O.
After all, the second set can not only be used for some kind of fast 'stack' but also be prepared for a certain I/O operation. Think maybe of a serial interface receiving at high speed. Loading things like the memory pointer where received data is to be placed, the number of bytes to receive and so on, does take quite some time (16 T-Cycles for a 16 Bit pointer, 13 for a byte value) - and they need to be stored later on as well.
If these values are placed in the second register set before the high speed interrupt driven routine gets active, no loads and stores are to be executed. Interrupt service time gets reduced to the absolute minimum, not only causing less interruption of the main process but also working up to higher speeds.
After all, the Z80 design was mainly focused on a more flexible, configurable and faster interrupt handling.
though I think if I were programming a Z80 retro computer, I would be more likely to use them for fast access to global variables.
I can't see much gain here. Sure, 6 additional bytes or 3 pointers, but at the same time, you can't access the other ones. So there are not many cases where the secondary register set is helpful - besides interrupts and 'dead end' subroutines.
Such small snippets of Z80 code as I have seen, do not use them, but then, that's not surprising; they are something that would be expected to be only used in large programs.
Well, it's exactly the region where they are useful - to speed up small functions.
Back in the day, I was on 6502 machines, so I never had occasion to write anything non-trivial on the Z80.
Did both, and while they need different approaches, the result is usually quite similar.
Did anyone ever use that second register bank, either for its intended purpose, or just to get more registers within a single task?
It was quite common to use them either for interrupt (mostly in embedded systems) or 'dead end' routines.
The Z80 has the surprising feature of a second set of registers. I suppose these were intended to be used for rapid task switching or interrupt handling,
Indeed they were intended for fast interrupt reaction. In a simple, general way, this saved the time to push the main process' registers onto the stack and restore them again. they spend single byte opcodes to do so to get the absolute minimum execution time - like the Z80 Technical Manual states on p.26:
OP code 08H allows the programmer to switch between the two pairs of
accumulator flag registers while D9H allows the programmer to switch between
the duplicate set of six general purpose registers. These OP codes are only one
byte in length to absolutely minimize the time necessary to perform the
exchange so that the duplicate banks can be used to effect very fast interrupt
response times.
EX and EXX only take 4 T-cycles, while even just pushing a simple 16-bit register would take 11 cycles plus another 15 to load it again. 8 T-cycles instead of 25 or more cycles is a considerably faster reaction, isn't it?
That's also why there are two exchange instructions: very simple routines may only need to preserve A and the flags. This leaves the rest of the second set (everything except AF') free for other purposes, such as use in normal software or an even greater speedup of I/O.
After all, the second set can serve not only as a kind of fast 'stack' but can also be prepared in advance for a particular I/O operation. Think of a serial interface receiving at high speed. Loading things like the memory pointer where received data is to be placed, the number of bytes to receive and so on takes quite some time (16 T-cycles for a 16-bit pointer, 13 for a byte value), and they need to be stored again afterwards as well.
If these values are placed in the second register set before the high-speed interrupt-driven routine becomes active, no loads or stores need to be executed. Interrupt service time is reduced to the absolute minimum, which not only interrupts the main process less but also allows higher data rates.
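As a rough sketch of what such a preloaded receive handler could look like: the port number `SIO_DATA`, the labels and the 64-byte buffer are all hypothetical, and directive syntax (`EQU`, `DS`) varies between assemblers. It assumes the main program never touches the alternate set once `rx_setup` has run.

```asm
SIO_DATA  EQU 0         ; hypothetical data port of the serial chip

; Run once, before the receive interrupt is enabled
rx_setup:
        EXX
        LD   HL,rx_buf  ; HL' = where the next received byte goes
        LD   B,64       ; B'  = number of bytes still expected
        LD   C,SIO_DATA ; C'  = port to read from
        EXX
        RET

; Receive interrupt: no pointer or counter is loaded from or stored to RAM
rx_isr:
        EX   AF,AF'     ; INI changes the flags, so keep the main F safe
        EXX             ; bring in HL', B', C'
        INI             ; (HL') <- port (C'), HL' = HL'+1, B' = B'-1
        EXX
        EX   AF,AF'
        EI
        RETI

rx_buf:   DS 64         ; receive buffer
```

What should happen when B' reaches zero is left out here; a real handler would have to stop the receiver or switch buffers at that point.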
After all, the Z80 design was mainly focused on more flexible, configurable and faster interrupt handling.
though I think if I were programming a Z80 retro computer, I would be more likely to use them for fast access to global variables.
I can't see much gain here. Sure, you get 6 additional bytes or 3 pointers, but while they are switched in you can't access the main set. So there are not many cases where the secondary register set is helpful, besides interrupts and 'dead end' subroutines.
Such small snippets of Z80 code as I have seen, do not use them, but then, that's not surprising; they are something that would be expected to be only used in large programs.
Well, that's exactly where they are useful: to speed up small functions.
Back in the day, I was on 6502 machines, so I never had occasion to write anything non-trivial on the Z80.
Did both, and while they need different approaches, the result is usually quite similar.
Did anyone ever use that second register bank, either for its intended purpose, or just to get more registers within a single task?
It was quite common to use them either for interrupts (mostly in embedded systems) or for 'dead end' routines.
edited Oct 1 at 11:43
answered Sep 30 at 20:45
Raffzahn
So within a single task, the most likely place to use them would be in leaf subroutines, so you can have the use of a full set of registers without having to spend cycles saving and restoring those used by the rest of the program. Okay, that makes sense.
– rwallace Sep 30 at 20:51
@rwallace yes, except there's still the issue of parameter passing.
– Raffzahn Sep 30 at 21:20
I miss writing code like this, with execution times and cycles in mind. So much code written today ignores what was such a fundamental concept then.
– coteyr Oct 1 at 7:35
Can you elaborate what you mean with "dead end" routines (last sentence)?
– tofro Oct 1 at 7:53
By "dead-end" subroutines, do you mean what compiler-writers sometimes call "leaf-functions", i.e. functions which do not call anything else?
– Wilson Oct 1 at 8:07
For my Aquarius Micro-Expander I used the alternate register set to speed up text display.
The Mattel Aquarius has a character-based screen that is physically 40 characters by 25 lines, but to stay out of the overscan area it only uses 38 characters by 24 lines. With such odd dimensions, calculating the cursor address takes a lot of code, which combined with other overhead makes printing to the screen quite slow. I wanted to reduce the overhead as much as possible, and also have more flexible line width and positioning.
Here's my core code for putting a character on screen. HL' holds the current screen address and D' holds the cursor position on the line. Higher level routines test and manipulate these registers to wrap at line ends and scroll text in windows.
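WinPrtChr:
    EXX                 ; switch to the alternate BC', DE', HL'
    LD   (HL),A         ; poke char into screen memory
    INC  HL             ; HL' = next screen address
    INC  D              ; D' = next screen x
    EXX                 ; back to the main register set
    RET
BackSpace:
    EXX                 ; switch to the alternate BC', DE', HL'
    LD   (HL)," "       ; poke SPACE into screen memory
    DEC  HL             ; HL' = previous screen address
    DEC  D              ; D' = previous screen x
    EXX                 ; back to the main register set
    RET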
Using alternate registers was much faster than storing the variables in RAM, and kept the normal registers free for other uses. The result was fewer memory accesses and faster operation in general. My code prints text into windows of arbitrary size about 20 times faster than the system routines.
Unfortunately, the Aquarius system ROM also uses the alternate registers when scanning the keyboard, so I had to wrap the keyboard routines with a pair of EXX instructions (to get back to the normal register set) and save the affected registers on the stack when getting keyboard input. However, I didn't have to worry about interrupts (where the alternate register set is often used for fast context saving) because the Aquarius doesn't have any!
answered Oct 1 at 2:13
Bruce Abbott
A specific example is the chess program Sargon (written in 1978 or a bit earlier), which used them in a couple of leaf subroutines.
Search the assembler listing at http://smecers.appspot.com/govs/Oldies/Sargon.htm for the EXX instruction. (It's in routines XCHNG and NEXTAD)
The code is well documented, if anyone wants to explore the cost of alternative coding techniques.
Apart from Sargon, I'm pretty sure they were used in the code examples in some "how to write interrupt routines" books of the period, as a "quick" way to save all the registers, but I don't have any references for that.
answered Sep 30 at 22:25
alephzero
The question asks if the alternate register set was "ever" used ... this answer is about a modern use of it.
The z88dk C compiler sccz80 [1] uses a calling convention in which the registers B', C', D', E', H' and L' hold the first 48-bit floating point argument to a function, or a floating point return value. Benchmark results suggest that this is a very good strategy, as the results are similar in performance to the 32-bit floating point used by most other Z80 C compilers.
[1] Actually one of the two C compilers that are part of the z88dk toolchain, the other being a patched version of sdcc that uses 32-bit floats with a different calling convention.
answered Sep 30 at 22:30
Jules
As you say, one of the float libraries uses the exx set to hold the main floating point accumulator, which is 48-bit in BCDEHL. Two floating point values can be held at the same time, one in the main set BCDEHL and one in the exx set BCDEHL'. This is quite fast and allows the float library to be re-entrant on top of that.
– aralbrec 17 hours ago
The fastest integer math (multiply and divide) also involves the exx set. You'll see it used in 32-, 64- and 72-bit math and combinations with lower bit sizes. In stdio, the exx set is used to hold tallies like the number of characters output and counters for sending buffers one char at a time when a device can't do a block on its own. The main set is then used by the driver to communicate with the device. The exx set is used in more places than that but those are the important ones. It improves performance so much that it's not worthwhile to reserve it for fast interrupt response.
– aralbrec 17 hours ago
As usual, the Z80 Oral History provides some insight into the motivation behind the alternate registers and exchange instructions:
[Faggin]
So I wanted to have a couple of index registers, more 16-bit operations, a better interrupt structure. The whole idea of doubling the number of registers. And I could exchange the register with an exchange instruction, the whole register set. That was an idea that I had used already on the Intel 4040. So that one could serve it up very fast if that was a necessity. And on and on.
[Shima]
Thirdly, in order to support the highspeed task switching, in the beginning I asked to complete two sets of register files including the program counter. But it was too complicated for customer. Then we gave up on [the idea] of the two sets of general purpose registers.
...
And two instruction codes were used for the exchange, the set of general purpose register, and exchanging the set of accumulator and flags. Also those support the high-speed task switching.
(And while it may be obvious, it's worth pointing out that in the silicon implementation, the exchange instructions merely modify the state of the register file addressing logic.)
answered Oct 2 at 8:35
Jeremy
I wonder how much it would have cost to have separate prime-select bits for BC, DE, and HL, and have 16 or 32 DD-prefix opcodes xor those bits along with AF/AF' (and perhaps the DE/HL selection bit) with bits from the second byte. That could have made the secondary registers a lot more useful.
– supercat Oct 2 at 18:43
Did anyone ever use that second register bank, either for its intended purpose, or just to get more registers within a single task?
This being one occasion when a personal experience answer will do, EXX is ideal for the very specific task of multiplying a 16-bit 2d vector by a scalar, which makes it helpful for 2d vector graphics, and the projection part of 3d vector graphics.
Specifically:
- use
Afor the multiplier â rotate right from it into carry; - use
BCandBC'for the working copy of the multiplicands; these will need shifting left on each iteration; - use
HLandHL'to accumulate the result; performADD HL, BCif carry is set after theRRA.
So the specific convenient observations are:
- you're juggling four 16-bit quantities, but they interact only in pairs;
- and using
EXXlets you use the 16-bit arithmetic that's right there on the main instruction page.
answered Sep 30 at 21:42
Tommy
up vote
7
down vote
Following the spirit of personal examples, here's one from when I was in high school: I, like everyone else, used a TI-84 calculator for my math classes. Said calculator used a Z80.
Out of a mixture of boredom and curiosity, I wrote a program that printed "WOZ IS ALWAYS THE ANSWER" any time the user hit the enter key. The program didn't need to be running for this to happen, and it could happen anywhere in the menu system.
To do that, I used the chip's interrupts to check the input every 1/140th of a second. The thing is, each time I checked, I had to corrupt the A register. The input methods simply couldn't get around that. So, if I had just used A, its value would have changed underneath every other program that was running. That wouldn't work.
Instead, I switched into the shadow registers, did the input checking, switched back, and waited for the next call. It worked out pretty well, and did the job!
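To make the switch-in, work, switch-out pattern concrete, here is a toy Python model of it. This is a sketch under assumptions, not the calculator code: `Z80Registers`, `keyboard_isr` and the `ENTER_KEY` value are all invented for illustration, and only the AF/BC/DE/HL pairs are modelled.

```python
ENTER_KEY = 0x09  # hypothetical key code, chosen only for this example


class Z80Registers:
    """Toy model of the Z80's main and alternate AF/BC/DE/HL sets."""

    def __init__(self):
        self.main = {"AF": 0, "BC": 0, "DE": 0, "HL": 0}
        self.alt = {"AF": 0, "BC": 0, "DE": 0, "HL": 0}

    def ex_af(self):
        # EX AF,AF': swap only the AF pair
        self.main["AF"], self.alt["AF"] = self.alt["AF"], self.main["AF"]

    def exx(self):
        # EXX: swap BC, DE and HL with their primed counterparts
        for pair in ("BC", "DE", "HL"):
            self.main[pair], self.alt[pair] = self.alt[pair], self.main[pair]


def keyboard_isr(regs, key_code):
    """Hypothetical handler: clobbers A while polling, then restores everything."""
    regs.ex_af()
    regs.exx()                                  # now working in the shadow set
    regs.main["AF"] = (key_code & 0xFF) << 8    # polling overwrites A
    regs.main["HL"] = 0x1234                    # scratch use of HL
    enter_pressed = key_code == ENTER_KEY
    regs.exx()
    regs.ex_af()                                # interrupted program's values are back
    return enter_pressed
```

Whatever the handler writes ends up in the primed set, so the interrupted program sees the same A, BC, DE and HL it had before the interrupt.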
answered Oct 2 at 5:21
Kyle
up vote
6
down vote
Back in the early 90s I used the alternate registers in three different ways:
- For task switching in a dual-task "operating system".
- In a RAM-less control application (temperature sensors and relay control); 208 bits in total was enough.
- In a floating-point library. This was also done by several others (and better than my attempt); see for example http://www.andreadrian.de/oldcpu/Z80_number_cruncher.html
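The answer doesn't say how the 208 bits were counted. One tally that reaches that figure, assuming it covers every programmer-visible Z80 register (including ones like PC and F that can't really be used as free storage), is sketched below; the dictionary name is just for illustration.

```python
# Widths in bits of the programmer-visible Z80 registers.
REGISTER_BITS = {
    "AF": 16, "BC": 16, "DE": 16, "HL": 16,        # main set
    "AF'": 16, "BC'": 16, "DE'": 16, "HL'": 16,    # alternate set
    "IX": 16, "IY": 16,                            # index registers
    "SP": 16, "PC": 16,                            # stack pointer, program counter
    "I": 8, "R": 8,                                # interrupt vector, refresh
}

total_bits = sum(REGISTER_BITS.values())
alternate_bits = sum(
    width for name, width in REGISTER_BITS.items() if name.endswith("'")
)
```

On that count the alternate set contributes 64 of the bits, which is a big share of the storage when there is no RAM at all.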
answered 2 days ago
Baard
up vote
4
down vote
Sorry to rain on any parades but, while it's been a long time since I did any serious Z-80 programming, I distinctly remember the alternate registers being one of the major broken features of the Z-80.
You see, they could be used to either:
- Speed up interrupt handling.
- Store alternate data in your main code.
The thing was you could not do both.
So if I was writing a device driver with an interrupt service routine, I could not use the alternate registers to save pushing and popping the main registers. I had to avoid that, of course, in case the application programmer was using those registers.
And if I was writing application code, I could not use the alternate registers because an ISR could trample all over my data.
The overall result was that most programmers stayed away from the alternate register sets so that their code would run on more machines without crashing and burning. Any savings were swamped by the risk of shipping non-working code.
Safe code may be slower, but broken code does not run at all.
Of course, if you had the luxury of controlling all aspects of the system and its coding, you were free to do as you pleased. I never had that luxury.
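The clash is easy to reproduce in a toy model. The Python below is only an illustration of the failure mode, with made-up names (`exx`, `isr_using_shadow_bank`) and dictionaries standing in for the two banks; it is not taken from any real driver.

```python
def exx(active, shadow):
    """Model EXX: the two banks trade places."""
    return shadow, active


def isr_using_shadow_bank(active, shadow):
    """Hypothetical ISR that swaps banks instead of pushing registers."""
    active, shadow = exx(active, shadow)
    active["HL"] = 0xDEAD            # ISR scratch work lands in the primed bank
    active, shadow = exx(active, shadow)
    return active, shadow


active = {"BC": 0, "DE": 0, "HL": 0}   # main bank
shadow = {"BC": 0, "DE": 0, "HL": 0}   # primed bank

# The application parks a value in HL' and goes back to the main bank.
active, shadow = exx(active, shadow)
active["HL"] = 0x1234
active, shadow = exx(active, shadow)

# An interrupt arrives before the application reads the value back.
active, shadow = isr_using_shadow_bank(active, shadow)
parked_value = shadow["HL"]            # no longer the 0x1234 that was stored
```

Both pieces of code are correct on their own; the value is lost only because each assumes it owns the primed bank.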
answered Oct 3 at 0:42
Peter Camilleri
2
Since some of the tricks in the other answers depend on pointing sp into non-stack data structures, I think it's implicit that one would disable interrupts on the way into those tight inner loops.
– Henning Makholm
Oct 3 at 14:26
1
@HenningMakholm, funnily enough, some people came up with register use conventions that allowed them to keep interrupts enabled while using SP pointing at non-stack data. Of course, this was happening on micros, where programmers had the luxury of controlling every aspect of the system :)
– introspec
Oct 3 at 18:03
1
I suppose that the luxury in question is a nice way of saying "Having to do all the work yourself." Does not sound so good when you put it that way!
– Peter Camilleri
2 days ago
2
It's hardly fair to describe a feature as broken just because it doesn't fit your particular use case. No, you can't have your cake and eat it; the alternate registers will suit either of the purposes you described but not both together.
– user3570736
2 days ago
2
I could see the alternative registers being used for fast context switching, but I'm not sure how efficiently they would act as fast global variables. To use them, you would have to swap register sets (BC, DE, HL with their prime counterparts; AF could also be swapped with a different instruction). Then you would have to preserve a copy of that data, perhaps onto the stack or into an index register, then swap the sets back. It would probably be quicker just to grab the variable directly from memory.
– RichF
Sep 30 at 20:17
4
On TI-83 series calculators, the OS uses the commands for their intended purpose in the system interrupt code.
– Misha Lavrov
Sep 30 at 23:55
3
I don't know for what purpose, but the operating system of the Sinclair ZX81 and that of the Applied Technologies Microbee 16 used these register banks.
– Martin Rosenau
Oct 1 at 6:16
3
I used it all the time, especially for graphics when I needed more register variables. But yes, I sometimes also used it for comfortable ISR handling. IIRC some on-screen monitors/debuggers used it to avoid stack usage (they were inside SCREEN VRAM and did not change the rest of RAM). When I switched to x86 I missed the extra register file a lot.
– Spektre
Oct 1 at 8:38
2
It's been decades since I wrote code for Z80s (way back in the days of early STD bus). It seems to me that we'd switch registers as the first and last instructions in interrupt service routines. But my memory from the early eighties is foggy.
– Flydog57
Oct 1 at 22:38