Did anyone ever use the extra set of registers on the Z80?












up vote
25
down vote

favorite
6












The Z80 has the surprising feature of a second set of registers. I suppose these were intended to be used for rapid task switching or interrupt handling, though I think if I were programming a Z80 retrocomputer, I would be more likely to use them for fast access to global variables.



Such small snippets of Z80 code as I have seen, do not use them, but then, that's not surprising; they are something that would be expected to be only used in large programs. Back in the day, I was on 6502 machines, so I never had occasion to write anything nontrivial on the Z80.



Did anyone ever use that second register bank, either for its intended purpose, or just to get more registers within a single task?



























  • 2




    I could see the alternate registers being used for fast context switching, but I'm not sure how efficiently they would serve as fast global variables. To use them, you would have to swap register sets (BC, DE and HL with their prime counterparts; AF can also be swapped, with a different instruction). Then you would have to preserve a copy of that data, perhaps on the stack or in an index register, and swap the sets back. It would probably be quicker just to grab the variable directly from memory.
    – RichF
    Sep 30 at 20:17






  • 4




    On TI-83 series calculators, the OS uses the commands for their intended purpose in the system interrupt code.
    – Misha Lavrov
    Sep 30 at 23:55






  • 3




    I don't know for what purpose but the operating system of the Sinclair ZX81 and the one of the Applied Technologies Microbee 16 used these register banks.
    – Martin Rosenau
    Oct 1 at 6:16






  • 3




    I used it all the time, especially for graphics, when I needed more register variables. But yes, I sometimes also used it for comfortable ISR handling. IIRC some on-screen monitors/debuggers used it to avoid stack usage (they were inside SCREEN VRAM and did not change the rest of RAM). When I switched to x86 I missed the extra register file a lot.
    – Spektre
    Oct 1 at 8:38







  • 2




    It's been decades since I wrote code for Z80s (way back in the days of early STD bus). It seems to me that we'd switch registers as the first and last instructions in interrupt service routines. But, my memory from the early eighties is foggy.
    – Flydog57
    Oct 1 at 22:38














z80
















asked Sep 30 at 19:33









rwallace


















10 Answers

















up vote
30
down vote



accepted










The key to efficient programming on the Z80 is to use registers as much as possible. I can easily believe that the Z80's designers intended the alternate set of registers as an efficient way of context switching. However, context switches do not tend to happen often enough to justify reserving the alternate set for that alone; the gains are simply not worth it most of the time. Hence, good Z80 programming practice is typically to use as many registers as possible and still use the stack to save registers during interrupts.



Now, let me give you several ideas on how one can benefit from having two sets of equivalent registers. A typical one-pixel scroll of a 16-byte-wide bitmap row can look, for example, as follows:



rl (hl) : dec l ; repeated 16 times


What if one needs to scroll by 2 pixels at a time?



rl (hl) : ex af,af' : rl (hl) : ex af,af' : dec l ; repeated 16 times


is the fastest way. OK, this uses only the second accumulator/flags pair. Let us consider fast copying. The obvious



ld a,(hl) : ld (de),a : inc hl : inc de ; 26 t-states


is actually very slow. An unrolled



ldi ; 16 t-states


is better and, in fact, is often acceptably fast. However, the fastest copiers are based on (semi-)unrolled code loading and saving the data via the stack, e.g. as follows:



ld sp,.. : pop af : pop bc : pop de : pop hl
exx : ex af,af' : pop af : pop bc : pop de : pop hl
ld sp,.. : push hl : push de : push bc : push af
exx : ex af,af' : push hl : push de : push bc : push af
; 10+10*4 + 4*2+10*4 + 10+11*4 + 4*2+11*4 = 204 t-states per 16 bytes


i.e. 12.75 t-states per byte. And note that this is not esoteric; variations of this idea were used in a huge number of commercial games on ZX Spectrum.
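
As a cross-check of the arithmetic above, here is a small Python sketch that tallies T-states for the three copy strategies. The timings in the table are the standard documented Z80 values; the names `T`, `cost` and `per_byte` are made up for this illustration, and the sketch counts instructions only (no loop overhead, no contention).

```python
# T-state costs per instruction, from the standard Z80 timing tables.
T = {
    "ld a,(hl)": 7, "ld (de),a": 7, "inc hl": 6, "inc de": 6,
    "ldi": 16, "ld sp,nn": 10, "pop": 10, "push": 11,
    "exx": 4, "ex af,af'": 4,
}

def cost(seq):
    """Total T-states for a sequence of mnemonics."""
    return sum(T[op] for op in seq)

naive = ["ld a,(hl)", "ld (de),a", "inc hl", "inc de"]    # copies 1 byte
unrolled_ldi = ["ldi"]                                     # copies 1 byte
stack_copy = (                                             # copies 16 bytes
    ["ld sp,nn"] + ["pop"] * 4 + ["exx", "ex af,af'"] + ["pop"] * 4
    + ["ld sp,nn"] + ["push"] * 4 + ["exx", "ex af,af'"] + ["push"] * 4
)

per_byte = {
    "naive": cost(naive) / 1,
    "ldi": cost(unrolled_ldi) / 1,
    "stack": cost(stack_copy) / 16,
}
print(per_byte)  # {'naive': 26.0, 'ldi': 16.0, 'stack': 12.75}
```

The stack variant only reaches 16 bytes per pass because both register sets are in play; with the main set alone it would move 8 bytes per pair of `ld sp` reloads.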



Much non-trivial code, such as fast polygon fillers or texture mappers, is only possible at a decent speed if one uses both sets of registers simultaneously.
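
Returning to the two-pixel scroll earlier in this answer: the trick is that F and F' act as two independent carry chains. The Python sketch below models that idea only (it is not an emulator); `rl` and `scroll_two_pixels` are illustrative names, and both carries are assumed to be clear on entry.

```python
def rl(byte, carry):
    """Z80 RL: rotate left through carry. Returns (new_byte, new_carry)."""
    return ((byte << 1) | carry) & 0xFF, byte >> 7

def scroll_two_pixels(row):
    """Walk the row right to left (like DEC L), doing two RLs per byte
    with two independent carry flags (the F / F' pair)."""
    out = list(row)
    carry_f, carry_f_alt = 0, 0          # assumed clear on entry
    for i in reversed(range(len(out))):
        out[i], carry_f = rl(out[i], carry_f)          # rl (hl)
        out[i], carry_f_alt = rl(out[i], carry_f_alt)  # ex af,af' : rl (hl)
    return out

row = [0x81, 0xC3, 0x00, 0xFF] * 4       # arbitrary 16-byte example row
shifted = scroll_two_pixels(row)

# The result should equal the whole 128-bit row shifted left by two bits.
as_int = int.from_bytes(bytes(row), "big")
expected = ((as_int << 2) & ((1 << 128) - 1)).to_bytes(16, "big")
print(bytes(shifted) == expected)
```

With a single carry flag the second `rl` would consume the carry produced by the first one on the same byte, which is why the exchange is needed between them.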






















  • 3




    I bet they did not, because if they did, the command set would have been very different. But the same can be said about pretty much every popular architecture: once it becomes popular, people find innovative ways to exploit it. And given how the software library for, say, ZX Spectrum, is absolutely dominated by game titles, I'd never call graphics fiddling esoteric...
    – introspec
    Oct 1 at 13:30






  • 4




    Interrupts don't have to happen often for there to be value in rapidly switching to the interrupt context - it's the latency of response that motivates the quick switch there. Of course, if your interrupts can tolerate more time saving state, then the alternate registers are available for your programs, and may increase your throughput of work.
    – Toby Speight
    Oct 2 at 10:15






  • 2




    By my count, the code you give to scroll by one pixel takes 19 cycles per byte, and the code to scroll by two takes 42. Using ld a,(hl) / rla / ld (hl),a / inc l would take 22, and if there were a way to adapt that to shift two bits at an extra cost of 16 cycles or fewer that could be a win, but the fact that swapping flags also swaps A would mean the cheapest approach I can see would cost 20 extra cycles.
    – supercat
    Oct 2 at 18:29






  • 1




    @TobySpeight, my answer is likely to be coloured by my experience of coding small home computers, mostly ZX Spectrum-compatibles. Some of the most basic ones did not have any sources of interrupts; ZX Spectrum compatibles have a ~50Hz frame interrupt that does not really require any kind of response from the coder. I would actually love to learn about common Z80-based architectures where the response latency mattered, because I do not know a single example of this situation.
    – introspec
    Oct 2 at 21:53






  • 1




    @cat, I do a fair amount of Z80 coding even nowadays and I find the traditional form of assembly code (one command per line) incredibly diluted. Thus, I use an assembler that allows multiple commands per line (the colon is a command separator, just like you guessed). I find that putting related groups of commands together onto a single line increases the readability, because you use the screen space better and can also group commands by their intended action. I recognise that some people find it off-putting. I hope you can recognise that I find the more traditional format just as off-putting.
    – introspec
    Oct 2 at 21:59

















up vote
17
down vote














The Z80 has the surprising feature of a second set of registers. I suppose these were intended to be used for rapid task switching or interrupt handling,




Indeed, they were intended for fast interrupt response. In a simple, general way, this saves the time needed to push the main process's registers onto the stack and restore them again. Single-byte opcodes were spent on this to get the absolute minimum execution time, as the Z80 Technical Manual states on p. 26:



OP code 08H allows the programmer to switch between the two pairs of 
accumulator flag registers while D9H allows the programmer to switch between
the duplicate set of six general purpose registers. These OP codes are only one
byte in length to absolutely minimize the time necessary to perform the
exchange so that the duplicate banks can be used to effect very fast interrupt
response times.


EX and EXX each take only 4 T-cycles, while even just pushing a single 16-bit register pair takes 11 cycles, plus another 10 to pop it again. 8 T-cycles instead of 21 or more is a considerably faster reaction, isn't it?



That's also why there are two EX* instructions, as very simple routines may only use, and therefore only need to preserve, A and the flags. This leaves the whole second set (except AF') for other purposes, such as use in normal software or even more speed-up in I/O.



After all, the second set can not only be used as some kind of fast 'stack' but can also be prepared for a certain I/O operation. Think of a serial interface receiving at high speed. Loading things like the memory pointer where received data is to be placed, the number of bytes to receive and so on takes quite some time (16 T-cycles for a 16-bit pointer, 13 for a byte value), and they need to be stored again afterwards as well.



If these values are placed in the second register set before the high-speed, interrupt-driven routine becomes active, no loads or stores need to be executed. Interrupt service time is reduced to the absolute minimum, which not only interrupts the main process less but also allows operation at higher speeds.
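
To put rough numbers on this, the Python sketch below tallies the per-interrupt overhead for the two approaches. The instruction timings are the standard documented Z80 values; the handler layout (one 16-bit buffer pointer plus one 8-bit byte count, reloaded and written back on every interrupt) is a hypothetical example, not taken from any particular system.

```python
# Standard Z80 T-state costs for the instructions involved.
T = {"ex af,af'": 4, "exx": 4, "push": 11, "pop": 10,
     "ld hl,(nn)": 16, "ld (nn),hl": 16, "ld a,(nn)": 13, "ld (nn),a": 13}

# Saving and restoring AF, BC, DE, HL around a handler.
via_stack = 4 * T["push"] + 4 * T["pop"]
via_exchange = 2 * (T["ex af,af'"] + T["exx"])   # swap in, swap back

# Handler state (hypothetical): a 16-bit pointer and an 8-bit count,
# loaded from RAM on entry and stored back on exit.
state_in_ram = (T["ld hl,(nn)"] + T["ld (nn),hl"]
                + T["ld a,(nn)"] + T["ld (nn),a"])
state_in_alt_regs = 0    # already sitting in the alternate set

print(via_stack + state_in_ram, via_exchange + state_in_alt_regs)  # 142 16
```

The figures cover only context handling and state access, not the useful work of the handler or the interrupt acknowledge cycle itself.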



After all, the Z80 design was mainly focused on a more flexible, configurable and faster interrupt handling.




though I think if I were programming a Z80 retro computer, I would be more likely to use them for fast access to global variables.




I can't see much gain here. Sure, 6 additional bytes or 3 pointers, but at the same time, you can't access the other ones. So there are not many cases where the secondary register set is helpful - besides interrupts and 'dead end' subroutines.




Such small snippets of Z80 code as I have seen, do not use them, but then, that's not surprising; they are something that would be expected to be only used in large programs.




Well, it's exactly the region where they are useful - to speed up small functions.




Back in the day, I was on 6502 machines, so I never had occasion to write anything non-trivial on the Z80.




Did both, and while they need different approaches, the result is usually quite similar.




Did anyone ever use that second register bank, either for its intended purpose, or just to get more registers within a single task?




It was quite common to use them either for interrupts (mostly in embedded systems) or for 'dead end' routines.




























  • So within a single task, the most likely place to use them would be in leaf subroutines, so you can have the use of a full set of registers without having to spend cycles saving and restoring those used by the rest of the program. Okay, that makes sense.
    – rwallace
    Sep 30 at 20:51






  • 1




    @rwallace yes, except there's still the issue of parameter passing.
    – Raffzahn
    Sep 30 at 21:20






  • 1




    I miss writing code like this, with execution times and cycles in mind. So much code written today ignores what was such a fundamental concept then.
    – coteyr
    Oct 1 at 7:35










  • Can you elaborate on what you mean by "dead end" routines (last sentence)?
    – tofro
    Oct 1 at 7:53










  • By "dead-end" subroutines, do you mean what compiler-writers sometimes call "leaf-functions", i.e. functions which do not call anything else?
    – Wilson
    Oct 1 at 8:07

















up vote
13
down vote













For my Aquarius Micro-Expander I used the alternate register set to speed up text display.



The Mattel Aquarius has a character based screen that is physically 40 characters by 25 lines, but to stay out of the overscan area it only uses 38 characters by 24 lines. With such odd dimensions calculating the cursor address takes a lot of code, and combined with other overhead makes printing to the screen quite slow. I wanted to reduce the overhead as much as possible, and also have more flexible line width and positioning.



Here's my core code for putting a character on screen. HL' holds the current screen address and D' holds the cursor position on the line. Higher level routines test and manipulate these registers to wrap at line ends and scroll text in windows.



WinPrtChr:
EXX
LD (HL),A ; poke char into screen memory
INC HL ; HL' = next screen address
INC D ; D' = next screen x
EXX
RET

BackSpace:
EXX
LD (HL)," " ; poke SPACE into screen memory
DEC HL ; HL' = previous screen address
DEC D ; D' = previous screen x
EXX
RET


Using alternate registers was much faster than storing the variables in RAM, and kept the normal registers free for other uses. The result was fewer memory accesses and faster operation in general. My code prints text into windows of arbitrary size about 20 times faster than the system routines.
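
For readers who have not used EXX, here is a toy Python model of what WinPrtChr does to the register file. It is a sketch under simplifying assumptions: only A, D and HL are modelled, D is tracked on its own rather than as half of DE, and the `screen` buffer and starting offset are hypothetical stand-ins rather than real Aquarius addresses.

```python
class Z80Regs:
    """Only the parts of the register file this sketch needs."""
    def __init__(self):
        self.a = 0
        self.main = {"D": 0, "HL": 0}
        self.alt = {"D": 0, "HL": 0}

    def exx(self):
        # EXX swaps BC/DE/HL with BC'/DE'/HL'; A and F are not affected.
        self.main, self.alt = self.alt, self.main

def win_prt_chr(regs, screen):
    regs.exx()
    screen[regs.main["HL"]] = regs.a                      # LD (HL),A
    regs.main["HL"] = (regs.main["HL"] + 1) & 0xFFFF      # INC HL
    regs.main["D"] = (regs.main["D"] + 1) & 0xFF          # INC D
    regs.exx()

screen = bytearray(40 * 25)             # hypothetical stand-in for screen RAM
regs = Z80Regs()
regs.alt["HL"], regs.alt["D"] = 41, 0   # cursor state lives in HL' and D'
regs.main["HL"] = 0x1234                # the caller's own HL

for ch in b"HI":
    regs.a = ch
    win_prt_chr(regs, screen)

print(screen[41:43], regs.alt["HL"], regs.alt["D"], hex(regs.main["HL"]))
```

The point of the model is the last value printed: the caller's HL comes back untouched, because the cursor state never leaves the alternate set.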



Unfortunately the Aquarius system ROM also uses alternate registers when scanning the keyboard, so I had to wrap the keyboard routines with a pair of EXX's (to get back to the normal register set) and save the affected registers on the stack when getting keyboard input. However I didn't have to worry about interrupts (where the alternate register set is often used for fast context saving) because the Aquarius doesn't have any!

































    up vote
    11
    down vote













    A specific example is the chess program Sargon (written in 1978 or a bit earlier), which used them in a couple of leaf subroutines.



    Search the assembler listing at http://smecers.appspot.com/govs/Oldies/Sargon.htm for the EXX instruction. (It's in routines XCHNG and NEXTAD)



    The code is well documented, if anyone wants to explore the cost of alternative coding techniques.



    Apart from Sargon, I'm pretty sure they were used in the code examples in some "how to write interrupt routines" books of the period, as a "quick" way to save all the registers, but I don't have any references for that.

































      up vote
      11
      down vote













      The question asks if the alternate register set was "ever" used ... this answer is about a modern use of it.



      The z88dk C compiler, sccz80[1], uses a calling convention where the registers B', C', D', E', H' and L' hold the first 48-bit floating point argument to a function, or a floating point return value. Benchmark results suggest that this is a very good strategy, as performance is similar to that of the 32-bit floating point used by most other Z80 C compilers.



      [1] - actually, one of the two C compilers that are part of the z88dk toolchain, the other being a patched version of sdcc that uses 32-bit floats with a different calling convention


























      • As you say one of the float libraries uses the exx set to hold the main floating point accumulator which is 48-bit in BCDEHL. Two floating point values can be held at the same time, one in the main set BCDEHL and one in the exx set BCDEHL'. This is quite fast and allows the float library to be re-entrant on top of that.
        – aralbrec
        17 hours ago











      • The fastest integer math (multiply and divide) also involves the exx set. You'll see it used in 32-,64-,72-bit math and combinations with lower bit sizes. In stdio, the exx set is used to hold tallies like number of characters output and counters for sending buffers one char at a time when a device can't do a block on its own. The main set is then used by the driver to communicate with the device. The exx set is used in more places than that but those are the important ones. It improves performance so much that it's not worthwhile to reserve for fast interrupt response.
        – aralbrec
        17 hours ago

















      up vote
      9
      down vote













      As usual, the Z80 Oral History provides some insight into the motivation behind the alternate registers and exchange instructions:



      [Faggin]




      So I wanted to
      have a couple of index registers, more 16-bit operations, a better
      interrupt structure. The whole idea of doubling the number of
      registers. And I could exchange the register with an exchange
      instruction, the whole register set. That was an idea that I had used
      already on the Intel 4040. So that one could serve it up very fast if
      that was a necessity. And on and on.




      [Shima]




      Thirdly, in order to support the highspeed
      task switching, in the beginning I asked to complete two sets of register files including the program
      counter. But it was too complicated for customer. Then we gave up on [the idea] of the two sets of
      general purpose registers.



      ...



      And two instruction codes were used for the exchange, the set of
      general purpose register, and exchanging the set of accumulator and
      flags. Also those support the high-speed task switching.




      (And while it may be obvious, it's worth pointing out that in the silicon implementation, the exchange instructions merely modify the state of the register file addressing logic.)






















      • 2




        I wonder how much it would have cost to have separate prime-select bits for BC, DE, and HL, and have 16 or 32 DD-prefix opcodes xor those bits along with AF/AF' (and perhaps the DE/HL selection bit) with bits from the second byte. That could have made the secondary registers a lot more useful.
        – supercat
        Oct 2 at 18:43

















      up vote
      8
      down vote














      Did anyone ever use that second register bank, either for its intended purpose, or just to get more registers within a single task?




      This being one occasion when a personal experience answer will do, EXX is ideal for the very specific task of multiplying a 16-bit 2d vector by a scalar, which makes it helpful for 2d vector graphics, and the projection part of 3d vector graphics.



      Specifically:



      • use A for the multiplier — rotate right from it into carry;

      • use BC and BC' for the working copy of the multiplicands; these will need shifting left on each iteration;

      • use HL and HL' to accumulate the result; perform ADD HL, BC if carry is set after the RRA.

      So the specific convenient observations are:



      • you're juggling four 16-bit quantities, but they interact only in pairs;

      • and using EXX lets you use the 16-bit arithmetic that's right there on the main instruction page.
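
The scheme in the list above can be sketched in Python as follows. This models the algorithm only, not the instruction stream: `scale_vector` is an illustrative name, the multiplier is assumed to be 8 bits wide, results wrap to 16 bits as ADD HL,BC would, and RRA is simplified to a plain right shift (on the real chip the old carry rotates into bit 7, which does not matter within eight iterations).

```python
def scale_vector(x, y, k):
    """Multiply 16-bit x and y by 8-bit k; results wrap to 16 bits."""
    a = k & 0xFF
    bc, bc_alt = x & 0xFFFF, y & 0xFFFF      # working multiplicands (BC, BC')
    hl, hl_alt = 0, 0                         # accumulators (HL, HL')
    for _ in range(8):
        carry, a = a & 1, a >> 1              # RRA: low bit of A into carry
        if carry:
            hl = (hl + bc) & 0xFFFF           # ADD HL,BC
            hl_alt = (hl_alt + bc_alt) & 0xFFFF   # EXX : ADD HL,BC : EXX
        bc = (bc << 1) & 0xFFFF               # shift both multiplicands left
        bc_alt = (bc_alt << 1) & 0xFFFF
    return hl, hl_alt

print(scale_vector(300, 7, 5))  # (1500, 35)
```

Both components share the single multiplier in A, so the bit test is done once per iteration and the same add and shift are simply repeated on either side of an EXX.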
































        up vote
        7
        down vote













        Following the spirit of personal examples, here's one from when I was in high school: I, like everyone else, used a TI-84 calculator for my math classes. Said calculator used a Z80.



        Out of a mixture of boredom and curiosity, I wrote a program that printed "WOZ IS ALWAYS THE ANSWER" any time the user hit the enter key. The program didn't need to be running for this to happen, and it could happen anywhere in the menu system.



        To do that, I used the chip's interrupts to check the input every 1/140th of a second. The thing is, each time I checked, I had to corrupt the A register; the input methods simply couldn't get around that. So, if I had just used A, its value would have kept changing underneath every other program that was running. That wouldn't work.



        Instead, I switched into the shadow registers, did the input checking, switched back, and waited for the next call. It worked out pretty well, and did the job!














        Kyle is a new contributor to this site.
























          up vote
          6
          down vote













          Back in the early 90s I used the alternate registers in 3 different ways:



          1. For task switching in a dual task "operating system".

          2. In a RAM-less control application, temperature sensors and relay control. (208 bits in total was enough).

          3. In a floating point library. The latter was also done by several others (and better than my attempt), see for example http://www.andreadrian.de/oldcpu/Z80_number_cruncher.html













          Baard is a new contributor to this site.
























            up vote
            4
            down vote













            Sorry to rain on any parades but, while it's been a long time since I did any serious Z-80 programming, I distinctly remember the alternate registers being one of the major broken features of the Z-80.



            You see, they could be used to either:



            • Speed up interrupt handling.

            • Store alternate data in your main code.

            The thing was you could not do both.



            So if I was writing a device driver with an interrupt service routine, I could not use the alternate registers to save pushing and popping the main registers. I of course would need to avoid doing this in case the application programmer was using those registers.



            And if I was writing application code, I could not use the alternate registers because an ISR could trample all over my data.



            The overall result was that most programmers stayed away from the alternate register set so that their code would run on more machines without crashing and burning. Any savings were swamped by the risk of shipping non-working code.



            Safe code may be slower, but broken code does not run at all.



            Of course, if you had the luxury of controlling all aspects of the system and its coding, you were free to do as you pleased. I never had that luxury.






            • Since some of the tricks in the other answers depend on pointing SP into non-stack data structures, I think it's implicit that one would disable interrupts on the way into those tight inner loops.
              – Henning Makholm
              Oct 3 at 14:26






            • @HenningMakholm, funnily enough, some people came up with register use conventions that allowed them to keep interrupts enabled while SP was pointing at non-stack data. Of course, this was happening on micros, where programmers had the luxury of controlling every aspect of the system :)
              – introspec
              Oct 3 at 18:03






            • I suppose that the luxury in question is a nice way of saying "Having to do all the work yourself." Does not sound so good when you put it that way!
              – Peter Camilleri
              2 days ago






            • It's hardly fair to describe a feature as broken just because it doesn't fit your particular use case. No, you can't have your cake and eat it; the alternate registers will suit either of the purposes you described but not both together.
              – user3570736
              2 days ago










            The key to efficient programming on the Z80 is to use registers as much as possible. I can easily believe that the designers of the Z80 intended the alternate register set as an efficient means of context switching. However, context switches do not tend to happen often enough to justify reserving the alternate set only for that; the gains are simply not worth it most of the time. Hence, good Z80 practice is typically to use as many registers as possible and still use the stack to save registers during interrupts.



            Now, let me give you several ideas on how one can benefit from having two sets of equivalent registers. A typical pixel scroll for a 16-byte-wide bitmap can look, e.g., as follows:



            rl (hl) : dec l ; repeated 16 times


            What if one needs to scroll by 2 pixels at a time?



            rl (hl) : ex af,af' : rl (hl) : ex af,af' : dec l ; repeated 16 times


            is the fastest way. OK, this only uses the alternate accumulator and flags (AF'). Let us consider fast copying. The obvious



            ld a,(hl) : ld (de),a : inc hl : inc de ; 26 t-states


            is actually very slow. An unrolled



            ldi ; 16 t-states


            is better and, in fact, is often acceptably fast. However, the fastest copiers are based on (semi-)unrolled code loading and saving the data via the stack, e.g. as follows:



            ld sp,.. : pop af : pop bc : pop de : pop hl
            exx : ex af,af' : pop af : pop bc : pop de : pop hl
            ld sp,.. : push hl : push de : push bc : push af
            exx : ex af,af' : push hl : push de : push bc : push af
            ; 10+10*4 + 4*2+10*4 + 10+11*4 + 4*2+11*4 = 204 t-states per 16 bytes


            i.e. 12.75 t-states per byte. And note that this is not esoteric; variations of this idea were used in a huge number of commercial games on the ZX Spectrum.
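
For completeness, here is a sketch of how such a 16-byte block copy might be wrapped into a routine. The labels `copy16`, `src`, `dst` and `saved_sp` are hypothetical, the `dw`/`ds` directives are assembler-dependent, and the code assumes that interrupts were enabled on entry, that they may stay disabled for the duration, and that both register sets may be clobbered:

```asm
copy16:
        di                      ; an interrupt would push its return address into our data
        ld   (saved_sp),sp      ; remember the real stack pointer
        ld   sp,src             ; SP -> first source byte
        pop  af                 ; source bytes 0-1
        pop  bc                 ; bytes 2-3
        pop  de                 ; bytes 4-5
        pop  hl                 ; bytes 6-7
        exx                     ; switch to BC', DE', HL'
        ex   af,af'             ; switch to AF'
        pop  af                 ; bytes 8-9
        pop  bc                 ; bytes 10-11
        pop  de                 ; bytes 12-13
        pop  hl                 ; bytes 14-15
        ld   sp,dst+16          ; PUSH pre-decrements SP, so start just past the end
        push hl                 ; bytes 14-15 first: pushes run in reverse order
        push de
        push bc
        push af
        exx                     ; back to the first register set
        ex   af,af'
        push hl
        push de
        push bc
        push af                 ; bytes 0-1 last
        ld   sp,(saved_sp)      ; restore the real stack pointer
        ei
        ret

saved_sp:   dw 0
src:        ds 16               ; 16-byte source buffer
dst:        ds 16               ; 16-byte destination buffer
```

The save and restore of SP and the DI/EI pair add a fixed overhead on top of the 204 t-states counted above, which is why real copiers usually move much more than 16 bytes between them.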



            Much non-trivial code, e.g. fast polygon fillers or texture mappers, is only possible at decent speed if one uses both sets of registers simultaneously.







            answered Sep 30 at 21:26
            introspec

            • I bet they did not, because if they did, the command set would have been very different. But the same can be said about pretty much every popular architecture: once it becomes popular, people find innovative ways to exploit it. And given how the software library for, say, the ZX Spectrum is absolutely dominated by game titles, I'd never call graphics fiddling esoteric...
              – introspec
              Oct 1 at 13:30






            • Interrupts don't have to happen often for there to be value in rapidly switching to the interrupt context - it's the latency of response that motivates the quick switch there. Of course, if your interrupts can tolerate more time saving state, then the alternate registers are available for your programs, and may increase your throughput of work.
              – Toby Speight
              Oct 2 at 10:15






            • By my count, the code you give to scroll by one pixel takes 19 cycles per byte, and the code to scroll by two takes 42. Using ld a,(hl) / rla / ld (hl),a / inc l would take 22, and if there were a way to adapt that to shift two bits at an extra cost of 16 cycles or fewer that could be a win, but the fact that swapping flags also swaps A would mean the cheapest approach I can see would cost 20 extra cycles.
              – supercat
              Oct 2 at 18:29






            • @TobySpeight, my answer is likely to be coloured by my experience of coding small home computers, mostly ZX Spectrum-compatibles. Some of the most basic ones did not have any sources of interrupts; ZX Spectrum compatibles have a ~50Hz frame interrupt that does not really require any kind of response from the coder. I would actually love to learn about common Z80-based architectures where the response latency mattered, because I do not know a single example of this situation.
              – introspec
              Oct 2 at 21:53






            • @cat, I do a fair amount of Z80 coding even nowadays and I find the traditional form of assembly code (one command per line) incredibly diluted. Thus, I use an assembler that allows multiple commands per line (the colon is a command separator, just like you guessed). I find that putting related groups of commands together onto a single line increases readability, because you use the screen space better and can also group commands by their intended action. I recognise that some people find it off-putting. I hope you can recognise that I find the more traditional format just as off-putting.
              – introspec
              Oct 2 at 21:59












            The Z80 has the surprising feature of a second set of registers. I suppose these were intended to be used for rapid task switching or interrupt handling,




            Indeed, they were intended for fast interrupt reaction. In a simple, general way, this saves the time needed to push the main process's registers onto the stack and restore them again. The designers spent single-byte opcodes on the exchange to get the absolute minimum execution time, as the Z80 Technical Manual states on p. 26:



            OP code 08H allows the programmer to switch between the two pairs of 
            accumulator flag registers while D9H allows the programmer to switch between
            the duplicate set of six general purpose registers. These OP codes are only one
            byte in length to absolutely minimize the time necessary to perform the
            exchange so that the duplicate banks can be used to effect very fast interrupt
            response times.


            EX AF,AF' and EXX each take only 4 T-states, while pushing even a single 16-bit register pair takes 11 T-states, plus another 10 to pop it again. 8 T-states instead of 21 or more is a considerably faster reaction, isn't it?



            That's also why there are two exchange instructions: very simple routines may only use, and so only need to preserve, A and the flags. This leaves the rest of the second set (everything except AF') for other purposes, such as use in normal software or even more speedup in I/O.



            After all, the second set can be used not only as a kind of fast 'stack' but can also be prepared in advance for a particular I/O operation. Think of a serial interface receiving at high speed. Loading things like the memory pointer where received data is to be placed, the number of bytes to receive and so on takes quite some time (16 T-states for a 16-bit pointer, 13 for a byte value), and they need to be stored again afterwards as well.



            If these values are placed in the second register set before the high speed interrupt driven routine gets active, no loads and stores are to be executed. Interrupt service time gets reduced to the absolute minimum, not only causing less interruption of the main process but also working up to higher speeds.
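
A minimal sketch of that idea, assuming a hypothetical receiver whose data register sits at port `SERIAL_DATA`, a hypothetical transfer length `RX_COUNT`, and a main program that leaves the alternate registers alone once the transfer has been set up (the `equ` and `ds` directives are assembler-dependent):

```asm
SERIAL_DATA equ 10h             ; hypothetical port address
RX_COUNT    equ 32              ; hypothetical number of bytes to receive

rx_setup:                       ; call once, before the receive interrupt is enabled
        di
        exx                     ; work on the alternate set
        ld   hl,rx_buffer       ; HL' = where the next received byte goes
        ld   b,RX_COUNT         ; B'  = number of bytes still expected
        ld   c,SERIAL_DATA      ; C'  = port to read from
        exx                     ; back to the main set
        ei
        ret

rx_isr:
        ex   af,af'             ; keep the main A and flags intact (INI changes flags)
        exx                     ; pointer, count and port are already loaded
        ini                     ; (HL) <- port (C), then HL <- HL+1, B <- B-1
        exx
        ex   af,af'
        ei
        reti                    ; RETI also tells Z80-family peripherals the interrupt is done

rx_buffer:
        ds   RX_COUNT           ; receive buffer
```

The handler performs no loads or stores of its own state: everything it needs survives in the alternate set between interrupts. A real routine would also test the zero flag after INI to stop once B' reaches zero; that part is left out of the sketch.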



            After all, the Z80 design was mainly focused on more flexible, configurable and faster interrupt handling.




            though I think if I were programming a Z80 retro computer, I would be more likely to use them for fast access to global variables.




            I can't see much gain here. Sure, 6 additional bytes or 3 pointers, but at the same time, you can't access the other ones. So there are not many cases where the secondary register set is helpful - besides interrupts and 'dead end' subroutines.




            Such small snippets of Z80 code as I have seen, do not use them, but then, that's not surprising; they are something that would be expected to be only used in large programs.




            Well, it's exactly the region where they are useful - to speed up small functions.




            Back in the day, I was on 6502 machines, so I never had occasion to write anything non-trivial on the Z80.




            Did both, and while they need different approaches, the result is usually quite similar.




            Did anyone ever use that second register bank, either for its intended purpose, or just to get more registers within a single task?




            It was quite common to use them either for interrupts (mostly in embedded systems) or for 'dead end' routines.
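As a hedged illustration of such a 'dead end' routine, here is a small bit-counting subroutine that takes its scratch registers from the alternate set. PopCount is an invented example, and it assumes no interrupt handler is using the alternate registers at the same time. A is not affected by EXX, so it can carry the argument in and the result out.

```z80
; PopCount: in  A = value, out A = number of 1 bits in that value.
; Uses B' and C' as scratch; main BC, DE and HL are left untouched.
; Flags are changed.
PopCount:
        EXX                 ; switch to the alternate BC/DE/HL
        LD   C,A            ; C' = value to examine
        XOR  A              ; A = 0, running count
        LD   B,8            ; B' = bit counter
PcLoop:
        RRC  C              ; bit 0 of C' goes into carry
        ADC  A,0            ; add that bit to the count
        DJNZ PcLoop
        EXX                 ; main registers are back as they were
        RET
```

Anything that does not fit into A still has to be handed over some other way, for example through memory or the index registers.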




























            • So within a single task, the most likely place to use them would be in leaf subroutines, so you can have the use of a full set of registers without having to spend cycles saving and restoring those used by the rest of the program. Okay, that makes sense.
              – rwallace
              Sep 30 at 20:51






            • 1




              @rwallace yes, except there's still the issue of parameter passing.
              – Raffzahn
              Sep 30 at 21:20






            • 1




              I miss writing code like this, with execution times and cycles in mind. So much code written today ignores what was such a fundamental concept then.
              – coteyr
              Oct 1 at 7:35










            • Can you elaborate on what you mean by "dead end" routines (last sentence)?
              – tofro
              Oct 1 at 7:53










            • By "dead-end" subroutines, do you mean what compiler-writers sometimes call "leaf-functions", i.e. functions which do not call anything else?
              – Wilson
              Oct 1 at 8:07






















            edited Oct 1 at 11:43

























            answered Sep 30 at 20:45









            Raffzahn

            37k482148

























            up vote
            13
            down vote













            For my Aquarius Micro-Expander I used the alternate register set to speed up text display.



            The Mattel Aquarius has a character based screen that is physically 40 characters by 25 lines, but to stay out of the overscan area it only uses 38 characters by 24 lines. With such odd dimensions calculating the cursor address takes a lot of code, and combined with other overhead makes printing to the screen quite slow. I wanted to reduce the overhead as much as possible, and also have more flexible line width and positioning.



            Here's my core code for putting a character on screen. HL' holds the current screen address and D' holds the cursor position on the line. Higher level routines test and manipulate these registers to wrap at line ends and scroll text in windows.



            WinPrtChr:
                EXX
                LD   (HL),A     ; poke char into screen memory
                INC  HL         ; HL' = next screen address
                INC  D          ; D' = next screen x
                EXX
                RET

            BackSpace:
                EXX
                LD   (HL)," "   ; poke SPACE into screen memory
                DEC  HL         ; HL' = previous screen address
                DEC  D          ; D' = previous screen x
                EXX
                RET


            Using alternate registers was much faster than storing the variables in RAM, and kept the normal registers free for other uses. The result was fewer memory accesses and faster operation in general. My code prints text into windows of arbitrary size about 20 times faster than the system routines.
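For comparison, a version of WinPrtChr that keeps the same two variables in RAM might look roughly like this. It is only a sketch to show where the cycles go: ScrAddr and CursorX are hypothetical names, and this is not code from the Micro-Expander.

```z80
; Same job as WinPrtChr, but with the cursor state held in memory.
WinPrtChrRam:
        PUSH HL             ; 11 T: caller's HL must survive
        LD   HL,(ScrAddr)   ; 16 T: fetch current screen address
        LD   (HL),A         ;  7 T: poke char into screen memory
        INC  HL             ;  6 T
        LD   (ScrAddr),HL   ; 16 T: store next screen address
        LD   HL,CursorX     ; 10 T
        INC  (HL)           ; 11 T: next screen x
        POP  HL             ; 10 T
        RET

ScrAddr:
        DW   0              ; hypothetical variables, assumed in RAM
CursorX:
        DB   0
```

Leaving out the RET, that comes to 87 T-cycles, against 25 for the EXX version (4 + 7 + 6 + 4 + 4).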



            Unfortunately the Aquarius system ROM also uses alternate registers when scanning the keyboard, so I had to wrap the keyboard routines with a pair of EXX's (to get back to the normal register set) and save the affected registers on the stack when getting keyboard input. However I didn't have to worry about interrupts (where the alternate register set is often used for fast context saving) because the Aquarius doesn't have any!
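One way such a wrapper could be written is sketched below. This is a guess at the general shape, not the actual Micro-Expander code; SysKeyScan stands in for the ROM keyboard routine, its address is a placeholder that must be replaced, and the sketch assumes the routine returns its result in A.

```z80
SysKeyScan EQU 0000h        ; PLACEHOLDER: put the real ROM entry here

; GetKey: call the ROM keyboard scan without losing HL' and D'.
GetKey:
        EXX                 ; cursor state is now in the visible set
        PUSH HL             ; save screen address (normally HL')
        PUSH DE             ; save cursor x       (normally D')
        EXX                 ; back to the normal register set
        CALL SysKeyScan     ; may clobber the alternate registers
        EXX
        POP  DE             ; restore cursor x
        POP  HL             ; restore screen address
        EXX                 ; cursor state is parked in the alternates again
        RET                 ; A still holds whatever the ROM returned
```

Neither EXX nor PUSH and POP of HL and DE affect A or the flags, so the result of the ROM call passes through unchanged.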








































                answered Oct 1 at 2:13









                Bruce Abbott

                64116
























                    up vote
                    11
                    down vote













                    A specific example is the chess program Sargon (written in 1978 or a bit earlier), which used them in a couple of leaf subroutines.



                    Search the assembler listing at http://smecers.appspot.com/govs/Oldies/Sargon.htm for the EXX instruction. (It's in routines XCHNG and NEXTAD)



                    The code is well documented, if anyone wants to explore the cost of alternative coding techniques.



                    Apart from Sargon, I'm pretty sure they were used in the code examples in some "how to write interrupt routines" books of the period, as a "quick" way to save all the registers, but I don't have any references for that.








































                        answered Sep 30 at 22:25









                        alephzero

                        84829
























                            up vote
                            11
                            down vote













                            The question asks if the alternate register set was "ever" used ... this answer is about a modern use of it.



                            The z88dk C compiler sccz80 [1] uses a calling convention where the registers B', C', D', E', H' and L' hold the first 48-bit floating point argument to a function, or a floating point return value. Benchmark results suggest that this is a very good strategy, as performance is similar to that of the 32-bit floating point used by most other Z80 C compilers.



                            [1] - actually, one of the two C compilers that are part of the z88dk toolchain, the other being a patched version of sdcc that uses 32-bit floats with a different calling convention


























                            • As you say one of the float libraries uses the exx set to hold the main floating point accumulator which is 48-bit in BCDEHL. Two floating point values can be held at the same time, one in the main set BCDEHL and one in the exx set BCDEHL'. This is quite fast and allows the float library to be re-entrant on top of that.
                              – aralbrec
                              17 hours ago











                            • The fastest integer math (multiply and divide) also involves the exx set. You'll see it used in 32-, 64-, and 72-bit math and combinations with lower bit sizes. In stdio, the exx set is used to hold tallies like number of characters output and counters for sending buffers one char at a time when a device can't do a block on its own. The main set is then used by the driver to communicate with the device. The exx set is used in more places than that but those are the important ones. It improves performance so much that it's not worthwhile to reserve it for fast interrupt response.
                              – aralbrec
                              17 hours ago
























                            answered Sep 30 at 22:30









                            Jules











                            up vote
                            9
                            down vote













                            As usual, the Z80 Oral History provides some insight into the motivation behind the alternate registers and exchange instructions:



                            [Faggin]




                            So I wanted to
                            have a couple of index registers, more 16-bit operations, a better
                            interrupt structure. The whole idea of doubling the number of
                            registers. And I could exchange the register with an exchange
                            instruction, the whole register set. That was an idea that I had used
                            already on the Intel 4040. So that one could serve it up very fast if
                            that was a necessity. And on and on.




                            [Shima]




                            Thirdly, in order to support the highspeed
                            task switching, in the beginning I asked to complete two sets of register files including the program
                            counter. But it was too complicated for customer. Then we gave up on [the idea] of the two sets of
                            general purpose registers.



                            ...



                            And two instruction codes were used for the exchange, the set of
                            general purpose register, and exchanging the set of accumulator and
                            flags. Also those support the high-speed task switching.




                            (And while it may be obvious, it's worth pointing out that in the silicon implementation, the exchange instructions merely modify the state of the register file addressing logic.)
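To make that parenthetical concrete, here is a minimal Python sketch of a register file in which the exchange instructions flip a select bit instead of copying data. The names (`RegisterFile`, `gp_select`, `af_select`) are invented for illustration, and the model is an assumption-laden simplification: it leaves out EX DE,HL and everything else about the real chip's addressing logic.

```python
class RegisterFile:
    """Toy model: two physical banks, with select bits choosing the visible one."""

    def __init__(self):
        # Two physical copies of BC/DE/HL and of AF; data never moves between them.
        self.gp_banks = [{"BC": 0, "DE": 0, "HL": 0}, {"BC": 0, "DE": 0, "HL": 0}]
        self.af_banks = [0, 0]
        self.gp_select = 0  # toggled by EXX
        self.af_select = 0  # toggled by EX AF,AF'

    def exx(self):
        # No register contents are copied; only the addressing state changes.
        self.gp_select ^= 1

    def ex_af(self):
        self.af_select ^= 1

    def read(self, name):
        if name == "AF":
            return self.af_banks[self.af_select]
        return self.gp_banks[self.gp_select][name]

    def write(self, name, value):
        value &= 0xFFFF
        if name == "AF":
            self.af_banks[self.af_select] = value
        else:
            self.gp_banks[self.gp_select][name] = value


rf = RegisterFile()
rf.write("HL", 0x1234)   # lands in physical bank 0
rf.exx()
rf.write("HL", 0xBEEF)   # lands in physical bank 1
rf.exx()
print(hex(rf.read("HL")))  # prints 0x1234
```

In this model an exchange costs the same however many registers are in the bank, which is the point of doing it in the addressing logic.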






















                            • 2




                              I wonder how much it would have cost to have separate prime-select bits for BC, DE, and HL, and have 16 or 32 DD-prefix opcodes xor those bits along with AF/AF' (and perhaps the DE/HL selection bit) with bits from the second byte. That could have made the secondary registers a lot more useful.
                              – supercat
                              Oct 2 at 18:43
























                            answered Oct 2 at 8:35









                            Jeremy











                            up vote
                            8
                            down vote














                            Did anyone ever use that second register bank, either for its intended purpose, or just to get more registers within a single task?




                            Since this is one occasion when an answer from personal experience will do: EXX is ideal for the very specific task of multiplying a 16-bit 2D vector by a scalar, which makes it helpful for 2D vector graphics and for the projection part of 3D vector graphics.



                            Specifically:



                            • use A for the multiplier — rotate right from it into carry;

                            • use BC and BC' for the working copy of the multiplicands; these will need shifting left on each iteration;

                            • use HL and HL' to accumulate the result; perform ADD HL, BC if carry is set after the RRA.

                            So the specific convenient observations are:



                            • you're juggling four 16-bit quantities, but they interact only in pairs;

                            • and using EXX lets you use the 16-bit arithmetic that's right there on the main instruction page.
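The scheme above can be sketched as a small Python model of the registers involved. This is an illustration under stated assumptions, not the answerer's actual routine: the names `Regs` and `scale_vec2` are invented, the multiplier is taken to be an unsigned 8-bit value, results wrap to 16 bits, and the instruction order in the comments is just one plausible arrangement.

```python
MASK16 = 0xFFFF


class Regs:
    """Only the registers the routine needs: A, carry, BC/HL and BC'/HL'."""

    def __init__(self):
        self.a = 0
        self.carry = 0
        self.bank = {"BC": 0, "HL": 0}    # currently visible set
        self.shadow = {"BC": 0, "HL": 0}  # BC', HL'

    def exx(self):
        self.bank, self.shadow = self.shadow, self.bank

    def rra(self):
        # Rotate A right through carry: bit 0 goes to carry, old carry to bit 7.
        new_carry = self.a & 1
        self.a = (self.a >> 1) | (self.carry << 7)
        self.carry = new_carry

    def add_hl_bc(self):
        total = self.bank["HL"] + self.bank["BC"]
        self.carry = total >> 16
        self.bank["HL"] = total & MASK16

    def shift_bc_left(self):
        # Equivalent of SLA C / RL B on the visible BC.
        self.carry = (self.bank["BC"] >> 15) & 1
        self.bank["BC"] = (self.bank["BC"] << 1) & MASK16


def scale_vec2(x, y, k):
    """Return ((x * k) mod 65536, (y * k) mod 65536) by shift-and-add."""
    r = Regs()
    r.a = k & 0xFF
    r.bank["BC"] = x & MASK16    # BC  = x
    r.shadow["BC"] = y & MASK16  # BC' = y
    for _ in range(8):
        r.rra()                  # next multiplier bit into carry
        if r.carry:
            r.add_hl_bc()        # HL  += BC
            r.exx()
            r.add_hl_bc()        # HL' += BC'
            r.exx()
        r.shift_bc_left()        # BC  <<= 1
        r.exx()
        r.shift_bc_left()        # BC' <<= 1
        r.exx()
    return r.bank["HL"], r.shadow["HL"]


print(scale_vec2(3, 5, 10))  # prints (30, 50)
```

Each multiplier bit is tested once and then applied to both components, so the two halves of the vector share the loop overhead and differ only by an EXX on either side.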







































                                answered Sep 30 at 21:42









                                Tommy





















                                    up vote
                                    7
                                    down vote













                                    Following the spirit of personal examples, here's one from when I was in high school: I, like everyone else, used a TI-84 calculator for my math classes. Said calculator used a Z80.



                                    Out of a mixture of boredom and curiosity, I wrote a program that printed "WOZ IS ALWAYS THE ANSWER" any time the user hit the enter key. The program didn't need to be running for this to happen, and it could happen anywhere in the menu system.



                                    To do that, I used the chip's interrupts to check the input every 1/140th of a second. The thing is, each time I checked, I had to corrupt the A register. The input methods simply couldn't get around that. So, if I had just used A directly, its value would have changed unexpectedly underneath whatever other program was running. That wouldn't work.



                                    Instead, I switched into the shadow registers, did the input checking, switched back, and waited for the next call. It worked out pretty well, and did the job!
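As a rough illustration of that pattern, here is a small Python model of an interrupt handler that swaps to the shadow set, clobbers registers freely, and swaps back. Everything named here is hypothetical: `Cpu`, `key_check_isr`, `read_key` and the `ENTER_KEY` value are stand-ins invented for the sketch, not the calculator's real routines or key codes, and the flags half of AF is left out.

```python
ENTER_KEY = 0x09  # hypothetical scan code, chosen only for this sketch


class Cpu:
    def __init__(self):
        self.regs = {"A": 0, "BC": 0, "DE": 0, "HL": 0}
        self.shadow = {"A": 0, "BC": 0, "DE": 0, "HL": 0}

    def ex_af(self):
        # EX AF,AF' (flags omitted from this model)
        self.regs["A"], self.shadow["A"] = self.shadow["A"], self.regs["A"]

    def exx(self):
        # EXX swaps BC, DE and HL with their shadow copies
        for name in ("BC", "DE", "HL"):
            self.regs[name], self.shadow[name] = self.shadow[name], self.regs[name]


def key_check_isr(cpu, read_key):
    """Check for the enter key without disturbing the interrupted program."""
    cpu.ex_af()
    cpu.exx()
    cpu.regs["A"] = read_key() & 0xFF  # clobbers A, but only in the shadow set
    cpu.regs["HL"] = 0xFFFF            # stand-in for any other scratch use
    enter_pressed = cpu.regs["A"] == ENTER_KEY
    cpu.exx()
    cpu.ex_af()
    return enter_pressed


cpu = Cpu()
cpu.regs.update(A=0x42, HL=0x8000)   # state of the interrupted program
pressed = key_check_isr(cpu, lambda: ENTER_KEY)
print(pressed, hex(cpu.regs["A"]), hex(cpu.regs["HL"]))  # prints True 0x42 0x8000
```

The trick only holds as long as the interrupted code does not itself rely on the shadow set, since the handler leaves its own scratch values there.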




































                                        answered Oct 2 at 5:21









                                        Kyle





















                                            up vote
                                            6
                                            down vote













                                            Back in the early 90s I used the alternate registers in three different ways:



                                            1. For task switching in a dual-task "operating system".

                                            2. In a RAM-less control application for temperature sensors and relay control (208 bits of register storage in total was enough).

                                            3. In a floating-point library. This was also done by several others (and better than my attempt); see for example http://www.andreadrian.de/oldcpu/Z80_number_cruncher.html
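One plausible reading of the 208-bit figure in item 2, which the answer does not spell out, is that it counts every programmer-visible register on the chip. The tally below is a sketch under that assumption; which registers the original application actually used is not stated.

```python
# One possible accounting of the Z80's on-chip register storage, in bits.
# Which registers the original application really used is an assumption.
REGISTER_BITS = {
    "main AF, BC, DE, HL": 4 * 16,
    "alternate AF', BC', DE', HL'": 4 * 16,
    "IX, IY": 2 * 16,
    "SP": 16,
    "PC": 16,
    "I, R": 2 * 8,
}

total_bits = sum(REGISTER_BITS.values())
print(total_bits, "bits =", total_bits // 8, "bytes")
```

PC and R cannot hold arbitrary data, so the storage that is usable in practice would be somewhat lower than this total.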





                                                answered 2 days ago









                                                Baard





















                                                    up vote
                                                    4
                                                    down vote













                                                    Sorry to rain on any parades but, while it's been a long time since I did any serious Z-80 programming, I distinctly remember the alternate registers being one of the major broken features of the Z-80.



                                                    You see, they could be used to either:



                                                    • Speed up interrupt handling.

                                                    • Store alternate data in your main code.

                                                    The thing was you could not do both.



                                                    So if I was writing a device driver with an interrupt service routine, I could not use the alternate registers to save pushing and popping the main registers. I of course would need to avoid doing this in case the application programmer was using those registers.



                                                    And if I was writing application code, I could not use the alternate registers because an ISR could trample all over my data.
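The clash between the two uses can be sketched with a toy register model. This is an illustration in Python under stated assumptions, not code from any real driver: `Z80Regs`, `fast_isr` and all the values are hypothetical, and only the HL pair is modelled (the real `EXX` also swaps BC and DE with BC' and DE').

```python
# Toy model of the Z80's HL pair and its shadow copy HL' (not a full emulator).
class Z80Regs:
    def __init__(self):
        self.hl = 0x0000
        self.hl_alt = 0x0000

    def exx(self):
        """Model of EXX, restricted to HL for brevity."""
        self.hl, self.hl_alt = self.hl_alt, self.hl


def fast_isr(regs):
    """Hypothetical driver ISR that uses EXX instead of PUSH/POP."""
    regs.exx()        # assumes the alternate set is free scratch space
    regs.hl = 0xDEAD  # the ISR's own working value
    regs.exx()        # "restores" the interrupted program's main registers


regs = Z80Regs()

# Application code parks a pointer in the alternate set for later.
regs.hl = 0x8000
regs.exx()            # 0x8000 now lives in HL'
regs.hl = 0x1234      # carry on with other work in the main set

fast_isr(regs)        # an interrupt arrives at this point

regs.exx()            # the application fetches its parked pointer back
print(hex(regs.hl))   # no longer 0x8000
```

The ISR hands back the main set untouched, so nothing looks wrong at first; the damage only shows up when the application swaps back to retrieve the value it parked.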



                                                    The overall result was that most programmers stayed away from the alternate register sets so that their code would run on more machines without crashing and burning. Any savings were swamped by the risk of shipping non-working code.



                                                    Safe code may be slower, but broken code does not run at all.



                                                    Of course, if you had the luxury of controlling all aspects of the system and its coding, you were free to do as you pleased. I never had that luxury.






















                                                    • 2




                                                      Since some of the tricks in the other answers depend on pointing sp into non-stack data structures, I think it's implicit that one would disable interrupts on the way into those tight inner loops.
                                                      – Henning Makholm
                                                      Oct 3 at 14:26






                                                    • 1




                                                      @HenningMakholm, funnily enough, some people came up with register use conventions that allowed them to keep interrupts enabled while using SP pointing at non-stack data. Of course, this was happening on micros, where programmers had the luxury of controlling every aspect of the system :)
                                                      – introspec
                                                      Oct 3 at 18:03






                                                    • 1




                                                      I suppose that the luxury in question is a nice way of saying "Having to do all the work yourself." Does not sound so good when you put it that way!
                                                      – Peter Camilleri
                                                      2 days ago






                                                    • 2




                                                      It's hardly fair to describe a feature as broken just because it doesn't fit your particular use case. No, you can't have your cake and eat it; the alternate registers will suit either of the purposes you described but not both together.
                                                      – user3570736
                                                      2 days ago














                                                    answered Oct 3 at 0:42









                                                    Peter Camilleri


















                                                     
