..No justification
..No multiple spaces, use TAB chars (^P^I)
..Bolding with ^P^B, Italics with ^P^Y
ZSDOS, Anatomy of an Operating System, Part II
by
Harold F. Bower, Major, US Army Signal Corps; BSEE, MSCIS, Ham
(WA5JAY), avid homebuilder (starting with 8008 running SCELBAL).
and
Cameron W. Cotrill, Vice President, Advanced Multiware Systems;
specialist in "impossible" real-time hardware and software
systems.
In the first part of this article, we presented the philosophy
and the features of ZSDOS (Z-System Disk Operating System). In
this portion, we will summarize the performance of ZSDOS, share a
few of the tricks we used to shoehorn all these features into
3.5k bytes, and give a few programming examples showing how to
use some of the new features of ZSDOS and ZDDOS.
ZSDOS Performance.
Measuring the performance improvements of ZSDOS is a complicated
matter. During development, an entire suite of tests was run on
ZS/ZDDOS in various configurations in an attempt to validate the
design tradeoffs. The most revealing tests of BDOS differences
turned out to be a series of assemblies done under control of a
command script. This should be no surprise as assemblies are by
nature disk intensive.
To reduce the perception that our results are "tailored" or
skewed in favor of a particular system or configuration,
different processor chips (Z80 and HD64180), different BIOSes
(MicroMint, XBIOS, Ampro), and different media (RAM disk, Hard
Disk and Floppy disk) were used in the timed runs. Since the
results were most affected by the media, results are shown in the
categories of RAM, Hard Disk and Floppy Disk performance. No form
of file date stamping was done since ZSDOS would have a distinct
advantage in this field.
Three sets of hardware were used in these analyses in an attempt
to keep any unique processes in a given system from skewing the
results. The first system (System 1 in the timing runs) was a
"stock" MicroMint SB-180 operating at a 6.144 MHz clock speed.
System 2 was an Ampro Little Board 1A with a Z80 running at 4.0
MHz, and System 3 was a homebrew Z-180 system designed to be
compatible with the SB-180 operating at 9.216 MHz. Complete
information on each system is in the Appendix.
OPERATING SYSTEMS.
CP/M 2.2. Gary Kildall and Digital Research developed this
operating system for 8-bit processors in an evolutionary process
on early 8080-based computers. A subsequent product, CP/M Plus
(also known as CP/M 3) is still in limited use, but has not
gained the wide acceptance of the earlier release. CP/M 2.2 is
coded in 8080 assembly language and is a non-banked,
non-reentrant, single-user, single-tasking operating system.
ZRDOS 1.9. Echelon Incorporated released many versions of this
CP/M 2.2-compatible operating system over the past several years.
It is coded in Z80 assembly language and will therefore not
execute on 8080 processors. Some additional features were added,
such as one-level reentrancy under user control, and return of
the current DMA address. Later versions (after 1.5) include
enhanced support for hard disk media by not rebuilding the
allocation bit map on a disk relog command. Version 1.9 added
larger disk and file sizes. Like CP/M, it is single-user and
single-tasking.
ZSDOS. This is the topic of this article, with details and
descriptions of features contained in Part I. ZSDOS is coded in
Z80 assembly language and is also a single-user, single-tasking
operating system capable of single-level reentrancy.
Since this report was aimed at formalizing an evaluation of
the performance characteristics of ZSDOS, a number of different
variants to the above operating systems were initially timed.
Because the performance of these systems was very similar to
others in the test, their comparative results are simply
summarized below.
CP/M 2.2 with Plu*Perfect Systems' PUBlic patch. Only minor
differences in performance from the basic CP/M 2.2 were noted, so
results of the patched system were not included in the final
results.
ZRDOS 1.2. The performance of ZRDOS 1.2 was very close to CP/M
2.2, being a couple of percent slower in the majority of cases.
It was therefore not included in the final timing analyses.
ZRDOS 1.7. Timing tests indicate no significant performance
differences between ZRDOS 1.7 and 1.9.
ZDDOS. Since ZSDOS and ZDDOS are largely the same code and since
comparative timings between them show less than a 1% difference,
only times for ZSDOS will be presented.
BASIC IO SYSTEMS (BIOSes).
MICROMINT, SB-180. While MicroMint currently ships Version 3.2
with their systems, a slightly modified version of 2.7 was used
in these timings on the SB-180. The changes included independent
step rates for floppy drives, different floppy formats and fixing
of eight-inch drivers as well as a slight amount of optimization.
Little performance difference from the standard BIOS should be
noticed. A 54k system size was used. The BIOS uses programmed IO
on most peripherals with DMA functions of the 64180 processor
used for Floppy and RAM disk data movement.
XBIOS, SB-180. XSystems' XBIOS version 1.1 is an extremely
powerful and flexible banked system with excellent tools and
interfaces. Malcom Kemp has concentrated on providing functions
in this release, and has deferred optimization to future
releases. XBIOS fully supports the ETS180 IO+ board, allows
complete configuration of peripherals, and provides a larger TPA
since only a small kernel resides in the primary memory area.
Most of the BIOS code resides in an alternate memory bank. XBIOS
installs the largest possible TPA when used, which was 57.5k for
these tests. XBIOS was installed with three buffers for disk IO.
AMPRO, Little Board-1A. A stock version of the Ampro version 3.8
BIOS assembled with no ZCPR support was used for testing. A
system size of 59k was chosen to provide support for 5 hard disk
partitions spread over two physical drives. NZCOM was then loaded
to provide Z-System support. The Ampro BIOS is strictly a polled
system and uses no interrupts or DMA.
EVALUATION PROCEDURES.
Since the goal of evaluating performance was to heavily exercise
BDOS functions, a set of fourteen assembly modules, thirteen of
which were 2-4k in size, and one of 6k, were assembled to produce
Microsoft REL files. To restrict external influences, no file
date stamping was used, and many ZSDOS features such as Public
and Path were disabled. On the other hand, to provide a
semi-realistic setting, ZEX.COM and the executable assemblers
were placed in a different Drive/User with the ZCPR search path
set to locate the files on the second directory scan. SLR's
SLR180 assembler was used on system 2, while tests on systems 1
and 3 used Z80ASM+. Assembly was done under the control of a
memory-based SUBMIT utility (ZEX Version 3.1A) script file. Times
were measured from the carriage return terminating the command
invoking the ZEX file to display of the "Done" message after
assembly of the last file. After each run, the .REL files
produced by the assembly were erased so that the same disk space
could be used in the next run. No other files were added to or
deleted from any media during the timing runs. At least three
runs were performed for each configuration, and the results
averaged. Timing was manually performed with a stopwatch.
Due to the radical differences in access times for different
media, three categories of times were considered: RAM disk, Hard
Disk, and Floppy disk. If you think you know how each system
fared, read on - there may be a twist or two in the plot.
RAM DISK. The Ampro has no RAM disk, so timings in this category
reflect only the SB180. The SB180 computer is equipped with 256k
of memory. The standard MicroMint BIOS divides this into a 64k
main memory area and a 192k RAM disk. With XBIOS as tested here,
64k is allocated for the main memory, 24k for the banked portion
of XBIOS, buffers and banked system extensions. The remaining
space is available for a RAM disk. RAM disks on the SB180 use
built-in DMA capabilities of the HD64180 processor to move
"sectors" of data rather than the slower block move instructions
used by Z80 systems.
Exiting a program via the Warm Boot vector in CP/M relogs the A
drive. To minimize time penalties imposed by this, a Hard disk
partition was defined as the A drive. Needed programs as well as
the assembly modules were placed on the RAM disk (M:), with
ZEX.COM and Z80ASM+.COM placed in User 15 and the source files
in User 0. The search path for this phase was: Drive M, User 0 to
Drive M, User 15.
Since the RAM disk is defined as a non-removable medium in the
Disk Parameter Block, the "Rapid Relog" feature of ZSDOS and
ZRDOS was expected to produce much shorter execution times than
CP/M for this series of measurements. As can be seen from the
results, this was indeed the case. The raw timings in seconds
with percentage changes from the shortest time are:
              ZSDOS          ZRDOS 1.9      CP/M 2.2
            +------------------------------------------------+
BIOS 2.7    | 17.0 (---)     17.1 (+4%)     36.4 (+114%)     |
XBIOS 1.1   | 14.2 (---)     14.5 (+2%)     34.5 (+144%)     |
            +------------------------------------------------+
The effects of the Rapid Relog feature were borne out, with ZSDOS
being a couple of percent faster. Disabling the Rapid Relog
feature of ZSDOS produced nearly identical results to CP/M, so
most of the additional time for that system may be attributed to
rebuilding the disk allocation bit maps for Drives A and M on
each warm boot.
HARD DISK.
Three systems, 6.144 MHz SB-180 (System 1), 4.0 MHz Ampro Little
Board-1A (System 2), 9.216 MHz Z-180 Homebrew SB-180 (System 3),
were used to gather information for this phase. This latter
system was added to demonstrate performance on a heavily loaded
system.
                ZSDOS            ZRDOS 1.9        CP/M 2.2
              +---------------------------------------------------+
1-BIOS 2.7    | 0:54.7 (---)     1:16.6 (+40%)    1:34.7 (+73%)   |
1-XBIOS 1.1   | 0:52.2 (---)     1:15.4 (+44%)    1:33.4 (+79%)   |
2-AMPRO       | 1:55   (---)     2:44   (+43%)    3:15   (+70%)   |
3-BIOS 2.7    | 1:07.7 (---)     1:40.6 (+49%)    1:50.2 (+63%)   |
3-XBIOS 1.1   | 1:29.5 (---)     2:06.4 (+41%)    2:11.3 (+47%)   |
              +---------------------------------------------------+
As in the previous RAM Disk results, the results of ZSDOS with
"Rapid Relog" disabled and CP/M were nearly the same, confirming
that rebuilding the allocation bit maps on a disk relog is the
principal cause for the increased CP/M times.
All reported times were made with a path which forced a search of
the current directory before locating executable files on the
second path element. As an experiment, the path on the Ampro
system was changed to go directly to A2:, eliminating the current
directory scan. All DOSes showed an identical 10 second speedup,
indicating directory scan time for all DOSes was the same.
A further point to note is the effect of multiple disk buffers on
performance. For system 1, the number of buffers was adequate to
retain directory information, which improved performance over the
single-buffer MicroMint BIOS by 1-5%. In system 3, the buffering
was inadequate to retain necessary information, so the multiple
buffers were of no benefit.
FLOPPY DISK.
Examination of system performance on a Floppy Disk system was
tailored to duplicate, as closely as possible, a hypothetical
operating configuration using multiple drives with non-trivial
search path along differing Drives and User area lines.
Since all three primary operating systems of interest to this
analysis (ZSDOS, CP/M 2.2 and ZRDOS 1.9) rebuild removable-media
disk allocation maps on a relog, there was no need to explicitly
disable the "Rapid Relog" feature of ZSDOS for this portion of
the study. Results are:
              ZSDOS             ZRDOS 1.9       CP/M 2.2
            +--------------------------------------------------+
BIOS 2.3    | 2:18.7 (+2%)      2:22.4 (+5%)    2:16.0 (---)   |
XBIOS 1.0   | 2:29.5 (+0.5%)    2:32.7 (+3%)    2:29.0 (---)   |
AMPRO       | 2:26   (+1%)      2:28   (+2%)    2:25   (---)   |
            +--------------------------------------------------+
Since all of the operating systems are functionally identical in a
Floppy Disk configuration, we did not expect large differences in
measured times. We were therefore not surprised with variations
over a spread of only five percent. While we strove to make ZSDOS
as efficient as possible, CP/M was still the champ on floppy
systems by a nose.
As a final comparison test between the three DOSes, the amount of
time WordStar 4 took to ^QC and ^QR through the 92k ZSDOS source
file was measured under all three DOSes. All timings were within
1%, indicating that read/write to open file times were similar.
PERFORMANCE CONCLUSIONS.
ZSDOS offers significant improvements in system performance on
CP/M 2.2 compatible Z80-compatible computer systems with fixed
media even under the restricted test conditions which disabled
some of the most powerful features of ZSDOS. Even more impressive
results may be obtained in a "tuned" installation with such
features as Public files, and proper selection of the DOS search
path (improvements of 9% on a hard disk system are typical).
The other major conclusion that can be drawn from this effort is
that the selection of a BIOS tailored to the requirements is
crucial to achieving optimum performance. The multiple buffering
capability of XBIOS offers speed increases in systems where an
adequate number of buffers exists, but degrades floppy-based and
heavily loaded hard disk performance.
During the data gathering for this report, an anomaly was noted
with respect to CP/M Plus (or P2DOS) stamps. System #1 was
initialized for P2DOS stamps on the disk holding data files to
quantify the differences. In all cases ZSDOS was affected less
than one percent, yet ZRDOS increased to seven percent longer
than ZSDOS on RAM disk, 20% longer on floppy and 144% longer on
hard disk. CP/M 2.2 was similarly affected, but to a lesser
degree, increasing times over ZSDOS to 115% on RAM disk, ten
percent on floppy and 140% on hard disk. While neither ZRDOS nor
CP/M 2.2 can manipulate this type of stamp, merely using a disk
which is so prepared will result in slower processing.
HOW WE DID IT.
During the year or so that we pursued our independent paths in
modifying H.A.J. Ten Brugge's excellent P2DOS alternative to CP/M
2.2's BDOS, our approaches were somewhat diverse. While Cam's
approach was directed at perfecting features, Hal's effort was
directed at streamlining the code to create a "speed demon"
operating system, and Carson concentrated on enhancing embedded
Date Stamping. In mid-1987, Bridger Mitchell was instrumental in
getting us to pool our resources and collaborate in a joint
venture. The results have been more than worth it. In Part I, we
described the functional enhancements and standards embodied in
ZSDOS, and have just shown the performance improvements compared
to CP/M 2.2 and ZRDOS 1.9. In our efforts to foster better code
for our 8-bit systems, we would now like to describe how the task
of adding features and decreasing execution time was accomplished
without increasing the Operating System memory requirements.
The topic of code optimization is a controversial one. In the
early days of computers, programmers were saddled with small
memory space and slow processors, so every effort was made to
optimize programs for speed and size. As memory became cheaper
and processors emerged with ever increasing clock speeds, these
programming techniques were lost to all but a few. This same
path of evolution has also been followed in the Personal Computer
field.
To demonstrate this point, first compare the 3.5 kbyte CP/M 2.2
BDOS and the 1 kbyte Plu*Perfect DateStamper to the functionally
superior 3.5k ZDDOS. Next, compare the 3.5 kbyte size of CP/M 2.2
and ZSDOS to the 16 kbyte size of the functionally similar MS-DOS
2.1. To carry the point further, contrast the almost 16 kbyte
COMMAND.COM to the 7 kbyte size of a more capable ZCPR3 Command
Processor with a full environment. Some of this bloat is
understandable with the change in processor chips. On the other
hand, the more powerful instructions of 16-bit 808x processors
should have counteracted a good portion of this code bloat.
In line with the size comparisons, execution speeds also suffer
with the larger code. Friends and co-workers who are used to
working with PCs and clones operating at 4.77 and 8 MHz clock
rates are constantly amazed at the speed of even a lowly 4 MHz
ZSDOS system, and dazzled at the 6 and 9 MHz Hitachi 64180
systems running the same software! While much of this is
subjective, quite a bit is due to the fact that the "smaller"
8-bit code has been hand-coded and optimized, whereas the PC
arena is devoting more of its energy to coding in high-level
languages. This makes sense under certain circumstances (e.g.
during development and for long-term maintainability), but it
most certainly does NOT make sense for operating systems where
size and speed are of the essence.
Since all of our efforts have been directed at the Zilog Z80 and
compatible family of microprocessors (including Hitachi's 64180
and National's NSC800), the optimization steps covered here apply
directly only to these. Having stated that, we also need to point
out that many of the basic concepts will still apply to other
processors, although details may differ.
No matter what processor is used, the goals of faster program
execution and smaller memory size are in conflict. Smaller memory
size normally means using each section of code as many times as
possible - typically by using many subroutines. Faster code
execution often means avoiding as many subroutine calls as
possible. In every program undergoing optimization, the
conflicting size and speed requirements must be balanced. This
balance can be highly subjective. In ZSDOS, code size was the
primary concern though significant effort was given to making the
smaller code run as fast as possible.
Now for the minutiae. If you are not a programmer, or are
interested only in how to use ZSDOS, you might want to skip to
PROGRAMMING FOR ZSDOS. For the diehards - here it is!
One of the first techniques we used in optimizing code was to
examine all JUMP instructions. The basic instruction is three
bytes long and executes in 10 clock cycles on a Z80. These
absolute jumps may be unconditional (JP addr), or conditional (JP
C,addr) based on the contents of the Carry, Zero, Sign or
Parity/Overflow flags. The Z80 also features a two-byte Relative
jump (JR) which likewise may be unconditional (JR addr), or
conditional (JR C,addr) based on the Carry or Zero flags. The
relative jump is only two bytes long and may branch only to
addresses within the range of +127 to -128 bytes of the address
following the jump instruction. While it is relatively easy to
blindly change all jump instructions within range to Relative
jumps, the careful programmer will also note that the Relative
jump may carry a time penalty. The unconditional relative jump,
and conditional relative jumps where the condition is satisfied
(the jump is taken), require 12 clock cycles compared to the long
jump consuming only 10 cycles regardless of condition. On the
other hand, conditional relative jumps need only 7 cycles if the
condition is false. This type of optimization was one of the
first used in our efforts to enhance P2DOS.
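As a minimal sketch of this trade-off, consider a loop that scans
for the zero byte ending a string. The routine and its labels
(SCAN, DONE) are hypothetical and are not taken from ZSDOS. The
exit test is rarely true, so the conditional JR usually costs
only 7 cycles and saves a byte, while the loop-closing jump is
always taken, so JP is 2 cycles faster per pass at the cost of
one byte:

; Hypothetical example: find the zero byte ending a string at (HL)
SCAN:   LD      A,(HL)          ; Fetch next byte
        OR      A               ; Is it the terminator?
        JR      Z,DONE          ; 2 bytes: 7 cycles not taken, 12 taken
        INC     HL              ; ..no, advance the pointer
        JP      SCAN            ; 3 bytes: always 10 cycles
                                ; (JR SCAN is 2 bytes, always 12 cycles)
DONE:   RET                     ; HL points to the zero byte

Where size matters more than speed, the closing JP SCAN would
instead become JR SCAN, trading 2 cycles per pass for one byte.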
The next simple optimizing technique we used was to make maximum
use of the Decrement-B and Jump Relative if Not Zero (DJNZ)
instruction. This two-byte sequence executes in 8 or 13 clock
cycles (B=0 and B<>0 respectively) for an absolute time and code
saving over separate decrement/jump sequences. In some of our
work on ZSDOS, using this instruction required redefining
register usage to free up the B register for use as a counter.
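The saving can be seen by writing the same loop both ways. This
is an illustrative sketch with hypothetical labels (CLR1, CLR2),
not ZSDOS code; both versions clear B bytes starting at (HL),
where an entry value of B=0 means 256 passes:

; Separate decrement and jump: 3 bytes of loop control
CLR1:   LD      (HL),0          ; Clear one byte
        INC     HL
        DEC     B               ; 1 byte, 4 cycles
        JR      NZ,CLR1         ; 2 bytes, 12 cycles taken, 7 not taken
        RET
; Same loop with DJNZ: 2 bytes of loop control
CLR2:   LD      (HL),0          ; Clear one byte
        INC     HL
        DJNZ    CLR2            ; 2 bytes, 13 cycles taken, 8 not taken
        RET

Loop control drops from 16 to 13 cycles on each repeated pass and
from 11 to 8 on the last one. One difference to keep in mind is
that DEC B alters the flags while DJNZ leaves them untouched.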
Another simple optimizing step was examining the use of the IX
register. IX holds the argument passed to DOS in the DE register
(typically a file control block pointer). Despite having this
value available all the time, there were a significant number of
cases when faster and/or shorter code was produced by moving the
pointer into HL. This was normally the case when the same offset
within the FCB was accessed two or more times in succession.
The final "simple" optimization technique we used was to examine
all PUSHes and POPs to the stack and delete any found to be
unnecessary. While this sounds simple, it is quite a chore in a
complex program such as ZSDOS where CALLs call other CALLs which
call still other CALLs, etc. Each path must be examined to insure
that the registers are, in fact, not altered or needed.
After the above "simple" optimizations were performed, a series
of what we term "moderate" optimization steps was undertaken.
One of these involved examining all series of sequential checks
on a byte (such as the input command character scanner) and
structuring the check sequences to optimize performance based on
the clock cycle counting mentioned above and the estimated
frequency of access for various commands. In the case of the
command dispatcher, this technique resulted in extremely fast
command parsing implemented with minimum code.
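The idea can be sketched with a short chain of tests on a BDOS
function number. The function numbers 15, 20 and 21 (Open File,
Read Sequential, Write Sequential) are standard CP/M values, but
the frequency ranking, the labels and the stub handlers below are
assumptions for illustration only; this is not the actual ZSDOS
dispatcher:

; Hypothetical dispatcher fragment: function number in C.
; Tests are ordered by assumed call frequency, most frequent first.
DISPAT: LD      A,C             ; Get the function number
        CP      20              ; Read Sequential (assumed most common)
        JR      Z,DOREAD        ; ..a hit costs 7 + 12 cycles
        CP      21              ; Write Sequential
        JR      Z,DOWRIT        ; ..each earlier miss adds 7 + 7 cycles
        CP      15              ; Open File
        JR      Z,DOOPEN
        JP      OTHERS          ; Less frequent functions handled later
DOREAD: RET                     ; Stubs standing in for real handlers
DOWRIT: RET
DOOPEN: RET
OTHERS: RET

Each failed test costs 14 cycles (CP n plus a JR not taken), so
placing the most frequently used values first keeps the common
cases at the front of the chain.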
Sequential bit shifts and rotates are another area where more
analysis is required before final code can be written.
Sixteen-bit shifts, and 8-bit shifts in registers other than the
accumulator are areas where gains can be achieved. The usual
method of using a subroutine which loads all bytes to the
accumulator for shifts and rotates fares poorly if only one or
two bit shifts are needed. While most of these cases had been
removed from the P2DOS code by the original author, the
replacement inline code still suffered from some inefficiencies.
A two-bit shift right (division by 4) of the 16-bit HL register
pair in the STDIR routine using the code:
        SRL     H               ; Divide by 2
        RR      L
        SRL     H               ; Divide by 4
        RR      L
proved optimum. Using a two-iteration loop with the DJNZ
instruction around a single SRL H, RR L sequence would have
produced the same 8-byte code length, but at a penalty of 21
clock cycles. A call to a subroutine would have fared even worse
with a 27 clock cycle CALL/RET penalty, and four bytes of
overhead. On the other hand, three-bit shifts of the HL
register pair occurred in a number of routines. These were
consolidated into a single callable routine that uses the B
register as a counter in an iterative loop with the sequence:
SHRHL3: LD      B,3
SHRHLB: SRL     H
        RR      L
        DJNZ    SHRHLB
        RET
While the replacement code added overhead, it saved 3-5 bytes of
code (depending on entry point) which were sorely needed to add
additional features. ZSDOS calls this routine from three places,
while ZDDOS calls it from five. The difference is due to ZSDOS
"unrolling" the loop in time critical routines.
Shifts to the left were occasionally handled a little more
efficiently by using the 16-bit ADD instructions of the HL
register pair to perform bit shifts. An example of this appeared
in the CALST routine. In this case, the DE register pair was
rotated one bit to the left with sequential RL E, RL D
instructions, with the Carry bit shifted into the HL register
pair. Where the original code used the sequence: RL L, RL H to
shift the bit into the HL pair, a two byte code savings was
achieved with the single two-byte ADC HL,HL instruction.
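A sketch of the sequence described, reconstructed from the text
rather than taken from the ZSDOS source. It treats HL:DE as one
32-bit quantity and shifts it left one bit:

; Rotate DE left through Carry, then shift the Carry into HL
        RL      E               ; Carry <- bit 7 of E
        RL      D               ; Carry <- bit 15 of DE
        ADC     HL,HL           ; HL = 2*HL + Carry: 2 bytes, 15 cycles
                                ; ..replaces RL L / RL H: 4 bytes, 16 cycles

Adding HL to itself doubles it, which is a one-bit left shift,
and the ADC form brings the Carry from DE into bit 0. Note that
RL E also moves the incoming Carry into bit 0 of E, so the Carry
must be in a known state on entry, and that the flags left by
ADC HL,HL reflect the full 16-bit result rather than H alone.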
Another area where considerable code and time savings were
realized was in the consolidation of routines into
"straight-line" code. While this seems to be anathema to
structured programmers, it is often a must to obtain the
performance improvements which we sought from our efforts. As a
first step, all routines ending in Jump instructions were
examined. Target addresses were then checked to insure that no
other routine "fell through" to them. If it was in fact a
"stand-alone" routine, it was moved to the end of the first
routine so that the Jump could be deleted. An example of this is
where the INITDR routine was moved to follow SELDK directly,
saving the two-byte relative jump and 12 clock cycles. Other
cases involving long jumps saved three bytes and 10 clock cycles.
A minor variation in relocation of code is to group functions to
bring them within range of relative jumps, thereby saving one
byte at the expense of two clock cycles. This minor penalty in
time was often outweighed by the value of a single byte of code
in our efforts.
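The relocation step can be illustrated with a small made-up pair
of routines. CLRBUF, FILL and BUFFER are hypothetical names and
do not appear in ZSDOS. Before the change CLRBUF would have ended
with JR FILL; with FILL moved directly behind it, execution simply
falls through:

; Hypothetical example of a routine falling through to its target
CLRBUF: LD      HL,BUFFER       ; Point to the buffer
        LD      B,128           ; ..one record long
        XOR     A               ; Fill value of zero
                                ; No JR FILL here: 2 bytes and
                                ; ..12 cycles saved by falling through
FILL:   LD      (HL),A          ; Store B copies of A from (HL) up
        INC     HL
        DJNZ    FILL
        RET
BUFFER: DEFS    128             ; Storage for the example

This only works if no other routine already falls through into
FILL, which is the check on target addresses described above.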
A variant on this concept involved examining sequences of code
for duplication, and combining identical sequences into new
routines which "fall through" to the destination. This was amply
used to define a new routine:
SRCT15: LD      A,15
        CALL    SEARCH
This sequence was placed immediately before the TSTFCT routine,
and replaced three occurrences of:
        LD      A,15
        CALL    SEARCH
        CALL    TSTFCT
with a single CALL to SRCT15. The overall effect of this one
change was a savings of 10 bytes of code and 24 clock cycles for
each of the three sequences replaced.
Detailed examination of code also produced unexpected savings by
merely defining new labels. As an example, the last three
instructions of the routine OPENEX were:
        LD      A,0FFH
        LD      (PEXIT),A
        RET
This sequence occurred two other times in the original code, and
three times in the latest version of ZSDOS. The last two
instructions were repeated in many locations, so one location was
selected (centrally located to take advantage of relative jumps),
with other instances accessing it with a call or jump to the new
label, SAVEA. Setting the value to 0FFH in OPENEX was labeled as
SETCFF, with the other two occurrences jumping to this location.
While a small time penalty was incurred in jumping to this common
code, the three byte savings was again needed to add features.
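Pictured as code, the arrangement looks roughly like this. The
instructions and the labels SETCFF, SAVEA and PEXIT come from the
description above; the comments and the two sample jumps are an
illustrative reconstruction, not the ZSDOS source:

; Tail of OPENEX with the two new entry labels
SETCFF: LD      A,0FFH          ; Entry when the result is to be 0FFH
SAVEA:  LD      (PEXIT),A       ; Entry when the result is already in A
        RET
; Sample uses elsewhere (hypothetical call sites)
        JR      SETCFF          ; Replaces LD A,0FFH / LD (PEXIT),A / RET
        JR      SAVEA           ; Replaces LD (PEXIT),A / RET

No new instructions are added at the shared location; the two
labels simply give other routines a way into code that already
existed.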
Our code "walk-throughs" and optimization efforts did not stop
with the original code, but continued with every test version.
First, we discovered a common "shell" of instructions around the
DELETE, CSTAT, and RENAME functions and combined them with a net
savings of 12 bytes. Later, a trick used in public-domain inline
print routines to pass addresses on the processor's stack was
used to recover five bytes of code by replacing three sequences
of:
        LD      HL,(address)
        JR      COMCOD
with three 3-byte CALL COMCOD instructions. The trick involved in
this change was to place the CALLs immediately in front of the
routines whose addresses were to be passed to COMCOD. When
executed, the CALL placed the routine address on the stack. A
one-byte POP HL instruction at the beginning of COMCOD completed
the change by placing the address in the desired HL register.
Still later, the internal code in the COMCOD routine was again
optimized to remove several memory references. This saved another
four bytes.
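A minimal sketch of the CALL/POP arrangement follows. Only the
use of CALL COMCOD and the leading POP HL comes from the text;
the labels FUNCA, FUNCB, WORKA and WORKB, the bodies of the
worker routines, and the JP (HL) at the end of COMCOD are
assumptions made so that the fragment is complete:

; Each CALL sits directly in front of the routine whose address
; is to be passed; the "return address" it pushes IS that address.
FUNCA:  CALL    COMCOD          ; 3 bytes: pushes the address of WORKA
WORKA:  LD      A,1             ; Reached only by way of COMCOD
        RET
FUNCB:  CALL    COMCOD          ; 3 bytes: pushes the address of WORKB
WORKB:  LD      A,2
        RET
COMCOD: POP     HL              ; 1 byte: HL = address after the CALL
        JP      (HL)            ; Assumed use: run the chosen routine

With three call sites, three 5-byte load-and-jump pairs (15
bytes) become three 3-byte CALLs plus one POP HL (10 bytes),
which matches the five bytes recovered. Because the pushed
address is popped again, the RET in each worker returns to the
original caller.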
Cameron's rewrite of the Console IO routines demonstrated another
technique of reducing code size with very little overhead. The
majority of affected code involved different DOS commands, yet
exited through common code with absolute jumps. By PUSHing the
exit address on the stack prior to jumping to the routines, a
simple RETurn instruction sufficed to direct execution through
the exit code, saving two bytes per occurrence. The four bytes
required to set the return address meant that the code size
break-even point occurred at two instances. Since far more cases
than that were involved, a significant code size reduction was
realized. For DOS function calls, the time penalty incurred was
21 clock cycles; however, that was not considered significant
when dealing with the normal serial IO devices used in console
functions.
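The technique can be sketched as follows. All labels (ENTRY,
FUNC1, FUNC2, EXIT) and the work done in each routine are
hypothetical; only the idea of pushing the exit address and
leaving with RET is taken from the text:

; Function number (1 or 2) arrives in A; result is returned in A
ENTRY:  LD      HL,EXIT         ; 3 bytes, 10 cycles
        PUSH    HL              ; 1 byte, 11 cycles: EXIT is now
                                ; ..the return address for all paths
        LD      B,0             ; Default result
        DEC     A               ; Function 1?
        JR      Z,FUNC1
        DEC     A               ; Function 2?
        JR      Z,FUNC2
        RET                     ; Neither: go straight to EXIT
FUNC1:  LD      B,1             ; ..do the work..
        RET                     ; 1 byte, replaces 3-byte JP EXIT
FUNC2:  LD      B,2
        RET                     ; 1 byte, replaces 3-byte JP EXIT
EXIT:   LD      A,B             ; Common exit code
        RET                     ; Back to the original caller

The LD HL/PUSH HL pair costs 4 bytes and 21 cycles once, and each
RET that replaces a JP EXIT recovers 2 bytes, so the change pays
for itself at the second exit and saves space from the third on.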
A final noteworthy trick was added by Cameron which neither of us
had ever seen documented in the Z80 world. It used the
sixteen-bit load instruction into the IX register (a four byte
instruction) to "fall through" successive 16-bit loads to the
primary registers. In this fashion, the sequence:
CMND27: LD      HL,(ALV)
        JR      SAVHL
CMND24: LD      HL,(LOGIN)
        JR      SAVHL
CMND31: LD      HL,(IXP)
        JR      SAVHL
CMND47: LD      HL,(DMA)
SAVHL:  LD      (PEXIT),HL
        RET
was replaced by a more efficient (in code size) construct. The
bytes, as coded, are on the left, with the instructions seen by
CMND27 shown on the right:
CMND27: LD      HL,(ALV)        CMND27: LD      HL,(ALV)
        DEFB    0DDH                    LD      IX,(LOGIN)
CMND24: LD      HL,(LOGIN)
        DEFB    0DDH                    LD      IX,(IXP)
CMND31: LD      HL,(IXP)
        DEFB    0DDH                    LD      IX,(DMA)
CMND47: LD      HL,(DMA)
SAVHL:  LD      (PEXIT),HL              LD      (PEXIT),HL
        RET                             RET
This code works because the IX register is not used in the
remainder of the exit code, and the entry IX value is restored
upon returns from ZSDOS functions. Each cascaded value saves one
byte of code, but adds additional clock cycles to the execution
time. Where the original code required a constant 28 clock cycles
before arriving at the SAVHL routine, the new code execution time
is different for each entry point. In this example, the time (in
clock cycles) required for each entry point to arrive at SAVHL
is:
        CMND47 - 16 cycles
        CMND31 - 20 + 16 = 36
        CMND24 - 20 + 20 + 16 = 56
        CMND27 - 20 + 20 + 20 + 16 = 76
At this point, an analysis of probable calling frequency was done
to order the calls so that the most frequently used functions
would incur the least penalty. The ordering shown here was judged
to be the optimum sequence.
In a similar manner, eight-bit loads of the A register were
consolidated at the beginning of the SEARCH routine. Our analyses
of the code showed that SEARCH was called several times with
values of 12 and 15 in the A register. Loading of these values
was relocated to the beginning of SEARCH, then consolidated with
another single-byte DEFB prefix. The resultant code as entered,
and as seen by SEAR12 is:
SEAR12: LD      A,12            SEAR12: LD      A,12
        DEFB    21H                     LD      HL,0F3EH
SEAR15: LD      A,15
SEARCH: ...                     SEARCH: ...
Instead of imposing a time penalty as did the LD IX,(nn) trick
described above, this case saved one byte and two clock cycles
over a relative jump (JR = 12 cycles, LD HL,nn = 10 cycles). As
above, this worked because the HL register contents were "don't
care" upon entry to the SEARCH routine.
These techniques are very powerful when code size is at a
premium. Any sequence of code that loads a register or register
pair then jumps or calls a common routine is a candidate for this
technique. You need a register pair to throw away, but this is
usually easy to find.
The final case of optimization is the most difficult, and
involved complete logic redesigns. This area is so specific and
lengthy that it will not be covered here. As so often stated in
textbooks, it is "left as an exercise for the reader" to examine
the original P2DOS source and identify areas which can be
redesigned. Much logic redesign was required as a part of the
added ZSDOS and ZDDOS features, though the effort didn't stop
there.
Just as important as what we did to gain speed and reduce size is
what we didn't do. P2DOS originally used some self modifying code
in the error printing routine. We decided from the outset that we
would avoid this practice (tempting though it is..) in order to
produce code that could be ROMed and/or run on the Z280 in
protected mode. This decision cost us several bytes of code, but
allowed us to accomplish our goals.
PROGRAMMING FOR ZSDOS.
ZSDOS places a few restrictions on systems that other CP/M
compatible operating systems do not. The most significant is
that the BIOS MUST NOT DISTURB THE IX REGISTER. So far, the Epson
QX-10 and Zorba computers have been identified as having BIOSes
that corrupt this register. With NZCOM, we have developed a
"protective" NZBIOS (look for ZSNZBI12.LBR on most Z-Nodes) that
shields the Z80 registers from ill-behaved BIOSes, but operation
without NZCOM on such systems will require that the BIOS be
rewritten.
On this topic, we would like to propose that all programmers
observe register usage more closely. The Z80 alternate and index
registers belong to APPLICATION programs, and must be preserved
by all operating system components. On the other hand, the "I"
and "R" registers, as well as all new 64180 and Z280 registers
(with the exception of the Z280's SSP) belong to the BIOS since
they are hardware specific and directly I/O related. The Z280 SSP
should be reserved for BDOS use.
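
As a rough illustration of how an operating system component can
honor this rule when it must call an ill-behaved routine, an entry
can be wrapped so that the index registers survive the call. This
is a sketch only, not the actual ZSNZBI12 code; SAFEWR and BIOSWR
are hypothetical labels, with BIOSWR standing for the address of
the original BIOS routine:

	SAFEWR:	PUSH IX		; Save the index registers
		PUSH IY
		CALL BIOSWR	; Hypothetical: the real BIOS entry
		POP  IY		; Restore them; POP leaves A, HL and
		POP  IX		; ..the flags from the BIOS untouched
		RET

The alternate registers could be protected in the same way with
EX AF,AF' and EXX around additional pushes; that is omitted here
for brevity.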
Before trying to access any of the expanded ZSDOS features
discussed in the last issue, you should first ensure that the
program is in fact executing under ZSDOS. This is a two-step
procedure involving a call to check for CP/M 2.2, then a call to
the ZSDOS Return Version function. By checking in this manner,
your program will be able to identify CP/M 1, 2 and 3 (aka Plus)
as well as ZSDOS, ZDDOS and ZRDOS. Code to accomplish this task
is:
		LD   C,12	; Return CP/M Version
		CALL 0005	; ..via BDOS
		CP   30H	; Is it CP/M Plus?
		JR   NC,ISCPM3	; ..jump if so
		CP   20H	; Is it CP/M 1.x?
		JR   C,ISCPM1	; ..jump if so w/version # in A
		CP   22H	; Is it CP/M 2.2?
		JR   NZ,BADVER	; ..jump to unknown 2.x version
		LD   C,48	; Now make the extended call
		CALL 0005	; ..via BDOS
		LD   A,H	; Check the DOS type first
		CP   'D'	; Is it ZDDOS?
		JR   Z,ISZD	; ..jump if so, Ver # in L
		CP   'S'	; Is it ZSDOS?
		JR   Z,ISZS	; ..jump if so, Ver # in L
		OR   A		; Is it ZRDOS?
		JR   Z,ISZR	; ..jump if so, Ver # in L
		...		; Else can't identify, do error
Bridger Mitchell's Advanced CP/M column in TCJ #36 also provides
sample code to perform this function. A slight variation on the
above sequence is used in utilities provided with ZSDOS to enable
them to work under a variety of different operating systems. We
propose that this technique be used for any future Disk Operating
Systems by returning a different unique character in the "H"
register.
Many programs in the past have relied on unpublished locations
within the BDOS to alter the performance or functionality of the
system. With ZSDOS, we provide published "standard" ways to
dynamically tailor DOS parameters. The most important way of
accomplishing this is with a set of configuration bits, or flags.
To accommodate future expansion, a word value of sixteen bits is
defined with only the lower seven used in the current 1.0
release. The Flag bits used in ZSDOS 1.0 are:
	D D D D D D D D
	7 6 5 4 3 2 1 0
	\ \ \ \ \ \ \ \__ Public File Access
	\ \ \ \ \ \ \____ Public/Path Write
	\ \ \ \ \ \______ Read-Only Disk
	\ \ \ \ \________ Fast Fixed Disk Relog
	\ \ \ \__________ Disk Change Warning
	\ \ \____________ BDOS Search Path *
	\ \______________ Path w/o SYS Attribute *
	\________________ (Reserved)
The cited function is activated by setting the respective bit to
a "1", and disabled by clearing the bit to a "0". Since ZDDOS has
no search path capability, the features marked with an asterisk
pertain only to the full ZSDOS configuration, and are "don't
care" bits in ZDDOS. The flags are defined as a 16-bit word; the
bits used in 1.0 occupy its lower byte, which is returned in the
"L" register. Code for returning them is:
		LD   C,100	; Get the FLAGS bits
		CALL 0005	; ..with DOS call
		...		; "L" has present 7 bits
Likewise, the flags may be set from applications programs with
Function 101 as:
		LD   DE,(FLAGS)	; 1.0 only recognizes byte in E
		LD   C,101	; Now set flags in ZSDOS
		CALL 0005	; ..with DOS call
		...		; New settings are now effective
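
The two calls can be combined to change one option while leaving
the others as found. The following is a sketch that turns on
Public/Path Write (bit 1 in the layout above); clearing D is an
assumption based on 1.0 recognizing only the byte in E:

		LD   C,100	; Get the present FLAGS bits
		CALL 0005	; ..returned in L
		LD   A,L
		OR   00000010B	; Set bit 1, Public/Path Write
		LD   E,A	; New flag byte goes in E
		LD   D,0	; Upper byte unused in 1.0 (assumed)
		LD   C,101	; Now set flags in ZSDOS
		CALL 0005	; ..with DOS call

A program that changes flags only for its own use would probably
want to save the original value of L and restore it with another
Function 101 call before exiting.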
Date and Time capabilities are just as easily accessed. The
6-byte Clock data may be retrieved to a specified buffer with DOS
Function 98 as:
		LD   DE,TIMEAD	; Address of 6-byte buffer
		LD   C,98
		CALL 0005	; Read Clock from DOS
		INC  A		; Any Errors? (FF --> 0)
		JR   Z,ERROR	; ..jump if error (no clock?)
		...		; Else use the retrieved time
	TIMEAD:	DEFB 0,0,0,0,0,0 ; Initialized Null DateSpec
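
To do arithmetic on the retrieved values, a program will usually
need them in binary. The routine below is a sketch that assumes
each byte of the DateSpec holds two packed BCD digits; that
encoding is an assumption here and should be checked against the
ZSDOS documentation. BCD2BN is a hypothetical label:

	BCD2BN:	LD   C,A	; Save the packed BCD byte
		AND  0F0H	; Isolate tens digit (tens * 16)
		RRCA		; ..now tens * 8
		LD   B,A	; Keep tens * 8
		RRCA
		RRCA		; ..now tens * 2
		ADD  A,B	; tens * 8 + tens * 2 = tens * 10
		LD   B,A
		LD   A,C	; Recover the original byte
		AND  0FH	; Isolate units digit
		ADD  A,B	; Binary result in A
		RET		; (B and C are destroyed)

For example, LD A,(TIMEAD) followed by CALL BCD2BN would convert
the first field of the buffer.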
With the File Date Stamping capabilities of ZSDOS, we developed a
single standardized way of accessing individual file stamps.
Function 102 will copy the set of stamps for a specified file to
the current DMA address, while 103 will set the stamps for the
specified file to the values at the current DMA address. Since
all supported stamping methods (currently DateStamper(tm) and the
CP/M Plus compatible P2DOS) feature the same format at the ZSDOS
level, no user conversions are needed. Indeed, using special
stamp drivers provided with the ZSDOS package, either stamp type
may be read, and both are written by Function 103 if the
destination disk has been so prepared. A sample of code used to
copy stamp data from one file to another is:
		LD   DE,DSBUF	; Point to 15-byte stamp buffer
		LD   C,26	; ..and set the DMA address
		CALL 0005
		LD   DE,SRCFCB	; Source FCB (User set already)
		LD   C,102	; Get the source's Stamps
		CALL 0005
		...		; Set User to destination?
		LD   DE,DSTFCB	; Destination FCB
		LD   C,103	; Write Stamps from DMA buffer
		CALL 0005	; ..to Dest file
		...
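
The sample above refers to three data areas that are not shown.
One possible set of definitions follows, as a sketch only: the
file names are placeholders, and the 36-byte FCB size is the
standard CP/M 2.2 layout rather than anything specific to ZSDOS:

	DSBUF:	DEFS 15		; 15-byte stamp buffer
	SRCFCB:	DEFB 0		; Drive (0 = default)
		DEFB 'SOURCE  TXT' ; Placeholder name and type
		DEFS 24		; Rest of FCB (zero before use)
	DSTFCB:	DEFB 0		; Drive (0 = default)
		DEFB 'DEST    TXT' ; Placeholder name and type
		DEFS 24		; Rest of FCB (zero before use)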
FINAL THOUGHTS.
ZSDOS was a labor of love. Though we didn't really start out to
create such a significant step forward in 2.2 compatible BDOSes,
it turned out that way. It is our hope that the ideas presented
in ZSDOS will form the basis for the next generation of BDOS
replacements. If nothing else, we hope that ZSDOS stimulates the
Z80 compatible community to address the issues of standards for
datestamping, enhanced error handling, and global file access.
The next step for an improved operating system will be to break
the 64k barrier. Joe Wright and Jay Sage's efforts in dynamic
system configuration with NZCOM are very useful, but fail to
address the fundamental problem: we need to use the banked
memory featured in most newer systems. Furthermore, this must be
done in a way that allows existing applications to run properly.
This means (unlike CP/M Plus) a BDOS that lets BIOS deblock, a
BIOS jump table that is directly callable from all banks, system
vectors at the normal locations, etc. This also means
establishing standards for bank sizes and addresses, hardware and
processor independence, and finally universal DOS level and BIOS
level interfaces to banked memory. Other standards that will be
needed by the next generation of OS's include banked RSX
standards (though Bridger Mitchell and Malcom Kemp seem to have
this nailed down), banked device driver standards, and expanded
TCAPS and ENV definitions (aren't these properly BIOS structures,
folks?). Now is the time to come together, speak up on these
matters, carefully weigh all alternatives, and make our wishes
known.
Also, we urge the community to support those doing active
development for our systems by purchasing legal copies of the
software you use. This will allow and encourage development of
things like new, better, and faster banked systems with all the
goodies we really want. We applaud the efforts of MicroPro in
developing and releasing WordStar 4 for CP/M systems, and
encourage other vendors to update their CP/M offerings in the
fields of Database Management systems and Spreadsheets for the
new generation of systems. Further, let's agree to agree on what
we really want. In this manner, we can all concentrate our
efforts on applications programs, not rewriting BDOS. In short,
let's work together to create a computing environment that will
turn the big blue clones green with envy.
In conclusion, what started as independent "labors of love" to
produce a better operating system rapidly became identical
obsessions as we reverted to counting clock cycles and bytes. We
are satisfied with the results, and hope that others will benefit
from our work and produce smaller, faster and more full-featured
programs to help make our lives easier (and keep from emptying
our wallets with requirements for constant upgrades). Finally, we
must thank H.A.J. Ten Brugge for beginning this entire episode by
releasing P2DOS. Without his efforts, none of us (Cam, Hal and
Carson) would have been tempted into the area of operating system
authorship, and would have left it to "others" to determine what
we need in our respective systems.
APPENDIX: The hardware used in these analyses is:
System #1: MicroMint SB-180.
  Processor:    HD64180 operating at 6.144 MHz clock rate with
                no memory wait states and 2 IO wait states.
  Console:      Serial Console connected to ASCI port 1 at 19.2
                kbps, Interrupt-driven buffered keyboard input.
  Interfaces:   ETS180 IO+ providing SCSI interface and RTC.
  CCP:          ZCPR 3.3 with full environment.
  BIOS:         MicroMint 2.7 modified / XSystems XBIOS 1.1.
  Search Path:  $$:, A15: (Current Drive & User, then A15:)
  Hard Disk:    Syquest SQ-306R 5 Megabyte removable-media,
                Interleave of 3, 12 microsecond buffered seek,
                Adaptec 4010 controller.
                A: 1576k of 2552k free, 94 files, 68 in User 15.
                B: 2432k of 2568k free, 17 files, 16 in User 1.
  Floppy Disks: A: NEC 80-track DSDD, 4 mS step, 4 mS Head Load,
                   16k of 782k free, 93 files, 68 in User 15.
                C: Shugart SA465 80-track DSDD, 6 mS step, 736k of
                   782k free, 17 files in User 1.
System #2: Ampro Little Board 1A.
  Processor:    Z80A operating at 4.0 MHz.
  Console:      Serial Console connected to DART port 1 at 9600
                baud, hardware handshake enabled.
  Interfaces:   SCSI daughter board with NCR 5380 driving 1610-4
                controller.
  CCP:          ZCPR 3.4 with full environment.
  BIOS:         Ampro V3.8/NZCOM.
  Search Path:  $$:, A2:, A0: (Current Drive & User, then A2:, A0:)
  Hard Disks:   Seagate ST-225 20 Megabyte, interleave of 2,
                200 microsecond buffered seek, Shugart 1610-4
                controller. A Shugart 5Mb full height drive was
                also connected to the controller, but was not
                used in the test.
                A: 2744k of 8160k free, 425 files, 77 in User 2.
                C: 984k of 4192k free, 258 files, 32 in User 3.
  Floppy Drives: A: Teac 55F 80 track DSDD, 6 mS step, 10k of
                   782k free, 74 files.
                B: Teac 55F 80 track DSDD, 6 mS step, 736k of
                   782k free, 17 files in User 0.
System #3: Homebrew SB-180 compatible.
  Processor:    Z-180 operating at 9.216 MHz clock rate with
                no memory wait states and 3 IO wait states.
  Console:      Serial Console connected to ASCI port 1 at 19.2
                kbps, Interrupt-driven buffered keyboard input.
  Interfaces:   ETS180 IO+ providing SCSI interface and RTC.
  CCP:          ZCPR 3.0 with full environment.
  BIOS:         MicroMint 2.7 modified / XSystems XBIOS 1.1.
  Search Path:  A15: (ZCPR 3.0 searches current, then A15:)
  Hard Disk:    Shugart SA-712 10 Megabyte, Interleave of 1,
                12 microsecond buffered seek, Shugart 1610-3
                controller.
                A: 324k of 2552k free, 179 files, 101 in User 15.
                D: 252k of 2792k free, 438 files, 16 in User 5.