home
***
CD-ROM
|
disk
|
FTP
|
other
***
search
/
Crawly Crypt Collection 1
/
crawlyvol1.bin
/
program
/
books
/
68k_book
/
arp_doc
/
chap_06.doc
< prev
next >
Wrap
Text File
|
1985-11-20
|
91KB
|
2,172 lines
Atari ST Machine Specific Programming In Assembly
Chapter 6: Selections Based On Speed
Reducing Program Loading Time
As I was concluding chapter 3, I intended to indicate
that, in many ways, this chapter is an extension of that
one. For example, the first thing that I shall do here is
to resume the discussion of assembly modes. As I stated
there, the first step involved in rapid execution is rapid
loading. A program cannot even begin run until it has been
moved from disk storage to ram.
For programs which simply load, execute and terminate,
without spending much time in ram, the time required to load
the program may be a significant portion of the total time
involved in program execution. Many of the programs that
are executed from an AUTO folder during boot are of this
type. It seems reasonable, therefore, that discussions
which involve the reduction of loading time precede
discussions involving the reduction of execution time.
As I mentioned in program SPEED_1's documentation, the
time required to load a particular program from disk to ram
depends on the assembly mode which produced the code; the
type of drive from which it is loaded, hard disk or floppy;
the method used to format the disk, such as ST normal,
Double Click's DC Formatter or David Small and Dan Moore's
Mega-Twister; the position of the file on the disk, which
could include fragmentation; and, if the program is spawned
by another, the distance between parent and child, which
could include intermedia displacement.
The Effects of Assembly Modes
Programs 23 and 24 have been prepared for use in
comparing the load and execution times of a program
assembled in three assembly modes. Program 23 should be
assembled in the Relocatable mode to produce PRG_2CC.TOS.
Program 24 should be assembled in AssemPro's PC-relative
mode to produce PRG_2CP.TOS, then the name of the source
should be changed to PRG_2CR, while it is still in the
editor, so that it can be assembled in the Relocatable mode
to produce PRG_2CR.TOS.
I obtained the load and execute times for the three TOS
programs using the following method to isolate the
experiment from the intermedia displacement variables
discussed above. Execution results follow the appropriate
listings.
1. Copy SPEEDTST.TTP to a blank hard disk partition or
floppy disk.
2, Copy PRG_2CC.TOS to the same hard disk partition or
floppy disk.
3. Execute SPEEDTST.TTP and type PRG_2CC.TOS on the
command line.
4. Copy PRG_2CC.DAT to a safe place.
5. Remove PRG_2CC.TOS and PRG_2CC.DAT from the
partition or floppy.
6. Repeat steps 2 through 5 for PRG_2CP.TOS and
PRG_2CR.TOS.
Program 23. Source file for PRG_2CC.TOS
; Program Name: PRG_2CC.S
; Version: 1.002
; Assembly Instructions:
; Assemble in Relocatable mode and save with a TOS extension.
; Execution Instructions:
; Execute program SPEEDTST.TTP and type PRG_2CC.TOS on the input
; parameter line. SPEEDTST.TTP will produce a data file named PRG_2CC.DAT
; on disk. You will be able to compare the data for this program to that
; produced for programs PRG_2CP.TOS and PRG_2CR.TOS.
; Program Function:
; Statements within a nested loop structure are executed 50,000 times
; so that the load and execution time of this program can be compared with
; similar programs assembled in the PC-relative and Relocatable modes.
store_after_load_time:
trap #3 ; Returns value of system clock in D0.
lea after_load_time(pc), a0
move.w d0, (a0)
move.w #9, d1 ; Initialize outer loop counter.
outer_loop: ; Loop ten times.
move.w #49999, d0 ; Initialize inner loop counter.
inner_loop: ; Loop 50,000 times.
move.l #label, a0 ; Can't use (pc) here.
lea label(pc), a0
move.l label(pc), a0
move.l #label, -(sp) ; Can't use (pc) here.
pea label(pc)
move.l label(pc), -(sp)
lea $C(sp), sp ; Reposition stack pointer to top of stack.
dbra d0, inner_loop ; Loop back until D0 = -1.
dbra d1, outer_loop ; Loop back until D1 = -1.
terminate:
move.w after_load_time(pc), -(sp) ; Pass after load time to SPEEDTST.TTP.
move.w #$4C, -(sp) ; Function = p_term = GEMDOS $4C.
trap #1
data
; NOTE: Below, the variable "label" is supposed to be a pointer to the
; variable "after_load_time". If this program is assembled in
; Relocatable mode, the "run time" address of "after_load_time" will be
; stored in the 4 bytes declared at "label" when the program is loaded
; from disk to ram.
; But, if the program is assembled in PC-relative mode, the "run time"
; address will not be stored there; instead, the "assembly time" address
; will be stored in the 4 bytes. That is undesirable.
label: dc.l after_load_time ; This works for COMBO assembly.
bss
after_load_time: ds.w 1
end
SPEEDTST.TTP Execution Results
for PRG_2CC.TOS, loaded from drive: G
Load time: 1995 milliseconds
Execution time: 7485 milliseconds
Program 24. Source file for PRG_2CP.TOS and
PRG_2CR.TOS.
; Program Name: PRG_2CP.S
; Version: 1.002
; Assembly Instructions:
; Assemble in PC-relative mode and save with a TOS extension, then
; change the name to PRG_2CR, assemble in Relocatable mode and save as
; PRG_2CR.TOS. This will allow you to prepare two object code files from
; the same source file.
; Execution Instructions:
; Execute program SPEEDTST.TTP and type PRG_2CP.TOS on the input
; parameter line. SPEEDTST.TTP will produce a data file named PRG_2CP.DAT
; on disk. You will be able to compare the data for this program to that
; produced for programs PRG_2CC.TOS and PRG_2CR.TOS.
; Do the same with PRG_2CR.TOS to produce a data file named PRG_2CR.DAT.
; You can view the DAT files with an editor, from the desktop using the Show
; function or by printing them.
; Program Function:
; Statements within a nested loop structure are executed 50,000 times
; so that the load and execution time of this program can be compared with
; similar programs assembled in the Relocatable and Combo modes.
store_after_load_time:
trap #3 ; Returns value of system clock in D0.
lea after_load_time, a0
move.w d0, (a0) ; Store time in variable "after_load_time".
move.w #9, d1 ; Initialize outer loop counter.
outer_loop: ; Loop ten times.
move.w #49999, d0 ; Initialize inner loop counter.
inner_loop: ; Loop 50,000 times.
move.l #label, a0
lea label, a0
move.l label, a0
move.l #label, -(sp)
pea label
move.l label, -(sp)
lea $C(sp), sp ; Reposition stack pointer to top of stack.
dbra d0, inner_loop ; Loop back until D0 = -1.
dbra d1, outer_loop ; Loop back until D1 = -1.
terminate:
move.w after_load_time, -(sp) ; Pass after load time to SPEEDTST.TTP.
move.w #$4C, -(sp) ; Function = p_term = GEMDOS $4C.
trap #1
data
; NOTE: Below, the variable "label" is supposed to be a pointer to the
; variable "after_load_time". If this program is assembled in
; Relocatable mode, the "run time" address of "after_load_time" will be
; stored in the 4 bytes declared at "label" when the program is loaded
; from disk to ram.
; But, if the program is assembled in PC-relative mode, the "run time"
; address will not be stored there; instead, the "assembly time" address
; will be stored in the 4 bytes. That is undesirable.
label: dc.l after_load_time ; This does not give desired response for
bss ; PC-relative assembly, it works for
after_load_time: ds.w 1 ; Relocatable assembly.
end
SPEEDTST.TTP Execution Results
for PRG_2CP.TOS, loaded from drive: G
Load time: 35 milliseconds
Execution time: 7475 milliseconds
SPEEDTST.TTP Execution Results
for PRG_2CR.TOS, loaded from drive: G
Load time: 1995 milliseconds
Execution time: 8510 milliseconds
A comparison of load times is dramatic. When the
program is assembled in AssemPro's PC-relative mode, the
time required to load is 1960 msec (about 2 seconds) less
than it is when the program is assembled in Relocatable
mode. And, although the use of pc-relative addressing in a
program that is assembled in Relocatable mode does not
reduce load time, no less dramatic is its effect on
execution time. Because for even so short a program as
this, it reduces the execution time from 8510 msec to 7485
msec, effecting a 1 second savings.
Why PC-relative Programs Load Faster
As I mentioned in chapter 2, part of the extra load
time (or that which I have chosen to designate as load time,
but which could as easily be called start-up time) required
by programs assembled in the Relocatable mode is due to the
task of program relocation itself. I have since received
information from a reader which further clarifies the
comparison between the two assembly modes. This reader is
well known for his hard disk backup utility, TURTLE.PRG. I
speak, of course, of George Woodside.
I now paraphrase the information that Mr. Woodside was
kind enough to share with me. It seems that when a
Relocatable program is executed the operating system clears
all unoccupied ram before the onset of execution. In
addition, Mr. Woodside went on to explain that PC-relative
LSR programs which are executed from the AUTO folder can't
be deleted because the operating system does not close those
files while they are still ram resident. I have mentioned
this phenomenon elsewhere, but I have not known the reason
until now.
I had planned to write a few programs that would
graphically display the information presented by Mr.
Woodside, but something has occurred which probably renders
such explorations irrelevant from my position. Since
receiving his letter I have been forced to purchase a MEGA 2
because I needed the additional memory. And, while the
differences in loading and execution times are still
noticeable, they are not as dramatic as they were with the
1040ST. Therefore, I simply rest the matter by saying that
there is a difference and that difference is reported as I
experienced it.
The Effects of Floppy Disk Formatting Techniques
The first thing I'm going to do in this section is to
refer you to a couple of START magazine articles written by
Dave Small and Dan Moore: Hard Disk Warfare, Spring, 1987
and Let's Twist Again, Summer, 1988. Those articles will
help you to understand why this section is included in the
book. Briefly, there is more than one way to format floppy
disks for the ST. I have designed two experiments which
permit you to compare the effects of two non-standard
methods to that of the standard ST method.
I refer to the formatted disk you obtain when you
format from the desktop by selecting the Format option under
the File menu as standard ST. I refer to the formatted disk
you obtain by any other method as non-standard. The non-
standard formatting methods used in the experiment are Mega
Twister, included on START magazine's Summer, 1988 disk and
DC Formatter Version 2.2, included on ST Informer's PDM 188
disk.
The most extensive reference to DC Formatter that I
could find occurs in the July/August 1988 issue of RESET
magazine, on page 11, within the Public Beat column. If
this article is not readily available to you, I suggest that
you be not concerned about it. While the author's
enthusiasm for DC Formatter is undoubtedly genuine, and,
although I have found DC Formatter version 2.2 to format
faster than Mega Twister (1 minute 7 seconds versus 1 minute
45 seconds), and, although the data in figures 6.1 and 6.2
seem to confirm that DC Formatter may indeed write to
floppies slightly faster than Mega Twister, the data does
not confirm the author's apparent conclusion that DC
Formatter is tremendously faster than Mega Twister.
A new version of DC Formatter, Version 3.0, is
discussed in the December, 1988 issue of ST Informer.
Before deciding to use this method of formatting disks, I
suggest that you read the information on page 33 of that
issue, under the heading DCFRMT3.ARC. There you will find a
statement concerning the incompatibility of the earlier
version with GDOS. Furthermore, I suggest that you use any
non-standard method of formatting disks with all of the
caution the word non-standard connotes.
Program 25 has been designed to permit a comparison of
the load, write and read times associated with each disk
formatting method. The steps used to isolate the program on
the appropriate disks are discussed within the program's
documentation. Program 25 invokes trap calls to custom trap
#10, which is installed by executing TRAP_10.PRG. The
source listing for TRAP_10.PRG follows the listing for
program 25.
Program 25. This program is used to compare the effects
of three floppy disk formatting techniques.
; Program Name: PRG_2DP.S
; Version 1.003
; Assembly Instructions:
; Assemble in PC-relative mode and save with a TOS extension.
; Execution Note:
; This program invokes custom traps which must be installed by
; TRAPS.PRG and TRAP_10.PRG prior to its execution. Execute this program
; by typing its name on SPEEDTST.TTP's command line.
; But before executing this program, prepare three floppy disks. The
; first should be formatted from the desktop, using the ST formatting
; algorithm. The second should be formatted with a version of DCFORMAT,
; available from ST Informer, 909 NW Starlite Place, Grants Pass, OR 97526,
; (503)476-0071 on disk PDM 188 or PDM 1288. The price of each disk from
; ST Informer is $6.00 for non-subscribers. You can subscribe for $18.00
; per year and get a free disk coupon. I highly recommend a subscription.
; The third disk should be formatted with Dave Small and Dan Moore's
; Twister program available on START magazine's Summer, 1988 issue. You
; can find an order form in any issue of START or phone (800)234-7001.
; Copy PRG_2DP.TOS to each blank disk. Copy SPEEDTST.TTP to each disk.
; Execute SPEEDTST.TTP on each disk in turn, typing PRG_2DP.TOS on its
; command line. Compare the contents of the WRITE_1 and WRITE_2 files of
; each disk to verify that they are identical. Compare the PRG_2DP.DAT file
; on each disk to the others.
; Program Function:
; This program writes data to WRITE_1.DAT, reads WRITE_1.DAT, then
; writes what it has read to WRITE_2.DAT to confirm that it has correctly
; written and read the data declared within the data section of the program.
; SPEEDTST.TTP will report the load and total execution time for this
; program.
; Within the program, the time to write the data to WRITE_1.DAT will be
; calculated and reported; and the time to read the data from WRITE_1.DAT
; will also be calculated and reported; finally, the time to write the data
; to WRITE_2.DAT will be calculated and reported.
; This program is to be used to compare the write to and read from times
; involving three floppy disks, each of which has been formatted with a
; different formatting algorithm.
fetch_load_time:
trap #3 ; Returns value of system clock in D0.
release_excess_memory: ; Also stores after-load time in TRAPS bss.
lea program_end, a0 ; Put "end of program" address in A0.
movea.l 4(a7), a1 ; Put "basepage" address in A1.
trap #6 ; Calculate program size and release memory.
lea stack, a7
print_heading:
lea heading, a0
bsr print_string
fetch_write_start_time:
trap #3
lea write_start_time, a0
move.w d0, (a0)
create_file_1:
move.w #0, -(sp) ; File attribute = read/write.
pea file_1_name ; For WRITE_1.DAT.
move.w #$3C, -(sp) ; Function = f_create = GEMDOS $3C.
trap #1 ; File handle is returned in D0.
addq.l #8, sp
lea file_1_handle, a0 ; Store returned file handle.
move.w d0, (a0)
write_to_file_1:
pea string ; Push address of buffer.
move.l #451, -(sp) ; Number of bytes to write.
move.w d0, -(sp) ; File handle to be written to.
move.w #$40, -(sp) ; GEMDOS function = write.
trap #1
lea $C(sp), sp ; Reposition stack pointer to top of stack.
close_file_1:
move.w file_1_handle, -(sp)
move.w #$3E, -(sp) ; Function = GEMDOS $3E = f_close.
trap #1
addq.l #4, sp
get_end_time:
trap #3
sub.w write_start_time, d0 ; Subtract start time from end time.
ext.l d0 ; Extend to 32 bits
trap #10 ; Convert to milliseconds and print.
set_dta:
pea dta ; dta = address of 44 byte buffer.
move.w #$1A, -(sp) ; GEMDOS function = set dta.
trap #1
addq.l #6, sp
print_read_time_label:
lea read_msg, a0
bsr print_string
fetch_read_start_time:
trap #3
lea read_start_time, a0
move.w d0, (a0)
; NOTE: Reading is so fast, must loop to accumulate enough time to measure.
move.w #99, d3 ; Set up counter for 100 loops.
search_for_file:
move.w #0, -(sp) ; Attribute = normal access.
pea file_1_name ; Name of file to search for.
move.w #$4E, -(sp) ; GEMDOS function = search first.
trap #1
addq.l #8, sp
tst d0
bne.s not_found
read_WRITE_1_DAT:
lea dta, a0
pea buffer
move.l $1A(a0), -(sp) ; Number of bytes to read.
move.w file_1_handle, -(sp) ; File to read.
move.w #$3F, -(sp) ; GEMDOS function = read.
trap #1
lea $C(sp), sp ; Reposition stack pointer.
dbra d3, search_for_file
_get_end_time:
trap #3
sub.w read_start_time, d0 ; Subtract start time from end time.
ext.l d0 ; Extend to 32 bits
trap #10 ; Convert to milliseconds and print.
print_write_time_label:
lea write_msg, a0
bsr print_string
_fetch_write_start_time:
trap #3
lea write_start_time, a0
move.w d0, (a0)
create_file_2:
move.w #0, -(sp) ; File attribute = read/write.
pea file_2_name ; For WRITE_2.DAT.
move.w #$3C, -(sp) ; Function = f_create = GEMDOS $3C.
trap #1 ; File handle is returned in D0.
addq.l #8, sp
lea file_2_handle, a0 ; Store returned file handle.
move.w d0, (a0)
write_to_file_2:
lea dta, a0
pea string ; Push address of buffer.
move.l $1A(a0), -(sp) ; Number of bytes to write.
move.w d0, -(sp) ; File handle to be written to.
move.w #$40, -(sp) ; GEMDOS function = write.
trap #1
lea $C(sp), sp ; Reposition stack pointer to top of stack.
close_file_2:
move.w file_2_handle, -(sp)
move.w #$3E, -(sp) ; Function = GEMDOS $3E = f_close.
trap #1
addq.l #4, sp
_get__end_time:
trap #3
sub.w write_start_time, d0 ; Subtract start time from end time.
ext.l d0 ; Extend to 32 bits
trap #10 ; Convert to milliseconds and print.
not_found:
trap #8 ; Terminate.
print_string:
pea (a0)
move.w #9, -(sp)
trap #1
addq.l #6, sp
rts
data
file_1_name: dc.b 'WRITE_1.DAT',0
file_2_name: dc.b 'WRITE_2.DAT',0
heading: dc.b 'PRG_2DP.TOS Execution Results',$D,$A,$D,$A
dc.b ' Time to create, write and close WRITE_1.DAT: ',0
read_msg dc.b ' Time to read WRITE_1.DAT into buffer 10 times: ',0
write_msg: dc.b ' Time to create, write and close WRITE_2.DAT: ',0
string: dc.b ' This paragraph will be written to a disk file named '
dc.b 'WRITE_1.DAT. The time ',$D,$A
dc.b ' required to write the paragraph will be reported in '
dc.b 'file PRG_2DP.DAT.',$D,$A
dc.b ' Then the contents of WRITE_1.DAT will be read into a '
dc.b 'buffer. The time ',$D,$A
dc.b ' required to read the contents of the file will be '
dc.b 'reported in file ',$D,$A
dc.b ' PRG_2DP.DAT. Finally, the contents of the buffer '
dc.b 'will be written to ',$D,$A
dc.b ' WRITE_2.DAT so that what has been read can be compared '
dc.b 'to what was written.',$D,$A,$1A
; NOTE: The ASCII code for ^Z (control Z) is normally used to mark the end
; of a file so that a program reading the file may look for that mark.
bss
align
file_1_handle: ds.w 1
file_2_handle: ds.w 1
write_start_time: ds.w 1
read_start_time: ds.w 1
dta: ds.b 44
buffer: ds.b 452
ds.l 96
stack: ds.l 0
program_end: ds.l 0
end
Program 26. The custom trap #10 algorithm.
; Program Name: TRAP_10.S
; Version 1.001
; Assembly Instructions:
; Assemble in PC-relative mode and save with a PRG extension.
; Program Function:
; This is a LSR program that establishes a user defined trap. It may be
; executed from the desktop, but you may prefer to copy it to the AUTO
; folder of your boot partition or floppy disk so that it will execute
; automatically during boot.
; Trap #10 is used by programs which time an interval. This trap
; converts the interval passed in register D0 to milliseconds, then it
; prints the ASCII decimal value of that interval in milliseconds.
; This program invokes a custom trap that is established by TRAPS.PRG,
; therefore, that program must be executed before trap #10 is invoked by a
; program.
program_start: ; Calculate program size and retain result.
lea program_end, a3 ; Fetch program end address.
suba.l 4(a7), a3 ; Subtract basepage address.
enter_supervisor_mode:
move.l #0, -(sp) ; The zero turns on supervisor mode.
move.w #$20, -(sp) ; Function = super = GEMDOS $20.
trap #1 ; Go to supervisor mode.
addq.l #6, sp ; Supervisor stack pointer (SSP) returned in D0.
movea.l d0, a5 ; Save SSP in scratch register.
install_trap_10_routine: ; Note: pointer = vector = pointer.
lea $A8, a0 ; Fetch trap #10 pointer address.
lea trap_10_routine, a1 ; Fetch address of trap #10 routine.
move.l a1, (a0) ; Store trap address at pointer address.
enter_user_mode:
pea (a5) ; Restore supervisor stack pointer.
move.w #$20, -(sp) ; Function = super = GEMDOS $20.
trap #1 ; Go to user mode.
addq.l #6, sp ; Reset stack pointer to top of stack.
relinquish_processor_control: ; Maintain memory residency.
move.w #0, -(sp) ; See page 121 of Internals book.
move.l a3, -(sp) ; Program size.
move.w #$31, -(sp) ; Function = ptermres = GEMDOS $31.
trap #1
trap_10_routine:
; Expects a binary value in D0 which represents a measured interval. This
; algorithm converts the value in D0 to milliseconds (msec) then prints the
; value in decimal msec.
preserve_value_in_d3:
move.l d3, -(sp)
convert_time_to_msec:
move.l d0, d3 ; Save copy in D0 to add.
asl.l #2, d3 ; Shift to multiply by 4.
add.l d0, d3 ; To complete multiplication by 5.
print_time:
cmpi.l #999, d3 ; If time is less than 1000, then
bgt no_space ; print a leading blank space for output
lea space, a0 ; alignment.
bsr print_string
cmpi.l #99, d3 ; If time is less than 100, then
bgt no_space ; print another leading blank space.
lea space, a0
bsr print_string
no_space:
move.l d3, d1 ; Move to D1 so can convert to ASCII decimal
trap #4 ; Returns address of decimal string in A0.
bsr.s print_string
lea units_label, a0
bsr.s print_string
move.l (sp)+, d3 ; Restore contents of D3.
rte
;
; Subroutine
;
print_string: ; Expects address of string to be in A0.
pea (a0) ; Push address of string onto stack.
move.w #9, -(sp) ; Function = c_conws = GEMDOS $9.
trap #1 ; GEMDOS call
addq.l #6, sp ; Reset stack pointer to top of stack.
rts
data
space: dc.b " ",0
units_label: dc.b " milliseconds", $D,$A,0
bss
align ; Align storage on a word boundary.
program_end: ds.l 0
end
Figure 6.1 shows the execution results of program 25
with the contents of memory location $444 unaltered (See
page 252 of the Internals book and Dave's Write-With-Verify
Lecture, page 88 of START's Spring, 1987 issue.). As you
will see, it makes sense to perform the experiment twice,
once with the ST standard value in this memory location and
once with the value zero stored there. And since that
subject has come up, this is a convenient spot at which to
introduce a program that I use to configure certain ST
system variables during boot. Refer to section 3.7 The ST
System Variables, pages 250-257 of the Internals book.
Program 27. A program used to configure system variables
during boot. I do not guarantee that this program will work
with all ST systems.
; Program Name: CONFIG.S
; Assembly Instructions:
; Assemble in PC-relative mode and save with a PRG extension. Move
; CONFIG.PRG to the AUTO folder of the boot disk.
; Program Purpose:
; Configures system variables.
mainline:
lea stack, a7 ; Point A7 to this program's stack.
enter_supervisor_mode:
move.l #0, -(sp) ; The zero turns on supervisor mode.
move.w #$20, -(sp) ; Function = super = GEMDOS $20.
trap #1 ; Supervisor stack pointer returned in D0.
addq.l #6, sp
; Algorithm 1:
; Turns off keyclick sound. Refer to page 254 of the Internals book. The
; system variable at address $484 is a byte length variable. The bits of
; this variable have the meanings as indicated in the Internals book. The
; bit of interest is #0. When this bit is a one, the computer emits a
; click each time a key is pressed. When the bit is a zero, these clicks
; are not emitted. A zero is placed in this bit by replacing the content
; of the byte at $484 (which is 7 before the replacement, if key click is
; enabled) with $6.
disable_key_click:
move.b #6, $484 ; Refer to page 254 of the Internals book.
; Algorithm 2:
; Performs the printer installation that is accomplished by CONTROL.ACC.
; The printer configuration table consists of one word, stored at $E4A.
; The bits of this word have the following meanings:
; BIT MEANING IF ZERO MEANING IF ONE
; --- --------------- --------------
; 0 Dot matrix Daisy printer
; 1 Black/white Color printer
; 2 1280 dots/line 960 dots/line
; 3 Draft mode Final mode
; 4 Printer port Modem port
; 5 Continuous feed Single sheet
; Bits 6 through 15 are not used.
install_printer:
move.w #$4, $E4A
; Algorithm 3:
; Turns off disk write verify.
disable_write_verify:
move.w #0, $444 ; Refer to page 252 of the Internals book.
return_to_user_mode:
move.l d0, -(sp) ; Restore "before call" SSP.
move.w #$20, -(sp) ; Function = super = GEMDOS $20.
trap #1
addq.l #6, sp
terminate:
move.w #0, -(sp) ; Function = p_term_old = GEMDOS $0.
trap #1 ; GEMDOS call.
ds.l 24 ; Stack.
stack: ds.l 1 ; Address of stack.
program_end: ds.l 0
end
Figure 6.1. Program 25's execution results with write-verify
active.
PRG_2DP.TOS Execution Results: Hard disk drive.
Time to create, write and close WRITE_1.DAT: 160 milliseconds
Time to read WRITE_1.DAT into buffer: 520 milliseconds
Time to create, write and close WRITE_2.DAT: 115 milliseconds
SPEEDTST.TTP Execution Results
for PRG_2DP.TOS, loaded from drive: G
Load time: 55 milliseconds
Execution time: 1025 milliseconds
PRG_2DP.TOS Execution Results: ST Standard Formatted.
Time to create, write and close WRITE_1.DAT: 3185 milliseconds
Time to read WRITE_1.DAT into buffer 10 times: 540 milliseconds
Time to create, write and close WRITE_2.DAT: 2650 milliseconds
SPEEDTST.TTP Execution Results
for PRG_2DP.TOS, loaded from drive: A
Load time: 385 milliseconds
Execution time: 7840 milliseconds
PRG_2DP.TOS Execution Results: Twister Formatted, Version 2.0.
Time to create, write and close WRITE_1.DAT: 3195 milliseconds
Time to read WRITE_1.DAT into buffer 10 times: 540 milliseconds
Time to create, write and close WRITE_2.DAT: 2260 milliseconds
SPEEDTST.TTP Execution Results
for PRG_2DP.TOS, loaded from drive: A
Load time: 365 milliseconds
Execution time: 7300 milliseconds
PRG_2DP.TOS Execution Results: DC Formatter, Version 2.2.
Time to create, write and close WRITE_1.DAT: 3185 milliseconds
Time to read WRITE_1.DAT into buffer 10 times: 540 milliseconds
Time to create, write and close WRITE_2.DAT: 2255 milliseconds
SPEEDTST.TTP Execution Results
for PRG_2DP.TOS, loaded from drive: A
Load time: 365 milliseconds
Execution time: 7275 milliseconds
Figure 6.2 shows the execution results of program 25
with the contents of memory location $444 equal to zero,
thereby disabling the write-verify function. Notice that
the state of the write-verify function does not seem to
effect disk access times when the program is executed from a
hard disk partition. The hard disk times of figures 6.1 and
6.2 are identical, considering the 5 msec resolution of the
system clock.
Figure 6.2. Program 25's execution results with write-
verify disabled.
PRG_2DP.TOS Execution Results: Hard disk drive.
Time to create, write and close WRITE_1.DAT: 160 milliseconds
Time to read WRITE_1.DAT into buffer 10 times: 520 milliseconds
Time to create, write and close WRITE_2.DAT: 110 milliseconds
SPEEDTST.TTP Execution Results
for PRG_2DP.TOS, loaded from drive: G
Load time: 60 milliseconds
Execution time: 1020 milliseconds
PRG_2DP.TOS Execution Results: ST Standard Formatted.
Time to create, write and close WRITE_1.DAT: 1590 milliseconds
Time to read WRITE_1.DAT into buffer 10 times: 540 milliseconds
Time to create, write and close WRITE_2.DAT: 1450 milliseconds
SPEEDTST.TTP Execution Results
for PRG_2DP.TOS, loaded from drive: A
Load time: 390 milliseconds
Execution time: 4640 milliseconds
PRG_2DP.TOS Execution Results: Twister Formatted, Version 2.0.
Time to create, write and close WRITE_1.DAT: 1595 milliseconds
Time to read WRITE_1.DAT into buffer 10 times: 540 milliseconds
Time to create, write and close WRITE_2.DAT: 1060 milliseconds
SPEEDTST.TTP Execution Results
for PRG_2DP.TOS, loaded from drive: B
Load time: 365 milliseconds
Execution time: 4100 milliseconds
PRG_2DP.TOS Execution Results: DC Formatter, Version 2.2.
Time to create, write and close WRITE_1.DAT: 1590 milliseconds
Time to read WRITE_1.DAT into buffer 10 times: 540 milliseconds
Time to create, write and close WRITE_2.DAT: 1050 milliseconds
SPEEDTST.TTP Execution Results
for PRG_2DP.TOS, loaded from drive: A
Load time: 370 milliseconds
Execution time: 4080 milliseconds
Unfinished Business: Pushing Zero Onto the Stack
In chapter 2, I discussed the execution speed and
memory requirements of three ways to push a zero onto the
stack. At that time, I used table 2.1 to list data that was
observed and prepared manually. I am now going to introduce
a program that prints comparative data for the three methods
previously discussed. Because the time required to execute
individual instructions is too short for accurate
measurement, we rely on multiple executions of each
instruction to lengthen the period of time that we are
measuring. In addition, we are not as interested in
absolute execution speed as we are in relative speed. We
just want to know which instructions are faster, not how
long it takes to execute one of them.
Program 28 will generate values that we can compare
objectively. In addition, the program will calculate the
memory requirements for each of the three instructions in
which we are interested. As a finale, the advantage of
using the third method, when it is appropriate to do so, is
illustrated by removal of the statement which needs be
executed only once from the execution loop. That is the
statement which prestores a zero in register D0.
Program 28. Three ways to push zero onto the stack.
; Program Name: PUSHZERO.S
; Version: 1.003
; Assembly Instructions:
; Assemble in AssemPro PC-relative mode and save with a TOS extension.
; Execution Instructions:
; Execute from the desktop; or execute SPAWN.TTP, type PUSHZERO.TOS on
; its command line and view this program's output in PUSHZERO.DAT.
; Program Function:
; For each of the three methods of pushing $0 onto the stack discussed in
; chapter two, calculates the memory occupied by each instruction, and
; calculates the execution time in milliseconds required for 50,000
; executions of each.
; Then, in order to emphasize the comparison for a more practical type of
; application, the one statement in the third algorithm that needs be
; executed only once, before the execution of the loop, is placed outside
; of the loop, and the time for 50,000 executions of the third method is
; repeated.
calculate_program_size:
lea -$102(pc), a1 ; Fetch basepage start address.
lea program_end, a0 ; Fetch program end address.
adda.l #100512, a0 ; Attach large stack space to end of program.
; NOTE: The above method of declaring stack space at the end of this program is
; preferable to that which I have been using because, if a label were
; used to declare the stack in the bss section, then the label used
; to mark the end of the program would be too far to permit the program
; to be assembled in AssemPro's PC-relative mode.
movea.l a0, a7 ; Point A7 to this program's stack.
trap #6 ; Return unused memory to op system.
print_heading:
lea heading, a0
bsr print_string
push_method_1:
lea header_1, a0
bsr print_string
move.l #49999, d7 ; D7 is counter for the push loop.
trap #3 ; Fetch start time.
move.l d0, d3 ; Save start_time in D3.
push_1_loop: ; Marks start of instruction in the loop.
clr.w -(sp) ; Instruction in the loop.
memory_1: ; Marks end of instruction in the loop.
dbra d7, push_1_loop ; Loop 50000 times.
trap #3 ; Fetch end time.
sub.l d3, d0 ; Subtract start time from end time.
trap #10 ; Convert to decimal milliseconds and print.
adda.l #100000, sp ; Reset stack pointer to top of stack.
print_method_1_requisite_memory:
lea header_5, a0 ; Print requisite memory header.
bsr print_string
lea memory_1, a0 ; Calculate number of bytes occupied by the
lea push_1_loop, a1 ; instruction in the loop.
suba.l a1, a0
bsr print_requisite_memory
push_method_2:
lea header_2, a0
bsr print_string
move.l #49999, d7 ; D7 is counter for the push loop.
trap #3 ; Fetch start time.
move.l d0, d3 ; Save start_time in D3.
push_2_loop: ; Marks start of instruction in the loop.
move.w #0, -(sp) ; Instruction in the loop.
memory_2: ; Marks end of instruction in the loop.
dbra d7, push_2_loop ; Loop 50000 times.
trap #3 ; Fetch end time.
sub.l d3, d0 ; Subtract start time from end time.
trap #10 ; Convert to decimal milliseconds and print.
adda.l #100000, sp ; Reset stack pointer to top of stack.
print_method_2_requisite_memory:
lea header_6, a0 ; Print requisite memory header.
bsr.s print_string
lea memory_2, a0 ; Calculate number of bytes occupied by the
lea push_2_loop, a1 ; instruction in the loop.
suba.l a1, a0
bsr.s print_requisite_memory
push_method_3:
lea header_3, a0
bsr.s print_string
move.l #49999, d7 ; D7 is counter for the push loop.
trap #3 ; Fetch start time.
move.l d0, d3 ; Save start_time in D3.
push_3_loop: ; Marks start of instructions in the loop.
moveq #0, d0 ; There are two instructions in the loop.
move.w d0, -(sp) ;
memory_3: ; Marks end of instructions in the loop.
dbra d7, push_3_loop ; Loop 50000 times.
trap #3 ; Fetch end time.
sub.l d3, d0 ; Subtract start time from end time.
trap #10 ; Convert to decimal milliseconds and print.
adda.l #100000, sp ; Reset stack pointer to top of stack.
print_method_3_requisite_memory:
lea header_7, a0 ; Print requisite memory header.
bsr.s print_string
lea memory_3, a0 ; Calculate number of bytes occupied by the
lea push_3_loop, a1 ; instructions in the loop.
suba.l a1, a0
bsr.s print_requisite_memory
modified_push_method_3:
lea header_4, a0
bsr.s print_string
move.l #49999, d7 ; D7 is counter for the push loop.
trap #3 ; Fetch start time.
move.l d0, d3 ; Save start_time in D3.
moveq #0, d0 ; Prestore $0 in D0.
push_4_loop:
move.w d0, -(sp) ; Contents of D0 onto the stack.
dbra d7, push_4_loop ; Loop 50000 times.
trap #3 ; Fetch end time.
sub.l d3, d0 ; Subtract start time from end time.
trap #10 ; Convert to decimal milliseconds and print.
adda.l #100000, sp ; Reset stack pointer to top of stack.
terminate:
trap #8
;
; SUBROUTINES
;
print_requisite_memory:
move.l a0, d1
trap #4
bsr.s print_string
lea header_8, a0
bsr.s print_string
rts
print_string: ; Expects address of string to be in A0.
move.l a0, -(sp) ; Push address of string onto stack.
move.w #9, -(sp) ; Function = c_conws = GEMDOS $9.
trap #1 ; GEMDOS call
addq.l #6, sp ; Reset stack pointer to top of stack.
rts
data
heading: dc.b $D,$A,"PUSHZERO Execution Results",$D,$A,$D,$A,0
header_1: dc.b " Elapsed time for clr.w -(sp): ",0
header_2: dc.b " Elapsed time for move.w #0, -(sp): ",0
header_3: dc.b " Elapsed time for moveq #0, d0",$D,$A
dc.b " move.w d0, -(sp): ",0
header_4: dc.b " Time for only move.w d0, -(sp): ",0
header_5: dc.b " Requisite memory: ",0
header_6: dc.b " Requisite memory: ",0
header_7: dc.b " Requisite memory: ",0
header_8: dc.b " bytes",$D,$A,$D,$A,0
bss
align
program_end: ds.l 0
end
PUSHZERO Execution Results
Elapsed time for clr.w -(sp): 180 milliseconds
Requisite memory: 2 bytes
Elapsed time for move.w #0, -(sp): 155 milliseconds
Requisite memory: 4 bytes
Elapsed time for moveq #0, d0
move.w d0, -(sp): 155 milliseconds
Requisite memory: 4 bytes
Time for only move.w d0, -(sp): 125 milliseconds
Accuracy of the Results
It is true that we are interested in relative results,
as far as instruction execution time is concerned, but it is
worth remembering that when we view the results of a
PUSHZERO execution, the elapsed times are calculated to a
resolution of approximately 10 msec (milliseconds); 5 msec
for the first call to get_time and 5 for the second call.
How does this affect the accuracy of the program's
report? Well, there are four cases to consider, each
composed of two incidents, with each incident depending on
whether we receive the time just before or just after the
variable _hz_200 has been incremented.
1. Both the first and second times are received just
after increment. In this case true time equals
time, and loss of accuracy is due to overhead in
making the calls and the inherent accuracy of the
system clock.
2. The first time is received immediately after
increment, the second is received immediately
before. Then, true time equals time plus 5 msec
plus overhead.
3. The first time is received immediately before
increment, the second is received immediately after.
Then, true time equals time minus 5 msec plus
overhead.
4. Both the first time and second times are received
immediately before increment. In this case true
time equals time plus overhead.
If you execute PUSHZERO repeatedly, you should notice
some differences in reported times. For each of the timed
events in the program, I have never seen a variance greater
than 5 msec, however, the variance can be less than or
greater than the most often reported value. We should
expect this to happen because the resolution of the elapsed
time calculations is 10 msec.
To compare the reported times with the instruction
execution times given in the Motorola's reference book, we
need only consider the instructions that form the loop. For
push_method_1, the instructions are:
clr.w -(sp)
dbra d7, push_1_loop
The clr.w instruction requires 14 clock periods, the dbra
instruction requires 10. Total clock periods for the loop
is 24 X 50,000 = 1.2X106. Total time required to execute
the loop is 1.2X106 X 1.25X10-7(time for one clock period) =
150 msec. The program reports the time most often as 180
msec. The difference is due to inaccuracies in the system
clock, the imprecise resolution and the time needed to
execute non-related instructions.
Increasing Loop Execution Speed
My primary use of program 28 was to introduce a method
of determining the relative execution speed and memory
requirements of two or more algorithms so that these
attributes could be compared. As an adjunct to the three
primary algorithms, I took the opportunity to introduce the
modified_push_method_3 algorithm. With this algorithm, I
intended to show you an example for which the third method
of pushing a zero onto the stack was definitely superior
because it clearly decreased loop execution time by
decreasing the number of instructions within the loop.
The next program, CLR_MEM, extends the idea of
increasing loop execution speed, not by decreasing the
instructions within the loop, but by increasing them. We
begin the example by considering the task of clearing 32000
bytes of memory, the size of a video screen, to zero.
Having been conditioned to consider quantities of memory in
terms of bytes, the thought of clearing it a byte at a time
should occur naturally. Program 29 also illustrates the
advantages to be gained by thinking of memory in terms of
words and longwords.
Program 29. A program that clears 32,000 bytes of ram.
; Program Name: CLR_MEM.S
; Version: 1.004
; Assembly Instructions:
; Assemble in PC-relative mode save with a TOS extension.
; Execution Instructions:
; Execute from the desktop; or execute SPAWN.TTP, type CLR_MEM.TOS on
; its command line and view this program's output in CLR_MEM.DAT.
; Program Function:
; Expands the concepts established with program PUSHZERO. Here,
; simulating the type of algorithm that would be used to clear video screen
; memory, the execution time when clearing memory a longword each time,
; within a loop, is compared to that of the more obvious method of clearing
; a byte, or perhaps a word, each time through the loop.
; Then, the speed advantage of clearing a longword by storing the
; content of a cleared register is illustrated.
; Finally, the advantage of clearing more than one longword within the
; body of the loop is explored, with an eye out for the maximum amount of
; beneficial loop expansion.
calculate_program_size:
lea -$102(pc), a1 ; Fetch basepage start address.
lea array, a0 ; Fetch program end = array address.
adda.l #32000, a0 ; Add in array space.
movea.l a0, a7 ; Point A7 to this program's stack.
trap #6 ; Return unused memory to op system.
print_heading:
lea heading, a0
bsr print_string
clear_one_byte_algorithm:
lea header_1, a0
bsr print_string
lea array, a0 ; A0 is pointer to 32000 byte array.
move.l #31999, d7 ; D7 is counter for the clear loop.
trap #3 ; Fetch start time.
move.l d0, d3 ; Save start_time in D3.
clear_a_byte:
move.b #0, (a0)+
dbra d7, clear_a_byte ; Loop 32000 times.
trap #3 ; Fetch end time.
sub.l d3, d0 ; Subtract start time from end time.
trap #10 ; Convert to decimal milliseconds and print.
clear_one_word_algorithm:
lea header_2, a0
bsr print_string
lea array, a0 ; A0 is pointer to 32000 byte array.
move.l #15999, d7 ; D7 is counter for the clear loop.
trap #3 ; Fetch start time.
move.l d0, d3 ; Save start_time in D3.
clear_a_word:
move.w #0, (a0)+
dbra d7, clear_a_word ; Loop 16000 times.
trap #3 ; Fetch end time.
sub.l d3, d0 ; Subtract start time from end time.
trap #10 ; Convert to decimal milliseconds and print.
clear_one_longword_algorithm:
lea header_3, a0
bsr print_string
lea array, a0
move.l #7999, d7
trap #3 ; Fetch start time.
move.l d0, d3 ; Save start_time in D3.
clear_a_longword:
move.l #0, (a0)+
dbra d7, clear_a_longword; Loop 8000 times.
trap #3 ; Fetch end time.
sub.l d3, d0 ; Subtract start time from end time.
trap #10 ; Convert to decimal milliseconds and print.
clear_one_longword_with_precleared_register:
lea header_4, a0
bsr print_string
lea array, a0
moveq #0, d1 ; Preclear D1 for use in the loop.
move.l #7999, d7
trap #3 ; Fetch start time.
move.l d0, d3 ; Save start_time in D3.
clear_with_register:
move.l d1, (a0)+ ; Clear a longword.
dbra d7, clear_with_register
trap #3 ; Fetch end time.
sub.l d3, d0 ; Subtract start time from end time.
trap #10 ; Convert to decimal milliseconds and print.
clear_4_longwords_algorithm:
lea header_5, a0
bsr.s print_string
lea array, a0
move.l #0, d1 ; Preclear D1 for use in the loop.
move.l #1999, d7
trap #3 ; Fetch start time.
move.l d0, d3 ; Save start_time in D3.
clr_4_longwords:
move.l d1, (a0)+ ; A single move statement clears 4 bytes.
move.l d1, (a0)+ ; Reduces number of loops to 4000.
move.l d1, (a0)+ ; Reduces number of loops to 2666+.
move.l d1, (a0)+ ; Reduces number of loops to 2000.
dbra d7, clr_4_longwords ; Loop 2000 times, clear 16 bytes each time.
trap #3 ; Fetch end time.
sub.l d3, d0 ; Subtract start time from end time.
trap #10 ; Convert to decimal milliseconds and print.
clear_8_longwords_algorithm: ; Will a further increase in move instructions
lea header_6, a0 ; within the loop decrease execution time?
bsr.s print_string
lea array, a0
move.l #0, d1 ; Preclear D1 for use in the loop.
move.l #999, d7
trap #3 ; Fetch start time.
move.l d0, d3 ; Save start_time in D3.
clr_8_longwords:
move.l d1, (a0)+ ; A single move statement clears 4 bytes.
move.l d1, (a0)+ ; Reduces number of loops to 4000.
move.l d1, (a0)+ ; Reduces number of loops to 2666+.
move.l d1, (a0)+ ; Reduces number of loops to 2000.
move.l d1, (a0)+ ; Reduces number of loops to 1600.
move.l d1, (a0)+ ; Reduces number of loops to 1333+.
move.l d1, (a0)+ ; Reduces number of loops to 1142+.
move.l d1, (a0)+ ; Reduces number of loops to 1000.
dbra d7, clr_8_longwords ; Loop 1000 times, clear 32 bytes each time.
trap #3 ; Fetch end time.
sub.l d3, d0 ; Subtract start time from end time.
trap #10 ; Convert to decimal milliseconds and print.
terminate:
trap #8
;
; SUBROUTINES
;
print_string: ; Expects address of string to be in A0.
move.l a0, -(sp) ; Push address of string onto stack.
move.w #9, -(sp) ; Function = c_conws = GEMDOS $9.
trap #1 ; GEMDOS call
addq.l #6, sp ; Reset stack pointer to top of stack.
rts
data
heading: dc.b 'CLR_MEM Execution Results',$D,$A,0
header_1: dc.b ' Clear one byte time: ',0
header_2: dc.b ' Clear one word time: ',0
header_3: dc.b ' Clear one longword time: ',0
header_4: dc.b ' With register time: ',0
header_5: dc.b ' Clear 4 longwords time: ',0
header_6: dc.b ' Clear 8 longwords time: ',0
bss
align
start_time: ds.l 1
ds.l 96 ; Stack.
stack: ds.l 1 ; Address of stack.
array: ds.l 0
end
CLR_MEM Execution Results
Clear one byte time: 100 milliseconds
Clear one word time: 45 milliseconds
Clear one longword time: 35 milliseconds
With register time: 25 milliseconds
Clear 4 longwords time: 15 milliseconds
Clear 8 longwords time: 10 milliseconds
If you assemble the program and repeatedly execute it,
you will occasionally obtain 95 msec for the one byte time,
50 msec for the one word time, 15 msec for the 8 longwords
time and etc. You know that variances are possible because
of your experience with program 28, so don't let the
variances bother you. The important things to realize are
these:
1. Clearing the 32000 bytes a word at a time is just
twice as fast as clearing it a byte at a time. This
is so only because we were able to reduce the number
of times through the loop by half. The amount of
time to clear one word of memory is exactly equal to
the time it takes to clear one byte.
2. The time it took to clear the 32000 bytes a longword
at a time is not one/fourth the time it took to
clear a byte, even though the number of loops was
reduced by four, because it takes more time to clear
a longword of memory than it does to clear a byte or
a word.
3. Clearing a register to zero before we enter the
loop, then storing the content of the register is
faster than using a move.l #0 instruction within the
loop.
4. Storing more than one longword within the loop
reduced the loop execution time significantly.
However, the times reported for the
clear_four_longwords and clear_eight_longwords
algorithms are too close to the resolution of the
time recording functions; therefore, the
significance of the reported values is questionable.
Even though the results obtained for the longword
algorithms mentioned in item four are not entirely reliable,
they seem to suggest that, as the number of instructions
within the loop is increased, we will begin to notice a
saturation effect; that is, a reduction in the number of
loops will affect the overall loop execution time to a
lesser degree because of the amount of time spent within the
loop.
After I had seen the results of CLR_MEM, I could have
rewritten the program so that it would produce more
palatable results, however, if I had done that, I would have
denied you the experience of viewing a program which
produces valid results for only a portion of its execution.
You would not know that I had not been completely successful
the first time.
In doing something to validate the results for the last
two algorithms of program 29, one would be easily tempted to
simply increase the number of times through each loop in the
program by an identical factor, say 10, perhaps. In fact,
it is the first method of improvement that occurred to me.
When it didn't work, I was forced to examine the details of
the DBRA instruction, something I had not done in a while.
If you refer back to program 29, you can see that, if
we increase the size of memory to be cleared by a factor of
10, we must clear 320,000 bytes in the clear_one_byte
algorithm. To do that, we must loop 320,000 times,
therefore, the value 319,999 must be prestored in the loop
counter. What is wrong with this picture? The DBRA
instruction (as does all of the DBcc instructions) permits
its counter parameter to contain only a 16-bit value.
The maximum count we can fit in 16 bits is 65535. So
we see why we can't set the counter to 320,000 and expect
valid results. Similarly, we know that a value of 159,999
for the clear_one_word algorithm will not work, nor will a
value of 79,999 work for the clear_one_longword algorithm.
In order to rearrange CLR_MEM so that the results for each
algorithm would be comparable, we would have to resort to a
minor loop inside of a major loop. Preparing CLR_MEM2, by
extracting the necessary portions from PRGM_1F and
increasing the appropriate values, is a more elegant
solution. And, while we are doing that, we may as well
include an algorithm that will let us obtain another
saturation effect observation point.
Program 30. Measures the time to clear memory with greater
accuracy.
; Program Name: CLR_MEM2.S
; Version: 1.002
; Assembly Instructions:
; Assemble in PC-relative mode and save with a TOS extension.
; Execution Instructions:
; Execute SPAWN.TTP and type CLR_MEM2.TOS on its command line. View the
; output of this program in CLR_MEM2.DAT.
; Program Function:
; The four_longwords_time and the eight_longwords_time reported by program
; CLR_MEM are too short to be reliable. Both values are too close to
; the resolution of the time recording functions.
; In this program, the significance of the reported execution times is
; increased by increasing the number of loops for those two algorithms
; by a factor of 10. This will permit a more valid comparison between the
; two algorithms.
; In addition, since I am going through the trouble of preparing this
; special program, I include one more algorithm. This one to investigate
; the intermediate five_longwords_time.
; Why go through this trouble? Well, as the number of instructions within
; the loop increases, we should observe a saturation effect in execution
; speed reduction, even though the number of loops are decreased, simply
; because more time is spent inside the loop. This additional point of
; observation will help to confirm or dismiss that theory.
; Now, because we are increasing the number of loops by 10 in this program,
; if we want to compare the results we obtain with those obtained from
; program CLR_MEM, we will have to divide CLR_MEM2's results by 10.
calculate_program_size:
lea -$102(pc), a1 ; Fetch basepage start address.
lea program_end, a0 ; Fetch program end address.
adda.l #320000, a0 ; Add in array space.
movea.l a0, a7 ; Point A7 to this program's stack.
trap #6 ; Return unused memory to op system.
print_heading:
lea heading, a0
bsr print_string
clear_4_longwords_algorithm:
lea header_1, a0
bsr print_string
lea array, a0
move.l #0, d1 ; Preclear D1 for use in the loop.
move.l #19999, d7
trap #3 ; Fetch start time.
move.l d0, d3 ; Save start_time in D3.
clr_4_longwords:
move.l d1, (a0)+ ; A single move statement clears 4 bytes.
move.l d1, (a0)+ ; Reduces number of loops to 40000.
move.l d1, (a0)+ ; Reduces number of loops to 26666+.
move.l d1, (a0)+ ; Reduces number of loops to 20000.
dbra d7, clr_4_longwords ; Loop 20000 times, clear 16 bytes each time.
trap #3 ; Fetch end time.
sub.l d3, d0 ; Subtract start time from end time.
trap #10 ; Convert to decimal milliseconds and print.
clear_5_longwords_algorithm:
lea header_2, a0
bsr.s print_string
lea array, a0
move.l #0, d1 ; Preclear D1 for use in the loop.
move.l #15999, d7
trap #3 ; Fetch start time.
move.l d0, d3 ; Save start_time in D3.
clr_5_longwords:
move.l d1, (a0)+ ; A single move statement clears 4 bytes.
move.l d1, (a0)+ ; Reduces number of loops to 40000.
move.l d1, (a0)+ ; Reduces number of loops to 26666+.
move.l d1, (a0)+ ; Reduces number of loops to 20000.
move.l d1, (a0)+ ; Reduces number of loops to 16000.
dbra d7, clr_5_longwords ; Loop 16000 times, clear 16 bytes each time.
trap #3 ; Fetch end time.
sub.l d3, d0 ; Subtract start time from end time.
trap #10 ; Convert to decimal milliseconds and print.
clear_8_longwords_algorithm:
lea header_3, a0
bsr.s print_string
lea array, a0
move.l #0, d1 ; Preclear D1 for use in the loop.
move.l #9999, d7
trap #3 ; Fetch start time.
move.l d0, d3 ; Save start_time in D3.
clr_8_longwords:
move.l d1, (a0)+ ; A single move statement clears 4 bytes.
move.l d1, (a0)+ ; Reduces number of loops to 40000.
move.l d1, (a0)+ ; Reduces number of loops to 26666+.
move.l d1, (a0)+ ; Reduces number of loops to 20000.
move.l d1, (a0)+ ; Reduces number of loops to 16000.
move.l d1, (a0)+ ; Reduces number of loops to 13333+.
move.l d1, (a0)+ ; Reduces number of loops to 11428+.
move.l d1, (a0)+ ; Reduces number of loops to 10000.
dbra d7, clr_8_longwords ; Loop 10000 times, clear 32 bytes each time.
trap #3 ; Fetch end time.
sub.l d3, d0 ; Subtract start time from end time.
trap #10 ; Convert to decimal milliseconds and print.
terminate:
trap #8
;
; SUBROUTINES
;
print_string: ; Expects address of string to be in A0.
move.l A0, -(sp) ; Push address of string onto stack.
move.w #9, -(sp) ; Function = c_conws = GEMDOS $9.
trap #1 ; GEMDOS call
addq.l #6, sp ; Reset stack pointer to top of stack.
rts
data
heading: dc.b 'CLR_MEM2 Execution Results',$D,$A,0
header_1: dc.b ' Clear 4 longwords time: ',0
header_2: dc.b ' Clear 5 longwords time: ',0
header_3: dc.b ' Clear 8 longwords time: ',0
bss
align ; Align storage on a word boundary.
ds.l 24 ; Stack.
stack: ds.l 0 ; Address of stack.
program_end: ds.l 0 ; Marks the end of program memory.
array: ds.l 0
end
CLR_MEM2 Execution Results
Clear 4 longwords time: 155 milliseconds
Clear 5 longwords time: 145 milliseconds
Clear 8 longwords time: 140 milliseconds
Program 30's results indicate that those of program 29
were actually valid enough. For if we divide the values
reported by CLR_MEM2 by 10, we obtain 15.5, 15.0 and 14.0.
In addition, this is the kind of data grouping we observe at
a saturation level. Program 30's report gives us the
confidence we need to utilize the clear_4_longwords
algorithm to clear the video screen, knowing that we are
prudently compromising between execution speed and memory
requirement.
Unfinished Business: The Morton Conclusion
As promised at the end of chapter 3, I will resume my
discussion of Mr. Morton's conclusion concerning
multiplication. Program 31 records the relative speeds of
the two methods of multiplying a word operand by four;
however, I have added a little spice because, not only are
the execution speeds for the two methods identical, but the
shift method is the faster when multiplying a longword
operand by four. Furthermore, the shift method requires
less memory. I should also state that the adding dn to
itself twice is faster than shifting left twice theory is
supported in the Kelly-Bootle book on page 202.
Program 31. The refutation.
; Program Name: TIMES_4.S
; Version: 1.005
; Assembly Instructions:
; Assemble in PC-relative mode and save with a TOS extension.
; Execution Instructions:
; Execute from the desktop; or execute SPAWN.TTP, type TIMES_4.TOS on
; its command line and view this program's output in TIMES_4.DAT.
; Program Function:
; Compares the speed of multiplying by 4 with add dn,dn to asl #2,dn
; to determine which is faster. This experiment attempts to prove or
; refute the conclusion advanced by Mike Morton in his magazine article,
; "68000 Tricks and Traps", Byte, September, 1986, p163. As it turns out,
; when multiplying a word operand by 4, as Mr. Morton specifies in his
; article, the execution times are identical. But when multiplying a
; longword by 4, the shift algorithm is faster. Furthermore, the shift
; algorithm requires less memory.
; Author Stan Kelly-Bootle also asserts that the add.w dn,dn algorithm
; is a faster method of multiplying by 4 than is asl #2,dn at the top of
; page 202 of his book "680x0 Programming by Example".
calculate_program_size:
lea -$102(pc), a1 ; Fetch basepage start address.
lea program_end, a0 ; Fetch program end address.
trap #6 ; Return unused memory to op system.
print_heading:
lea heading, a0
bsr print_string
word_addition:
lea header_1, a0
bsr print_string
move.l #49999, d7 ; D7 is counter for the loop.
trap #3 ; Fetch start time.
move.l d0, d3 ; Save start_time in D3.
word_addition_loop: ; Marks start of instruction in the loop.
add.w d0, d0 ; To multiply by two.
add.w d0, d0 ; To multiply by four.
memory_1: ; Marks end of instruction in the loop.
dbra d7, word_addition_loop
trap #3 ; Fetch end time.
sub.l d3, d0 ; Subtract start time from end time.
trap #10 ; Convert to decimal milliseconds and print.
print_word_addition_requisite_memory:
lea header_5, a0
bsr print_string
lea memory_1, a0 ; Calculate number of bytes occupied by
lea word_addition_loop, a1 ; the instructions in the loop.
suba.l a1, a0
bsr print_requisite_memory
word_shift:
lea header_2, a0
bsr print_string
move.l #49999, d7 ; D7 is counter for the loop.
trap #3 ; Fetch start time.
move.l d0, d3 ; Save start_time in D3.
word_shift_loop: ; Marks start of instruction in the loop.
asl.w #2, d0 ; Shift to multiply by 4.
memory_2: ; Marks end of instruction in the loop.
dbra d7, word_shift_loop ; Loop 50000 times.
trap #3 ; Fetch end time.
sub.l d3, d0 ; Subtract start time from end time.
trap #10 ; Convert to decimal milliseconds and print.
print_word_shift_requisite_memory:
lea header_6, a0
bsr.s print_string
lea memory_2, a0 ; Calculate number of bytes occupied by the
lea word_shift_loop, a1 ; instruction in the loop, then store.
suba.l a1, a0
bsr.s print_requisite_memory
longword_addition:
lea header_3, a0
bsr.s print_string
move.l #49999, d7 ; D7 is counter for the loop.
trap #3 ; Fetch start time.
move.l d0, d3 ; Save start_time in D3.
addition_loop: ; Marks start of instructions in the loop.
add.l d0, d0 ; To multiply by two.
add.l d0, d0 ; To multiply by four.
memory_3: ; Marks end of instruction in the loop.
dbra d7, addition_loop ; Loop 50000 times.
trap #3 ; Fetch end time.
sub.l d3, d0 ; Subtract start time from end time.
trap #10 ; Convert to decimal milliseconds and print.
print_longword_addition_requisite_memory:
lea header_7, a0
bsr.s print_string
lea memory_3, a0 ; Calculate number of bytes occupied by the
lea addition_loop, a1 ; instruction in the loop, then store.
suba.l a1, a0
bsr.s print_requisite_memory
longword_shift:
lea header_4, a0
bsr.s print_string
move.l #49999, d7 ; D7 is counter for the loop.
trap #3 ; Fetch start time.
move.l d0, d3 ; Save start_time in D3.
trap #3 ; Value of system clock returned in D0.
move.l d0, d1 ; Save in trap call register.
shift_loop: ; Marks start of instruction in the loop.
asl.l #2, d0 ; Shift to multiply by 4.
memory_4: ; Marks end of instruction in the loop.
dbra d7, shift_loop ; Loop 50000 times.
trap #3 ; Fetch end time.
sub.l d3, d0 ; Subtract start time from end time.
trap #10 ; Convert to decimal milliseconds and print.
print_longword_shift_requisite_memory:
lea header_8, a0
bsr.s print_string
lea memory_4, a0 ; Calculate number of bytes occupied by the
lea shift_loop, a1 ; instruction in the loop, then store.
suba.l a1, a0
bsr.s print_requisite_memory
terminate:
trap #8
;
; SUBROUTINES
;
print_requisite_memory:
move.l a0, d1
trap #4
bsr.s print_string
lea header_9, a0
bsr.s print_string
rts
print_string: ; Expects address of string to be in A0.
move.l a0, -(sp) ; Push address of string onto stack.
move.w #9, -(sp) ; Function = c_conws = GEMDOS $9.
trap #1 ; GEMDOS call
addq.l #6, sp ; Reset stack pointer to top of stack.
rts
data
heading: dc.b "TIMES_4 Execution Results",$D,$A,$D,$A,0
header_1: dc.b " Word addition time: ",0
header_2: dc.b $D,$A," Word shift time: ",0
header_3: dc.b $D,$A," Longword addition time: ",0
header_4: dc.b $D,$A," Longword shift time: ",0
header_5: dc.b " Word addition requisite memory: ",0
header_6: dc.b " Word shift requisite memory: ",0
header_7: dc.b " Longword add requisite memory: ",0
header_8: dc.b " Longword shift requisite memory: ",0
header_9: dc.b " bytes",$D,$A,0
bss
align
program_end: ds.l 0
end
TIMES_4 Execution Results
Word addition time: 130 milliseconds
Word addition requisite memory: 4 bytes
Word shift time: 125 milliseconds
Word shift requisite memory: 2 bytes
Longword addition time: 180 milliseconds
Longword add requisite memory: 4 bytes
Longword shift time: 155 milliseconds
Longword shift requisite memory: 2 bytes
SPEEDTST.TTP Execution Results
for TIMES_4.TOS, loaded from drive: G
Load time: 50 milliseconds
Execution time: 925 milliseconds
Now that the subject of multiplication algorithms has
been introduced, it seems appropriate to present a program
which compares the MC68000 multiply instruction to
algorithms that accomplish the same task. That this seems
appropriate might seem enigmatic at first because
multiplication instructions are part of the MC68000's
repertoire. Nevertheless, as is indicated in Mr. Morton's
article, multiplication algorithms which ignore those
instructions are faster for certain multipliers even if they
are not powers of 2.
Program 32 illustrates this fact for multiplication by
5, a multiplier that is used in many of my programs to
convert the system clock count to milliseconds.
MULTIPLY.TOS compares three methods of multiplying by 5.
The program can be executed from the desktop or by typing
its name on the command line of SPAWN.TTP or SPEEDTST.TTP.
Program 32. Compares the speed of three multiplication
algorithms.
; Program Name: MULTIPLY.S
; Version: 1.002
; Assembly Instructions:
; Assemble in PC-relative mode and save with a TOS extension.
; Execution Instructions:
; Execute from the desktop; or execute SPAWN.TTP, type MULTIPLY.TOS on
; its command line and view this program's output in MULTIPLY.DAT.
; Program Function:
; Measures the speed of multiplication algorithms.
calculate_program_size:
lea -$102(pc), a1 ; Fetch basepage start address.
lea program_end, a0 ; Fetch program end address.
trap #6 ; Return unused memory to op system.
print_heading:
lea heading, a0
bsr print_string
the_mulu_instruction:
lea header_1, a0
bsr print_string
move.l #49999, d7 ; D7 is counter for the loop.
trap #3 ; Fetch start time.
move.l d0, d3 ; Save start_time in D3.
mulu_loop: ; Marks start of instruction in the loop.
mulu #5, d0 ; Instruction in the loop.
memory_1: ; Marks end of instruction in the loop.
dbra d7, mulu_loop ; Loop 50000 times.
trap #3 ; Fetch end time.
sub.l d3, d0 ; Subtract start time from end time.
trap #10 ; Convert to decimal milliseconds and print.
addition:
lea header_2, a0
bsr.s print_string
move.l #49999, d7 ; D7 is counter for the push loop.
trap #3 ; Fetch start time.
move.l d0, d3 ; Save start_time in D3.
addition_loop: ; Marks start of instruction in the loop.
move.l d0, d2 ; To add one.
add.l d0, d0 ; To double to two.
add.l d0, d0 ; To double to four.
add.l d2, d0 ; To complete multiplication by 5.
memory_2: ; Marks end of instruction in the loop.
dbra d7, addition_loop ; Loop 50000 times.
trap #3 ; Fetch end time.
sub.l d3, d0 ; Subtract start time from end time.
trap #10 ; Convert to decimal milliseconds and print.
shift_and_add:
lea header_3, a0
bsr.s print_string
move.l #49999, d7 ; D7 is counter for the push loop.
trap #3 ; Fetch start time.
move.l d0, d3 ; Save start_time in D3.
shift_loop: ; Marks start of instruction in the loop.
move.l d0, d2 ; Save a copy to add.
asl.l #2, d0 ; Shift to multiply by 4.
add.l d2, d0 ; To complete multiplication by 5.
memory_3: ; Marks end of instruction in the loop.
dbra d7, shift_loop ; Loop 50000 times.
trap #3 ; Fetch end time.
sub.l d3, d0 ; Subtract start time from end time.
trap #10 ; Convert to decimal milliseconds and print.
print__mulu_requisite_memory:
lea header_4, a0
bsr.s print_string
lea memory_1, a0 ; Calculate number of bytes occupied by the
lea mulu_loop, a1 ; instruction in the loop.
bsr.s print_requisite_memory
print_addition_requisite_memory:
lea header_5, a0
bsr.s print_string
lea memory_2, a0 ; Calculate number of bytes occupied by the
lea addition_loop, a1 ; instruction in the loop.
bsr.s print_requisite_memory
printS_shift_and_add_requisite_memory:
lea header_6, a0
bsr.s print_string
lea memory_3, a0 ; Calculate number of bytes occupied by the
lea shift_loop, a1 ; instruction in the loop.
bsr.s print_requisite_memory
terminate:
trap #8
; SUBROUTINES
print_requisite_memory:
suba.l a1, a0
move.l a0, d1
trap #4 ; Returns address of decimal string in A0.
bsr.s print_string
lea header_7, a0
bsr.s print_string
rts
print_string: ; Expects address of string to be in A0.
pea (a0) ; Push address of string onto stack.
move.w #9, -(sp) ; Function = c_conws = GEMDOS $9.
trap #1 ; GEMDOS call
addq.l #6, sp ; Reset stack pointer to top of stack.
rts
data
heading: dc.b "MULTIPLY Execution Results",$D,$A,$D,$A,0
header_1: dc.b " MULU time: ",0
header_2: dc.b " Addition time: ",0
header_3: dc.b " Shift and add time: ",0
header_4: dc.b $D,$A," MULU requisite memory: ",0
header_5: dc.b " Addition requisite memory: ",0
header_6: dc.b " Shift and add requisite memory: ",0
header_7: dc.b " bytes",$D,$A,0
bss
align
program_end: ds.l 0
end
MULTIPLY Execution Results
MULU time: 360 milliseconds
Addition time: 260 milliseconds
Shift and add time: 230 milliseconds
MULU requisite memory: 4 bytes
Addition requisite memory: 8 bytes
Shift and add requisite memory: 6 bytes
The results clearly show that the shift and add
algorithm, in this case, is superior. It is 1.5 times
faster than the MULU instruction, while requiring only 2
additional bytes of memory. On page 168 of his article, Mr.
Morton shows an example of multiplication by 17.
LEA Versus ADDA For Stack Adjustments
On page 166 of the same Morton article is a nice tip
concerning stack pointer movement. Mr. Morton used so few
words to describe this hint that it is easy to slide right
by it while reading the article. Program 33 explores this
subject in a bit more detail, confirming that adjustments to
the stack using the LEA instruction is twice as fast as
adjustments using the ADDA instruction. The reason that
this comparison is of concern is that the ADDQ instruction
can be used for stack adjustments only when the alteration
is limited to a maximum of 8 bytes. Beyond that, the ADDA
instruction is mandatory.
Within the documentation of program 33, I have included
a timing note concerning the timing method I have been using
and its relationship to the instruction execution times
listed in the Motorola manual. With suitable precautions,
you may be able to used the information there to generate
timing data that more closely conforms to the Motorola data
if you should find it necessary or desirable.
Program 33. Confirms that the LEA instruction is the better
choice for stack adjustments greater than 8 bytes.
; Program Name: LEA_ADDA.S
; Version 1.004
; Assembly Instructions:
; Assemble in PC-relative mode and save with a TOS extension.
; Execution Instructions:
; Execute from the desktop; or execute SPAWN.TTP, type LEA_ADDA.TOS on
; its command line and view this program's output in LEA_ADDA.DAT.
; Program Function:
; Confirms (or refutes) the contention that lea $C(An), An is faster
; than adda.l #$C(An), where n = 1 - 7. Reference: "68000 Tricks and Traps",
; by Mike Morton, p.166, Byte, September, 1986. See the paragraph which
; begins "Small adjustments to the stack pointer...".
calculate_program_size:
lea -$102(pc), a1 ; Fetch basepage start address.
lea program_end, a0 ; Fetch program end address.
trap #6 ; Return unused memory to op system.
lea stack, a7 ; Point A7 to this program's stack.
print_heading:
lea heading, a0
bsr print_string
lea_method:
lea lea_time_msg, a0 ; Print label for lea execution results.
bsr print_string
move.l #49999, d7 ; D7 is counter for the loop.
trap #3 ; Value of system clock returned in D0.
move.l d0, d1 ; Save time in scratch register.
lea_loop: ; Marks start of instruction in the loop.
lea $C(a2), a2 ; Instruction in the loop.
memory_1: ; Marks end of instruction in the loop.
dbra d7, lea_loop ; Loop 50000 times.
trap #3 ; Get current value of system clock.
sub.l d1, d0 ; Subtract beginning value from ending value.
mulu #5, d0 ; Convert to milliseconds.
sub.l #80, d0 ; Subtract dbra time and "error".
; See Timing Note below.
move.l d0, d1 ; Transfer time for trap #4 call.
convert_lea_time_to_ASCII_decimal:
trap #4 ; Address of decimal string returned in A0.
print_lea_time:
bsr.s print_string ; Print the decimal string.
lea time_msg, a0 ; Print units label.
bsr.s print_string
; Timing Note:
; Until the count has expired, each dbra instruction requires 10 clock
; periods to execute. 49,999 dbra instructions require that amount of
; time. When the last dbra instruction is executed, the count has expired
; and that instruction requires 14 clock periods.
; Total clock periods consumed by the dbra instruction is
; 49,999 X 10 + 14 = 500,004.
; Each clock period is .000000125 second. Total time consumed by the
; dbra instructions is
; 500,004 X .000000125 = .0625 second = 62.5 milliseconds.
; If we assume that the 8 clock period execution time given in the
; Motorola reference manual for the lea d(An) instruction is correct,
; then 50,000 executions of lea $C(a2), a2 require
; 50,000 X 8 X .000000125 = 50 milliseconds
; Total time for the lea loop is 62.5 + 50 = 112.5 milliseconds.
; When no adjustment is made for "overhead" time, such as attendant
; instructions, which includes the dbra instruction, trap invocations and
; a few others, this program reports a lea loop execution time of 130
; milliseconds. We can assume that the excess 17.5 milliseconds (130-112.5)
; is partially due to the "overhead" time and partially due to a system
; clock frequency different from 8 mhz. We can simply combine both
; components into "overhead" time.
; With no adjustment for "overhead" time this program reports a loop
; execution time of 180 milliseconds for the adda.l #$C, a2 loop. We
; subtract the 17.5 milliseconds "overhead" from this to obtain the true
; loop time of 162.5 milliseconds.
; From this we subtract the dbra execution time of 62.5 milliseconds to
; get 100 milliseconds for 50,000 adda.l instruction..
; 62.5 milliseconds / 50,000 = .000002 second per instruction.
; .000002 / .000000125 = 16 clock periods per instruction
; The execution time given in the Motorola manual for the adda.l #d, An
; instruction is 16 clock periods.
; These calculations validate the method and justify subtracting the
; "overhead" time from the total loop execution times before printing the
; execution time for 50,000 executions of each of the two instructions of
; interest. Finally, we can combine dbra time and "overhead" time into
; a "new" overhead time that is 17.5 + 62.5 = 80 milliseconds.
; When the adjustments are made to the recorded loop times, the program
; prints out the correct time for 50,000 executions of each instruction,
; and the output correctly indicates that the lea instruction used in
; the program is twice as fast as the adda.l instruction used. The
; execution results of the program agree with the timing data given in
; the Motorola reference guide.
adda_method:
lea adda_time_msg, a0 ; Print label for adda execution results.
bsr.s print_string
move.l #49999, d7 ; D7 is counter for the loop.
trap #3 ; Value of system clock returned in D0.
move.l d0, d1 ; Save time in scratch register.
adda_loop: ; Marks start of instruction in the loop.
adda.l #$C, a2 ; Instruction in the loop.
memory_2: ; Marks end of instruction in the loop.
dbra d7, adda_loop ; Loop 50000 times.
trap #3 ; Get current value of system clock.
sub.l d1, d0 ; Subtract beginning value from ending value.
mulu #5, d0 ; Convert to milliseconds.
sub.l #80, d0 ; Subtract dbra time and "error".
move.l d0, d1 ; Transfer time for trap #4 call.
convert_adda_time_to_ASCII_decimal:
trap #4 ; Address of decimal string returned in A0.
print_adda_time:
bsr.s print_string ; Print the decimal string.
lea time_msg, a0 ; Print units label.
bsr.s print_string
print_lea_requisite_memory:
lea lea_memory_msg, a0
bsr.s print_string
lea memory_1, a0 ; Calculate number of bytes occupied by the
lea lea_loop, a1 ; instruction in the loop.
bsr.s print_requisite_memory
print_adda_requisite_memory:
lea adda_memory_msg, a0
bsr.s print_string
lea memory_2, a0 ; Calculate number of bytes occupied by the
lea adda_loop, a1 ; instruction in the loop.
bsr.s print_requisite_memory
terminate:
trap #8
; SUBROUTINES
print_requisite_memory:
suba.l a1, a0
move.l a0, d1
trap #4 ; Returns address of decimal string in A0.
bsr.s print_string
lea memory_msg, a0
bsr.s print_string
rts
print_string: ; Expects address of string to be in A0.
pea (a0) ; Push address of string onto stack.
move.w #9, -(sp) ; Function = c_conws = GEMDOS $9.
trap #1 ; GEMDOS call
addq.l #6, sp ; Reset stack pointer to top of stack.
rts
data
heading: dc.b $D,$A,'LEA_ADDA Execution Results',$D,$A,$D,$A,0
lea_time_msg: dc.b ' Time for 50,000 lea instructions: ',0
adda_time_msg: dc.b ' Time for 50,000 adda.l instructions: ',0
time_msg: dc.b ' milliseconds',$D,$A,0
lea_memory_msg: dc.b $D,$A,' LEA requisite memory: ',0
adda_memory_msg: dc.b ' ADDA.L requisite memory: ',0
memory_msg: dc.b ' bytes',$D,$A,0
bss
align ; Align storage on a word boundary.
ds.l 96
stack: ds.l 0
program_end: ds.l 0 ; Marks the end of program memory.
end
LEA_ADDA Execution Results
Time for 50,000 lea instructions: 50 milliseconds
Time for 50,000 adda.l instructions: 100 milliseconds
LEA requisite memory: 4 bytes
ADDA.L requisite memory: 6 bytes
The programs that I have used to compare the execution
speeds and requisite memory of specific instructions and
algorithms should have, by now, revealed their similarity of
structure. It is appropriate that I present a more general
algorithm which can be used to perform such comparisons. By
endowing this algorithm with a certain amount of
intelligence, I will be able to reduce the size of the
programs that compare algorithms, and, in addition, I will
be able to introduce the subject of algorithmic
intelligence, itself.
At first, my inclination was to include the general
purpose performance testing algorithm in this chapter, but
the detail with which I intend to discuss it and its
ramifications would force this chapter to become very long
and unwieldy, therefore, I have decided to devote an
exclusive chapter to the algorithm. The subject of the next
chapter, then, is the design of an algorithm that is smart
enough to accept a variety of instructions or other
algorithms as input and to generate suitable data for
execution speed and requisite memory comparisons.
I suggest that, if at all possible, you read The Micro
Millennium, by Christopher Evans, first published in 1980 by
The Viking Press, before you turn to chapter 7. At the very
least, I think that you should read Chapters 12 through 14
which discuss the following subjects: The Nature of
Intelligence, Can a Machine Think? and Towards the Ultra
Intelligent Machine. If you follow this advice, then you
will probably feel more comfortable with my use of the word
intelligence as it applies to computers and computer
algorithms.
Conclusion
The emphasis in this chapter, as it has been in
previous chapters and as it will continue to be in future
chapters, is accuracy, reliability, execution speed and
minimum requisite memory. I feel that it is via a constant
pressure to maintain these algorithmic attributes that all
computer programming goals are eventually achieved. In
chapter 7, I shall illustrate the manner in which the desire
to maintain these attributes leads to the development of
algorithms which assume an intelligence of their own,
thereby violating the computers are dumb principle.