Text File | 1985-11-20 | 114KB | 2,575 lines
Atari ST Machine Specific Programming In Assembly
Chapter 5: Performance Testing
The Never Finished Theory
In a recent magazine article, the author stated that no
program is ever finished. I have seen that viewpoint
expressed many times. Sympathetically, I agree in principle
with the emotional flavor of this statement, I suppose, but
when I am working on a program, I always reach a point at
which I can conclude that its performance is satisfactory.
At that stage I say that the program is finished.
The reason that some programmers, and perhaps some
users also, come to accept the never finished concept as
gospel is that they have seen too many programs, either
purchased or written, that never seem to perform completely
satisfactorily, and, therefore, seem to continuously require
fine tuning or corrections. But this program attribute is
not an inherent consequence of program development. The
problem with such programs is that their performance was
judged to be satisfactory prematurely. Too often, the
performance of a program is judged to be satisfactory by its
author if the program seems to accomplish its primary
function after a few cursory tests. Program testing, like
program documentation, seems to be a distasteful chore to
many programmers. That's probably why so many programs are
thrust into the software market prematurely.
The attitude that I have developed is one which views
algorithmic design, documentation and testing as steps in a
single process, each of which demands the same level of
concentration, concern and quality control. If you can
adopt a similar attitude, I guarantee that you will be a
happier, more successful programmer than one who finds any
phase of program development boring or distasteful.
Documentation is your front line defense against
programming catastrophes. To be able to fix a program, at
repair time, you must be able to understand it as well as
you did when you wrote it. The same level of understanding
is required when you decide to intentionally enhance a
program. If program documentation includes the results of
performance testing, then a program's prior performance can
be used to gauge performance after alterations.
When a program seems to be malfunctioning, the first
action you take should be to compare its current performance
to past performance under known conditions. Many times,
such comparisons will reveal that the execution environment,
not the program, is at fault. Of course, it is then that
you may decide that a new version of the program is required
to cope with an altered execution environment.
The Three For One Theory
When I was working on a large mainframe, the
manufacturer to remain nameless, for a company that shall
also remain nameless, we programmers developed a formula for
bug introduction into the mainframe's operating system. For
every bug fixed, three more were introduced. This is, of
course, one of Murphy's laws. Sometimes the new bugs were
called enhancements to obscure the fact that they were
screwups.
But that's what all bugs are. They are errors that you
make when you write your programs. This is the first truth
that you should hold to be self-evident, if you want to
develop programs that eventually perform satisfactorily. Once
you realize that errors in your programs will be there
because of your own carelessness, or in spite of your best
efforts, you can take steps to prevent them from being
catastrophic.
Realistic Expectations
When I judge the performance of an item of software or
hardware that I have purchased, I compare its performance to
the levels at which I have been led to believe it should be
according to the product's designer, manufacturer and
seller. To that extent, if the product fails to meet my
expectations, then I have been cheated. If I am fooled
twice by the same designer, manufacturer or seller, then I
have cheated myself.
When I judge the performance of an item of software or
hardware that I have designed and constructed, I restrict
my expectations to levels that are commensurate with my
knowledge, experience and available tools. If I have done
my best, then I can do no more, unless I decide to redesign
and reconstruct after obtaining more knowledge, more
experience or better tools.
The performance of an item of software is inherently
restricted by the design of the computer system--that should
be obvious. Performance is also influenced by the extent of
the programmer's knowledge about the system and developed
programming ability. A programmer's ability is developed
via education and experience. To the extent that this
ability restricts the performance of the final product, it is a
constituent of the overall programming environment.
If a program's performance depends on your programming
ability, how can you determine when its performance is
satisfactory? Well, when a program executes a task
according to your specifications, then its performance must
be judged satisfactory. How stringent should your
specifications be? Your performance demands must be
commensurate with your programming ability. When you have
exerted your best effort, you must be satisfied with the
final product; or you must obtain another system; or you
must accumulate more knowledge and experience, so that you
can demand more stringent specifications.
Accumulating knowledge about your computer system's
capabilities can be a horrendous task: cataloging your
system's real, versus advertised or reported, capabilities
requires extensive performance testing of the system,
because you can't trust someone else's assessments. For
example, on page 18 of my Star NX-10 user's manual there is
a description of a self-test and the following statements:
Were you surprised? It's fast, isn't it? About 120
characters a second, to be exact.
Would any serious person use the words "about" and "exact"
in the way they are used above? When I execute the test on
my printer, 503 characters are printed. The elapsed time
according to my stopwatch is 6 seconds. This means that
the printer prints 83 characters per second in that test
mode. Even allowing for a one second error in timing, the
printing speed would be increased to only 100 characters per
second. In order for the printer to meet specifications, it
would have to print the 503 characters in 4.19 seconds.
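The arithmetic behind those figures can be checked with a short sketch. It is written in Python purely for convenience, and the function names are illustrative assumptions, not part of any program in this book.

```python
# Check of the printer-speed arithmetic quoted in the text.
# Function names are illustrative assumptions, not from the book.

def chars_per_second(char_count, seconds):
    """Average printing speed over a timed run."""
    return char_count / seconds

def seconds_required(char_count, rated_cps):
    """Time a run would take at the rated speed."""
    return char_count / rated_cps

measured = chars_per_second(503, 6)      # stopwatch reading of 6 seconds
optimistic = chars_per_second(503, 5)    # allow a one second timing error
needed = seconds_required(503, 120)      # time implied by the 120 cps claim

print(f"measured:   {measured:.1f} characters per second")
print(f"optimistic: {optimistic:.1f} characters per second")
print(f"needed:     {needed:.2f} seconds")
```

Dropping the fraction from the measured value gives the 83 characters per second quoted in the text.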
Again, on page 213 of the manual the printing speed in
Draft pica mode is specified to be 120 characters per
second; no "about" qualifier there. When printing an ASCII
file in that mode, with the entire file contained in the
printer's buffer, so that the printer's speed depends only
upon its own capability, I measure a maximum printing speed
of 68 characters per second.
Are the manufacturer's specifications incorrect? Is my
method of timing the printer's speed incorrect? I have
learned not to be absolutely sure about anything in this
world, but I think my time would be wasted if I were to
spend it trying to develop a program which depended on the
printer's ability to print at 120 characters per second.
That would be an unrealistic expectation.
Performance Measuring Tools
Because it depends on ability and personal assessment,
performance testing must, to some extent, be subjective.
In chapter 1, I said that I have been satisfied
with the Star NX-10, and I have been, in spite of the
printing speed controversy. The printer's other
capabilities and its low cost more than compensate for that
discrepancy, if it actually exists. Therefore, in my
opinion, the performance of the printer is satisfactory.
This is my personal assessment.
Of course, one might be inclined to scold me, pointing
out that my method of measuring elapsed time during the
printing speed tests was crude. To which I would reply,
"It's the only method that was available to me." And, I
might add, I have found it to be much more reliable than
words printed on paper in a user's manual. Any user's
manual.
One individual's judgement of the overall performance
of a particular item of software is as subjective as are my
conclusions about the Star NX-10. But specific software
attributes can be judged objectively, if tools which can
measure pertinent aspects of performance are available. I
am going to provide you with some of those tools in this
chapter.
I will introduce utility programs with which the
efficiency of individual instructions, algorithms and
programs may be compared. I will provide programs that
perform many comparisons, but, since the subject of
performance testing must be restricted to a reasonable
length in the book, I will concentrate more on showing you,
by example, when and how I decide to conduct performance
tests, rather than flit through comparisons until you become
bored with the whole idea.
The First Utility
While the primary objective of the chapter is to
provide performance measuring utilities, a secondary
objective is to illustrate specific stages of program
development. I begin with the specifications for a utility
to be called SPEEDTST. Then I introduce the first of a
series of programs, each of which is a model that represents
a snapshot of a continuous process. The other programs
follow after the introduction of a program on which the
models can operate.
The programs introduced in this chapter invoke the
custom traps described in program 13. To install the custom
traps, execute TRAPS.PRG. If you want the traps to be
automatically installed during system boot, copy TRAPS.PRG
to the AUTO folder on your boot partition or floppy disk.
The first utility will calculate a program's load and
execution times. As I concluded chapter 3, I mentioned that
the first stage of increasing a program's execution speed
involved getting it into RAM as quickly as possible.
Methods of doing that will be discussed in this chapter. In
order to discuss a variety of methods, I need a way to
measure the time required to load and execute a program.
Specifications For SPEEDTST
SPEEDTST must accomplish the following:
1. Spawn a process = load and execute a program.
2. Programs to be spawned will have a TOS or PRG
suffix.
3. The spawned program will reside in the same
directory as does SPEEDTST.
4. Create a disk file which is to be identified by
the name of the spawned program with a DAT
suffix. The disk file is to reside in the same
directory as does SPEEDTST.
5. Calculate the spawned program's load and
execution times.
6. Store the load and execution times in the disk
file described in item 4.
7. If the spawned process directs output to the
video screen via GEMDOS function $9, redirect
that output to the file described in item 4.
The First Model
Program 16 is the first in a series of four programs
which progress in algorithmic perfection until the program
SPEEDTST is developed. SPEED_1 is the first working model
of a parent program which loads and executes a child
program. The parent calculates the spawned program's load
and execution times, using information returned to the
parent when the child terminates. The parent creates a disk
file and stores the calculated values therein. If the child
directs output to the screen using GEMDOS function $9, that
output will be redirected to the file. The name of the file
created by the parent is composed of the name of the child,
without suffix, plus the extension DAT.
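The file-naming rule just described can be sketched as follows. This is a Python illustration rather than the assembly routine itself; the helper name is hypothetical, and the sketch assumes the child's name contains a period before its TOS or PRG suffix.

```python
# Sketch of the file-naming rule: child name without suffix, plus DAT.
# dat_file_name is a hypothetical helper, not a routine from the book.

def dat_file_name(program_name):
    """Replace the suffix of a name such as PRG_5AP.TOS with DAT."""
    stem, dot, _suffix = program_name.partition(".")
    if not dot:
        raise ValueError("expected a name with a TOS or PRG suffix")
    return stem + ".DAT"

print(dat_file_name("PRG_5AP.TOS"))
```

Rejecting a name that has no period is a choice made for this sketch only.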
While it is doing all of that, the parent also confirms
that trap #6 has been installed by TRAPS.PRG and functions
correctly. The parent accomplishes the verification simply
by being able to spawn, which it can't do if custom trap #6
fails to return excess memory to the operating system. Trap
#6 also performs another function, but its effectiveness is
confirmed only if the child terminates using custom trap #8.
Refer to the extensive note in the data section of program
16.
Program 16 must be assembled in PC-relative mode and the
executable file must be saved with a TTP extension. When it
is executed, the program to be spawned must
reside in the same directory as does program 16. Type the
name of the program to be spawned on program 16's input
parameter line. As you shall see, a program that is to be
spawned by program 16 must be specifically prepared for the
spawning operation.
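Program 16's listing reads the parameter line from the program's basepage: it fetches a one byte character count at offset $80, then copies that many characters. A minimal sketch of that layout follows, in Python rather than assembly. The helper name and the fake basepage are assumptions made only for illustration.

```python
# Sketch of the parameter-line layout that program 16's listing relies on:
# a count byte at basepage offset $80, then that many characters.
# parameter_line is a hypothetical helper, not a routine from the book.

COMMAND_TAIL_OFFSET = 0x80

def parameter_line(basepage):
    """Return the text typed on the parameter line."""
    count = basepage[COMMAND_TAIL_OFFSET]
    start = COMMAND_TAIL_OFFSET + 1
    return bytes(basepage[start:start + count]).decode("ascii")

# Build a fake 256-byte basepage holding the name PRG_5AP.TOS.
name = b"PRG_5AP.TOS"
fake_basepage = bytearray(256)
fake_basepage[COMMAND_TAIL_OFFSET] = len(name)
fake_basepage[COMMAND_TAIL_OFFSET + 1:COMMAND_TAIL_OFFSET + 1 + len(name)] = name

print(parameter_line(fake_basepage))
```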
Program 16, as do programs 18 and 19, invokes custom
traps which must be installed by programs TRAPS.PRG (program
13, chapter 4) and TRAP_9.PRG (program 15, chapter 5).
Therefore, those two programs must be executed from the desktop
or from the AUTO folder of a boot partition or floppy before
programs SPEED_1.TTP, SPEED_2.TTP or SPEED_3.TTP are
executed. TRAP_9.S follows.
Program 15. This program installs a custom trap for programs
16, 18 and 19.
; Program Name: TRAP_9.S
; Version 1.002
; Assembly Instructions:
; Assemble in PC-relative mode and save with a PRG extension.
; Program Function:
; This is a LSR program that establishes a user defined trap. It may be
; executed from the desktop, but you may prefer to copy it to the AUTO
; folder of your boot partition or floppy disk so that it will execute
; automatically during boot.
; MAJOR NOTE: SEE FURTHER DOCUMENTATION FOR THIS PROGRAM IN TRAPS.S.
; Trap #9 is special in that it is only used by three programs: SPEED_1.TTP,
; SPEED_2.TTP and SPEED_3.TTP. The custom trap is used simply to reduce the
; size of those programs.
; This program invokes a custom trap that is established by TRAPS.PRG,
; therefore, that program must be executed before trap #9 is invoked by a
; program.
program_start: ; Calculate program size and retain result.
lea program_end, a3 ; Fetch program end address.
suba.l 4(a7), a3 ; Subtract basepage address.
enter_supervisor_mode:
move.l #0, -(sp) ; The zero turns on supervisor mode.
move.w #$20, -(sp) ; Function = super = GEMDOS $20.
trap #1 ; Go to supervisor mode.
addq.l #6, sp ; Supervisor stack pointer (SSP) returned in D0.
movea.l d0, a5 ; Save SSP in scratch register.
install_trap_9_routine: ; Note: pointer = vector = pointer.
lea trap_9_routine, a0 ; Fetch address of trap #9 routine.
move.l a0, $A4 ; Store custom trap address in pointer.
enter_user_mode:
pea (a5) ; Restore supervisor stack pointer.
move.w #$20, -(sp) ; Function = super = GEMDOS $20.
trap #1 ; Go to user mode.
addq.l #6, sp ; Reset stack pointer to top of stack.
relinquish_processor_control: ; Maintain memory residency.
move.w #0, -(sp) ; See page 121 of Internals book.
move.l a3, -(sp) ; Program size.
move.w #$31, -(sp) ; Function = ptermres = GEMDOS $31.
trap #1
trap_9_routine:
; Expects a program's load time in register D3 as a binary number. This
; algorithm converts the value in D3 to milliseconds (msec) then prints the
; load time in decimal msec.
; Also expects a program's execution time in register D5. The same service
; is performed for the value in that register.
convert_load_time_to_msec:
move.l d3, d0 ; Save a copy to add.
asl.l #2, d3 ; Shift to multiply by 4.
add.l d0, d3 ; To complete multiplication by 5.
print_load_time:
cmpi.l #999, d3 ; If load time is less than 1000, then
bgt no_space ; print a leading blank space for output
lea space, a0 ; alignment.
bsr print_string
cmpi.l #99, d3 ; If load time is less than 100, then
bgt no_space ; print another leading blank space.
lea space, a0
bsr print_string
no_space:
move.l d3, d1 ; Copy load time to D1 for decimal conversion.
trap #4 ; Returns address of decimal string in A0.
bsr.s print_string
lea units_label, a0
bsr.s print_string
convert_execution_time_to_msec:
lea execute_time_msg, a0
bsr.s print_string
move.l d5, d0 ; Save a copy to add.
asl.l #2, d5 ; Shift to multiply by 4.
add.l d0, d5 ; To complete multiplication by 5.
print_execution_time:
cmpi.l #999, d5 ; If execute time is less than 1000, then
bgt _no_space ; print a leading blank space for output
lea space, a0 ; alignment.
bsr print_string
cmpi.l #99, d5 ; If execute time is less than 100, then
bgt.s _no_space ; print another leading blank space.
lea space, a0
bsr.s print_string
_no_space:
move.l d5, d1 ; Copy execute time for decimal conversion.
trap #4 ; Returns address of decimal string in A0.
bsr.s print_string
lea units_label, a0
bsr.s print_string
rte
;
; Subroutine
;
print_string: ; Expects address of string to be in A0.
pea (a0) ; Push address of string onto stack.
move.w #9, -(sp) ; Function = c_conws = GEMDOS $9.
trap #1 ; GEMDOS call
addq.l #6, sp ; Reset stack pointer to top of stack.
rts
data
space: dc.b " ",0
execute_time_msg: dc.b " Execute time: ",0
units_label: dc.b " milliseconds", $D,$A,0
bss
align ; Align storage on a word boundary.
program_end: ds.l 0
end
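The conversion and alignment logic of trap_9_routine can be modeled in a few lines. The sketch below is Python, not assembly, and its helper names are hypothetical. It assumes that each clock tick is one 200 Hz interval (5 milliseconds) and that the decimal string returned by trap #4 carries no padding of its own. The tick counts are arbitrary examples.

```python
# Sketch of the arithmetic in trap_9_routine, assuming 200 Hz clock ticks
# (5 milliseconds each). Helper names are hypothetical, and the decimal
# text is assumed to be unpadded.

def ticks_to_msec(ticks):
    """Multiply by 5 the way the listing does: shift left 2, add original."""
    return (ticks << 2) + ticks

def aligned(msec):
    """Add one leading space below 1000 and a second one below 100."""
    padding = ""
    if msec <= 999:
        padding += " "
        if msec <= 99:
            padding += " "
    return padding + str(msec) + " milliseconds"

for ticks in (9, 136, 400):
    print(aligned(ticks_to_msec(ticks)))
```

The shift and add pair stands in for a multiply instruction, which is the same trade the listing makes.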
Program 16. A utility that computes a program's load and
execution times.
; Program Name: SPEED_1.S
; Version: 1.006
; Assembly Instructions:
; Assemble in "PC-relative" mode and save with a TTP extension.
; Execution Instructions:
; SPEED_1.TTP will not execute unless the custom traps in program
; TRAPS.PRG and TRAP_9.PRG have previously been installed. The custom
; traps are installed when those programs are executed from the desktop
; or from an AUTO folder on a boot disk.
; NOTE: The time required for a program to be loaded into memory depends
; on the assembly mode used to assemble the program. This will be
; shown, using SPEEDTST.TTP, in chapter 5.
; In addition, a program's load time depends on the drive from which
; the program is loaded, the method used to format the disk on which
; the program is located, the position of the program on the disk
; and, in this case, the position of the child process relative to the
; position of the parent process.
; To eliminate the drive-variables when comparing the load and
; execution times of one program to that of another, the parent and
; the child should be isolated to an otherwise empty partition or
; floppy disk for each spawning instance.
; For example, if there are two programs involved in the comparison,
; first copy the parent, which is SPEED_1 in this case, so that it is
; the only item in the hard disk partition or on the floppy. Then,
; copy the first program to the same partition or floppy. Execute
; the parent, SPEED_1 in this case, and obtain the results.
; Remove the first program and copy the second. Execute the parent
; and obtain the results for the second program.
; Execute from the desktop. Type the name of an executable file which
; has a TOS or PRG extension on SPEED_1.TTP's input parameter line. The
; name of the program you type on the parameter line must be in the same
; directory as is SPEED_1.TTP and the program must be one that terminates
; with GEMDOS function $4C.
; Upon termination, the spawned program must return the value that is in
; memory location $4BA immediately after it has been loaded, hereafter called
; the after-load time or after-load value. Custom trap #3 (get_time) can be
; used to obtain that value. SPEED_1.TTP uses the value returned in D0 to
; calculate the spawned program's load and execution times.
; The spawned program must terminate with GEMDOS function $4C so that
; the after-load value can be returned in D0 by that function. The value
; returned in D0 by GEMDOS function $4C is limited to a 16 bit value.
; If the spawned program has any halt or wait instructions, such as wait
; for a keypress, etc., those should be commented out, then the program
; should be assembled especially for the speed test. Otherwise the
; execution time will include the time waiting for input.
; If custom trap #8 is used to terminate the program, the trap will
; execute a wait_for_keypress algorithm when the program is executed from
; the desktop, but it will omit the wait algorithm when the program is
; spawned by SPEED_1.TTP. In addition, trap #8 will return the after-load
; value to SPEED_1.TTP and terminate the spawned program with GEMDOS function
; $4C.
; Both trap #8 and SPEED_1.TTP require that the spawned program be
; initialized with custom trap #6. See the note in the data section, below.
; Primary Function:
; Spawn a process. Calculate the spawned program's load and execution
; times. Store these values in a disk file that is identified by the name
; of the spawned process with a DAT suffix.
; If the spawned process directs output to the screen, store that output
; in the same disk file. Note: only screen directed output processed by
; GEMDOS function $9 will be directed to the file. If BIOS function $3 is
; used for screen output, that output will not be redirected to the file.
; Secondary Function:
; Verify that trap #6 is resident and functions correctly. SPEED_1
; confirms that because it will not be able to spawn a process unless
; the trap #6 call has returned excess memory to the system.
; Description:
; SPEED_1 is the first in a series of programs which progress in
; algorithmic perfection until the program SPEEDTST is developed. Using
; this series of programs, I intend to help you experience selected stages
; of a program development process.
; The primary attribute of this development process is its dependence,
; during the early stages of development, on familiar documented algorithms
; that can easily be found in references for many programming languages.
; After a working model has been developed with these familiar algorithms,
; attempts are made to introduce unfamiliar algorithms which may be faster
; or consume less memory.
release_excess_memory:
lea program_end, a0 ; Put "end of program" address in A0.
movea.l 4(a7), a1 ; Put "basepage" address in A1.
movea.l a1, a4 ; Copy to A4 for command line access.
trap #6 ; Calculate program size and release memory.
; NOTE: A local stack is not declared in PRG_5AP.TOS. Because of the long
; string that is printed by that program, this program will bomb when
; it spawns PRG_5AP.TOS, if a local stack is not declared here.
lea stack, a7 ; Point A7 to this program's stack.
; The next task to be accomplished is an initialization algorithm. The
; name of the program that is to be typed on SPEED_1.TTP's input parameter
; line must be used in several ways. First, its suffix must be changed to
; DAT so that it can be passed as a NULL terminated string when GEMDOS $3C
; is invoked to create the disk file.
; Then it must be passed as a NULL terminated string with the program's
; original suffix when GEMDOS $4B is invoked to spawn the program.
; Finally, the program's name is used as part of SPEED_1.TTP's output
; header.
; The command line processing algorithm creates the required NULL terminated
; strings, storing them in locations declared in the data section of SPEED_1.
process_command_line_parameters:
lea $80(a4), a4 ; Fetch address of parameters.
move.b (a4)+, d0 ; Fetch parameter line character count.
lea program_name, a3 ; Load program_name address in A3.
subq.b #1, d0 ; Set up counter.
ext.w d0 ; Extend to match the size of the dbra
; instruction.
; NOTE: The dbcc instruction operates on a word length value, therefore,
; the value in the register that is to be decremented by a dbcc
; instruction must be placed there with a word size instruction, such
; as move.w #10, D0; or with a longword size instruction, as long as
; the value in the longword is limited to word size validity, or with
; a byte size instruction, as long as the value in the register is
; sign extended to word size, as is done in the instruction above.
fetch_character:
move.b (a4)+, (a3)+ ; Store character.
dbra d0, fetch_character ; Loop until d0 becomes negative.
move.b #0, (a3) ; Finish with a NULL.
create_file_name: ; Create a file to accept standard output.
lea filename, a4
lea program_name, a3
copy_name:
move.b (a3)+, (a4)+
cmpi.b #$2E, (a3) ; Is next byte of program_name the period?
bne.s copy_name ; Continue looping until period is seen.
move.b #$2E, (a4)+ ; Add a period.
move.b #$44, (a4)+ ; Add letter 'D'.
move.b #$41, (a4)+ ; Add letter 'A'.
move.b #$54, (a4)+ ; Add letter 'T'.
move.b #0, (a4) ; Add a NULL.
create_file:
move.w #0, -(sp) ; File attribute = read/write.
pea filename ; Will be name of spawned process + .DAT.
move.w #$3C, -(sp) ; Function = f_create = GEMDOS $3C.
trap #1 ; File handle is returned in D0.
addq.l #8, sp
lea file_handle, a0 ; Store returned file handle.
move.w d0, (a0)
redirect_output: ; Exchange file handle with screen's handle.
move.w file_handle, -(sp) ; This is the disk file's handle.
move.w #1, -(sp) ; This is the video screen's handle.
move.w #$46, -(sp) ; Function = f_force = GEMDOS $46.
trap #1
addq.l #6, sp
get_start_time:
lea start_time, a3 ; Fetch address of variable "start_time".
trap #3 ; Returns value of system clock in D0.
move.w d0, (a3) ; Save start time.
load_and_execute_program:
pea environ_string
pea command_line
pea program_name
move.w #0, -(sp)
move.w #$4B, -(sp) ; Function = GEMDOS $4B = p_exec.
trap #1
move.w d0, d3 ; Copy after-load value to D3 for calculation.
get_end_time:
trap #3 ; Returns value of system clock in D0.
move.w d0, d5 ; Copy to D5 for calculation.
sub.w d3, d5 ; Subtract after-load time from end time.
ext.l d5 ; Extend to 32 bits.
; NOTE: D5 now contains the spawned program's execution time, but the time
; has not yet been converted to milliseconds. See the note below
; concerning the sign extension of D3 and D5.
reposition_stack_pointer:
lea $10(sp), sp
; Note the difference between the use of GEMDOS function $19 below and
; the way it is used on page 116 of the Internals book. In the
; Internals book there are two errors: (1) sp should not be referenced
; indirectly, as (sp); (2) the ASCII code for the letter A should be
; added to the contents of the register--in the Internals book the
; contents of the register are added to the ASCII code for the letter
; A.
get_drive:
move.w #$19, -(sp) ; Function = dgetdrv = GEMDOS $19.
trap #1 ; Returns 0 for drive A, 1 for B, etc.
addq.l #2, sp
add.b #$41, d0 ; Add ASCII value for A to compute ASCII
lea drive, a0 ; letter code for the drive value returned.
move.b d0, (a0) ; Save drive's ASCII letter code.
print_heading:
lea heading, a0
bsr print_string
lea program_name, a0
bsr print_string
print_drive_for_spawned_program:
lea drive_msg, a0
bsr print_string
compute_load_time:
lea load_time_msg, a0
bsr.s print_string
lea start_time, a3
sub.w (a3), d3 ; Subtract start time from after-load time.
ext.l d3 ; Extend to 32 bits.
; SIGN EXTENSION NOTE
; The value in D3, above, and in D5 previously, is extended to 32 bits
; because, although the number of 200 Hz intervals we are able to utilize is
; limited to a word size by the value that is returned in D0 via GEMDOS
; function $4C, the time converted to milliseconds can extend beyond that
; word size limitation.
trap #9 ; See description in TRAP_9.S.
close_file:
move.w file_handle, -(sp)
move.w #$3E, -(sp) ; Function = fclose = GEMDOS $3E.
trap #1
addq.l #4, sp
terminate:
move.w #0, -(sp)
trap #1
print_string: ; Expects address of string to be in A0.
pea (a0) ; Push address of string onto stack.
move.w #9, -(sp) ; Function = c_conws = GEMDOS $9.
trap #1 ; GEMDOS call
addq.l #6, sp ; Reset stack pointer to top of stack.
rts
data
heading: dc.b $D,$A,"SPEED_1.TTP Execution Results",$D,$A
dc.b "for ",0
drive_msg: dc.b ", loaded from drive: "
drive: dc.b "A",$D,$A,0
load_time_msg: dc.b $D,$A," Load time: ",0
; NOTE: Custom trap #6 checks the environmental string pointer of each
; program that invokes it to see if the pointer contains the address
; of the label "environ_string" below. That test is performed by
; comparing the contents of the address contained in the pointer to
; the ASCII string "TERM" declared below.
; When a match occurs, it means that the program invoking trap #6 has
; been spawned by SPEED_1 (or by a similar program), therefore, trap
; #6 sets the value of the boolean variable "spawned", declared by
; TRAPS.PRG, to all ones = true.
; When custom trap #8 is invoked by a program, the state of the
; variable "spawned" is tested. If the state is true, the program
; invoking custom trap #8 is terminated with GEMDOS function $4C and
; the after-load time, which was saved by custom trap #6, is returned
; to the parent program.
; If the state of "spawned" is false, GEMDOS function $8 is executed
; so that execution will pause for a keypress. When the keypress is
; received, GEMDOS function $0 is executed.
; In this manner, custom trap #8, working in conjunction with custom
; trap #6, eliminates the "wait for keypress" algorithm automatically
; when a program is spawned by SPEED_1 (or a similar program). This
; prevents the computed execution time from being corrupted by a time
; period that involves a wait for keyboard input.
environ_string: dc.b "TERM",0
command_line: dc.b 0
align
bss
start_time: ds.w 1 ; Value in $4BA just before spawning.
file_handle: ds.w 1 ; Handle for the filename below.
filename: ds.l 4 ; File name for execution results.
program_name: ds.l 4 ; Filename buffer. Must be NULL terminated.
ds.l 96 ; Program stack.
stack: ds.l 0 ; Address of program stack.
program_end: ds.l 0
end
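SPEED_1 computes both intervals with a word size subtraction followed by a sign extension, because only 16 bits of the after-load value come back from the child. The sketch below models that arithmetic with Python integers. The helper name is hypothetical and the sample clock values are invented for illustration.

```python
# Sketch of the word-size subtraction used for load and execute times:
# sub.w followed by ext.l, modeled with Python integers.
# interval_ticks is a hypothetical helper, not a routine from the book.

def interval_ticks(earlier, later):
    """Difference of two 16-bit clock samples, sign-extended like ext.l."""
    diff = (later - earlier) & 0xFFFF     # sub.w keeps the low 16 bits
    if diff & 0x8000:                     # ext.l copies bit 15 upward
        diff -= 0x10000
    return diff

# Ordinary case: 136 ticks elapsed.
print(interval_ticks(1000, 1136))
# The low word of the clock rolled over between the two samples.
print(interval_ticks(0xFFF0, 0x0078))
# An interval of 32768 ticks or more no longer fits and turns negative.
print(interval_ticks(0, 40000))
```

With 5 milliseconds per tick, the largest interval this arithmetic can represent is 32767 ticks, a little under 164 seconds.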
Program 17 was prepared as a simple example to be
executed by program 16 and the other programs in the series.
Program 17 illustrates the use of custom traps #3, #6 and
#8. Assemble programs 16 and 17, then, with their
executable files in the same directory, execute program 16.
Type the name of program 17's executable file on program
16's command line. Figure 5.1 shows the contents of the
file produced by program 16. The values stored in the file
depend on the variables mentioned in program 16's
documentation.
Program 17. Execute this program by typing PRG_5AP.TOS on
SPEED_1.TTP's command line.
; Program Name: PRG_5AP.S
; Version 1.003
; Assembly Instructions:
; Assemble in PC-relative mode and save with a TOS extension.
; Execution Note:
; This program invokes custom traps which must be installed by
; TRAPS.PRG prior to its execution.
; Program Function:
; This program illustrates the use of custom traps #3, #6 and #8.
; If the program is executed from the desktop, trap #8 will execute the
; wait_for_keypress algorithm, then, when a key is pressed it will execute
; GEMDOS function 0.
; If, instead, this program is executed by typing its name on
; SPEEDTST.TTP's input parameter line, trap #8 will not execute the
; wait_for_keypress algorithm, but it will immediately execute GEMDOS
; function $4C.
; Trap #3 returns, in D0, the value of the system clock as it is
; immediately after this program has been loaded. The value in D0 is not
; corrupted before trap #6 is invoked, therefore, it is still valid when
; the trap #6 routine begins to execute. Trap #6 saves the "after-load"
; value of the system clock in its own local variable, where it is available
; for processing during the execution of trap #8.
; Trap #6 also calculates the memory occupied by this program and releases
; the memory not occupied by this program to the operating system.
fetch_load_time:
trap #3 ; Returns value of system clock in D0.
release_excess_memory: ; Also stores after-load time in TRAPS bss.
lea program_end, a0 ; Put "end of program" address in A0.
movea.l 4(a7), a1 ; Put "basepage" address in A1.
trap #6 ; Calculate program size and release memory.
waste_time:
move.l #$1, d0
outer_loop:
move.l #$FDE8, d1
inner_loop:
move.l #$FDE8, d2
dbra d1, inner_loop
dbra d0, outer_loop
lea heading, a0
bsr.s print_string
lea string, a0
bsr.s print_string
trap #8 ; Terminate.
print_string:
pea (a0)
move.w #9, -(sp)
trap #1
addq.l #6, sp
rts
data
heading: dc.b 'PRG_5AP.TOS Execution Results',$D,$A,$D,$A,0
string: dc.b ' When executed from the desktop, this program will print '
dc.b 'this string on the',$D,$A
dc.b ' video screen and pause for a keypress. But, when this '
dc.b 'program is spawned by',$D,$A
dc.b ' SPEED_1, SPEED_2, SPEED_3 or SPEEDTST, the string will '
dc.b 'be stored in a file ',$D,$A
dc.b ' named PRG_5AP.DAT and the program will not pause for a '
dc.b ' keypress.',$D,$A,0
bss
align
program_end: ds.l 0
end ; Assembler pseudo-op.
Figure 5.1. Contents of PRG_5AP.DAT, the data file produced
by program 16 to contain program 17's load and execution
times.
PRG_5AP.TOS Execution Results
When executed from the desktop, this program will print this string on the
video screen and pause for a keypress. But, when this program is spawned by
SPEED_1, SPEED_2, SPEED_3 or SPEEDTST, the string will be stored in a file
named PRG_5AP.DAT and the program will not pause for a keypress.
SPEED_1.TTP Execution Results
for PRG_5AP.TOS, loaded from drive: G
Load time: 45 milliseconds
Execute time: 680 milliseconds
The Second Model
After program 16 was operational, I began to think
about ways I might improve the command line processing
algorithm. Also, I decided to try to improve the accuracy
of the calculated load time by initializing the stack for
GEMDOS $4B, withholding the invocation of trap #1, then
invoking trap #3 to get the start time, just before invoking
trap #1 to load and execute program 17.
The improvements are incorporated in program 18, the
next program in the series. In SPEED_2, the movem.l
instruction is used to move the command line to four
registers, then from there to a declared location in the
data section. Since this program is simply a model, and
since the algorithms which create the disk file were
developed in SPEED_1, I decided that there was no reason to
repeat those algorithms in SPEED_2.
However, I discovered that, for no apparent reason, the
load time reported by SPEED_2 increased significantly, even
though the experiments with SPEED_1 and SPEED_2 were
executed under identical conditions. By eliminating each of
SPEED_1's algorithms that are involved with the disk file,
in turn, I learned that, for some reason, the load time is
shorter when a file is created. Therefore, in order to
maintain a valid experiment, I created a dummy file in
SPEED_2, but wrote nothing to it.
But by the time I got to SPEEDTST, I realized that the
file name creation algorithm was actually a part of the
command line processing algorithm; therefore, in order to
validate comparisons between the three models, I had to redo
SPEED_2 and SPEED_3, including a file name creation
algorithm in each. While doing that, I was able to use the
movem.l instruction to develop a faster creation algorithm
than that used in SPEED_1.
Program 18. The next stage of SPEEDTST.TTP's development.
; Program Name: SPEED_2.S
; Version 1.003
; NOTE: This program is similar to SPEED_1. The differences between the
; two are that this one uses a different algorithm to process
; the command line, and it fetches the start time at a more appropriate
; place in the program.
; Assembly Instructions:
; Assemble in "PC-relative" mode and save with a TTP extension.
; Function:
; Spawn a process and calculate the spawned program's load and execution
; times. Pause for a keypress before terminating.
release_excess_memory:
lea program_end, a0 ; Put "end of program" address in A0.
movea.l 4(a7), a1 ; Put "basepage" address in A1.
movea.l a1, a4 ; Copy to A4 for command line access.
trap #6 ; Calculate program size and release memory.
; NOTE: A local stack is not declared in PRG_5AP.TOS. Because of the long
; string that is printed by that program, this program will bomb when
; it spawns PRG_5AP.TOS, if a local stack is not declared here.
lea stack, a7 ; Point A7 to this program's stack.
; NOTE ABOUT THE COMMAND LINE PROCESSING ALGORITHM
; Refer to figure 2.13 of chapter 2 for an image of a command line
; that is stored in a program's basepage. The first byte of the command
; line is a count of the ASCII characters contained therein. The second
; byte is the first character in the command line. The last character in
; the command line is followed by the ASCII code for a carriage return;
; the carriage return is not included in the character count.
; For program SPEED_2 we know that the command line character count
; cannot exceed 12 characters = 12 bytes = 3 longwords. Therefore, it
; would be convenient if those 3 longwords could be transferred directly to
; three data registers. Unfortunately, the MC68000 will not permit the
; movem instruction to transfer data which begins at an odd address.
; Because of this restriction, it would be convenient if the operating
; system stored the first command line character at an even address.
; Unfortunately, it does not. Therefore, we are forced to fetch 4 longwords
; from the vicinity of the command line. That's why we must use four data
; registers instead of three.
; To complicate things, the command line ASCII string will be corrupted
; by the first byte in the first register, because it is the character count,
; not a valid character. So, when the data contained in the data registers
; are transferred to a declared variable location, this byte must be stripped
; from the command line ASCII string.
; I accomplish this with no wasted time by declaring two variable
; locations, input_line and program_name. Since input_line is one byte in
; length, and the first location for program_name immediately follows that
; byte, when the contents of the data registers are moved to the location of
; input_line, the variable program_name will point to the first character
; of the command line ASCII string, as it should.
; The carriage return at the end of the ASCII string is also transferred
; to the 15 byte array addressed by program_name. It must be overwritten by
; a NULL so that the ASCII string is NULL terminated. That is accomplished
; by fetching the command line character count as a byte length value,
; extending it to word length and using the result in an operand that uses
; "address register indirect with index" addressing.
process_command_line:
lea input_line, a3 ; Fetch location to contain command line.
lea output_line, a5 ; A second location: for filename.
movem.l $80(a4), d0-d3 ; Move 16 bytes of command line to 4 registers.
movem.l d0-d3, (a3) ; Move them to address "input_line".
movem.l d0-d3, (a5) ; Move them to address "output_line".
move.b $80(a4), d0 ; Fetch command line ASCII character count.
ext.w d0 ; Extend to word for next instruction.
move.b #0, 1(a3,d0.w) ; Store a null at end of command line input.
move.b #0, 1(a5,d0.w) ; Same for filename buffer.
insert_filename_suffix:
move.b #$44, -2(a5,d0.w) ; Insert letter 'D'.
move.b #$41, -1(a5,d0.w) ; Insert letter 'A'.
move.b #$54, 0(a5,d0.w) ; Insert letter 'T'.
create_file:
move.w #0, -(sp) ; File attribute = read/write.
pea filename ; Will be name of spawned process + .DAT.
move.w #$3C, -(sp) ; Function = f_create = GEMDOS $3C.
trap #1 ; File handle is returned in D0.
addq.l #8, sp
lea file_handle, a0
move.w d0, (a0)
redirect_output: ; Exchange file handle with screen's handle.
move.w file_handle, -(sp) ; This is the disk file's handle.
move.w #1, -(sp) ; This is the video screen's handle.
move.w #$46, -(sp) ; Function = f_force = GEMDOS $46.
trap #1
addq.l #6, sp
; NOTE: In order to increase the accuracy of the start time, the stack is
; prepared for the spawning process, then, just before trap #1 is
; invoked, custom trap #3 is invoked and the start time is saved.
prepare_stack_for_load_and_execute_program:
pea environ_string
pea command_line
pea program_name
move.w #0, -(sp)
move.w #$4B, -(sp) ; Function = GEMDOS $4B = p_exec.
get_start_time:
lea start_time, a3 ; Fetch address of variable "start_time".
trap #3 ; Returns value of system clock in D0.
move.w d0, (a3) ; Save start time.
load_and_execute_program:
trap #1
move.w d0, d3 ; Copy after-load value to D3 for calculation.
get_end_time:
trap #3 ; Returns value of system clock in D0.
move.w d0, d5 ; Copy to D5 for calculation.
sub.w d3, d5 ; Subtract after-load time from end time.
ext.l d5 ; Extend to 32 bits.
reposition_stack_pointer:
lea $10(sp), sp
get_drive:
move.w #$19, -(sp) ; Function = dgetdrv = GEMDOS $19.
trap #1
addq.l #2, sp
add.b #'A', d0
lea drive, a0
move.b d0, (a0)
print_heading:
lea heading, a0
bsr print_string
lea program_name, a0
bsr print_string
print_drive_for_spawned_program:
lea drive_msg, a0
bsr print_string
compute_load_time:
lea load_time_msg, a0
bsr print_string
lea start_time, a3
sub.w (a3), d3 ; Subtract start time from after-load time.
ext.l d3 ; Extend to 32 bits.
trap #9 ; See description in TRAPS.S.
close_file:
move.w file_handle, -(sp)
move.w #$3E, -(sp) ; Function = fclose = GEMDOS $3E.
trap #1
addq.l #4, sp
terminate:
move.w #0, -(sp)
trap #1
print_string: ; Expects address of string to be in A0.
move.l a0, -(sp) ; Push address of string onto stack.
move.w #9, -(sp) ; Function = c_conws = GEMDOS $9.
trap #1 ; GEMDOS call
addq.l #6, sp ; Reset stack pointer to top of stack.
rts
data
heading: dc.b $D,$A,"SPEED_2.TTP Execution Results",$D,$A
dc.b "for ",0
drive_msg: dc.b ", loaded from drive: "
drive: dc.b "A",$D,$A,0
load_time_msg: dc.b $D,$A," Load time: ",0
environ_string: dc.b "TERM",0
command_line: dc.b 0
align
bss
start_time: ds.w 1
file_handle: ds.w 1
input_line: ds.b 1
program_name: ds.b 15 ; Program name buffer.
output_line: ds.b 1
filename: ds.b 15 ; Filename buffer.
ds.l 96 ; Program stack.
stack: ds.l 0 ; Address of program stack.
program_end: ds.l 0
end
SPEED_2.TTP Execution Results
PRG_5AP.TOS Execution Results
When executed from the desktop, this program will print this string on the
video screen and pause for a keypress. But, when this program is spawned by
SPEED_1, SPEED_2, SPEED_3 or SPEEDTST, the string will be stored in a file
named PRG_5AP.DAT and the program will not pause for a keypress.
SPEED_2.TTP Execution Results
for PRG_5AP.TOS, loaded from drive: G
Load time: 45 milliseconds
Execute time: 680 milliseconds
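The notes in the SPEED_2 listing describe the command line processing in terms of byte offsets. As a cross-check, here is a minimal Python model of those steps. It is not part of the author's programs; the function name and variable names are hypothetical, and the 16-byte input simply mimics the command line image that begins at basepage offset $80.

```python
# Hypothetical model of SPEED_2's process_command_line and
# insert_filename_suffix steps. Names are illustrative only.

def process_command_line(basepage_cmd: bytes):
    """basepage_cmd holds the 16 bytes starting at basepage offset $80."""
    count = basepage_cmd[0]                      # first byte = character count
    input_line = bytearray(basepage_cmd[:16])    # copy 1: count + program name
    output_line = bytearray(basepage_cmd[:16])   # copy 2: count + filename

    # The carriage return sits at offset count + 1; overwrite it with a NULL.
    input_line[count + 1] = 0
    output_line[count + 1] = 0

    # Overwrite the last three characters of the copy with "DAT".
    output_line[count - 2] = ord("D")
    output_line[count - 1] = ord("A")
    output_line[count] = ord("T")

    # Starting at byte 1 skips the count byte, which is what the
    # program_name and filename labels do in the bss section.
    program_name = input_line[1:count + 1].decode("ascii")
    filename = output_line[1:count + 1].decode("ascii")
    return program_name, filename


# Same bytes as the first 16 bytes of CMD_TEST's "salt" string.
salt = b"\x0bPRG_5AP.TOS\r\x00\x00\x00"
name, datfile = process_command_line(salt)
print(name, datfile)
```

The offsets count - 2, count - 1 and count correspond to the -2, -1 and 0 displacements used with the indexed addressing mode in insert_filename_suffix.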
Program 19 serves as the final model to be considered
in the development of SPEEDTST.TTP. Within it, a new
command line processing algorithm is developed and the user
declared stack is discarded. As in the two previous models,
the command line algorithm must prepare two strings: one to
be used as the name of a disk file, the other to be passed
as a parameter when GEMDOS $4B is invoked. The latter
string is also used as part of the utility's header. The
algorithm in this model takes advantage of the presence of
the string in the command line to eliminate movement to
prepare the latter string. Instead, the string is altered
in place, at its location in the basepage. The movement
which is required is a prerequisite for preparation of the
filename string.
Program 19. The final program model in SPEEDTST.TTP's
development.
; Program Name: SPEED_3.S
; Version 1.004
; NOTE: This program is similar to SPEED_2. The differences are that a
; different algorithm is used to process the command line, and no user
; stack is declared in SPEED_3.
; Assembly Instructions:
; Assemble in "PC-relative" mode and save with a TTP extension.
; Function:
; Spawn a process and calculate the spawned program's load and execution
; times. Pause for a keypress before terminating.
release_excess_memory:
lea program_end, a0 ; Put "end of program" address in A0.
movea.l 4(a7), a1 ; Put "basepage" address in A1.
lea $80(a1), a3 ; Put "command line" address in A3.
trap #6 ; Calculate program size and release memory.
; NOTE: A local stack is not declared in PRG_5AP.TOS. Because of the long
; string that is printed by that program, this program will bomb when
; it spawns PRG_5AP.TOS, if a local stack is not declared here.
lea stack, a7 ; Point A7 to this program's stack.
; COMMAND LINE PROCESSING NOTE
; At this point register A3 contains the address of the command line.
; In the algorithm below, the address of the first ASCII character in the
; command line input is stored at the pointer "program_name". Then a NULL
; character is written over the carriage return code at the end of the
; command line input. Thus the command line input itself becomes the
; string, the address of which must be pushed on the stack during the p_exec
; invocation.
; Even though register A3 contains the address of the program name string,
; and the contents of A3 can be pushed during the p_exec invocation, the
; address of the string must be stored in a declared location because
; register A3 might be used by the spawned program. And the address of the
; string is still needed to print the spawned program's name in SPEED_3's
; output heading.
process_command_line:
lea command_line, a4 ; Fetch location to contain command line.
movem.l (a3), d0-d3 ; Move 16 bytes of command line to 4 registers.
movem.l d0-d3, (a4) ; Move them to address "command_line".
move.b (a3)+, d0 ; Fetch parameter line input character count.
ext.w d0 ; Extend to word for next instruction.
move.b #0, 1(a4,d0.w) ; Store a null at end of string.
lea program_name, a0 ; Fetch address of pointer to program name.
move.l a3, (a0) ; Store address of program name in pointer.
move.b #0, 0(a3,d0.w) ; Replace $0D at end of program name with NULL.
insert_filename_suffix:
move.b #$44, -2(a4,d0.w) ; Insert letter 'D'.
move.b #$41, -1(a4,d0.w) ; Insert letter 'A'.
move.b #$54, 0(a4,d0.w) ; Insert letter 'T'.
create_file:
move.w #0, -(sp) ; File attribute = read/write.
pea filename ; Will be name of spawned process + .DAT.
move.w #$3C, -(sp) ; Function = f_create = GEMDOS $3C.
trap #1 ; File handle is returned in D0.
addq.l #8, sp
lea file_handle, a0
move.w d0, (a0)
redirect_output: ; Exchange file handle with screen's handle.
move.w file_handle, -(sp) ; This is the disk file's handle.
move.w #1, -(sp) ; This is the video screen's handle.
move.w #$46, -(sp) ; Function = f_force = GEMDOS $46.
trap #1
addq.l #6, sp
prepare_stack_for_load_and_execute_program:
pea environ_string
pea command_string
pea (a3) ; Push address of program name string.
move.w #0, -(sp)
move.w #$4B, -(sp) ; Function = GEMDOS $4B = p_exec.
get_start_time:
lea start_time, a3 ; Fetch address of variable "start_time".
trap #3 ; Returns value of system clock in D0.
move.w d0, (a3) ; Save start time.
load_and_execute_program:
trap #1
move.w d0, d3 ; Copy after-load value to D3 for calculation.
get_end_time:
trap #3 ; Returns value of system clock in D0.
move.w d0, d5 ; Copy to D5 for calculation.
sub.w d3, d5 ; Subtract after-load time from end time.
ext.l d5 ; Extend to 32 bits.
reposition_stack_pointer:
lea $10(sp), sp
get_drive:
move.w #$19, -(sp) ; Function = dgetdrv = GEMDOS $19.
trap #1
addq.l #2, sp
add.b #'A', d0
lea drive, a0
move.b d0, (a0)
print_heading:
lea heading, a0
bsr print_string
lea program_name, a0 ; Fetch address of program name string.
movea.l (a0), a0
bsr print_string ; Print spawned program's name.
print_drive_for_spawned_program:
lea drive_msg, a0 ; Print drive from which spawned program was
bsr print_string ; loaded.
compute_load_time:
lea load_time_msg, a0
bsr print_string
lea start_time, a3
sub.w (a3), d3 ; Subtract start time from after-load time.
ext.l d3 ; Extend to 32 bits.
trap #9 ; See description in TRAPS.S.
close_file:
move.w file_handle, -(sp)
move.w #$3E, -(sp) ; Function = fclose = GEMDOS $3E.
trap #1
addq.l #4, sp
terminate:
move.w #0, -(sp)
trap #1
print_string: ; Expects address of string to be in A0.
move.l a0, -(sp) ; Push address of string onto stack.
move.w #9, -(sp) ; Function = c_conws = GEMDOS $9.
trap #1 ; GEMDOS call
addq.l #6, sp ; Reset stack pointer to top of stack.
rts
data
heading: dc.b $D,$A,"SPEED_3.TTP Execution Results",$D,$A
dc.b "for ",0
drive_msg: dc.b ", loaded from drive: "
drive: dc.b "A",$D,$A,0
load_time_msg: dc.b $D,$A," Load time: ",0
environ_string: dc.b "TERM",0
command_string: dc.b 0
align
bss
start_time: ds.w 1
program_name: ds.l 1 ; Pointer to string in basepage command line.
file_handle: ds.w 1
command_line: ds.b 1 ; Unused character count will go here.
filename: ds.b 15 ; File name for redirected output.
ds.l 96 ; Program stack.
stack: ds.l 0 ; Address of program stack.
program_end: ds.l 0
end
SPEED_3.TTP Execution Results
PRG_5AP.TOS Execution Results
When executed from the desktop, this program will print this string on the
video screen and pause for a keypress. But, when this program is spawned by
SPEED_1, SPEED_2, SPEED_3 or SPEEDTST, the string will be stored in a file
named PRG_5AP.DAT and the program will not pause for a keypress.
SPEED_3.TTP Execution Results
for PRG_5AP.TOS, loaded from drive: G
Load time: 40 milliseconds
Execute time: 685 milliseconds
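The distinguishing feature of SPEED_3 is that the program name is never copied: it is terminated in place in the basepage, and only the filename needs a copy. The following Python sketch models that idea under stated assumptions. The bytearray named memory stands in for the basepage, the integer cmd for the address $80, and all function names are hypothetical.

```python
# Hypothetical model of SPEED_3's in-place command line processing.

def process_in_place(memory: bytearray, cmd: int):
    count = memory[cmd]                              # character count byte
    filename_buf = bytearray(memory[cmd:cmd + 16])   # the one required copy
    filename_buf[count + 1] = 0                      # NULL over the copied CR
    name_ptr = cmd + 1                               # like A3 after (a3)+
    memory[name_ptr + count] = 0                     # NULL over the CR, in place
    filename_buf[count - 2:count + 1] = b"DAT"       # replace the extension
    return name_ptr, filename_buf


def c_string(buf, start):
    """Read a NULL-terminated ASCII string beginning at index start."""
    end = buf.index(0, start)
    return bytes(buf[start:end]).decode("ascii")


memory = bytearray(256)                              # stand-in for the basepage
memory[0x80:0x80 + 13] = b"\x0bPRG_5AP.TOS\r"
name_ptr, filename_buf = process_in_place(memory, 0x80)
program_name = c_string(memory, name_ptr)
filename = c_string(filename_buf, 1)
print(program_name, filename)
```

Only name_ptr has to be saved, which mirrors the single longword reserved for program_name in SPEED_3's bss section.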
Each of the three programs, SPEED_1, SPEED_2 and
SPEED_3, is a model which focuses attention on a particular
phase of a continuous development cycle. It would be very
difficult, if not impossible, to pause as each instruction
of each algorithm is chosen in order to describe the
creative processes which instigate the choice. Furthermore,
the algorithmic development process is rhythmically
recursive. At intervals, the duration of which is dictated
by personal education and experience, the programmer is
drawn back to the beginning of the process to verify what
has been done and, perhaps, to refine portions of the
product.
The final stage of the development process involves an
assimilation of the best features of the three programs into
a utility that is as fast as possible, while consuming
the minimum requisite memory. In order to choose the best
command line processing algorithm from the three that were
introduced in the models, a comparison of their relative
speeds and requisite memory is needed. Program 20 was
written to perform that chore.
Program 20. This program was used to compare the speeds of
the command line processing algorithms used in programs 16,
18 and 19.
; Program Name: CMD_TEST.S
; Version 1.004
; Assembly Instructions:
; Assemble in "PC-relative" mode and save with a TOS extension.
; Execution Instructions:
; Execute program CMD_TEST.TOS from the desktop. After reading the
; program's output on the screen, terminate execution by pressing the
; Return key.
; Function:
; This program is used to compare the relative speed of the command
; line processing algorithms used in SPEED_1, SPEED_2 and SPEED_3.
; Description:
; Three command line processing algorithms are executed 10,000 times.
; The elapsed time and requisite memory for each algorithm are printed to
; the screen. So that this program need not be executed as a TTP program,
; the command line is salted with a declared string.
release_excess_memory:
lea program_end, a0 ; Put "end of program" address in A0.
movea.l 4(a7), a1 ; Put "basepage" address in A1.
movea.l a1, a5 ; Copy to A5 for command line access.
trap #6 ; Calculate program size and release memory.
lea stack, a7 ; Point A7 to this program's stack.
mainline:
lea heading, a0
bsr print_string
salt_command_line:
lea salt, a0 ; Fetch pointer to ersatz command line input.
movem.l (a0), d0-d3 ; Move it to registers.
movem.l d0-d3, $80(a5) ; Copy to actual command line address.
speed_1_algorithm:
lea speed_1_msg, a0
bsr print_string
move.l #9999, d4 ; Initialize counter for 10000 executions.
trap #3 ; Get start time.
move.l d0, d5 ; Copy for calculations.
speed_1_loop:
lea $80(a5), a4 ; Fetch address of parameters.
move.b (a4)+, d0 ; Fetch parameter line character count.
lea program_name_1, a3 ; Load buffer address.
subq.b #1, d0 ; Set up counter.
ext.w d0 ; Extend to match the size of the dbra
; instruction.
fetch_character:
move.b (a4)+, (a3)+ ; Store character.
dbra d0, fetch_character ; Loop until D0 becomes negative.
move.b #0, (a3) ; Finish with a NULL.
create_file_name: ; Create a file to accept standard output.
lea filename_1, a4 ; Load buffer address.
lea program_name_1, a3 ; Load buffer address.
copy_name:
move.b (a3)+, (a4)+
cmpi.b #$2E, (a3) ; Is next byte of program_name the period?
bne.s copy_name ; Continue looping until period is seen.
move.b #$2E, (a4)+ ; Add a period.
move.b #$44, (a4)+ ; Add letter 'D'.
move.b #$41, (a4)+ ; Add letter 'A'.
move.b #$54, (a4)+ ; Add letter 'T'.
move.b #0, (a4) ; Add a NULL.
speed_1_memory:
dbra d4, speed_1_loop ; Loop until D4 becomes negative.
trap #3 ; Get end time.
bsr convert_and_print_time
speed_2_algorithm:
lea speed_2_msg, a0
bsr print_string
move.l #9999, d4 ; Initialize counter for 10000 executions.
trap #3 ; Get start time.
move.l d0, d5 ; Copy for calculations.
speed_2_loop:
lea input_line, a3 ; Fetch location to contain command line.
lea output_line_2, a4 ; A second location: for filename.
movem.l $80(a5), d0-d3 ; Move 16 bytes of command line to 4 registers.
movem.l d0-d3, (a3) ; Move them to address "input_line".
movem.l d0-d3, (a4) ; Move them to address "output_line_2".
move.b $80(a5), d0 ; Fetch command line ASCII character count.
ext.w d0 ; Extend to word for next instruction.
move.b #0, 1(a3,d0.w) ; Store a null at end of command line input.
move.b #0, 1(a4,d0.w) ; Same for filename buffer.
insert_filename_suffix:
move.b #$44, -2(a4,d0.w) ; Insert letter 'D'.
move.b #$41, -1(a4,d0.w) ; Insert letter 'A'.
move.b #$54, 0(a4,d0.w) ; Insert letter 'T'.
speed_2_memory:
dbra d4, speed_2_loop ; Loop until D4 becomes negative.
trap #3 ; Get end time.
bsr convert_and_print_time
speed_3_algorithm:
lea speed_3_msg, a0
bsr print_string
move.l #9999, d4 ; Initialize counter for 10000 executions.
lea $80(a5), a5 ; Fetch command line address.
trap #3 ; Get start time.
move.l d0, d5 ; Copy for calculations.
speed_3_loop:
; NOTE: The first instruction, below, is not used in the actual SPEED_3
; algorithm, but it must be included here to reset A3 to the
; correct address each time through the loop. This instruction
; adds 4 clock periods per loop, 40000 clock periods for the
; 10000 loops, which is 5 milliseconds. The accuracy of this error
; calculation was confirmed by executing CMD_TEST.TOS with and
; without the instruction in the loop. The 5 msec error is equal to
; one system clock tick, therefore, when the loop end-time is obtained
; with the trap #3 invocation, 1 clock tick is subtracted before the
; loop time is calculated.
; The memory occupied by this instruction is not included in the
; value reported for the algorithm's requisite memory.
movea.l a5, a3
start_memory:
lea output_line_3, a4 ; Fetch location to contain command line.
movem.l (a3), d0-d3 ; Move 16 bytes of command line to 4 registers.
movem.l d0-d3, (a4) ; Move them to address "output_line_3".
move.b (a3)+, d0 ; Fetch command line ASCII character count.
ext.w d0 ; Extend to word for next instruction.
move.b #0, 1(a4,d0.w) ; Store a null at end of string.
lea program_name_ptr, a0 ; Fetch address of pointer to program name.
move.l a3, (a0) ; Store address of program name string in pointer.
move.b #0, 0(a3,d0.w) ; Replace $0D at end of program name with NULL.
_insert_filename_suffix:
move.b #$44, -2(a4,d0.w) ; Insert letter 'D'.
move.b #$41, -1(a4,d0.w) ; Insert letter 'A'.
move.b #$54, 0(a4,d0.w) ; Insert letter 'T'.
speed_3_memory:
dbra d4, speed_3_loop
trap #3
subq.w #1, d0 ; Subtract 1 clock tick to correct time.
bsr.s convert_and_print_time
speed_1_requisite_memory:
lea speed_1_memory_msg, a0
bsr.s print_string
lea speed_1_loop, a1 ; Calculate number of bytes occupied by the
lea speed_1_memory, a0 ; instructions in the loop, then print.
bsr.s calculate_and_print_requisite_memory
speed_2_requisite_memory:
lea speed_2_memory_msg, a0
bsr.s print_string
lea speed_2_loop, a1 ; Calculate number of bytes occupied by the
lea speed_2_memory, a0 ; instructions in the loop, then print.
bsr.s calculate_and_print_requisite_memory
speed_3_requisite_memory:
lea speed_3_memory_msg, a0
bsr print_string
lea start_memory, a1 ; Calculate number of bytes occupied by the
lea speed_3_memory, a0 ; instructions in the loop, then print.
bsr.s calculate_and_print_requisite_memory
wait_for_keypress:
move.w #8, -(sp) ; Function = c_necin = GEMDOS $8.
trap #1 ; GEMDOS call.
addq.l #2, sp ; Reposition stack pointer at top of stack.
terminate:
move.w #0, -(sp)
trap #1
print_string: ; Expects address of string to be in A0.
pea (a0) ; Push address of string onto stack.
move.w #9, -(sp) ; Function = c_conws = GEMDOS $9.
trap #1 ; GEMDOS call
addq.l #6, sp ; Reset stack pointer to top of stack.
rts
convert_and_print_time:
sub.l d5, d0 ; Subtract start time from end time.
mulu #5, d0 ; Convert to milliseconds.
move.l d0, d1 ; Convert to ASCII decimal.
trap #4
bsr print_string
lea time_label, a0
bsr print_string
rts
calculate_and_print_requisite_memory:
suba.l a1, a0
move.l a0, d1 ; Transfer requisite memory for trap call.
print_speed_1_requisite_memory:
trap #4 ; Returns address of decimal string in A0.
bsr print_string
lea memory_label, a0
bsr print_string
rts
data
salt: dc.b $B,"PRG_5AP.TOS",$D,0,0,0,0
heading: dc.b $D,$A,"CMD_TEST Execution Results",$D,$A,$D,$A,0
speed_1_msg: dc.b " SPEED_1 algorithm time: ",0
speed_2_msg: dc.b " SPEED_2 algorithm time: ",0
speed_3_msg: dc.b " SPEED_3 algorithm time: ",0
time_label: dc.b " milliseconds",$D,$A,0
speed_1_memory_msg: dc.b $D,$A," SPEED_1 algorithm requisite memory: ",0
speed_2_memory_msg: dc.b " SPEED_2 algorithm requisite memory: ",0
speed_3_memory_msg: dc.b " SPEED_3 algorithm requisite memory: ",0
memory_label: dc.b " bytes",$D,$A,0
align
bss
program_name_1: ds.l 4 ; Program name buffer for SPEED_1 algorithm.
filename_1: ds.l 4 ; Filename buffer for SPEED_1 algorithm.
input_line: ds.b 1 ; Command line buffer for SPEED_2 algorithm.
program_name_2: ds.b 15 ; Program name buffer for SPEED_2 algorithm.
output_line_2: ds.b 1 ; Second command line buffer for SPEED_2.
filename_2: ds.b 15 ; Filename buffer for SPEED_2 algorithm.
program_name_ptr: ds.l 1 ; Pointer to program name in command line for SPEED_3.
output_line_3: ds.b 1 ; Command line buffer for SPEED_3 algorithm.
filename_3: ds.b 15 ; Filename buffer for SPEED_3 algorithm.
ds.l 96 ; Program stack.
stack: ds.l 0 ; Address of program stack.
program_end: ds.l 0
end
CMD_TEST Execution Results
SPEED_1 algorithm time: 830 milliseconds
SPEED_2 algorithm time: 350 milliseconds
SPEED_3 algorithm time: 300 milliseconds
SPEED_1 algorithm requisite memory: 60 bytes
SPEED_2 algorithm requisite memory: 58 bytes
SPEED_3 algorithm requisite memory: 52 bytes
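The one-tick correction applied to the SPEED_3 measurement rests on a short piece of arithmetic stated in the listing's note: 4 clock periods per pass, 10,000 passes, an 8 MHz clock and a 5 millisecond system clock tick. A small Python check of that arithmetic, with hypothetical names, might look like this:

```python
CLOCK_HZ = 8_000_000    # ST system clock frequency
TICK_MS = 5             # one system clock tick, per the note in CMD_TEST.S
PASSES = 10_000         # loop executions in CMD_TEST


def clocks_to_ms(clocks):
    """Convert a number of clock periods to milliseconds."""
    return clocks * 1000 / CLOCK_HZ


extra_clocks = 4 * PASSES                 # movea.l a5, a3 costs 4 clocks per pass
extra_ms = clocks_to_ms(extra_clocks)     # time added by that instruction
extra_ticks = extra_ms / TICK_MS          # the same time in system clock ticks
print(extra_clocks, extra_ms, extra_ticks)
```

The result agrees with the note: the added instruction accounts for 40,000 clock periods, which is 5 milliseconds, or exactly one tick.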
Authenticating The Results
Because the final configuration of the utility will
depend primarily on the results displayed by CMD_TEST.TOS,
the validity of those results must be beyond question. I
have used three validation techniques. First, I single-
stepped through each instruction. Then I verified the data
written by the program to its basepage command line and bss
section. Finally, I compared the execution times reported
to values calculated using the Motorola Programmer's
Reference Manual.
Figure 5.1 is a partial disassembly of program 20 as it
was in memory after execution. There you can see that the
basepage command line contains the salt data and has been
altered as specified by the SPEED_3 command line processing
algorithm. To wit: the carriage return has been replaced by
a NULL. Also evident are the strings stored in the bss
segment by the three algorithms. Table 5.1 lists the
relevant declared variables, their lengths and their
addresses in the disassembly listing.
Table 5.1 Match the variable names listed and their
addresses to the data shown in the disassembly listing.
Variable             Length      Address
program_name_1:      ds.l 4      $0919C8
filename_1:          ds.l 4      $0919D8
input_line:          ds.b 1      $0919E8
program_name_2:      ds.b 15     $0919E9
output_line_2:       ds.b 1      $0919F8
filename_2:          ds.b 15     $0919F9
program_name_ptr:    ds.l 1      $091A08
output_line_3:       ds.b 1      $091A0C
filename_3:          ds.b 15     $091A0D
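The addresses in Table 5.1 can be reproduced by adding each variable's length to the address of the one before it. The Python sketch below does that, starting from $0919C8. It assumes that program_name_ptr occupies a single longword (4 bytes), which is the size the listed addresses imply; the function name is hypothetical.

```python
# (name, size in bytes) in declaration order.
# program_name_ptr is assumed to be one longword (4 bytes).
layout = [
    ("program_name_1", 16),
    ("filename_1", 16),
    ("input_line", 1),
    ("program_name_2", 15),
    ("output_line_2", 1),
    ("filename_2", 15),
    ("program_name_ptr", 4),
    ("output_line_3", 1),
    ("filename_3", 15),
]


def assign_addresses(base, items):
    """Give each variable the next free address after its predecessor."""
    addresses = {}
    addr = base
    for name, size in items:
        addresses[name] = addr
        addr += size
    return addresses


addresses = assign_addresses(0x0919C8, layout)
for name, addr in addresses.items():
    print(f"{name:18s} ${addr:06X}")
```

Note that output_line_3 lands on an even address, which the movem.l instruction in the SPEED_3 algorithm requires.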
Figure 5.1. Partial disassembly of CMD_TEST.TOS after
execution, showing the basepage command line, and the
command line relevant portion of the bss section.
Table 5.2 lists the instructions used in each of the
command line processing algorithms and their required
execution clock periods as specified in the Motorola manual.
I want to give you the values I calculated for two reasons:
first, to show you that it can be done from the tables in
the Motorola guide; second, to serve as verification that
program 20 performs its task accurately. If you desire, you
can use this table to practice your interpretation of the
data in the Motorola tables. A short tutorial follows the
table.
Table 5.2 The instructions used in command line processing
algorithms of SPEED_1, SPEED_2 and SPEED_3.
Instruction Clock Periods
speed_1_loop:
lea $80(a5), a4 8
move.b (a4)+, d0 8
lea program_name_1, a3 8
subq.b #1, d0 4
ext.w d0 4
Total = sum of 5 = 32
fetch_character:
(There are 11 characters.)
move.b (a4)+, (a3)+ 12
dbra d0, fetch_character 10/14
Total = 11(12) + 10(10) + 14 = 246
move.b #0, (a3) 12
create_file_name:
lea filename_1, a4 8
lea program_name_1, a3 8
Total = sum of 3 = 28
copy_name:
(There are 7 characters.)
move.b (a3)+, (a4)+ 12
cmpi.b #$2E, (a3) 12
bne.s copy_name 10/8
Total = 7(24) + 6(10) + 8 = 236
move.b #$2E, (a4)+ 12
move.b #$44, (a4)+ 12
move.b #$41, (a4)+ 12
move.b #$54, (a4)+ 12
move.b #0, (a4) 12
Total = sum of 5 = 60
Algorithm total = 32 + 246 + 28 + 236 + 60 = 602
speed_1_memory:
dbra d4, speed_1_loop
speed_2_loop:
lea input_line, a3 8
lea output_line_2, a4 8
movem.l $80(a5), d0-d3 16+8(4) = 48
movem.l d0-d3, (a3) 8+8(4) = 40
movem.l d0-d3, (a4) 8+8(4) = 40
move.b $80(a5), d0 12
ext.w d0 4
move.b #0, 1(a3,d0.w) 18
move.b #0, 1(a4,d0.w) 18
insert_filename_suffix:
move.b #$44, -2(a4,d0.w) 18
move.b #$41, -1(a4,d0.w) 18
move.b #$54, 0(a4,d0.w) 18
Algorithm total = sum of 12 = 250
speed_2_memory:
dbra d4, speed_2_loop
speed_3_loop:
movea.l a5, a3 (not counted)
start_memory:
lea output_line_3, a4 8
movem.l (a3), d0-d3 12+8(4) = 44
movem.l d0-d3, (a4) 8+8(4) = 40
move.b (a3)+, d0 8
ext.w d0 4
move.b #0, 1(a4,d0.w) 18
lea program_name_ptr, a0 8
move.l a3, (a0) 12
move.b #0, 0(a3,d0.w) 18
_insert_filename_suffix:
move.b #$44, -2(a4,d0.w) 18
move.b #$41, -1(a4,d0.w) 18
move.b #$54, 0(a4,d0.w) 18
Algorithm total = sum of 12 = 214
speed_3_memory:
dbra d4, speed_3_loop
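The totals in Table 5.2 are easy to re-add by machine. The Python sketch below sums the clock periods exactly as they appear in the table and converts each total to a time for 10,000 passes at 8 MHz. The variable and function names are hypothetical. The conversion deliberately leaves out the dbra d4 instruction that closes each test loop and any other system overhead, so the computed times should be read as lower bounds rather than as predictions of the measured values.

```python
CLOCK_HZ = 8_000_000    # ST system clock frequency
PASSES = 10_000         # loop executions in CMD_TEST

speed_1 = (
    [8, 8, 8, 4, 4]                 # speed_1_loop setup instructions
    + [11 * 12 + 10 * 10 + 14]      # fetch_character loop, 11 characters
    + [12, 8, 8]                    # NULL store and two lea instructions
    + [7 * 24 + 6 * 10 + 8]         # copy_name loop, 7 characters
    + [12] * 5                      # ".DAT" plus the terminating NULL
)
speed_2 = [8, 8, 48, 40, 40, 12, 4] + [18] * 5
speed_3 = [8, 44, 40, 8, 4, 18, 8, 12, 18] + [18] * 3


def loop_ms(clocks_per_pass):
    """Time in milliseconds for PASSES executions of the algorithm alone."""
    return clocks_per_pass * PASSES * 1000 / CLOCK_HZ


totals = {
    "SPEED_1": sum(speed_1),
    "SPEED_2": sum(speed_2),
    "SPEED_3": sum(speed_3),
}
for label, total in totals.items():
    print(label, total, "clock periods,", loop_ms(total), "ms")
```

The sums match the algorithm totals given in the table: 602, 250 and 214 clock periods.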
Instruction Execution Times Tutorial
In my copy of the M68000 Programmer's Reference Manual,
which may not be the same as yours, MC68000 instruction
execution times are presented in Appendix D. Times for the
MC68008 are presented in Appendix E, and those for the
MC68010/MC68012 are in Appendix F. I am pointing out the
locations for the other processors so that you can avoid
them. When you are looking for MC68000 times, make sure
that you are doing so in Appendix D.
The Introduction to the appendix contains information
concerning wait states that is not applicable to the Atari
ST. The only things in the introduction that concern us
are the notes stating that the instruction execution times
are given in terms of external (system) clock periods and
that the number of periods includes instruction fetch and
all applicable operand fetches and stores. The ST's clock
period is 1 divided by 8,000,000 = 0.000000125 second = 1.25
x 10^-7 sec (125 nanoseconds), because the system operates
with an 8 megahertz (MHz) clock.
The first table, D-1, lists the Effective Address
Calculation Times for the addressing modes. This table is
one to which you must refer back when so directed by other
tables. Reference is made to this table via a + sign
following the number of clock periods given for a particular
instruction. The reference indicates that you should go
back to table D-1, fetch the appropriate time for the
appropriate addressing mode and data length (byte, word or
long) and add that time to the number of clock periods
preceding the + sign.
The other tables list base times for the instructions;
I say base because of the need to add an effective address
time for many instructions. The tables are arranged so that
data is presented for groups of similar instructions. You
use these tables as follows:
1. Find the table which lists the instruction of interest.
2. If a source operand is involved, and there are rows of
   source operands, locate the row specified by the source
   operand.
3. If a destination operand is involved, and there are
   columns of destination operands, locate the column
   specified by the destination operand.
4. Locate the data at the row-column intersection.
5. If a + sign follows the data, go back to table D-1, fetch
   the effective address time and add it to the data.
When you have done all of that, you will have the
instruction's execution time in clock periods. Not all
instructions contain both a source operand and a destination
operand. Not all of the tables explicitly reference both
operand types. Not all tables list destination operands in
columns; some of them list source operands in columns.
Therefore, your mind must be on what you are doing when you
are reading the tables. For example, times for ADD/ADDA,
AND, CMP/CMPA, DIVS, DIVU, EOR, MULS, MULU, OR and SUB are
listed in table D-4. Following each time given is a + sign;
therefore, an effective address time from table D-1 is
needed for each item in the table.
You might ask, "Does the data in table D-4 pertain to
source operands or destination operands? Does the reference
to table D-1 pertain to source operands or destination
operands?" The answer to both questions is "Yes," because
these instructions can be written so that the effective
addresses which head the columned data in table D-4 can be
either source or destination operands. To see what I mean,
look at the Assembler Syntax for the ADD instruction. There
you see the following notation:
ADD <ea>,Dn
ADD Dn,<ea>.
The reason that Motorola's use of the term effective
address is confusing is that, in their manual, all
addressing modes are discussed as if the locations specified
in operands are somehow implied by operand format. It seems
as though the authors of the manual had originally intended
that the term effective address be used to indicate a
location specified by an operand to be ultimately found in
memory external to the processor, in contrast to processor
registers, which are internal addresses.
But, in fact, even when discussing Register Direct
Modes, the manual states, "These effective addressing modes
specify that the operand is in one of the 16 multifunction
registers." So, I say, let it all be effective addresses,
as the authors apparently decided to do. But then the
descriptive "effective" is redundant, and it renders the
instruction "add effective address calculation time," which
is indicated by the + sign, ineffective.
What that instruction should instruct one to do is
this: for the appropriate operand, add the additional time
indicated in table D-1 for the appropriate addressing mode.
Of course, one must determine the appropriate operand and
the appropriate addressing mode. But this must be done
regardless of terminology. However, the manual does not
make that clear, nor does it indicate the manner in which it
can be accomplished. I shall.
Using instructions selected from those listed in table
5.2, I will conclude this tutorial by showing you how I
obtained the execution time for those instructions; then I
will show you how to obtain the time for at least one
instruction listed in the Motorola tables which none of the
instructions in table 5.2 access. I think that the
exploration will be sufficiently comprehensive.
lea input_line, a3
The lea instruction is found in table D-10, JMP, JSR,
LEA, PEA, and MOVEM Instruction Execution Times. For all of
these instructions, the destination operand is implicit: for
JMP and JSR the destination is the program counter (PC); for
LEA the destination is an address register; for PEA the
destination is a stack; for MOVEM (M->R, memory to
registers) the destination is a register group; for MOVEM
(R->M, registers to memory) the destination is a group of
memory addresses. This means that the columns containing
times in the table refer to source operands.
The source operand for lea input_line, a3 is a label;
therefore, the addressing mode used might seem to be
absolute, but the program in which the instruction is used
was assembled in AssemPro's PC-relative mode. Therefore,
the addressing mode is program counter with displacement.
The execution time for the instruction is found where the
LEA row intersects with the d16(PC) column. The time is 8
clock periods.
Eight clock periods translates to 8(.000000125 sec) =
.000001 sec = 1 microsecond = .001 msec. As you can see,
this is a very short period of time. It is not possible to
measure times that are this short with a clock that has a
resolution of 5 msec. That's why it is necessary to execute
instructions and entire algorithms within loops that extend
the time period being measured. A time period being
measured with the system clock should be sufficiently long
to render the 5 msec resolution of the clock insignificant.
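To get a feel for how long such a loop must be, the following
Python sketch estimates the number of repetitions needed
before one 5 msec tick shrinks to a chosen fraction of the
total time. The 1 percent target and the function name are
assumptions made for illustration, and loop overhead is
ignored.

```python
# Sketch: repetitions needed to make the timer resolution insignificant.
PERIODS_PER_USEC = 8     # 8 MHz system clock
TICK_USEC = 5000         # one tick of the 200 Hz timer = 5 msec

def repetitions_needed(periods_per_pass, error_percent=1):
    """Passes needed so one tick is at most error_percent of the total."""
    total_periods = TICK_USEC * PERIODS_PER_USEC * 100 // error_percent
    # Ceiling division, so the target is met rather than just missed.
    return -(-total_periods // periods_per_pass)

print(repetitions_needed(8))   # an 8-period instruction: 500000
```

Under these assumptions, an instruction that takes 1
microsecond must be repeated half a million times before a
single tick amounts to 1 percent of the measurement.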
Because the loops which execute the algorithms many
times contain branching overhead, it is easier to compare
relative execution times, instead of absolute execution
times, when performing the comparisons with computer
generated data. When absolute times are desired, it is
easier to compute them using the tables in the Motorola
manual.
lea $80(a5), a4
Here the addressing mode of the source operand is
address register indirect with displacement. The execution
time for the instruction is found at the point of
intersection specified by the LEA row and the d16(An)
column. The time is 8 clock periods.
movem.l $80(a5), d0-d3
The row labeled MOVEM M->R is specified for this
instruction. Furthermore, this row is divided into two
subrows: Word and Long. The instruction specifies a
longword operation, so the Long subrow must be used. The
source operand uses address register indirect with
displacement addressing = d16(An).
For this instruction, the data found at the
intersection of the specified row and column is not the
instruction execution time. Instead, there is a formula
from which the execution time must be calculated. The
parameter n specified by the formula is a variable for the
number of registers specified in the instruction. In this
case the transfer from memory is to use 4 registers. The
instruction execution time is 16 + 8(4) = 48 clock periods.
movem.l d0-d3, (a3)
Refer to the row labeled MOVEM R->M in the D-10 table.
The formula shown at the intersection of the Long subrow and
(An) column is similar to that for the MOVEM M->R
instruction. The instruction execution time is 8 + 8(4) =
40 clock periods.
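The two MOVEM formulas used above can be written out as a
short Python sketch. The function names are hypothetical, and
the constants are only the two table D-10 entries quoted in
the text, not the whole table.

```python
# Sketch: MOVEM long-word times for the two cases worked above.
def movem_long_mem_to_regs_d16(n):
    """MOVEM.L d16(An),<register list>: 16 + 8n clock periods."""
    return 16 + 8 * n

def movem_long_regs_to_mem_an(n):
    """MOVEM.L <register list>,(An): 8 + 8n clock periods."""
    return 8 + 8 * n

print(movem_long_mem_to_regs_d16(4))   # movem.l $80(a5), d0-d3: 48
print(movem_long_regs_to_mem_an(4))    # movem.l d0-d3, (a3): 40
```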
move.b (a4)+, d0
The execution times for move instructions are contained
in two tables. The first table, D-2 (Move Byte and Word
Instruction Execution Times), must be used for this
instruction because a byte operation is specified. The
addressing mode used by the source operand is address
register indirect with postincrement. That used in the
destination operand is data register direct. The
instruction time of 8 clock periods is found at the
intersection of the (An)+ row and the Dn column. Note that
these tables are used for MOVE and MOVEA instructions.
subq.b #1, d0
The table to use is D-5 (Immediate Instruction
Execution Times). All of the instructions in this table
require a source operand which uses the immediate data
addressing mode. The three columns in the table specify
permissible destination operands. In this case, the
instruction specifies data register direct. At the
intersection of the SUBQ row and op #, Dn column, for a byte
size operation, the time given is 4 clock periods.
ext.w d0
This instruction is found in table D-12 (Miscellaneous
Instruction Execution Times). Although there are two
subrows shown for the EXT row, the times for both are
identical. This instruction requires no source operand, and
the time is simply 4 clock periods.
dbra d0, fetch_character
The DBcc instruction is used to control loop exits.
Therefore, we are most often concerned with multiple
executions of the instruction and with a sum of execution
times. Also, the execution time of a single DBcc execution
depends on the state of the condition code register (CCR)
and the state of the loop counter when loop exit takes
place. Loop exit is forced when the DBcc condition code
becomes true or when the value in the counter becomes
negative.
Refer to table D-9 (Conditional Instruction Execution
Times). Note that the DBcc instruction is the only
instruction in the table for which the displacement between
the instruction and the destination does not affect the
execution time. Depending on the manual you are using, the
DBcc row may be divided into 2 or 3 subrows. Figure 5.4
shows the row divided into 3 subrows.
Figure 5.4. Subrows for the DBcc Instruction.
Displacement                    Branch Taken   Branch Not Taken
cc true                              -                12
cc false, Count Not Expired         10                 -
cc false, Counter Expired            -                14
The information contained in the second and third rows
can be combined so that only one row need be used to express
it. In that case, the second row would be:
cc false 10 14
This makes sense because when cc is false the branch can be
taken only if the count has not expired, while it cannot be
taken if the count has expired.
Except for the DBT instruction, which never branches
and never decrements, for any condition specified in a DBcc
instruction (for DBRA = DBF, the condition is always
false), a branch will be taken if the condition is false and
the value in the counter has not become negative, and the
execution time for the instruction will be 10 clock periods.
If the condition becomes true, a branch will not be taken,
and the execution time for the instruction will be 12 clock
periods, regardless of the value in the counter. If the
value in the counter becomes negative before the condition
becomes true, then a branch will not be taken, and the
execution time for the instruction will be 14 clock periods.
If exit from the loop takes place because the condition
becomes true, the DBcc instruction will be executed N + 1
times, and the sum of DBcc instruction execution times will
be (N)(10) + 12, where N is the number of branches which
actually took place, not the value stored in the counter.
For an initial counter value n, the sum of execution times
will be (n)(10) + 14 if exit from the loop takes place
because the counter becomes negative.
For the instruction being used as an example, n is
equal to one less than the number of characters in the
string being copied. There are 11 characters, so n equals
10 because the value in the counter must be one less than
the number of times the loop is to be executed. The
condition for the DBRA instruction is never true, so exit
from the loop can only take place when the value in the
counter becomes negative. The sum of execution times for
the instruction is (10)(10) + 14 = 114 clock periods.
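The DBcc sums described above can be checked with a small
Python sketch. The function and constant names are
hypothetical; the three times are those of figure 5.4.

```python
# Sketch: total clock periods spent in a loop-closing DBcc instruction.
DBCC_BRANCH_TAKEN = 10   # cc false, count not expired
DBCC_CC_TRUE = 12        # cc true, loop exits
DBCC_EXPIRED = 14        # cc false, counter expired, loop exits

def dbcc_total(branches_taken, exit_on_condition):
    """Sum the taken branches plus the single exiting execution."""
    final = DBCC_CC_TRUE if exit_on_condition else DBCC_EXPIRED
    return branches_taken * DBCC_BRANCH_TAKEN + final

# dbra d0, fetch_character with an initial counter value of 10.
print(dbcc_total(10, exit_on_condition=False))   # 114
```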
cmpi.b #$2E, (a3)
This instruction is found in table D-5 (Immediate
Instruction Execution Times). This table was discussed in
the section under subq.b #1, d0. The source operand must
use, and does use, the immediate data addressing mode.
Unlike that of the previously referenced instruction, the
destination operand of this one uses the address register
indirect addressing mode. And at the intersection of the
CMPI.B row and op #, M column, we find that the instruction
execution time of 8 clock periods is followed by a + sign.
The + sign indicates a reference to table D-1 (Effective
Address Calculation Times). But what value is it that we
seek there? Just under the heading for table D-5 is the
statement that implies this information. The statement
tells us that the time shown at the intersection is that
which is required to fetch the immediate operand.
We can deduce that the time we seek is that for the
addressing mode of the destination operand. In table D-1,
at the (An) row/byte size operation intersection we find the
value 4, which means that we must add 4 clock periods to the
8 shown in table D-5. Thus the instruction execution time
is 12 clock periods.
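The + sign rule amounts to a table lookup and an addition,
which the following Python sketch illustrates. The dictionary
holds the MC68000 effective address calculation times of
table D-1 as (byte or word, long) pairs; the mode labels and
the function name are hypothetical, and the values should be
verified against your own copy of the manual.

```python
# Sketch: add a table D-1 effective address time to a base time.
EA_TIME = {
    "Dn": (0, 0), "An": (0, 0),
    "(An)": (4, 8), "(An)+": (4, 8), "-(An)": (6, 10),
    "d16(An)": (8, 12), "d8(An,Xn)": (10, 14),
    "abs.W": (8, 12), "abs.L": (12, 16),
    "d16(PC)": (8, 12), "d8(PC,Xn)": (10, 14),
    "#imm": (4, 8),
}

def with_ea(base, mode, long_size=False):
    """Return base periods plus the D-1 time for the given mode."""
    return base + EA_TIME[mode][1 if long_size else 0]

print(with_ea(8, "(An)"))   # cmpi.b #$2E, (a3): 12
```

The same lookup serves any other table entry that carries a
+ sign.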
bne.s copy_name
The Bcc instruction is listed in table D-9, the same
table which lists the DBcc instruction. The Bcc instruction
also has a Branch Taken and a Branch Not Taken column; and
like the DBcc instruction, the Bcc instruction's execution
time depends on the state of the CCR; but unlike the DBcc
instruction, it also depends on the size of the displacement
between the instruction and the branch destination.
For the instruction being discussed, the displacement
is short = byte size. For a byte size displacement the
execution time is 10 clock periods for a branch taken, 8
clock periods for a branch not taken. There are two
instructions within the SPEED_1 copy_name loop, each of
which requires 12 clock periods per execution. The body of
the loop is executed 7 times, and the bne.s instruction is
executed 7 times. But the branch is taken only 6 times.
The sum of the Bcc instruction execution times will be 6(10)
+ 8 = 68 clock periods.
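As a sketch of the same bookkeeping in Python, assuming a
loop-closing short branch that is taken on every pass except
the last (the function name is hypothetical):

```python
# Sketch: total clock periods for a loop-closing Bcc.S instruction.
BCC_SHORT_TAKEN = 10       # byte displacement, branch taken
BCC_SHORT_NOT_TAKEN = 8    # byte displacement, branch not taken

def bcc_short_total(executions):
    """Branch is taken on all executions but the final one."""
    return (executions - 1) * BCC_SHORT_TAKEN + BCC_SHORT_NOT_TAKEN

print(bcc_short_total(7))   # bne.s copy_name executed 7 times: 68
```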
add.l d0, d5
Refer to table D-4, Standard Instruction Execution
Times. There are two subrows, labeled according to the size
of the operation. At the intersection of the Long subrow
and the op<ea>, Dn column, there is this notation:
6(1/0)+**. Referring to the notes under the table, we find
that the + means that we must fetch the address calculation
time for the source operand, and the ** means that the 6
must be increased to 8 if the addressing mode of the source
operand is register direct or immediate.
Well, the addressing mode of the source operand is
register direct, so the 6 becomes 8. Glancing back at table
D-1, we see that the address calculation time for the
register direct addressing mode is 0. Therefore, the
execution time for the instruction is simply 8 clock
periods.
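The ** note can be expressed as a small rule, sketched here in
Python. The function name and mode labels are hypothetical;
the address calculation time is passed in by the caller after
looking it up in table D-1.

```python
# Sketch: ADD.L <ea>,Dn timing, including the ** adjustment.
def add_long_to_dn(source_mode, ea_time):
    """Base is 6, raised to 8 for register direct or immediate sources."""
    base = 8 if source_mode in ("Dn", "An", "#imm") else 6
    return base + ea_time

print(add_long_to_dn("Dn", 0))     # add.l d0, d5: 8
print(add_long_to_dn("(An)", 8))   # a memory source, for contrast: 14
```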
asl.l #2, d5
The execution times for SHIFT and ROTATE instructions
are listed in table D-7. Using the formula shown at the
intersection of the ASL instruction's Long subrow and the
Register column, the calculated execution time for the
example is 8 + 2(2) = 12 clock periods. Here I have
replaced n with the immediate value of the source operand.
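The register shift formula can be sketched in Python as
follows. The Long formula is the one used above; the 6 + 2n
figure for byte and word operations is the corresponding
MC68000 entry. The function name is hypothetical.

```python
# Sketch: register shift/rotate time as a function of the shift count.
def shift_register_time(count, long_size):
    """8 + 2n periods for long operations, 6 + 2n for byte or word."""
    base = 8 if long_size else 6
    return base + 2 * count

print(shift_register_time(2, long_size=True))    # long shift by 2: 12
print(shift_register_time(2, long_size=False))   # word shift by 2: 10
```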
seq (a0)
Refer to table D-6, Single Operand Instruction
Execution Times. The Scc instruction row is divided into
two subrows labeled Byte, False and Byte, True. So we see
that, if the addressing mode of the operand is register
direct, the execution time depends on the state of the
condition code, which is eq in the example.
For all other modes, the execution time is 8 clock
periods plus the address calculation time obtained from
table D-1. The example operand's addressing mode is address
register indirect, and in table D-1 the address calculation
time for that mode is 4 clock periods for a byte size
operation. The instruction execution time is 8 + 4 = 12
clock periods.
bset #5, (sp)
Table D-8 lists the execution times for the Bit
Manipulation instructions. For all of the instructions
listed in the table, the bit to be manipulated is specified
by the source operand; the location of the bit to be
manipulated is specified by the destination operand.
There are two major columns in this table: Dynamic and
Static. The Dynamic major column is used if the number of
the bit to be manipulated is specified with the contents of
a register; the Static major column is used if the number of
the bit to be manipulated is specified with immediate data,
such as shown in the example.
Each of the major columns is composed of two minor
columns. A Register minor column is used if the bit to be
manipulated resides in a register, a Memory minor column is
used if the bit to be manipulated resides in memory external
to the processor. The bit to be manipulated in the example
resides in a stack, which is memory external to the
processor.
An operation size indicator for any of the instructions
shown in this table would be redundant because the size of
the operation must be long if the bit to be manipulated
resides in a register and it must be byte otherwise. So at
the intersection of BSET's Byte subrow and Static-Memory
column we find the notation: 12(2/1)+. Fetching the address
calculation time for (sp) = (An) from table D-1, which is 4
for a byte size operation, and adding it to the 12, we
calculate the example instruction time as 16 clock periods.
This concludes my Instruction Execution Times Tutorial.
I have not dealt with table D-13, which lists the single
instruction MOVEP, because this instruction is a little
tricky. I will use this instruction in a later chapter, and
I hope to remember to discuss its execution time then.
Neither have I dealt with table D-14, which lists Exception
Processing Execution Times, because they are so easily
derived. For example, the execution time for any trap #n
instruction is simply 34 clock periods.
Execution Speed Ratios
The execution speed ratios of figure 5.2 are obtained
from the results of one execution of CMD_TEST.TOS. On
subsequent executions the results for SPEED_1 were sometimes
835, and the results for SPEED_3 were sometimes 305. At
times, both differences appeared simultaneously. These
differences for multiple executions are to be expected
because the system variable _hz_200 (memory location $4BA)
is incremented only 200 times per second (200 Hz), which
means that the period between increments is 1/200 = .005
second = 5 milliseconds.
This means that the variable measures time with a resolution
of 5 milliseconds (msec).
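A short Python sketch shows why a reading can only move in
steps of 5 msec. The tick values are hypothetical examples,
not measured data.

```python
# Sketch: elapsed time from two readings of the 200 Hz counter.
MS_PER_TICK = 1000 // 200    # 5 msec per increment of _hz_200

def elapsed_msec(start_ticks, end_ticks):
    """Convert a tick difference to milliseconds."""
    return (end_ticks - start_ticks) * MS_PER_TICK

print(elapsed_msec(1000, 1166))   # 166 ticks: 830
print(elapsed_msec(1000, 1167))   # one tick more: 835
```

A run that happens to straddle one extra tick is reported as
835 instead of 830, even though the work done is identical.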
Unexpectedly, the time for SPEED_2 rarely varied. At
first, that made me wonder if I had made an error in its
algorithm as it is in CMD_TEST.TOS, but I have checked
extensively and found nothing wrong. However, I mention my
concern, just so you'll know, although it does not really
affect the decision concerning which algorithm to choose for
SPEEDTST.TTP.
Figure 5.2. CMD_TEST.TOS execution speed ratios. As you can
see, SPEED_2's command line processing algorithm is about
2.37 times faster than SPEED_1's, while SPEED_3's is about
2.77 times faster.
SPEED_1   830
------- = --- = 2.37
SPEED_2   350

SPEED_1   830
------- = --- = 2.77
SPEED_3   300

SPEED_2   350
------- = --- = 1.17
SPEED_3   300
The execution speed ratios shown in figure 5.3 are
obtained from the data in table 5.2. I have also checked
and rechecked this data many times, but I warn you not to
trust me, although I trust the data. Actually, the ratios
below agree very closely with those of figure 5.2,
especially when one considers the 5 msec resolution of the
clock that is being used to measure execution time. In any
case, we are much more interested in relative execution
speeds than we are in absolute speeds.
Figure 5.3. Execution speed ratios calculated from
instruction execution timing information in the Motorola
manual.
SPEED_1   602
------- = --- = 2.41
SPEED_2   250

SPEED_1   602
------- = --- = 2.81
SPEED_3   214

SPEED_2   250
------- = --- = 1.17
SPEED_3   214
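The two sets of ratios can be recomputed side by side with a
few lines of Python. The numbers are those shown in figures
5.2 and 5.3; the dictionary and function names are
hypothetical.

```python
# Sketch: recompute the speed ratios of figures 5.2 and 5.3.
measured = {"SPEED_1": 830, "SPEED_2": 350, "SPEED_3": 300}     # figure 5.2
calculated = {"SPEED_1": 602, "SPEED_2": 250, "SPEED_3": 214}   # figure 5.3

def ratio(times, slower, faster):
    """Return how many times faster `faster` is, to two places."""
    return round(times[slower] / times[faster], 2)

pairs = [("SPEED_1", "SPEED_2"), ("SPEED_1", "SPEED_3"),
         ("SPEED_2", "SPEED_3")]
for slower, faster in pairs:
    print(slower, "/", faster,
          ratio(measured, slower, faster),
          ratio(calculated, slower, faster))
```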
Putting the Pieces Together
The final algorithm is prepared by extracting the best
algorithms from the three models, and installing the
instructions implemented by custom trap #9. All of the
programs of the series, SPEED_1.TTP, SPEED_2.TTP and
SPEED_3.TTP, as well as programs PRG_5AP.TOS, CMD_TEST.TOS,
TRAPS.S and TRAP_9.S along with all of the execution results
are included in the documentation package for program 21.
In addition, program 21 contains some documentation that was
not previously disclosed.
Program 21. The final algorithm.
; Program Name: SPEEDTST.S
; Version: 1.006
; Assembly Instructions:
; Assemble in "PC-relative" mode and save with a TTP extension.
; Function:
; Spawn the TOS or PRG process typed on the command line. Create a disk
; file which is to be identified by the name of the spawned program with a
; DAT suffix. The disk file is to reside in the same directory as does the
; spawned process.
; Calculate the spawned program's load and execution times and store them
; in the file. If the spawned process directs output to the video screen via
; GEMDOS function $9, redirect that output to the file.
; Execution Instructions:
; SPEEDTST.TTP will not execute unless the custom traps in program
; TRAPS.PRG have previously been installed.
; Execute from the desktop. Type the name of an executable file which
; has a TOS or PRG extension on SPEEDTST.TTP's input parameter line. The
; name of the program you type on the parameter line must be in the same
; directory as is SPEEDTST.TTP. The program must terminate with GEMDOS
; function $4C, and, via that function, it must pass to SPEEDTST.TTP the
; word length portion of the value that was in memory location $4BA
; immediately after it was loaded.
; The longword value in $4BA can be obtained by invoking custom trap #3
; (get_time). SPEEDTST.TTP uses the word length portion of that value,
; which is returned in D0 by GEMDOS $4C, to calculate the spawned program's
; load and execution times.
; If the spawned program contains any instructions that cause it to pause,
; such as those that wait for a keypress or some other event, those should be
; commented out, and the program should be assembled especially for the speed
; test. Otherwise the execution time computed by SPEEDTST.TTP will include
; the time that the spawned program was waiting for the event to occur.
; If custom trap #8 is used to terminate the spawned program, the trap
; will execute a wait_for_keypress algorithm when the program is executed from
; the desktop, but it will omit the wait algorithm when the program is spawned
; by SPEEDTST.TTP. In addition, trap #8 will return the after-load value to
; SPEEDTST.TTP and terminate the spawned program with GEMDOS function $4C.
; Both trap #8 and SPEEDTST.TTP require that the spawned program be
; initialized with custom trap #6 or a similar algorithm. See TRAPS.S for
; details about custom traps #6 and #8.
release_excess_memory:
lea -$82(pc), a3 ; Put "command line" address in A3.
lea -$80(a3), a1 ; Put "basepage" address in A1.
lea program_end, a0 ; Put "end of program" address in A0.
trap #6 ; Calculate program size and release memory.
; NOTE: A local stack is not declared in PRG_5AP.TOS. Because of the long
; string that is printed by that program, this program will bomb when
; it spawns PRG_5AP.TOS, if a local stack is not declared here.
lea stack, a7 ; Point A7 to this program's stack.
process_command_line:
lea command_line, a4 ; Fetch location to contain command line.
movem.l (a3), d0-d3 ; Move 16 bytes of command line to 4 registers.
movem.l d0-d3, (a4) ; Move them to address "command_line".
move.b (a3)+, d0 ; Fetch command line ASCII character count.
ext.w d0 ; Extend to word for next instruction.
move.b #0, 1(a4,d0.w) ; Store a null at end of string.
lea program_name, a0 ; Fetch address of pointer to command line.
move.l a3, (a0) ; Store address of command line string at
; pointer.
move.b #0, 0(a3,d0.w) ; Replace $0D at end of command line input
; in basepage with a NULL.
insert_filename_suffix:
move.b #$44, -2(a4,d0.w) ; Insert letter 'D'.
move.b #$41, -1(a4,d0.w) ; Insert letter 'A'.
move.b #$54, 0(a4,d0.w) ; Insert letter 'T'.
create_file:
move.w #0, -(sp) ; File attribute = read/write.
pea filename ; Will be name of spawned process + .DAT.
move.w #$3C, -(sp) ; Function = f_create = GEMDOS $3C.
trap #1 ; File handle is returned in D0.
addq.l #8, sp
lea file_handle, a0 ; Store returned file handle.
move.w d0, (a0)
redirect_output: ; Exchange file handle with screen's handle.
move.w file_handle, -(sp) ; This is the disk file's handle.
move.w #1, -(sp) ; This is the video screen's handle.
move.w #$46, -(sp) ; Function = f_force = GEMDOS $46.
trap #1
addq.l #6, sp
prepare_stack_for_load_and_execute_program:
pea environ_string
pea command_string
pea (a3) ; Push address of program name string.
move.w #0, -(sp)
move.w #$4B, -(sp) ; Function = GEMDOS $4B = p_exec.
get_start_time:
lea start_time, a3 ; Fetch address of variable "start_time".
trap #3 ; Returns value of system clock in D0.
move.w d0, (a3) ; Save start time.
load_and_execute_program:
trap #1
move.w d0, d3 ; Copy after-load value to D3 for calculation.
get_end_time:
trap #3 ; Returns value of system clock in D0.
move.w d0, d5 ; Copy to D5 for calculation.
sub.w d3, d5 ; Subtract after-load time from end time.
ext.l d5 ; Extend to 32 bits.
reposition_stack_pointer:
lea $10(sp), sp
get_drive:
move.w #$19, -(sp) ; Function = dgetdrv = GEMDOS $19.
trap #1 ; Returns 0 for drive A, 1 for B, etc.
addq.l #2, sp
add.b #$41, d0 ; Add ASCII value for A to compute ASCII
lea drive, a0 ; letter code for the drive value returned.
move.b d0, (a0) ; Save drive's ASCII letter code.
print_heading:
lea heading, a0
bsr print_string
lea program_name, a0 ; Fetch address of program name string.
movea.l (a0), a0
bsr print_string
print_drive_for_spawned_program:
lea drive_msg, a0
bsr print_string
compute_load_time:
lea load_time_msg, a0
bsr print_string
lea start_time, a3
sub.w (a3), d3 ; Subtract start time from after-load time.
ext.l d3 ; Extend to 32 bits.
multiply_by_five: ; Convert to milliseconds.
move.l d3, d0 ; Save a copy to add.
asl.l #2, d3 ; Shift to multiply by 4.
add.l d0, d3 ; To complete multiplication by 5.
print_load_time:
cmpi.l #999, d3 ; If load time is less than 1000, then
bgt no_space ; print a leading blank space for output
lea space, a0 ; alignment.
bsr print_string
cmpi.l #99, d3 ; If load time is less than 100, then
bgt no_space ; print another leading blank space.
lea space, a0
bsr print_string
no_space:
move.l d3, d1 ; Copy load time to D1 for decimal conversion.
trap #4 ; Returns address of decimal string in A0.
bsr.s print_string
lea units_label, a0
bsr.s print_string
compute_execution_time: ; D5 already contains the execution time.
lea execute_time_msg, a0; Here, it must only be multiplied by 5 to
bsr.s print_string ; be converted to milliseconds.
move.l d5, d0 ; Save a copy to add.
asl.l #2, d5 ; Shift to multiply by 4.
add.l d0, d5 ; To complete multiplication by 5.
print_execution_time:
cmpi.l #999, d5 ; If execute time is less than 1000, then
bgt _no_space ; print a leading blank space for output
lea space, a0 ; alignment.
bsr print_string
cmpi.l #99, d5 ; If execute time is less than 100, then
bgt _no_space ; print another leading blank space.
lea space, a0
bsr print_string
_no_space:
move.l d5, d1 ; Copy execute time for decimal conversion.
trap #4 ; Returns address of decimal string in A0.
bsr.s print_string
lea units_label, a0
bsr.s print_string
close_file:
move.w file_handle, -(sp)
move.w #$3E, -(sp) ; Function = fclose = GEMDOS $3E.
trap #1
addq.l #4, sp
terminate:
move.w #0, -(sp)
trap #1
print_string: ; Expects address of string to be in A0.
pea (a0) ; Push address of string onto stack.
move.w #9, -(sp) ; Function = c_conws = GEMDOS $9.
trap #1 ; GEMDOS call
addq.l #6, sp ; Reset stack pointer to top of stack.
rts
data
space: dc.b " ",0
heading: dc.b $D,$A,"SPEEDTST.TTP Execution Results",$D,$A
dc.b "for ",0
drive_msg: dc.b ", loaded from drive: "
drive: dc.b "A",$D,$A,0
load_time_msg: dc.b $D,$A," Load time: ",0
execute_time_msg: dc.b " Execution time: ",0
units_label: dc.b " milliseconds",$D,$A,0
environ_string: dc.b "TERM",0
command_string: dc.b 0
align
bss
start_time: ds.w 1 ; Value in $4BA just before spawning.
program_name: ds.l 1 ; Pointer to name in basepage command line.
file_handle: ds.w 1 ; Handle for the filename below.
command_line: ds.b 1 ; Unused character count will go here.
filename: ds.b 15 ; File name for redirected output.
ds.l 96 ; Program stack.
stack: ds.l 0 ; Address of program stack.
program_end: ds.l 0
end
SPEEDTST.TTP Execution Results
PRG_5AP.TOS Execution Results
When executed from the desktop, this program will print this string on the
video screen and pause for a keypress. But, when this program is spawned by
SPEED_1, SPEED_2, SPEED_3 or SPEEDTST, the string will be stored in a file
named PRG_5AP.DAT and the program will not pause for a keypress.
SPEEDTST.TTP Execution Results
for PRG_5AP.TOS, loaded from drive: G
Load time: 40 milliseconds
Execution time: 685 milliseconds
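The arithmetic that program 21 performs on the word length
timer values can be mirrored in Python. This is a sketch
only: the three tick values are hypothetical, chosen to
reproduce the results shown above, and the sign extension
performed by ext.l is ignored, so the sketch is faithful only
for differences below 32768 ticks.

```python
# Sketch: load and execution times from low words of the 200 Hz counter.
def word_difference(later, earlier):
    """16-bit subtraction as done by sub.w, wrapping modulo 65536."""
    return (later - earlier) & 0xFFFF

def ticks_to_msec(ticks):
    """Multiply by 5 with a shift and an add, like asl.l #2 and add.l."""
    return (ticks << 2) + ticks

# Hypothetical low words of $4BA; the counter wraps during the load.
start, after_load, end = 0xFFFC, 0x0004, 0x008D

print(ticks_to_msec(word_difference(after_load, start)))   # load: 40
print(ticks_to_msec(word_difference(end, after_load)))     # execution: 685
```

Because the subtraction wraps, a counter rollover between two
readings does not disturb the result.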
The Second Utility
I conclude this chapter with a utility that spawns a
program and creates a file for redirected output, but which
does not measure load and execution times. This program is
used when I want to save the output from a program in a disk
file for documentation, leisurely viewing or for comparison
with the output of one or more other programs. Program 22
is simply a subset of program 21.
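Both programs derive the name of the output file from the
command line in the same way. The following Python sketch
mirrors those buffer edits on a hypothetical 16 byte command
tail, in which the first byte holds the character count. The
function name is an assumption made for illustration.

```python
# Sketch: derive the .DAT file name from a basepage command tail.
def make_dat_name(command_tail):
    """Copy 16 bytes, terminate the string, overwrite the suffix."""
    count = command_tail[0]                  # first byte is the length
    buffer = bytearray(command_tail[:16])    # movem.l copies 16 bytes
    buffer[count + 1] = 0                    # null after last character
    buffer[count - 2:count + 1] = b"DAT"     # replace last 3 characters
    return bytes(buffer[1:count + 1]).decode("ascii")

tail = bytes([11]) + b"PRG_5AP.TOS" + b"\r" + bytes(3)
print(make_dat_name(tail))   # PRG_5AP.DAT
```

As in the assembly version, the name must fit within the 16
bytes that are copied.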
Program 22. A program that simply spawns a process and saves
its redirected output in a disk file.
; Program Name: SPAWN.S
; Version 1.003
; Assembly Instructions:
; Assemble in "PC-relative" mode and save with a TTP extension.
; Program Function:
; Spawn the TOS or PRG process typed on the command line. Create a disk
; file which is to be identified by the name of the spawned program with a
; DAT suffix. The disk file is to reside in the same directory as does the
; spawned process.
; If the program to be executed has any halt or wait instructions, such
; as wait for a keypress, etc., you must remember that execution of the
; spawned process will not terminate until those conditions are satisfied.
release_excess_memory:
lea -$82(pc), a3 ; Put "command line" address in A3.
lea -$80(a3), a1 ; Put "basepage" address in A1.
lea program_end, a0 ; Put "end of program" address in A0.
trap #6 ; Calculate program size and release memory.
lea stack, a7 ; Point A7 to this program's stack.
process_command_line_parameters:
lea command_line, a4 ; Fetch location to contain command line.
movem.l (a3), d0-d3 ; Move 16 bytes of command line to 4 registers.
movem.l d0-d3, (a4) ; Move them to address "command_line".
move.b (a3)+, d0 ; Fetch command line ASCII character count.
ext.w d0 ; Extend to word for next instruction.
move.b #0, 1(a4,d0.w) ; Store a null at end of string.
lea program_name, a0 ; Fetch address of pointer to command line.
move.l a3, (a0) ; Store address of command line string at
; pointer.
move.b #0, 0(a3,d0.w) ; Replace $0D at end of command line input
; in basepage with a NULL.
insert_filename_suffix:
move.b #$44, -2(a4,d0.w) ; Insert letter 'D'.
move.b #$41, -1(a4,d0.w) ; Insert letter 'A'.
move.b #$54, 0(a4,d0.w) ; Insert letter 'T'.
create_file:
move.w #0, -(sp) ; File attribute = read/write.
pea filename ; Will be name of spawned process + .DAT.
move.w #$3C, -(sp) ; Function = f_create = GEMDOS $3C.
trap #1 ; File handle is returned in D0.
addq.l #8, sp
lea file_handle, a0 ; Store returned file handle to be used when
move.w d0, (a0) ; the file is closed later.
redirect_output: ; Exchange file handle with screen's handle.
move.w d0, -(sp) ; This is the disk file's handle.
move.w #1, -(sp) ; This is the video screen's handle.
move.w #$46, -(sp) ; Function = f_force = GEMDOS $46.
trap #1
addq.l #6, sp
load_and_execute_program:
pea environ_string
pea command_string
pea (a3) ; A3 contains address of program name string.
move.w #0, -(sp) ; Load and Go option.
move.w #$4B, -(sp) ; Function = GEMDOS $4B = p_exec.
trap #1
lea $10(a7), sp ; Reposition stack pointer.
close_file:
move.w file_handle, -(sp)
move.w #$3E, -(sp) ; Function = GEMDOS $3E = f_close.
trap #1
addq.l #4, sp
terminate:
move.w #0, -(sp)
trap #1
data
environ_string: dc.b "TERM",0
command_string: dc.b 0
align
bss
file_handle: ds.w 1 ; Handle for the disk file named below.
command_line: ds.b 1 ; Unused character count will go here.
filename: ds.b 15 ; File name for redirected output.
program_name: ds.l 1 ; Pointer to name in basepage command line.
ds.l 96 ; Program stack.
stack: ds.l 0 ; Address of program stack.
program_end: ds.l 0
end
Execution results for PRG_5AP.TOS as a spawned process.
PRG_5AP.TOS Execution Results
When executed from the desktop, this program will print this string on the
video screen and pause for a keypress. But, when this program is spawned by
SPEED_1, SPEED_2, SPEED_3 or SPEEDTST, the string will be stored in a file
named PRG_5AP.DAT and the program will not pause for a keypress.
Conclusion
Performance testing and utilities with which such
testing may be accomplished have been the subject of this
chapter. But the material in this chapter represents only a
beginning. Software testing as a subject is complicated
enough, but implementing such testing is a horrendous task.
At this point, I have provided you with a few simple
tools and a tutorial which should assist you in calculating
instruction execution times. I have said that single-
stepping through a program with AssemPro's debugger is one
method I use to verify a program's performance. The
debugger permits one to view registers and memory locations
while tracing through a program in this manner.
For short, uncomplicated programs, if you are able to
keep your wits sharp while doing so, this is a viable method
of verification. But many programs cannot be tested within
the debugger. Furthermore, it is virtually impossible to
keep track of register and memory activity for larger
programs. Therefore, programs which do this automatically
will be introduced in a later chapter.
For now, it is time to take advantage of the two
utilities introduced here to investigate the questions
raised by material in earlier chapters. I do this in
chapter 6. There I will compare programs assembled in each
of three assembly modes, and I will compare the performance
of certain instructions, so that you can see early on why I
choose to use them in future programs.