home *** CD-ROM | disk | FTP | other *** search
Text File | 1993-10-23 | 122.1 KB | 2,996 lines |
- Atari ST Machine Specific Programming In Assembly
-
- Chapter 7: An Adaptive Algorithm
-
-
- Algorithmic Intelligence
-
- By saying to you now that I consider every item of
- software to be intelligent to at least some degree, I
- restrict the range of my future comments concerning any
- comparisons between the levels of intelligence of items of
- software to the extent that I will not be able to refer to
- one item as intelligent and another as being devoid of
- intelligence. My comments would not be as restricted if
- there were a system of intelligent quotients for software,
- because then I would be able to assign an IQ to each item
- under discussion in order to specify its relative level of
- intelligence.
- I am not aware of the existence of such a system and I
- have no desire to develop one myself, at least not at this
- time. Therefore, in order to differentiate between levels
- of software intelligence, I must use descriptive phrases
- such as smarter than or more intelligent than. I place
- these limits on the terminology I permit myself to use in
- this chapter because I want to avoid extensive,
- nonproductive philosophical discussions of relative levels
- of intelligence.
- Most of the algorithms that I have discussed so far
- have been designed to perform their functions in an
- unsophisticated fashion, without much regard for external
- stimuli. Notable exceptions are the custom traps which
- adjust their output based on the number of digits or the
- number of leading zeroes in a passed parameter and trap #8,
- which executes a wait for keypress algorithm only if the
- invoking process was not spawned by another. To the degree
- that response to external stimuli is a measure of
- intelligence, those custom traps which adapt their output
- according to input are more intelligent than those which
- simply alter a memory location, such as does trap #0, or
- return a value, such as does trap #3.
- Much as I have declared custom traps which alter their
- output in response to their input to be the more intelligent
- in the paragraph above, I shall simply declare that an
- algorithm which can alter itself in response to external
- stimuli to be more intelligent than one which cannot do so.
- And, in order to reduce the length of descriptive phrases, I
- shall refer to such algorithms as adaptive algorithms. In
- fact, I shall simply declare that the term adaptive applies
- to any algorithm which can alter itself in response to any
- stimuli, whether it be externally or internally generated.
- If you have read the suggested material in Christopher
- Evans' The Micro Millennium, I believe that you will grasp
- my line of reasoning and deduce that I am laying the
- foundation for a discussion of an item of software to which
- Mr. Evans' six major constituents of intelligence may be
- applied. I am not concerned about adhering religiously to
- his definitions, nor do I wish to confirm or refute his
- conclusions. My only concern is that you understand the
- discussions in this chapter, and I feel that I could not
- have found a better reference than his for your perusal.
- I am very aware of the hundreds, perhaps even
- thousands, of books which have been jammed on bookshelves in
- recent years simply because a number of people with nothing
- to say realized that money could be had if it was said in a
- book bearing the words Artificial Intelligence. And if one
- of these books does actually contain useful information, I
- shall probably never know of it because I have vowed not to
- touch a book bearing those words on its front cover. In
- fact, I would consider myself honored if you would now ink
- over those words in the sentence above and pretend that I
- never wrote them.
- I have recommended Mr. Evans' book because it is not at
- all like those others that are filled with buzz word
- gibberish copied from articles or other books filled with
- identical gibberish. I have enjoyed the Evans book
- immensely, and I have just finished reading it for the third
- time, so as to have it freshly in mind as I write this
- chapter. If you disagree with my choice of references, I
- can live with that, but I would be negligent if I did not
- suggest to you the material which has had the greatest
- influence on the material which I present.
-
- Onanism Corruptus: The Sin of Self-Modification
-
- At this time, there is a notion that I would like you
- to store someplace in a corner of your mind, and, from time
- to time, as I spend my energy in conveying to you the
- concept and the reality of a software tool that I believe is
- as powerful as the concept and reality of the computer
- itself, I would like you to recall that notion and consider
- the limiting effect that would be placed on software utility
- were it to be universally enforced. This notion is
- expressed most succinctly on page 92 of 680x0 Programming by
- Example, written by Stan Kelly-Bootle, under the heading
- Logic Behind Rule C. The precise sentence which I would
- like you to remember, if for no other reason than that it
- alone justifies the existence of this entire chapter, is
- that which follows:
-
- "Motorola discourages self-modifying code, so they
- make it difficult, but not impossible, for a program
- to write over itself."
-
- I have chosen to use Mr. Kelly-Bootle's remarks, not because
- I consider his to be any more naive than those of other
- authors concerning the same subject. Rather, I have chosen
- his book to be a very reputable source; to what end would I
- choose a non-reputable source?
- I cannot dispute the sincerity of Mr. Kelly-Bootle's
- remarks, nor can I verify or refute his claim of Motorola's
- concerns and intentions. But I most certainly can say that
- the notion of a microprocessor manufacturer deliberately
- hindering the processing power of a microprocessor leaves me
- breathless. OK, that's a little strong. Actually, the
- vehemency expressed in that sentence is somewhat pretended,
- and is merely a ploy which permits me to ventilate my
- opinion about the notion while indulging in a little private
- sick humor.
- But there is nothing sick about the concept of self-
- modifying code, and there is nothing humorous about the idea
- of making it difficult. Because it is this tool alone which
- permits the intelligence level of software to rise above
- mediocrity. And say what you will about the power of the
- human brain, in some areas in is overwhelmingly bested by
- even mediocre software; and without the ability to self-
- modify its programs, the human brain would be about as
- intelligent as a computer program with the same limitation.
- Unlike the sin of self-gratification, the sin of self-
- modification will not cause blindness; however, it can lead
- to coitus interruptus. This condition becomes evident when
- a row of bombs appear across the video screen. It means, of
- course, that your self-modifying program has screwed things
- up enough to cause the generation of a software interrupt.
- I think that it is safe for me to say that the more
- intelligence you insert into your algorithms, the more
- likely they are to acquire the attribute of
- unpredictability. So, if you permit them to engage in the
- perverse activity of self-modification, you had better be
- prepared to protect yourself from the consequences if they
- decide to run amuck. Of course, your first defensive action
- involves quality control. You can maintain rigid control by
- thoroughly understanding each instruction used in an
- adaptive algorithm and by being able to precisely predict
- the effects upon the entire computer system as each is
- executed.
-
- Scope of Chapter 7 and Relevant Considerations
-
- The algorithm that is the focus of this chapter is a
- custom trap that reconstructs itself in a disciplined manner
- to form an execution loop about an algorithm under test
- (AUT). The address of the AUT is passed to the custom trap
- by an invoking program. The trap copies the AUT to a area
- of memory that is reserved within the boundaries of the
- trap, constructs the execution loop, implements the loop and
- generates appropriate performance data concerning the AUT.
- According to the definition previous established, the
- algorithm to be developed in this chapter is an adaptive
- algorithm.
- As far as I know, there are two ways in which an
- adaptive algorithm can modify itself: it can generate
- executable code within its execution space or it can copy
- executable code from a designated location into that space.
- Of course, there are no restrictions concerning combinations
- of the two methods of adaptation. In this discussion, I do
- not consider the simple alteration of declared variables to
- be adaptive, although I can understand why some would label
- this as self-modifying. Neither do I consider the
- alteration of an algorithm's code by an outside agent to be
- adaptive. When I say that an algorithm is adaptive, I mean
- that its structure permits it to modify its structure.
- While explaining the design and operation of the
- adaptive algorithm, I will mention all of the considerations
- that apply to the programming environment that I have
- adopted; that is the AssemPro environment. These may not
- apply when other environments are used. Furthermore, it
- should be understood that I will mention the considerations
- of which I am aware; I can't promise to be infallible.
- As you know, AssemPro provides more than one assembly
- option. I have shown you some of the effects on produced
- code by each. As you shall see, the mode used to assemble
- the adaptive algorithm partly determines its structure. In
- addition, there is the mode of assembly used to generate the
- AUT's code to consider. The results generated by the
- adaptive algorithm are determined to some extent by the
- assembly mode used on both the adaptive algorithm and the
- algorithm under test.
-
- Initial Structure of the Adaptive Algorithm
-
- Figure 7.1A is a representation of the structure of the
- adaptive algorithm as it exists in memory before it has ever
- been invoked. The structure is defined by four distinct
- sections: an initialization section; a fixed section which
- installs the AUT, replicates the fourth section and
- initiates execution of the loop; a block of memory in which
- the AUT and the replicated fourth section are to reside; and
- the section which contains the loop branch instruction, code
- to generate the performance report and a data section.
-
- Figure 7.1A. Structure of the adaptive algorithm prior to
- initial invocation.
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- Figure 7.1B illustrates the structure of the program
- that contains the algorithm to be tested. The hands in this
- figure, and those in figure 7.1C, are there to represent the
- movement of the algorithm under test from the invoking
- program to the adaptive algorithm. These pictorials are
- actually perverse demonstrations of what does not happen. I
- wish to graphically impress you with the fact that something
- much more powerful than mere movement of data is involved in
- this and in similar data transactions. The code which forms
- the algorithm under test is not moved; it is copied. I feel
- that the implications of this fact is deluded by the
- instruction used to perform the transaction. The situation
- would be expressed much more clearly if the mnemonic for the
- instruction were copy instead of move. Nevertheless, I too
- use the word move when I mean copy because that is the
- terminology I was taught. As long as one remembers that
- this transaction we call move is non-destructive, and that
- the copied data remains intact and can be cloned as often as
- desired, either word can be used to indicated the
- transaction which actually takes place.
-
- Figure 7.1B. Structure of the adaptive algorithm invoking
- program.
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- The thoughts concerning the copying of data, versus the
- movement of data, expressed about figure 7.1B also apply to
- figure 7.1C. There you see the AUT being moved into the
- space reserved within the adaptive algorithm. The section
- of reserved space under the block pictorial of the AUT
- implies that the AUT is small enough so that it never fills
- the reserved space. Of course, the length of the reserved
- space is programmable within the adaptive algorithm, and, in
- fact, the length of the reserved space could be set
- dynamically at run-time.
-
- Figure 7.1C. Algorithm under test being installed within the
- reserved space of the adaptive algorithm.
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- After the AUT has been copied into the reserved space,
- the remaining instructions which form the adaptive algorithm
- and its data must be moved up into the reserved space,
- adjacent to the AUT. The reason for this will be clearer
- when you view the source and the disassembly listing, but,
- briefly, this is done so that the loop branching instruction
- and attendant data is placed appropriately within the newly
- defined adaptive algorithm structure. Figure 7.1D
- illustrates the appearance of a portion of the new
- structure. An important section has been omitted from this
- drawing.
-
- Figure 7.1D. Closing the gap between the algorithm under
- test and the fourth section to form the executable loop and
- attendant code.
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- Figure 7.1E is a pictorial of the adaptive algorithm
- structure after its initial invocation. There you see the
- section that was omitted from figure 7.1D. Remember that
- the Movable Loop Section and Data Section is not moved; it
- is copied; therefore, the original section of code remains
- in place, until it is removed deliberately or until the
- adaptive algorithm is removed from memory. There is no
- reason that instructions to remove that section could not be
- included in the adaptive algorithm. I did not include such
- instructions in this algorithm because the movable sections
- are replicated during each invocation of the trap;
- therefore, they must remain intact. The structure shown in
- figure 7.1E is valid for all future invocations of the
- adaptive algorithm. The initially installed AUT and
- attendant code is simply overwritten by future invocations.
-
- Figure 7.1E. Structure of the adaptive algorithm after the
- initial invocation and during all subsequent invocations.
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- I have designed the adaptive algorithm as a custom trap
- so that it need simply be invoked by a program containing an
- algorithm for which the execution speed and requisite memory
- is desired. When I began to design this trap, I realized
- that I could not anticipate all possible ramifications of
- AUT's assembled in the various assembly modes that I have
- discussed, nor could I apriorily judge the most suitable
- mode of assembly for the adaptive algorithm. Therefore, I
- prepared two programs: one designed for PC-relative
- assembly, the other for Relocatable assembly. Program 34 is
- a listing of the adaptive algorithm designed for PC-relative
- assembly. The program installs the adaptive algorithm as
- custom trap #9. A routine with the same trap number was
- used during the development of the SPEEDTST utility, but its
- usefulness in that function has expired, so it would be best
- not to have the older trap #9 routine in a position that
- might cause confusion about which routine is installed.
-
- Program 34. The adaptive algorithm designed as a custom
- trap. This particular program has been designed for
- assembly in the PC-relative mode.
-
- ; Program Name: TRAP_9P.S
- ; Version 1.009
-
- ; Assembly Instructions:
-
- ; Assemble in PC-relative mode and save with a PRG extension. TRAP_9R.S
- ; is a version of this program that must be assembled in Relocatable mode.
- ; I have prepared two versions because I can't predict the precise requirements
- ; of all possible algorithms which might invoke the custom trap that is
- ; installed by this program.
-
- ; Program Function:
-
- ; This is a LSR program that establishes a user defined trap #9. It may
- ; be executed from the desktop, but you may prefer to copy it to the AUTO
- ; folder of your boot partition or floppy disk so that it will execute
- ; automatically during boot.
-
- program_start: ; Calculate program size and retain result.
- lea program_end, a3 ; Fetch program end address.
- suba.l 4(a7), a3 ; Subtract basepage address.
-
- enter_supervisor_mode:
- move.l #0, -(sp) ; The zero turns on supervisor mode.
- move.w #$20, -(sp) ; Function = super = GEMDOS $20.
- trap #1 ; Supervisor stack pointer (SSP) returned in D0.
- addq.l #6, sp
- movea.l d0, a5 ; Save SSP in scratch register.
-
- install_trap_9_routine: ; Note: pointer = vector = pointer.
- lea trap_9_routine, a0 ; Fetch address of trap #9 routine.
- move.l a0, $A4 ; Store trap address at pointer address.
-
- enter_user_mode:
- pea (a5) ; Restore supervisor stack pointer.
- move.w #$20, -(sp) ; Function = super = GEMDOS $20.
- trap #1
- addq.l #6, sp
-
- relinquish_processor_control: ; Maintain memory residency.
- move.w #0, -(sp) ; Exit code.
- move.l a3, -(sp) ; Program size.
- move.w #$31, -(sp) ; Function = ptermres = GEMDOS $31.
- trap #1
-
- trap_9_routine:
-
- ; NOTE: You may want to execute a program that invokes this trap from the
- ; AssemPro debugger and single step through the trap instructions, or
- ; you many want to set breakpoints and use the "Run program" button to
- ; execute to a breakpoint. It the exercise of this latter option that
- ; can cause problems. If you want to set a breakpoint at an instruction
- ; that occurs before the "reserved space", no precautions need be taken,
- ; except that you must realize that you could halt at a position after
- ; the start time has been initialized, and the execution time will be
- ; accumulating during the halt.
-
- ; But, if you want to set a breakpoint at at an instruction that occurs
- ; after the "reserved space", you should not do so until the instructions
- ; that perform the bulk move have been executed. If you set a breakpoint
- ; at an instruction in the program space which follows the "reserved
- ; space" and then execute the bulk move instructions, the breakpoint you
- ; have set will cause an illegal command error.
-
- ; This custom trap must be invoked by a program which contains one or more
- ; algorithms for which performance data, in the form of execution time and
- ; requisite memory, is desired. The trap must be invoked once for each
- ; algorithm under test (AUT). The trap is intended to be a performance
- ; report generator so that performance data for specific algorithms can be
- ; compared.
-
- ; If the program invoking the trap is spawned by SPAWN.TTP, the performance
- ; data will be printed to a file that is created by SPAWN.TTP. The name of
- ; the file will be the name of the spawned program with a DAT suffix.
-
- ; If the program invoking the trap is executed from the desktop, the
- ; performance data will be printed to the screen.
-
- ; The AUT can be assembled in PC-relative or Relocatable mode.
-
- ; The custom trap expects the following parameters to be contained in the
- ; specified registers prior to invocation:
-
- ; A0 = The address of a null terminated header to precede the reported
- ; AUT execution time.
-
- ; A1 = The address of a null terminated header to precede the reported
- ; AUT requisite memory.
-
- ; A3 = The starting address of an algorithm containing one or more instructions.
- ; This algorithm becomes the AUT upon invocation of the trap.
-
- ; A4 = The ending address of the AUT.
-
- ; A5 = At the initial invocation by a specific program, A5 is expected to
- ; contain the address of a null terminated heading that is a brief
- ; description of the data that is to be reported by the trap. During
- ; all subsequent invocations by that program, A5 must contain the
- ; address of a null string.
-
- ; D7 = A word length decimal value which specifies the number of times the
- ; the AUT is to be executed in a loop in order to generate the execution
- ; time data. Register D7 is the only register which cannot be used by
- ; the AUT. That's because D7 is used as the AUT execution loop counter.
-
- ; The custom trap uses the information passed in A3, A4 and D7 to construct
- ; a loop around the AUT. The loop is executed the number of times specified
- ; in register D7. The time required to execute the loop is calculated and
- ; printed along with the AUT's requisite memory.
-
- ; How Trap #9 Works:
-
- ; The custom trap is an intelligent, self-modifying algorithm. Prior
- ; to an initial invocation, the trap #9 routine is composed of three sections.
- ; The first section contains instructions which are permanently located. The
- ; second section is a 1,024 byte block of reserved space into which the AUT
- ; and the third section is to be copied. The third section contains the custom
- ; trap instructions that must be copied so that the first instruction in the
- ; third section immediately follows the last instruction of the algorithm
- ; under test. The size that I chose for the reserved space was arbitrary.
-
- ; After an invocation, the custom trap calculates the number of bytes in
- ; the algorithm under test. Then it initializes register D7 for the AUT's
- ; execution loop. After saving the addresses of the headers that were passed
- ; as parameters so that the registers in which they were passed can be used
- ; by trap #9 and/or the algorithm under test, the trap prints the AUT's
- ; primary heading and the execution time header. Then it copies the AUT into
- ; the reserved space.
-
- ; After the AUT has been copied into the reserved space, the instructions
- ; and data in the third section of the custom trap are moved into immediate
- ; proximity of the AUT so that the first instruction of section three follows
- ; immediately after the last instruction of the algorithm under test.
-
- ; The AUT is executed the number of times requested via the value passed
- ; in D7, then the execution time and requisite memory of the AUT are printed.
-
- ; After the custom trap has been invoked once, the reserved space will
- ; not be completely clear during the remaining portion of the power-up cycle.
- ; When the trap is subsequently invoked, the instructions that are already in
- ; the reserved space will simply be overwritten by the instructions of the
- ; new algorithm under test and those of the third section.
-
- calculate_length_of_AUT:
- suba.l a3, a4 ; Subtract start address from end address.
-
- save_heading_addresses:
- lea time_header, a2 ; Header to precede reported execution time.
- move.l a0, (a2)
- lea memory_header, a2 ; Header to precede reported requisite memory.
- move.l a1, (a2)
-
- transfer_stack_pointer:
-
- ; Upon entrance into this custom trap, A7 contains the address of the system
- ; stack pointer. Because the system stack may be too short for use by the AUT,
- ; the invoking program must provide a user stack, and, here, I am going to
- ; store the address of that stack in A7. Before returning to the invoking
- ; program, the system stack pointer must be restored so that the return address
- ; can be found by the processor.
-
- lea _SSP, a0 ; Fetch address of declared variable.
- move.l a7, (a0) ; Save system stack pointer.
- move USP, a7 ; Transfer to invoking program's stack.
-
- print_heading:
-
- ; NOTE: When two or more algorithms are to invoke the trap from a program,
- ; the heading should be printed only once, at the beginning of the
- ; report; therefore, the string address in A5 should point to a
- ; null terminated string during the first invocation of the trap, but
- ; it should point to a null string on subsequent invocations from the
- ; same program.
-
- ; For that reason, the test for NULL string is included below, so that
- ; only one attempt to print the heading will be made.
-
- tst.b (a5) ; NULL string test.
- beq.s print_time_header ; Branch if null.
- movea.l a5, a0 ; Print heading.
- bsr print_string
-
- print_time_header:
- movea.l time_header, a0 ; Fetch address of time header.
- bsr print_string
-
- build_algorithm:
- lea reserved_space, a6 ; Fetch address of reserved space.
- move.w a4, d3 ; AUT length initializes counter for copy.
- lea memory, a0 ; Store AUT's requisite memory.
- move.w d3, (a0) ; AUT length stored for report.
- subq.w #1, d3 ; Initialize counter for loop execution.
- move_loop: ; Move AUT, byte by byte, to the area
- move.b (a3)+, (a6)+ ; declared as "reserved_space".
- dbra d3, move_loop ; Branch until D3 = -1.
-
- ; A6 is now pointing to the location at which the AUT execution loop's DBRA
- ; instruction must be copied. The DBRA copy must be processed apart from the
- ; other instructions in the custom trap which must be copied to "close up" the
- ; gap between the last statement of the algorithm under test and the rest
- ; of the custom trap's instructions.
-
- lea algorithm_end, a0 ; Calculate number of bytes in this program
- lea program_end, a1 ; which must be copied into "reserved_space",
- suba.l a0, a1 ; then initialize counter D3 with the number
- move.w a1, d3 ; of bytes to copy.
- subq.w #4, d3 ; Subtract the DBRA requisite memory.
- movea.l a0, a1 ; Calculate amount the DBRA instruction must
- suba.l a6, a1 ; be moved and save the new location in scratch
- movea.l a6, a5 ; register for displacement correction.
- move.w 2(a0), d0 ; Fetch current DBRA displacement value.
- move.l (a0)+, (a6)+ ; Move the DBRA instruction.
- add.w a1, d0 ; Correct DBRA displacement to the value it
- move.w d0, 2(a5) ; should be after the move.
- subq.w #1, d3 ; Initialize counter for bulk copy.
- _move_loop: ; Move rest of algorithm in a bulk copy.
- move.b (a0)+, (a6)+
- dbra d3, _move_loop
- subq.w #1, d7 ; Initialize D7 for AUT loop execution.
- lea start_time, a0 ; Fetch address of declared variable.
- suba.l a1, a0 ; Correct address to after-move location.
-
- ; NOTE: Since processor is in supervisor mode, there is no reason to use
- ; trap #3 to fetch time. The 200hz vector can be referenced directly.
-
- move.l $4BA, (a0) ; Fetch and store start time.
-
- algorithm_start: ; new location.
- reserved_space: ds.b 1024 ; Space for loop construction = 1024 bytes.
- algorithm_end:
- dbra d7, algorithm_start
- move.l $4BA, d0 ; Fetch end time.
- sub.l start_time, d0 ; Subtract start time from end time.
- trap #10 ; Convert to decimal milliseconds and print.
- print_memory_header:
- movea.l memory_header, a0 ; Fetch address of memory header.
- bsr.s print_string
- print_requisite_memory:
- move.w memory, d1 ; Fetch AUT's requisite memory.
- trap #4 ; Returns address of decimal string in A0.
- bsr.s print_string
- lea header_1, a0 ; Print requisite memory units label.
- bsr.s print_string
-
- ; NOTE: I had thought that I might have to restore the user stack pointer
- ; before returning to the calling program because I can't know to what
- ; extent the AUT has corrupted the user stack.
-
- ; But, as it turns out, since the value in USP is stored in register
- ; A7 at the beginning of the program, and since the system stack pointer
- ; is then active and remains so for the duration of the trap invocation,
- ; only the value in register A7 (which is the system stack pointer, but
- ; which points to the user stack) is altered; the value in USP remains
- ; uncorrupted, therefore, nothing needs be restored but the system stack
- ; pointer (so that it again points to the system stack).
-
- ; On subsequent invocations the value in USP, which remains stable, will
- ; simply be restored in A7, as often as the trap is invoked.
-
- movea.l _SSP, a7 ; Restore system stack pointer so that it
- rte ; points to the system stack.
-
- ;
- ; SUBROUTINES
- ;
-
- print_string:
- pea (a0)
- move.w #9, -(sp)
- trap #1
- addq.l #6, sp
- rts
-
- data
- header_1: dc.b ' bytes',$D,$A,0
- bss
- align
- _SSP: ds.l 1 ; Address of the system stack will be stored here.
- start_time: ds.l 1 ; Variable for time at start of AUT processing.
- memory: ds.w 1 ; Variable for AUT's requisite memory.
- time_header: ds.l 1 ; Points to address of header which is to precede
- ; the reported AUT processing time.
- memory_header: ds.l 1 ; Points to address of header which is to precede
- ; the reported AUT requisite memory.
- program_end: ds.l 0
- end
-
-
- Comprehension of program 34 will be facilitated with
- reference to a disassembly listing of the program as it
- resides in memory. Figure 7.2 is a portion of that listing.
- A section of the reserved space has been deleted in order to
- reduce the size of the listing. This listing shows the code
- for the trap before any program has invoked it.
-
- Figure 7.2. Disassembly listing of TRAP_9P.PRG as it resides
- in memory before an initial invocation.
-
- TRAP_9P.PRG Before First Invocation
-
- 01FA36 99CB SUBA.L A3,A4
- 01FA38 45FA04B4 LEA $1FEEE(PC),A2
- 01FA3C 2488 MOVE.L A0,(A2)
- 01FA3E 45FA04B2 LEA $1FEF2(PC),A2
- 01FA42 2489 MOVE.L A1,(A2)
- 01FA44 41FA049E LEA $1FEE4(PC),A0
- 01FA48 208F MOVE.L A7,(A0)
- 01FA4A 4E6F MOVE.L USP,A7
- 01FA4C 4A15 TST.B (A5)
- 01FA4E 6706 BEQ.S $1FA56
- 01FA50 204D MOVEA.L A5,A0
- 01FA52 6100047A BSR $1FECE
- 01FA56 207A0496 MOVEA.L $1FEEE(PC),A0
- 01FA5A 61000472 BSR $1FECE
- 01FA5E 4DFA0046 LEA $1FAA6(PC),A6
- 01FA62 360C MOVE.W A4,D3
- 01FA64 41FA0486 LEA $1FEEC(PC),A0
- 01FA68 3083 MOVE.W D3,(A0)
- 01FA6A 5343 SUBQ.W #1,D3
- 01FA6C 1CDB MOVE.B (A3)+,(A6)+
- 01FA6E 51CBFFFC DBRA D3,$1FA6C
- 01FA72 41FA0432 LEA $1FEA6(PC),A0
- 01FA76 43FA047E LEA $1FEF6(PC),A1
- 01FA7A 93C8 SUBA.L A0,A1
- 01FA7C 3609 MOVE.W A1,D3
- 01FA7E 5943 SUBQ.W #4,D3
- 01FA80 2248 MOVEA.L A0,A1
- 01FA82 93CE SUBA.L A6,A1
- 01FA84 2A4E MOVEA.L A6,A5
- 01FA86 30280002 MOVE.W 2(A0),D0
- 01FA8A 2CD8 MOVE.L (A0)+,(A6)+
- 01FA8C D049 ADD.W A1,D0
- 01FA8E 3B400002 MOVE.W D0,2(A5)
- 01FA92 5343 SUBQ.W #1,D3
- 01FA94 1CD8 MOVE.B (A0)+,(A6)+
- 01FA96 51CBFFFC DBRA D3,$1FA94
- 01FA9A 5347 SUBQ.W #1,D7
- 01FA9C 41FA044A LEA $1FEE8(PC),A0
- 01FAA0 91C9 SUBA.L A1,A0
- 01FAA2 20B804BA MOVE.L $4BA,(A0)
- 01FAA6 00000000 ORI.B #0,D0
- 01FAAA 00000000 ORI.B #0,D0
-
- NOTE: Section of reserved space omitted
- to reduce size of listing.
-
- 01FE9E 00000000 ORI.B #0,D0
- 01FEA2 00000000 ORI.B #0,D0
- 01FEA6 51CFFBFE DBRA D7,$1FAA6
- 01FEAA 203804BA MOVE.L $4BA,D0
- 01FEAE 90BA0038 SUB.L $1FEE8(PC),D0
- 01FEB2 4E4A TRAP #$A
- 01FEB4 207A003C MOVEA.L $1FEF2(PC),A0
- 01FEB8 6114 BSR.S $1FECE
- 01FEBA 323A0030 MOVE.W $1FEEC(PC),D1
- 01FEBE 4E44 TRAP #4
- 01FEC0 610C BSR.S $1FECE
- 01FEC2 41FA0016 LEA $1FEDA(PC),A0
- 01FEC6 6106 BSR.S $1FECE
- 01FEC8 2E7A001A MOVEA.L $1FEE4(PC),A7
- 01FECC 4E73 RTE
- 01FECE 4850 PEA (A0)
- 01FED0 3F3C0009 MOVE.W #9,-(A7)
- 01FED4 4E41 TRAP #1
- 01FED6 5C8F ADDQ.L #6,A7
- 01FED8 4E75 RTS
- 01FEDA 2020 MOVE.L -(A0),D0
- 01FEDC 6279 BHI.S $1FF57
- 01FEDE 7465 MOVEQ #$65,D2
- 01FEE0 730D DC.W $730D
- 01FEE2 0A00FF0A EORI.B #$A,D0
- 01FEE6 00000000 ORI.B #0,D0
- 01FEEA 00000000 ORI.B #0,D0
- 01FEEE 00000000 ORI.B #0,D0
- 01FEF2 00000000 ORI.B #0,D0
-
-
- Analysis of TRAP_9P
-
- The analysis of program 34 begins at the label
- algorithm_start in the source listing, at address 01FAA6 in
- the disassembly listing. The label reserved_space in the
- source listing is superfluous, serving only to indicate that
- the 1024 bytes of defined space is sandwiched between the
- two labels algorithm_start and algorithm_end, which locate
- the boundaries of the algorithm under test after it has been
- installed. Note that neither the data nor the bss assembler
- pseudo-ops are used before the ds.b 1024 pseudo-op. If the
- data and bss pseudo-ops are used in conjunction with the
- ds.b pseudo-op to define the reserved space, AssemPro will
- locate the 1024 bytes of defined space in the program's data
- section; therefore, the space reserved would not be between
- the labels that mark the boundaries of the AUT. That
- relocation of the reserved space is not necessarily
- prejudicial to the structure of an adaptive algorithm if it
- is considered during the design of the algorithm; but it is
- important to know the precise effect of every instruction
- and pseudo-op when dealing with self-modifying algorithms.
-
- Moving the DBRA Instruction
-
- Refer to the dbra instruction at address 01FEA6 in the
- disassembly listing. The disassembler displays the entire
- instruction in a format that is pleasing to the eye, DBRA
- D7, $1FAA6, because it enables us to readily discern the
- value that will be loaded into the program counter when the
- branch is taken. But this convenience tends to obscure the
- fact that the instruction exists in memory as 51CFFBFE,
- which is not as pretty, but which more accurately portrays
- the activity involved in assembling and executing the dbra
- instruction. To wit: the assembler calculates a signed
- displacement between the dbra instruction and a label and
- stores this displacement in the extension word which follows
- the instruction word. In this particular case, the
- displacement is $FBFE = -1026 (decimal).
- The important thing to realize is that the displacement
- is fixed during assembly and will not change when the dbra
- instruction is copied into the reserved space, following the
- insertion of the AUT. But the distance between address
- 01FAA6 and the dbra instruction will be less then than it is
- now; therefore, something must be done to alter the value in
- the dbra extension word after the instruction has been
- copied into the reserved space.
- Within the source listing, following the two
- instructions after the move_loop label, there is a note
- concerning the movement of the dbra instruction into the
- reserved space. It is the necessity for the displacement
- correction which requires that the dbra move be accomplished
- in a single transaction, apart from the movement of the
- other code comprising the movable sections. The source
- listing documentation which accompanies the instructions
- following the note explains the steps involved in
- calculating a new displacement value, but since some of
- those instructions are specific to the movement of the code
- following the dbra instruction, I will repeat the steps here
- so that cognition may be confirmed.
-
- 1. Calculate the total number of bytes of code which
- must be copied into the reserved space after the AUT
- has been installed. This value is the number of
- bytes existing between the label algorithm_end and
- the end of the program. It is calculated in register
- A1, then copied into register D3, which is used as a
- loop counter when all but the dbra instruction
- portion of the movable sections are copied into the
- reserved space.
-
- 2. Because the dbra instruction will be moved apart
- from the other instruction in the section, its
- requisite memory (4 bytes) must be subtracted from
- the counter, D3, before the bulk copy takes place.
-
- 3. Calculate the displacement between the dbra
- instruction's current location and the location at
- which it is to be copied. The dbra instruction's
- current location is marked by the label
- algorithm_end, which is still stored in register A0.
- The value in A0 must be preserved for future
- calculations, therefore, it is copied into A1. The
- value in register A6 is the location to which the
- dbra instruction must be moved. Subtracting A6 from
- A1 puts the number of bytes that the dbra
- instruction must be moved into A1. The extension
- word correction to follow requires that the value in
- A6 be preserved, so it is copied into register A5.
-
- 4. Fetch the current dbra displacement value. In the
- source listing, you can see that this value is
- stored in register D0.
-
- 5. Move (or copy) the dbra instruction into its new
- location.
-
- 6. Correct the dbra instruction's displacement value.
- The correction must be a positive value because the
- distance between the branch instruction and the
- branch location has decreased. Register A1 contains
- the previously calculated correction, which is
- simply the number of bytes that the dbra instruction
- was moved. In the source listing, you can see that
- the positive value in A1 is added to the negative
- value in D0 to obtain the new displacement value.
- Then the contents of D0 are written to the dbra
- extension word's new location at 2(A5) with a self-
- modifying instruction.
-
- The effect of the above steps can be seen in figure
- 7.3, a disassembly listing of program 34 after a single
- instruction AUT, MULU #5,D0, has been installed in the
- reserved space, at address 01FAA6. The new dbra
- displacement is shown in the instruction following; it is
- $FFFA = -6 (decimal). Within the listing are several notes,
- shown in boldface, that explain specific instruction groups.
-
- Figure 7.3 Disassembly listing of the adaptive algorithm
- portion of TRAP_9P.PRG as it resides in memory after the
- initial invocation.
-
- TRAP_9P.PRG After First Invocation
-
- 01FA36 99CB SUBA.L A3,A4
- 01FA38 45FA04B4 LEA $1FEEE(PC),A2
- 01FA3C 2488 MOVE.L A0,(A2)
- 01FA3E 45FA04B2 LEA $1FEF2(PC),A2
- 01FA42 2489 MOVE.L A1,(A2)
- 01FA44 41FA049E LEA $1FEE4(PC),A0
- 01FA48 208F MOVE.L A7,(A0)
- 01FA4A 4E6F MOVE.L USP,A7
- 01FA4C 4A15 TST.B (A5)
- 01FA4E 6706 BEQ.S $1FA56
- 01FA50 204D MOVEA.L A5,A0
- 01FA52 6100047A BSR $1FECE
- 01FA56 207A0496 MOVEA.L $1FEEE(PC),A0
- 01FA5A 61000472 BSR $1FECE
- 01FA5E 4DFA0046 LEA $1FAA6(PC),A6
- 01FA62 360C MOVE.W A4,D3
- 01FA64 41FA0486 LEA $1FEEC(PC),A0
- 01FA68 3083 MOVE.W D3,(A0)
- 01FA6A 5343 SUBQ.W #1,D3
- 01FA6C 1CDB MOVE.B (A3)+,(A6)+
- 01FA6E 51CBFFFC DBRA D3,$1FA6C
- 01FA72 41FA0432 LEA $1FEA6(PC),A0
- 01FA76 43FA047E LEA $1FEF6(PC),A1
- 01FA7A 93C8 SUBA.L A0,A1
- 01FA7C 3609 MOVE.W A1,D3
- 01FA7E 5943 SUBQ.W #4,D3
- 01FA80 2248 MOVEA.L A0,A1
- 01FA82 93CE SUBA.L A6,A1
- 01FA84 2A4E MOVEA.L A6,A5
- 01FA86 30280002 MOVE.W 2(A0),D0
- 01FA8A 2CD8 MOVE.L (A0)+,(A6)+
- 01FA8C D049 ADD.W A1,D0
- 01FA8E 3B400002 MOVE.W D0,2(A5)
- 01FA92 5343 SUBQ.W #1,D3
- 01FA94 1CD8 MOVE.B (A0)+,(A6)+
- 01FA96 51CBFFFC DBRA D3,$1FA94
- 01FA9A 5347 SUBQ.W #1,D7
- 01FA9C 41FA044A LEA $1FEE8(PC),A0
- 01FAA0 91C9 SUBA.L A1,A0
- 01FAA2 20B804BA MOVE.L $4BA,(A0)
- 01FAA6 C0FC0005 MULU #5,D0 ; AUT
- 01FAAA 51CFFFFA DBRA D7,$1FAA6 ; FFFA = -6
- 01FAAE 203804BA MOVE.L $4BA,D0
- 01FAB2 90BA0038 SUB.L $1FAEC(PC),D0
- 01FAB6 4E4A TRAP #$A
- 01FAB8 207A003C MOVEA.L $1FAF6(PC),A0
- 01FABC 6114 BSR.S $1FAD2
- 01FABE 323A0030 MOVE.W $1FAF0(PC),D1
- 01FAC2 4E44 TRAP #4
- 01FAC4 610C BSR.S $1FAD2
- 01FAC6 41FA0016 LEA $1FADE(PC),A0
- 01FACA 6106 BSR.S $1FAD2
- 01FACC 2E7A001A MOVEA.L $1FAE8(PC),A7
- 01FAD0 4E73 RTE
-
- Below is the print_string subroutine.
-
- 01FAD2 4850 PEA (A0)
- 01FAD4 3F3C0009 MOVE.W #9,-(A7)
- 01FAD8 4E41 TRAP #1
- 01FADA 5C8F ADDQ.L #6,A7
- 01FADC 4E75 RTS
-
- The data section appears below.
-
- 01FADE 2020 MOVE.L -(A0),D0 ;' '
- 01FAE0 6279 BHI.S $1FB5B ;by
- 01FAE2 7465 MOVEQ #$65,D2 ;te
- 01FAE4 730D DC.W $730D ;s
-
- _SSP => $01FAE8
-
- 01FAE6 0A000003 EORI.B #3,D0
- 01FAEA EF3A ROL.B D7,D2
-
- start_time (new address) = 01FAEC
-
- 01FAEC 00000000 ORI.B #0,D0
-
- memory => 01FAF0
- time_header => 01FAF2
-
- 01FAF0 00040006 ORI.B #6,D4
-
- memory_header => 01FAF6
-
- 01FAF4 DBB90006DC1F ADD.L D5,$6DC1F
- 01FAFA 00000000 ORI.B #0,D0
- 01FAFE 00000000 ORI.B #0,D0
-
- NOTE: Section of reserved space omitted
- to reduce size of listing.
-
- 01FE9E 00000000 ORI.B #0,D0
- 01FEA2 00000000 ORI.B #0,D0
-
- Below is the original movable block. It has not
- been effected by the move.
-
- 01FEA6 51CFFBFE DBRA D7,$1FAA6
- 01FEAA 203804BA MOVE.L $4BA,D0
- 01FEAE 90BA0038 SUB.L $1FEE8(PC),D0
- 01FEB2 4E4A TRAP #$A
- 01FEB4 207A003C MOVEA.L $1FEF2(PC),A0
- 01FEB8 6114 BSR.S $1FECE
- 01FEBA 323A0030 MOVE.W $1FEEC(PC),D1
- 01FEBE 4E44 TRAP #4
- 01FEC0 610C BSR.S $1FECE
- 01FEC2 41FA0016 LEA $1FEDA(PC),A0
- 01FEC6 6106 BSR.S $1FECE
- 01FEC8 2E7A001A MOVEA.L $1FEE4(PC),A7
- 01FECC 4E73 RTE
- 01FECE 4850 PEA (A0)
- 01FED0 3F3C0009 MOVE.W #9,-(A7)
- 01FED4 4E41 TRAP #1
- 01FED6 5C8F ADDQ.L #6,A7
- 01FED8 4E75 RTS
- 01FEDA 2020 MOVE.L -(A0),D0
- 01FEDC 6279 BHI.S $1FF57
- 01FEDE 7465 MOVEQ #$65,D2
- 01FEE0 730D DC.W $730D
- 01FEE2 0A000003 EORI.B #3,D0
- 01FEE6 EF3A ROL.B D7,D2
-
- start_time (old address) = 01FEE8
-
- 01FEE8 00000000 ORI.B #0,D0
- 01FEEC 00040006 ORI.B #6,D4
- 01FEF0 DBB90006DC1F ADD.L D5,$6DC1F
-
-
- The Bulk Move
-
- The rest of the code in the movable section can be
- moved as a block. This transaction takes place at the
- _move_loop label in the source listing, at address 01FA94 in
- the disassembly listings. Note that immediately after the
- block move, the address of the declared variable start_time
- is loaded into register A0, and that, before the time is
- fetched, the contents of register A1 are subtracted from
- those of A0 to correct the address in register A0. These
- transactions take place at addresses 01FA9C and 01FAA0 of
- the disassembly listings.
-
- Why The Address Of start_time Must BE Corrected
-
- From time to time, we must remind ourselves that pc-
- relative addressing can be used discriminatingly, by
- including the (pc) notation in the source operand, or it can
- be forced by the PC-relative assembly mode. Now, although
- it may not be obvious from discussions in the references,
- when pc-relative addressing is used in the source operand of
- the LEA instruction, it, like the DBRA instruction, performs
- its transaction with a displacement stored in its extension
- word. This is evident from the machine code at address
- 01FA9C, where the extension word can be seen to be $044A =
- 1098 (decimal). To confirm that this is indeed the
- displacement between the LEA instruction and the label
- start_time we need only subtract their addresses:
-
- $01FEE8 - $01FA9C = $44C
-
- and from this subtract the size of the LEA instruction:
-
- $44C - 2 bytes = $44A.
-
- Now, to decide whether or not a correction to this
- displacement is necessary, we need only determine whether or
- not the displacement between the two addresses is altered by
- the bulk move. Well, of course it does. The instruction
- and the label will be closer after the move, and, since the
- label will be moved the same distance as was the dbra
- instruction, the same correction factor, which is still in
- A1, must be applied. The displacement in the LEA
- instruction's extension word must be corrected at this point
- in the program because the location in which the start time
- is stored will be accessed to compute the AUT's execution
- time after the algorithm_start loop has been executed. You
- can see that transaction taking place at address 01FAB2 of
- figure 7.3. There you can also see that the new
- displacement is $0038, and that start_time's new address is
- 01FAEC.
- Program 35 is a source listing of the program designed
- to confirm the rudimentary reliability of the adaptive
- algorithm in program 34. The performance data for the AUTs
- in program 35 should (and does) match that obtained with
- program 32 because the algorithms were copied from that
- program. The data generated by program 35 follows its
- listing.
-
- Program 35. This program is used to verify that program 34
- functions correctly.
-
- ; Program Name: TRAP9TST.S
- ; Version: 1.001
-
- ; Assembly Instructions:
-
- ; Assemble in PC-relative mode and save with a TOS extension.
-
- ; Execution Instructions:
-
- ; Execute from the desktop; or execute SPAWN.TTP, type TRAP9TST.TOS on
- ; its command line and view this program's output in TRAP9TST.DAT.
-
- ; Program Function:
-
- ; Measures the speed of multiplication algorithms. Verifies that
- ; TRAP_9P functions correctly.
-
- calculate_program_size:
- lea -$102(pc), a1 ; Fetch basepage start address.
- lea program_end, a0 ; Fetch program end = array address.
- trap #6 ; Return unused memory to op system.
- lea stack, a7
-
- the_mulu_instruction:
- lea header_1, a0
- lea header_4, a1
- lea mulu_algorithm_start, a3
- lea mulu_algorithm_end, a4
- lea heading, a5
- move.w #50000, d7
-
- invoke_trap_9:
- trap #9
-
- addition:
- lea header_2, a0
- lea header_5, a1
- lea addition_algorithm_start, a3
- lea addition_algorithm_end, a4
- lea heading, a5
- move.b #0, (a5) ; Store a NULL in first byte to create a
- move.w #50000, d7 ; null string so that heading gets printed
- trap #9 ; only once.
-
- shift_and_add:
- lea header_3, a0
- lea header_6, a1
- lea shift_and_add_algorithm_start, a3
- lea shift_and_add_algorithm_end, a4
- lea heading, a5
- move.w #50000, d7
- trap #9
-
- terminate:
- trap #8
-
- mulu_algorithm_start:
- mulu #5, d0 ; Instruction in the loop.
- mulu_algorithm_end:
-
- addition_algorithm_start:
- move.l d0, d2 ; To add one.
- add.l d0, d0 ; To double to two.
- add.l d0, d0 ; To double to four.
- add.l d2, d0 ; To complete multiplication by 5.
- addition_algorithm_end:
-
- shift_and_add_algorithm_start:
- move.l d0, d2 ; Save a copy to add.
- asl.l #2, d0 ; Shift to multiply by 4.
- add.l d2, d0 ; To complete multiplication by 5.
- shift_and_add_algorithm_end:
-
- data
- heading: dc.b "TRAP9TST Execution Results",$D,$A,$D,$A,0
- header_1: dc.b " MULU time: ",0
- header_2: dc.b " Addition time: ",0
- header_3: dc.b " Shift and add time: ",0
- header_4: dc.b " MULU requisite memory: ",0
- header_5: dc.b " Addition requisite memory: ",0
- header_6: dc.b " Shift and add requisite memory: ",0
- bss
- align
- ds.l 96
- stack: ds.l 0
- program_end: ds.l 0
- end
-
- TRAP9TST Execution Results
-
- MULU time: 360 milliseconds
- MULU requisite memory: 4 bytes
- Addition time: 260 milliseconds
- Addition requisite memory: 8 bytes
- Shift and add time: 235 milliseconds
- Shift and add requisite memory: 6 bytes
-
-
- A Relocatable Version of the Adaptive Algorithm
-
- The adaptive algorithm may also be installed as a trap
- that has been assembled in AssemPro's Relocatable mode.
- According to definitions previously established, program 36
- is probably more appropriately designated as a Combo version
- of program 34, because pc-relative addressing is used as
- circumstances permit. The choice of assembly modes for LSR
- programs is a bit more engaging than it is for other types
- of programs because the length of time that the program is
- resident is so much longer than the load time. It may be
- that one decides to accept the longer load time as a
- condition of receiving some other benefit via Relocatable
- assembly, or it may be that circumstances dictate the
- assembly mode.
- From the beginning of the book, my emphasis has been on
- the quest for speed. To me it seems that all other
- desirable software attributes precipitate naturally from the
- quest for this single feature. Consider the reason we use
- computers in the first place. Were speed not the most
- compelling issue, computers would probably not have been
- invented. That's why, in formulating my thoughts for the
- current discussion, I anticipated that the relative speed of
- the PC-relative assembled adaptive algorithm versus that of
- the Relocatable assembled algorithm would be the most
- obvious primary topic. But, as it turns out, my attention
- was drawn to a more compelling consideration: that of the
- AUT's addressing and assembly modes.
- As I mentioned in program 34's documentation, at the
- time that I designed that program, I could not predict the
- requirements of all possible AUTs, but I did guess that a
- Relocatable version of the adaptive algorithm might be
- desirable, or required, for some invoking algorithms.
- Simultaneously, I guessed that there would be times when one
- or the other assembly modes might be preferable for the AUT.
- It was while I was comparing the first few lines of the
- disassembly listing for the Relocatable version of the
- custom trap to that of the PC-relative version that I was
- struck with the realization that the assembly mode chosen
- for the AUT would at times be dictated by the instructions
- comprising that algorithm. Then, if instructions in the AUT
- were to store data in locations being referenced via a
- displacement, that data could be written into the adaptive
- algorithm's program area, thereby disruptively modifying the
- adaptive algorithm. To illustrate this phenomenon, I have
- chosen a suitable AUT, but first, let me present the source
- file and disassembly listing for program 36.
-
- Program 36. The adaptive algorithm designed for Relocatable
- Assembly. Execution results follow the listing.
-
- ; Program Name: TRAP_9R.S
- ; Version 1.005
-
- ; Assembly Instructions:
-
- ; Assemble in Relocatable mode and save with a PRG extension. TRAP_9P.S
- ; is a version of this program that can be assembled in PC-relative mode.
- ; I have prepared two versions because I can't predict the precise requirements
- ; of all possible algorithms which might invoke the custom trap that is
- ; installed by this program.
-
- ; Program Function:
-
- ; This is a LSR program that establishes a user defined trap #9. It may
- ; be executed from the desktop, but you may prefer to copy it to the AUTO
- ; folder of your boot partition or floppy disk so that it will execute
- ; automatically during boot.
-
- program_start: ; Calculate program size and retain result.
- lea program_end(pc), a3 ; Fetch program end address.
- suba.l 4(a7), a3 ; Subtract basepage address.
-
- enter_supervisor_mode:
- move.l #0, -(sp) ; The zero turns on supervisor mode.
- move.w #$20, -(sp) ; Function = super = GEMDOS $20.
- trap #1 ; Go to supervisor mode.
- addq.l #6, sp ; Supervisor stack pointer (SSP) returned in D0.
- movea.l d0, a5 ; Save SSP in scratch register.
-
- install_trap_9_routine: ; Store trap address at pointer address.
- move.l #trap_9_routine, $A4
-
- enter_user_mode:
- pea (a5) ; Restore supervisor stack pointer.
- move.w #$20, -(sp) ; Function = super = GEMDOS $20.
- trap #1
- addq.l #6, sp
-
- relinquish_processor_control: ; Maintain memory residency.
- move.w #0, -(sp) ; Exit code.
- move.l a3, -(sp) ; Program size.
- move.w #$31, -(sp) ; Function = ptermres = GEMDOS $31.
- trap #1
-
- trap_9_routine:
-
- ; See documentation in program TRAP_9P.S.
-
- calculate_length_of_AUT:
- suba.l a3, a4 ; Subtract start address from end address.
-
- save_heading_addresses:
- move.l a0, time_header ; Header to precede reported execution time.
- move.l a1, memory_header ; Header to precede reported requisite memory.
-
- transfer_stack_pointer:
-
- ; See documentation in program TRAP_9P.S.
-
- move.l a7, _SSP ; Save system stack pointer.
- move USP, a7 ; Transfer to invoking program's stack.
-
- print_heading:
-
- ; See documentation in program TRAP_9P.S.
-
- tst.b (a5) ; NULL string test.
- beq.s print_time_header ; Branch if null.
- movea.l a5, a0 ; Print heading.
- bsr print_string
-
- print_time_header:
- movea.l time_header(pc), a0 ; Fetch address of time header.
- bsr print_string
-
- build_algorithm:
- lea reserved(pc), a6 ; Fetch address of reserved space.
- move.w a4, d3 ; AUT length initializes counter for copy.
- move.w d3, memory ; AUT length stored for report.
- subq.w #1, d3 ; Initialize counter for loop execution.
- move_loop: ; Move AUT, byte by byte, to the area
- move.b (a3)+, (a6)+ ; declared as "reserved".
- dbra d3, move_loop ; Branch until D3 = -1.
-
- ; See documentation in program TRAP_9P.S.
-
- lea aut_end(pc), a0 ; Calculate number of bytes in this program
- lea program_end(pc), a1 ; which must be moved into "reserved space",
- suba.l a0, a1 ; then initialize counter D3 with the number
- move.w a1, d3 ; of bytes to copy.
- subq.w #4, d3 ; Subtract the DBRA requisite memory.
- movea.l a0, a1 ; Calculate amount the DBRA instruction must
- suba.l a6, a1 ; be moved and save the new location in scratch
- movea.l a6, a5 ; register for displacement correction.
- move.w 2(a0), d0 ; Fetch current DBRA displacement value.
- move.l (a0)+, (a6)+ ; Move the DBRA instruction.
- add.w a1, d0 ; Correct DBRA displacement to the value it
- move.w d0, 2(a5) ; should be after the move.
- subq.w #1, d3 ; Initialize counter for bulk copy.
- _move_loop: ; Move rest of algorithm in a bulk copy.
- move.b (a0)+, (a6)+
- dbra d3, _move_loop
- subq.w #1, d7 ; Initialize D7 for AUT loop execution.
- lea start_time(pc), a0 ; Fetch address of declared variable.
-
- ; NOTE: No correction is necessary for the start time variable, as it was
- ; in the PC-relative assembled program TRAP_9P, if pc-relative
- ; addressing is not used for the label in the instruction below that
- ; subtracts the contents of start_time from d0. That instruction is
- ; marked by asterisks below. In the TRAP_9P program AssemPro used
- ; pc-relative addressing for that label, as in does for all labels
- ; when a program is assembled in PC-relative mode, that's why the
- ; correction for that label was necessary in that program.
-
- ; In the instruction above, pc-relative addressing can be used because
- ; that instruction is not relocated by the bulk move.
-
- ; Now, the address that is used in the instruction below is not the
- ; relocated address of the variable; the original location of the
- ; variable is used. When you look at the disassembly listing for this
- ; program, you will see that the extension longword (not word) contains
- ; the actual address of the variable, not a displacement.
-
- ; Of course, in the instruction above, the extension word does contain
- ; a displacement, but this displacement remains valid because the
- ; instruction is not moved.
-
- ; NOTE: Since processor is in supervisor mode, there is no reason to use
- ; trap #3 to fetch time. The 200hz vector can be referenced directly.
-
- move.l $4BA, (a0) ; Fetch and store start time.
- aut_start:
- reserved: ds.b 1024 ; Space for loop construction = 1024 bytes.
- aut_end:
- dbra d7, aut_start
- move.l $4BA, d0 ; Fetch end time.
-
- ;****
- sub.l start_time, d0 ; Subtract start time from end time.
- ;****
-
- trap #10 ; Convert to decimal milliseconds and print.
- print_memory_header:
- movea.l memory_header(pc), a0
- bsr.s print_string
- print_requisite_memory:
- move.w memory(pc), d1 ; Fetch AUT's requisite memory.
- trap #4 ; Returns address of decimal string in A0.
- bsr.s print_string
- lea header_1(pc), a0 ; Print requisite memory units label.
- bsr.s print_string
-
- ; See documentation in program TRAP_9P.S.
-
- movea.l _SSP, a7 ; Restore system stack pointer.
- rte
-
- ;
- ; SUBROUTINES
- ;
-
- print_string:
- pea (a0)
- move.w #9, -(sp)
- trap #1
- addq.l #6, sp
- rts
-
- data
- header_1: dc.b ' bytes',$D,$A,0
- bss
- align
- _SSP: ds.l 1 ; Address of the system stack will be stored here.
- start_time: ds.l 1 ; Variable for time at start of AUT processing.
- memory: ds.w 1 ; Variable for AUT's requisite memory.
- time_header: ds.l 1 ; Points to address of header which is to precede
- ; the reported AUT processing time.
- memory_header: ds.l 1 ; Points to address of header which is to precede
- ; the reported AUT requisite memory.
- program_end: ds.l 0
- end
-
-
- TRAP9TST Execution Results With TRAP_9R.PRG Resident.
-
- These results match those obtained with the PC-relative
- assembled adaptive algorithm.
-
- MULU time: 360 milliseconds
- MULU requisite memory: 4 bytes
- Addition time: 255 milliseconds
- Addition requisite memory: 8 bytes
- Shift and add time: 230 milliseconds
- Shift and add requisite memory: 6 bytes
-
-
- Program 34 is sufficiently similar to program 36 so
- that most of the documentation for the first serves to
- document the second; nevertheless, I still want to point out
- a few important differences between the programs. Under the
- label save_heading_addresses, near the beginning of the
- trap_9_routine, note that the addresses are stored directly
- to the labels in program 36 using absolute addressing, as
- opposed to the indirect addressing method used in program
- 34. The absolute addressing mode is also used to store the
- system stack pointer. But, as you can see, wherever
- possible, pc-relative addressing is indicated in most of the
- instructions that contain labels in the source operand.
- That mode is indicated with the (pc) notation.
- There is one particular instruction in which I
- deliberately avoided pc-relative addressing so that I could
- illustrate a specific advantage of using the absolute
- addressing mode in the Relocatable assembled adaptive
- algorithm. That is the instruction which subtracts the
- start time from the end time:
-
- sub.l start_time, d0.
-
- The instruction has been made conspicuous by the ;**** group
- being placed before and after it. Using absolute addressing
- in this instruction obviates the inconvenience of a
- correction to access the start_time variable, as was done in
- program 34.
-
- Figure 7.4. Disassembly listing of TRAP_9R.PRG as it resides
- in memory after the initial invocation of the adaptive
- algorithm.
-
- TRAP_9R.PRG After First Invocation
-
- 01F9FE 47FA04FC LEA $1FEFC(PC),A3
- 01FA02 97EF0004 SUBA.L 4(A7),A3
- 01FA06 2F3C00000000 MOVE.L #0,-(A7)
- 01FA0C 3F3C0020 MOVE.W #$20,-(A7)
- 01FA10 4E41 TRAP #1
- 01FA12 5C8F ADDQ.L #6,A7
- 01FA14 2A40 MOVEA.L D0,A5
-
- Note that absolute addressing is used in both operands
- of the instruction that stores the adaptive algorithm's
- address in the trap #9 vector.
-
- 01FA16 23FC0001FA360000 MOVE.L #$1FA36,$A4
- 00A4
- 01FA20 4855 PEA (A5)
- 01FA22 3F3C0020 MOVE.W #$20,-(A7)
- 01FA26 4E41 TRAP #1
- 01FA28 5C8F ADDQ.L #6,A7
- 01FA2A 3F3C0000 MOVE.W #0,-(A7)
- 01FA2E 2F0B MOVE.L A3,-(A7)
- 01FA30 3F3C0031 MOVE.W #$31,-(A7)
- 01FA34 4E41 TRAP #1
-
- trap_9 routine starts here
-
- 01FA36 99CB SUBA.L A3,A4
- 01FA38 23C80001FEF4 MOVE.L A0,$1FEF4
- 01FA3E 23C90001FEF8 MOVE.L A1,$1FEF8
- 01FA44 23CF0001FEEA MOVE.L A7,$1FEEA
- 01FA4A 4E6F MOVE.L USP,A7
- 01FA4C 4A15 TST.B (A5)
- 01FA4E 6706 BEQ.S $1FA56
- 01FA50 204D MOVEA.L A5,A0
- 01FA52 61000480 BSR $1FED4
- 01FA56 207A049C MOVEA.L $1FEF4(PC),A0
- 01FA5A 61000478 BSR $1FED4
- 01FA5E 4DFA0046 LEA $1FAA6(PC),A6
- 01FA62 360C MOVE.W A4,D3
- 01FA64 33C30001FEF2 MOVE.W D3,$1FEF2
- 01FA6A 5343 SUBQ.W #1,D3
- 01FA6C 1CDB MOVE.B (A3)+,(A6)+
- 01FA6E 51CBFFFC DBRA D3,$1FA6C
- 01FA72 41FA0432 LEA $1FEA6(PC),A0
- 01FA76 43FA0484 LEA $1FEFC(PC),A1
- 01FA7A 93C8 SUBA.L A0,A1
- 01FA7C 3609 MOVE.W A1,D3
- 01FA7E 5943 SUBQ.W #4,D3
- 01FA80 2248 MOVEA.L A0,A1
- 01FA82 93CE SUBA.L A6,A1
- 01FA84 2A4E MOVEA.L A6,A5
- 01FA86 30280002 MOVE.W 2(A0),D0
- 01FA8A 2CD8 MOVE.L (A0)+,(A6)+
- 01FA8C D049 ADD.W A1,D0
- 01FA8E 3B400002 MOVE.W D0,2(A5)
- 01FA92 5343 SUBQ.W #1,D3
- 01FA94 1CD8 MOVE.B (A0)+,(A6)+
- 01FA96 51CBFFFC DBRA D3,$1FA94
- 01FA9A 5347 SUBQ.W #1,D7
- 01FA9C 41FA0450 LEA $1FEEE(PC),A0
- 01FAA0 20B9000004BA MOVE.L $4BA,(A0)
- 01FAA6 C0FC0005 MULU #5,D0
- 01FAAA 51CFFFFA DBRA D7,$1FAA6
- 01FAAE 2039000004BA MOVE.L $4BA,D0
- 01FAB4 90B90001FEEE SUB.L $1FEEE,D0
- 01FABA 4E4A TRAP #$A
- 01FABC 207A003E MOVEA.L $1FAFC(PC),A0
- 01FAC0 6116 BSR.S $1FAD8
- 01FAC2 323A0032 MOVE.W $1FAF6(PC),D1
- 01FAC6 4E44 TRAP #4
- 01FAC8 610E BSR.S $1FAD8
- 01FACA 41FA0018 LEA $1FAE4(PC),A0
- 01FACE 6108 BSR.S $1FAD8
- 01FAD0 2E790001FEEA MOVEA.L $1FEEA,A7
- 01FAD6 4E73 RTE
-
- print_string subroutine
-
- 01FAD8 4850 PEA (A0)
- 01FADA 3F3C0009 MOVE.W #9,-(A7)
- 01FADE 4E41 TRAP #1
- 01FAE0 5C8F ADDQ.L #6,A7
- 01FAE2 4E75 RTS
- 01FAE4 2020 MOVE.L -(A0),D0
- 01FAE6 6279 BHI.S $1FB61
- 01FAE8 7465 MOVEQ #$65,D2
- 01FAEA 730D DC.W $730D
- 01FAEC 0A000003 EORI.B #3,D0
- 01FAF0 EF40 ASL.W #7,D0
- 01FAF2 00000000 ORI.B #0,D0
- 01FAF6 00040006 ORI.B #6,D4
- 01FAFA DBBF DC.W $DBBF
- 01FAFC 0006DC25 ORI.B #$25,D6
- 01FB00 00000000 ORI.B #0,D0
- 01FB04 00000000 ORI.B #0,D0
-
- 01FE9C 00000000 ORI.B #0,D0
- 01FEA0 00000000 ORI.B #0,D0
-
- The movable section. Starting address is $01FEA6
-
- 01FEA4 000051CF ORI.B #-$31,D0
- 01FEA8 FBFE DC.W $FBFE
- 01FEAA 2039000004BA MOVE.L $4BA,D0
- 01FEB0 90B90001FEEE SUB.L $1FEEE,D0 ; sub.l start_time, d0
- 01FEB6 4E4A TRAP #$A
- 01FEB8 207A003E MOVEA.L $1FEF8(PC),A0
- 01FEBC 6116 BSR.S $1FED4
- 01FEBE 323A0032 MOVE.W $1FEF2(PC),D1
- 01FEC2 4E44 TRAP #4
- 01FEC4 610E BSR.S $1FED4
- 01FEC6 41FA0018 LEA $1FEE0(PC),A0
- 01FECA 6108 BSR.S $1FED4
- 01FECC 2E790001FEEA MOVEA.L $1FEEA,A7
- 01FED2 4E73 RTE
- 01FED4 4850 PEA (A0)
- 01FED6 3F3C0009 MOVE.W #9,-(A7)
- 01FEDA 4E41 TRAP #1
- 01FEDC 5C8F ADDQ.L #6,A7
- 01FEDE 4E75 RTS
- 01FEE0 2020 MOVE.L -(A0),D0
- 01FEE2 6279 BHI.S $1FF5D
- 01FEE4 7465 MOVEQ #$65,D2
- 01FEE6 730D DC.W $730D
- 01FEE8 0A000003 EORI.B #3,D0
- 01FEEC EF40 ASL.W #7,D0
- 01FEEE 00000000 ORI.B #0,D0
- 01FEF2 00040006 ORI.B #6,D4
- 01FEF6 DBBF DC.W $DBBF
- 01FEF8 0006DC25 ORI.B #$25,D6
-
-
- I don't know if I've mentioned it before, but even if I
- have, it is worth pointing out again that the addresses and
- the instructions contained therin are not displayed in
- disassembly listings as neat as one might like. If you
- refer to the disassembly listing just above this paragraph
- you can see that the address for the variable _SSP, 01FEEA,
- is not displayed. To locate the contents of that address,
- it is necessary to locate the nearest address that is
- displayed. In this case, that is address 01FEE8. There you
- see that the byte stored in that address is the ASCII code
- for a linefeed. Then, displayed on the same line, at
- address 01FEE9 is the 00 byte inserted by the align pseudo-
- op. Finally, also displayed on the same line, at address
- 01FEEA, is the first byte of the value stored in _SSP. That
- byte is 00. The next byte of _SSP, at address 01FEEB, is 03
- and is displayed on that same line. To locate the other two
- bytes stored in _SSP one must look at the next line, which
- is address 01FEEC.
- The value stored in the variable start_time is neatly
- displayed on the next line, at address 01FEEE, but on the
- line following, at address 01FEF2, the first word displayed
- is the contents of the variable memory, while the second
- word shown on that line is the first word of the value
- stored in time_header (address = 01FEF4). The second word
- of time-header is displayed on the line labeled 01FEF6. The
- last address, 01FEF8, is the variable memory_header. After
- a while, you get used to all of this, but I wanted to take
- the opportunity to mention something that is trivial once
- you know the "secret", but which can be a source of
- consternation to one who is not accoustomed to reading the
- listings.
- To the extent that the major difference between the PC-
- relative version and the Relocatable version of the custom
- trap is the addressing mode used in particular instructions,
- it seems reasonable to question the advantages, or
- disadvantages, of each addressing mode. One advantage of
- absolute addressing has already been discussed: it relieves
- one of the address correction burden in some instances.
- Naturally, there is the 16 bit displacement limitation
- attached to pc-relative addressing, but I find this to be
- rather insignificant for two reasons. I don't write many
- assembly language programs in which instructions and labels
- are 32,767 bytes apart, and when I do, I can find ways to
- circumvent that limitation.
- In maintaining the theme of the book, however, it seems
- prudent to compare the execution speed and requisite memory
- for instructions using the two addressing modes when that
- 16-bit limitation of pc-relative addressing is neglected.
- Program 37 was designed to yield the appropriate comparative
- data for this discussion, and, in addition, to serve as a
- vehicle to illustrate possible consequences of the
- disruptive modification phenomenon previously mentioned.
-
- Program 37. A program that compares the execution speed and
- requisite memory of two algorithms that write data to a
- location.
-
- ; Program Name: LEA_MOVE.S
- ; Version: 1.003
-
- ; Assembly Instructions:
-
- ; Assemble in Relocatabe mode and save with a TOS extension.
-
- ; Program Function:
-
- ; Compares the relative speed and memory requirements of
-
- ; lea label(pc), Am
- ; move.l An, (Am)
-
- ; and lea label, Am
- ; move.l An, (Am)
-
- ; to move.l An, label
-
- ; The execution time reported is for 50,000 executions of each algorithm.
-
- ; In addition, this program's first AUT can disrupt the adaptive algorithm
- ; because the address that is loaded in register A2 depends on the displacement
- ; between the instruction "lea label(pc), a2" and the label it references.
-
- ; When that instruction is executed in the adaptive algorithm, whatever
- ; address is the "displacement" distance away from the instruction will be
- ; loaded into register A2. Then the instruction following, "move.l a0, (a2)
- ; will write whatever is in register A0 into that memory location.
-
- ; If the address stored in register A2 while the AUT is being executed by
- ; the adaptive algorithm happens to be within the adaptive algorithm's program
- ; or data space, the results could be unpleasant.
-
- ; In fact, an undisciplined store of this type is capable of corrupting any
- ; instruction or data that happens to be separated from the LEA instruction by
- ; an amount equal to the displacement, and the results could be catastrophic.
-
- calculate_program_size:
- lea -$102(pc), a1 ; Fetch basepage start address.
- lea program_end, a0 ; Fetch program end = array address.
- trap #6 ; Return unused memory to op system.
- lea stack, a7
-
- initialize_registers_1:
- lea header_1, a0
- lea header_2, a1
- lea lea_move_start, a3
- lea lea_move_end, a4
- lea heading, a5
- move.w #50000, d7
- trap #9
-
- initialize_registers_2:
- lea header_3, a0
- lea header_4, a1
- lea long_lea_start, a3
- lea long_lea_end, a4
- lea heading, a5
- move.b #0, (a5) ; Store a NULL in first byte to create a
- move.w #50000, d7 ; null string so that heading gets printed
- trap #9 ; only once.
-
- initialize_registers_3:
- lea header_5, a0
- lea header_6, a1
- lea move_start, a3
- lea move_end, a4
- lea heading, a5
- move.w #50000, d7
- trap #9
-
- terminate:
- trap #8
-
- lea_move_start: ; This algorithm is potentially nocuous to
- lea label(pc), a2 ; the adaptive algorithm because a displacement
- move.l a0, (a2) ; will be stored in A2, not an address.
- lea_move_end:
-
- long_lea_start: ; This algorithm poses no threat because an
- lea label, a2 ; address in this program's data area will be
- move.l a0, (a2) ; stored in A2.
- long_lea_end:
-
- move_start: ; This algorithm poses no threat because an
- move.l a0, label ; address in this program's data area will be
- move_end: ; stored in A2.
-
- data
- heading: dc.b "LEA_MOVE Program Results",$D,$A,$D,$A,0
- header_1: dc.b " Elapsed time for lea label(pc), Am ",$D,$A
- dc.b " move.l An, (Am): ",0
- header_2: dc.b " Memory required for first algorithm: ",0
- header_3: dc.b $D,$A," Elapsed time for lea label, Am ",$D,$A
- dc.b " move.l An, (Am): ",0
- header_4: dc.b " Memory required for second algorithm: ",0
- header_5: dc.b $D,$A," Elapsed time for move.l An, label: ",0
- header_6: dc.b " Memory required for third algorithm: ",0
- bss
- align
- label: ds.l 1
- ds.l 96
- stack: ds.l 0
- program_end: ds.l 0
- end
-
-
- Before proceeding with the discussion initiated above,
- I want to show you the alterations that I made to the
- TRAPS.S program so that the adaptive algorithm would be
- installed automatically during boot. First, I renamed the
- trap installation program from TRAPS.S to CUSTOM.S, then I
- inserted the TRAP_9P.S program into that source file before
- assembling the entire conglomeration as CUSTOM.PRG. I don't
- want to repeat the entire CUSTOM.S source file, therefore, I
- have listed only the relevant portions in figure 7.5.
-
- Figure 7.5. Alterations to TRAPS.S to convert it to
- CUSTOM.S. The listing below is the contents of a file
- called CUSTOM.DOC. The listing shows only the changes that
- must be made to TRAPS.S to convert it to CUSTOM.S. Of
- course, the alterations to TRAPS.S can be made by copying
- the contents of CUSTOM.DOC to that file using a suitable
- editor, such as TEMPUS or 1ST WORD PLUS.
-
-
- ; Program Name: CUSTOM.S
- ; Version 1.006
-
- ; Algorithms for trap #9, the intelligent algorithm, and trap #10 are included,
- ; along with the other custom traps from the TRAPS.S program.
-
- ; Assembly Instructions:
-
- ; Assemble in PC-relative mode and save with a PRG extension.
-
- install_trap_9_routine: ; Note: pointer = vector = pointer.
- lea trap_9_routine, a0 ; Fetch address of trap #9 routine.
- move.l a0, $A4 ; Store custom trap address in pointer.
-
- trap_9_routine:
-
- ; NOTE: You may want to execute a program that invokes this trap from the
- ; AssemPro debugger and single step through the trap instructions, or
- ; you many want to set breakpoints and use the "Run program" button to
- ; execute to a breakpoint. It the exercise of this latter option that
- ; can cause problems. If you want to set a breakpoint at an instruction
- ; that occurs before the "reserved space", no precautions need be taken,
- ; except that you must realize that you could halt at a position after
- ; the start time has been initialized, and the execution time will be
- ; accumulating during the halt.
-
- ; But, if you want to set a breakpoint at at an instruction that occurs
- ; after the "reserved space", you should not do so until the instructions
- ; that perform the bulk move have been executed. If you set a breakpoint
- ; at an instruction in the program space which follows the "reserved
- ; space" and then execute the bulk move instructions, the breakpoint you
- ; have set will cause an illegal command error.
-
- ; This custom trap must be invoked by a program which contains one or more
- ; algorithms for which performance data, in the form of execution time and
- ; requisite memory, is desired. The trap must be invoked once for each
- ; algorithm under test (AUT). The trap is intended to be a performance
- ; report generator so that performance data for specific algorithms can be
- ; compared.
-
- ; If the program invoking the trap is spawned by SPAWN.TTP, the performance
- ; data will be printed to a file that is created by SPAWN.TTP. The name of
- ; the file will be the name of the spawned program with a DAT suffix.
-
- ; If the program invoking the trap is executed from the desktop, the
- ; performance data will be printed to the screen.
-
- ; The AUT can be assembled in PC-relative or Relocatable mode.
-
- ; The custom trap expects the following parameters to be contained in the
- ; specified registers prior to invocation:
-
- ; A0 = The address of a null terminated header to precede the reported
- ; AUT execution time.
-
- ; A1 = The address of a null terminated header to precede the reported
- ; AUT requisite memory.
-
- ; A3 = The starting address of an algorith containing one or more instructions.
- ; This algorithm becomes the AUT upon invocation of the trap.
-
- ; A4 = The ending address of the AUT.
-
- ; A5 = At the initial invocation by a specific program, A5 is expected to
- ; contain the address of a null terminated heading that is a brief
- ; description of the data that is to be reported by the trap. During
- ; all subsequent invocations by that program, A5 must contain the
- ; address of a null string.
-
- ; D7 = A word length decimal value which specifies the number of times the
- ; the AUT is to be executed in a loop in order to generate the execution
- ; time data. Register D7 is the only register which cannot be used by
- ; the AUT. That's because D7 is used as the AUT execution loop counter.
-
- ; The custom trap uses the information passed in A3, A4 and D7 to construct
- ; a loop around the AUT. The loop is executed the number of times specified
- ; in register D7. The time required to execute the loop is calculated and
- ; printed along with the AUT's requisite memory.
-
- ; How Trap #9 Works:
-
- ; The custom trap is an intelligent, self-modifying algorithm. Prior
- ; to an initial invocation, the trap #9 routine is composed of three sections.
- ; The first section contains instructions which are permanently located. The
- ; second section is a 1,024 byte block of reserved space into which the AUT
- ; and the third section is to be copied. The third section contains the custom
- ; trap instructions that must be copied so that the first instruction in the
- ; third section immediately follows the last instruction of the algorithm
- ; under test. The size that I chose for the reserved space was arbitrary.
-
- ; After an invocation, the custom trap calculates the number of bytes in
- ; the algorithm under test. Then it initializes register D7 for the AUT's
- ; execution loop. After saving the addresses of the headers that were passed
- ; as parameters so that the registers in which they were passed can be used
- ; by trap #9 and/or the algorithm under test, the trap prints the AUT's
- ; primary heading and the execution time header. Then it copies the AUT into
- ; the reserved space.
-
- ; After the AUT has been copied into the reserved space, the instructions
- ; and data in the third section of the custom trap are moved into immediate
- ; proximity of the AUT so that the first instruction of section three follows
- ; immediately after the last instruction of the algorithm under test.
-
- ; The AUT is executed the number of times requested via the value passed
- ; in D7, then the execution time and requisite memory of the AUT are printed.
-
- ; After the custom trap has been invoked once, the reserved space will
- ; not be completely clear during the remaining portion of the power-up cycle.
- ; When the trap is subsequently invoked, the instructions that are already in
- ; the reserved space will simply be overwritten by the instructions of the
- ; new algorithm under test and those of the third section.
-
- calculate_length_of_AUT:
- suba.l a3, a4 ; Subtract start address from end address.
-
- save_heading_addresses:
- lea time_header, a2 ; Header to precede reported execution time.
- move.l a0, (a2)
- lea memory_header, a2 ; Header to precede reported requisite memory.
- move.l a1, (a2)
-
- transfer_stack_pointer:
-
- ; Upon entrance into this custom trap, A7 contains the address of the system
- ; stack pointer. Because the system stack may be too short for use by the AUT,
- ; the invoking program must provide a user stack, and, here, I am going to
- ; store the address of that stack in A7. Before returning to the invoking
- ; program, the system stack pointer must be restored so that the return address
- ; can be found by the processor.
-
- lea _SSP, a0 ; Fetch address of declared variable.
- move.l a7, (a0) ; Save system stack pointer.
- move USP, a7 ; Transfer to invoking program's stack.
-
- print_heading:
-
- ; NOTE: When two or more algorithms are to invoke the trap from a program,
- ; the heading should be printed only once, at the beginning of the
- ; report; therefore, the string address in A5 should point to a
- ; null terminated string during the first invocation of the trap, but
- ; it should point to a null string on subsequent invocations from the
- ; same program.
-
- ; For that reason, the test for NULL string is included below, so that
- ; only one attempt to print the heading will be made.
-
- tst.b (a5) ; NULL string test.
- beq.s print_time_header ; Branch if null.
- movea.l a5, a0 ; Print heading.
- bsr print_string_9
-
- print_time_header:
- movea.l time_header, a0 ; Fetch address of time header.
- bsr print_string_9
-
- build_algorithm:
- lea reserved_space, a6 ; Fetch address of reserved space.
- move.w a4, d3 ; AUT length initializes counter for copy.
- lea memory, a0 ; Store AUT's requisite memory.
- move.w d3, (a0) ; AUT length stored for report.
- subq.w #1, d3 ; Initialize counter for loop execution.
- move_loop: ; Move AUT, byte by byte, to the area
- move.b (a3)+, (a6)+ ; declared as "reserved_space".
- dbra d3, move_loop ; Branch until D3 = -1.
-
- ; A6 is now pointing to the location at which the AUT execution loop's DBRA
- ; instruction must be copied. The DBRA copy must be processed apart from the
- ; other instructions in the custom trap which must be copied to "close up" the
- ; gap between the last statement of the algorithm under test and the rest
- ; of the custom trap's instructions.
-
- lea algorithm_end, a0 ; Calculate number of bytes in this program
- lea trap_9_end, a1 ; which must be copied into "reserved_space",
- suba.l a0, a1 ; then initialize counter D3 with the number
- move.w a1, d3 ; of bytes to copy.
- subq.w #4, d3 ; Subtract the DBRA requisite memory.
- movea.l a0, a1 ; Calculate amount the DBRA instruction must
- suba.l a6, a1 ; be moved and save the new location in scratch
- movea.l a6, a5 ; register for displacement correction.
- move.w 2(a0), d0 ; Fetch current DBRA displacement value.
- move.l (a0)+, (a6)+ ; Move the DBRA instruction.
- add.w a1, d0 ; Correct DBRA displacement to the value it
- move.w d0, 2(a5) ; should be after the move.
- subq.w #1, d3 ; Initialize counter for bulk copy.
- _move_loop: ; Move rest of algorithm in a bulk copy.
- move.b (a0)+, (a6)+
- dbra d3, _move_loop
- subq.w #1, d7 ; Initialize D7 for AUT loop execution.
- lea start_time, a0 ; Fetch address of declared variable.
- suba.l a1, a0 ; Correct address to after-move location.
-
- ; NOTE: Since processor is in supervisor mode, there is no reason to use
- ; trap #3 to fetch time. The 200hz vector can be referenced directly.
-
- move.l $4BA, (a0) ; Fetch and store start time.
- algorithm_start:
- reserved_space: ds.b 1024 ; Space for loop construction = 1024 bytes.
- algorithm_end:
- dbra d7, algorithm_start
- move.l $4BA, d0 ; Fetch end time.
- sub.l start_time, d0 ; Subtract start time from end time.
- trap #10 ; Convert to decimal milliseconds and print.
- print_memory_header:
- movea.l memory_header, a0 ; Fetch address of memory header.
- bsr print_string_9
- print_requisite_memory:
- move.w memory, d1 ; Fetch AUT's requisite memory.
- trap #4 ; Returns address of decimal string in A0.
- bsr print_string_9
- lea header_1, a0 ; Print requisite memory units label.
- bsr print_string_9
-
- ; NOTE: I had thought that I might have to restore the user stack pointer
- ; before returning to the calling program because I can't know to what
- ; extent the AUT has corrupted the user stack.
-
- ; But, as it turns out, since the value in USP is stored in register
- ; A7 at the beginning of the program, and since the system stack pointer
- ; is then active and remains so for the duration of the trap invocation,
- ; only the value in register A7 (which is the system stack pointer, but
- ; which points to the user stack) is altered; the value in USP remains
- ; uncorrupted, therefore, nothing needs be restored but the system stack
- ; pointer (so that it again points to the system stack).
-
- ; On subsequent invocations the value in USP, which remains stable, will
- ; simply be restored in A7, as often as the trap is invoked.
-
- movea.l _SSP, a7 ; Restore system stack pointer so that it
- rte ; points to the system stack.
-
- trap_9_data_and_subroutine:
-
- ; NOTE: Trap #9 must have its own print_string subroutine because the
- ; displacements in the bsr print_string instructions in trap #9 are
- ; not altered by the move of this section into the reserved space.
-
- ; Therefore, the displacement between the instructions and the
- ; print_string subroutine must remained constant. The only way to
- ; maintain the displacement (without corrections) is to move the
- ; subroutine along with the rest of this section.
-
- ; Then what happens is this: the bsr print_string instructions located
- ; above the reserved space cause a branch to the pre-move location of the
- ; subroutine; those instructions located below the reserved space cause
- ; a branch to the after-move location of the subroutine. If you say all
- ; of this to yourself very quickly, it is less confusing. I think.
-
- print_string_9: ; Expects address of string to be in A0.
- pea (a0) ; Push address of string onto stack.
- move.w #9, -(sp) ; Function = c_conws = GEMDOS $9.
- trap #1 ; GEMDOS call
- addq.l #6, sp ; Reset stack pointer to top of stack.
- rts
-
- header_1: dc.b ' bytes',$D,$A,0
- align
- _SSP: ds.l 1 ; Address of the system stack will be stored here.
- start_time: ds.l 1 ; Variable for time at start of AUT processing.
- memory: ds.w 1 ; Variable for AUT's requisite memory.
- time_header: ds.l 1 ; Points to address of header which is to precede
- ; the reported AUT processing time.
- memory_header: ds.l 1 ; Points to address of header which is to precede
- ; the reported AUT requisite memory.
- trap_9_end: ds.l 1 ; Marks end of trap #9 section.
-
- ; This statement and the items in boldface below are not part of the trap_9
- ; routine. They are there to provide orientation so that the appropriate
- ; sections of this document may be copied into the new CUSTOM.S document.
-
- trap_10_routine:
-
- print_string: ; Expects address of string to be in A0.
- pea (a0) ; Push address of string onto stack.
- move.w #9, -(sp) ; Function = c_conws = GEMDOS $9.
- trap #1 ; GEMDOS call
- addq.l #6, sp ; Reset stack pointer to top of stack.
- rts
-
- data
- subtrahend: dc.l $3B9ACA00,$5F5E100,$989680,$F4240,$186A0,$2710
- dc.l $3E8,$64,$A,$1,$0
- hex_table: dc.b '0123456789ABCDEF'
- spawned: dc.b $0
- space: dc.b " ",0
-
- units_label: dc.b " milliseconds", $D,$A,0
-
- bss
- align ; Align storage on a word boundary.
- after_load_time: ds.w 1 ; Spawned program's after load time.
- decimal: ds.l 3 ; Output buffer. Must be NULL terminated.
- hexadecimal: ds.l 3 ; Output buffer. Must be NULL terminated.
-
- program_end: ds.l 0
- end
-
-
- The execution results of program 37 are shown in figure
- 7.6. The data illustrates that there is no speed nor
- requisite memory difference between the first and third
- algorithms. The addressing modes used in those algorithms
- is pc-relative for the first and absolute for the third.
- The first algorithm may seem to be hampered by the extra
- instruction needed to indirectly store the data in An, but
- it is only when pc-relative addressing cannot be used with
- the LEA instruction that the extra instruction becomes a
- handicap. This is evident from the data for the second
- algorithm, where the execution time and requisite memory are
- greater than that for the other two algorithms. But, in
- general, the algorithms being considered will be those with
- the appearance of algorithms one and two. Clearly, the
- execution speed and requisite memory of those are identical.
-
- Figure 7.6. Execution results of program 37 with the newly
- created CUSTOM.PRG resident.
-
- LEA_MOVE Program Results
-
- Elapsed time for lea label(pc), Am
- move.l An, (Am): 205 milliseconds
- Memory required for first algorithm: 6 bytes
-
- Elapsed time for lea label, Am
- move.l An, (Am): 235 milliseconds
- Memory required for second algorithm: 8 bytes
-
- Elapsed time for move.l An, label: 205 milliseconds
- Memory required for third algorithm: 6 bytes
-
-
- A partial disassembly of the adaptive algorithm during
- invocation, but before AUT loop execution, by each of
- program 37's three AUT's is shown in figure 7.7. Although
- the addresses here are different than those shown in
- previous disassembly listings, because the custom trap was
- installed by CUSTOM.PRG during boot, it is easy to see that
- the displacement in the extension word of the first AUT's
- LEA instruction causes an address to be loaded into A2 that
- is not located in the invoking program, but is located
- within the custom trap's space. No problem occurs because
- it just so happens that the address $17D54 falls within the
- unused reserved space. But the folly of depending on this
- kind of luck should be apparent. This problem does not
- occur with the second and third AUTs because relative
- addressing is not used in those instructions; therefore the
- address loaded into A2 during invocation by the second and
- third AUTs is located within the invoking program's address
- space.
-
- Figure 7.7. A partial disassembly of the adaptive algorithm
- during invocation by program 37's three AUTs.
-
- LEA_MOVE's 1st AUT in CUSTOM's adaptive
-
- 017BCA 45FA0188 LEA $17D54(PC),A2
- 017BCE 2488 MOVE.L A0,(A2)
- 017BD0 51CFFFF8 DBRA D7,$17BCA
-
- LEA_MOVE's 2nd AUT in CUSTOM's adaptive
-
- 017BCA 45F90006D730 LEA $6D730,A2
- 017BD0 2488 MOVE.L A0,(A2)
- 017BD2 51CFFFF6 DBRA D7,$17BCA
-
- LEA_MOVE's 3rd AUT in CUSTOM's adaptive
-
- 017BCA 23C80006D730 MOVE.L A0,$6D730
- 017BD0 51CFFFF8 DBRA D7,$17BCA
-
-
- I must not leave you with the impression that the
- problem with the address being loaded in the first AUT has
- much to do with the fact that it is not located within the
- boundaries of the invoking program. No, it is because the
- address it does load is within the boundaries of the
- adaptive algorithm and that address is going to be corrupted
- by the second statement of the first algorithm. The purpose
- of the three AUT's is to compare the execution speed and
- requisite memory of three algorithms which store data in an
- address that is specified by a label. As far as the test is
- concerned, the location of the label can be anyplace in
- memory.
- Of course, I would not leave you to suffer this
- dilemma. I have prepared another version of program 37 and
- another version of the adaptive algorithm to illustrate a
- method of dealing with this problem of referencing labels in
- an AUT's data area from within the execution loop located in
- the adaptive algorithm. Program 38 is the program that will
- exercise the new adaptive algorithm. AUT_DATA's execution
- results are shown in figure 7.8.
-
- Program 38. In this program, data areas are declared for
- each AUT, and the variables label_1, label_2 and label_3 are
- the labels for those data areas.
-
- ; Program Name: AUT_DATA.S
- ; Version: 1.001
-
- ; Assembly Instructions:
-
- ; Assemble in Relocatabe mode and save with a TOS extension.
-
- ; Program Function:
-
- ; Compares the relative speed and memory requirements of
-
- ; lea label(pc), Am
- ; move.l An, (Am)
-
- ; and lea label, Am
- ; move.l An, (Am)
-
- ; to move.l An, label
-
- ; The execution time reported is for 50,000 executions of each algorithm.
-
- ; In this program, a data area is declared before the AUT's so that the
- ; first AUT can't disrupt the adaptive algorithm. The address of the data
- ; area is passed to the adaptive algorithm in register A2.
-
- ; A branch statement is included in each AUT so that the adaptive algorithm
- ; will jump over the data area.
-
- ; Of course, this calls for a redesign of the adaptive algorithm.
-
- calculate_program_size:
- lea -$102(pc), a1 ; Fetch basepage start address.
- lea program_end, a0 ; Fetch program end = array address.
- trap #6 ; Return unused memory to op system.
- lea stack, a7
-
- initialize_registers_1:
- lea header_1, a0
- lea header_2, a1
- lea lea_data_area, a2
- lea lea_move_start, a3
- lea lea_move_end, a4
- lea heading, a5
- move.w #50000, d7
- trap #9
-
- initialize_registers_2:
- lea header_3, a0
- lea header_4, a1
- lea long_data_area, a2
- lea long_lea_start, a3
- lea long_lea_end, a4
- lea heading, a5
- move.b #0, (a5) ; Store a NULL in first byte to create a
- move.w #50000, d7 ; null string so that heading gets printed
- trap #9 ; only once.
-
- initialize_registers_3:
- lea header_5, a0
- lea header_6, a1
- lea move_data_area, a2
- lea move_start, a3
- lea move_end, a4
- lea heading, a5
- move.w #50000, d7
- trap #9
-
- terminate:
- trap #8
-
- lea_data_area:
- bra.s lea_move_start
- label_1: ds.l 1
- lea_move_start:
- lea label_1(pc), a2
- move.l a0, (a2)
- lea_move_end:
-
- long_data_area:
- bra.s long_lea_start
- label_2: ds.l 1
- long_lea_start:
- lea label_2, a2
- move.l a0, (a2)
- long_lea_end:
-
- move_data_area:
- bra.s move_start
- label_3: ds.l 1
- move_start:
- move.l a0, label_3
- move_end:
-
- data
- heading: dc.b "AUT_DATA Program Results",$D,$A,$D,$A,0
- header_1: dc.b " Elapsed time for lea label(pc), Am ",$D,$A
- dc.b " move.l An, (Am): ",0
- header_2: dc.b " Memory required for first algorithm: ",0
- header_3: dc.b $D,$A," Elapsed time for lea label, Am ",$D,$A
- dc.b " move.l An, (Am): ",0
- header_4: dc.b " Memory required for second algorithm: ",0
- header_5: dc.b $D,$A," Elapsed time for move.l An, label: ",0
- header_6: dc.b " Memory required for third algorithm: ",0
- bss
- align
- ds.l 96
- stack: ds.l 0
- program_end: ds.l 0
- end
-
-
- Program 39. A new version of the program that installs the
- adaptive algorithm. This version is designed to process
- AUT's in which an area of memory has been set aside to be
- used as a data area. Note that the pseudo-op data is not
- used.
-
- ; Program Name: TRAP9DAT.S
- ; Version 1.002
-
- ; Assembly Instructions:
-
- ; Assemble in PC-relative mode and save with a PRG extension.
-
- ; Program Function:
-
- ; This is a LSR program that establishes a user defined trap #9. It may
- ; be executed from the desktop, but you may prefer to copy it to the AUTO
- ; folder of your boot partition or floppy disk so that it will execute
- ; automatically during boot.
-
- ; This is a special, single-purpose trap that is used to illustrate a
- ; method of preventing corruption of the adaptive algorithm.
-
- program_start: ; Calculate program size and retain result.
- lea program_end, a3 ; Fetch program end address.
- suba.l 4(a7), a3 ; Subtract basepage address.
-
- enter_supervisor_mode:
- move.l #0, -(sp) ; The zero turns on supervisor mode.
- move.w #$20, -(sp) ; Function = super = GEMDOS $20.
- trap #1 ; Supervisor stack pointer (SSP) returned in D0.
- addq.l #6, sp
- movea.l d0, a5 ; Save SSP in scratch register.
-
- install_trap_9_routine: ; Note: pointer = vector = pointer.
- lea trap_9_routine, a0 ; Fetch address of trap #9 routine.
- move.l a0, $A4 ; Store trap address at pointer address.
-
- enter_user_mode:
- pea (a5) ; Restore supervisor stack pointer.
- move.w #$20, -(sp) ; Function = super = GEMDOS $20.
- trap #1
- addq.l #6, sp
-
- relinquish_processor_control: ; Maintain memory residency.
- move.w #0, -(sp) ; Exit code.
- move.l a3, -(sp) ; Program size.
- move.w #$31, -(sp) ; Function = ptermres = GEMDOS $31.
- trap #1
-
- trap_9_routine:
-
- ; The custom trap expects the following parameters to be contained in the
- ; specified registers prior to invocation:
-
- ; A0 = The address of a null terminated header to precede the reported
- ; AUT execution time.
-
- ; A1 = The address of a null terminated header to precede the reported
- ; AUT requisite memory.
-
- ; A2 = The address of a data area for the algorithm to be tested. The ending
- ; address for the data area must be the address stored in register A3;
- ; that is, it must be the starting address for the algorithm.
-
- ; A3 = The starting address of an algorithm containing one or more instructions.
- ; This algorithm becomes the AUT upon invocation of the trap.
-
- ; A4 = The ending address of the AUT.
-
- ; A5 = At the initial invocation by a specific program, A5 is expected to
- ; contain the address of a null terminated heading that is a brief
- ; description of the data that is to be reported by the trap. During
- ; all subsequent invocations by that program, A5 must contain the
- ; address of a null string.
-
- ; D7 = A word length decimal value which specifies the number of times the
- ; the AUT is to be executed in a loop in order to generate the execution
- ; time data. Register D7 is the only register which cannot be used by
- ; the AUT. That's because D7 is used as the AUT execution loop counter.
-
- calculate_length_of_AUT_plus_data_area:
- suba.l a2, a4 ; Subtract data start address from AUT start
- ; address.
- calculate_length_of_AUT_data_area:
- suba.l a2, a3 ; Subtract AUT start address from AUT end
- ; address.
- save_heading_addresses:
- lea time_header, a6 ; Header to precede reported execution time.
- move.l a0, (a6)
- lea memory_header, a6 ; Header to precede reported requisite memory.
- move.l a1, (a6)
- lea heading, a6 ; Null terminated heading or NULL string.
- move.l a5, (a6)
-
- transfer_AUT_start_address: ; Because contents of A2 will be altered soon.
- movea.l a2, a5
-
- transfer_stack_pointer:
- lea _SSP, a0 ; Fetch address of declared variable.
- move.l a7, (a0) ; Save system stack pointer.
- move USP, a7 ; Transfer to invoking program's stack.
-
- print_heading:
- movea.l heading, a0 ; Fetch heading address.
- tst.b (a0) ; NULL string test.
- beq.s print_time_header ; Branch if null.
- bsr print_string
-
- print_time_header:
- movea.l time_header, a0 ; Fetch address of time header.
- bsr print_string
-
- move_AUT:
- lea reserved_space, a6 ; Fetch address of reserved space.
- move.w a4, d3 ; AUT length initializes counter for copy.
- lea memory, a0 ; Store AUT's requisite memory.
- move.w d3, d0 ; Copy total AUT length to D0 for correction.
- sub.w a3, d0 ; Subtract length of data area.
- move.w d0, (a0) ; AUT length stored for report.
- subq.w #1, d3 ; Initialize counter for loop execution.
- move_loop: ; Move data area and AUT, byte by byte, to the
- move.b (a5)+, (a6)+ ; area declared as "reserved_space".
- dbra d3, move_loop ; Branch until D3 = -1.
-
- move_adaptive_algorithm:
- lea algorithm_end, a0 ; Calculate number of bytes in this program
- lea program_end, a1 ; which must be copied into "reserved_space",
- suba.l a0, a1 ; then initialize counter D3 with the number
- move.w a1, d3 ; of bytes to copy.
- subq.w #4, d3 ; Subtract the DBRA requisite memory.
- movea.l a0, a1 ; Calculate amount the DBRA instruction must
- suba.l a6, a1 ; be moved and save the new location in scratch
- movea.l a6, a5 ; register for displacement correction.
- move.w 2(a0), d0 ; Fetch current DBRA displacement value.
- move.l (a0)+, (a6)+ ; Move the DBRA instruction.
-
- ; In this version of the adaptive algorithm, when the DBRA displacement is
- ; corrected, the size of the AUT's data space must be added to the
- ; correction so that the branch will go to the start of the AUT's executable
- ; statements, not to the start of the data space.
-
- add.w a3, d0 ; Add length of AUT's data space.
- add.w a1, d0 ; Correct DBRA displacement to the value it
- move.w d0, 2(a5) ; should be after the move.
- subq.w #1, d3 ; Initialize counter for bulk copy.
- _move_loop: ; Move rest of algorithm in a bulk copy.
- move.b (a0)+, (a6)+
- dbra d3, _move_loop
- subq.w #1, d7 ; Initialize D7 for AUT loop execution.
- lea start_time, a0 ; Fetch address of declared variable.
- suba.l a1, a0 ; Correct address to after-move location.
-
- ; NOTE: Since processor is in supervisor mode, there is no reason to use
- ; trap #3 to fetch time. The 200hz vector can be referenced directly.
-
- move.l $4BA, (a0) ; Fetch and store start time.
- algorithm_start:
- reserved_space: ds.b 1024 ; Space for loop construction = 1024 bytes.
- algorithm_end:
- dbra d7, algorithm_start
- move.l $4BA, d0 ; Fetch end time.
- sub.l start_time, d0 ; Subtract start time from end time.
- trap #10 ; Convert to decimal milliseconds and print.
- print_memory_header:
- movea.l memory_header, a0 ; Fetch address of memory header.
- bsr.s print_string
- print_requisite_memory:
- move.w memory, d1 ; Fetch AUT's requisite memory.
- trap #4 ; Returns address of decimal string in A0.
- bsr.s print_string
- lea header_1, a0 ; Print requisite memory units label.
- bsr.s print_string
- movea.l _SSP, a7 ; Restore system stack pointer so that it
- rte ; points to the system stack.
-
- ;
- ; SUBROUTINES
- ;
-
- print_string:
- pea (a0)
- move.w #9, -(sp)
- trap #1
- addq.l #6, sp
- rts
-
- data
- header_1: dc.b ' bytes',$D,$A,0
- bss
- align
- _SSP: ds.l 1 ; Address of the system stack will be stored here.
- start_time: ds.l 1 ; Variable for time at start of AUT processing.
- memory: ds.w 1 ; Variable for AUT's requisite memory.
- time_header: ds.l 1 ; Points to address of header which is to precede
- ; the reported AUT processing time.
- memory_header: ds.l 1 ; Points to address of header which is to precede
- ; the reported AUT requisite memory.
- heading: ds.l 1 ; Null terminated heading or NULL string.
- program_end: ds.l 0
- end
-
-
- Figure 7.8. AUT_DATA's execution results.
-
- AUT_DATA Program Results
-
- Elapsed time for lea label(pc), Am
- move.l An, (Am): 205 milliseconds
- Memory required for first algorithm: 6 bytes
-
- Elapsed time for lea label, Am
- move.l An, (Am): 230 milliseconds
- Memory required for second algorithm: 8 bytes
-
- Elapsed time for move.l An, label: 205 milliseconds
- Memory required for third algorithm: 6 bytes
-
-
- Figure 7.9. Disassembly listing. The first section is a
- complete listing of the adaptive algorithm during invocation
- by the first AUT. Only partial listings are shown for the
- invocations by the second and third AUTs. All listings were
- prepared before execution of the adaptive algorithm's
- performance loop.
-
- AUT_DATA's 1st AUT in TRAP9DAT
-
- 01FA36 99CA SUBA.L A2,A4
- 01FA38 97CA SUBA.L A2,A3
- 01FA3A 4DFA04C4 LEA $1FF00(PC),A6
- 01FA3E 2C88 MOVE.L A0,(A6)
- 01FA40 4DFA04C2 LEA $1FF04(PC),A6
- 01FA44 2C89 MOVE.L A1,(A6)
- 01FA46 4DFA04C0 LEA $1FF08(PC),A6
- 01FA4A 2C8D MOVE.L A5,(A6)
- 01FA4C 2A4A MOVEA.L A2,A5
- 01FA4E 41FA04A6 LEA $1FEF6(PC),A0
- 01FA52 208F MOVE.L A7,(A0)
- 01FA54 4E6F MOVE.L USP,A7
- 01FA56 207A04B0 MOVEA.L $1FF08(PC),A0
- 01FA5A 4A10 TST.B (A0)
- 01FA5C 6704 BEQ.S $1FA62
- 01FA5E 61000480 BSR $1FEE0
- 01FA62 207A049C MOVEA.L $1FF00(PC),A0
- 01FA66 61000478 BSR $1FEE0
- 01FA6A 4DFA004C LEA $1FAB8(PC),A6
- 01FA6E 360C MOVE.W A4,D3
- 01FA70 41FA048C LEA $1FEFE(PC),A0
- 01FA74 3003 MOVE.W D3,D0
- 01FA76 904B SUB.W A3,D0
- 01FA78 3080 MOVE.W D0,(A0)
- 01FA7A 5343 SUBQ.W #1,D3
- 01FA7C 1CDD MOVE.B (A5)+,(A6)+
- 01FA7E 51CBFFFC DBRA D3,$1FA7C
- 01FA82 41FA0434 LEA $1FEB8(PC),A0
- 01FA86 43FA0484 LEA $1FF0C(PC),A1
- 01FA8A 93C8 SUBA.L A0,A1
- 01FA8C 3609 MOVE.W A1,D3
- 01FA8E 5943 SUBQ.W #4,D3
- 01FA90 2248 MOVEA.L A0,A1
- 01FA92 93CE SUBA.L A6,A1
- 01FA94 2A4E MOVEA.L A6,A5
- 01FA96 30280002 MOVE.W 2(A0),D0
- 01FA9A 2CD8 MOVE.L (A0)+,(A6)+
- 01FA9C D04B ADD.W A3,D0
- 01FA9E D049 ADD.W A1,D0
- 01FAA0 3B400002 MOVE.W D0,2(A5)
- 01FAA4 5343 SUBQ.W #1,D3
- 01FAA6 1CD8 MOVE.B (A0)+,(A6)+
- 01FAA8 51CBFFFC DBRA D3,$1FAA6
- 01FAAC 5347 SUBQ.W #1,D7
-
- 01FAAE 41FA044A LEA $1FEFA(PC),A0
- 01FAB2 91C9 SUBA.L A1,A0
- 01FAB4 20B804BA MOVE.L $4BA,(A0)
- 01FAB8 6004 BRA.S $1FABE
- 01FABA 00000000 ORI.B #0,D0
- 01FABE 45FAFFFA LEA $1FABA(PC),A2
- 01FAC2 2488 MOVE.L A0,(A2)
- 01FAC4 51CFFFF8 DBRA D7,$1FABE
- 01FAC8 203804BA MOVE.L $4BA,D0
- 01FACC 90BA0038 SUB.L $1FB06(PC),D0
-
- 01FAD0 4E4A TRAP #$A
- 01FAD2 207A003C MOVEA.L $1FB10(PC),A0
- 01FAD6 6114 BSR.S $1FAEC
- 01FAD8 323A0030 MOVE.W $1FB0A(PC),D1
- 01FADC 4E44 TRAP #4
- 01FADE 610C BSR.S $1FAEC
- 01FAE0 41FA0016 LEA $1FAF8(PC),A0
- 01FAE4 6106 BSR.S $1FAEC
- 01FAE6 2E7A001A MOVEA.L $1FB02(PC),A7
- 01FAEA 4E73 RTE
- 01FAEC 4850 PEA (A0)
- 01FAEE 3F3C0009 MOVE.W #9,-(A7)
- 01FAF2 4E41 TRAP #1
- 01FAF4 5C8F ADDQ.L #6,A7
- 01FAF6 4E75 RTS
- 01FAF8 2020 MOVE.L -(A0),D0
- 01FAFA 6279 BHI.S $1FB75
- 01FAFC 7465 MOVEQ #$65,D2
- 01FAFE 730D DC.W $730D
- 01FB00 0A000004 EORI.B #4,D0
- 01FB04 0EB000000000 BCLR #0,0(A0,D0.W)
- 01FB0A 00060006 ORI.B #6,D6
- 01FB0E FB75 DC.W $FB75
- 01FB10 0006FBC9 ORI.B #-$37,D6
- 01FB14 0006FB58 ORI.B #$58,D6
- 01FB18 040A DC.W $40A
- 01FB1A 00000000 ORI.B #0,D0
- 01FB1E 00000000 ORI.B #0,D0
-
- 01FEAE 00000000 ORI.B #0,D0
- 01FEB2 00000000 ORI.B #0,D0
- 01FEB6 000051CF ORI.B #-$31,D0
- 01FEBA FBFE DC.W $FBFE
- 01FEBC 203804BA MOVE.L $4BA,D0
- 01FEC0 90BA0038 SUB.L $1FEFA(PC),D0
- 01FEC4 4E4A TRAP #$A
- 01FEC6 207A003C MOVEA.L $1FF04(PC),A0
- 01FECA 6114 BSR.S $1FEE0
- 01FECC 323A0030 MOVE.W $1FEFE(PC),D1
- 01FED0 4E44 TRAP #4
- 01FED2 610C BSR.S $1FEE0
- 01FED4 41FA0016 LEA $1FEEC(PC),A0
- 01FED8 6106 BSR.S $1FEE0
- 01FEDA 2E7A001A MOVEA.L $1FEF6(PC),A7
- 01FEDE 4E73 RTE
- 01FEE0 4850 PEA (A0)
- 01FEE2 3F3C0009 MOVE.W #9,-(A7)
- 01FEE6 4E41 TRAP #1
- 01FEE8 5C8F ADDQ.L #6,A7
- 01FEEA 4E75 RTS
- 01FEEC 2020 MOVE.L -(A0),D0
- 01FEEE 6279 BHI.S $1FF69
- 01FEF0 7465 MOVEQ #$65,D2
- 01FEF2 730D DC.W $730D
- 01FEF4 0A000004 EORI.B #4,D0
- 01FEF8 0EB000000000 BCLR #0,0(A0,D0.W)
- 01FEFE 00060006 ORI.B #6,D6
- 01FF02 FB75 DC.W $FB75
- 01FF04 0006FBC9 ORI.B #-$37,D6
- 01FF08 0006FB58 ORI.B #$58,D6
-
- AUT_DATA's 2nd AUT in TRAP9DAT
-
- 01FAAE 41FA044A LEA $1FEFA(PC),A0
- 01FAB2 91C9 SUBA.L A1,A0
- 01FAB4 20B804BA MOVE.L $4BA,(A0)
- 01FAB8 6004 BRA.S $1FABE
- 01FABA 00000000 ORI.B #0,D0
- 01FABE 45F90006FB40 LEA $6FB40,A2
- 01FAC4 2488 MOVE.L A0,(A2)
- 01FAC6 51CFFFF6 DBRA D7,$1FABE
- 01FACA 203804BA MOVE.L $4BA,D0
-
- AUT_DATA's 3rd AUT in TRAP9DAT
-
- 01FAAE 41FA044A LEA $1FEFA(PC),A0
- 01FAB2 91C9 SUBA.L A1,A0
- 01FAB4 20B804BA MOVE.L $4BA,(A0)
- 01FAB8 6004 BRA.S $1FABE
- 01FABA 00000000 ORI.B #0,D0
- 01FABE 23C80006FB4E MOVE.L A0,$6FB4E
- 01FAC4 51CFFFF8 DBRA D7,$1FABE
- 01FAC8 203804BA MOVE.L $4BA,D0
-
-
- In figure 7.9, you can see that only in the performance
- loop for the first AUT does the AUT reference an address
- that is located within the adaptive algorithm's space. The
- instructions in the other two AUT's reference an address
- that is located within the invoking program's space.
- Knowing how the declared data space will be referenced, you
- can choose the addressing mode that forces the reference to
- be performed in the manner most suited to specific
- applications. The most important difference between this
- adaptive algorithm and the previous is that this one permits
- instructions to write to a data area without disrupting the
- adaptive algorithm.
-
- Wrapping It Up
-
- I am going to wrap up this chapter with two more
- comparative programs. Both programs were executed after the
- custom traps were installed with CUSTOM.PRG. Program 40 was
- designed to confirm or refute the statement on page 113 of
- the Stan Kelly-Bootie book which asserts that MOVEA.Z
- #<label>, A0 is equivalent to LEA <label>, A0, and faster.
- Program 40 should be assembled in PC-relative mode, saved
- with a TOS extension; then the name of the file should be
- changed to LEAMOVER in the assempro editor, and that file
- should be assembled in Relocatable mode. Figure 7.10 shows
- the results of LEAMOVEA.TOS; figure 7.11 shows those of
- LEAMOVER.TOS.
-
- Program 40. A program that compares the execution speed and
- requisite memory of two instructions that load an address
- into an address register.
-
- ; Program Name: LEAMOVEA.S
- ; Version: 1.001
-
- ; Assembly Instructions:
-
- ; Assemble with ASSEMPRO and save with a TOS extension.
-
- ; Execution Instructions:
-
- ; Execute from the desktop; or execute SPAWN.TTP, type LEAMOVEA.TOS on
- ; its command line and view this program's output in LEAMOVEA.DAT.
-
- ; Program Function:
-
- ; Confirms (or refutes) the statement on page 113 of the Stan Kelly-
- ; Bootle book, to wit: MOVEA.Z #<label>, A0 is equivalent to LEA, and
- ; faster.
-
- ; For each of the two methods of loading the address of a string into
- ; and address register, calculates the memory occupied by each method, and
- ; calculates the execution time in milliseconds required for 50,000
- ; executions of each.
-
- ; PROGRAM RESULTS:
-
- ; When the program containing the two instructions under discussion
- ; is assembled in AssemPro's Relocatable mode, the instructions are
- ; equivalent in speed and requisite memory.
-
- ; However, when the program is assembled in AssemPro's PC-relative
- ; mode, the LEA instruction is faster and requires less memory. The claim
- ; that the MOVEA.Z instruction is faster than the LEA instruction is,
- ; therefore, soundly refuted.
-
- ; But, when the program is assembled in PC-relative mode, the MOVEA.Z
- ; instruction does not move the correct value into the address register,
- ; so there is really no choice when either pc-relative addressing or
- ; PC-relative assembly is used--LEA must be used then.
-
- calculate_program_size:
- lea -$102(pc), a1 ; Fetch basepage start address.
- lea program_end, a0 ; Fetch program end = array address.
- trap #6 ; Return unused memory to op system.
- lea stack, a7
-
- initialize_registers_1:
- lea header_1, a0
- lea header_2, a1
- lea lea_start, a3
- lea lea_end, a4
- lea heading, a5
- move.w #50000, d7
- trap #9
-
- initialize_registers_2:
- lea header_3, a0
- lea header_4, a1
- lea movea_start, a3
- lea movea_end, a4
- lea heading, a5
- move.b #0, (a5) ; Store a NULL in first byte to create a
- move.w #50000, d7 ; null string so that heading gets printed
- trap #9 ; only once.
-
- terminate:
- trap #8
-
- lea_start:
- lea newline, a2 ; Instruction in the loop.
- lea_end:
-
- movea_start:
- movea.l #newline, a2 ; Instruction in the loop.
- movea_end:
-
- data
- heading: dc.b 'LEAMOVEA Execution Results',$D,$A,$D,$A,0
- header_1: dc.b ' Elapsed time for LEA: ',0
- header_2: dc.b ' Memory required for LEA: ',0
- header_3: dc.b $D,$A,' Elapsed time for MOVEA: ',0
- header_4: dc.b ' Memory required for MOVEA: ',0
- newline: dc.b $D,$A,0
- bss
- align
- ds.l 96
- stack: ds.l 0
- program_end: ds.l 0
- end
-
-
- Figure 7.10. Execution results for LEAMOVEA.TOS.
-
- LEAMOVEA Execution Results
-
- Elapsed time for LEA: 130 milliseconds
- Memory required for LEA: 4 bytes
-
- Elapsed time for MOVEA: 155 milliseconds
- Memory required for MOVEA: 6 bytes
-
-
- Figure 7.11. Execution results for LEAMOVER.TOS.
-
- LEAMOVER Execution Results
-
- Elapsed time for LEA: 155 milliseconds
- Memory required for LEA: 6 bytes
-
- Elapsed time for MOVEA: 155 milliseconds
- Memory required for MOVEA: 6 bytes
-
-
- The data shown in figure 7.11 confirms that the
- execution speed and requisite memory for the two
- instructions are equivalent when the program in which they
- reside is assembled in the Relocatable mode. However, the
- data in figure 7.10 shows that when the program is assembled
- in PC-relative mode, or when pc-relative addressing is used
- in the source operand of the LEA instruction, that
- instruction is faster and uses less memory.
- But there is more to this comparison then is readily
- apparent. Figure 7.12 is a partial disassembly of the
- adaptive algorithm during invocation of the two PC-relative
- AUT's. There you can see the futility and danger of trying
- to use the movea instruction to load an address in a PC-
- relative assembled program. The address that is actually
- loaded is located beyond the space of both the invoking
- program and the adaptive algorithm. Figure 7.13 shows that
- either instruction can be used to load an address that is
- located within the invoking algorithm's space.
-
- Figure 7.12. Partial disassembly of the adaptive algorithm
- during invocation by the two AUT's in LEAMOVEA.TOS.
-
- LEAMOVEA.TOS partial disassembly in trap
-
- LEAMOVEA.TOS was prepared by assembling LEAMOVEA.S in AssemPro's PC-relative
- mode. The address being loaded into A2 by the LEA instruction is located
- within the space of the adaptive algorithm.
-
- 017BCA 45FA009F LEA $17C6B(PC),A2
- 017BCE 51CFFFFA DBRA D7,$17BCA
-
- The address being loaded into A2 by the MOVEA instruction decimal 233 and is
- located beyond the space of the adaptive algorithm and that of the invoking
- program.
-
- 017BCA 247C000000E9 MOVEA.L #$E9,A2
- 017BD0 51CFFFF8 DBRA D7,$17BCA
-
-
- Figure 7.13. Partial disassembly of the adaptive algorithm
- during invocation by the two AUT's in LEAMOVER.TOS.
-
- LEAMOVER.TOS partial disassembly in trap
-
- LEAMOVER.TOS was prepared by assembling LEAMOVEA.S in AssemPro's Relocatable
- mode. The address being loaded into A2 by the LEA instruction is located
- within the space of the invoking algorithm.
-
- 017BCA 45F90006D6AF LEA $6D6AF,A2
- 017BD0 51CFFFF8 DBRA D7,$17BCA
-
- The address being loaded into A2 by the MOVEA instruction is located within
- the space of the invoking algorithm.
-
- 017BCA 247C0006D6AF MOVEA.L #$6D6AF,A2
- 017BD0 51CFFFF8 DBRA D7,$17BCA
-
-
- Based on the data produced and shown in figures 7.10,
- 7.11, 7.12 and 7.13, I conclude that Mr. Kelly-Bootle's
- claim is soundly refuted, and furthermore, when the program
- in which the instructions reside is assembled in PC-relative
- mode, the MOVEA instruction does not move the correct value
- into the address register, so there is really no choice when
- either pc-relative addressing or PC-relative assembly is
- used. In either of those two cases, the LEA instruction
- must be used.
-
- Just One More, I Promise
-
- Program 41 compares pea (a0) to move.l a0, -(sp). I am
- including this program for two reasons. The first reason is
- to confirm that they are equivalent; the second is to give
- you an example of the kind of information that is contained
- in the Thomas P. Skinner book--the kind of information that
- prompted me to recommend it. The results of program 41's
- execution follow the listing. Personally, I think that the
- PEA instruction is the more elegant.
-
- Program 41. Compares two ways of pushing the contents
- of an address register onto the stack.
-
- ; Program Name: PEA_MOVE.S
- ; Version: 1.001
-
- ; Assembly Instructions:
-
- ; Assemble in PC-relative mode and save with a TOS extension.
-
- ; Program Function:
-
- ; Compares the relative speed and memory requirements of "pea (a0)" to
- ; that of "move.l a0, -(sp)". The execution time reported is for
- ; 50,000 executions of each instruction.
-
- ; Execution Note:
-
- ; Invokes traps that are installed by CUSTOM.PRG during boot.
-
- calculate_program_size:
- lea -$102(pc), a1 ; Fetch basepage start address.
- lea program_end, a0 ; Fetch program end = array address.
- adda.l #200512, a0 ; Add in extra large stack space.
- movea.l a0, a7 ; Point A7 to this program's stack.
- trap #6 ; Return unused memory to op system.
-
- initialize_registers_1:
- lea header_1, a0
- lea header_2, a1
- lea pea_algorithm_start, a3
- lea pea_algorithm_end, a4
- lea heading, a5
- move.w #50000, d7
- trap #9
-
- initialize_registers_2:
- lea header_3, a0
- lea header_4, a1
- lea move_algorithm_start, a3
- lea move_algorithm_end, a4
- lea heading, a5
- move.b #0, (a5) ; Store a NULL in first byte to create a
- move.w #50000, d7 ; null string so that heading gets printed
- trap #9 ; only once.
-
- terminate:
- trap #8
-
- pea_algorithm_start:
- pea (a0) ; Instruction in the loop.
- pea_algorithm_end:
-
- move_algorithm_start:
- move.l a0, -(sp) ; Instruction in the loop.
- move_algorithm_end:
-
- data
- heading: dc.b "PEA_MOVE Program Results",$D,$A,$D,$A,0
- header_1: dc.b " Elapsed time for pea (a0): ",0
- header_2: dc.b " Memory required for pea (a0): ",0
- header_3: dc.b $D,$A," Elapsed time for move.l a0, -(sp): ",0
- header_4: dc.b " Memory for move.l a0, -(sp) ",0
- bss
- align
- program_end: ds.l 0
- end
-
- PEA_MOVE Program Results
-
- Elapsed time for pea (a0): 155 milliseconds
- Memory required for pea (a0): 2 bytes
-
- Elapsed time for move.l a0, -(sp): 155 milliseconds
- Memory for move.l a0, -(sp) 2 bytes
-
-
- Conclusion
-
- The thrust of this chapter was twofold. First, I
- wanted to present an algorithm that would permit shorter
- versions of programs designed to obtain performance or
- comparative data. Second, in doing that, I wanted to
- illustrate the inherent power of self-modifying algorithms.
- There are many more things that I could say about such
- algorithms, but I can't allow myself to be drawn into that
- provocative subject--lest I abandon the rest of the book. I
- leave you with this thought. Suppose that you learned to do
- truly powerful things with your personal computer. With
- whom would you share your secrets? Is it possible that you
- might even discourage others from traveling the routes that
- led to your own success?
-
-